Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy loaded (and evictable) dictionaries #18

Merged
merged 4 commits into from
Jan 3, 2025

Conversation

formigarafa
Copy link
Owner

@formigarafa formigarafa commented Nov 5, 2024

With these changes dictionary files are no longer loaded with the gem. It will wait until it is used. (solves #15)
The code as is won't persist the dictionary in memory either allowing them to be evicted from memory if garbage collected (no references).
You can also keep an instance of Zxcvbn::Tester loaded to prevent reloading the dictionaries on every call. (addresses some concerns of #9)
This allow a drastic reduction on the number of retained objects in memory.

This approach also eases the path to implement configurable dictionaries without sacrificing any other features.

@formigarafa formigarafa changed the title Unloadable dictionaries Lazy loaded (and evictable) dictionaries Nov 5, 2024
Comment on lines 26 to 32
def self.zxcvbn(password, user_inputs = [])
start = (Time.now.to_f * 1000).to_i
matches = Matching.omnimatch(password, user_inputs)
result = Scoring.most_guessable_match_sequence(password, matches)
result["calc_time"] = (Time.now.to_f * 1000).to_i - start
attack_times = TimeEstimates.estimate_attack_times(result["guesses"])
attack_times.each do |prop, val|
result[prop] = val
end
result["feedback"] = Feedback.get_feedback(result["score"], result["sequence"])
result
Tester.new.zxcvbn(password, user_inputs)
end

def self.test(password, user_inputs = [])
Result.new(Zxcvbn.zxcvbn(password, user_inputs))
Tester.new.test(password, user_inputs)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about making both of these use a shared, memoized Tester instance? That'd still lazy load the first time, but avoid reloading all the time.

To unload / reload on every use, an app could just instantiate a new Tester each time.

@zarqman
Copy link
Contributor

zarqman commented Nov 5, 2024

@formigarafa, I can see how this allows for lazy loading and unloading, and prepares the path for custom dictionaries. Pretty good to enable all of that through a single PR!

For our use case here, I have some hesitation about unloading / reloading the data every time. Those are two-fold:
a) added delays to signups or password changes, which are already slow due to bcrypt
b) with multiple threads (or even fibers), I believe this would end up with multiple copies of the data in memory at once (and could make it easier to DoS the server with multiple simultaneous signup attempts)

Do we have a measurement of how long a single load of the dictionaries takes?

I might suggest making the default API memoize the Tester. Then it'd lazy load just once and stay in memory. That's more backwards compatible with the existing design and seems like a better default to me.

I could also see the potential benefit to an API to force eager loading. A sketch of both memoizing by default, and an eager loading option (called simply with Zxcvbn.default_tester(eager_load: true)).

module Zxcvbn
  def self.zxcvbn(...)
    default_tester.zxcvbn(...)
  end

  def self.test(...)
    default_tester.test(...)
  end

  def self.default_tester(eager_load: false)
    @default_tester ||= Tester.new(eager_load:)
  end

  class Tester
    def initialize(eager_load: false)
      matching if eager_load
    end
  end
end

@formigarafa
Copy link
Owner Author

I have just implemented what has been pending since the beginning.
You can initialize an instance of Zxcvbn::Tester and hold it if you want to keep it loaded or use it straight from the top level module if you prefer (re)loading/unloading.

I still need to update the README to include this as is present in the other ports.
You can see the same happening with zxcvb-ruby and zxcvbn-js.

@formigarafa
Copy link
Owner Author

Hello @zarqman,

I had a read on your concerns and I decided to not move forward without giving them a good consideration. But then the end of the last year had been a bit rough (it kind of still is) and despite my previous answer I've been thinking about your points and I think they are important.

I feel that Item b (multiple instances) is of a greater concern than item a (delay added).
I say that because the testing strength process is inherently slow and other mitigation strategies that would probably be in place would help with this, too. (like: do all other validation first, rate limit on sign-up, rate limit password change per user)

In terms of multiple instances, I think this is a more concerning scenario but memoizing should have that sorted, and likely solve the other item, too.

I've been chasing after improving performance, memory usage and garbage collectability for a while. IMHO I don't think it was sensible to keep that much password test related data in memory while there is an app trying to run a business (unless of course you are on a business of password checking, of course).

I think I put myself in this position because I ported the javascript code which came with their own assumptions about memory and process management from a webpage in the browser. Back then I've decided to solve these problems later when I had everything else under control and I could match the other ports expectations. If I had the capability to implement that from the start, as I wanted, I would not even need to be thinking about how to accomodate these features, now.

My problem is that from the point of view of a lib developer I feel that makes more sense to give the power to the developer to handle the lib, more than creating a tool to handle the lib which would increase the api surface of the lib and create more possibilities to conflict and require maintenance.

From what I could infer about your perspective, I think you could create a wrapper class to initialize Zxcvbn and keep it in memory according to your requirements.

module EagerZxcvbn
  def self.tester
    @tester ||= Zxcvbn::Tester.new
  end
  
  def self.test(*)
    tester.test(*)
  end
end

# to eager load
EagerZxcvbn.tester

@formigarafa formigarafa merged commit c98ba55 into master Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants