Optimize both allocated and retained memory usage #17

Merged · 3 commits · Nov 4, 2024

Conversation

@zarqman (Contributor) commented on Nov 2, 2024

Inspired by #12, #13, and #15, I decided to play with this a bit and see whether memory optimization could be taken even further. It turns out it can.

Experiment 1

Freeze the strings read from frequency_lists/*.txt (great idea to make this external, BTW). See frequency_lists.rb:16.

This reduced initial memory allocations a good bit.
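As a rough illustration of the idea (a hypothetical sketch, not the gem's actual loading code; the file name is only an example):

# Hypothetical sketch of Experiment 1. Unary minus returns a frozen,
# deduplicated copy of each word, so loading the lists no longer leaves
# thousands of mutable strings behind (.freeze works similarly, minus the deduplication).
words = File.readlines("frequency_lists/english_wikipedia.txt", chomp: true)
            .map { |w| -w }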

Experiment 2

Avoid allocating an Array for keys when computing RANKED_DICTIONARIES_MAX_WORD_SIZE.

In matching.rb:34, this change would have been to replace

word_scores.keys.max_by(&:size).size

with

# Track the longest key without materializing the full word_scores.keys array.
max_word = 0
word_scores.each_key { |k| max_word = k.size if k.size > max_word }
max_word

This reduced initial allocations a little bit more. However, in combo with experiment 3, it doesn't seem necessary.

Experiment 3

Change the storage of ranked dictionaries from a Hash to a dual Array + Set, using a new RankedDict class as a shim of sorts. The Array serves both as the original list of keys and as the source of the rank (its index); the Set is used for fast lookups.

Using Array alone takes a tiny bit less memory, but is 10x slower for lookups.

By adding a new RankedDict class, almost nothing else had to change. It defines just enough methods to stay compatible with the existing Hash-based usage. I know one goal of the project is to minimize divergence from the JS implementation, and this seemed like the least invasive way to swap out the data structures.

This reduces initial allocations even further over Experiment 1 alone and also reduces retained memory a fair bit.
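For reference, here is a minimal sketch of the shim idea (simplified and hypothetical, not the PR's exact class): membership checks on the hot path go to the Set, and the Array's index is consulted only for actual matches to recover the rank.

require 'set'

# Minimal sketch of the RankedDict idea (not the PR's exact code).
class RankedDict
  def initialize(words)
    @words  = words.freeze     # rank = index + 1, mirroring the original word => rank Hash
    @lookup = Set.new(@words)  # O(1) membership checks
  end

  def key?(word)
    @lookup.include?(word)
  end

  # Only called for words that actually matched, so the O(n) index scan stays rare.
  def [](word)
    idx = @words.index(word)
    idx && idx + 1
  end

  def each_key(&block)
    @words.each(&block)
  end
end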

Results

This PR includes experiments 1 and 3. With experiment 3, experiment 2 didn't seem to be needed since it's now operating on the array from RankedDict anyway.

Overall performance seems to be about the same.

The memory savings are notable, at least on MRI Ruby 3.3. Not sure how it might compare to other Rubies.

Initial allocations: reduced by 57%
Retained memory: reduced by 33%

Memory measurements generated with derailed bundle:objects on a local project that uses zxcvbn-rb.

Before

allocated memory by gem
  12030398  zxcvbn-rb/lib
allocated memory by file
   7940701  ~/zxcvbn-rb/lib/zxcvbn/matching.rb
   3838642  ~/zxcvbn-rb/lib/zxcvbn/frequency_lists.rb
retained memory by gem
   7123076  zxcvbn-rb/lib
retained memory by file
   7081372  ~/zxcvbn-rb/lib/zxcvbn/matching.rb

After

allocated memory by gem
   5166998  zxcvbn-rb/lib
allocated memory by file
   3842002  ~/zxcvbn-rb/lib/zxcvbn/frequency_lists.rb
   1073781  ~/zxcvbn-rb/lib/zxcvbn/matching.rb
retained memory by gem
   4776228  zxcvbn-rb/lib
retained memory by file
   3771576  ~/zxcvbn-rb/lib/zxcvbn/frequency_lists.rb
    964564  ~/zxcvbn-rb/lib/zxcvbn/matching.rb

@formigarafa (Owner) commented:

That's cool.
I have repeated your experiments here and found that they all really help, but Experiment 3 holds a catch: Set is implemented using Hash and comes from another gem, so its allocations get attributed to set.rb rather than zxcvbn-rb.
I think that because you ran the benchmarks on a project using zxcvbn, some of the bytes may have been hidden under set.
I ran the benchmark over your PR in isolation and got this:

Measuring objects created by gems in groups [:default, "production"]
Total allocated: 8623062 bytes (100433 objects)
Total retained:  8137480 bytes (94662 objects)

allocated memory by gem
-----------------------------------
   5162314  zxcvbn-rb/lib
   3459976  lib
       652  bundler-2.4.5
        80  other
        40  derailed_benchmarks-2.2.1

allocated memory by file
-----------------------------------
   3838802  ~/zxcvbn-rb/lib/zxcvbn/frequency_lists.rb
   3459976  ~/.asdf/installs/ruby/3.3.4/lib/ruby/3.3.0/set.rb
   1070464  ~/zxcvbn-rb/lib/zxcvbn/matching.rb
     97876  ~/zxcvbn-rb/lib/zxcvbn/adjacency_graphs.rb
     80328  ~/zxcvbn-rb/lib/zxcvbn/scoring.rb
     31536  ~/zxcvbn-rb/lib/zxcvbn/feedback.rb
     22540  ~/zxcvbn-rb/lib/zxcvbn.rb
     20768  ~/zxcvbn-rb/lib/zxcvbn/time_estimates.rb
       572  ~/.asdf/installs/ruby/3.3.4/lib/ruby/gems/3.3.0/gems/bundler-2.4.5/lib/bundler/runtime.rb
        80  ~/.asdf/installs/ruby/3.3.4/lib/ruby/gems/3.3.0/gems/bundler-2.4.5/lib/bundler.rb
        80  <internal:timev>
        40  ~/.asdf/installs/ruby/3.3.4/lib/ruby/gems/3.3.0/gems/derailed_benchmarks-2.2.1/bin/derailed

Which is for sure much better than the original from master (baseline):

Measuring objects created by gems in groups [:default, "production"]
Total allocated: 12073226 bytes (192712 objects)
Total retained:  7166201 bytes (94350 objects)

allocated memory by gem
-----------------------------------
  12072454  zxcvbn-rb/lib
       652  bundler-2.4.5
        80  other
        40  derailed_benchmarks-2.2.1

allocated memory by file
-----------------------------------
   7980884  ~/zxcvbn-rb/lib/zxcvbn/matching.rb
   3838802  ~/zxcvbn-rb/lib/zxcvbn/frequency_lists.rb
     97876  ~/zxcvbn-rb/lib/zxcvbn/adjacency_graphs.rb
     80168  ~/zxcvbn-rb/lib/zxcvbn/scoring.rb
     31456  ~/zxcvbn-rb/lib/zxcvbn/feedback.rb
     22540  ~/zxcvbn-rb/lib/zxcvbn.rb
     20728  ~/zxcvbn-rb/lib/zxcvbn/time_estimates.rb
       572  ~/.asdf/installs/ruby/3.3.4/lib/ruby/gems/3.3.0/gems/bundler-2.4.5/lib/bundler/runtime.rb
        80  ~/.asdf/installs/ruby/3.3.4/lib/ruby/gems/3.3.0/gems/bundler-2.4.5/lib/bundler.rb
        80  <internal:timev>
        40  ~/.asdf/installs/ruby/3.3.4/lib/ruby/gems/3.3.0/gems/derailed_benchmarks-2.2.1/bin/derailed

But I took your findings about producing arrays from the hash keys one step further, and I got this without needing the new RankedDict class:

Measuring objects created by gems in groups [:default, "production"]
Total allocated: 7570561 bytes (99101 objects)
Total retained:  7185201 bytes (94598 objects)

allocated memory by gem
-----------------------------------
   7569789  zxcvbn-rb/lib
       652  bundler-2.4.5
        80  other
        40  derailed_benchmarks-2.2.1

allocated memory by file
-----------------------------------
   3838802  ~/zxcvbn-rb/lib/zxcvbn/frequency_lists.rb
   3479843  ~/zxcvbn-rb/lib/zxcvbn/matching.rb
     97876  ~/zxcvbn-rb/lib/zxcvbn/adjacency_graphs.rb
     78424  ~/zxcvbn-rb/lib/zxcvbn/scoring.rb
     31536  ~/zxcvbn-rb/lib/zxcvbn/feedback.rb
     22540  ~/zxcvbn-rb/lib/zxcvbn.rb
     20768  ~/zxcvbn-rb/lib/zxcvbn/time_estimates.rb
       572  ~/.asdf/installs/ruby/3.3.4/lib/ruby/gems/3.3.0/gems/bundler-2.4.5/lib/bundler/runtime.rb
        80  ~/.asdf/installs/ruby/3.3.4/lib/ruby/gems/3.3.0/gems/bundler-2.4.5/lib/bundler.rb
        80  <internal:timev>
        40  ~/.asdf/installs/ruby/3.3.4/lib/ruby/gems/3.3.0/gems/derailed_benchmarks-2.2.1/bin/derailed

Thank you so much for this contribution. I will be merging it soon.

@zarqman (Contributor, Author) commented on Nov 3, 2024

Ah, you're right, I missed that the memory got reattributed to Set. Bummer. I was really hoping to get the retained memory further down, although reducing initial allocations is still a solid win.

Thanks for double-checking my work and for your ongoing work on this gem! 🥇

@zarqman (Contributor, Author) commented on Nov 3, 2024

What caught my attention is that the raw data is only 788 KB on disk. So, I've experimented with this a bit more.

There are additional memory savings available, but only with a notable performance hit. In brief, this would mean storing the internal data in large, delimited strings to avoid the overhead of so many arrays, hashes, strings, and integers.

For example, such a string might look like aaaa1\t1\nbbbb\t2\n.... Each record is in the form <word>\t<rank>\n. The string is searched using a regexp.
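A minimal sketch of such a lookup (the blob below is a toy stand-in; the real string would be built from a full frequency list):

# Toy dictionary stored as one frozen string of "<word>\t<rank>\n" records.
BLOB = "aaaa1\t1\nbbbb\t2\n".freeze

def rank_for(word)
  # ^ matches at the start of each line in Ruby, anchoring the search to record boundaries.
  m = BLOB.match(/^#{Regexp.escape(word)}\t(\d+)\n/)
  m && m[1].to_i
end

rank_for("bbbb") # => 2
rank_for("zzzz") # => nil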

Holding each dictionary in one giant string (i.e. 6 strings total) drops the retained memory down to 1.4 MB (compared to 7.2 MB in the above comment). While this is almost double the raw data, some of that is from the extra \t1234\n being added. I got allocated memory down to 5.5 MB, and this could potentially be optimized further by storing the delimited version directly on disk; so far I have been building it in memory.

However, despite the memory savings, the test suite is about 180x slower! 🐢

Grouping each dictionary by first character (words.group_by { |w| w[0] }) helps performance quite a bit. Each dictionary ends up with 26-43 strings, so the object count stays relatively small. In this case, the test suite is only about 10x slower.

Lastly, grouping each dictionary by the first two characters results in the test suite being only 5x slower.
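A sketch of that two-character bucketing variant (hypothetical; words stands in for one frequency list already loaded as an Array):

# Build buckets keyed by the first two characters, keeping each word's global rank.
buckets = {}
words.each_with_index do |w, i|
  (buckets[w[0, 2]] ||= +"") << "#{w}\t#{i + 1}\n"
end
buckets.each_value(&:freeze)

def bucketed_rank_for(buckets, word)
  blob = buckets[word[0, 2]]
  return nil unless blob
  m = blob.match(/^#{Regexp.escape(word)}\t(\d+)\n/)
  m && m[1].to_i
end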

As near as I can tell, the test suite performs approximately 315,000 word lookups. On my system, the last commit above runs in 0.25 sec. 5x slower is 1.25 sec, or a loss of 1 sec across 315k lookups; 10x slower is 2.5 sec, or a loss of 2.25 sec across 315k (around 0.007 ms per lookup).

Assuming I've measured things properly this time, this is at least an interesting result. I'm not sure how anyone feels about a 5-10x slowdown in exchange for reducing memory by ~5.5 MB (per process, of course).

Anyway, mostly thought I'd share my results in case anyone else feels compelled to think about this more. It's a curious study into Ruby's memory management in light of working with a dictionary of 100k words.

If it's somehow super interesting and potentially worth merging, feel free to chase it or I'd also be happy to work up a follow-on PR to the present one (or append it here, if preferred).

@formigarafa (Owner) commented:

I think with this PR you have found the most annoying part of the problem, which was causing the strings to be loaded twice. This was huge.
If there is no more of this, IMHO it would be best to focus on no longer using constants to store the data.
That would allow the gem to be loaded without allocating any of this memory, and also to deallocate it when the job is done.
It should also allow loading custom dictionaries and resetting to the default ones.
After that we may have a better sense of which memory-usage or performance trade-offs are worth reconsidering.
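Purely as a hypothetical illustration of what I mean (the names DictionaryStore and load_defaults below are made up, not PR #18's actual interface):

# Hypothetical sketch: no constants hold the dictionaries, so once the store
# object goes out of scope they can be garbage collected, and callers can pass
# in custom lists or rebuild the defaults at will.
module Zxcvbn
  class DictionaryStore
    attr_reader :lists

    def self.load_defaults
      Dir[File.join(__dir__, "frequency_lists", "*.txt")].to_h do |path|
        [File.basename(path, ".txt"), File.readlines(path, chomp: true).map { |w| -w }]
      end
    end

    def initialize(lists = self.class.load_defaults)
      @lists = lists
    end
  end
end

# store = Zxcvbn::DictionaryStore.new                                   # defaults
# store = Zxcvbn::DictionaryStore.new("en" => %w[correct horse staple]) # custom lists
# store = nil                                                           # now eligible for GC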

@formigarafa merged commit c673285 into formigarafa:master on Nov 4, 2024
@formigarafa (Owner) commented:

Hey @zarqman, have a look at PR #18. It allows you to not retain any of the dictionaries.

I also checked the number of objects loaded; there are around 94k entries in the dictionaries, which accounts for most of the gem's allocations.
I could be wrong, but I can only see these numbers improving by significantly changing how the data is stored and searched, to prevent it from even being loaded. And I don't know whether that would cause other performance drawbacks in calculation time.

@zarqman (Contributor, Author) commented on Nov 5, 2024

I've left most of my comments on #18.

I could be wrong, but I can only see these numbers improving by significantly changing how the data is stored and searched, to prevent it from even being loaded. And I don't know whether that would cause other performance drawbacks in calculation time.

Agreed, reducing the memory used will require a significant change in storage and search. That's the experiment I was describing in my last comment above. It does work, but it's a bit messy. The performance hit on my laptop was only a fraction of 1 ms per test iteration, which is likely much less than the cost of loading/unloading/garbage-collecting the entire dictionary.

However, even that experiment still takes 1.4MB of memory perpetually. That's a fair bit more than the 0 MB used after unloading the dictionaries. 😄
