Add a cache for building encoded URLs #1432

Merged
merged 7 commits into from
Nov 29, 2024

Member

@bdraco bdraco commented Nov 29, 2024

Since aiohttp's web_request tends to see the same URLs over and over, it's advantageous to cache building the URL objects. We currently have a cache for parsing URLs, but we did not have one for building URLs.

We didn't have a cache on URL.build because URL.build accepts a dict for the query, which is not hashable. This is solved by constructing the query string before the cache is used. Note that it may be better to avoid caching URLs with a query string in the future, but it didn't seem to make enough of a difference in testing to be worth the additional complexity.

Since aiohttp rarely calls URL.build with encoded=False, we do not cache unencoded builds, as the hit rate was < 50% in production testing. The hit rate for https://github.com/aio-libs/aiohttp/blob/1fa237ffc9e7aa70cbabb68ab64d6fe03255cbc0/aiohttp/web_request.py#L429 is nearly perfect.

Additionally, and similar to #1434: previously, every newly created URL object started with a new self._cache, but with the build cached, the returned object's self._cache will already be initialized.
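To illustrate the self._cache point, here is a small sketch under stated assumptions: `TinyURL` and `build_cached` are hypothetical stand-ins, not yarl's classes, and `_cache` is assumed to hold lazily computed properties. Because the cached builder hands back the same object, anything already stored in its per-instance cache is reused.

```python
from functools import lru_cache
from typing import Any, Dict


class TinyURL:
    """Hypothetical stand-in for a URL class with a per-instance cache."""

    def __init__(self, raw: str) -> None:
        self._raw = raw
        self._cache: Dict[str, Any] = {}  # empty for every newly created object

    @property
    def host(self) -> str:
        # Computed on first access, then served from the per-instance cache.
        if "host" not in self._cache:
            self._cache["host"] = self._raw.split("://", 1)[1].split("/", 1)[0]
        return self._cache["host"]


@lru_cache(maxsize=128)
def build_cached(raw: str) -> TinyURL:
    """Return the same TinyURL object for the same input string."""
    return TinyURL(raw)


first = build_cached("http://example.com/x")
first.host  # populates first._cache
second = build_cached("http://example.com/x")
# second is the very same object, so its _cache is already populated.
```

Without the cached builder, `TinyURL("http://example.com/x")` would create a new object with an empty `_cache` each time.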

We currently have a cache for parsing URLs, but we did not have
one for building URLs because URL.build accepts a dict for the
query, which is not hashable. Since web apps tend to see the same
80% of URLs over and over, it's advantageous to cache building
the URL object, as aiohttp has to make them on every web request.

codspeed-hq bot commented Nov 29, 2024

CodSpeed Performance Report

Merging #1432 will improve performance by 53.86%

Comparing build_lru (0d25f0b) with master (e3a282e)

Summary

⚡ 1 improvement
✅ 84 untouched benchmarks

Benchmarks breakdown

| Benchmark | master | build_lru | Change |
|---|---|---|---|
| test_url_build_encoded_with_host_and_port | 397.8 µs | 258.6 µs | +53.86% |
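As a sanity check on the reported change, the figure appears consistent with measuring the gain relative to the new (faster) time. This is an assumption about how the percentage is derived, not something the report states, and `relative_gain` is a hypothetical helper:

```python
def relative_gain(old_us: float, new_us: float) -> float:
    """Time saved, expressed as a percentage of the new (faster) time."""
    return (old_us - new_us) / new_us * 100


# Timings taken from the benchmark row above (already rounded to 0.1 µs).
gain = relative_gain(397.8, 258.6)
print(f"{gain:.1f}%")
```

With the rounded timings this comes out at roughly 53.8%, close to the reported +53.86%; the small difference is presumably due to rounding in the displayed values.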


codecov bot commented Nov 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.11%. Comparing base (e3a282e) to head (0d25f0b).
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1432      +/-   ##
==========================================
+ Coverage   96.10%   96.11%   +0.01%     
==========================================
  Files          31       31              
  Lines        5855     5873      +18     
  Branches      348      349       +1     
==========================================
+ Hits         5627     5645      +18     
  Misses        202      202              
  Partials       26       26              
| Flag | Coverage Δ |
|---|---|
| CI-GHA | 96.11% <100.00%> (+0.01%) ⬆️ |
| MyPy | 49.40% <95.00%> (+0.14%) ⬆️ |
| OS-Linux | 99.55% <100.00%> (+<0.01%) ⬆️ |
| OS-Windows | 99.62% <100.00%> (+<0.01%) ⬆️ |
| OS-macOS | 99.30% <100.00%> (+<0.01%) ⬆️ |
| Py-3.10.11 | 99.28% <100.00%> (+<0.01%) ⬆️ |
| Py-3.10.15 | 99.51% <100.00%> (+<0.01%) ⬆️ |
| Py-3.11.10 | 99.51% <100.00%> (+<0.01%) ⬆️ |
| Py-3.11.9 | 99.28% <100.00%> (+<0.01%) ⬆️ |
| Py-3.12.7 | 99.51% <100.00%> (+<0.01%) ⬆️ |
| Py-3.13.0 | 99.51% <100.00%> (+<0.01%) ⬆️ |
| Py-3.9.13 | 99.24% <100.00%> (+<0.01%) ⬆️ |
| Py-3.9.20 | 99.47% <100.00%> (+<0.01%) ⬆️ |
| Py-pypy7.3.16 | 99.53% <100.00%> (+<0.01%) ⬆️ |
| Py-pypy7.3.17 | 99.55% <100.00%> (+<0.01%) ⬆️ |
| VM-macos-latest | 99.30% <100.00%> (+<0.01%) ⬆️ |
| VM-ubuntu-latest | 99.55% <100.00%> (+<0.01%) ⬆️ |
| VM-windows-latest | 99.62% <100.00%> (+<0.01%) ⬆️ |
| pytest | 99.55% <100.00%> (+<0.01%) ⬆️ |


Member Author

bdraco commented Nov 29, 2024

The hit rate for pre-encoded builds in production is great. Not so much for unencoded builds ... it might not make sense to cache building unencoded URLs.

@bdraco bdraco closed this Nov 29, 2024
@bdraco bdraco deleted the build_lru branch November 29, 2024 16:33
@bdraco bdraco restored the build_lru branch November 29, 2024 16:35
@bdraco bdraco reopened this Nov 29, 2024
Member Author

bdraco commented Nov 29, 2024

We still end up losing all the benefit of the cache if callers use

    @reify
    def url(self) -> URL:
        """The full URL of the request."""
        # authority is used here because it may include the port number
        # and we want yarl to parse it correctly
        return URL.build(scheme=self.scheme, authority=self.host).join(self._rel_url)

Because .join isn't cached....

edit: addressed in #1434

@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Nov 29, 2024
@bdraco bdraco changed the title DNM: Add a cache for building URLs Add a cache for building encoded URLs Nov 29, 2024
Member Author

bdraco commented Nov 29, 2024

2024-11-29 09:05:09.163 CRITICAL (SyncWorker_9) [homeassistant.components.profiler] Cache stats for lru_cache <function build_pre_encoded_url at 0x7f827e1e1260> at /usr/local/lib/python3.13/site-packages/yarl/_url.py: CacheInfo(hits=187, misses=15, maxsize=128, currsize=15)
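For reference, the hit rate implied by those stats can be worked out as below. This is a small sketch: the `CacheInfo` tuple is declared locally to mirror the fields shown in the log line, and `hit_rate` is a hypothetical helper.

```python
from typing import NamedTuple


class CacheInfo(NamedTuple):
    """Local mirror of the fields printed in the log line above."""

    hits: int
    misses: int
    maxsize: int
    currsize: int


def hit_rate(info: CacheInfo) -> float:
    """Fraction of lookups served from the cache (0.0 if never used)."""
    total = info.hits + info.misses
    return info.hits / total if total else 0.0


info = CacheInfo(hits=187, misses=15, maxsize=128, currsize=15)
rate = hit_rate(info)
print(f"{rate:.1%}")
```

That is 187 hits out of 202 lookups, or roughly 93%, with only 15 distinct entries stored out of a maximum of 128.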

@bdraco bdraco marked this pull request as ready for review November 29, 2024 19:06
@bdraco bdraco merged commit 0d1c8b7 into master Nov 29, 2024
46 of 48 checks passed
@bdraco bdraco deleted the build_lru branch November 29, 2024 19:07