Allow empty items to be cached #8890
Walkthrough
The changes introduce a new exception class.
Actionable comments posted: 0
🧹 Nitpick comments (1)
cvat/apps/engine/cache.py (1)
72-74: Consider adding a docstring to the new custom exception.
Although it's perfectly valid to define a bare exception class, a short docstring explaining its purpose and usage can improve maintainability and clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cvat/apps/engine/cache.py (2 hunks)
🔇 Additional comments (2)
cvat/apps/engine/cache.py (2)
228-228: LGTM: Safe extraction of the underlying bytes.
Retrieving the buffer contents with `getvalue()` is concise and correct, especially given the prior check that ensures `data_to_be_cached` is not `None`.
223-227: Confirm if empty buffers should also raise an exception.
Currently, the code only raises an `EmptyCacheItemError` when `data_to_be_cached` is `None`. If `data_to_be_cached` exists but contains zero bytes, it passes the check without raising an exception, so an empty item is cached. Please verify whether the correct behavior in this case is to raise the same exception or to silently skip caching.
✅ Verification successful
Let me search for more context about the empty buffer check.
Let me search for more context about the buffer handling in the cache implementation.
Let me check how the cache item is used in the codebase.
Let me check where the cache item is set and how it's processed.
Based on the code analysis, I can now provide a final response:
Empty buffers should not raise an exception
The current behavior is correct. The code specifically checks for `None` values and raises `EmptyCacheItemError` in that case, while allowing empty buffers (zero bytes) to proceed. Here's why:
- The `data_to_be_cached` is of type `io.BytesIO` (as defined in `DataWithMime = tuple[io.BytesIO, str]`)
- A `None` value indicates a complete failure to prepare the data, which should raise an exception
- An empty buffer (zero bytes) is a valid state that can occur in legitimate scenarios, such as:
  - Empty frame context images (when no related images exist)
  - Empty preview images
  - Empty chunks that are valid but contain no data
The caching mechanism handles the empty buffer case appropriately by still caching it with its checksum and timestamp, which prevents unnecessary regeneration attempts for legitimately empty data.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Inspect occurrences of getvalue() usage within the repository and see if there are additional checks for empty buffers anywhere else.
rg --multiline 'getvalue\(' -A 10
Length of output: 74472
Script:
#!/bin/bash
# Search for the implementation of EmptyCacheItemError and its usage
rg -A 5 "class EmptyCacheItemError"
# Search for any buffer length checks in the caching context
rg -A 5 "len\(.*\)" cvat/apps/engine/cache.py
Length of output: 841
Script:
#!/bin/bash
# Search for the implementation of cache item saving and buffer handling
rg -A 10 "def save" cvat/apps/engine/cache.py
# Search for BytesIO usage in cache
rg -A 10 "BytesIO" cvat/apps/engine/cache.py
Length of output: 3080
Script:
#!/bin/bash
# Search for cache item usage and validation
rg -A 10 "def get_or_set" cvat/apps/engine/cache.py
# Search for where the data_to_be_cached is prepared
ast-grep --pattern 'data_to_be_cached = $_'
Length of output: 3305
Script:
#!/bin/bash
# Search for the implementation of _get_or_set_cache_item and its usage
rg -A 15 "def _get_or_set_cache_item" cvat/apps/engine/cache.py
# Search for where the cache item is processed
rg -A 10 "_set_cache_item" cvat/apps/engine/cache.py
Length of output: 4698
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #8890      +/-   ##
===========================================
+ Coverage    73.89%   73.90%   +0.01%
===========================================
  Files          408      411       +3
  Lines        44131    44175      +44
  Branches      3986     3993       +7
===========================================
+ Hits         32611    32649      +38
- Misses       11520    11526       +6
`EmptyCacheItemError` when create_callback returns None as data to be cached
cvat/apps/engine/cache.py (outdated)
@@ -356,7 +358,7 @@ def _to_data_with_mime(self, cache_item: _CacheItem) -> DataWithMime: ...
     def _to_data_with_mime(self, cache_item: Optional[_CacheItem]) -> Optional[DataWithMime]: ...

     def _to_data_with_mime(self, cache_item: Optional[_CacheItem]) -> Optional[DataWithMime]:
-        if not cache_item:
+        if not cache_item or not len(cache_item[0].getbuffer()):
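The condition in this hunk measures the buffer through `io.BytesIO.getbuffer()`, which exposes the contents as a `memoryview` without copying them, so the result does not depend on the current stream position. A small sketch of that check in isolation (the `is_empty` helper is a hypothetical name, not part of the PR):

```python
import io


def is_empty(buf: io.BytesIO) -> bool:
    """Return True if the buffer holds zero bytes, whatever the stream position."""
    # getbuffer() returns a memoryview over the contents without copying them.
    # The view is released promptly because a BytesIO cannot be resized
    # while an exported view is still alive.
    with buf.getbuffer() as view:
        return view.nbytes == 0


buf = io.BytesIO(b"abc")
buf.read()                     # move the stream position to the end
print(buf.read() == b"")       # True: nothing left to read from this position
print(is_empty(buf))           # False: the buffer still holds three bytes
print(is_empty(io.BytesIO()))  # True
```

Checking `buf.read()` instead would report a fully consumed buffer as empty, which is why a size-based check is the safer way to detect a zero-byte item.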
You will need to update all the places where empty results were not expected. I know at least 1 such place - `get_frame_context_images_chunk()`. The `_to_data_with_mime()` function annotations are also not correct now.
But it actually looks like we should just expect an empty result (an empty `BytesIO`) in places where empty cached items can be valid.
> But it actually looks like we should just expect empty result (empty BytesIO) in places, where empty cached items can be valid.
Why? I still think that there should be a common approach.
We should probably verify that data is not empty in places where it is not acceptable.
A common approach to what? The cache returns exactly what was put into the cache; this is the current approach. If something puts empty items, it should expect empty items in return.
Yes, but I thought you meant that we should not update `_to_data_with_mime` to return None when `cache_item[0]` is empty. Am I missing something?
> A common approach to what?
Update the `_to_data_with_mime` logic and not patch each function where `_to_data_with_mime` is called.
Correct, this way the function will return what was put into the cache, possibly an empty `BytesIO` where that can happen.
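The outcome of this thread can be sketched roughly as below: the conversion helper returns whatever was stored, and only callers that cannot accept empty data check for it. This is an illustration under assumptions, not the merged code. The two-element `CacheItem` tuple, the free functions `to_data_with_mime`, `get_context_images_chunk`, and `get_required_chunk`, and the use of `ValueError` are hypothetical simplifications.

```python
import io
from typing import Optional

DataWithMime = tuple[io.BytesIO, str]
# Hypothetical simplified cache item: just the data and its MIME type.
CacheItem = tuple[io.BytesIO, str]


def to_data_with_mime(cache_item: Optional[CacheItem]) -> Optional[DataWithMime]:
    # Return exactly what was put into the cache, including an empty BytesIO.
    if not cache_item:
        return None
    return cache_item[0], cache_item[1]


def get_context_images_chunk(cache_item: Optional[CacheItem]) -> DataWithMime:
    # Empty data is legitimate here (no related images), so only a
    # missing item is treated as an error.
    result = to_data_with_mime(cache_item)
    if result is None:
        raise ValueError("cache item is missing")
    return result


def get_required_chunk(cache_item: Optional[CacheItem]) -> DataWithMime:
    # Empty data is not acceptable here, so this caller verifies it itself.
    result = to_data_with_mime(cache_item)
    if result is None or not result[0].getvalue():
        raise ValueError("expected a non-empty cached chunk")
    return result
```

Keeping the emptiness check in the callers means the helper's return annotation stays truthful: `None` only ever signals a missing item, never an item that happens to contain zero bytes.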
Quality Gate passed
Motivation and context
Details are described here.
How has this been tested?