Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow empty items to be cached #8890

Merged
merged 6 commits into from
Jan 6, 2025
Merged

Allow empty items to be cached #8890

merged 6 commits into from
Jan 6, 2025

Conversation

Marishka17
Copy link
Contributor

@Marishka17 Marishka17 commented Dec 30, 2024

Motivation and context

Details are described here.

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)
  • I have increased versions of npm packages if it is necessary
    (cvat-canvas,
    cvat-core,
    cvat-data and
    cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

  • Bug Fixes
    • Improved cache error handling by introducing a specific exception for empty cache scenarios
    • Enhanced data validation when creating cache items to prevent caching of invalid or empty data

Copy link
Contributor

coderabbitai bot commented Dec 30, 2024

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes introduce a new exception class EmptyCacheItemError in the cache management module. The modification enhances error handling for cache operations by adding a specific exception to handle scenarios where empty data is attempted to be cached. The _create_and_set_cache_item method is updated to check for None data, log an informational message, and raise the new exception, making the caching process more robust against invalid or empty data inputs.

Changes

File Change Summary
cvat/apps/engine/cache.py - Added new exception class EmptyCacheItemError
- Modified _create_and_set_cache_item method to handle empty data scenarios
- Added logging and exception raising for empty cache items

Poem

🐰 In the cache of code so bright,
A rabbit hops with error's might!
Empty data shall not pass,
With EmptyCacheItemError we'll amass
A fortress of robust delight! 🏰


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
cvat/apps/engine/cache.py (1)

72-74: Consider adding a docstring to the new custom exception.
Although it's perfectly valid to define a bare exception class, a short docstring explaining its purpose and usage can improve maintainability and clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26f08ce and 4ccbe06.

📒 Files selected for processing (1)
  • cvat/apps/engine/cache.py (2 hunks)
🔇 Additional comments (2)
cvat/apps/engine/cache.py (2)

228-228: LGTM: Safe extraction of the underlying bytes.
Retrieving the buffer contents with getvalue() is concise and correct, especially given the prior check that ensures data_to_be_cached is not None.


223-227: Confirm if empty buffers should also raise an exception.
Currently, the code only raises an EmptyCacheItemError when data_to_be_cached is None. If data_to_be_cached exists but contains zero bytes, it falls through the door without raising an exception, potentially leading to a no-op caching scenario. Please verify whether the correct behavior in this case should be to raise the same exception or silently skip caching.

✅ Verification successful

Let me search for more context about the empty buffer check.


Let me search for more context about the buffer handling in the cache implementation.


Let me check how the cache item is used in the codebase.


Let me check where the cache item is set and how it's processed.


Based on the code analysis, I can now provide a final response:

Empty buffers should not raise an exception

The current behavior is correct. The code specifically checks for None values and raises EmptyCacheItemError in that case, while allowing empty buffers (zero bytes) to proceed. Here's why:

  • The data_to_be_cached is of type io.BytesIO (as defined in DataWithMime = tuple[io.BytesIO, str])
  • A None value indicates a complete failure to prepare the data, which should raise an exception
  • An empty buffer (zero bytes) is a valid state that can occur in legitimate scenarios, such as:
    • Empty frame context images (when no related images exist)
    • Empty preview images
    • Empty chunks that are valid but contain no data

The caching mechanism handles the empty buffer case appropriately by still caching it with its checksum and timestamp, which prevents unnecessary regeneration attempts for legitimately empty data.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Inspect occurrences of getvalue() usage within the repository and see if there are additional checks for empty buffers anywhere else.

rg --multiline 'getvalue\(' -A 10

Length of output: 74472


Script:

#!/bin/bash
# Search for the implementation of EmptyCacheItemError and its usage
rg -A 5 "class EmptyCacheItemError"

# Search for any buffer length checks in the caching context
rg -A 5 "len\(.*\)" cvat/apps/engine/cache.py

Length of output: 841


Script:

#!/bin/bash
# Search for the implementation of cache item saving and buffer handling
rg -A 10 "def save" cvat/apps/engine/cache.py

# Search for BytesIO usage in cache
rg -A 10 "BytesIO" cvat/apps/engine/cache.py

Length of output: 3080


Script:

#!/bin/bash
# Search for cache item usage and validation
rg -A 10 "def get_or_set" cvat/apps/engine/cache.py

# Search for where the data_to_be_cached is prepared
ast-grep --pattern 'data_to_be_cached = $_'

Length of output: 3305


Script:

#!/bin/bash
# Search for the implementation of _get_or_set_cache_item and its usage
rg -A 15 "def _get_or_set_cache_item" cvat/apps/engine/cache.py

# Search for where the cache item is processed
rg -A 10 "_set_cache_item" cvat/apps/engine/cache.py

Length of output: 4698

@codecov-commenter
Copy link

codecov-commenter commented Dec 30, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 73.90%. Comparing base (9a25291) to head (9ff597f).
Report is 2 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8890      +/-   ##
===========================================
+ Coverage    73.89%   73.90%   +0.01%     
===========================================
  Files          408      411       +3     
  Lines        44131    44175      +44     
  Branches      3986     3993       +7     
===========================================
+ Hits         32611    32649      +38     
- Misses       11520    11526       +6     
Components Coverage Δ
cvat-ui 78.37% <76.78%> (-0.03%) ⬇️
cvat-server 70.10% <75.00%> (+0.03%) ⬆️

@Marishka17 Marishka17 changed the title Raise EmptyCacheItemError when create_callback returns None as data to be cached Allow empty items to be cached Dec 31, 2024
@@ -356,7 +358,7 @@ def _to_data_with_mime(self, cache_item: _CacheItem) -> DataWithMime: ...
def _to_data_with_mime(self, cache_item: Optional[_CacheItem]) -> Optional[DataWithMime]: ...

def _to_data_with_mime(self, cache_item: Optional[_CacheItem]) -> Optional[DataWithMime]:
if not cache_item:
if not cache_item or not len(cache_item[0].getbuffer()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to update all the places where empty results were not expected. I know at least 1 such place - get_frame_context_images_chunk(). The _to_data_with_mime() function annotations are also not correct now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it actually looks like we should just expect empty result (empty BytesIO) in places, where empty cached items can be valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it actually looks like we should just expect empty result (empty BytesIO) in places, where empty cached items can be valid.

Why? I still think that there should be a common approach

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably verify that data is not empty in places where it is not acceptable

Copy link
Contributor

@zhiltsov-max zhiltsov-max Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A common approach to what? The cache returns exactly what was put into the cache, this is the current approach. If something puts empty items, it should expect empty items in return.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I thought that you meant here not to update _to_data_with_mime to return None when cache_item[0] is empty. Am I missing something?

Copy link
Contributor Author

@Marishka17 Marishka17 Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A common approach to what?

Update _to_data_with_mime logic and not patch each function where _to_data_with_mime is called

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this way the function will return what was put into the cache. Possibly, an empty BytesIO, where it can happen.

Copy link

sonarqubecloud bot commented Jan 3, 2025

@zhiltsov-max zhiltsov-max merged commit 30fbfb2 into develop Jan 6, 2025
35 checks passed
@zhiltsov-max zhiltsov-max deleted the mk/fix_cache_issue branch January 6, 2025 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants