Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: switch to quick_xml library #101

Merged
merged 5 commits into from
Jan 22, 2025
Merged

refactor: switch to quick_xml library #101

merged 5 commits into from
Jan 22, 2025

Conversation

2bndy5
Copy link
Collaborator

@2bndy5 2bndy5 commented Jan 21, 2025

  • deserialize only what is needed from XML. cols, length, and value fields were never used and are now ignored when parsing XML
  • read file contents only once when translating byte offset to line and columns

Summary by CodeRabbit

  • Dependency Changes

    • Added quick-xml library with serialization support
    • Removed serde-xml-rs dependency
  • Code Improvements

    • Updated XML parsing and deserialization strategy
    • Refined line number calculation methods
    • Enhanced GitHub API client feedback posting logic
  • Performance

    • Optimized file content processing
    • Improved error handling for XML serialization
  • Logging Enhancements

    • Clarified log messages in file filtering checks

- deserialize only what is needed from XML. `length` and `value` fields are not used and are now ignored when parsing XML
- read file contents only once when translating byte offset to line and columns
@2bndy5 2bndy5 added dependencies Pull requests that update a dependency file enhancement New feature or request labels Jan 21, 2025
@2bndy5 2bndy5 changed the title use quick_xml refactor: switch to quick_xml library Jan 21, 2025
Copy link

codecov bot commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 91.54930% with 6 lines in your changes missing coverage. Please review.

Project coverage is 97.58%. Comparing base (05f8b95) to head (cc494ec).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
cpp-linter/src/clang_tools/clang_format.rs 85.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #101      +/-   ##
==========================================
- Coverage   97.84%   97.58%   -0.27%     
==========================================
  Files          14       14              
  Lines        3474     3430      -44     
==========================================
- Hits         3399     3347      -52     
- Misses         75       83       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@2bndy5 2bndy5 linked an issue Jan 21, 2025 that may be closed by this pull request
Copy link

codspeed-hq bot commented Jan 21, 2025

CodSpeed Performance Report

Merging #101 will improve performances by 79.69%

Comparing quick-xml (cc494ec) with main (05f8b95)

Summary

⚡ 1 improvements

Benchmarks breakdown

Benchmark main quick-xml Change
scan libgit2 10.3 s 5.7 s +79.69%

@2bndy5 2bndy5 marked this pull request as ready for review January 21, 2025 09:35
Copy link
Contributor

coderabbitai bot commented Jan 21, 2025

Walkthrough

The pull request introduces changes to the cpp-linter project, focusing on XML parsing and dependency management. The primary modifications involve replacing the serde-xml-rs library with quick-xml for XML serialization and deserialization. This change impacts the clang_format.rs file, updating the FormatAdvice and Replacement structs to work with the new XML parsing library. Additionally, the get_line_cols_from_offset function in the common filesystem module has been refactored to work with byte slices and use u32 types.

Changes

File Change Summary
Cargo.toml - Added quick-xml dependency (v0.37.2) with serialize feature
- Removed serde-xml-rs dependency
src/clang_tools/clang_format.rs - Updated XML deserialization attributes
- Modified FormatAdvice and Replacement structs
- Changed type from usize to u32
- Simplified replacement handling
src/common_fs/mod.rs - Refactored get_line_cols_from_offset function
- Changed input from PathBuf to byte slice
- Updated function signature and implementation
src/rest_api/github/mod.rs - Enhanced post_feedback method
- Added logic for thread comments
- Updated test create_comment function
src/rest_api/github/specific_api.rs - Minor syntactical adjustment in post_annotations method
src/clang_tools/clang_tidy.rs - Simplified handling of original_content variable in run_clang_tidy function
src/clang_tools/mod.rs - Updated type handling and control flow in ReviewComments struct and methods
src/common_fs/file_filter.rs - Modified logging message format in is_file_in_list method

Possibly related issues

  • cpp-linter/cpp-linter-rs#100: This PR directly addresses the issue of switching from serde-xml-rs to quick-xml, matching the exact recommendation in the issue description for improving XML parsing performance and using a more actively maintained library.

Poem

🐰 Hop, hop, through XML's maze,
Quick-xml dances with newfound grace
Serde steps aside, making way
For faster parsing, bright as day
A rabbit's code, now swift and light! 🚀


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63c1d68 and cc494ec.

📒 Files selected for processing (1)
  • cpp-linter/src/common_fs/mod.rs (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: aarch64-pc-windows-msvc
  • GitHub Check: x86_64-pc-windows-msvc
  • GitHub Check: aarch64-apple-darwin
  • GitHub Check: s390x-unknown-linux-gnu
  • GitHub Check: powerpc64le-unknown-linux-gnu
  • GitHub Check: powerpc64-unknown-linux-gnu
  • GitHub Check: powerpc-unknown-linux-gnu
  • GitHub Check: armv7-unknown-linux-gnueabihf
  • GitHub Check: arm-unknown-linux-gnueabihf
  • GitHub Check: arm-unknown-linux-gnueabi
  • GitHub Check: x86_64-unknown-linux-musl
  • GitHub Check: stable - x86_64-unknown-linux-gnu - node@20
  • GitHub Check: x86_64-unknown-linux-gnu
  • GitHub Check: stable - x86_64-pc-windows-msvc - node@20
  • GitHub Check: aarch64-unknown-linux-musl
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: stable - x86_64-apple-darwin - node@20
  • GitHub Check: aarch64-unknown-linux-gnu
  • GitHub Check: test (windows-latest)
  • GitHub Check: benchmark
🔇 Additional comments (3)
cpp-linter/src/common_fs/mod.rs (3)

222-228: LGTM! Clear and accurate documentation.

The function's documentation clearly explains its purpose and includes important details about offset bounds handling.


229-231: LGTM! Robust implementation with bounds checking.

The implementation is concise and includes proper bounds checking as previously suggested. The removal of column calculation aligns with the codebase's actual usage patterns, as confirmed in previous discussions.


311-329: LGTM! Comprehensive test coverage.

The test cases cover essential scenarios:

  • Basic line counting
  • Empty content
  • No newlines
  • Consecutive newlines
  • Offset bounds handling

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
cpp-linter/src/clang_tools/clang_format.rs (3)

156-156: Add test coverage for deserialization error handling

The error handling for non-UTF-8 encoded XML output is crucial, but static analysis indicates this line isn't covered by tests. Consider adding tests that provide invalid or non-UTF-8 encoded XML data to ensure the error handling works as expected.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 156-156: cpp-linter/src/clang_tools/clang_format.rs#L156
Added line #L156 was not covered by tests


165-167: Ensure file read errors are tested

The added error context when reading the original file content improves debuggability. However, static analysis reports that these lines lack test coverage. Add test cases simulating scenarios where reading the file might fail, such as missing files or permission issues, to validate the error handling logic.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 165-167: cpp-linter/src/clang_tools/clang_format.rs#L165-L167
Added lines #L165 - L167 were not covered by tests


205-222: Update tests to cover new deserialization logic

The parse_xml test has been updated to use quick_xml, but ensure it adequately covers various scenarios, including edge cases and error handling. Consider adding tests for empty inputs, malformed XML, and unexpected structures to verify robustness.

cpp-linter/src/common_fs/mod.rs (1)

315-316: Enhance test coverage for boundary conditions

In translate_byte_offset test, consider adding cases where offset is zero, at the end of the file, or beyond the file length to ensure get_line_cols_from_offset handles these scenarios correctly.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 05f8b95 and e5261f0.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • cpp-linter/Cargo.toml (1 hunks)
  • cpp-linter/src/clang_tools/clang_format.rs (5 hunks)
  • cpp-linter/src/common_fs/mod.rs (3 hunks)
  • cpp-linter/src/rest_api/github/mod.rs (0 hunks)
  • cpp-linter/src/rest_api/github/specific_api.rs (1 hunks)
💤 Files with no reviewable changes (1)
  • cpp-linter/src/rest_api/github/mod.rs
✅ Files skipped from review due to trivial changes (1)
  • cpp-linter/src/rest_api/github/specific_api.rs
🧰 Additional context used
🪛 GitHub Check: codecov/patch
cpp-linter/src/clang_tools/clang_format.rs

[warning] 156-156: cpp-linter/src/clang_tools/clang_format.rs#L156
Added line #L156 was not covered by tests


[warning] 165-167: cpp-linter/src/clang_tools/clang_format.rs#L165-L167
Added lines #L165 - L167 were not covered by tests

🔇 Additional comments (4)
cpp-linter/src/clang_tools/clang_format.rs (3)

5-5: Import of fs module is appropriate

Adding fs to the imports is necessary for file operations used later in the code.


41-41: Addition of Copy trait is acceptable

Deriving the Copy trait for Replacement allows for simpler value semantics since the struct is composed of types that implement Copy.


24-24: ⚠️ Potential issue

Correct the serde attribute syntax for field renaming

The attribute #[serde(rename(deserialize = "replacement"))] may not correctly specify the field rename during deserialization. The correct syntax in serde for renaming a field during deserialization is #[serde(rename = "replacement")] or, if you need to differentiate between serialization and deserialization names, use #[serde(rename(serialize = "name1", deserialize = "name2"))].

Apply this diff to fix the attribute:

-    #[serde(rename(deserialize = "replacement"))]
+    #[serde(rename = "replacement")]

Likely invalid or redundant comment.

cpp-linter/Cargo.toml (1)

28-28: Dependency update to quick-xml is appropriate

Adding quick-xml with the serialize feature aligns with the shift from serde-xml-rs and provides efficient XML parsing capabilities.

cpp-linter/src/clang_tools/clang_format.rs Outdated Show resolved Hide resolved
cpp-linter/src/clang_tools/clang_format.rs Outdated Show resolved Hide resolved
cpp-linter/src/common_fs/mod.rs Outdated Show resolved Hide resolved
Also ensure given `offset` is not out of bounds when calculating line number from byte offset.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
cpp-linter/src/clang_tools/clang_format.rs (1)

41-52: Consider adding documentation for the removed fields.

While removing unused fields is good, it would be helpful to document why length and value fields were removed to maintain historical context.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e5261f0 and 39d5465.

📒 Files selected for processing (6)
  • cpp-linter/src/clang_tools/clang_format.rs (7 hunks)
  • cpp-linter/src/clang_tools/clang_tidy.rs (1 hunks)
  • cpp-linter/src/clang_tools/mod.rs (5 hunks)
  • cpp-linter/src/common_fs/file_filter.rs (1 hunks)
  • cpp-linter/src/common_fs/mod.rs (3 hunks)
  • cpp-linter/src/rest_api/github/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cpp-linter/src/rest_api/github/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms (19)
  • GitHub Check: aarch64-pc-windows-msvc
  • GitHub Check: x86_64-pc-windows-msvc
  • GitHub Check: x86_64-apple-darwin
  • GitHub Check: aarch64-apple-darwin
  • GitHub Check: s390x-unknown-linux-gnu
  • GitHub Check: powerpc64le-unknown-linux-gnu
  • GitHub Check: powerpc64-unknown-linux-gnu
  • GitHub Check: powerpc-unknown-linux-gnu
  • GitHub Check: armv7-unknown-linux-gnueabihf
  • GitHub Check: arm-unknown-linux-gnueabihf
  • GitHub Check: arm-unknown-linux-gnueabi
  • GitHub Check: stable - x86_64-unknown-linux-gnu - node@20
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: stable - x86_64-pc-windows-msvc - node@20
  • GitHub Check: aarch64-unknown-linux-musl
  • GitHub Check: aarch64-unknown-linux-gnu
  • GitHub Check: benchmark
  • GitHub Check: test (windows-latest)
  • GitHub Check: stable - x86_64-apple-darwin - node@20
🔇 Additional comments (10)
cpp-linter/src/common_fs/file_filter.rs (1)

114-117: LGTM! Improved log message clarity.

The updated debug message format is clearer and more direct, making it easier to understand the ignore status of files during debugging.

cpp-linter/src/clang_tools/clang_tidy.rs (1)

279-289: LGTM! Clean refactoring of the original content handling.

The code has been simplified by using a more direct conditional assignment pattern, improving readability while maintaining the same functionality.

cpp-linter/src/clang_tools/mod.rs (3)

261-271: LGTM! Improved type safety in loop indexing.

The change from u8 to usize for array indexing aligns with Rust's best practices for array access.


Line range hint 371-384: LGTM! Simplified boolean to index conversion.

The code now uses a more idiomatic Rust approach by casting the boolean comparison directly to usize.


447-449: LGTM! Optimized tool total handling.

The code now uses a more concise approach to update tool totals with unwrap_or_default.

cpp-linter/src/clang_tools/clang_format.rs (3)

Line range hint 21-28: LGTM! Clean struct definition with appropriate attributes.

The FormatAdvice struct is well-defined with proper serialization attributes for quick-xml.


146-158: LGTM! Improved XML parsing with proper error handling.

The code now properly handles XML parsing with appropriate error context and empty XML cases.


192-197: LGTM! Added test for blank XML handling.

Good addition of a test case to verify behavior with blank XML input.

cpp-linter/src/common_fs/mod.rs (2)

230-234: LGTM! Improved offset handling with bounds checking.

The function has been simplified to only return line count and includes proper bounds checking with min(). The removal of column calculation is appropriate since it was unused in the codebase.


314-316: LGTM! Updated test case for new function signature.

The test has been properly updated to work with the new function signature and simplified return value.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
cpp-linter/src/common_fs/mod.rs (2)

222-229: Clean up documentation to remove column-related references.

The documentation still mentions "column number" and "word boundary" which are no longer relevant since the function now only computes line numbers.

 /// Gets the line number for a given `offset` (of bytes) from given the
 /// byte array `contents`.
 ///
 /// This computes the line number from a buffer of bytes.
-/// In non-UTF-8 encoded files, this does not guarantee that a word
-/// boundary exists at the returned column number. However, the `offset` given to this
+/// The `offset` given to this
 /// function is expected to originate from diagnostic information provided by
 /// clang-format.

313-315: Consider enhancing test coverage.

While the current test is good, consider adding tests for:

  1. Empty file (offset 0)
  2. Offset at end of file
  3. Offset beyond file length
  4. File with no newlines
  5. File with consecutive newlines

Also, consider using a test fixture or constant for the test file path to make tests more maintainable.

Example test cases:

#[test]
fn test_line_count_edge_cases() {
    // Empty content
    assert_eq!(get_line_count_from_offset(&[], 0), 1);
    
    // No newlines
    assert_eq!(get_line_count_from_offset(b"abc", 3), 1);
    
    // Consecutive newlines
    assert_eq!(get_line_count_from_offset(b"a\n\nb", 3), 3);
    
    // Offset beyond content length
    assert_eq!(get_line_count_from_offset(b"a\nb\n", 10), 3);
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39d5465 and 63c1d68.

📒 Files selected for processing (1)
  • cpp-linter/src/common_fs/mod.rs (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (19)
  • GitHub Check: aarch64-pc-windows-msvc
  • GitHub Check: x86_64-apple-darwin
  • GitHub Check: aarch64-apple-darwin
  • GitHub Check: s390x-unknown-linux-gnu
  • GitHub Check: powerpc64le-unknown-linux-gnu
  • GitHub Check: powerpc64-unknown-linux-gnu
  • GitHub Check: powerpc-unknown-linux-gnu
  • GitHub Check: armv7-unknown-linux-gnueabihf
  • GitHub Check: arm-unknown-linux-gnueabihf
  • GitHub Check: arm-unknown-linux-gnueabi
  • GitHub Check: stable - x86_64-unknown-linux-gnu - node@20
  • GitHub Check: x86_64-unknown-linux-gnu
  • GitHub Check: stable - x86_64-pc-windows-msvc - node@20
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: aarch64-unknown-linux-musl
  • GitHub Check: aarch64-unknown-linux-gnu
  • GitHub Check: stable - x86_64-apple-darwin - node@20
  • GitHub Check: test (windows-latest)
  • GitHub Check: benchmark
🔇 Additional comments (1)
cpp-linter/src/common_fs/mod.rs (1)

230-233: LGTM! Clean and safe implementation.

The implementation correctly handles bounds checking and efficiently counts lines. The use of split with newline detection is an appropriate approach.

@2bndy5 2bndy5 merged commit 9d782b5 into main Jan 22, 2025
44 checks passed
@2bndy5 2bndy5 deleted the quick-xml branch January 22, 2025 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

switch to quick_xml
1 participant