Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: New "priority" cleanup policy #180

Merged
merged 1 commit into from
May 26, 2024
Merged

feat: New "priority" cleanup policy #180

merged 1 commit into from
May 26, 2024

Conversation

tphakala
Copy link
Owner

The priority disk cleanup feature purges recordings based on several policy rules. This helps maintain disk usage within user-defined limits while retaining valuable recordings.

This retention policy will be enabled by default.

  1. Configurable Threshold:

    • Cleanup starts when disk usage exceeds a user-defined threshold (e.g., 80%).
  2. Sorting Policies:

    • Oldest Files First: The primary criterion is to delete the oldest recordings first.
    • Species Occurrences: Files are prioritized for deletion based on the number of occurrences of each species, with species having the most recordings prioritized to be deleted first as these are often least interesting to user
    • Confidence Level: Files with lower confidence levels are prioritized for deletion over those with higher confidence levels.
    • Default to Oldest Timestamp: If other criteria are equal, the oldest files are prioritized.
  3. Minimum Clips Per Species:

    • Ensures that a minimum number of recordings per species are retained for each month.
    • This attempts to spare rare recordings from deletion
  4. Controlled Deletion Process:

    • Stops deleting files once disk usage falls below the threshold, each cleanup cycle is limited 1000 deletions
    • Cleanup cycle runs at 5 minute intervals
    • QuitChannel is monitored and cleanup loop exits cleanly if application exit is requested

Copy link
Contributor

coderabbitai bot commented May 26, 2024

Walkthrough

The recent changes enhance the audio clip retention system by introducing new cleanup modes based on age and priority. These modifications involve updating configuration settings, adding utility functions for error handling, and implementing age and priority-based cleanup capabilities in the diskmanager package. The ClipCleanupMonitor function has been adjusted to delegate cleanup operations according to the selected mode.

Changes

File Path Change Summary
internal/analysis/realtime.go Modified ClipCleanupMonitor to delegate cleanup to diskmanager based on mode (age or priority).
internal/conf/config.go Updated Settings struct to include Mode and DiskUsageLimit fields.
internal/conf/config.yaml Altered configuration for clip retention, including new eviction modes and disk usage limits.
internal/conf/defaults.go Updated default settings for audio export retention, adding new fields for retention policies.
internal/conf/utils.go Added ParsePercentage function and included errors and strconv packages for utility purposes.
internal/diskmanager/age.go Introduced functionality for age-based cleanup of clips.
internal/diskmanager/priority.go Added priority-based cleanup, including policy loading, disk usage calculation, and file sorting.
internal/diskmanager/util.go Added WriteSortedFilesToFile function to write sorted files to a text file for investigation.

Sequence Diagram(s) (Beta)

sequenceDiagram
    participant User
    participant Config
    participant ClipCleanupMonitor
    participant DiskManager

    User->>Config: Load Settings
    Config->>ClipCleanupMonitor: Provide Settings
    ClipCleanupMonitor->>DiskManager: Delegate Cleanup (based on mode)
    DiskManager->>DiskManager: Perform Age-Based Cleanup
    DiskManager->>DiskManager: Perform Priority-Based Cleanup
Loading

In fields of code, where bytes do roam,
We clean with age and priority's tome.
Disk space saved, our data bright,
Retention policies set just right.
A rabbit's work, both day and night,
To keep the system running light.

Tip

New Features and Improvements

Review Settings

Introduced new personality profiles for code reviews. Users can now select between "Chill" and "Assertive" review tones to tailor feedback styles according to their preferences. The "Assertive" profile posts more comments and nitpicks the code more aggressively, while the "Chill" profile is more relaxed and posts fewer comments.

AST-based Instructions

CodeRabbit offers customizing reviews based on the Abstract Syntax Tree (AST) pattern matching. Read more about AST-based instructions in the documentation.

Community-driven AST-based Rules

We are kicking off a community-driven initiative to create and share AST-based rules. Users can now contribute their AST-based rules to detect security vulnerabilities, code smells, and anti-patterns. Please see the ast-grep-essentials repository for more information.

New Static Analysis Tools

We are continually expanding our support for static analysis tools. We have added support for biome, hadolint, and ast-grep. Update the settings in your .coderabbit.yaml file or head over to the settings page to enable or disable the tools you want to use.

Tone Settings

Users can now customize CodeRabbit to review code in the style of their favorite characters or personalities. Here are some of our favorite examples:

  • Mr. T: "You must talk like Mr. T in all your code reviews. I pity the fool who doesn't!"
  • Pirate: "Arr, matey! Ye must talk like a pirate in all yer code reviews. Yarrr!"
  • Snarky: "You must be snarky in all your code reviews. Snark, snark, snark!"

Revamped Settings Page

We have redesigned the settings page for a more intuitive layout, enabling users to find and adjust settings quickly. This change was long overdue; it not only improves the user experience but also allows our development team to add more settings in the future with ease. Going forward, the changes to .coderabbit.yaml will be reflected in the settings page, and vice versa.

Miscellaneous

  • Turn off free summarization: You can switch off free summarization of PRs opened by users not on a paid plan using the enable_free_tier setting.
  • Knowledge-base scope: You can now set the scope of the knowledge base to either the repository (local) or the organization (global) level using the knowledge_base setting. In addition, you can specify Jira project keys and Linear team keys to limit the knowledge base scope for those integrations.
  • High-level summary placement: You can now customize the location of the high-level summary in the PR description using the high_level_summary_placeholder setting (default @coderabbitai summary).
  • Revamped request changes workflow: You can now configure CodeRabbit to auto-approve or request changes on PRs based on the review feedback using the request_changes_workflow setting.

Tip

Early Access Features
  • gpt-4o model for chat

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to full the review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Outside diff range and nitpick comments (6)
internal/diskmanager/util.go (1)

10-33: The implementation of WriteSortedFilesToFile is robust with proper error handling and resource management.

Consider adding more detailed logging before each major operation to aid in debugging and maintenance.

internal/conf/defaults.go (1)

41-45: The new default settings for the retention policy are correctly implemented.

Consider adding comments to explain each setting for better clarity and maintainability.

internal/conf/config.yaml (1)

40-44: The YAML configuration for the new retention settings is correctly formatted and aligns with the implementation in defaults.go.

Consider adding comments in the YAML file to explain the purpose and expected values for each new setting.

internal/analysis/realtime.go (1)

160-160: The integration of ClipCleanupMonitor into the real-time analysis workflow is correctly implemented.

Consider adding more detailed comments explaining the role of ClipCleanupMonitor in the context of the overall system.

internal/diskmanager/priority.go (2)

19-23: The Policy struct is well-defined for cleanup policies. Consider documenting the behavior when both AlwaysCleanupFirst and NeverCleanup conditions apply to the same species.


82-106: The GetAudioFiles function is comprehensive, but the debug logging is commented out. If it's no longer needed, consider removing these lines to clean up the code.

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 01f8ab0 and 318fb4b.
Files selected for processing (8)
  • internal/analysis/realtime.go (4 hunks)
  • internal/conf/config.go (1 hunks)
  • internal/conf/config.yaml (1 hunks)
  • internal/conf/defaults.go (1 hunks)
  • internal/conf/utils.go (2 hunks)
  • internal/diskmanager/age.go (1 hunks)
  • internal/diskmanager/priority.go (1 hunks)
  • internal/diskmanager/util.go (1 hunks)
Additional comments not posted (1)
internal/diskmanager/priority.go (1)

261-308: The sortFiles function implements multiple sorting priorities. Ensure that the sorting logic aligns with the intended priority rules as described in the PR objectives.

Verification successful

The sortFiles function implements the sorting priorities as follows:

  1. Oldest files first.
  2. Species with the most occurrences in the subdirectory.
  3. Confidence level.
  4. Default to oldest timestamp.

These priorities are correctly implemented in the sort.Slice function. The sorting logic aligns with the intended priority rules as described in the PR objectives.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify sorting logic matches PR objectives.
ast-grep --lang go --pattern $'sort.Slice($_, $_)'

Length of output: 2840

Comment on lines +34 to +64
// LoadPolicy loads the cleanup policies from a CSV file
func LoadPolicy(policyFile string) (*Policy, error) {
file, err := os.Open(policyFile)
if err != nil {
return nil, err
}
defer file.Close()

reader := csv.NewReader(file)
records, err := reader.ReadAll()
if err != nil {
return nil, err
}

policy := &Policy{
AlwaysCleanupFirst: make(map[string]bool),
NeverCleanup: make(map[string]bool),
}

for _, record := range records {
if len(record) != 2 {
return nil, errors.New("invalid policy record")
}
if record[1] == "always" {
policy.AlwaysCleanupFirst[record[0]] = true
} else if record[1] == "never" {
policy.NeverCleanup[record[0]] = true
}
}

return policy, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LoadPolicy function correctly handles file operations and CSV parsing. However, consider adding more robust error handling for CSV format issues and documenting expected CSV format.

Comment on lines +119 to +149
// parseFileInfo parses the file information from the file path and os.FileInfo
func parseFileInfo(path string, info os.FileInfo) (FileInfo, error) {
name := filepath.Base(info.Name())
parts := strings.Split(name, "_")
if len(parts) < 3 {
return FileInfo{}, errors.New("invalid file name format")
}

// The species name might contain underscores, so we need to handle the last two parts separately
confidenceStr := parts[len(parts)-2]
timestampStr := parts[len(parts)-1]
species := strings.Join(parts[:len(parts)-2], "_")

confidence, err := strconv.Atoi(strings.TrimSuffix(confidenceStr, "p"))
if err != nil {
return FileInfo{}, err
}

timestamp, err := time.Parse("20060102T150405Z", strings.TrimSuffix(timestampStr, ".wav"))
if err != nil {
return FileInfo{}, err
}

return FileInfo{
Path: path,
Species: species,
Confidence: confidence,
Timestamp: timestamp,
Size: info.Size(),
}, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parseFileInfo function handles file name parsing and could fail on unexpected formats. Adding unit tests for this function would ensure robustness against various file naming conventions.

Would you like me to help by writing some unit tests for this function?

Comment on lines +151 to +259
// Convert 80% string etc. to 80.0 float64
threshold, err := conf.ParsePercentage(thresholdStr)
if err != nil {
return err
}

// Only remove files with extensions in this list
allowedExts := []string{".wav"}

if debug {
log.Printf("Starting cleanup process. Base directory: %s, Threshold: %.1f%%", baseDir, threshold)
}

// Get the current disk usage
diskUsage, err := GetDiskUsage(baseDir)
if err != nil {
return err
}

if debug {
log.Printf("Current disk usage: %.1f%%", diskUsage)
}

// If disk usage is below the threshold, no cleanup is needed
if diskUsage < threshold {
if debug {
log.Printf("Disk usage %.1f%% is below the %.1f%% threshold. No cleanup needed.", diskUsage, threshold)
}
return nil
} else {
if debug {
log.Printf("Disk usage %.1f%% is above the %.1f%% threshold. Cleanup needed.", diskUsage, threshold)
}
}

// Get the list of audio files
files, err := GetAudioFiles(baseDir, allowedExts, debug)
if err != nil {
return err
}

// Sort files by the cleanup priority and get the initial count of files per species per subdirectory
speciesMonthCount := sortFiles(files, debug)

// Debug: write sorted files to a file
if debug {
if err := WriteSortedFilesToFile(files, "file_cleanup_order.txt"); err != nil {
return err
}
}

// Delete files until disk usage is below the threshold or 100 files have been deleted
deletedFiles := 0
maxDeletions := 1000
totalFreedSpace := int64(0)
for _, file := range files {
select {
case <-quitChan:
if debug {
log.Println("Received quit signal, exiting cleanup loop.")
}
return nil
default:
// Get the subdirectory name
subDir := filepath.Dir(file.Path)
if diskUsage < threshold || deletedFiles >= maxDeletions || speciesMonthCount[file.Species][subDir] <= minClipsPerSpecies {
continue
}

if debug {
log.Printf("Deleting file: %s", file.Path)
}

// Delete the file deemed for cleanup
err = os.Remove(file.Path)
if err != nil {
return err
}

// Increment deleted files count and update species count
deletedFiles++
speciesMonthCount[file.Species][subDir]--

// Add file size to total freed space
totalFreedSpace += file.Size

// Update disk usage
diskUsage, err = GetDiskUsage(baseDir)
if err != nil {
return err
}
}
}

if debug {
log.Printf("Cleanup process completed. %d files deleted. Total space freed: %.2f MB", deletedFiles, float64(totalFreedSpace)/(1024*1024))
}

return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PriorityBasedCleanup function is complex and handles multiple responsibilities. Consider refactoring to separate concerns, such as disk usage checking, file sorting, and file deletion into separate functions for better maintainability.

Comment on lines +67 to +80
// GetDiskUsage returns the disk usage percentage
func GetDiskUsage(baseDir string) (float64, error) {
var stat syscall.Statfs_t
err := syscall.Statfs(baseDir, &stat)
if err != nil {
return 0, err
}

total := stat.Blocks * uint64(stat.Bsize)
free := stat.Bfree * uint64(stat.Bsize)
used := total - free

return (float64(used) / float64(total)) * 100, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip

Codebase Verification

The GetDiskUsage function in internal/diskmanager/priority.go uses syscall.Statfs, which is platform-dependent. The search results indicate that there are platform-specific conditions in other parts of the codebase, but none directly related to the GetDiskUsage function.

  • internal/diskmanager/priority.go: Uses syscall.Statfs without platform-specific conditions.
  • Other files (e.g., internal/myaudio/capture.go, internal/httpcontroller/utils.go, internal/conf/utils.go) contain platform-specific conditions using runtime.GOOS.

To ensure cross-platform compatibility, consider adding platform-specific conditions or using a cross-platform library for disk usage.

Analysis chain

The GetDiskUsage function uses syscall.Statfs, which is platform-dependent. Ensure compatibility across different operating systems or consider using a cross-platform library.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Check for platform-specific code usage.
ast-grep --lang go --pattern $'syscall.Statfs($_, $_)'

Length of output: 132



Script:

#!/bin/bash
# Description: Check for platform-specific conditions or alternative implementations in the codebase.
rg --pcre2 '(//\s*go:build|//\s*\+build|runtime.GOOS|runtime.GOARCH|syscall.Statfs)'

Length of output: 807

)

// AgeBasedCleanup removes clips from the filesystem based on their age and the number of clips per species.
// TODO: handle quit channel properly if it happens during cleanup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Handle quit channel properly if it happens during cleanup.

Would you like assistance in implementing this feature? I can help draft the necessary changes.

Comment on lines +57 to +61
Enabled bool // true to enable audio clip retention
Mode string // retention mode, "age" or "priority"
DiskUsageLimit string // maximum disk usage percentage before retention
MinEvictionHours int // minimum number of hours to keep audio clips
MinClipsPerSpecies int // minimum number of clips per species to keep
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip

Codebase Verification

The validateSettings function in internal/conf/validate.go currently does not include validation logic for the new fields (Enabled, Mode, DiskUsageLimit, MinEvictionHours, MinClipsPerSpecies).

  • Location to update: internal/conf/validate.go within the validateSettings function.

Please ensure that the new fields are validated to prevent misconfigurations.

Analysis chain

The new fields in the Settings struct for retention settings are clear and well-documented. Ensure that these settings are validated in the validateSettings function to prevent misconfigurations.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify validation logic for new settings.
ast-grep --lang go --pattern $'validateSettings($_)'

Length of output: 196



Script:

#!/bin/bash
# Description: Find and inspect the implementation of the validateSettings function.
ast-grep --lang go --pattern $'func validateSettings($_) {
  $$$
}'

Length of output: 68



Script:

#!/bin/bash
# Description: Search for the definition of the validateSettings function.
rg 'func validateSettings' --context 10

Length of output: 856

log.Printf("Failed to delete clip path for %s: %s\n", clip.ID, err)
}
}
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return statement inside the loop causes the function to exit prematurely after processing the first clip. Move it outside the loop.

- return err
+ // Move this line outside the loop

Committable suggestion was skipped due low confidence.

Comment on lines +183 to +193
// ParsePercentage converts a percentage string (e.g., "80%") to a float64
func ParsePercentage(percentage string) (float64, error) {
if strings.HasSuffix(percentage, "%") {
value, err := strconv.ParseFloat(strings.TrimSuffix(percentage, "%"), 64)
if err != nil {
return 0, err
}
return value, nil
}
return 0, errors.New("invalid percentage format")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of ParsePercentage is correct and robust, handling different scenarios and errors effectively.

Consider adding unit tests for ParsePercentage to cover various input scenarios. Would you like help with this?

@tphakala tphakala merged commit beb3709 into main May 26, 2024
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant