-
-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: New "priority" cleanup policy #180
Conversation
WalkthroughThe recent changes enhance the audio clip retention system by introducing new cleanup modes based on age and priority. These modifications involve updating configuration settings, adding utility functions for error handling, and implementing age and priority-based cleanup capabilities in the Changes
Sequence Diagram(s) (Beta)sequenceDiagram
participant User
participant Config
participant ClipCleanupMonitor
participant DiskManager
User->>Config: Load Settings
Config->>ClipCleanupMonitor: Provide Settings
ClipCleanupMonitor->>DiskManager: Delegate Cleanup (based on mode)
DiskManager->>DiskManager: Perform Age-Based Cleanup
DiskManager->>DiskManager: Perform Priority-Based Cleanup
Tip New Features and ImprovementsReview SettingsIntroduced new personality profiles for code reviews. Users can now select between "Chill" and "Assertive" review tones to tailor feedback styles according to their preferences. The "Assertive" profile posts more comments and nitpicks the code more aggressively, while the "Chill" profile is more relaxed and posts fewer comments. AST-based InstructionsCodeRabbit offers customizing reviews based on the Abstract Syntax Tree (AST) pattern matching. Read more about AST-based instructions in the documentation. Community-driven AST-based RulesWe are kicking off a community-driven initiative to create and share AST-based rules. Users can now contribute their AST-based rules to detect security vulnerabilities, code smells, and anti-patterns. Please see the ast-grep-essentials repository for more information. New Static Analysis ToolsWe are continually expanding our support for static analysis tools. We have added support for Tone SettingsUsers can now customize CodeRabbit to review code in the style of their favorite characters or personalities. Here are some of our favorite examples:
Revamped Settings PageWe have redesigned the settings page for a more intuitive layout, enabling users to find and adjust settings quickly. This change was long overdue; it not only improves the user experience but also allows our development team to add more settings in the future with ease. Going forward, the changes to Miscellaneous
Tip Early Access Features
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
Outside diff range and nitpick comments (6)
internal/diskmanager/util.go (1)
10-33
: The implementation ofWriteSortedFilesToFile
is robust with proper error handling and resource management.Consider adding more detailed logging before each major operation to aid in debugging and maintenance.
internal/conf/defaults.go (1)
41-45
: The new default settings for the retention policy are correctly implemented.Consider adding comments to explain each setting for better clarity and maintainability.
internal/conf/config.yaml (1)
40-44
: The YAML configuration for the new retention settings is correctly formatted and aligns with the implementation indefaults.go
.Consider adding comments in the YAML file to explain the purpose and expected values for each new setting.
internal/analysis/realtime.go (1)
160-160
: The integration ofClipCleanupMonitor
into the real-time analysis workflow is correctly implemented.Consider adding more detailed comments explaining the role of
ClipCleanupMonitor
in the context of the overall system.internal/diskmanager/priority.go (2)
19-23
: ThePolicy
struct is well-defined for cleanup policies. Consider documenting the behavior when bothAlwaysCleanupFirst
andNeverCleanup
conditions apply to the same species.
82-106
: TheGetAudioFiles
function is comprehensive, but the debug logging is commented out. If it's no longer needed, consider removing these lines to clean up the code.
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (8)
- internal/analysis/realtime.go (4 hunks)
- internal/conf/config.go (1 hunks)
- internal/conf/config.yaml (1 hunks)
- internal/conf/defaults.go (1 hunks)
- internal/conf/utils.go (2 hunks)
- internal/diskmanager/age.go (1 hunks)
- internal/diskmanager/priority.go (1 hunks)
- internal/diskmanager/util.go (1 hunks)
Additional comments not posted (1)
internal/diskmanager/priority.go (1)
261-308
: ThesortFiles
function implements multiple sorting priorities. Ensure that the sorting logic aligns with the intended priority rules as described in the PR objectives.Verification successful
The
sortFiles
function implements the sorting priorities as follows:
- Oldest files first.
- Species with the most occurrences in the subdirectory.
- Confidence level.
- Default to oldest timestamp.
These priorities are correctly implemented in the
sort.Slice
function. The sorting logic aligns with the intended priority rules as described in the PR objectives.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify sorting logic matches PR objectives. ast-grep --lang go --pattern $'sort.Slice($_, $_)'Length of output: 2840
// LoadPolicy loads the cleanup policies from a CSV file | ||
func LoadPolicy(policyFile string) (*Policy, error) { | ||
file, err := os.Open(policyFile) | ||
if err != nil { | ||
return nil, err | ||
} | ||
defer file.Close() | ||
|
||
reader := csv.NewReader(file) | ||
records, err := reader.ReadAll() | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
||
policy := &Policy{ | ||
AlwaysCleanupFirst: make(map[string]bool), | ||
NeverCleanup: make(map[string]bool), | ||
} | ||
|
||
for _, record := range records { | ||
if len(record) != 2 { | ||
return nil, errors.New("invalid policy record") | ||
} | ||
if record[1] == "always" { | ||
policy.AlwaysCleanupFirst[record[0]] = true | ||
} else if record[1] == "never" { | ||
policy.NeverCleanup[record[0]] = true | ||
} | ||
} | ||
|
||
return policy, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LoadPolicy
function correctly handles file operations and CSV parsing. However, consider adding more robust error handling for CSV format issues and documenting expected CSV format.
// parseFileInfo parses the file information from the file path and os.FileInfo | ||
func parseFileInfo(path string, info os.FileInfo) (FileInfo, error) { | ||
name := filepath.Base(info.Name()) | ||
parts := strings.Split(name, "_") | ||
if len(parts) < 3 { | ||
return FileInfo{}, errors.New("invalid file name format") | ||
} | ||
|
||
// The species name might contain underscores, so we need to handle the last two parts separately | ||
confidenceStr := parts[len(parts)-2] | ||
timestampStr := parts[len(parts)-1] | ||
species := strings.Join(parts[:len(parts)-2], "_") | ||
|
||
confidence, err := strconv.Atoi(strings.TrimSuffix(confidenceStr, "p")) | ||
if err != nil { | ||
return FileInfo{}, err | ||
} | ||
|
||
timestamp, err := time.Parse("20060102T150405Z", strings.TrimSuffix(timestampStr, ".wav")) | ||
if err != nil { | ||
return FileInfo{}, err | ||
} | ||
|
||
return FileInfo{ | ||
Path: path, | ||
Species: species, | ||
Confidence: confidence, | ||
Timestamp: timestamp, | ||
Size: info.Size(), | ||
}, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parseFileInfo
function handles file name parsing and could fail on unexpected formats. Adding unit tests for this function would ensure robustness against various file naming conventions.
Would you like me to help by writing some unit tests for this function?
// Convert 80% string etc. to 80.0 float64 | ||
threshold, err := conf.ParsePercentage(thresholdStr) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
// Only remove files with extensions in this list | ||
allowedExts := []string{".wav"} | ||
|
||
if debug { | ||
log.Printf("Starting cleanup process. Base directory: %s, Threshold: %.1f%%", baseDir, threshold) | ||
} | ||
|
||
// Get the current disk usage | ||
diskUsage, err := GetDiskUsage(baseDir) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
if debug { | ||
log.Printf("Current disk usage: %.1f%%", diskUsage) | ||
} | ||
|
||
// If disk usage is below the threshold, no cleanup is needed | ||
if diskUsage < threshold { | ||
if debug { | ||
log.Printf("Disk usage %.1f%% is below the %.1f%% threshold. No cleanup needed.", diskUsage, threshold) | ||
} | ||
return nil | ||
} else { | ||
if debug { | ||
log.Printf("Disk usage %.1f%% is above the %.1f%% threshold. Cleanup needed.", diskUsage, threshold) | ||
} | ||
} | ||
|
||
// Get the list of audio files | ||
files, err := GetAudioFiles(baseDir, allowedExts, debug) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
// Sort files by the cleanup priority and get the initial count of files per species per subdirectory | ||
speciesMonthCount := sortFiles(files, debug) | ||
|
||
// Debug: write sorted files to a file | ||
if debug { | ||
if err := WriteSortedFilesToFile(files, "file_cleanup_order.txt"); err != nil { | ||
return err | ||
} | ||
} | ||
|
||
// Delete files until disk usage is below the threshold or 100 files have been deleted | ||
deletedFiles := 0 | ||
maxDeletions := 1000 | ||
totalFreedSpace := int64(0) | ||
for _, file := range files { | ||
select { | ||
case <-quitChan: | ||
if debug { | ||
log.Println("Received quit signal, exiting cleanup loop.") | ||
} | ||
return nil | ||
default: | ||
// Get the subdirectory name | ||
subDir := filepath.Dir(file.Path) | ||
if diskUsage < threshold || deletedFiles >= maxDeletions || speciesMonthCount[file.Species][subDir] <= minClipsPerSpecies { | ||
continue | ||
} | ||
|
||
if debug { | ||
log.Printf("Deleting file: %s", file.Path) | ||
} | ||
|
||
// Delete the file deemed for cleanup | ||
err = os.Remove(file.Path) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
// Increment deleted files count and update species count | ||
deletedFiles++ | ||
speciesMonthCount[file.Species][subDir]-- | ||
|
||
// Add file size to total freed space | ||
totalFreedSpace += file.Size | ||
|
||
// Update disk usage | ||
diskUsage, err = GetDiskUsage(baseDir) | ||
if err != nil { | ||
return err | ||
} | ||
} | ||
} | ||
|
||
if debug { | ||
log.Printf("Cleanup process completed. %d files deleted. Total space freed: %.2f MB", deletedFiles, float64(totalFreedSpace)/(1024*1024)) | ||
} | ||
|
||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PriorityBasedCleanup
function is complex and handles multiple responsibilities. Consider refactoring to separate concerns, such as disk usage checking, file sorting, and file deletion into separate functions for better maintainability.
// GetDiskUsage returns the disk usage percentage | ||
func GetDiskUsage(baseDir string) (float64, error) { | ||
var stat syscall.Statfs_t | ||
err := syscall.Statfs(baseDir, &stat) | ||
if err != nil { | ||
return 0, err | ||
} | ||
|
||
total := stat.Blocks * uint64(stat.Bsize) | ||
free := stat.Bfree * uint64(stat.Bsize) | ||
used := total - free | ||
|
||
return (float64(used) / float64(total)) * 100, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tip
Codebase Verification
The GetDiskUsage
function in internal/diskmanager/priority.go
uses syscall.Statfs
, which is platform-dependent. The search results indicate that there are platform-specific conditions in other parts of the codebase, but none directly related to the GetDiskUsage
function.
internal/diskmanager/priority.go
: Usessyscall.Statfs
without platform-specific conditions.- Other files (e.g.,
internal/myaudio/capture.go
,internal/httpcontroller/utils.go
,internal/conf/utils.go
) contain platform-specific conditions usingruntime.GOOS
.
To ensure cross-platform compatibility, consider adding platform-specific conditions or using a cross-platform library for disk usage.
Analysis chain
The GetDiskUsage
function uses syscall.Statfs
, which is platform-dependent. Ensure compatibility across different operating systems or consider using a cross-platform library.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for platform-specific code usage.
ast-grep --lang go --pattern $'syscall.Statfs($_, $_)'
Length of output: 132
Script:
#!/bin/bash
# Description: Check for platform-specific conditions or alternative implementations in the codebase.
rg --pcre2 '(//\s*go:build|//\s*\+build|runtime.GOOS|runtime.GOARCH|syscall.Statfs)'
Length of output: 807
) | ||
|
||
// AgeBasedCleanup removes clips from the filesystem based on their age and the number of clips per species. | ||
// TODO: handle quit channel properly if it happens during cleanup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Handle quit channel properly if it happens during cleanup.
Would you like assistance in implementing this feature? I can help draft the necessary changes.
Enabled bool // true to enable audio clip retention | ||
Mode string // retention mode, "age" or "priority" | ||
DiskUsageLimit string // maximum disk usage percentage before retention | ||
MinEvictionHours int // minimum number of hours to keep audio clips | ||
MinClipsPerSpecies int // minimum number of clips per species to keep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tip
Codebase Verification
The validateSettings
function in internal/conf/validate.go
currently does not include validation logic for the new fields (Enabled
, Mode
, DiskUsageLimit
, MinEvictionHours
, MinClipsPerSpecies
).
- Location to update:
internal/conf/validate.go
within thevalidateSettings
function.
Please ensure that the new fields are validated to prevent misconfigurations.
Analysis chain
The new fields in the Settings
struct for retention settings are clear and well-documented. Ensure that these settings are validated in the validateSettings
function to prevent misconfigurations.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify validation logic for new settings.
ast-grep --lang go --pattern $'validateSettings($_)'
Length of output: 196
Script:
#!/bin/bash
# Description: Find and inspect the implementation of the validateSettings function.
ast-grep --lang go --pattern $'func validateSettings($_) {
$$$
}'
Length of output: 68
Script:
#!/bin/bash
# Description: Search for the definition of the validateSettings function.
rg 'func validateSettings' --context 10
Length of output: 856
log.Printf("Failed to delete clip path for %s: %s\n", clip.ID, err) | ||
} | ||
} | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return statement inside the loop causes the function to exit prematurely after processing the first clip. Move it outside the loop.
- return err
+ // Move this line outside the loop
Committable suggestion was skipped due low confidence.
// ParsePercentage converts a percentage string (e.g., "80%") to a float64 | ||
func ParsePercentage(percentage string) (float64, error) { | ||
if strings.HasSuffix(percentage, "%") { | ||
value, err := strconv.ParseFloat(strings.TrimSuffix(percentage, "%"), 64) | ||
if err != nil { | ||
return 0, err | ||
} | ||
return value, nil | ||
} | ||
return 0, errors.New("invalid percentage format") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of ParsePercentage
is correct and robust, handling different scenarios and errors effectively.
Consider adding unit tests for ParsePercentage
to cover various input scenarios. Would you like help with this?
The priority disk cleanup feature purges recordings based on several policy rules. This helps maintain disk usage within user-defined limits while retaining valuable recordings.
This retention policy will be enabled by default.
Configurable Threshold:
Sorting Policies:
Minimum Clips Per Species:
Controlled Deletion Process: