rebuild reaper functionality in thrall AND remove old reaper lambda #4145

Merged: 11 commits merged Oct 6, 2023


@twrichards (Contributor) commented Sep 20, 2023

https://trello.com/c/DrGAH8Y0/893-turn-on-the-reaper

Once upon a time, there was a process (in the form of a lambda) called 'the reaper' which deleted images on a regular schedule according to a list of criteria, but it was turned off out of caution after a significant chunk of images were permanently lost some years ago. This PR rebuilds 'the reaper', this time entirely within thrall.

Pre-requisite PRs:

What's changed

  • Delete the old 'reaper' lambda (and any traces from CI scripts etc.)
  • Add a new optional config property to ThrallConfig (s3.reaper.bucket in thrall.conf) specifying the name of the bucket where permanent records of what the reaper soft- and hard-deleted are stored (see https://github.com/guardian/editorial-tools-platform/pull/706 for the Guardian). The reaper only operates if this property is defined.
  • Two new endpoints in thrall, both taking a count query param (the batch size, max 1000):
    • doBatchSoftReap, which 'soft deletes' (with deletedBy set to reaper) the oldest batch of is:reapable images that are not already soft-deleted
    • doBatchHardReap, which 'hard deletes' the oldest batch of is:reapable images that have been in the 'soft deleted' state for at least two weeks
  • The new ReaperController, which defines the above endpoints, also has a 'schedule' (every 15 mins) which [only if the s3.reaper.bucket config property is defined; otherwise it doesn't run]...
    • queries the number of images uploaded in the last 7 days, then divides that by the number of 15-minute windows in 7 days (7 × 24 × 4 = 672) to get the average number of images ingested per 15 mins
    • calls doBatchSoftReap and doBatchHardReap with that per-15-min figure as the count; this ensures we delete at the same rate we ingest in a given environment (at the Guardian, our TEST environment ingests roughly 1% of what PROD ingests)
  • We report the counts of images soft- and hard-reaped via new CloudWatch metrics
  • The new reaper process can be 'paused' by placing a file named PAUSED at the root of the s3.reaper.bucket bucket. This is checked on each run of the schedule, which exits early (with a log message) if paused.
  • Lastly, ReaperController provides a few more endpoints:
    • POST endpoint for pausing (creating that PAUSED file as described above)
    • POST endpoint for resuming from the paused state (deleting that PAUSED file as described above)
    • GET endpoint for reading a record file from the bucket
    • finally, an HTML view showing
      • whether the reaper is paused (with a button to toggle that using the above endpoints)
      • a list of the last day's worth of record files
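The soft/hard selection rules listed above can be illustrated with a small in-memory sketch. It is not the PR's implementation (thrall selects images via an is:reapable search, not a Scala collection filter); `ImageState`, `ReapSelection` and ordering "oldest" by upload time are assumptions made here purely for illustration.

```scala
import java.time.{Duration, Instant}

// Hypothetical, simplified model of an image's deletion state; Grid's real
// types (e.g. ReapableEligibility) are not shown in this PR and will differ.
final case class ImageState(
  id: String,
  uploadedAt: Instant,
  isReapable: Boolean,            // stands in for the is:reapable search criteria
  softDeletedAt: Option[Instant]  // None if the image is not soft-deleted
)

object ReapSelection {
  // Minimum time an image must sit in the soft-deleted state before hard reaping.
  val HardReapGracePeriod: Duration = Duration.ofDays(14)

  // Oldest `count` reapable images that are not already soft-deleted.
  // Assumption: "oldest" means earliest upload time.
  def softReapBatch(images: Seq[ImageState], count: Int): Seq[ImageState] =
    images
      .filter(i => i.isReapable && i.softDeletedAt.isEmpty)
      .sortBy(_.uploadedAt.toEpochMilli)
      .take(count)

  // Oldest `count` reapable images that were soft-deleted at least two weeks before `now`.
  def hardReapBatch(images: Seq[ImageState], count: Int, now: Instant): Seq[ImageState] = {
    val cutoff = now.minus(HardReapGracePeriod)
    images
      .filter(i => i.isReapable && i.softDeletedAt.exists(d => !d.isAfter(cutoff)))
      .sortBy(_.uploadedAt.toEpochMilli)
      .take(count)
  }
}
```

Passing `now` in explicitly keeps the two-week rule testable without depending on the system clock.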
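The schedule's decision logic described above (bucket configured, PAUSED check, batch size derived from the 7-day ingest count) might be sketched as follows. `ReaperBucket`, `tick` and the `TickOutcome` types are hypothetical; rounding down, and not clamping the result to the endpoints' 1000 maximum, are assumptions, since the PR doesn't say how either case is handled.

```scala
// Hypothetical abstraction over the reaper's S3 bucket (named by s3.reaper.bucket);
// the real code talks to S3, which is not modelled here.
trait ReaperBucket {
  def exists(key: String): Boolean
}

object ReaperSchedule {
  val PausedKey = "PAUSED"
  // Number of 15-minute windows in 7 days: 7 * 24 * 4 = 672.
  val WindowsPerWeek: Long = 7L * 24 * 4

  // Average number of images ingested per 15 minutes, derived from the
  // 7-day upload count. Assumption: integer division (rounds down).
  def batchSizeFor(uploadedLast7Days: Long): Int =
    (uploadedLast7Days / WindowsPerWeek).toInt

  sealed trait TickOutcome
  case object NotConfigured extends TickOutcome
  case object Paused extends TickOutcome
  final case class Reaped(batchSize: Int) extends TickOutcome

  // One run of the schedule. `bucket` is None when s3.reaper.bucket is unset.
  def tick(
    bucket: Option[ReaperBucket],
    countUploadedLast7Days: () => Long,
    softReap: Int => Unit,
    hardReap: Int => Unit
  ): TickOutcome = bucket match {
    case None => NotConfigured // no bucket configured: the reaper doesn't run
    case Some(b) if b.exists(PausedKey) =>
      println("Reaper is paused, skipping this run") // stand-in for the log message
      Paused
    case Some(_) =>
      val n = batchSizeFor(countUploadedLast7Days())
      softReap(n)
      hardReap(n)
      Reaped(n)
  }
}
```

Passing the reap operations in as functions keeps the pause and configuration checks separate from the deletion code itself.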

NOTE: the endpoints are exposed (in addition to being called by the schedule) so that they can be called manually, for example to clear a backlog if the reaper hasn't been running for whatever reason (either historically or because it was paused using the functionality above).
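As a hedged illustration of draining a backlog by calling the batch endpoints repeatedly, the sketch below validates `count` against the 1000 maximum and keeps calling until a batch comes back short. `clearBacklog`, the `maxCalls` safety cap and rejecting non-positive counts are assumptions; in practice each `reapBatch` call would be a request to thrall's doBatchSoftReap or doBatchHardReap endpoint.

```scala
object ReaperBacklog {
  val MaxBatchSize = 1000

  // Mirrors the endpoints' `count` query param rule (batch size, max 1000).
  // Rejecting non-positive values is an assumption; the PR only states the max.
  def validateCount(count: Int): Either[String, Int] =
    if (count < 1) Left(s"count must be positive, got $count")
    else if (count > MaxBatchSize) Left(s"count must be at most $MaxBatchSize, got $count")
    else Right(count)

  // Repeatedly invokes a batch operation until it reaps fewer images than
  // requested (backlog cleared) or `maxCalls` is reached.
  // `reapBatch` returns how many images it actually reaped.
  def clearBacklog(count: Int, maxCalls: Int)(reapBatch: Int => Int): Either[String, Int] =
    validateCount(count).map { n =>
      @annotation.tailrec
      def loop(total: Int, calls: Int): Int =
        if (calls >= maxCalls) total
        else {
          val reaped = reapBatch(n)
          if (reaped < n) total + reaped else loop(total + reaped, calls + 1)
        }
      loop(0, 0)
    }
}
```

Stopping on a short batch avoids needing to know the backlog size up front, while the `maxCalls` cap bounds how much a single manual run can delete.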

@twrichards force-pushed the reaper-in-thrall branch 2 times, most recently from 32e6b2d to 750cb58 September 20, 2023 12:05
@twrichards changed the title from "[reaper] add doBatchSoftReap and doBatchHardReap endpoints to thrall AND remove old reaper lambda" to "rebuild reaper functionality in thrall AND remove old reaper lambda" Sep 25, 2023
@twrichards force-pushed the reaper-in-thrall branch 11 times, most recently from 1b10d23 to 71509eb September 27, 2023 09:10
@twrichards marked this pull request as ready for review September 27, 2023 11:00
@twrichards requested a review from a team as a code owner September 27, 2023 11:00
thrall/app/controllers/ReaperController.scala (three outdated, resolved review threads)
```scala
def doBatchSoftReap(count: Int): Action[AnyContent] = batchDeleteWrapper(count)(doBatchSoftReap)

def doBatchSoftReap(count: Int, deletedBy: String, isReapable: ReapableEligibility): Future[JsValue] = persistedBatchDeleteOperation("soft"){
```

Member


Optional for this PR, but I'd suggest building up a logMarker with the request ID (example https://github.com/guardian/grid/blob/main/cropper/app/controllers/CropperController.scala#L55), passing it around and feeding it into the log statements; it may make it easier to track what happened in a given request. (In normal operation there won't be much ambiguity because calls are 15 mins apart, but it would let you quickly filter out any other thrall logs, and would be useful if we make a bunch of calls in quick succession to clear a backlog manually.)
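A stdlib-only sketch of the reviewer's suggestion: build one marker carrying the request ID and pass it into every log call, so a request's lines can be filtered together. `LogMarker` and `RequestLogging` here are hypothetical stand-ins, not Grid's logging types or the linked CropperController code; a real version would hand the marker to the logging framework rather than formatting strings.

```scala
import java.util.UUID

// Hypothetical structured marker: key/value fields attached to every log
// line emitted while handling one request.
final case class LogMarker(fields: Map[String, String]) {
  def +(kv: (String, String)): LogMarker = LogMarker(fields + kv)
  // Render fields in a stable (sorted) order so lines are easy to grep.
  def render: String =
    fields.toSeq.sortBy(_._1).map { case (k, v) => s"$k=$v" }.mkString(" ")
}

object RequestLogging {
  // Build the marker once per request (or per scheduled run)...
  def markerFor(requestId: String = UUID.randomUUID().toString): LogMarker =
    LogMarker(Map("requestId" -> requestId))

  // ...and pass it to every log call so all lines share the requestId.
  def format(marker: LogMarker, message: String): String =
    s"[${marker.render}] $message"

  def info(marker: LogMarker, message: String): Unit =
    println(format(marker, message)) // stand-in for a real logger call
}
```

For example, a soft-reap call could create `markerFor()` once, add `"reapType" -> "soft"`, and thread that marker through each log statement it makes.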

@twrichards force-pushed the reaper-in-thrall branch 3 times, most recently from 04f354a to 04565d0 October 4, 2023 15:26
@prout-bot

Seen on auth, usage, image-loader, metadata-editor, thrall, leases, cropper, collections, media-api, kahuna (merged by @twrichards 13 minutes and 38 seconds ago) Please check your changes!

4 participants