Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check the last created subset for matching compressed report #191

Merged
merged 2 commits into from
Oct 24, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
47 changes: 26 additions & 21 deletions app/models/report.rb
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,8 @@ class Report < ApplicationRecord
validates_attachment_file_name :attachment, matches: [/json\z/]

# has_one_attached :report
COMPRESSED_HASH_MESSAGE = { "code" => 69, "severity" => "warning", "message" => "report is compressed using gzip", "help-url" => "https://github.com/datacite/sashimi", "data" => "usage data needs to be uncompressed" }.freeze
COMPRESSED_HASH_MESSAGE = { "code" => 69, "severity" => "warning",
"message" => "report is compressed using gzip", "help-url" => "https://github.com/datacite/sashimi", "data" => "usage data needs to be uncompressed" }.freeze

# include validation methods for sushi
include Metadatable
Expand All @@ -19,9 +20,11 @@ class Report < ApplicationRecord
include Queueable

# attr_accessor :month, :year, :compressed
validates_presence_of :report_id, :created_by, :user_id, :created, :reporting_period
validates_presence_of :report_id, :created_by, :user_id, :created,
:reporting_period
validates_presence_of :report_datasets, if: :normal_report?
validates_format_of :created_by, with: /[-\._;()\/:a-zA-Z0-9\*~\$\=]+\z/, on: :create
validates_format_of :created_by, with: /[-._;()\/:a-zA-Z0-9*~$=]+\z/,
on: :create

# , :report_datasets
validates :uid, uniqueness: true
Expand All @@ -36,16 +39,15 @@ class Report < ApplicationRecord
# before_create :set_id
after_commit :push_report, unless: :resolution_report?


def clean_data
update_column("compressed", nil)
if !self.resolution_report?
if !resolution_report?
update_column("report_datasets", [])
end

update_column("release", self.release.downcase)
update_column("release", release.downcase)

report_subsets.each do | report_subset |
report_subsets.each do |report_subset|
report_subset.clean_data
end
end
Expand Down Expand Up @@ -121,34 +123,37 @@ def self.compressed_report?
end

def normal_report?
return nil if self.compressed_report? || self.resolution_report?
return nil if compressed_report? || resolution_report?

true
end

def compressed_report?
return nil if self.exceptions.nil? || self.exceptions.empty?
return nil if exceptions.nil? || exceptions.empty?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can achieve this check by using the Rails blank? method.
return nil if exceptions.nil? || exceptions.empty?
becomes
return nil if exceptions.blank?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto format changed whitespace. I did not change that code.

The only thing I;ve changed is ASC to DESC

I don't want to change anything else in this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardhallett understood


codes = exceptions.map { |exception| exception.fetch("code", "") }

codes.include?(69) && ((release.downcase == "rd1") || (release.downcase == "rd2"))
end

def resolution_report?
return nil if self.exceptions.nil? || self.exceptions.empty?
code = self.exceptions.first.fetch("code", "")
(code == 69) && (self.release.downcase == "drl")
return nil if exceptions.nil? || exceptions.empty?

code = exceptions.first.fetch("code", "")
(code == 69) && (release.downcase == "drl")
end

# Builds attachment from (rendered) content and saves it.
def save_as_attachment(content)
file_name = "#{self.uid}.json"
file_name = "#{uid}.json"

# Use Tempfile - so we can handle large amounts of data.
tmp = File.new("tmp/" + file_name, "w")
tmp << content
tmp.flush

self.attachment_file_name = file_name
self.update(:attachment => tmp)
update(attachment: tmp)

# Make sure the tmp file is deleted.
File.delete(tmp)
Expand All @@ -158,9 +163,9 @@ def save_as_attachment(content)
def load_attachment
if attachment.present?
begin
content = Paperclip.io_adapters.for(self.attachment).read
content = Paperclip.io_adapters.for(attachment).read
rescue StandardError
fail "The attachment for this report cannot be found: " + self.attachment.url.to_s
fail "The attachment for this report cannot be found: " + attachment.url.to_s
end
end
end
Expand All @@ -172,9 +177,9 @@ def load_attachment!
fail "[UsageReports] All reports should have an attachment."
end

content = load_attachment
content = load_attachment
attachment = AttachmentParser.new(content)
uuid = attachment.uuid
uuid = attachment.uuid

if uuid.blank?
fail "[UsageReports] Report-uid missing from attachment."
Expand All @@ -185,7 +190,7 @@ def load_attachment!

# set report.compressed from the attachment.
if compressed_report? || resolution_report?
report_subset = report_subsets.order("created_at ASC").first
report_subset = report_subsets.order("created_at DESC").first
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this is the only behavioral change, then I think this makes intuitive sense to me.

For report ab28a47e-4b66-4292-a81d-e48e8f880bb9:

Checksum in S3: de0cb0691a3d6f3019e0472ee255b013d9f70a6d0627a67bbdbbe04c3daedb19
Checksums in sashimi:

3.1.0 :001 > r = Report.where(uid:"ab28a47e-4b66-4292-a81d-e48e8f880bb9").first

 => 

#<Report:0x00007f02f5d8acd0

... 
3.1.0 :008 > r.report_subsets.each do |rs|

3.1.0 :009 >   pp rs.checksum

3.1.0 :010 > end

"de0cb0691a3d6f3019e0472ee255b013d9f70a6d0627a67bbdbbe04c3daedb19"

"e58407e2953d055e38b5611f15b55bc790698db5ccc2a0290798a819bc76c236"

"c178b302646ee2c8b77356e3dcb0b1563047bceb696fd81014383f2662c4e2ab"

"c078e954d5042ab7bd39a65d352e31e855a99bc01a0c7908c971eb4af4ee1263"

With ASC (non-matching checksum):

3.1.0 :012 > r.report_subsets.order("created_at ASC").first.checksum
 => "e58407e2953d055e38b5611f15b55bc790698db5ccc2a0290798a819bc76c236" 

With DESC (matching checksum):

3.1.0 :011 > r.report_subsets.order("created_at DESC").first.checksum
 => "de0cb0691a3d6f3019e0472ee255b013d9f70a6d0627a67bbdbbe04c3daedb19"

attachment_subset = attachment.search_subsets(checksum: report_subset.checksum)
fail "[UsageReports] cannot find gzip for a report-subset" if attachment_subset.blank?

Expand All @@ -200,7 +205,7 @@ def load_attachment!
self.report_datasets = attachment.datasets

# Loop over report_subsets setting report_subset gzip from the attachment.
self.report_subsets.each do | report_subset |
report_subsets.each do |report_subset|
attachment_subset = attachment.search_subsets(checksum: report_subset.checksum)

if attachment_subset.blank?
Expand All @@ -215,7 +220,7 @@ def load_attachment!
end

def destroy_attachment
if self.attachment.present?
if attachment.present?
self.attachment = nil
end
end
Expand Down
Loading