
Persistent retries #8

Draft: wants to merge 5 commits into main
Conversation

ConnorRigby (Collaborator)

No description provided.

This maintains backwards compatibility via the `child_spec`
This enables firmware updates to resume even after a reboot
defstruct fwup_public_keys: [],
          fwup_devpath: "/dev/mmcblk0",
          handle_fwup_message: nil,
-         update_available: nil
+         update_available: nil,
+         journal_location: "/tmp/"
Contributor

Suggested change
-    journal_location: "/tmp/"
+    journal_location: "/data/nerves_hub_link.partial"

I'm just making up the journal location, so other names work for me too. What I'd like to suggest is:

  1. Putting it on a persistent filesystem so that it can survive a reboot
  2. Only having one file to make it easy to figure out where to find it and easy to clean up

I haven't gotten to it yet, but I'm guessing that the use of one file might have other ramifications. I do like the simplicity and the limit on how many files it can create.

Collaborator (author)

Yeah, I only defaulted it to /tmp for the test suite, since /data can't be written to on our host machines. I agree on keeping a single file, but I also wanted this to be a directory because there needs to be some way of identifying a partial download as a particular firmware after a reboot, which I was going to use the filename for.
I also considered putting a header in the partial file, but am still thinking about that.
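For what it's worth, a minimal sketch of what such a header could look like: a magic value, a format version byte, the firmware UUID (assumed here to be stored as its 36-character string form), and the expected total size. Everything below (module name, function names, field sizes) is illustrative rather than what the PR implements, and it assumes the fd was opened with at least [:read, :binary].

# Hypothetical partial-file header, purely to illustrate the idea above.
# Layout: 4-byte magic, 1-byte version, 36-byte firmware UUID string, 8-byte total size.
defmodule JournalHeaderSketch do
  @magic "NHJR"

  def write_header(fd, firmware_uuid, total_size)
      when byte_size(firmware_uuid) == 36 and is_integer(total_size) do
    # :file.write/2 accepts iodata
    :file.write(fd, [@magic, <<1::8>>, firmware_uuid, <<total_size::64>>])
  end

  # After a reboot, check whether the partial file belongs to the firmware we are
  # about to download before reusing it.
  def matches?(fd, firmware_uuid) when byte_size(firmware_uuid) == 36 do
    case :file.pread(fd, 0, 4 + 1 + 36 + 8) do
      {:ok, <<@magic, 1::8, ^firmware_uuid::binary-size(36), _total::64>>} -> true
      _ -> false
    end
  end
end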

def save_chunk(%__MODULE__{fd: fd} = journal, data) when is_binary(data) do
  hash = :crypto.hash(:sha256, data)
  length = byte_size(data)
  journal_entry = IO.iodata_to_binary([<<length::32>>, hash, data])
Contributor

:file.write/2 takes iodata, so you can pass the list:

Suggested change
-    journal_entry = IO.iodata_to_binary([<<length::32>>, hash, data])
+    journal_entry = [<<length::32>>, hash, data]

Collaborator (author)

Whoops, you're right, good catch.
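To make the suggested change above concrete, here's a sketch of save_chunk passing the iodata straight to :file.write/2. The struct update in the :ok branch is a guess at the rest of the function body, which isn't shown in the diff:

def save_chunk(%__MODULE__{fd: fd} = journal, data) when is_binary(data) do
  hash = :crypto.hash(:sha256, data)
  length = byte_size(data)
  # :file.write/2 accepts iodata, so no IO.iodata_to_binary/1 copy is needed
  journal_entry = [<<length::32>>, hash, data]

  case :file.write(fd, journal_entry) do
    :ok -> {:ok, %{journal | chunks: journal.chunks ++ [data]}}
    {:error, _reason} = error -> error
  end
end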

@type t :: %__MODULE__{
        fd: :file.fd(),
        content_length: non_neg_integer(),
        chunks: [binary()]
Contributor

I'm a little worried about firmware images that exceed available memory. I haven't read the code completely, but keeping a chunk list makes me think they're being cached.

How about a setup where the firmware download process starts by pulling chunks from the journal and feeding them to fwup? When it runs out of journal chunks, it makes an HTTP request for the rest.
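A rough sketch of that flow, assuming the entry layout from save_chunk (<<length::32>>, a 32-byte SHA-256, then the data), a journal opened in raw binary mode, and a port to fwup to stream into. The :httpc call is only there to show the Range-header idea; the real downloader in this library is different, and the function names are made up:

# Replay verified chunks from the journal into fwup, then fetch the rest over HTTP.
def resume(journal_path, url, fwup_port) do
  {:ok, fd} = :file.open(journal_path, [:read, :binary, :raw])
  replayed_bytes = replay_chunks(fd, fwup_port, 0)
  :ok = :file.close(fd)

  # Ask the server for everything that wasn't journaled yet.
  # (:inets and :ssl must already be started for :httpc to work.)
  headers = [{~c"range", ~c"bytes=" ++ Integer.to_charlist(replayed_bytes) ++ ~c"-"}]

  {:ok, {_status_line, _resp_headers, body}} =
    :httpc.request(:get, {to_charlist(url), headers}, [], body_format: :binary)

  Port.command(fwup_port, body)
end

# Walk the journal entry by entry, re-verifying each SHA-256 before replaying it.
defp replay_chunks(fd, fwup_port, bytes_so_far) do
  with {:ok, <<length::32>>} <- :file.read(fd, 4),
       {:ok, hash} <- :file.read(fd, 32),
       {:ok, data} <- :file.read(fd, length),
       ^hash <- :crypto.hash(:sha256, data) do
    Port.command(fwup_port, data)
    replay_chunks(fd, fwup_port, bytes_so_far + length)
  else
    # :eof or a bad hash: stop replaying and resume the download from here
    _ -> bytes_so_far
  end
end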

Collaborator (author)

Yeah, I think something along those lines will be required as well. I haven't thought of exactly how yet, but I like your idea.

@fhunleth (Contributor)

I had a couple other thoughts on this:

  1. It should be possible to turn off (see the config sketch after this comment). I'm not sure if the default is on or off, but some users may not have local storage for the updates, or they may not want the extra eMMC writes. E.g., Lichee Pi Zero...
  2. If fwup exits with an error while the cached file is being replayed back, that's pretty bad since it means that prior to the crash, fwup got farther with the same data. Given that the data is protected against disk corruption by SHAs, this really shouldn't happen. I'm thinking that it should send up a failure to NH and erase the cached data.
  3. If fwup exits while downloading firmware, the partially downloaded file should be erased. You might be doing this and I didn't read closely enough. The logic here is to prevent bytes that cause fwup to error out from being used more than once.

Btw, there's a current issue with nerves_hub_link that may affect this work. The issue is that nerves_hub_link assumes that the currently running firmware is valid, but this isn't guaranteed. Fwup is set up to verify this and error out if you're not running validated firmware, but really nerves_hub_link shouldn't even try to apply a firmware update until it's valid. This race hits when a device is stepped through updates. I.e., it updates to firmware 1.0 and then immediately updates to 2.0. After the 1.0 update, there's a race between nerves_hub_link getting the 2.0 download link and some other software marking the 1.0 firmware as valid. Based on how Nerves sets things up, the validation happens before nerves_hub_link starts, but that's not generally the case.
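On point 1, one possible shape, building on the defstruct earlier in the diff (the nil default and the helper below are guesses, not part of the PR): treat journal_location: nil as "persistence disabled", so devices without writable storage opt out simply by not configuring a path.

defstruct fwup_public_keys: [],
          fwup_devpath: "/dev/mmcblk0",
          handle_fwup_message: nil,
          update_available: nil,
          # nil would mean "never write partial downloads to disk"
          journal_location: nil

defp journal_enabled?(%__MODULE__{journal_location: nil}), do: false
defp journal_enabled?(%__MODULE__{journal_location: path}) when is_binary(path), do: true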

@@ -183,6 +183,11 @@ defmodule NervesHubLinkCommon.UpdateManager do

  @spec maybe_update_firmware(UpdateAvailable.t(), State.t()) ::
          State.download_started() | State.download_rescheduled() | State.t()
  defp maybe_update_firmware(%UpdateAvailable{update_available: false}, %State{} = state) do
Contributor

Should it also bail early if the :update_available key doesn't exist? Isn't that a possible case prior to your NH fix? Like, would this affect people running NH servers who aren't syncing with upstream regularly?
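For illustration only, one defensive spot for that could be wherever the raw payload is turned into the UpdateAvailable struct, defaulting a missing key to "no update". Module and field names below are stand-ins, not the library's real ones:

defmodule UpdateAvailableSketch do
  # Stand-in for the real UpdateAvailable message module.
  defstruct update_available: false

  # Older NH servers may omit "update_available" entirely; default it to false so
  # maybe_update_firmware/2 can bail early instead of failing to match.
  def parse(params) when is_map(params) do
    %__MODULE__{update_available: Map.get(params, "update_available", false)}
  end
end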
