Signing error trying to flash nvme drive on a orin nano 8gb devkit #1794

Closed
sgstreet opened this issue Jan 6, 2025 · 15 comments

Comments

@sgstreet

sgstreet commented Jan 6, 2025

Describe the bug
I'm new to OE4T and I'm trying to set up a tegra-demo-distro project running from NVMe. I configured and built demo-image-base using the jetson-orin-nano-devkit-nvme machine. When I try to use initrd-flash to write to the NVMe device, I get the following errors:

== Step 1: Signing binaries at 2025-01-05T18:45:40-08:00 ==
ERR: chip_info.bin_bak missing after dumping boardinfo
ERR: signing failed at 2025-01-05T18:45:42-08:00

I seem to be missing something, but I'm unsure what.

To Reproduce
Steps to reproduce the behavior:

  1. Build meta-tegrademo branch 'master' with jetson-orin-nano-devkit-nvme
  2. Build with bitbake argument 'demo-image-base'
  3. Deploy to hardware with method 'initrd-flash'
  4. I get an error
 ./initrd-flash
Starting at 2025-01-05T18:45:40-08:00
Machine:       jetson-orin-nano-devkit-nvme
Rootfs device: nvme0n1p1
Found Jetson device in recovery mode at USB 1-5
== Step 1: Signing binaries at 2025-01-05T18:45:40-08:00 ==
ERR: chip_info.bin_bak missing after dumping boardinfo
ERR: signing failed at 2025-01-05T18:45:42-08:00

I'm seeing the following on the console when running `initrd-flash`:

[0013.718] E> BLOCK_DEV: Failed to open blockdev.
[0013.723] E> LOADER: Failed to open blockdev 0(0).
[0013.728] E> LOADER: Failed to get storage info for binary 21 from loader.
[0013.735] C> LOADER: Could not read binary 21.
[0013.739] E> Failed to load MB2
[0013.742] C> Task 0x46 failed (err: 0x27228311)
[0013.747] E> Top caller module: MB2_PARAMS, error module: LOADER, reason: 0x11, aux_info: 0x83
[0013.755] C> Boot Info Table status dump :
0111100000111000110111111111000000011110000000000000011000001
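When comparing a failing host log against a working one, it can help to pull out which step each `ERR:` line belongs to. The following is a minimal sketch, assuming the `== Step N: ... at <timestamp> ==` and `ERR:` line formats seen in the output above; the function name `find_failures` is hypothetical and not part of initrd-flash.

```python
import re

# Patterns assumed from the initrd-flash output quoted in this issue.
STEP_RE = re.compile(r"^== (Step \d+: .+?) at (\S+) ==$")
ERR_RE = re.compile(r"^ERR: (.+)$")


def find_failures(log_text):
    """Return (step, message) pairs for every ERR: line in the log."""
    current_step = None
    failures = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        step_match = STEP_RE.match(line)
        if step_match:
            # Remember the most recent step header seen so far.
            current_step = step_match.group(1)
            continue
        err_match = ERR_RE.match(line)
        if err_match:
            failures.append((current_step, err_match.group(1)))
    return failures


# Host-side output copied from the report above.
SAMPLE = """\
Starting at 2025-01-05T18:45:40-08:00
Machine:       jetson-orin-nano-devkit-nvme
Rootfs device: nvme0n1p1
Found Jetson device in recovery mode at USB 1-5
== Step 1: Signing binaries at 2025-01-05T18:45:40-08:00 ==
ERR: chip_info.bin_bak missing after dumping boardinfo
ERR: signing failed at 2025-01-05T18:45:42-08:00
"""

if __name__ == "__main__":
    for step, message in find_failures(SAMPLE):
        print(f"{step}: {message}")
```

This only groups errors by step; it says nothing about the root cause, which in this thread turned out to be on the USB side.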
@dwalkes
Member

dwalkes commented Jan 10, 2025

@sgstreet I attempted to reproduce today with my jetson-orin-nano-devkit-nvme setup including from reboot force-recovery as discussed in the monthly meeting and I couldn't reproduce.

I did notice this warning message I hadn't seen previously at step 1:

Rootfs device: nvme0n1p1
Found Jetson device in recovery mode at USB 1-2
== Step 1: Signing binaries at 2025-01-10T10:34:46-07:00 ==
Partition not found: A_cpu-bootloader

Full console log at:

Starting at 2025-01-10T10:34:46-07:00
Machine:       jetson-orin-nano-devkit-nvme
Rootfs device: nvme0n1p1
Found Jetson device in recovery mode at USB 1-2
== Step 1: Signing binaries at 2025-01-10T10:34:46-07:00 ==
Partition not found: A_cpu-bootloader
== Step 2: Boot Jetson via RCM at 2025-01-10T10:35:15-07:00 ==
Found Jetson device in recovery mode at USB 1-2
== Step 3: Sending flash sequence commands at 2025-01-10T10:35:19-07:00 ==
Waiting for USB storage device flashpkg from 054bb250........[/dev/sdc]
Device size in blocks: 262144
Unmounted /dev/sdc.
== Step 4: Writing partitions on external storage device at 2025-01-10T10:35:46-07:00 ==
Waiting for USB storage device nvme0n1 from 054bb250...[/dev/sdc]
Creating partitions
  [03] name=A_kernel start=0 size=262144 sectors
  [04] name=A_kernel-dtb start=0 size=1536 sectors
  [05] name=A_reserved_on_user start=0 size=64768 sectors
  [06] name=B_kernel start=0 size=262144 sectors
  [07] name=B_kernel-dtb start=0 size=1536 sectors
  [08] name=B_reserved_on_user start=0 size=64768 sectors
  [09] name=recovery start=0 size=163840 sectors
  [10] name=recovery-dtb start=0 size=1024 sectors
  [11] name=esp start=0 size=131072 sectors
  [12] name=recovery_alt start=0 size=163840 sectors
  [13] name=recovery-dtb_alt start=0 size=1024 sectors
  [14] name=esp_alt start=0 size=131072 sectors
  [15] name=UDA start=0 size=819200 sectors
  [16] name=reserved start=0 size=982016 sectors
  [01] name=APP start=0 size=29360128 sectors
  [02] name=APP_b start=0 size=29360128 sectors
Writing partitions
  Writing boot.img (size=41297920) to /dev/sdc3 (size=134217728)...
  Writing kernel_tegra234-p3768-0000+p3767-0005-nv.dtb (size=249497) to /dev/sdc4 (size=786432)...
  Writing boot.img (size=41297920) to /dev/sdc6 (size=134217728)...
  Writing kernel_tegra234-p3768-0000+p3767-0005-nv.dtb (size=249497) to /dev/sdc7 (size=786432)...
  Writing esp.img (size=67108864) to /dev/sdc11 (size=67108864)...
  Writing demo-image-base.ext4 (size=15032385536) to /dev/sdc1 (size=15032385536)...
  Writing demo-image-base.ext4 (size=15032385536) to /dev/sdc2 (size=15032385536)...
[OK: /dev/sdc]
== Step 5: Waiting for final status from device at 2025-01-10T10:37:18-07:00 ==
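As a sanity check on a log like the one above, the partition sizes listed in sectors can be compared with the device sizes reported in bytes on the `Writing ...` lines. The sketch below assumes 512-byte sectors, which is inferred from the numbers in this log rather than stated by it; the helper names are hypothetical.

```python
# Assumption: 512-byte sectors, inferred from the log rather than stated in it.
SECTOR_BYTES = 512

# (partition name, size in sectors, device size in bytes), copied from the log:
# [03] A_kernel -> /dev/sdc3, [04] A_kernel-dtb -> /dev/sdc4,
# [11] esp -> /dev/sdc11, [01] APP -> /dev/sdc1.
PARTITIONS = [
    ("A_kernel", 262144, 134217728),
    ("A_kernel-dtb", 1536, 786432),
    ("esp", 131072, 67108864),
    ("APP", 29360128, 15032385536),
]


def sectors_to_bytes(sectors, sector_bytes=SECTOR_BYTES):
    """Convert a sector count to a size in bytes."""
    return sectors * sector_bytes


def mismatches(partitions):
    """Return the names of partitions whose two reported sizes disagree."""
    return [
        name
        for name, sectors, size_bytes in partitions
        if sectors_to_bytes(sectors) != size_bytes
    ]


if __name__ == "__main__":
    bad = mismatches(PARTITIONS)
    print("all sizes consistent" if not bad else f"mismatched: {bad}")
```

Under that assumption, the partition table was created as requested in this run, so a flashing failure would have to be looked for elsewhere.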

Here are the host and device logs for comparison
log.initrd-flash.2025-01-10-10.34.zip
device-logs-2025-01-10-10.34.46.tar.gz

Not sure what could be happening, but as discussed in the meeting I'd try another USB host controller if you have one as I know this has caused odd failures in the past.

If you want to try the same tegraflash file I built to verify your host flashing setup you can message me on element or via email and I'll send a link.

@kekiefer
Contributor

> @sgstreet I attempted to reproduce today with my jetson-orin-nano-devkit-nvme setup including from reboot force-recovery as discussed in the monthly meeting and I couldn't reproduce.

One thing to consider was that it wasn't clear if an A/B setup was used unintentionally via NVIDIA's tools.

A reboot forced-recovery from the B slot of an A/B setup induces the error due to a mismatch between the in-memory scratch register status and what is expected during recovery. Here's a link to a thread on the NVIDIA forums that includes a fairly thorough investigation: https://forums.developer.nvidia.com/t/mb1-bl-crash-when-rebooting-to-rcm-from-b-slot/309503/13

@dwalkes
Member

dwalkes commented Jan 10, 2025

True, and thanks @kekiefer. However, we should use A/B by default on tegra-demo-distro, and this should match NVIDIA's setup. We should also boot to the A slot on first boot. Still, nvbootctrl dump-slots-info might be a good diagnostic to check as well. I verified that I ended up on the A slot on first boot despite the warning during initrd-flash about the CPU bootloader:

root@jetson-orin-nano-devkit-nvme:~# nvbootctrl dump-slots-info
Current version: 36.4.0
Capsule update status: 0
Current bootloader slot: A
Active bootloader slot: A
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal
root@jetson-orin-nano-devkit-nvme:~#

@kekiefer
Contributor

That's not enough: reboot forced-recovery works fine if you're starting on the A slot. The problem happens when you run it from the B slot.
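The slot check discussed here can be sketched as a small parser over the text that `nvbootctrl dump-slots-info` prints. This is a minimal sketch, assuming the output format quoted earlier in this thread; the function names are hypothetical, and the "only the A slot is known to work" rule is taken from the report in this comment rather than from NVIDIA documentation.

```python
import re


def current_slot(dump_text):
    """Extract the 'Current bootloader slot' value, or None if it is absent."""
    match = re.search(
        r"^Current bootloader slot:\s*(\S+)\s*$", dump_text, re.MULTILINE
    )
    return match.group(1) if match else None


def booted_from_slot_a(dump_text):
    """True only when the dump clearly reports slot A as the current slot."""
    return current_slot(dump_text) == "A"


# Output copied from the nvbootctrl run quoted earlier in this thread.
SAMPLE = """\
Current version: 36.4.0
Capsule update status: 0
Current bootloader slot: A
Active bootloader slot: A
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal
"""

if __name__ == "__main__":
    if booted_from_slot_a(SAMPLE):
        print("Booted from slot A")
    else:
        print("Not on slot A; reboot forced-recovery may hit the reported failure")
```

The sketch deliberately treats a missing or unrecognized line as "not slot A", so it errs on the side of a warning.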

@dwalkes
Member

dwalkes commented Jan 10, 2025

Yep understood, just don't understand how @sgstreet would have gotten into that situation without running a capsule update.

@kekiefer
Contributor

kekiefer commented Jan 10, 2025

Ok yes, and I'm taking a logical leap in equating this issue with the one I've outlined, just because it manifests the same way.

To be clear, the connection I'm trying to make is that this flash was entered from a B root that was set up by NVIDIA's tools without @sgstreet being aware of it.

@sgstreet
Author

@kekiefer @dwalkes, sorry, I was offline this morning. I'm rebuilding from scratch to try to eliminate operator errors. Can anyone point me at the means to reflash the QSPI? I want to ensure it is correct.

@kekiefer
Contributor

The initrd-flash script will take care of that for you

@sgstreet
Author

After some gnashing of teeth (no bootloader due to corruption by the operator, me) and some random but unfounded concerns that I had let the magic smoke out of my board, I successfully used initrd-flash to flash both the QSPI and an SD card with positive results. A working system!

The reported signing error is caused by, hold your breath, an incompatible SS USB3 port. Using a USB2 port works better. I'm sorry for the newbie runaround! Thank you for the hand-holding!

Next up, flashing the NVME image.

@sgstreet
Author

Well, I guess I lied. I successfully used ./doflash.sh, not initrd-flash. I'm seeing some issues in the initrd-flash logs. An assessment will follow later this afternoon.

@dwalkes
Member

dwalkes commented Jan 11, 2025

In the meantime I've started a troubleshooting page at https://github.com/OE4T/meta-tegra/wiki/Tegraflash-Troubleshooting as discussed in the meeting this week, attempting to list the suggested troubleshooting steps in rough priority order. Whatever we learn here might be a new entry in the list.

@sgstreet
Author

Closing this as an operator error.

@dwalkes
Member

dwalkes commented Jan 13, 2025

Thanks @sgstreet what was the issue? Any updates we should make to https://github.com/OE4T/meta-tegra/wiki/Tegraflash-Troubleshooting or the other wiki pages?

@sgstreet
Author

> Thanks @sgstreet what was the issue? Any updates we should make to https://github.com/OE4T/meta-tegra/wiki/Tegraflash-Troubleshooting or the other wiki pages?

The issue was that none of my motherboard's SS USB3 ports are compatible with the tegra234 boot ROM USB stack. For clarity, the USB ports are all on my motherboard and not on an external USB3 card. I tried both the CPU and chipset ports.
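To see which host port the Jetson is attached to before retrying on a different one, the Linux sysfs USB tree can be scanned for NVIDIA's USB vendor ID (`0955`). The following is a minimal sketch, assuming a Linux host with the standard `/sys/bus/usb/devices` layout; the function name is hypothetical, and the assumption that the directory name corresponds to the `USB 1-5` style location printed by initrd-flash is mine. It only reports where the device enumerated and cannot tell whether a given port is compatible with the boot ROM.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0955"  # NVIDIA's USB vendor ID


def _read_attr(device_dir, name):
    """Read one sysfs attribute file, returning None when it does not exist."""
    attr = device_dir / name
    return attr.read_text().strip() if attr.is_file() else None


def find_nvidia_usb_devices(sysfs_root="/sys/bus/usb/devices"):
    """List NVIDIA USB devices as dicts with port, product ID, and speed (Mbit/s)."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    found = []
    for device_dir in sorted(root.iterdir()):
        vendor = _read_attr(device_dir, "idVendor")
        # Interface entries such as "1-5:1.0" have no idVendor file and are skipped.
        if vendor is None or vendor.lower() != NVIDIA_VENDOR_ID:
            continue
        found.append(
            {
                "port": device_dir.name,
                "product": _read_attr(device_dir, "idProduct"),
                "speed": _read_attr(device_dir, "speed"),
            }
        )
    return found


if __name__ == "__main__":
    devices = find_nvidia_usb_devices()
    if not devices:
        print("No NVIDIA USB device found; is the board in recovery mode?")
    for device in devices:
        print(f"port {device['port']} product {device['product']} speed {device['speed']}")
```

Running it before and after moving the cable shows whether the board actually landed on a different bus, which is a quick way to confirm that a different host controller is being tried.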

> Any updates we should make to https://github.com/OE4T/meta-tegra/wiki/Tegraflash-Troubleshooting or the other wiki pages?

I suspect there will be more debug steps when I get the NVMe flashing to work, which I currently believe is failing due to another operator error. That's why I moved to Gitter. Do you want this on the GitHub discussion instead? I'm super flexible and need pointing in the correct direction.

@dwalkes
Member

dwalkes commented Jan 13, 2025

> That's why I moved to gitter. Do you want this on the github discussion instead?

Gitter is fine. If you can circle back to help update this one with the resolution when we have it, and after I forget, that would be great ;)
