DietPi first install fails on ODROID-C4 (armbian-firmware fails unpack) #7261

Closed
1 task done
doqfgc opened this issue Oct 29, 2024 · 27 comments

Comments

@doqfgc

doqfgc commented Oct 29, 2024

Creating a bug report/issue

  • I have searched the existing open and closed issues

Required Information

  • DietPi version | 9.7.1
  • Distro version | bookworm
  • Kernel version | Linux DietPi 6.6.37-current-meson64 #1 SMP PREEMPT Fri Jul 5 07:34:07 UTC 2024 aarch64 GNU/Linux
  • SBC model | ODROID C4
  • Power supply used | Standard ODROID 12V/2A supply
  • SD card used | SanDisk 64GB

Additional Information (if applicable)

  • Bug report ID | 27d13357-5da5-4915-b6a9-171f0ad3c831

Steps to reproduce

  1. From a fresh image, start the first install
  2. There is no step 2, the install fails on the very first apt upgrade.

Expected behaviour

  • First install should complete without package errors.

Actual behaviour

  • armbian-firmware package install fails with a segfault and unexpected end of file or stream no matter how many reimages or reinstalls
  • linux-image-current-meson64 also fails with lzma error: compressed data is corrupt
  • It is only these two packages no matter how many reimages or reinstalls.

Extra details

  • ...
@MichaIng
Owner

When I manually download them in browser, it works. But possible that one of the Cloudflare cache entries is broken, hence I cleared them for those two files. Please try again.

@doqfgc
Author

doqfgc commented Oct 29, 2024

I'm still getting a fail, even from manually dropping packages in off-cache.

# dpkg -i ./armbian-firmware_24.11.0-trunk-dietpi1.deb 
(Reading database ... 18269 files and directories currently installed.)
Preparing to unpack .../armbian-firmware_24.11.0-trunk-dietpi1.deb ...
Unpacking armbian-firmware (24.11.0-trunk-dietpi1) over (24.8.0-trunk-dietpi2) ...
dpkg-deb (subprocess): decompressing archive './armbian-firmware_24.11.0-trunk-dietpi1.deb' (size=91604592) member 'data.tar': lzma error: compressed data is corrupt
dpkg-deb: error: <decompress> subprocess returned error exit status 2
dpkg: error processing archive ./armbian-firmware_24.11.0-trunk-dietpi1.deb (--install):
 cannot copy extracted data for './lib/firmware/brcm/brcmfmac4356-sdio-nanopi-m4v2.bin' to '/lib/firmware/brcm/brcmfmac4356-sdio-nanopi-m4v2.bin.dpkg-new': unexpected end of file or stream
Errors were encountered while processing:
 ./armbian-firmware_24.11.0-trunk-dietpi1.deb

Is it possible the package itself is broken?

@MichaIng
Owner

It works fine here 🤔:

root@NanoPiR5S:~# cd /tmp
root@NanoPiR5S:/tmp# curl -o armbian-firmware.deb https://dietpi.com/apt/dists/all/odroidc4/binary-all/armbian-firmware_24.11.0-trunk-dietpi1.deb
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 87.3M  100 87.3M    0     0  5163k      0  0:00:17  0:00:17 --:--:-- 5302k
2024-10-29 16:52:05 root@NanoPiR5S:/tmp# dpkg -i armbian-firmware.deb
(Reading database ... 19277 files and directories currently installed.)
Preparing to unpack armbian-firmware.deb ...
Unpacking armbian-firmware (24.11.0-trunk-dietpi1) over (24.11.0-trunk-dietpi1) ...
Setting up armbian-firmware (24.11.0-trunk-dietpi1) ...
root@NanoPiR5S:/tmp# apt download armbian-firmware
Get:1 https://dietpi.com/apt all/nanopir5s all armbian-firmware all 24.11.0-trunk-dietpi1 [91.6 MB]
Fetched 91.6 MB in 13s (6786 kB/s)
root@NanoPiR5S:/tmp# dpkg -i armbian-firmware_24.11.0-trunk-dietpi1_all.deb
(Reading database ... 19277 files and directories currently installed.)
Preparing to unpack armbian-firmware_24.11.0-trunk-dietpi1_all.deb ...
Unpacking armbian-firmware (24.11.0-trunk-dietpi1) over (24.11.0-trunk-dietpi1) ...
Setting up armbian-firmware (24.11.0-trunk-dietpi1) ...
root@NanoPiR5S:/tmp#

Different SBC but very same package (just symlinked server-side), and also using the same URL for Odroid C4 works.

@doqfgc
Author

doqfgc commented Oct 29, 2024

Okay.

I swapped SD cards with a brand new unused SanDisk Ultra (same as my other working ODROIDs) to make sure that wasn't the culprit and the same thing happened: fail on armbian-firmware, fail on linux-image-current-meson64.

Doing a manual dpkg-deb extract to a temp folder results in extraction failure in different places each time, so now I believe the culprit may be a bad board (I doubt it's "bad card" twice in a row).
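
One way to tell a corrupt archive from unstable hardware is to decompress the same bytes several times and compare the results: a bad file should fail the same way every time, while failures that move around point at the board. The following is a minimal sketch of that idea using Python's standard `lzma` module. The function names and the sample payload are hypothetical, and it assumes the xz-compressed member has already been pulled out of the .deb.

```python
import hashlib
import lzma


def decompress_rounds(xz_bytes: bytes, rounds: int = 5) -> list[str]:
    """Decompress the same payload several times; return one result per round.

    Each entry is the SHA-256 of the decompressed output, or an
    "error: ..." string if that round failed.
    """
    results = []
    for _ in range(rounds):
        try:
            raw = lzma.decompress(xz_bytes)
            results.append(hashlib.sha256(raw).hexdigest())
        except lzma.LZMAError as exc:
            results.append(f"error: {exc}")
    return results


def is_stable(results: list[str]) -> bool:
    """True if every round succeeded and produced the same digest."""
    return len(set(results)) == 1 and not results[0].startswith("error")


if __name__ == "__main__":
    # Hypothetical stand-in payload; on a real board this would be the
    # bytes of an .xz member extracted from the package.
    sample = lzma.compress(b"firmware blob " * 100_000)
    outcome = decompress_rounds(sample)
    print("stable" if is_stable(outcome) else "unstable", outcome)
```

If the rounds disagree with each other while the file's checksum stays the same, the input is probably fine and the decompression itself is being disturbed.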

Investigation ongoing.

@Joulinar
Collaborator

or is there something on the network that could do package inspection? Like a firewall?

@MichaIng
Owner

Yes, this seems more like a network/download issue than a storage issue, based on these unexpected EOF messages. How exactly did you download the package? If you downloaded it to /tmp like I did, then that is a tmpfs/RAM disk anyway and not related to the SD card ... though the extraction of course is.
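
Whether /tmp really is a RAM disk on a given system can be checked from the mount table. This is a rough sketch that parses text in the `/proc/mounts` layout; the helper name `fs_type_of` is made up for illustration. A result of `None` means /tmp has no mount of its own and lives on whatever filesystem holds its parent directory.

```python
from pathlib import Path
from typing import Optional


def fs_type_of(mount_point: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the last mount at mount_point, if any."""
    found = None
    for line in mounts_text.splitlines():
        parts = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(parts) >= 3 and parts[1] == mount_point:
            found = parts[2]  # later entries shadow earlier ones
    return found


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists():  # only present on Linux
        print(fs_type_of("/tmp", mounts.read_text()))
```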

@doqfgc
Author

doqfgc commented Oct 29, 2024

The manual download was to a folder in /tmp, yes. The manual extraction also happened in a folder in /tmp.

I doubt it's firewall. I roll my own via OPNsense.

Just to check for sanity, I risked prod and did a manual extract from a fresh package download on one of my working ODROIDs and it completed without error. I also did dietpi-update on the same working ODROID and that also completed.

That rules out the package and the network, which leaves either hardware or the base image itself, but I doubt it's the base image.

@MichaIng
Owner

Hmm, and wget/curl did not throw any error? Can you show the metadata of the file, its size in particular?

Maybe it is some problem with the dpkg extractor then. But you said other packages upgraded fine?

@doqfgc
Author

doqfgc commented Oct 29, 2024

Correct and correct. I'd even installed a package in the subshell (memtester) and that installed okay.

To be sure, I did an md5sum on both the independently obtained package and the package pulled from apt upgrade and they both matched with fcdebcaad0c70e0f0e2507ee4c337890.

To be surelysure, I had an old base image on hand (v8.23.3 on bookworm) and that also failed on initial setup on armbian-firmware. And also hard locked the system at linux-image-current-meson64.

@doqfgc
Author

doqfgc commented Oct 29, 2024

Update: I connected it to a monitor and got a bugcheck installing armbian-firmware. It's definitely hardware.

@MichaIng
Owner

md5sum returns fcdebcaad0c70e0f0e2507ee4c337890 here as well. Or more precisely sha256sum matches 5ef8c82df7222b2a792f5f366f2d2dfda7b047c868b101a60334da4f8fd00531, which again matches the checksum in https://dietpi.com/apt/dists/all/odroidc4/binary-all/Packages. However, APT also checks this checksum, otherwise denies the download ... ah or denies extracting, as it should not have a way to get the checksum without downloading the file. However, it should abort before attempting to extract the archive (passing things to dpkg).
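
For anyone who wants to repeat the checksum comparison by hand, here is a minimal sketch: it looks up the `SHA256` field for a package in a Debian-style Packages index and compares it with the hash of a local file. The function names are hypothetical, the demo data is invented, and the parser ignores multi-line fields, so treat it as an illustration rather than a full index parser.

```python
import hashlib
import os
import tempfile
from typing import Optional


def parse_packages(index_text: str) -> list[dict[str, str]]:
    """Split a Debian-style Packages index into one dict per stanza."""
    stanzas = []
    for block in index_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            # Continuation lines start with whitespace; skipped in this sketch.
            if line[:1].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def expected_sha256(index_text: str, package: str, version: str) -> Optional[str]:
    """Return the SHA256 field for the given package and version, if listed."""
    for stanza in parse_packages(index_text):
        if stanza.get("Package") == package and stanza.get("Version") == version:
            return stanza.get("SHA256")
    return None


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large packages need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical data: a throwaway file and an index stanza built to match it.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"not a real package")
    index = (
        "Package: example-firmware\n"
        "Version: 1.0\n"
        f"SHA256: {sha256_of_file(tmp.name)}\n"
    )
    actual = sha256_of_file(tmp.name)
    print(actual == expected_sha256(index, "example-firmware", "1.0"))
    os.unlink(tmp.name)
```

A matching hash only shows that the downloaded bytes are intact; it says nothing about whether the machine can decompress them reliably.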

What do you mean with "bugcheck"?

@doqfgc
Author

doqfgc commented Oct 29, 2024

A kernel error. I tried to reproduce it but it didn't happen again; now just back to segfaulting on unpacking these two packages.

Furthermore, attempting to reboot after the failed package installation results in an unbootable system.

@MichaIng
Owner

So maybe then it is indeed an issue with the SD card? Maybe another package upgrade related to the dpkg unpacker or one of its libraries damaged it, so that it in turn failed to unpack any other archive. Kernel errors in the middle of linux-image-current-meson64 installation can surely break system boot as well :( .

@doqfgc
Author

doqfgc commented Oct 29, 2024

I doubt it, especially as it's happened to two different SD cards; one of which was previously brand new unused. I know that SD cards aren't the most reliable storage medium but to have the same issue pop up with the same two packages on different cards is very suspicious.

I did get the bugcheck to show up again though. Apologies for camera-pointed-at-screen syndrome here, I really don't want to transcribe the entire terminal.

[Image: bugcheck screenshot, IMG_20241029_133256842]

@MichaIng
Owner

A kernel paging request failure. This can be either a kernel bug or RAM damage. I'll just test this kernel on Odroid N2+. Btw, as we had this on another device, does e.g. htop show the correct RAM size?

@doqfgc
Author

doqfgc commented Oct 29, 2024

htop reports 3.69GB, same as on working ODROIDs.

@MichaIng
Owner

I just tested the same kernel and firmware upgrade from same original versions on my Odroid N2+ 🤔.

There is a very similar report on a different SBC: #7257
So far I cannot imagine how it can be related, given the pretty different kernel, other than that it might be some bug in dpkg (or a particular version of it or such). But I am not sure whether buggy software can cause this kernel error, or whether the kernel or hardware must be buggy for it.

@doqfgc
Author

doqfgc commented Oct 29, 2024

I'm going to let memtester run on all the free memory (everything but the first 100MB or so) just to see if it's bad RAM.
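
A rough sketch of how the test size could be derived: read `MemAvailable` from text in the `/proc/meminfo` layout, hold back a reserve, and build a `memtester <size>M <loops>` command line. The helper names and the 100 MiB default are assumptions for illustration, not what was actually run here.

```python
import re


def mem_available_mib(meminfo_text: str) -> int:
    """Extract MemAvailable (reported in kB) and convert it to MiB."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found")
    return int(match.group(1)) // 1024


def memtester_command(meminfo_text: str, reserve_mib: int = 100, loops: int = 1) -> list[str]:
    """Build a memtester invocation that leaves reserve_mib untested."""
    test_mib = mem_available_mib(meminfo_text) - reserve_mib
    if test_mib <= 0:
        raise ValueError("not enough free memory to test")
    return ["memtester", f"{test_mib}M", str(loops)]


if __name__ == "__main__":
    # Hypothetical meminfo excerpt, not taken from the board in this thread.
    sample = "MemTotal:        3870000 kB\nMemAvailable:    3584000 kB\n"
    print(" ".join(memtester_command(sample)))
```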

I don't understand the relation either, especially as I had just updated another ODROID-C4 to 9.8.0 in #7261 (comment) from 9.7.1 and had no issue.

I also doubt it's a dpkg bug, as I also tried with the old base image. Unless there were no changes to dpkg in a year and a half, it's either not that or an undiscovered year-and-a-half-old regression.

@MichaIng
Owner

Jep so far I also don't see the relation, but didn't want to leave it unmentioned, since it is the very same two packages failing to unpack with the exact same errors.

@doqfgc
Author

doqfgc commented Oct 29, 2024

memtester passed the memory, so I'm probably good to dismiss that.

That leaves SD or packager bug.

In the interim I updated a third ODROID-C4 to 9.8.0 and as a second witness, that board also updated fine and without issue.

@MichaIng
Owner

Btw, do new images boot and update fine on those other C4s that updated fine, in case you have an option to test that?

And since smaller packages seem to upgrade fine, does reinstalling dpkg help and in case even solve the issue?

apt install --reinstall dpkg

I checked all related library packages, and none was recently upgraded, apart from libc6 in August. But I cannot imagine that libc6 itself can somehow cause an error like this. Very weird, with the other report, and given the identical errors I do not really believe it is a coincidence, but maybe a faulty package version combination or so.

@doqfgc
Author

doqfgc commented Oct 31, 2024

I haven't tested fresh images on my working C4s (they're in prod and I can't bring them down at this time).

Reinstalling dpkg does not help.

I have even tried a different SD card brand (Kingston, same brand as my prod C4s) to no avail.

@MichaIng
Owner

So weird. I'll keep the current C4 (and NanoPi NEO Plus2) images around for testing, as one of us has them as well and can test next week, but will otherwise move new images in place now. Of course they will work, as they have the latest kernel and firmware already, but it would be interesting to see whether they run into the same error when the next kernel upgrade is ready, or when reinstalling either of the two packages.

@doqfgc
Author

doqfgc commented Oct 31, 2024

Okay, while I had a break in peak I pulled down the noncritical working C4 to test. Same SD, same environment, same base image.

It worked without issue. No errors. It just worked.

It's gotta be faulty board at this point. I'll move forward with an RMA with my supplier.

@MichaIng
Owner

MichaIng commented Nov 2, 2024

Does not hurt to try. I am still checking with the other user with same error on NanoPi NEO Plus2. Like maybe there is a rare issue with the way our packages are packaged on the GitHub Actions Ubuntu runner or so: #7257 (comment)

But unlike you, he has not faced kernel errors so far. So there is still some chance that it is an extreme coincidence and you face the same errors for different reasons.

@doqfgc
Author

doqfgc commented Nov 15, 2024

After a bit of back and forth with my supplier, it turns out there may be a shadow second revision of the ODROID-C4 and that would be the unit suffering from the unpacking bug.

I received what they described as an "older revision" of the board and it works fine without issue.

Thus, my personal issue is solved, but this leaves questions unanswered, such as why a possible revision to a board would cause this sort of catastrophic failure.

More investigation may be required.

@MichaIng
Owner

I am still wondering about the concurrent identical case with that NanoPi NEO Plus2. I mean, it may really all be coincidence, but I am not 100% convinced. However, I am glad that your supplier sent you another revision which does not have the issue anymore.

I'll close this issue then, focusing on the other case.
