rk3328: suspend to ram causes immediate panic on ddr4 #11

Open
pgwipeout opened this issue Jul 4, 2024 · 9 comments

Comments

@pgwipeout

When suspending to RAM on the rk3328 with DDR4 RAM (roc-cc-rk3328), the board immediately reboots with no error produced. This issue does not occur on the rk3328 with DDR3 (rock64).

root@firefly:~# sync
root@firefly:~# systemctl suspend
root@firefly:~#
U-Boot TPL 2024.07-00614-g0f073e022ddc (Jul 04 2024 - 12:25:59)
DDR4, 333MHz
BW=32 Col=10 Bk=4 BG=2 CS0 Row=16 CS1 Row=16 CS=2 Die BW=16 Size=4096MB
Trying to boot from BOOTROM
Returning to boot ROM...

U-Boot SPL 2024.07-00614-g0f073e022ddc (Jul 04 2024 - 12:25:59 +0000)
Trying to boot from MMC1
## Checking hash(es) for config config-1 ... OK
## Checking hash(es) for Image atf-1 ... sha256+ OK
## Checking hash(es) for Image u-boot ... sha256+ OK
## Checking hash(es) for Image fdt-1 ... sha256+ OK
## Checking hash(es) for Image atf-2 ... sha256+ OK
NOTICE:  BL31: v2.11.0(release):v2.10.0-1218-g0dc0fda71
NOTICE:  BL31: Built : 12:11:30, Jul  4 2024
NOTICE:  BL31:Rockchip release version: v1.2


U-Boot 2024.07-00614-g0f073e022ddc (Jul 04 2024 - 12:25:59 +0000)

Model: Firefly roc-rk3328-cc
DRAM:  4 GiB
PMIC:  RK805 (on=0x40, off=0x00)
Core:  241 devices, 29 uclasses, devicetree: separate
MMC:   mmc@ff500000: 1, mmc@ff520000: 0
Loading Environment from MMC... Reading from MMC(0)... *** Warning - bad CRC, using default environment
@odeprez
Contributor

odeprez commented Aug 26, 2024

Cc @jwerner-chromium
Pinged other RK platform maintainers by email.

@jwerner-chromium
Contributor

Sorry, I'm really just on this maintainer list for rk3399 and don't know anything about the other SoCs. I should maybe remove myself now that most development happens elsewhere.

@odeprez
Contributor

odeprez commented Nov 25, 2024

Hi @pgwipeout, just curious whether this is still an issue on your end. As there was no clear answer from the community, is there any chance you could investigate and push a change to resolve it?

@pgwipeout
Author

Unfortunately it is still very much an issue, and one that can only be resolved by either Rockchip or a third party with access to their internal restricted documentation.

I suspect it's the same problem that rk3399 experienced with the ddr4 controller, but the memory layout is different between rk3328 and rk3399 so any possible fix cannot be ported directly over.

Is Rockchip aware of this method of reporting bugs yet?

@odeprez
Contributor

odeprez commented Nov 25, 2024

Hi, I asked the maintainers (https://trustedfirmware-a.readthedocs.io/en/latest/about/maintainers.html#rockchip-platform-port) on July 9th 2024 and pinged again on Aug 26th, but got no answer. I can ping again, but I'm unlikely to do better than this!

@pgwipeout
Author

Please add [email protected], [email protected], [email protected], and [email protected]. The three Rockchip addresses belong to current TF-A contributors, and Jimmy Brisson authored the rk3399 fix. Also, please add me, [email protected].

@pgwipeout
Author

Kever Yang [email protected] Wed, Nov 27, 2024 at 9:22 PM
To: Olivier Deprez [email protected], Tony Xie [email protected], Heiko Stuebner [email protected]
Cc: "[email protected]" [email protected], "[email protected]" [email protected], "[email protected]" [email protected]

Thanks for your report.

From the output log, we can see that TF-A works fine through the init flow, but then a reset/reboot somehow happens with no further information.

The code should work in the first place, so this issue may not need any "internal restricted documentation".

Since you have both the kernel source and the TF-A source, could you help add some more debug info so that we can narrow down the issue and our engineers can really help?

  • Could you add logging in TF-A from the PSCI suspend entry through to the rk3328 CPU power down? That way we should be able to pinpoint which step triggers the system reset/reboot.

  • Try an older version of TF-A, e.g. bisect the commits in the plat/rockchip/rk3328 directory; we may be able to find out which commit causes the failure.

As engineers, we know that to debug an issue we first have to reproduce it. Rockchip engineers mostly work on the vendor U-Boot/kernel/TF-A/system, and rk3328 is a platform from a long time ago, so it is not easy to set up the same debug environment as you have. It would be a great help if you could narrow down the issue first.

Thanks,
- Kever

@pgwipeout
Author

Peter Geis [email protected] Sun, Dec 1, 2024 at 5:37 PM
To: Kever Yang [email protected]
Cc: Olivier Deprez [email protected], Tony Xie [email protected], Heiko Stuebner [email protected], "[email protected]" [email protected], "[email protected]" [email protected]

Good Evening Kever,

I attempted to dig into this when I originally submitted my bug report and built a debug version of TF-A. From the stack trace at the end of this message, x30 points to the following:
(gdb) list *0x0000000000055220
0x55220 is in rockchip_soc_sys_pwr_dm_suspend (plat/rockchip/rk3328/drivers/pmu/pmu.c:253).
248
249     static void clks_gating_suspend(uint32_t *ungt_msk)
250     {
251             int i;
252
253             for (i = 0; i < CRU_CLKGATE_NUMS; i++) {
254                     ddr_data.clk_ungt_save[i] =
255                             mmio_read_32(CRU_BASE + CRU_CLKGATE_CON(i));
256                     mmio_write_32(CRU_BASE + CRU_CLKGATE_CON(i),
257                                   ((~ungt_msk[i]) << 16) | 0xffff);
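For readers unfamiliar with the write on listing lines 256-257: it appears to rely on the write-enable layout that Rockchip CRU registers commonly use, where bits 31:16 of the written word select which of bits 15:0 are updated. The following is a minimal sketch of that behaviour, assuming this layout applies to CRU_CLKGATE_CON on rk3328 and that a set bit gates the clock (both should be checked against the TRM). The helpers `hiword_write`, `suspend_gate_word` and `gate_bit_after_suspend` are hypothetical names for illustration and are not part of TF-A.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a "hiword mask" register write (assumed layout): bits 31:16 of
 * the written word enable updates to the matching bits 15:0, and bits that
 * are not enabled keep their old value.
 */
static uint32_t hiword_write(uint32_t old, uint32_t written)
{
    uint32_t enable = written >> 16;
    uint32_t value = written & 0xffffu;

    return (old & ~enable & 0xffffu) | (value & enable);
}

/* The word the quoted loop writes for one gate register. */
static uint32_t suspend_gate_word(uint32_t ungt_msk)
{
    return ((~ungt_msk) << 16) | 0xffffu;
}

/* Gate state (1 = gated, by assumption) of one clock bit after the write. */
static unsigned gate_bit_after_suspend(uint32_t old, uint32_t ungt_msk,
                                       unsigned bit)
{
    assert(bit < 16u); /* each register carries 16 gate bits */
    return (hiword_write(old, suspend_gate_word(ungt_msk)) >> bit) & 1u;
}
```

Under these assumptions, an `ungt_msk` of 0x0003 produces the word 0xfffcffff: bits 0 and 1 are left untouched and every other gate bit in that register is set.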

Going back to the last version that had a meaningful change to rk3328 pmu.c (v1.4 prior to commit 6bf0e07) results in a failure to boot.

U-Boot TPL 2024.07-00614-g0f073e022ddc (Dec 01 2024 - 22:26:28)
DDR4, 333MHz
BW=32 Col=10 Bk=4 BG=2 CS0 Row=16 CS1 Row=16 CS=2 Die BW=16 Size=4096MB
Trying to boot from BOOTROM
Returning to boot ROM...

U-Boot SPL 2024.07-00614-g0f073e022ddc (Dec 01 2024 - 22:26:28 +0000)
Trying to boot from MMC1
## Checking hash(es) for config config-1 ... OK

Very Respectfully,
Peter Geis

root@firefly:~# systemctl suspend
root@firefly:~#
root@firefly:~# [1030799.293551] PM: suspend entry (deep)
[1030799.367413] Filesystems sync: 0.073 seconds
[1030799.370991] Freezing user space processes
[1030804.445411] Freezing user space processes completed (elapsed 5.073 seconds)
[1030804.446126] OOM killer disabled.
[1030804.446442] Freezing remaining freezable tasks
[1030804.453274] Freezing remaining freezable tasks completed (elapsed 0.006 seconds)
[1030804.454021] printk: Suspending console(s) (use no_console_suspend to debug)
Unhandled Exception in EL3.
x30            = 0x0000000000055220
x0             = 0x000000000005b8b0
x1             = 0x0000000000000002
x2             = 0x0000000000000014
x3             = 0x00000000ff440000
x4             = 0x0000000000000000
x5             = 0x00000000ff440198
x6             = 0x00000000ff440148
x7             = 0x00000000ff440160
x8             = 0x00000000ff440150
x9             = 0x00000000ff440104
x10            = 0x00000000ff440100
x11            = 0x0000000000000002
x12            = 0x0000000000000000
x13            = 0xffff724481aadc80
x14            = 0x0000000000056220
x15            = 0x00000000000557b4
x16            = 0x00000000ff09020c
x17            = 0x0000000000000040
x18            = 0x00000000000569f0
x19            = 0x0000000000000002
x20            = 0x000000000005b8b0
x21            = 0x000000000005d000
x22            = 0x0000000000000000
x23            = 0x0000000000000001
x24            = 0x0000000000000002
x25            = 0x0000000000000000
x26            = 0x000000000005b8b8
x27            = 0x0000000000059b10
x28            = 0xffff724481aadc80
x29            = 0x000000000005b7f0
scr_el3        = 0x0000000000000739
sctlr_el3      = 0x0000000000cd383b
cptr_el3       = 0x0000000000000000
tcr_el3        = 0x0000000080803520
daif           = 0x00000000000003c0
mair_el3       = 0x00000000004404ff
spsr_el3       = 0x00000000600002cc
elr_el3        = 0x00000000ff09020c
ttbr0_el3      = 0x000000000005d340
esr_el3        = 0x000000008600000f
far_el3        = 0x00000000ff09020c
spsr_el1       = 0x0000000080000005
elr_el1        = 0xffffac41cc1b7764
spsr_abt       = 0x0000000000000000
spsr_und       = 0x0000000000000000
spsr_irq       = 0x0000000000000000
spsr_fiq       = 0x0000000000000000
sctlr_el1      = 0x0000000034d4d91d
actlr_el1      = 0x0000000000000000
cpacr_el1      = 0x0000000000300000
csselr_el1     = 0x0000000000000000
sp_el1         = 0xffff800087e3ba10
esr_el1        = 0x0000000000000000
ttbr0_el1      = 0x00000000041a3000
ttbr1_el1      = 0x00000000041a4000
mair_el1       = 0x000000040044ffff
amair_el1      = 0x0000000000000000
tcr_el1        = 0x00000072b5d03590
tpidr_el1      = 0xffffc603b06b4000
tpidr_el0      = 0x0000ffff8729fe90
tpidrro_el0    = 0x0000000000000000
par_el1        = 0x0000000000000000
mpidr_el1      = 0x0000000080000000
afsr0_el1      = 0x0000000000000000
afsr1_el1      = 0x0000000000000000
contextidr_el1 = 0x0000000000000000
vbar_el1       = 0xffffac41cbe10800
cntp_ctl_el0   = 0x0000000000000000
cntp_cval_el0  = 0x00001680220ccd49
cntv_ctl_el0   = 0x0000000000000000
cntv_cval_el0  = 0x049180e803100805
cntkctl_el1    = 0x00000000000000a6
sp_el0         = 0x000000000005b7f0
isr_el1        = 0x0000000000000000
dacr32_el2     = 0x0000000000000000
ifsr32_el2     = 0x0000000000000000
cpuectlr_el1   = 0x0000000000000000
cpumerrsr_el1  = 0x0000000000000000
l2merrsr_el1   = 0x0000000000000000
cpuactlr_el1   = 0x00001000090ca000
gicc_hppir     = 0x00000000000003ff
gicc_ahppir    = 0x00000000000003ff
gicc_ctlr      = 0x00000000000005e8
gicd_ispendr regs (Offsets 0x200 - 0x278)
0000000000000200:               0x0000000000000000
0000000000000208:               0x0000000000000000
0000000000000210:               0x0000000000000000
0000000000000218:               0x0000000000000000
0000000000000220:               0x0000000000000000
0000000000000228:               0x0000000000000000
0000000000000230:               0x0000000000000000
0000000000000238:               0x0000000000000000
0000000000000240:               0x0000000000000000
0000000000000248:               0x0000000000000000
0000000000000250:               0x0000000000000000
0000000000000258:               0x0000000000000000
0000000000000260:               0x0000000000000000
0000000000000268:               0x0000000000000000
0000000000000270:               0x0000000000000000
0000000000000278:               0x0000000000000000

U-Boot TPL 2024.07-00614-g0f073e022ddc (Jul 08 2024 - 14:19:30)
DDR4, 333MHz
BW=32 Col=10 Bk=4 BG=2 CS0 Row=16 CS1 Row=16 CS=2 Die BW=16 Size=4096MB
Trying to boot from BOOTROM
Returning to boot ROM...

U-Boot SPL 2024.07-00614-g0f073e022ddc (Jul 08 2024 - 14:19:30 +0000)
Trying to boot from MMC1
## Checking hash(es) for config config-1 ... OK
## Checking hash(es) for Image atf-1 ... sha256+ OK
## Checking hash(es) for Image u-boot ... sha256+ OK
## Checking hash(es) for Image fdt-1 ... sha256+ OK
## Checking hash(es) for Image atf-2 ... sha256+ OK
NOTICE:  BL31: v2.11.0(debug):v2.10.0-1218-g0dc0fda71-dirty
NOTICE:  BL31: Built : 14:19:23, Jul  8 2024
NOTICE:  BL31:Rockchip release version: v1.2
NOTICE:  BL31: Debug level: 40
INFO:    ARM GICv2 driver initialized
INFO:    plat_rockchip_pmu_init: pd status 0xe
INFO:    BL31: Initializing runtime services
INFO:    BL31: cortex_a53: CPU workaround for erratum 855873 was applied
INFO:    BL31: cortex_a53: CPU workaround for erratum 1530924 was applied
INFO:    BL31: Preparing for EL3 exit to normal world
INFO:    Entry point address = 0x200000
INFO:    SPSR = 0x3c9


U-Boot 2024.07-00614-g0f073e022ddc (Jul 08 2024 - 14:19:30 +0000)

Model: Firefly roc-rk3328-cc
DRAM:  4 GiB
PMIC:  RK805 (on=0x40, off=0x00)
Core:  241 devices, 29 uclasses, devicetree: separate
MMC:   mmc@ff500000: 1, mmc@ff520000: 0
Loading Environment from MMC... Reading from MMC(0)... *** Warning - bad CRC, using default environment

@pgwipeout
Author

Peter Geis [email protected] Tue, Dec 3, 2024 at 6:21 PM
To: Kever Yang [email protected]
Cc: Olivier Deprez [email protected], Tony Xie [email protected], Heiko Stuebner [email protected], "[email protected]" [email protected], "[email protected]" [email protected]
Good Evening,

I recommend adding console_set_scope(&console, CONSOLE_FLAG_BOOT | CONSOLE_FLAG_RUNTIME | CONSOLE_FLAG_CRASH); to plat/rockchip/common/bl31_plat_setup.c, so the debug console works correctly. Without this only the crash context is functional.
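As background for the recommendation above: TF-A consoles carry scope flags, and a console only prints in a given state (boot, runtime, crash) when the matching flag is set. The sketch below is a self-contained model of that idea, not TF-A code. Every `SKETCH_`/`sketch_` name is a local stand-in, the flag values are chosen for illustration, and the starting scope of boot plus crash is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for this sketch; the real flags live in TF-A's console header. */
#define SKETCH_FLAG_BOOT    (1u << 0)
#define SKETCH_FLAG_RUNTIME (1u << 1)
#define SKETCH_FLAG_CRASH   (1u << 2)
#define SKETCH_SCOPE_MASK   0xffu

struct sketch_console {
    unsigned int flags;
};

/* Replace the scope bits of a console, leaving any other flag bits alone. */
static void sketch_set_scope(struct sketch_console *c, unsigned int scope)
{
    assert((scope & ~SKETCH_SCOPE_MASK) == 0u);
    c->flags = (c->flags & ~SKETCH_SCOPE_MASK) | scope;
}

/* A console prints in a given state only if its scope includes that state. */
static bool sketch_console_active(const struct sketch_console *c,
                                  unsigned int state)
{
    return (c->flags & state) != 0u;
}
```

In this model, a console scoped to boot and crash stays silent once the firmware is in the runtime state, which is where the PSCI suspend path executes. Adding the runtime flag is what lets messages from that path reach the UART.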

It would seem the issue is occurring after the transition to the sram_suspend function. I haven't found a safe way to print debug messages from the sram functions yet to find out where it's dying, but it seems the stack trace is unable to track what happens there (it picks a random function as the crash point). I would appreciate some insight on how to debug this trace further.
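One possible starting point is the syndrome register in the crash dump earlier in this thread (esr_el3 = 0x8600000f, with elr_el3 and far_el3 both 0xff09020c). The sketch below decodes the exception class and fault status code using the Arm architecture's ESR_ELx field layout. The helper names are hypothetical and only a handful of encodings are named.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ESR_ELx layout (Arm architecture): EC = bits 31:26, IL = bit 25,
 * ISS = bits 24:0. For aborts, the fault status code is ISS bits 5:0.
 */
static unsigned esr_ec(uint64_t esr)
{
    return (unsigned)((esr >> 26) & 0x3fu);
}

static unsigned esr_fsc(uint64_t esr)
{
    return (unsigned)(esr & 0x3fu);
}

/* Only the exception classes relevant to this sketch are named. */
static const char *ec_name(unsigned ec)
{
    assert(ec < 64u);
    switch (ec) {
    case 0x20: return "instruction abort from a lower EL";
    case 0x21: return "instruction abort at the current EL";
    case 0x24: return "data abort from a lower EL";
    case 0x25: return "data abort at the current EL";
    default:   return "other (see the Arm ARM)";
    }
}

/* Fault type for the common 0b00xxLL encodings; LL is the lookup level. */
static const char *fsc_name(unsigned fsc)
{
    assert(fsc < 64u);
    switch (fsc >> 2) {
    case 0x1: return "translation fault";
    case 0x2: return "access flag fault";
    case 0x3: return "permission fault";
    default:  return "other (see the Arm ARM)";
    }
}

static unsigned fsc_level(unsigned fsc)
{
    return fsc & 0x3u;
}
```

For 0x8600000f this gives EC 0x21 and FSC 0x0f, i.e. an instruction abort taken at the current exception level with a level 3 permission fault. Because elr_el3 and far_el3 hold the same address, one reading is that EL3 faulted while fetching an instruction from 0xff09020c, rather than inside the function x30 points at. Whether that address belongs to the SRAM region used by the suspend code, and how that region is mapped, would need to be checked against the platform memory map.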

I'm pretty certain rk3328 sleep has never worked on DDR4. This issue started years ago when opportunistic sleep was enabled for arm64 boards (around 2019, when I retired the board because I didn't have the time to debug it).

In an attempt to rule out configuration or hardware, I switched to using the rkbin bl31 image. It also immediately reboots, but it throws an interrupt before doing so, suggesting it is reaching suspend and immediately waking up, and the wakeup is what is failing. The interrupt (interrupt 97) makes little sense, as that points to sdmmc_dectn_in_flt according to the TRM. I found GRF_SIG_DETECT_CON, which is fully disabled, but there was a pending sdmmc_detectn_neg_irq in GRF_SIG_DETECT_STATUS. I cleared the interrupt and verified it remained cleared, but interrupt 97 still fired again when attempting to sleep. I'm not certain if I'm chasing the correct register for this interrupt. See below for the rkbin sleep attempt.
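A possible cross-check for the interrupt number is to read the GIC distributor's pending bits directly instead of going through the GRF. The sketch below computes where a given interrupt ID sits in the GICD_ISPENDRn registers (one bit per interrupt, 32 per register, starting at distributor offset 0x200). It is not known here whether the "IRQ: 97" printed by the rkbin image is a raw GIC interrupt ID or an SPI number, so both readings are treated as assumptions. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GICD_ISPENDR_BASE 0x200u /* offset of GICD_ISPENDR0 in the distributor */

/* Byte offset of the 32-bit GICD_ISPENDRn register holding this interrupt ID. */
static uint32_t ispendr_offset(uint32_t intid)
{
    assert(intid < 1020u); /* IDs 1020-1023 are reserved for special purposes */
    return GICD_ISPENDR_BASE + 4u * (intid / 32u);
}

/* Bit position of this interrupt ID within that register. */
static uint32_t ispendr_bit(uint32_t intid)
{
    assert(intid < 1020u);
    return intid % 32u;
}

/* Shared peripheral interrupts start at interrupt ID 32. */
static uint32_t spi_to_intid(uint32_t spi)
{
    return spi + 32u;
}
```

If 97 is a raw interrupt ID, the pending bit is bit 1 of the register at offset 0x20c. If it is an SPI number, the interrupt ID is 129 and the bit is bit 1 of the register at offset 0x210.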

Very Respectfully,
Peter Geis

root@firefly:~# [ 1762.024753] rk_gmac-dwmac ff540000.ethernet eth0: Link is Down
[ 1762.148126] PM: suspend entry (deep)
[ 1762.525416] Filesystems sync: 0.376 seconds
[ 1762.529212] Freezing user space processes
[ 1762.532011] Freezing user space processes completed (elapsed 0.002 seconds)
[ 1762.532776] OOM killer disabled.
[ 1762.533084] Freezing remaining freezable tasks
[ 1762.535075] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 1762.691810] Disabling non-boot CPUs ...
[ 1762.694835] psci: CPU3 killed (polled 0 ms)
[ 1762.700497] psci: CPU2 killed (polled 0 ms)
[ 1762.706958] psci: CPU1 killed (polled 0 ms)

GPIO0_INTEN: 0xffffffff
GPIO1_INTEN: 0xffffffff
GPIO2_INTEN: 0xffffffff
GPIO3_INTEN: 0xffffffff
 IRQ: 97
012345
U-Boot TPL 2024.07-00614-g0f073e022ddc (Dec 02 2024 - 23:03:38)
DDR4, 333MHz
BW=32 Col=10 Bk=4 BG=2 CS0 Row=16 CS1 Row=16 CS=2 Die BW=16 Size=4096MB
Trying to boot from BOOTROM
Returning to boot ROM...
