Skip to content

Latest commit

 

History

History
151 lines (108 loc) · 4.59 KB

2021-10-24.md

File metadata and controls

151 lines (108 loc) · 4.59 KB

< 2021-10-24 >

163,275 events, 107,315 push events, 139,801 commit messages, 7,731,572 characters

Sunday 2021-10-24 00:43:37 by Peter Zijlstra

sched/core: Fix ttwu() race

Paul reported rcutorture occasionally hitting a NULL deref:

sched_ttwu_pending() ttwu_do_wakeup() check_preempt_curr() := check_preempt_wakeup() find_matching_se() is_same_group() if (se->cfs_rq == pse->cfs_rq) <-- BOOM

Debugging showed that this only appears to happen when we take the new code-path from commit:

2ebb17717550 ("sched/core: Offload wakee task activation if it the wakee is descheduling")

and only when @cpu == smp_processor_id(). Something which should not be possible, because p->on_cpu can only be true for remote tasks. Similarly, without the new code-path from commit:

c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")

this would've unconditionally hit:

smp_cond_load_acquire(&p->on_cpu, !VAL);

and if: 'cpu == smp_processor_id() && p->on_cpu' is possible, this would result in an instant live-lock (with IRQs disabled), something that hasn't been reported.

The NULL deref can be explained however if the task_cpu(p) load at the beginning of try_to_wake_up() returns an old value, and this old value happens to be smp_processor_id(). Further assume that the p->on_cpu load accurately returns 1, it really is still running, just not here.

Then, when we enqueue the task locally, we can crash in exactly the observed manner because p->se.cfs_rq != rq->cfs_rq, because p's cfs_rq is from the wrong CPU, therefore we'll iterate into the non-existant parents and NULL deref.

The closest semi-plausible scenario I've managed to contrive is somewhat elaborate (then again, actual reproduction takes many CPU hours of rcutorture, so it can't be anything obvious):

				X->cpu = 1
				rq(1)->curr = X

CPU0				CPU1				CPU2

				// switch away from X
				LOCK rq(1)->lock
				smp_mb__after_spinlock
				dequeue_task(X)
				  X->on_rq = 9
				switch_to(Z)
				  X->on_cpu = 0
				UNLOCK rq(1)->lock

								// migrate X to cpu 0
								LOCK rq(1)->lock
								dequeue_task(X)
								set_task_cpu(X, 0)
								  X->cpu = 0
								UNLOCK rq(1)->lock

								LOCK rq(0)->lock
								enqueue_task(X)
								  X->on_rq = 1
								UNLOCK rq(0)->lock

// switch to X
LOCK rq(0)->lock
smp_mb__after_spinlock
switch_to(X)
  X->on_cpu = 1
UNLOCK rq(0)->lock

// X goes sleep
X->state = TASK_UNINTERRUPTIBLE
smp_mb();			// wake X
				ttwu()
				  LOCK X->pi_lock
				  smp_mb__after_spinlock

				  if (p->state)

				  cpu = X->cpu; // =? 1

				  smp_rmb()

// X calls schedule()
LOCK rq(0)->lock
smp_mb__after_spinlock
dequeue_task(X)
  X->on_rq = 0

				  if (p->on_rq)

				  smp_rmb();

				  if (p->on_cpu && ttwu_queue_wakelist(..)) [*]

				  smp_cond_load_acquire(&p->on_cpu, !VAL)

				  cpu = select_task_rq(X, X->wake_cpu, ...)
				  if (X->cpu != cpu)
switch_to(Y)
  X->on_cpu = 0
UNLOCK rq(0)->lock

However I'm having trouble convincing myself that's actually possible on x86_64 -- after all, every LOCK implies an smp_mb() there, so if ttwu observes ->state != RUNNING, it must also observe ->cpu != 1.

(Most of the previous ttwu() races were found on very large PowerPC)

Nevertheless, this fully explains the observed failure case.

Fix it by ordering the task_cpu(p) load after the p->on_cpu load, which is easy since nothing actually uses @cpu before this.

Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu") Reported-by: Paul E. McKenney [email protected] Tested-by: Paul E. McKenney [email protected] Signed-off-by: Peter Zijlstra (Intel) [email protected] Signed-off-by: Ingo Molnar [email protected] Link: https://lkml.kernel.org/r/[email protected] Change-Id: I40e0e01946eadb1701a4d06758e434591e5a5c92


Sunday 2021-10-24 00:51:24 by Calyxman

Makes MMIs fabricatable with a bit more research

FINALLY HOLY SHIT. Makes the MMIs a bit less fucking STUPID HOLY SHIT. You can now fabricate them with a little bit of fabricating shit from protolathe and destructive analyzing.


< 2021-10-24 >