<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>A docker hang blocks the kubelet initialization flow</title>
<url>/docker-hang-%E6%AD%BB%E9%98%BB%E5%A1%9E-kubelet-%E5%88%9D%E5%A7%8B%E5%8C%96%E6%B5%81%E7%A8%8B/</url>
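<content><![CDATA[<h3 id="背景"><a href="#背景" class="headerlink" title="背景"></a>Background</h3><p>We recently rolled out a new kubelet build that fixes a problem where the kubelet deleted Sidecar-type Pods so slowly that the platform timed out when deleting clusters. During the canary rollout to the isolated redis cluster, we found that after upgrading the kubelet and restarting the service, a small number of hosts turned NotReady, and they stayed NotReady even after the kubelet was rolled back to the previous version. The host status reported 'container runtime is down', indicating a problem with the container runtime.</p>
<p>Our container runtime is docker, so we checked docker's state. The results were:</p>
<ul>
<li>docker ps, which lists all containers, ran normally</li>
<li>docker inspect, which shows a container's detailed state, blocked on one particular container</li>
</ul>
<p>This is typical docker hang behavior. We have been upgrading docker: existing hosts run 1.13.1 and are gradually moving to 18.06.3, while all new hosts already run 18.06.3. The hang is more severe on 1.13.1, where docker ps itself already hangs. docker 18.06.3 made a small optimization and removed the container-level locking from docker ps. Many other docker commands still acquire the container lock before they run, so when one container goes bad the docker service as a whole stays responsive; only that container is affected, and no operation on it can complete.</p>
<p>As for why docker ps and docker inspect serve as the indicators of docker's health: the kubelet relies on exactly these two docker operations to obtain container state.</p>
<p>So there are now two questions:</p>
<ul>
<li>What is the root cause of the docker hang?</li>
<li>While docker is hung, why does restarting the kubelet turn the host NotReady?</li>
</ul>
<p>The investigation of the docker hang itself is covered in <a href="https://plpan.github.io/docker-hang-%E6%AD%BB%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">A docker hang troubleshooting journey</a>. Here we analyze why, when a container is in an abnormal state, restarting the kubelet changes the host from Ready to NotReady.</p>
<h3 id="宿主状态生成机制"><a href="#宿主状态生成机制" class="headerlink" title="宿主状态生成机制"></a>How the host status is generated</h3><p>Before troubleshooting, we first need to understand how the host status is generated.</p>
<p>Every piece of host status is a field of node.Status, so we can go straight to the kubelet code that sets node.Status.</p>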
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// Run starts the kubelet reacting to config updates</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(kl *Kubelet)</span> <span class="title">Run</span><span class="params">(updates <-<span class="keyword">chan</span> kubetypes.PodUpdate)</span></span> {</span><br><span class="line"> ......</span><br><span class="line"> <span class="keyword">if</span> kl.kubeClient != <span class="literal">nil</span> {</span><br><span class="line"> <span class="comment">// Start syncing node status immediately, this may set up things the runtime needs to run.</span></span><br><span class="line"> <span class="keyword">go</span> wait.Until(kl.syncNodeStatus, kl.nodeStatusUpdateFrequency, wait.NeverStop)</span><br><span class="line"> <span class="keyword">go</span> kl.fastStatusUpdateOnce()</span><br><span class="line"> }</span><br><span class="line"> ......</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
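<p>At startup the kubelet creates a goroutine that periodically syncs this host's status to the apiserver; the default sync period is 10s.</p>
<p>Following the call chain, we can see that the kubelet sets several Conditions on the host to describe its current state, such as whether memory is under pressure, whether the number of threads (PIDs) is under pressure, and whether the host is ready.</p>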
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// defaultNodeStatusFuncs is a factory that generates the default set of</span></span><br><span class="line"><span class="comment">// setNodeStatus funcs</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(kl *Kubelet)</span> <span class="title">defaultNodeStatusFuncs</span><span class="params">()</span> []<span class="title">func</span><span class="params">(*v1.Node)</span> <span class="title">error</span></span> {</span><br><span class="line"> ......</span><br><span class="line"> setters = <span class="built_in">append</span>(setters,</span><br><span class="line"> nodestatus.OutOfDiskCondition(kl.clock.Now, kl.recordNodeStatusEvent),</span><br><span class="line"> nodestatus.MemoryPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderMemoryPressure, kl.recordNodeStatusEvent),</span><br><span class="line"> nodestatus.DiskPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderDiskPressure, kl.recordNodeStatusEvent),</span><br><span class="line"> nodestatus.PIDPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderPIDPressure, kl.recordNodeStatusEvent),</span><br><span class="line"> nodestatus.ReadyCondition(kl.clock.Now, kl.runtimeState.runtimeErrors, kl.runtimeState.networkErrors, validateHostFunc, kl.containerManager.Status, kl.recordNodeStatusEvent),</span><br><span class="line"> nodestatus.VolumesInUse(kl.volumeManager.ReconcilerStatesHasBeenSynced, kl.volumeManager.GetVolumesInUse),</span><br><span class="line"> <span class="comment">// TODO(mtaufen): I decided not to move this setter for now, since all it does is send an event</span></span><br><span class="line"> <span class="comment">// and record state back to the Kubelet runtime object. 
In the future, I'd like to isolate</span></span><br><span class="line"> <span class="comment">// these side-effects by decoupling the decisions to send events and partial status recording</span></span><br><span class="line"> <span class="comment">// from the Node setters.</span></span><br><span class="line"> kl.recordNodeSchedulableEvent,</span><br><span class="line"> )</span><br><span class="line"> <span class="keyword">return</span> setters</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
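<p>The Ready Condition indicates whether the host is ready. The Status that kubectl shows for a host is the state of this Ready Condition. The common states and their meanings are:</p>
<ul>
<li>Ready: the host is healthy and can respond to Pod events normally</li>
<li>NotReady: the kubelet on the host is still running but can no longer handle Pod events. In the vast majority of cases NotReady is caused by a container runtime failure</li>
<li>Unknown: the kubelet on the host has stopped running</li>
</ul>
<p>The kubelet decides the Ready Condition as follows:</p>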
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// ReadyCondition returns a Setter that updates the v1.NodeReady condition on the node.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ReadyCondition</span><span class="params">(</span></span></span><br><span class="line"><span class="function"><span class="params"> nowFunc <span class="keyword">func</span>()</span> <span class="title">time</span>.<span class="title">Time</span>, // <span class="title">typically</span> <span class="title">Kubelet</span>.<span class="title">clock</span>.<span class="title">Now</span></span></span><br><span class="line"> runtimeErrorsFunc <span class="function"><span class="keyword">func</span><span class="params">()</span> []<span class="title">string</span>, // <span class="title">typically</span> <span class="title">Kubelet</span>.<span class="title">runtimeState</span>.<span class="title">runtimeErrors</span></span></span><br><span class="line"> networkErrorsFunc <span class="function"><span class="keyword">func</span><span class="params">()</span> []<span class="title">string</span>, // <span class="title">typically</span> <span class="title">Kubelet</span>.<span class="title">runtimeState</span>.<span class="title">networkErrors</span></span></span><br><span class="line"> appArmorValidateHostFunc <span class="function"><span class="keyword">func</span><span class="params">()</span> <span class="title">error</span>, // <span class="title">typically</span> <span class="title">Kubelet</span>.<span class="title">appArmorValidator</span>.<span class="title">ValidateHost</span>, <span class="title">might</span> <span class="title">be</span> <span class="title">nil</span> <span class="title">depending</span> <span class="title">on</span> <span class="title">whether</span> <span class="title">there</span> <span class="title">was</span> <span class="title">an</span> 
<span class="title">appArmorValidator</span></span></span><br><span class="line"> cmStatusFunc <span class="function"><span class="keyword">func</span><span class="params">()</span> <span class="title">cm</span>.<span class="title">Status</span>, // <span class="title">typically</span> <span class="title">Kubelet</span>.<span class="title">containerManager</span>.<span class="title">Status</span></span></span><br><span class="line"> recordEventFunc <span class="function"><span class="keyword">func</span><span class="params">(eventType, event <span class="keyword">string</span>)</span>, // <span class="title">typically</span> <span class="title">Kubelet</span>.<span class="title">recordNodeStatusEvent</span></span></span><br><span class="line">) Setter {</span><br><span class="line"> <span class="keyword">return</span> <span class="function"><span class="keyword">func</span><span class="params">(node *v1.Node)</span> <span class="title">error</span></span> {</span><br><span class="line"> ......</span><br><span class="line"> rs := <span class="built_in">append</span>(runtimeErrorsFunc(), networkErrorsFunc()...)</span><br><span class="line"> requiredCapacities := []v1.ResourceName{v1.ResourceCPU, v1.ResourceMemory, v1.ResourcePods}</span><br><span class="line"> <span class="keyword">if</span> utilfeature.DefaultFeatureGate.Enabled(features.LocalStorageCapacityIsolation) {</span><br><span class="line"> requiredCapacities = <span class="built_in">append</span>(requiredCapacities, v1.ResourceEphemeralStorage)</span><br><span class="line"> }</span><br><span class="line"> missingCapacities := []<span class="keyword">string</span>{}</span><br><span class="line"> <span class="keyword">for</span> _, resource := <span class="keyword">range</span> requiredCapacities {</span><br><span class="line"> <span class="keyword">if</span> _, found := node.Status.Capacity[resource]; !found {</span><br><span class="line"> missingCapacities = <span 
class="built_in">append</span>(missingCapacities, <span class="keyword">string</span>(resource))</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> <span class="built_in">len</span>(missingCapacities) > <span class="number">0</span> {</span><br><span class="line"> rs = <span class="built_in">append</span>(rs, fmt.Sprintf(<span class="string">"Missing node capacity for resources: %s"</span>, strings.Join(missingCapacities, <span class="string">", "</span>)))</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> <span class="built_in">len</span>(rs) > <span class="number">0</span> {</span><br><span class="line"> newNodeReadyCondition = v1.NodeCondition{</span><br><span class="line"> Type: v1.NodeReady,</span><br><span class="line"> Status: v1.ConditionFalse,</span><br><span class="line"> Reason: <span class="string">"KubeletNotReady"</span>,</span><br><span class="line"> Message: strings.Join(rs, <span class="string">","</span>),</span><br><span class="line"> LastHeartbeatTime: currentTime,</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> ......</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
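<p>The snippet above shows that whether the host is Ready depends on many conditions, including runtime checks, network checks, and basic resource checks.</p>
<h3 id="宿主状态变化定位"><a href="#宿主状态变化定位" class="headerlink" title="宿主状态变化定位"></a>Locating the host status change</h3><p>Next we focus on the runtime check and analyze why the host status changed. The runtime check is defined as follows:</p>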
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *runtimeState)</span> <span class="title">runtimeErrors</span><span class="params">()</span> []<span class="title">string</span></span> {</span><br><span class="line"> s.RLock()</span><br><span class="line"> <span class="keyword">defer</span> s.RUnlock()</span><br><span class="line"> <span class="keyword">var</span> ret []<span class="keyword">string</span></span><br><span class="line"> <span class="keyword">if</span> !s.lastBaseRuntimeSync.Add(s.baseRuntimeSyncThreshold).After(time.Now()) { <span class="comment">// 1</span></span><br><span class="line"> ret = <span class="built_in">append</span>(ret, <span class="string">"container runtime is down"</span>)</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> s.internalError != <span class="literal">nil</span> {</span><br><span class="line"> ret = <span class="built_in">append</span>(ret, s.internalError.Error())</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> _, hc := <span class="keyword">range</span> s.healthChecks { <span class="comment">// 2</span></span><br><span class="line"> <span class="keyword">if</span> ok, err := hc.fn(); !ok {</span><br><span class="line"> ret = <span class="built_in">append</span>(ret, fmt.Sprintf(<span class="string">"%s is not healthy: %v"</span>, hc.name, err))</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> ret</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>The runtime check fails when either of the following two conditions holds:</p>
<ol>
<li>The time elapsed since the last runtime sync (lastBaseRuntimeSync) exceeds the threshold (30s by default)</li>
<li>A runtime health check fails</li>
</ol>
<p>So which of the two conditions caused the NotReady status in our case? The kubelet log shows that the kubelet printed the following line every 5s:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line">......</span><br><span class="line">I0715 <span class="number">10</span>:<span class="number">43</span>:<span class="number">28.049240</span> <span class="number">16315</span> kubelet.<span class="keyword">go</span>:<span class="number">1835</span>] skipping pod synchronization - [container runtime is down]</span><br><span class="line">I0715 <span class="number">10</span>:<span class="number">43</span>:<span class="number">33.049359</span> <span class="number">16315</span> kubelet.<span class="keyword">go</span>:<span class="number">1835</span>] skipping pod synchronization - [container runtime is down]</span><br><span class="line">I0715 <span class="number">10</span>:<span class="number">43</span>:<span class="number">38.049492</span> <span class="number">16315</span> kubelet.<span class="keyword">go</span>:<span class="number">1835</span>] skipping pod synchronization - [container runtime is down]</span><br><span class="line">......</span><br></pre></td></tr></table></figure>
<p>So condition 1 is the culprit behind the host's NotReady status.</p>
<p>We go on to analyze why the kubelet did not set lastBaseRuntimeSync as expected. At startup the kubelet creates a goroutine that sets lastBaseRuntimeSync in a loop:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(kl *Kubelet)</span> <span class="title">Run</span><span class="params">(updates <-<span class="keyword">chan</span> kubetypes.PodUpdate)</span></span> {</span><br><span class="line"> ......</span><br><span class="line"> <span class="keyword">go</span> wait.Until(kl.updateRuntimeUp, <span class="number">5</span>*time.Second, wait.NeverStop)</span><br><span class="line"> ......</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(kl *Kubelet)</span> <span class="title">updateRuntimeUp</span><span class="params">()</span></span> {</span><br><span class="line"> kl.updateRuntimeMux.Lock()</span><br><span class="line"> <span class="keyword">defer</span> kl.updateRuntimeMux.Unlock()</span><br><span class="line"> ......</span><br><span class="line"> kl.oneTimeInitializer.Do(kl.initializeRuntimeDependentModules)</span><br><span class="line"> kl.runtimeState.setRuntimeSync(kl.clock.Now())</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *runtimeState)</span> <span class="title">setRuntimeSync</span><span class="params">(t time.Time)</span></span> {</span><br><span class="line"> s.Lock()</span><br><span class="line"> <span class="keyword">defer</span> s.Unlock()</span><br><span class="line"> s.lastBaseRuntimeSync = t</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
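<p>Normally the kubelet sets lastBaseRuntimeSync to the current time every 5s, but on the abnormal host this timestamp was never updated. In other words, updateRuntimeUp was stuck at some step before lastBaseRuntimeSync is set. We therefore only had to check the function calls inside updateRuntimeUp one by one. The details are omitted here; the final call chain is:</p>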
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line">initializeRuntimeDependentModules -> kl.cadvisor.Start -> cc.Manager.Start -> self.createContainer -> m.createContainerLocked -> container.NewContainerHandler -> factory.CanHandleAndAccept -> self.client.ContainerInspect</span><br></pre></td></tr></table></figure>
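<p>The call chain shows that cadvisor ends up hanging on docker inspect, which blocks a critical kubelet initialization step.</p>
<p>If the container failure happens after the kubelet has finished initializing, it has no effect on the host's Ready status, because oneTimeInitializer is a sync.Once and therefore runs only once, at kubelet startup. In that case the failure has limited impact on the kubelet: it cannot handle any event for that Pod, such as deletion or update, but it can still handle events for other Pods.</p>
<p>This explains why the host was Ready before the kubelet restart and NotReady after it.</p>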
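<p>The effect of a hung one-time initializer can be reproduced in a few lines. This is a sketch under stated assumptions: fakeKubelet, initFn and observeHungInit are hypothetical names, the channel stands in for the hung docker inspect, and only the shape of updateRuntimeUp (once.Do first, timestamp second) is taken from the code above.</p>

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fakeKubelet is a hypothetical stand-in; none of these names are Kubernetes APIs.
type fakeKubelet struct {
	once   sync.Once
	synced int32  // set to 1 where setRuntimeSync would run
	initFn func() // stands in for initializeRuntimeDependentModules
}

// updateRuntimeUp has the same shape as the real function: run the one-time
// initializer, then record the sync.
func (k *fakeKubelet) updateRuntimeUp() {
	k.once.Do(k.initFn)
	atomic.StoreInt32(&k.synced, 1)
}

// observeHungInit reports whether the sync was recorded while the
// initializer was still blocked, and again after it was released.
func observeHungInit() (before, after bool) {
	entered := make(chan struct{})
	release := make(chan struct{})
	k := &fakeKubelet{initFn: func() {
		close(entered)
		<-release // simulates the call that never returns
	}}

	done := make(chan struct{})
	go func() {
		k.updateRuntimeUp()
		close(done)
	}()

	<-entered // the initializer is now running and blocked
	before = atomic.LoadInt32(&k.synced) == 1
	close(release)
	<-done
	after = atomic.LoadInt32(&k.synced) == 1
	return before, after
}

func main() {
	fmt.Println(observeHungInit()) // false true
}
```

<p>As long as the initializer does not return, the statement after once.Do is never reached, which matches the timestamp that stayed stale on the affected hosts.</p>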
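<h3 id="后续"><a href="#后续" class="headerlink" title="后续"></a>Follow-up</h3><p>One might ask why cadvisor runs docker inspect without any timeout. Indeed, with a timeout in place, restarting the kubelet would not change the host status. Personally I see nothing wrong with adding one, but I am not sure whether it has hidden pitfalls; I will add more after digging into the details.</p>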
]]></content>
<categories>
<category>Troubleshooting</category>
</categories>
<tags>
<tag>kubernetes</tag>
<tag>docker</tag>
</tags>
</entry>
<entry>
<title>Go sync.Pool</title>
<url>/go-sync-pool/</url>
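<content><![CDATA[<p>As is well known, Go has automatic garbage collection, which means that when we allocate memory we do not have to care how or when it is freed; the Go runtime handles all of that. Note: what matters here is heap memory, since stack memory is released automatically when the function call returns.</p>
<p>Automatic garbage collection greatly reduces the mental burden of writing programs, but does that mean we can allocate large amounts of memory at will? In theory yes, but in practice this is strongly discouraged, because a large volume of temporary heap objects puts a heavy load on the GC.</p>
<p>At this point Xiao Ming asks: is there a way to relieve the cost of allocating huge numbers of temporary objects?</p>
<p>Of course there is. Memory reuse is a typical approach, and a memory pool is one instance of it. Go's standard library provides an implementation of such a pool: sync.Pool.</p>
<p>First, let's look at how sync.Pool is used:</p>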
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> {</span><br><span class="line"> pool := sync.Pool{</span><br><span class="line"> New: <span class="function"><span class="keyword">func</span><span class="params">()</span> <span class="title">interface</span></span>{} {</span><br><span class="line"> <span class="keyword">return</span> <span class="string">"Hello"</span></span><br><span class="line"> },</span><br><span class="line"> }</span><br><span class="line"> old := pool.Get()</span><br><span class="line"> pool.Put(old.(<span class="keyword">string</span>) + <span class="string">" World"</span>)</span><br><span class="line"> <span class="built_in">new</span> := pool.Get()</span><br><span class="line"> fmt.Println(<span class="built_in">new</span>) <span class="comment">// Hello World</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure>
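<p>With this simple snippet we have verified that sync.Pool reuses memory. So how does sync.Pool implement that reuse? Let's dig into the Go source.</p>
<p>The source of sync.Pool is in $GOROOT/src/sync/pool.go. The struct is defined as follows:</p>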
<figure class="highlight pgsql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">type</span> Pool struct {</span><br><span class="line"> noCopy noCopy</span><br><span class="line"></span><br><span class="line"> <span class="keyword">local</span> unsafe.Pointer // <span class="keyword">local</span> fixed-size per-P pool, actual <span class="keyword">type</span> <span class="keyword">is</span> [P]poolLocal</span><br><span class="line"> localSize uintptr // size <span class="keyword">of</span> the <span class="keyword">local</span> <span class="keyword">array</span></span><br><span class="line"></span><br><span class="line"> // <span class="built_in">New</span> optionally specifies a <span class="keyword">function</span> <span class="keyword">to</span> generate</span><br><span class="line"> // a <span class="keyword">value</span> <span class="keyword">when</span> <span class="keyword">Get</span> would otherwise <span class="keyword">return</span> nil.</span><br><span class="line"> // It may <span class="keyword">not</span> be changed <span class="keyword">concurrently</span> <span class="keyword">with</span> calls <span class="keyword">to</span> <span class="keyword">Get</span>.</span><br><span class="line"> <span class="built_in">New</span> func() interface{}</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<ul>
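<li>noCopy: lets go vet's static analysis flag copies of the object; it has no effect on compilation or execution</li>
<li>local: the array of pools, actually a [P]poolLocal, where poolLocal is the local pool of one P. Each P's local pool holds two things:<ul>
<li>private interface{}: a single private slot</li>
<li>shared []interface{}: a group of shared slots</li>
</ul>
</li>
<li>localSize: the size of the local array, normally equal to the number of Ps (it can be briefly inconsistent when GOMAXPROCS is called)</li>
<li>New: when the pool is empty, New is called to create a temporary object</li>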
</ul>
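<p>Note that the pool is not a field of the P struct. Instead, sync.Pool maintains its own array and uses the P's id as the array index to find the local pool.</p>
<p>Now that we know the data structure of sync.Pool, let's look at how it operates. sync.Pool has two operations, Get and Put. Put is the simpler one, so we start with it:</p>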
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// Put adds x to the pool.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *Pool)</span> <span class="title">Put</span><span class="params">(x <span class="keyword">interface</span>{})</span></span> {</span><br><span class="line"> <span class="keyword">if</span> x == <span class="literal">nil</span> {</span><br><span class="line"> <span class="keyword">return</span></span><br><span class="line"> }</span><br><span class="line"> l := p.pin()</span><br><span class="line"> <span class="keyword">if</span> l.private == <span class="literal">nil</span> {</span><br><span class="line"> l.private = x</span><br><span class="line"> x = <span class="literal">nil</span></span><br><span class="line"> }</span><br><span class="line"> runtime_procUnpin()</span><br><span class="line"> <span class="keyword">if</span> x != <span class="literal">nil</span> {</span><br><span class="line"> l.Lock()</span><br><span class="line"> l.shared = <span class="built_in">append</span>(l.shared, x)</span><br><span class="line"> l.Unlock()</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
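<p>p.pin obtains the local pool of the current P. Put first tries to store the object in the pool's private slot; if the private slot is already taken, it stores the object in the shared slots.</p>
<p>Note: storing an object in the shared slots requires taking a lock.</p>
<p>Next, let's look at the Get operation:</p>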
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *Pool)</span> <span class="title">Get</span><span class="params">()</span> <span class="title">interface</span></span>{} {</span><br><span class="line"> l := p.pin()</span><br><span class="line"> x := l.private</span><br><span class="line"> l.private = <span class="literal">nil</span></span><br><span class="line"> runtime_procUnpin()</span><br><span class="line"> <span class="keyword">if</span> x == <span class="literal">nil</span> {</span><br><span class="line"> l.Lock()</span><br><span class="line"> last := <span class="built_in">len</span>(l.shared) - <span class="number">1</span></span><br><span class="line"> <span class="keyword">if</span> last >= <span class="number">0</span> {</span><br><span class="line"> x = l.shared[last]</span><br><span class="line"> l.shared = l.shared[:last]</span><br><span class="line"> }</span><br><span class="line"> l.Unlock()</span><br><span class="line"> <span class="keyword">if</span> x == <span class="literal">nil</span> {</span><br><span class="line"> x = p.getSlow()</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> x == <span class="literal">nil</span> && p.New != <span class="literal">nil</span> {</span><br><span class="line"> x = p.New()</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> x</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
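<ul>
<li>If the current P's private slot holds an object, return it directly</li>
<li>If the current P's private slot is empty, take an object from the current P's shared slots and return it</li>
<li>If the current P's shared slots are also empty, walk the shared slots of the other Ps in turn and take one object</li>
<li>If no P has an object in its shared slots and a New function is defined, call New to create an object</li>
<li>Otherwise return nil</li>
</ul>
<p>Note: each P's shared slots must be locked while they are examined, so Get takes a lock at most N times and at least 0 times, where N is the number of Ps.</p>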
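<p>To make the private and shared slots concrete, here is a toy model of a single local pool. It is only a sketch: slotPool is a hypothetical type, it leaves out pinning to a P and stealing from other Ps, and its private slot is not safe for concurrent use (the real code relies on pinning for that).</p>

```go
package main

import (
	"fmt"
	"sync"
)

// slotPool is a toy model of one poolLocal as described above: one private
// slot plus a mutex-protected shared slice. The type is hypothetical.
type slotPool struct {
	private interface{}
	mu      sync.Mutex
	shared  []interface{}
}

// Put prefers the private slot and falls back to the locked shared slice.
func (l *slotPool) Put(x interface{}) {
	if x == nil {
		return
	}
	if l.private == nil {
		l.private = x
		return
	}
	l.mu.Lock()
	l.shared = append(l.shared, x)
	l.mu.Unlock()
}

// Get takes the private object if present, otherwise pops the last shared
// object; it returns nil when both are empty.
func (l *slotPool) Get() interface{} {
	x := l.private
	l.private = nil
	if x != nil {
		return x
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	if last := len(l.shared) - 1; last >= 0 {
		x = l.shared[last]
		l.shared = l.shared[:last]
	}
	return x
}

func main() {
	var l slotPool
	l.Put("a") // lands in the private slot, no lock taken
	l.Put("b") // private slot is taken, goes to shared
	l.Put("c") // goes to shared as well
	fmt.Println(l.Get(), l.Get(), l.Get(), l.Get()) // a c b <nil>
}
```

<p>The first Put and the first Get touch only the private slot, which is the lock-free fast path; the lock is needed only once the shared slice comes into play.</p>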
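<p>With these principles understood, we can happily use sync.Pool. But now Xiao Ming asks again: I am already using sync.Pool, so why is the GC pressure still so high?</p>
<p>This comes down to how sync.Pool's own memory is reclaimed. The temporary objects cached by sync.Pool are not kept forever; in fact they are kept alive only briefly. sync/pool.go also defines a poolCleanup function that clears the pools. Let's see when it is called:</p>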
<figure class="highlight autoit"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">init</span><span class="params">()</span> {</span></span><br><span class="line"> runtime_registerPoolCleanup(poolCleanup)</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Each runtime_xxx function corresponds to a function in the $GOROOT/src/runtime package. We find the matching definition:</p>
<figure class="highlight swift"><table><tr><td class="code"><pre><span class="line"><span class="comment">//go:linkname sync_runtime_registerPoolCleanup sync.runtime_registerPoolCleanup</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">sync_runtime_registerPoolCleanup</span><span class="params">(f <span class="keyword">func</span><span class="params">()</span></span></span>) {</span><br><span class="line"> poolcleanup = f</span><br><span class="line">}</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">clearpools</span><span class="params">()</span></span> {</span><br><span class="line"> <span class="comment">// clear sync.Pools</span></span><br><span class="line"> <span class="keyword">if</span> poolcleanup != <span class="literal">nil</span> {</span><br><span class="line"> poolcleanup()</span><br><span class="line"> }</span><br><span class="line"> ......</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>So we only need to find where clearpools is called:</p>
<figure class="highlight reasonml"><table><tr><td class="code"><pre><span class="line"><span class="comment">// gcStart transitions the GC from _GCoff to _GCmark (if</span></span><br><span class="line"><span class="comment">// !mode.stwMark) or _GCmarktermination (if mode.stwMark) by</span></span><br><span class="line"><span class="comment">// performing sweep termination and GC initialization.</span></span><br><span class="line"><span class="comment">//</span></span><br><span class="line"><span class="comment">// This may return without performing this transition in some cases,</span></span><br><span class="line"><span class="comment">// such as when called on a system stack or with locks held.</span></span><br><span class="line">func gc<span class="constructor">Start(<span class="params">mode</span> <span class="params">gcMode</span>, <span class="params">trigger</span> <span class="params">gcTrigger</span>)</span> {</span><br><span class="line"> ......</span><br><span class="line"> clearpools<span class="literal">()</span></span><br><span class="line"> ......</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
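<p>Every time a GC cycle starts, the sync.Pool object pools are cleared, which means the temporary objects cached by sync.Pool do not survive a single GC cycle. If our program allocates temporary objects frantically, GC runs more often, and every GC start empties the sync.Pool pools again. It is a vicious circle.</p>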
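<p>The cleanup is easy to observe from user code. The sketch below is an illustration with a hypothetical helper, newCallsAfterGC. One assumption to state clearly: the source excerpts in this post match Go releases before 1.13, where one GC start empties the pool. Since Go 1.13, sync.Pool keeps dropped objects for one extra cycle, so the sketch forces two collections, which empties the pool under either behavior.</p>

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// newCallsAfterGC puts one object into a fresh pool, forces gcCycles
// collections, then calls Get once and reports how many times New ran.
// The helper is hypothetical and exists only for this demonstration.
func newCallsAfterGC(gcCycles int) int {
	calls := 0
	pool := sync.Pool{New: func() interface{} {
		calls++
		return new(int)
	}}

	pool.Put(new(int))
	for i := 0; i < gcCycles; i++ {
		runtime.GC() // each call runs a full collection
	}
	pool.Get() // the cached object is gone, so New has to run
	return calls
}

func main() {
	fmt.Println(newCallsAfterGC(2)) // 1
}
```

<p>The object that was Put is no longer available after the forced collections, so Get falls back to New. A pool therefore only helps when objects are reused faster than collections happen.</p>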
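<p>So, Xiao Ming, what is the best practice? Optimize the code logic, of course, and reduce the number of allocations as much as possible. pprof can help with finding what to optimize.</p>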
]]></content>
<categories>
<category>Source code analysis</category>
</categories>
<tags>
<tag>go</tag>
</tags>
</entry>
<entry>
<title>A netns leak troubleshooting journey</title>
<url>/netns-leak-%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/</url>
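<content><![CDATA[<h3 id="揭开面纱"><a href="#揭开面纱" class="headerlink" title="揭开面纱"></a>Lifting the veil</h3><p>On Monday, RD reported abnormal network access for containers in production. The symptoms were:</p>
<ul>
<li>From every container of the upstream service driver-api, TCP connections (tested with telnet) to one particular container of the downstream service duse-api failed, while access to all other downstream containers was normal</li>
<li>IP connectivity from the upstream containers to that downstream container (tested with ping) was normal</li>
</ul>
<p>These two symptoms suggest one conclusion:</p>
<ul>
<li>The container's network device exists and its IP address is reachable, but the service process in the container has not started and the port is not listening</li>
<li>However, after checking with the business RD, we found that the container was healthy and the business process was running. Hmm, this was not going to be simple.</li>
</ul>
<p>In addition, a colleague's investigation produced another finding:</p>
<ul>
<li>An arp lookup for the IP of the problematic duse-api container returned no MAC address</li>
<li>Running arp immediately after a failed telnet did return a MAC address</li>
</ul>
<p>When we compared the MAC address returned by arp with the container's current MAC address, they did not match. That basically pinned down the problem: a net ns had leaked. We ran the following command to verify:</p>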
<figure class="highlight routeros"><table><tr><td class="code"><pre><span class="line">sudo<span class="built_in"> ip </span>netns ls | <span class="keyword">while</span> read ns; <span class="keyword">do</span> sudo<span class="built_in"> ip </span>netns exec <span class="variable">$ns</span><span class="built_in"> ip </span>addr; done | grep inet | grep -v 127 | awk <span class="string">'{print $2}'</span> | sort | uniq -c</span><br></pre></td></tr></table></figure>
<p>Sure enough, the container's IP appeared twice: the IP belonged to two network namespaces, which means the container's network namespace had leaked.</p>
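<p>The same duplicate check can be written as a small function once the addresses of each namespace have been collected. This is a sketch only: duplicatedIPs is a hypothetical helper, collecting the addresses (for example from ip netns exec output) is left out, and the namespace names and addresses in main are made up.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// duplicatedIPs takes a map from namespace name to the addresses found in
// it and returns every address seen in more than one namespace, together
// with the sorted names of the namespaces that hold it.
func duplicatedIPs(nsAddrs map[string][]string) map[string][]string {
	owners := make(map[string][]string)
	for ns, addrs := range nsAddrs {
		seen := make(map[string]bool) // count each address once per namespace
		for _, a := range addrs {
			if !seen[a] {
				seen[a] = true
				owners[a] = append(owners[a], ns)
			}
		}
	}
	for ip, nss := range owners {
		if len(nss) < 2 {
			delete(owners, ip) // held by a single namespace: not a leak signal
			continue
		}
		sort.Strings(nss)
	}
	return owners
}

func main() {
	// Made-up sample data for illustration.
	dups := duplicatedIPs(map[string][]string{
		"ns-a": {"10.0.0.5/24"},
		"ns-b": {"10.0.0.6/24"},
		"ns-c": {"10.0.0.5/24"},
	})
	fmt.Println(dups) // map[10.0.0.5/24:[ns-a ns-c]]
}
```

<p>Unlike the uniq -c pipeline, this keeps the namespace names, so the leaked namespace can be identified directly instead of only its address.</p>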
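<h3 id="误入迷障"><a href="#误入迷障" class="headerlink" title="误入迷障"></a>Lost in the fog</h3><p>Having pinned down the problem, we immediately changed direction and threw ourselves into tracking down the net ns leak.</p>
<p>Since a net ns had leaked, we only needed to find out how the leaked net ns came about. Before getting into the details, some background:</p>
<ul>
<li>The ip netns command scans the /var/run/netns directory by default and reads net ns information from the files in that directory</li>
<li>By default, when the kubelet asks docker to create a container, docker hides the net ns file, so without special handling the ip netns command shows nothing</li>
<li>To make troubleshooting easier, Elastic Cloud currently applies a special treatment: it mounts each container's network namespace under /var/run/netns [note: there is a big pitfall here]</li>
</ul>
<p>Thanks to this special treatment, we can tell when every net ns was created: it is the creation time of the corresponding file under /var/run/netns.</p>
<p>The leaked ns file was created at 2020-04-17 11:34:07, which narrows the search to the period around that time.</p>
<p>Next, we looked at what happened to the container around that time:</p>
<ul>
<li>2020-04-17 11:33:26 the user triggered a release update</li>
<li>2020-04-17 11:34:24 the platform showed the container as started</li>
<li>2020-04-17 11:34:28 the platform showed that the container's startup script had failed</li>
<li>2020-04-17 11:36:22 the user redeployed the container</li>
<li>2020-04-17 11:36:31 the platform showed the container as successfully deleted</li>
</ul>
<p>Since the container's network namespace leaked, the ns cleanup must have been skipped when the container was deleted. [Note: a gap in our background knowledge at this point sent the investigation on a very long detour.]</p>
<p>We went through the kubelet's cleanup logs for this container during that period. The key log lines are:</p>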
<figure class="highlight apache"><table><tr><td class="code"><pre><span class="line"><span class="attribute">I0417</span> <span class="number">11</span>:<span class="number">36</span>:<span class="number">30</span>.<span class="number">974674</span> <span class="number">37736</span> kubelet_pods.go:<span class="number">1180</span>] Killing unwanted pod <span class="string">"duse-api-xxxxx-0"</span></span><br><span class="line"><span class="attribute">I0417</span> <span class="number">11</span>:<span class="number">36</span>:<span class="number">30</span>.<span class="number">976803</span> <span class="number">37736</span> plugins.go:<span class="number">391</span>] Calling network plugin cni to tear down pod <span class="string">"duse-api-xxxxx-0_default"</span></span><br><span class="line"><span class="attribute">I0417</span> <span class="number">11</span>:<span class="number">36</span>:<span class="number">30</span>.<span class="number">983499</span> <span class="number">37736</span> kubelet_pods.go:<span class="number">1780</span>] Orphaned pod <span class="string">"4ae28778-805c-11ea-a54c-b4055d1e6372"</span> found, removing pod cgroups</span><br><span class="line"><span class="attribute">I0417</span> <span class="number">11</span>:<span class="number">36</span>:<span class="number">30</span>.<span class="number">986360</span> <span class="number">37736</span> pod_container_manager_linux.go:<span class="number">167</span>] Attempt to kill process with pid: <span class="number">48892</span></span><br><span class="line"><span class="attribute">I0417</span> <span class="number">11</span>:<span class="number">36</span>:<span class="number">30</span>.<span class="number">986382</span> <span class="number">37736</span> pod_container_manager_linux.go:<span class="number">174</span>] successfully killed <span class="literal">all</span> unwanted processes.</span><br></pre></td></tr></table></figure>
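<p>A brief walkthrough:</p>
<ul>
<li>I0417 11:36:30.974674 In response to the delete-container command, kubelet kills the Pod</li>
<li>I0417 11:36:30.976803 The cni plugin is called to clean up the network namespace</li>
<li>I0417 11:36:30.983499 A long-running goroutine detects that the Pod has terminated and starts cleanup, including directories and cgroups</li>
<li>I0417 11:36:30.986360 While cleaning up the cgroup, processes in the container that have not yet exited are killed</li>
<li>I0417 11:36:30.986382 All container processes are reported as killed</li>
</ul>
<p>One note: normally, when a container exits, all processes inside it have already exited. The reason processes still had to be killed during cgroup cleanup lies in the long-running goroutine's detection logic. Its condition for deciding that a Pod has terminated is:</p>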
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// podIsTerminated returns true if pod is in the terminated state ("Failed" or "Succeeded").</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(kl *Kubelet)</span> <span class="title">podIsTerminated</span><span class="params">(pod *v1.Pod)</span> <span class="title">bool</span></span> {</span><br><span class="line"> <span class="comment">// Check the cached pod status which was set after the last sync.</span></span><br><span class="line"> status, ok := kl.statusManager.GetPodStatus(pod.UID)</span><br><span class="line"> <span class="keyword">if</span> !ok {</span><br><span class="line"> <span class="comment">// If there is no cached status, use the status from the</span></span><br><span class="line"> <span class="comment">// apiserver. This is useful if kubelet has recently been</span></span><br><span class="line"> <span class="comment">// restarted.</span></span><br><span class="line"> status = pod.Status</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> status.Phase == v1.PodFailed || status.Phase == v1.PodSucceeded || (pod.DeletionTimestamp != <span class="literal">nil</span> && notRunning(status.ContainerStatuses))</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
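<p>This container hit the third OR clause: the Pod had been marked for deletion and none of its business containers were running (the business container failed to start and never ran at all), yet the Pod's sandbox container may still have been running.</p>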
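<p>The third clause can be modeled with a small, self-contained sketch. The types below are simplified stand-ins invented for this illustration (they are not the real v1.Pod or v1.ContainerStatus types), and the sandbox is deliberately absent from the container list, which is the point: the check only looks at business containers.</p>

```go
package main

import "fmt"

// containerState is a simplified, hypothetical stand-in for a container status.
type containerState struct {
	Running bool
}

// pod is a simplified, hypothetical stand-in for the fields the check reads.
// Containers holds business containers only; the sandbox is not listed.
type pod struct {
	Phase             string
	DeletionRequested bool
	Containers        []containerState
}

// notRunning reports whether no listed container is running.
func notRunning(cs []containerState) bool {
	for _, c := range cs {
		if c.Running {
			return false
		}
	}
	return true
}

// podIsTerminated mirrors the shape of the condition quoted above.
func podIsTerminated(p pod) bool {
	return p.Phase == "Failed" || p.Phase == "Succeeded" ||
		(p.DeletionRequested && notRunning(p.Containers))
}

func main() {
	// A Pod whose business container never started and that is being deleted:
	// it counts as terminated even though its sandbox may still be alive.
	p := pod{Phase: "Pending", DeletionRequested: true, Containers: []containerState{{Running: false}}}
	fmt.Println(podIsTerminated(p))
}
```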
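<p>The kubelet logs above alone make it hard to spot the problem, so we went on to analyze the cni plugin logs. The entries written while cni deleted this Pod's container network are:</p>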
<figure class="highlight routeros"><table><tr><td class="code"><pre><span class="line">[pid:98497] 2020/04/17 11:36:30.990707 main.go:89: ===== start cni process =====</span><br><span class="line">[pid:98497] 2020/04/17 11:36:30.990761 main.go:90: os env: [<span class="attribute">CNI_COMMAND</span>=DEL <span class="attribute">CNI_CONTAINERID</span>=c2ef79f7596b6b558f0c01c0715bac46714eefd1e9966625a09414c7218e1013 <span class="attribute">CNI_NETNS</span>=/proc/48892/ns/net <span class="attribute">CNI_ARGS</span>=IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=duse-api-xxxxx-0;K8S_POD_INFRA_CONTAINER_ID=c2ef79f7596b6b558f0c01c0715bac46714eefd1e9966625a09414c7218e1013 <span class="attribute">CNI_IFNAME</span>=eth0 <span class="attribute">CNI_PATH</span>=/home/user/cloud/cni-plugins/bin <span class="attribute">LANG</span>=en_US.UTF-8 <span class="attribute">PATH</span>=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin <span class="attribute">KUBE_LOGTOSTDERR</span>=--logtostderr=false <span class="attribute">KUBE_LOG_LEVEL</span>=--v=3 <span class="attribute">KUBE_ALLOW_PRIV</span>=--allow-privileged=true <span class="attribute">KUBE_MASTER</span>=--master=https://10.xxx.xxx.xxx:6443 <span class="attribute">KUBELET_ADDRESS</span>=--address=0.0.0.0 <span class="attribute">KUBELET_HOSTNAME</span>=--hostname_override=10.xxx.xxx.xxx KUBELET_POD_INFRA_CONTAINER= <span class="attribute">KUBELET_ARGS</span>=--network-plugin=cni <span class="attribute">--cni-bin-dir</span>=/home/user/cloud/cni-plugins/bin <span class="attribute">--cni-conf-dir</span>=/home/user/cloud/cni-plugins/conf <span class="attribute">--kubeconfig</span>=/etc/kubernetes/kubeconfig/kubelet.kubeconfig <span class="attribute">--cert-dir</span>=/etc/kubernetes/ssl <span class="attribute">--log-dir</span>=/var/log/kubernetes <span class="attribute">--stderrthreshold</span>=3 <span class="attribute">--allowed-unsafe-sysctls</span>=net.*,kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.* <span 
class="attribute">--pod-infra-container-image</span>=registry.keji.com/k8s/pause:3.0 --eviction-hard= <span class="attribute">--image-gc-high-threshold</span>=75 <span class="attribute">--image-gc-low-threshold</span>=65 <span class="attribute">--feature-gates</span>=KubeletPluginsWatcher=false <span class="attribute">--restart-count-limit</span>=5 <span class="attribute">--last-upgrade-time</span>=2019-07-01]</span><br><span class="line">[pid:98497] 2020/04/17 11:36:30.990771 main.go:91: stdin : {<span class="string">"cniVersion"</span>:<span class="string">"0.3.0"</span>,<span class="string">"logDir"</span>:<span class="string">"/home/user/cloud/cni-plugins/acllogs"</span>,<span class="string">"name"</span>:<span class="string">"cloudcni"</span>,<span class="string">"type"</span>:<span class="string">"aclCni"</span>}</span><br><span class="line">[pid:98497] 2020/04/17 11:36:30.990790 main.go:181: failed <span class="keyword">to</span> Statfs <span class="string">"/proc/48892/ns/net"</span>: <span class="literal">no</span> such file <span class="keyword">or</span> directory</span><br><span class="line">[pid:98497] 2020/04/17 11:36:30.990814 main.go:94: ===== end cni process =====</span><br></pre></td></tr></table></figure>
<p>The error logged at main.go line 181 immediately caught our eye. Let's look at it alongside the code:</p>
<figure class="highlight stata"><table><tr><td class="code"><pre><span class="line">func cmdDel(<span class="keyword">args</span> *skel.CmdArgs) <span class="keyword">error</span> {</span><br><span class="line"> <span class="keyword">n</span>, _, <span class="keyword">err</span> := loadConf(<span class="keyword">args</span>.StdinData)</span><br><span class="line"> <span class="keyword">if</span> <span class="keyword">err</span> != nil {</span><br><span class="line"> <span class="keyword">return</span> <span class="keyword">err</span></span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> netns, <span class="keyword">err</span> := ns.GetNS(<span class="keyword">args</span>.Netns)</span><br><span class="line"> <span class="keyword">if</span> <span class="keyword">err</span> != nil {</span><br><span class="line"> <span class="keyword">log</span>.Println(<span class="keyword">err</span>) <span class="comment">//// Line 181</span></span><br><span class="line"> <span class="keyword">return</span> fmt.Errorf(<span class="string">"failed to open netns %q: %v"</span>, netns, <span class="keyword">err</span>)</span><br><span class="line"> }</span><br><span class="line"> defer netns.<span class="keyword">Close</span>()</span><br><span class="line"> ...</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
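<p>As we can see, when cni invoked this plugin to clean up the container's network namespace, the error at line 181 made the plugin exit early without performing the subsequent cleanup. Found you at last, little bug.</p>
<p>Let's summarize the interim conclusions so far:</p>
<ul>
<li>Because the container failed to start, when the Pod was deleted, the long-running goroutine that periodically cleans up the cgroups of non-running Pods killed the Pod's sandbox container</li>
<li>When the cni cleanup triggered by the delete command ran, it found that the sandbox's pause process had already exited and could not locate the container's network namespace, so it abandoned the cleanup</li>
<li>In the end, the container's network namespace leaked</li>
</ul>
<p>With the problem identified, we set about designing a fix, and quickly came up with a first version:</p>
<ul>
<li>Guarantee that cgroup cleanup does not run before all containers of the Pod have exited</li>
</ul>
<p>This ensures that the cleanup triggered by the delete command proceeds in order:</p>
<ul>
<li>Kill all business containers</li>
<li>Run the cni plugin cleanup</li>
<li>Kill the sandbox container</li>
<li>Run the cgroup cleanup</li>
</ul>
<p>After hurriedly patching our internal version, we confirmed that this logic was unchanged in the latest community code, and thought we would contribute the fix upstream (wishful thinking, as it turned out). So we tried to reproduce the problem on a cluster built from the open-source version. Then the nightmare began...</p>
<p>With the same Pod manifest, we could reproduce the net ns leak almost 100% of the time on the internal Elastic Cloud version, yet on the open-source community version the leak never occurred, not even once. Could it be that the cause we had identified was not the real one?</p>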
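<h3 id="拨云现月"><a href="#拨云现月" class="headerlink" title="拨云现月"></a>The Clouds Part</h3><p>This was not good news for us. We had put in considerable effort and, while we had not exactly gone in the opposite direction, we had clearly not yet found the root cause.</p>
<p>To narrow the scope further, we asked colleagues on the kernel team about a piece of fundamentals:</p>
<ul>
<li>When a net ns is deleted, if it still contains network devices, the system automatically removes the devices first and then deletes the ns</li>
</ul>
<p>Armed with this knowledge, we resumed the investigation. Since a stock k8s cluster does not leak net ns, the problem must come from one of our customized modules. The leak happens on the node, and the modules Elastic Cloud currently deploys on nodes include:</p>
<ul>
<li>kubelet</li>
<li>cni plugins</li>
<li>other tools</li>
</ul>
<p>Since kubelet had already been cleared, the culprit was most likely the cni plugin. Comparing the cni plugin of the stock cluster with that of the Elastic Cloud production cluster, we found one thing very likely to cause a net ns leak:</p>
<ul>
<li>For troubleshooting convenience, the customized cni plugin bind-mounts the container's network namespace file under /var/run/netns (see the pitfall mentioned above)</li>
</ul>
<p>We quickly set out to verify whether this was the culprit: we modified the cni plugin code to remove the bind mount and tested in the test environment. The result matched our expectation: net ns no longer leaked. The truth was finally out.</p>
<h3 id="亡羊补牢"><a href="#亡羊补牢" class="headerlink" title="亡羊补牢"></a>Mending the Fold</h3><p>The original purpose of bind-mounting the net ns was to make troubleshooting easier, so that the ip netns command could access the network namespaces of all Pods on the host.</p>
<p>In fact, a simple symlink achieves the same goal. If the symlink is not cleaned up when the Pod exits, it does not cause a net ns leak, and ls -la /var/run/netns clearly shows which net ns entries are still valid and which are not.</p>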
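<h3 id="事后诸葛"><a href="#事后诸葛" class="headerlink" title="事后诸葛"></a>Hindsight</h3><p>Why can a bind mount cause a net ns leak? It follows from how Linux network namespaces work:</p>
<ul>
<li>As long as at least one process is still alive in the namespace, or a bind mount of it exists (there may be other cases too), the ns is not reclaimed</li>
<li>Once all processes have exited and no such special case applies, Linux reclaims the ns automatically</li>
</ul>
<p>Finally, the problem itself was not complicated. It lingered so long and took such a winding path to diagnose mainly because it exposed gaps in our fundamentals.</p>
<p>Study hard and improve every day. That is the way!</p>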
]]></content>
<categories>
<category>问题排查</category>
</categories>
<tags>
<tag>kubernetes</tag>
<tag>docker</tag>
<tag>cni</tag>
<tag>linux namespace</tag>
</tags>
</entry>
<entry>
<title>An Incident Triggered by Reading a pipe</title>
<url>/%E4%B8%80%E6%AC%A1%E8%AF%BB-pipe-%E5%BC%95%E5%8F%91%E7%9A%84%E8%A1%80%E6%A1%88/</url>
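<content><![CDATA[<h3 id="背景"><a href="#背景" class="headerlink" title="背景"></a>Background</h3><p>For the background, see <a href="https://plpan.github.io/docker-hang-%E6%AD%BB%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">docker-hang-死排查之旅</a>. In one sentence: runc unexpectedly wrote to a pipe, which blocked a chain of components; when we read from the pipe to clear the blockage, something unexpected happened: all containers on the host were recreated.</p>
<p>Before analyzing the cause in detail, let's briefly review the basics of Linux pipes.</p>
<h3 id="linux-pipe"><a href="#linux-pipe" class="headerlink" title="linux pipe"></a>linux pipe</h3><p>Everyone is probably familiar with the Linux pipe, a classic inter-process communication mechanism. Pipes fall into two categories, named and anonymous, which differ as follows:</p>
<ul>
<li>Named pipes: the pipe exists as a file in the file system, and any two processes on the system can communicate through it</li>
<li>Anonymous pipes: the pipe is not stored as a file in the file system and exists only in the processes' file descriptor tables; only related processes can communicate through it, such as parent and child, siblings, or grandparent and grandchild</li>
</ul>
<p>A pipe can be thought of as a fixed-size file with a read end and a write end. Blocking pipes behave as follows:</p>
<ul>
<li>Read end: when the pipe is empty, a read blocks until data is written or all write ends are closed</li>
<li>Write end: when the pipe is full, a write blocks until data is read out</li>
</ul>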
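<p>The read-end behavior can be observed with an anonymous pipe created through os.Pipe. The following is a minimal sketch, not code from the incident; the function name readAllFromPipe is hypothetical. The reader blocks until the writer goroutine has written its data and closed the write end, at which point the read returns with end of file.</p>

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readAllFromPipe (hypothetical helper) writes msg into an anonymous pipe
// from a separate goroutine and reads from the other end until EOF.
func readAllFromPipe(msg string) (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()

	go func() {
		// Closing the last write end is what lets the reader see EOF;
		// without it, the read below would block forever.
		defer w.Close()
		io.WriteString(w, msg)
	}()

	data, err := io.ReadAll(r)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	got, err := readAllFromPipe("hello pipe")
	if err != nil {
		fmt.Println("pipe error:", err)
		return
	}
	fmt.Println(got)
}
```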
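<p>The default size of a Linux pipe is 16 memory pages [ref.2], that is, 65536 bytes with 4 KiB pages.</p>
<p>I have a small puzzle here: a write to a pipe should block only once more than 65536 bytes are buffered, and we verified this on the host, yet in <a href="https://plpan.github.io/docker-hang-%E6%AD%BB%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">docker-hang-死排查之旅</a> the write blocked after only 5378 characters. If you know why, please enlighten me.</p>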
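<p>One way to check the capacity of a given pipe on Linux is the F_GETPIPE_SZ fcntl command. The sketch below is Linux-only and rests on two assumptions: the constant value 1032 (F_LINUX_SPECIFIC_BASE + 8) is written out by hand because the standard syscall package does not export it, and the helper name pipeCapacity is hypothetical. It only reports the size of a freshly created pipe and does not explain the 5378-character observation.</p>

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// fGetPipeSz is the Linux fcntl command F_GETPIPE_SZ. It is spelled out
// here as an assumption, since the standard syscall package has no constant for it.
const fGetPipeSz = 1032

// pipeCapacity (hypothetical helper) returns the kernel buffer size, in
// bytes, of a freshly created anonymous pipe.
func pipeCapacity() (int, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return 0, err
	}
	defer r.Close()
	defer w.Close()

	size, _, errno := syscall.Syscall(syscall.SYS_FCNTL, w.Fd(), fGetPipeSz, 0)
	if errno != 0 {
		return 0, errno
	}
	return int(size), nil
}

func main() {
	size, err := pipeCapacity()
	if err != nil {
		fmt.Println("fcntl failed:", err)
		return
	}
	page := os.Getpagesize()
	fmt.Printf("pipe capacity: %d bytes (%d pages of %d bytes)\n", size, size/page, page)
}
```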
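<h3 id="血案发生"><a href="#血案发生" class="headerlink" title="血案发生"></a>The Incident</h3><p>Because runc init unexpectedly wrote a large amount of data to the pipe and blocked, our way of clearing the blockage was simple: read the pipe's contents by hand. Once we had read everything from the pipe, all should have gone as expected: we would collect the real reason runc init wrote to the pipe unexpectedly, and the abnormal container would become responsive again. Both of those did happen as expected. However, something unexpected also happened: all containers on the host were recreated.</p>
<p>And so a production incident occurred. The other production containers had been running normally, but fixing the docker hang caused them all to be recreated, which is clearly unacceptable.</p>
<h3 id="问题定位"><a href="#问题定位" class="headerlink" title="问题定位"></a>Locating the Problem</h3><p>Our first suspect was docker: we suspected that the docker service on the host had restarted. Checking the docker service status cleared it, because docker's last restart was many days earlier.</p>
<p>If not docker, then it was most likely kubernetes. kubernetes components fall into two groups, master side and node side. The only node-side component is kubelet, but it was an unlikely suspect, since it is just a worker that does whatever the master tells it. The master side has three components: the controller, the scheduler, and the API server. Because the scheduler includes eviction functionality, it would normally be the prime suspect, but since we had disabled eviction in production it was very unlikely to be the scheduler; the API server only passively receives change requests, so it can be ruled out too. That leaves the controller. But why would the controller recreate all containers on the host?</p>
<p>That was our guesswork. To verify it we had to collect evidence. Where? Mostly buried in the vast component logs. But justice has a long arm: in the controller logs we found the key evidence of its crime:</p>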
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line">/<span class="keyword">var</span>/log/kubernetes/kube-controller-manager.root.log.INFO<span class="number">.20200712</span><span class="number">-014245.35913</span>:I0712 <span class="number">03</span>:<span class="number">19</span>:<span class="number">59.590703</span> <span class="number">35913</span> controller_utils.<span class="keyword">go</span>:<span class="number">95</span>] Starting deletion of pod <span class="keyword">default</span>/kproxy-sf<span class="number">-69466</span><span class="number">-1</span></span><br></pre></td></tr></table></figure>
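<p>We discovered the docker anomaly on the host on 07-09, and the controller actively deleted the containers on the host on 07-12. Evidence in hand, we began interrogating the controller. Its confession is as follows:</p>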
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// DeletePods will delete all pods from master running on given node,</span></span><br><span class="line"><span class="comment">// and return true if any pods were deleted, or were found pending</span></span><br><span class="line"><span class="comment">// deletion.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">DeletePods</span><span class="params">(kubeClient clientset.Interface, recorder record.EventRecorder, nodeName, nodeUID <span class="keyword">string</span>, daemonStore extensionslisters.DaemonSetLister)</span> <span class="params">(<span class="keyword">bool</span>, error)</span></span> {</span><br><span class="line"> ......</span><br><span class="line"> <span class="keyword">for</span> _, pod := <span class="keyword">range</span> pods.Items {</span><br><span class="line"> ......</span><br><span class="line"> glog.V(<span class="number">2</span>).Infof(<span class="string">"Starting deletion of pod %v/%v"</span>, pod.Namespace, pod.Name)</span><br><span class="line"> <span class="keyword">if</span> err := kubeClient.CoreV1().Pods(pod.Namespace).Delete(pod.Name, <span class="literal">nil</span>); err != <span class="literal">nil</span> {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span>, err</span><br><span class="line"> }</span><br><span class="line"> remaining = <span class="literal">true</span></span><br><span class="line"> }</span><br><span class="line"> ......</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(nc *Controller)</span> <span class="title">doEvictionPass</span><span class="params">()</span></span> {</span><br><span class="line"> nc.evictorLock.Lock()</span><br><span class="line"> <span 
class="keyword">defer</span> nc.evictorLock.Unlock()</span><br><span class="line"> <span class="keyword">for</span> k := <span class="keyword">range</span> nc.zonePodEvictor {</span><br><span class="line"> <span class="comment">// Function should return 'false' and a time after which it should be retried, or 'true' if it shouldn't (it succeeded).</span></span><br><span class="line"> nc.zonePodEvictor[k].Try(<span class="function"><span class="keyword">func</span><span class="params">(value scheduler.TimedValue)</span> <span class="params">(<span class="keyword">bool</span>, time.Duration)</span></span> {</span><br><span class="line"> ......</span><br><span class="line"> remaining, err := nodeutil.DeletePods(nc.kubeClient, nc.recorder, value.Value, nodeUID, nc.daemonSetStore)</span><br><span class="line"> ......</span><br><span class="line"> })</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="comment">// Run starts an asynchronous loop that monitors the status of cluster nodes.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(nc *Controller)</span> <span class="title">Run</span><span class="params">(stopCh <-<span class="keyword">chan</span> <span class="keyword">struct</span>{})</span></span> {</span><br><span class="line"> ......</span><br><span class="line"> <span class="comment">// Managing eviction of nodes:</span></span><br><span class="line"> <span class="comment">// When we delete pods off a node, if the node was not empty at the time we then</span></span><br><span class="line"> <span class="comment">// queue an eviction watcher. If we hit an error, retry deletion.</span></span><br><span class="line"> <span class="keyword">go</span> wait.Until(nc.doEvictionPass, scheduler.NodeEvictionPeriod, stopCh)</span><br><span class="line"> ......</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
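<p>This controller is node_lifecycle_controller, the node lifecycle controller, which periodically (every 100ms) runs an eviction pass over the containers on queued hosts. It does not kill indiscriminately, or production would have descended into chaos long ago. Let's look at its conditions:</p>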
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// monitorNodeStatus verifies node status are constantly updated by kubelet, and if not,</span></span><br><span class="line"><span class="comment">// post "NodeReady==ConditionUnknown". It also evicts all pods if node is not ready or</span></span><br><span class="line"><span class="comment">// not reachable for a long period of time.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(nc *Controller)</span> <span class="title">monitorNodeStatus</span><span class="params">()</span> <span class="title">error</span></span> {</span><br><span class="line"> ......</span><br><span class="line"> <span class="keyword">if</span> currentReadyCondition != <span class="literal">nil</span> {</span><br><span class="line"> <span class="comment">// Check eviction timeout against decisionTimestamp</span></span><br><span class="line"> <span class="keyword">if</span> observedReadyCondition.Status == v1.ConditionFalse {</span><br><span class="line"> <span class="keyword">if</span> decisionTimestamp.After(nc.nodeStatusMap[node.Name].readyTransitionTimestamp.Add(nc.podEvictionTimeout)) {</span><br><span class="line"> <span class="keyword">if</span> nc.evictPods(node) {</span><br><span class="line"> glog.V(<span class="number">2</span>).Infof(<span class="string">"Node is NotReady. 
Adding Pods on Node %s to eviction queue: %v is later than %v + %v"</span>,</span><br><span class="line"> node.Name,</span><br><span class="line"> decisionTimestamp,</span><br><span class="line"> nc.nodeStatusMap[node.Name].readyTransitionTimestamp,</span><br><span class="line"> nc.podEvictionTimeout,</span><br><span class="line"> )</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> observedReadyCondition.Status == v1.ConditionUnknown {</span><br><span class="line"> <span class="keyword">if</span> decisionTimestamp.After(nc.nodeStatusMap[node.Name].probeTimestamp.Add(nc.podEvictionTimeout)) {</span><br><span class="line"> <span class="keyword">if</span> nc.evictPods(node) {</span><br><span class="line"> glog.V(<span class="number">2</span>).Infof(<span class="string">"Node is unresponsive. Adding Pods on Node %s to eviction queues: %v is later than %v + %v"</span>,</span><br><span class="line"> node.Name,</span><br><span class="line"> decisionTimestamp,</span><br><span class="line"> nc.nodeStatusMap[node.Name].readyTransitionTimestamp,</span><br><span class="line"> nc.podEvictionTimeout-gracePeriod,</span><br><span class="line"> )</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> ......</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(nc *Controller)</span> <span class="title">evictPods</span><span class="params">(node *v1.Node)</span> <span class="title">bool</span></span> {</span><br><span class="line"> nc.evictorLock.Lock()</span><br><span class="line"> <span class="keyword">defer</span> nc.evictorLock.Unlock()</span><br><span class="line"> <span class="keyword">return</span> 
nc.zonePodEvictor[utilnode.GetZoneKey(node)].Add(node.Name, <span class="keyword">string</span>(node.UID))</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
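<p>As we can see, the controller evicts all Pods on a host when two conditions hold:</p>
<ol>
<li>The host's status is NotReady or Unknown</li>
<li>The host has been non-Ready for longer than a threshold. The threshold is defined by nc.podEvictionTimeout and defaults to 5 minutes; our production cluster customizes it to 2000 minutes</li>
</ol>
<p>In <a href="https://plpan.github.io/docker-hang-%E6%AD%BB%E9%98%BB%E5%A1%9E-kubelet-%E5%88%9D%E5%A7%8B%E5%8C%96%E6%B5%81%E7%A8%8B/">docker-hang-死阻塞-kubelet-初始化流程</a>, we mentioned that because docker hung, kubelet's initialization was blocked and the host status was NotReady, which satisfies condition 1. We checked that kubelet became NotReady at 2020-07-10 17:58:59, and the interval between that and the controller's Pod deletion is almost exactly 2000 minutes, which satisfies condition 2.</p>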
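<p>At this point the case is essentially closed: the host stayed non-Ready for too long, so the controller evicted all containers on it.</p>
<h3 id="思考"><a href="#思考" class="headerlink" title="思考"></a>Reflection</h3><p>Now that the mechanism is clear, consider a question: while the host is non-Ready, it cannot process the eviction requests issued by the controller; it can only do so once it becomes Ready again. Since the host has already recovered, is it still necessary to evict all its containers immediately? This applies especially to stateful services, where the Pod is deleted and then immediately recreated on the same host. Personally I think it is not only unnecessary but also carries some risk.</p>
<p>As for the controller's eviction policy, we raised the production eviction timeout from 2000 minutes to 3 years.</p>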
<h3 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h3><ol>
<li><a href="https://elixir.bootlin.com/linux/v3.10/source/fs/pipe.c#L496">https://elixir.bootlin.com/linux/v3.10/source/fs/pipe.c#L496</a></li>
<li><a href="https://elixir.bootlin.com/linux/v3.10/source/include/linux/pipe_fs_i.h#L4">https://elixir.bootlin.com/linux/v3.10/source/include/linux/pipe_fs_i.h#L4</a></li>
</ol>
]]></content>
<categories>
<category>问题排查</category>
</categories>
<tags>
<tag>kubernetes</tag>
<tag>docker</tag>
</tags>
</entry>
<entry>
<title>Tracking Down a Pod Stuck in Terminating</title>
<url>/pod-terminating-%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/</url>
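<content><![CDATA[<h3 id="背景"><a href="#背景" class="headerlink" title="背景"></a>Background</h3><p>Recently, the Elastic Cloud production cluster saw several unusual container migration failures. What made them unusual is that the container was stuck in the Pod Terminating state while the host was Ready.</p>
<p>A Ready host means it can process Pod events normally, yet the Pod was stuck in its exit phase, which suggests the problem was not caused by kubelet. That makes docker suspect number one.</p>
<p>The rest of this post walks through the investigation and analysis in detail.</p>
<h3 id="抽丝剥茧"><a href="#抽丝剥茧" class="headerlink" title="抽丝剥茧"></a>Peeling Back the Layers</h3><h4 id="排除kubelet嫌疑"><a href="#排除kubelet嫌疑" class="headerlink" title="排除kubelet嫌疑"></a>Clearing kubelet</h4><p>The Pod status was:</p>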
<figure class="highlight angelscript"><table><tr><td class="code"><pre><span class="line"><span class="string">[stupig@master ~]</span>$ kubectl <span class="keyword">get</span> pod -owide</span><br><span class="line">pod<span class="number">-976</span>a0<span class="number">-5</span> <span class="number">0</span>/<span class="number">1</span> Terminating <span class="number">0</span> <span class="number">112</span>m</span><br></pre></td></tr></table></figure>
<p>Although kubelet was already an unlikely suspect, we still needed to check its logs to confirm. The key kubelet log entries are:</p>
<figure class="highlight apache"><table><tr><td class="code"><pre><span class="line"><span class="attribute">I1014</span> <span class="number">10</span>:<span class="number">56</span>:<span class="number">46</span>.<span class="number">492682</span> <span class="number">34976</span> kubelet_pods.go:<span class="number">1017</span>] Pod <span class="string">"pod-976a0-5_default(f1e03a3d-0dc7-11eb-b4b1-246e967c4efc)"</span> is terminated, but some containers have not been cleaned up: {ID:{Type:docker ID:<span class="number">41020461</span>ed<span class="number">4</span>d<span class="number">801</span>afa<span class="number">8</span>d<span class="number">10847</span>a<span class="number">16907</span>e<span class="number">65</span>f<span class="number">6</span>e<span class="number">8</span>ca<span class="number">34</span>d<span class="number">1704</span>edf<span class="number">15</span>b<span class="number">0</span>d<span class="number">0</span>e<span class="number">72</span>bf<span class="number">4</span>ef} Name:stupig State:exited CreatedAt:<span class="number">2020</span>-<span class="number">10</span>-<span class="number">14</span> <span class="number">10</span>:<span class="number">49</span>:<span class="number">57</span>.<span class="number">859913657</span> +<span class="number">0800</span> CST StartedAt:<span class="number">2020</span>-<span class="number">10</span>-<span class="number">14</span> <span class="number">10</span>:<span class="number">49</span>:<span class="number">57</span>.<span class="number">928654495</span> +<span class="number">0800</span> CST FinishedAt:<span class="number">2020</span>-<span class="number">10</span>-<span class="number">14</span> <span class="number">10</span>:<span class="number">50</span>:<span class="number">28</span>.<span class="number">661263065</span> +<span class="number">0800</span> CST ExitCode:<span class="number">0</span> Hash:<span class="number">2101852810</span> HashWithoutResources:<span 
class="number">2673273670</span> RestartCount:<span class="number">0</span> Reason:Completed Message: Resources:map[CpuQuota:<span class="number">200000</span> Memory:<span class="number">2147483648</span> MemorySwap:<span class="number">2147483648</span>]}</span><br><span class="line"><span class="attribute">E1014</span> <span class="number">10</span>:<span class="number">56</span>:<span class="number">46</span>.<span class="number">709255</span> <span class="number">34976</span> remote_runtime.go:<span class="number">250</span>] RemoveContainer <span class="string">"41020461ed4d801afa8d10847a16907e65f6e8ca34d1704edf15b0d0e72bf4ef"</span> from runtime service failed: rpc error: code = Unknown desc = failed to remove container <span class="string">"41020461ed4d801afa8d10847a16907e65f6e8ca34d1704edf15b0d0e72bf4ef"</span>: Error response from daemon: container <span class="number">41020461</span>ed<span class="number">4</span>d<span class="number">801</span>afa<span class="number">8</span>d<span class="number">10847</span>a<span class="number">16907</span>e<span class="number">65</span>f<span class="number">6</span>e<span class="number">8</span>ca<span class="number">34</span>d<span class="number">1704</span>edf<span class="number">15</span>b<span class="number">0</span>d<span class="number">0</span>e<span class="number">72</span>bf<span class="number">4</span>ef: driver <span class="string">"overlay2"</span> failed to remove root filesystem: unlinkat /home/docker_rt/overlay<span class="number">2</span>/e<span class="number">5</span>dab<span class="number">77</span>be<span class="number">213</span>d<span class="number">9</span>f<span class="number">9</span>cfc<span class="number">0</span>b<span class="number">0</span>b<span class="number">3281</span>dbef<span class="number">9</span>c<span class="number">2878</span>fee<span class="number">3</span>b<span class="number">8</span>e<span class="number">406</span>bc<span class="number">8</span>ab<span 
class="number">97</span>adc<span class="number">30</span>ae<span class="number">4</span>d<span class="number">5</span>/merged: device or resource busy</span><br><span class="line"><span class="attribute">E1014</span> <span class="number">10</span>:<span class="number">56</span>:<span class="number">46</span>.<span class="number">709292</span> <span class="number">34976</span> kuberuntime_gc.go:<span class="number">126</span>] Failed to remove container <span class="string">"41020461ed4d801afa8d10847a16907e65f6e8ca34d1704edf15b0d0e72bf4ef"</span>: rpc error: code = Unknown desc = failed to remove container <span class="string">"41020461ed4d801afa8d10847a16907e65f6e8ca34d1704edf15b0d0e72bf4ef"</span>: Error response from daemon: container <span class="number">41020461</span>ed<span class="number">4</span>d<span class="number">801</span>afa<span class="number">8</span>d<span class="number">10847</span>a<span class="number">16907</span>e<span class="number">65</span>f<span class="number">6</span>e<span class="number">8</span>ca<span class="number">34</span>d<span class="number">1704</span>edf<span class="number">15</span>b<span class="number">0</span>d<span class="number">0</span>e<span class="number">72</span>bf<span class="number">4</span>ef: driver <span class="string">"overlay2"</span> failed to remove root filesystem: unlinkat /home/docker_rt/overlay<span class="number">2</span>/e<span class="number">5</span>dab<span class="number">77</span>be<span class="number">213</span>d<span class="number">9</span>f<span class="number">9</span>cfc<span class="number">0</span>b<span class="number">0</span>b<span class="number">3281</span>dbef<span class="number">9</span>c<span class="number">2878</span>fee<span class="number">3</span>b<span class="number">8</span>e<span class="number">406</span>bc<span class="number">8</span>ab<span class="number">97</span>adc<span class="number">30</span>ae<span class="number">4</span>d<span class="number">5</span>/merged: device or resource 
busy</span><br></pre></td></tr></table></figure>
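<p>The logs make it clear why, from kubelet's side, the Pod is stuck in Terminating: container cleanup failed.</p>
<p>The command kubelet uses to clean up the container is <code>docker rm -f</code>. It fails because deleting the container directory <code>xxx/merged</code> returns an error, and the error message is <code>device or resource busy</code>.</p>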
<p>Beyond this, kubelet offers no further key information.</p>
<p>Logging in to the host, we checked the state of the container:</p>
<figure class="highlight subunit"><table><tr><td class="code"><pre><span class="line">[stupig@hostname ~]$ sudo docker ps -a | grep pod<span class="string">-976</span>a0<span class="string">-5</span></span><br><span class="line">41020461ed4d Removal In Progress k8s_stupig_pod<span class="string">-976</span>a0<span class="string">-5</span>_default_f1e03a3d<span class="string">-0</span>dc7<span class="string">-11</span>eb-b4b1<span class="string">-246</span>e967c4efc_0</span><br><span class="line">f0a75e10b252 Exited (0) 2 minutes ago k8s_POD_pod<span class="string">-976</span>a0<span class="string">-5</span>_default_f1e03a3d<span class="string">-0</span>dc7<span class="string">-11</span>eb-b4b1<span class="string">-246</span>e967c4efc_0</span><br><span class="line">[stupig@hostname ~]$ sudo docker rm -f 41020461ed4d</span><br><span class="line"><span class="keyword">Error </span>response from daemon: container 41020461ed4d801afa8d10847a16907e65f6e8ca34d1704edf15b0d0e72bf4ef: driver "overlay2" failed to remove root filesystem: unlinkat /home/docker_rt/overlay2/e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged: device or resource busy</span><br></pre></td></tr></table></figure>
<p>问题已然清楚,现在我们有两种排查思路:</p>
<ul>
<li>参考Google上解决 <code>device or resource busy</code> 问题的思路</li>
<li>结合现象分析代码</li>
</ul>
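<h4 id="Google大法"><a href="#Google大法" class="headerlink" title="Google大法"></a>Asking Google</h4><p>When in doubt, ask Google. So we searched first, and the results showed that many people had run into similar problems.</p>
<p>The mainstream fix online is to set MountFlags to slave for the docker service, which prevents docker's mount points from leaking into other mnt namespaces. For the detailed reasoning, see <a href="https://blog.terminus.io/docker-device-is-busy/">this write-up on solving the docker device busy problem</a>.</p>
<p>Is it really that simple? Clearly not: a check showed that the docker service already had MountFlags set to slave. Once again, the silver bullet from the internet failed.</p>
<p>So we went back to analyzing the code against the evidence on the host.</p>
<h4 id="docker处理流程"><a href="#docker处理流程" class="headerlink" title="docker处理流程"></a>Docker's processing flow</h4><p>Before diving into the docker code, here is a brief overview of docker's processing flow, so that we do not stumble around blindly.</p>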
<p><img src="docker-procedure.png" alt="Docker's processing flow"></p>
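<p>With the processing flow clear, let us return to the scene.</p>
<h4 id="提审docker"><a href="#提审docker" class="headerlink" title="提审docker"></a>Interrogating docker</h4><p>The problem occurs in docker's cleanup phase: docker fails to remove the container's read-write layer with <code>device or resource busy</code>, which means the read-write layer was not unmounted correctly, or not unmounted completely. The following command confirms this:</p>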
<figure class="highlight gradle"><table><tr><td class="code"><pre><span class="line">[stupig@hostname ~]$ <span class="keyword">grep</span> -rwn <span class="string">'/home/docker_rt/overlay2/e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged'</span> <span class="regexp">/proc/</span>*/mountinfo</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">22283</span><span class="regexp">/mountinfo:50:386 542 0:92 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged rw,relatime - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">22407</span><span class="regexp">/mountinfo:50:386 542 0:92 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged rw,relatime - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">28454</span><span class="regexp">/mountinfo:50:386 542 0:92 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged rw,relatime - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">28530</span><span class="regexp">/mountinfo:50:386 542 0:92 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged rw,relatime - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br></pre></td></tr></table></figure>
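<p>As expected, the container's read-write layer is still mounted in the mount namespaces of the four processes above, which is why docker fails when it removes the read-write layer directory.</p>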
<p>The next question is why docker did not unmount the container's read-write layer correctly. Here is the relevant code from <code>docker stop</code> that unmounts it:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(daemon *Daemon)</span> <span class="title">Cleanup</span><span class="params">(container *container.Container)</span></span> {</span><br><span class="line"> <span class="keyword">if</span> err := daemon.conditionalUnmountOnCleanup(container); err != <span class="literal">nil</span> {</span><br><span class="line"> <span class="keyword">if</span> mountid, err := daemon.imageService.GetLayerMountID(container.ID, container.OS); err == <span class="literal">nil</span> {</span><br><span class="line"> daemon.cleanupMountsByID(mountid)</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(daemon *Daemon)</span> <span class="title">conditionalUnmountOnCleanup</span><span class="params">(container *container.Container)</span> <span class="title">error</span></span> {</span><br><span class="line"> <span class="keyword">return</span> daemon.Unmount(container)</span><br><span class="line">}</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(daemon *Daemon)</span> <span class="title">Unmount</span><span class="params">(container *container.Container)</span> <span class="title">error</span></span> {</span><br><span class="line"> <span class="keyword">if</span> container.RWLayer == <span class="literal">nil</span> {</span><br><span class="line"> <span class="keyword">return</span> errors.New(<span class="string">"RWLayer of container "</span> + container.ID + <span class="string">" is unexpectedly nil"</span>)</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> err := container.RWLayer.Unmount(); err != <span class="literal">nil</span> {</span><br><span class="line"> 
logrus.Errorf(<span class="string">"Error unmounting container %s: %s"</span>, container.ID, err)</span><br><span class="line"> <span class="keyword">return</span> err</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">}</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(rl *referencedRWLayer)</span> <span class="title">Unmount</span><span class="params">()</span> <span class="title">error</span></span> {</span><br><span class="line"> <span class="keyword">return</span> rl.layerStore.driver.Put(rl.mountedLayer.mountID)</span><br><span class="line">}</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(d *Driver)</span> <span class="title">Put</span><span class="params">(id <span class="keyword">string</span>)</span> <span class="title">error</span></span> {</span><br><span class="line"> d.locker.Lock(id)</span><br><span class="line"> <span class="keyword">defer</span> d.locker.Unlock(id)</span><br><span class="line"> dir := d.dir(id)</span><br><span class="line"> mountpoint := path.Join(dir, <span class="string">"merged"</span>)</span><br><span class="line"> logger := logrus.WithField(<span class="string">"storage-driver"</span>, <span class="string">"overlay2"</span>)</span><br><span class="line"> <span class="keyword">if</span> err := unix.Unmount(mountpoint, unix.MNT_DETACH); err != <span class="literal">nil</span> {</span><br><span class="line"> logger.Debugf(<span class="string">"Failed to unmount %s overlay: %s - %v"</span>, id, mountpoint, err)</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> err := unix.Rmdir(mountpoint); err != <span class="literal">nil</span> && !os.IsNotExist(err) {</span><br><span class="line"> logger.Debugf(<span class="string">"Failed to 
remove %s overlay: %v"</span>, id, err)</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure>
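<p>The code path is straightforward: docker eventually issues the <code>SYS_UMOUNT2</code> system call to unmount the container's read-write layer.</p>
<p>Yet docker reports an error when cleaning up the read-write layer, and the mount still appears in other processes. Did docker skip the unmount altogether? Let us check the docker logs:</p>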
<figure class="highlight routeros"><table><tr><td class="code"><pre><span class="line">Oct 14 10:50:28 hostname dockerd: <span class="attribute">time</span>=<span class="string">"2020-10-14T10:50:28.769199725+08:00"</span> <span class="attribute">level</span>=debug <span class="attribute">msg</span>=<span class="string">"Failed to unmount e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5 overlay: /home/docker_rt/overlay2/e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged - invalid argument"</span> <span class="attribute">storage-driver</span>=overlay2</span><br><span class="line">Oct 14 10:50:28 hostname dockerd: <span class="attribute">time</span>=<span class="string">"2020-10-14T10:50:28.769213547+08:00"</span> <span class="attribute">level</span>=debug <span class="attribute">msg</span>=<span class="string">"Failed to remove e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5 overlay: device or resource busy"</span> <span class="attribute">storage-driver</span>=overlay2</span><br></pre></td></tr></table></figure>
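<p>The logs show that docker hit an error when unmounting the container's read-write layer: <code>invalid argument</code>. According to the <a href="https://man7.org/linux/man-pages/man2/umount.2.html">umount2</a> documentation, this means that, surprisingly, the read-write layer is not a mount point as far as dockerd (the docker daemon) is concerned.</p>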
<p>Now, looking back at the processes that still hold the read-write layer mount, we find something striking:</p>
<figure class="highlight yaml"><table><tr><td class="code"><pre><span class="line">[<span class="string">stupig@hostname</span> <span class="string">~</span>]<span class="string">$</span> <span class="string">ps</span> <span class="string">-ef|grep</span> <span class="string">-E</span> <span class="string">"22283|22407|28454|28530"</span></span><br><span class="line"><span class="string">root</span> <span class="number">22283</span> <span class="number">1</span> <span class="number">0</span> <span class="number">10</span><span class="string">:48</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> <span class="string">docker-containerd-shim</span> <span class="string">-namespace</span> <span class="string">moby</span></span><br><span class="line"><span class="string">root</span> <span class="number">22407</span> <span class="number">1</span> <span class="number">0</span> <span class="number">10</span><span class="string">:48</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> <span class="string">docker-containerd-shim</span> <span class="string">-namespace</span> <span class="string">moby</span></span><br><span class="line"><span class="string">root</span> <span class="number">28454</span> <span class="number">1</span> <span class="number">0</span> <span class="number">10</span><span class="string">:49</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> <span class="string">docker-containerd-shim</span> <span class="string">-namespace</span> <span class="string">moby</span></span><br><span class="line"><span class="string">root</span> <span class="number">28530</span> <span class="number">1</span> <span class="number">0</span> <span class="number">10</span><span class="string">:49</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> <span 
class="string">docker-containerd-shim</span> <span class="string">-namespace</span> <span class="string">moby</span></span><br></pre></td></tr></table></figure>
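<p>The read-write layer mount does not appear in dockerd's mount namespace, yet it does appear in the namespaces of the shim processes that manage other containers. We therefore infer that dockerd was restarted, which can be verified by comparing process start times and namespace details:</p>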
<figure class="highlight angelscript"><table><tr><td class="code"><pre><span class="line"><span class="string">[stupig@hostname ~]</span>$ ps -eo pid,cmd,lstart|grep dockerd</span><br><span class="line"> <span class="number">34836</span> /usr/bin/dockerd --storage- Wed Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span>:<span class="number">15</span> <span class="number">2020</span></span><br><span class="line"> </span><br><span class="line"><span class="string">[stupig@hostname ~]</span>$ sudo ls -la /proc/$(pidof dockerd)/ns</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> ipc -> ipc:[<span class="number">4026531839</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> mnt -> mnt:[<span class="number">4026533327</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> net -> net:[<span class="number">4026531968</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> pid -> pid:[<span class="number">4026531836</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> user -> user:[<span class="number">4026531837</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span 
class="number">14</span> <span class="number">10</span>:<span class="number">50</span> uts -> uts:[<span class="number">4026531838</span>]</span><br><span class="line"> </span><br><span class="line"><span class="string">[stupig@hostname ~]</span>$ ps -eo pid,cmd,lstart|grep -w containerd|grep -v shim</span><br><span class="line"> <span class="number">34849</span> docker-containerd --config Wed Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span>:<span class="number">15</span> <span class="number">2020</span></span><br><span class="line"> </span><br><span class="line"><span class="string">[stupig@hostname ~]</span>$ sudo ls -la /proc/$(pidof docker-containerd)/ns</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> ipc -> ipc:[<span class="number">4026531839</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> mnt -> mnt:[<span class="number">4026533327</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> net -> net:[<span class="number">4026531968</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> pid -> pid:[<span class="number">4026531836</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> user -> user:[<span 
class="number">4026531837</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> uts -> uts:[<span class="number">4026531838</span>]</span><br><span class="line"> </span><br><span class="line"><span class="string">[stupig@hostname ~]</span>$ ps -eo pid,cmd,lstart|grep -w containerd-shim</span><br><span class="line"> <span class="number">22283</span> docker-containerd-shim -nam Wed Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">48</span>:<span class="number">50</span> <span class="number">2020</span></span><br><span class="line"> <span class="number">22407</span> docker-containerd-shim -nam Wed Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">48</span>:<span class="number">55</span> <span class="number">2020</span></span><br><span class="line"> <span class="number">28454</span> docker-containerd-shim -nam Wed Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">49</span>:<span class="number">53</span> <span class="number">2020</span></span><br><span class="line"> <span class="number">28530</span> docker-containerd-shim -nam Wed Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">49</span>:<span class="number">53</span> <span class="number">2020</span></span><br><span class="line"> </span><br><span class="line"><span class="string">[stupig@hostname ~]</span>$ sudo ls -la /proc/<span class="number">28454</span>/ns</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> ipc -> ipc:[<span class="number">4026531839</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span 
class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> mnt -> mnt:[<span class="number">4026533200</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> net -> net:[<span class="number">4026531968</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> pid -> pid:[<span class="number">4026531836</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> user -> user:[<span class="number">4026531837</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> uts -> uts:[<span class="number">4026531838</span>]</span><br><span class="line"> </span><br><span class="line"><span class="string">[stupig@hostname ~]</span>$ sudo ls -la /proc/$$/ns</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> stupig stupig <span class="number">0</span> Oct <span class="number">14</span> <span class="number">21</span>:<span class="number">49</span> ipc -> ipc:[<span class="number">4026531839</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> stupig stupig <span class="number">0</span> Oct <span class="number">14</span> <span class="number">21</span>:<span class="number">49</span> mnt -> mnt:[<span class="number">4026531840</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> stupig stupig <span class="number">0</span> Oct <span 
class="number">14</span> <span class="number">21</span>:<span class="number">49</span> net -> net:[<span class="number">4026531968</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> stupig stupig <span class="number">0</span> Oct <span class="number">14</span> <span class="number">21</span>:<span class="number">49</span> pid -> pid:[<span class="number">4026531836</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> stupig stupig <span class="number">0</span> Oct <span class="number">14</span> <span class="number">21</span>:<span class="number">49</span> user -> user:[<span class="number">4026531837</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> stupig stupig <span class="number">0</span> Oct <span class="number">14</span> <span class="number">21</span>:<span class="number">49</span> uts -> uts:[<span class="number">4026531838</span>]</span><br></pre></td></tr></table></figure>
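<p>The results confirm our inference: dockerd and containerd started at 10:50:15 and share one mnt namespace, while the shim processes started earlier and live in a different one. To explain this, here is the process tree model of the docker components:</p>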
<figure class="highlight brainfuck"><table><tr><td class="code"><pre><span class="line"> <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span> </span><br><span class="line"> <span class="comment">|</span> <span class="comment">dockerd</span> <span class="comment">|</span> </span><br><span class="line"> <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="comment">|</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span> </span><br><span class="line"> <span class="comment">|</span> </span><br><span class="line"> <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="comment">|</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span> </span><br><span class="line"> <span class="comment">|</span> <span class="comment">containerd</span> <span class="comment">|</span> </span><br><span class="line"> <span class="literal">+</span>--<span class="literal">-</span><span class="comment">|</span>--<span class="comment">|</span>--<span class="literal">-</span><span class="comment">|</span>--<span class="literal">+</span> </span><br><span class="line"> <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span 
class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span> <span class="comment">|</span> <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span> </span><br><span class="line"><span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="comment">|</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span> <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="comment">|</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span> <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span 
class="literal">-</span><span class="comment">|</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span></span><br><span class="line"><span class="comment">|</span> <span class="comment">containerd</span><span class="literal">-</span><span class="comment">shim</span> <span class="comment">|</span> <span class="comment">|</span> <span class="comment">containerd</span><span class="literal">-</span><span class="comment">shim</span> <span class="comment">|</span> <span class="comment">|</span> <span class="comment">containerd</span><span class="literal">-</span><span class="comment">shim</span> <span class="comment">|</span></span><br><span class="line"><span class="comment"></span><span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span> <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span> <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span 
class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span></span><br><span class="line"></span><br></pre></td></tr></table></figure>
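<p>When dockerd starts, it automatically launches containerd. When a user creates and starts a container, containerd starts a containerd-shim process to supervise the container process, and containerd-shim in turn invokes runc to start it. runc is responsible for initializing the process namespaces and exec'ing the container's start command.</p>
<p>The shim exists in this model so that dockerd/containerd can be upgraded or restarted without affecting running containers. docker exposes this capability as <code>live-restore</code>, and our cluster does indeed enable that option.</p>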
<p>In addition, because we set <code>MountFlags=slave</code> in docker's systemd unit options, systemd creates a new mnt namespace when it starts the dockerd process; see the <a href="https://freedesktop.org/software/systemd/man/systemd.exec.html#MountFlags=">systemd documentation</a>.</p>
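<p>At this point the problem is essentially pinned down:</p>
<ul>
<li>When systemd starts the dockerd service, it places dockerd in a new mnt namespace</li>
<li>When a user creates and starts a container, dockerd mounts the container's read-write layer directory inside its own mnt namespace and starts a shim process to supervise the container process</li>
<li>For some reason the dockerd service restarts, and systemd places it in yet another new mnt namespace</li>
<li>When the user deletes the container and the container exits, dockerd fails to clean up the read-write layer mount, because the mount does not exist in the current dockerd's mnt namespace</li>
</ul>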
<p>Later we also found <a href="https://github.com/moby/moby/issues/35873#issuecomment-386467562">an official explanation</a> in a docker issue: <code>MountFlags=slave</code> and <code>live-restore</code> indeed cannot be used together.</p>
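<p>The sequence of events can be replayed with a toy model. This is only an illustration under simplifying assumptions: a mount namespace is modelled as a plain set of mount points, the type <code>mntNS</code> and its methods are hypothetical, the path is made up, and <code>errNotMounted</code> merely stands in for the kernel's <code>EINVAL</code>. No real mounts are performed.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// errNotMounted stands in for EINVAL from umount2 (an assumption of this model).
var errNotMounted = errors.New("invalid argument")

// mntNS is a toy mount namespace: a set of mount points.
type mntNS map[string]bool

// newSlaveNS models what systemd does with MountFlags=slave: the new
// namespace starts as a copy of the host's mounts, and mounts made
// inside it later are not propagated back to the host.
func newSlaveNS(host mntNS) mntNS {
	ns := mntNS{}
	for m := range host {
		ns[m] = true
	}
	return ns
}

func (ns mntNS) mount(p string) { ns[p] = true }

func (ns mntNS) unmount(p string) error {
	if !ns[p] {
		return errNotMounted
	}
	delete(ns, p)
	return nil
}

func main() {
	host := mntNS{"/": true}
	const merged = "/var/lib/docker/overlay2/demo/merged" // hypothetical path

	dockerdBefore := newSlaveNS(host) // systemd starts dockerd in a new namespace
	dockerdBefore.mount(merged)       // container start mounts the read-write layer
	shim := dockerdBefore             // the shim lives on in that same namespace

	dockerdAfter := newSlaveNS(host) // dockerd restarts: a fresh copy of the host

	fmt.Println(dockerdAfter.unmount(merged)) // prints "invalid argument"
	fmt.Println(shim[merged])                 // prints "true": the mount is still pinned
}
```

<p>The model shows why the restart matters: the new dockerd never had the mount, while the surviving shim keeps it alive, so the later directory removal finds the path busy.</p>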
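<h4 id="一波又起"><a href="#一波又起" class="headerlink" title="一波又起"></a>Another twist</h4><p>Before we could savor having solved the problem, another question came up. Many hosts in our production clusters have both <code>MountFlags=slave</code> and <code>live-restore=true</code> configured, so why did the problem only surface recently?</p>
<p>After analyzing the hosts involved in several <code>Pod Terminating</code> incidents, we found they had one thing in common: their docker version was <code>18.06.3-ce</code>, while our mainstream version is still <code>1.13.1</code>.</p>
<p>Was the problem introduced only in the newer version? We first verified docker <code>1.13.1</code> in a test environment, and the Pod indeed did not get stuck in the Terminating state. Does that mean the older docker has no mount point leak?</p>
<p>Not so. In a second round of verification, we recorded the test container's read-write layer before deleting the Pod and then issued the delete. The Pod exited cleanly, but when we logged in to the host where the Pod had been running, the docker logs contained the same kind of message:</p>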
<figure class="highlight apache"><table><tr><td class="code"><pre><span class="line"><span class="attribute">Oct</span> <span class="number">14</span> <span class="number">22</span>:<span class="number">12</span>:<span class="number">43</span> hostname<span class="number">2</span> dockerd: time=<span class="string">"2020-10-14T22:12:43.730726978+08:00"</span> level=debug msg=<span class="string">"Failed to unmount fb41efa2cfcbfbb8d90bd1d8d77d299e17518829faf52af40f7a1552ec8aa165 overlay: /home/docker_rt/overlay2/fb41efa2cfcbfbb8d90bd1d8d77d299e17518829faf52af40f7a1552ec8aa165/merged - invalid argument"</span></span><br><span class="line"></span><br></pre></td></tr></table></figure>
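<p>With the same unmount failure present, the older and newer docker versions produce different outcomes, which clearly means docker's handling logic changed. Comparing the source code quickly gives the answer:</p>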
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// Handling logic in version 1.13.1</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(daemon *Daemon)</span> <span class="title">cleanupContainer</span><span class="params">(container *container.Container, forceRemove, removeVolume <span class="keyword">bool</span>)</span> <span class="params">(err error)</span></span> {</span><br><span class="line"> <span class="comment">// If force removal is required, delete container from various</span></span><br><span class="line"> <span class="comment">// indexes even if removal failed.</span></span><br><span class="line"> <span class="keyword">defer</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> {</span><br><span class="line"> <span class="keyword">if</span> err == <span class="literal">nil</span> || forceRemove {</span><br><span class="line"> daemon.nameIndex.Delete(container.ID)</span><br><span class="line"> daemon.linkIndex.<span class="built_in">delete</span>(container)</span><br><span class="line"> selinuxFreeLxcContexts(container.ProcessLabel)</span><br><span class="line"> daemon.idIndex.Delete(container.ID)</span><br><span class="line"> daemon.containers.Delete(container.ID)</span><br><span class="line"> <span class="keyword">if</span> e := daemon.removeMountPoints(container, removeVolume); e != <span class="literal">nil</span> {</span><br><span class="line"> logrus.Error(e)</span><br><span class="line"> }</span><br><span class="line"> daemon.LogContainerEvent(container, <span class="string">"destroy"</span>)</span><br><span class="line"> }</span><br><span class="line"> }()</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">if</span> err = os.RemoveAll(container.Root); err != <span class="literal">nil</span> {</span><br><span class="line"> <span class="keyword">return</span> 
fmt.Errorf(<span class="string">"Unable to remove filesystem for %v: %v"</span>, container.ID, err)</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// When container creation fails and `RWLayer` has not been created yet, we</span></span><br><span class="line"> <span class="comment">// do not call `ReleaseRWLayer`</span></span><br><span class="line"> <span class="keyword">if</span> container.RWLayer != <span class="literal">nil</span> {</span><br><span class="line"> metadata, err := daemon.layerStore.ReleaseRWLayer(container.RWLayer)</span><br><span class="line"> layer.LogReleaseMetadata(metadata)</span><br><span class="line"> <span class="keyword">if</span> err != <span class="literal">nil</span> && err != layer.ErrMountDoesNotExist {</span><br><span class="line"> <span class="keyword">return</span> fmt.Errorf(<span class="string">"Driver %s failed to remove root filesystem %s: %s"</span>, daemon.GraphDriverName(), container.ID, err)</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">}</span><br><span class="line"> </span><br><span class="line"> </span><br><span class="line"><span class="comment">// Handling logic in version 18.06.3-ce</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(daemon *Daemon)</span> <span class="title">cleanupContainer</span><span class="params">(container *container.Container, forceRemove, removeVolume <span class="keyword">bool</span>)</span> <span class="params">(err error)</span></span> {</span><br><span class="line"> <span class="comment">// When container creation fails and `RWLayer` has not been created yet, we</span></span><br><span class="line"> <span class="comment">// do not call `ReleaseRWLayer`</span></span><br><span class="line"> <span 
class="keyword">if</span> container.RWLayer != <span class="literal">nil</span> {</span><br><span class="line"> err := daemon.imageService.ReleaseLayer(container.RWLayer, container.OS)</span><br><span class="line"> <span class="keyword">if</span> err != <span class="literal">nil</span> {</span><br><span class="line"> err = errors.Wrapf(err, <span class="string">"container %s"</span>, container.ID)</span><br><span class="line"> container.SetRemovalError(err)</span><br><span class="line"> <span class="keyword">return</span> err</span><br><span class="line"> }</span><br><span class="line"> container.RWLayer = <span class="literal">nil</span></span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">if</span> err := system.EnsureRemoveAll(container.Root); err != <span class="literal">nil</span> {</span><br><span class="line"> e := errors.Wrapf(err, <span class="string">"unable to remove filesystem for %s"</span>, container.ID)</span><br><span class="line"> container.SetRemovalError(e)</span><br><span class="line"> <span class="keyword">return</span> e</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> linkNames := daemon.linkIndex.<span class="built_in">delete</span>(container)</span><br><span class="line"> selinuxFreeLxcContexts(container.ProcessLabel)</span><br><span class="line"> daemon.idIndex.Delete(container.ID)</span><br><span class="line"> daemon.containers.Delete(container.ID)</span><br><span class="line"> daemon.containersReplica.Delete(container)</span><br><span class="line"> <span class="keyword">if</span> e := daemon.removeMountPoints(container, removeVolume); e != <span class="literal">nil</span> {</span><br><span class="line"> logrus.Error(e)</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> _, name := <span class="keyword">range</span> linkNames {</span><br><span class="line"> 
daemon.releaseName(name)</span><br><span class="line"> }</span><br><span class="line"> container.SetRemoved()</span><br><span class="line"> stateCtr.del(container.ID)</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure>
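<p>The change is easy to see, and upstream explains it in detail in the <a href="https://github.com/moby/moby/pull/31012">container cleanup change</a>. In other words, the problem did exist in the older docker version; it was merely hidden, and the newer version exposes it.</p>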
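<p>The control-flow difference between the two versions can be sketched in a few lines of Go. This is only an illustration under stated assumptions: <code>registry</code>, <code>cleanupOld</code>, <code>cleanupNew</code> and the <code>release</code> callback are hypothetical stand-ins, not docker's real types or functions.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// registry is a hypothetical stand-in for the daemon's in-memory container indexes.
type registry map[string]bool

var errBusy = errors.New("device or resource busy")

// cleanupOld mimics the 1.13.1 flow: a deferred function drops the index entry
// when removal succeeded OR when forceRemove is set, even if err is non-nil.
func cleanupOld(r registry, id string, forceRemove bool, release func() error) (err error) {
	defer func() {
		if err == nil || forceRemove {
			delete(r, id)
		}
	}()
	if err = release(); err != nil {
		return fmt.Errorf("failed to remove root filesystem %s: %w", id, err)
	}
	return nil
}

// cleanupNew mimics the 18.06.3-ce flow: a release error returns early,
// so the index entry survives and the failure stays visible to the caller.
func cleanupNew(r registry, id string, forceRemove bool, release func() error) error {
	if err := release(); err != nil {
		return fmt.Errorf("container %s: %w", id, err)
	}
	delete(r, id)
	return nil
}

func main() {
	busy := func() error { return errBusy } // simulated leaked mount point
	oldReg := registry{"c1": true}
	newReg := registry{"c1": true}
	fmt.Println(cleanupOld(oldReg, "c1", true, busy), "entries left:", len(oldReg))
	fmt.Println(cleanupNew(newReg, "c1", true, busy), "entries left:", len(newReg))
}
```

<p>With a forced removal, the old flow still forgets the container although the release failed, while the new flow keeps the entry and reports the error. That matches the observation that the older version hides the problem.</p>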
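<h3 id="问题影响"><a href="#问题影响" class="headerlink" title="问题影响"></a>Impact</h3><p>Since every docker version has this problem, what is its impact?</p>
<p>In newer docker versions the impact is explicit: container cleanup fails, which in turn makes Pod deletion fail.</p>
<p>In older docker versions the impact is implicit: mount points leak, which may lead to the following:</p>
<ul>
<li>Inode exhaustion: because the mount point leaks, the container's read-write layer is never cleaned up, and over a long period this may exhaust inodes, although that is a low-probability event</li>
<li>Container ID reuse: because the mount point was never unmounted, if docker reuses the ID of a container that has already exited, mounting the container's init layer and read-write layer fails. Since docker generates container IDs randomly, this is also a low-probability event</li>
</ul>
<h3 id="解决方案"><a href="#解决方案" class="headerlink" title="解决方案"></a>Solutions</h3><p>With the problem identified, fixing it became the priority. There are two approaches:</p>
<ol>
<li>Treat the symptom: follow the <code>1.13.1</code> logic and modify the <code>18.06.3-ce</code> cleanup code</li>
<li>Treat the cause: since upstream also states that <code>MountFlags=slave</code> and <code>live-restore</code> cannot be used together, change one of the two settings</li>
</ol>
<p>Given the hard requirement to <strong>restart docker without restarting containers</strong>, our only option seems to be disabling <code>MountFlags=slave</code>. Disabling it raises the following questions:</p>
<ul>
<li>Does it actually solve this problem?</li>
<li>As widely reported online, will other systemd-managed services that enable PrivateTmp cause mount point leaks?</li>
</ul>
<p>To find out, read the next installment!</p>
<p><a href="https://plpan.github.io/%E5%88%A0%E9%99%A4%E5%AE%B9%E5%99%A8%E6%8A%A5%E9%94%99-device-or-resource-busy-%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">Next article</a></p>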
]]></content>
<categories>
<category>问题排查</category>
</categories>
<tags>
<tag>kubernetes</tag>
<tag>docker</tag>
</tags>
</entry>
<entry>
<title>Troubleshooting the device or resource busy error when removing a container</title>
<url>/%E5%88%A0%E9%99%A4%E5%AE%B9%E5%99%A8%E6%8A%A5%E9%94%99-device-or-resource-busy-%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/</url>
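<content><![CDATA[<h3 id="背景"><a href="#背景" class="headerlink" title="背景"></a>Background</h3><p>This continues the <a href="https://plpan.github.io/pod-terminating-%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">previous article</a>. While investigating several recent production incidents on Elastic Cloud, we found that they were caused by a combination of factors:</p>
<ul>
<li>Elastic Cloud is gradually rolling out a docker upgrade to <code>18.06.3-ce</code></li>
<li>For historical reasons, Elastic Cloud enables the systemd option <code>MountFlags=slave</code> for the docker service</li>
<li>To keep a dockerd restart from recreating business containers, Elastic Cloud enables <code>live-restore=true</code>; once the docker service restarts, dockerd and the shim processes end up in different mount namespaces</li>
</ul>
<p>With these three factors combined, container removal failed in production when containers were recreated or migrated.</p>
<p>The previous article also ended with two possible solutions:</p>
<ul>
<li>Long pain: modify the code and ignore the error</li>
<li>Short pain: change the configuration and be done with it</li>
</ul>
<p>Naturally, we chose the short pain! According to the upstream note, <code>MountFlags=slave</code> and <code>live-restore=true</code> do not work together, so disabling either one should solve the problem.</p>
<p>For us, the <code>live-restore</code> capability is a key docker feature. Docker can restart for many reasons, from manual debugging to unexpected machine behavior, and when it does we do not want users' containers to be recreated. Disabling <code>MountFlags=slave</code> therefore looks like our only choice.</p>
<p>But wait. Recall the <a href="https://blog.terminus.io/docker-device-is-busy/">docker device busy solution</a>: its authors enabled this option precisely to avoid container removal failures caused by leaked docker mounts.</p>
<p>Does that conclusion from 2017 still hold in general? We can simply verify it ourselves.</p>
<h3 id="对比实验"><a href="#对比实验" class="headerlink" title="对比实验"></a>Comparison experiment</h3><p>To check whether docker leaks mount points once <code>MountFlags=slave</code> is disabled, we picked one host running <code>1.13.1</code> and one running <code>18.06.3-ce</code>. The steps follow the <a href="https://blog.terminus.io/docker-device-is-busy/">docker device busy solution</a>. Before the test, prepare the environment as follows:</p>
<ul>
<li>Remove <code>MountFlags=slave</code> from the docker service's systemd configuration</li>
<li>Pick any service that sets the systemd option <code>PrivateTmp=true</code>; this article uses <code>httpd</code></li>
</ul>
<p>Now for the verification:</p>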
<figure class="highlight awk"><table><tr><td class="code"><pre><span class="line"><span class="regexp">//</span><span class="regexp">//</span><span class="regexp">//</span> docker <span class="number">1.13</span>.<span class="number">1</span> 验证步骤及结果</span><br><span class="line"><span class="regexp">//</span> <span class="number">1</span>. 重新加载配置</span><br><span class="line">[stupig@hostname2 ~]$ sudo systemctl daemon-reload</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">2</span>. 重启docker</span><br><span class="line">[stupig@hostname2 ~]$ sudo systemctl restart docker</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">3</span>. 创建容器</span><br><span class="line">[stupig@hostname2 ~]$ sudo docker run -d nginx</span><br><span class="line">c89c2aeff6e3e6414dfc7f448b4a560b4aac96d69a82ba021b78ee576bf6771c</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">4</span>. 重启httpd</span><br><span class="line">[stupig@hostname2 ~]$ sudo systemctl restart httpd</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">5</span>. 停止容器</span><br><span class="line">[stupig@hostname2 ~]$ sudo docker stop c89c2aeff6e3e6414dfc7f448b4a560b4aac96d69a82ba021b78ee576bf6771c</span><br><span class="line">c89c2aeff6e3e6414dfc7f448b4a560b4aac96d69a82ba021b78ee576bf6771c</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">6</span>. 
清理容器</span><br><span class="line">[stupig@hostname2 ~]$ sudo docker rm c89c2aeff6e3e6414dfc7f448b4a560b4aac96d69a82ba021b78ee576bf6771c</span><br><span class="line">Error response from daemon: Driver overlay2 failed to remove root filesystem c89c2aeff6e3e6414dfc7f448b4a560b4aac96d69a82ba021b78ee576bf6771c: remove <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged: device or resource busy</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">7</span>. 定位挂载点</span><br><span class="line">[stupig@hostname2 ~]$ grep -rwn <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346<span class="regexp">/merged /</span>proc<span class="regexp">/*/m</span>ountinfo</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">19973</span><span class="regexp">/mountinfo:40:231 227 0:40 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged rw,relatime shared:<span class="number">119</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">19974</span><span class="regexp">/mountinfo:40:231 227 0:40 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged rw,relatime shared:<span class="number">119</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">19975</span><span class="regexp">/mountinfo:40:231 227 0:40 
/</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged rw,relatime shared:<span class="number">119</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">19976</span><span class="regexp">/mountinfo:40:231 227 0:40 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged rw,relatime shared:<span class="number">119</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">19977</span><span class="regexp">/mountinfo:40:231 227 0:40 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged rw,relatime shared:<span class="number">119</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">19978</span><span class="regexp">/mountinfo:40:231 227 0:40 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged rw,relatime shared:<span class="number">119</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">8</span>. 
定位目标进程</span><br><span class="line">[stupig@hostname2 ~]$ ps -ef|egrep <span class="string">'19973|19974|19975|19976|19977|19978'</span></span><br><span class="line">root <span class="number">19973</span> <span class="number">1</span> <span class="number">0</span> <span class="number">15</span>:<span class="number">13</span> ? <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line">apache <span class="number">19974</span> <span class="number">19973</span> <span class="number">0</span> <span class="number">15</span>:<span class="number">13</span> ? <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line">apache <span class="number">19975</span> <span class="number">19973</span> <span class="number">0</span> <span class="number">15</span>:<span class="number">13</span> ? <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line">apache <span class="number">19976</span> <span class="number">19973</span> <span class="number">0</span> <span class="number">15</span>:<span class="number">13</span> ? <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line">apache <span class="number">19977</span> <span class="number">19973</span> <span class="number">0</span> <span class="number">15</span>:<span class="number">13</span> ? 
<span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line">apache <span class="number">19978</span> <span class="number">19973</span> <span class="number">0</span> <span class="number">15</span>:<span class="number">13</span> ? <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br></pre></td></tr></table></figure>
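<p>The result for docker <code>1.13.1</code> matches what the online article predicts: the container's read-write layer mount point leaks, and <code>docker rm</code> cannot clean up the container (note that <code>docker rm -f</code> still can; see the previous article for the reason).</p>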
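<p>Steps 7 and 8 of the experiment, which locate the processes still holding the leaked mount, can also be scripted. The following is a minimal sketch, assuming the standard <code>/proc/[pid]/mountinfo</code> layout in which the fifth field is the mount point; the helper name <code>hasMount</code> is made up for this example.</p>

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// hasMount reports whether mountinfo text contains a mount whose mount point
// (the fifth whitespace-separated field) equals target.
func hasMount(mountinfo, target string) bool {
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) > 4 && f[4] == target {
			return true
		}
	}
	return false
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: holders <mount-point>")
		return
	}
	// Every process exposes the mounts of its own mount namespace here.
	files, _ := filepath.Glob("/proc/[0-9]*/mountinfo")
	for _, f := range files {
		data, err := os.ReadFile(f)
		if err != nil {
			continue // the process exited or we lack permission
		}
		if hasMount(string(data), os.Args[1]) {
			// The directory name under /proc is the PID.
			fmt.Println(filepath.Base(filepath.Dir(f)))
		}
	}
}
```

<p>Run as root after the container has stopped and pass the <code>merged</code> directory as the argument; any PID printed belongs to a process whose mount namespace still contains the mount.</p>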
<p>Elastic Cloud enabled the docker setting <code>MountFlags=slave</code> precisely to avoid this problem.</p>
<p>So the pressure now shifts to docker <code>18.06.3-ce</code>: does the newer version still have this problem?</p>
<figure class="highlight llvm"><table><tr><td class="code"><pre><span class="line">////// docker <span class="number">18.06</span>.<span class="number">3</span>-ce 验证步骤及结果</span><br><span class="line">[stupig<span class="title">@hostname</span> ~]$ sudo systemctl daemon-reload</span><br><span class="line"> </span><br><span class="line">[stupig<span class="title">@hostname</span> ~]$ sudo systemctl restart docker</span><br><span class="line"> </span><br><span class="line">[stupig<span class="title">@hostname</span> ~]$ sudo docker run -d nginx</span><br><span class="line"><span class="number">718114321</span>d<span class="number">67</span>a<span class="number">817</span><span class="keyword">c</span><span class="number">1498e530</span>b<span class="number">943</span><span class="keyword">c</span><span class="number">2514</span>ed<span class="number">4200</span>f<span class="number">2</span>d<span class="number">0</span>d<span class="number">138880</span>f<span class="number">8</span><span class="keyword">c</span><span class="number">345</span>df<span class="number">7048</span>f</span><br><span class="line"> </span><br><span class="line">[stupig<span class="title">@hostname</span> ~]$ sudo systemctl restart httpd</span><br><span class="line"> </span><br><span class="line">[stupig<span class="title">@hostname</span> ~]$ sudo docker stop <span class="number">718114321</span>d<span class="number">67</span>a<span class="number">817</span><span class="keyword">c</span><span class="number">1498e530</span>b<span class="number">943</span><span class="keyword">c</span><span class="number">2514</span>ed<span class="number">4200</span>f<span class="number">2</span>d<span class="number">0</span>d<span class="number">138880</span>f<span class="number">8</span><span class="keyword">c</span><span class="number">345</span>df<span class="number">7048</span>f</span><br><span class="line"><span class="number">718114321</span>d<span class="number">67</span>a<span 
class="number">817</span><span class="keyword">c</span><span class="number">1498e530</span>b<span class="number">943</span><span class="keyword">c</span><span class="number">2514</span>ed<span class="number">4200</span>f<span class="number">2</span>d<span class="number">0</span>d<span class="number">138880</span>f<span class="number">8</span><span class="keyword">c</span><span class="number">345</span>df<span class="number">7048</span>f</span><br><span class="line"> </span><br><span class="line">[stupig<span class="title">@hostname</span> ~]$ sudo docker rm <span class="number">718114321</span>d<span class="number">67</span>a<span class="number">817</span><span class="keyword">c</span><span class="number">1498e530</span>b<span class="number">943</span><span class="keyword">c</span><span class="number">2514</span>ed<span class="number">4200</span>f<span class="number">2</span>d<span class="number">0</span>d<span class="number">138880</span>f<span class="number">8</span><span class="keyword">c</span><span class="number">345</span>df<span class="number">7048</span>f</span><br><span class="line"><span class="number">718114321</span>d<span class="number">67</span>a<span class="number">817</span><span class="keyword">c</span><span class="number">1498e530</span>b<span class="number">943</span><span class="keyword">c</span><span class="number">2514</span>ed<span class="number">4200</span>f<span class="number">2</span>d<span class="number">0</span>d<span class="number">138880</span>f<span class="number">8</span><span class="keyword">c</span><span class="number">345</span>df<span class="number">7048</span>f</span><br></pre></td></tr></table></figure>
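<p>The experiment on docker <code>18.06.3-ce</code> went smoothly, with no problems at all. Recall from the previous article that once the read-write layer mount point leaks, container cleanup on docker <code>18.06.3-ce</code> is bound to fail. Here it succeeded, which means the read-write layer mount point did not leak.</p>
<p>This was the first light of dawn.</p>
<h3 id="蛛丝马迹"><a href="#蛛丝马迹" class="headerlink" title="蛛丝马迹"></a>Clues</h3><p>The result of the comparison experiment was very encouraging. In this section we explore how the two docker versions differ in behavior, in order to pin down the crux of the problem.</p>
<p>Since the core question is whether the mount point leaks, we take mount points as the starting point and analyze the differences between the two versions. We compared the mount details seen by different processes after <code>step 4</code> in both environments:</p>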
<figure class="highlight gradle"><table><tr><td class="code"><pre><span class="line"><span class="comment">// docker 1.13.1</span></span><br><span class="line">[stupig@hostname2 ~]$ sudo docker run -d nginx</span><br><span class="line"><span class="number">0</span>fe8d412f99a53229ea0df3ec44c93496e150a39f724ea304adb7f924910d61b</span><br><span class="line"> </span><br><span class="line">[stupig@hostname2 ~]$ sudo docker <span class="keyword">inspect</span> -f {{.GraphDriver.Data.MergedDir}} <span class="number">0</span>fe8d412f99a53229ea0df3ec44c93496e150a39f724ea304adb7f924910d61b</span><br><span class="line"><span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">4</span>e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97/merged</span><br><span class="line"> </span><br><span class="line"><span class="comment">// shared namespace</span></span><br><span class="line">[stupig@hostname2 ~]$ <span class="keyword">grep</span> -rw <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">4</span>e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97<span class="regexp">/merged /</span>proc<span class="regexp">/$$/m</span>ountinfo</span><br><span class="line"><span class="number">223</span> <span class="number">1143</span> <span class="number">0</span>:<span class="number">40</span> <span class="regexp">/ /</span>home<span class="regexp">/docker_rt/</span>overlay2<span class="regexp">/4e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97/m</span>erged rw,relatime - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"> </span><br><span class="line">[stupig@hostname2 ~]$ sudo systemctl restart httpd</span><br><span class="line"> </span><br><span class="line">[stupig@hostname2 ~]$ ps -ef|<span class="keyword">grep</span> httpd|head -n <span class="number">1</span></span><br><span class="line">root <span class="number">16715</span> <span 
class="number">1</span> <span class="number">2</span> <span class="number">16</span>:<span class="number">09</span> ? <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line"> </span><br><span class="line"><span class="comment">// httpd process namespace</span></span><br><span class="line">[stupig@hostname2 ~]$ <span class="keyword">grep</span> -rw <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">4</span>e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97<span class="regexp">/merged /</span>proc<span class="regexp">/16715/m</span>ountinfo</span><br><span class="line"><span class="number">257</span> <span class="number">235</span> <span class="number">0</span>:<span class="number">40</span> <span class="regexp">/ /</span>home<span class="regexp">/docker_rt/</span>overlay2<span class="regexp">/4e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97/m</span>erged rw,relatime shared:<span class="number">123</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<figure class="highlight gradle"><table><tr><td class="code"><pre><span class="line"><span class="comment">// docker 18.06.3-ce</span></span><br><span class="line">[stupig@hostname ~]$ sudo docker run -d nginx</span><br><span class="line">ce75d4fdb6df6d13a7bf4270f71b3752ee2d3849df1f64d5d5d19a478ac7db8d</span><br><span class="line"> </span><br><span class="line">[stupig@hostname ~]$ sudo docker <span class="keyword">inspect</span> -f {{.GraphDriver.Data.MergedDir}} ce75d4fdb6df6d13a7bf4270f71b3752ee2d3849df1f64d5d5d19a478ac7db8d</span><br><span class="line"><span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76/merged</span><br><span class="line"> </span><br><span class="line"><span class="comment">// shared namespace</span></span><br><span class="line">[stupig@hostname ~]$ <span class="keyword">grep</span> -rw <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76<span class="regexp">/merged /</span>proc<span class="regexp">/$$/m</span>ountinfo</span><br><span class="line"><span class="number">218</span> <span class="number">43</span> <span class="number">0</span>:<span class="number">105</span> <span class="regexp">/ /</span>home<span class="regexp">/docker_rt/</span>overlay2<span class="regexp">/a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76/m</span>erged rw,relatime shared:<span class="number">109</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"> </span><br><span class="line">[stupig@hostname ~]$ sudo systemctl restart httpd</span><br><span class="line"> </span><br><span class="line">[stupig@hostname ~]$ ps -ef|<span class="keyword">grep</span> httpd|head -n <span class="number">1</span></span><br><span class="line">root <span class="number">63694</span> <span class="number">1</span> <span class="number">0</span> <span 
class="number">16</span>:<span class="number">14</span> ? <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line"> </span><br><span class="line"><span class="comment">// httpd process namespace</span></span><br><span class="line">[stupig@hostname ~]$ <span class="keyword">grep</span> -rw <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76<span class="regexp">/merged /</span>proc<span class="regexp">/63694/m</span>ountinfo</span><br><span class="line"><span class="number">435</span> <span class="number">376</span> <span class="number">0</span>:<span class="number">105</span> <span class="regexp">/ /</span>home<span class="regexp">/docker_rt/</span>overlay2<span class="regexp">/a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76/m</span>erged rw,relatime shared:<span class="number">122</span> master:<span class="number">109</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br></pre></td></tr></table></figure>
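<p>At first glance there seems to be no difference. Look closely: can you spot it?</p>
<p>A careful comparison makes the differences easy to tell apart:</p>
<ul>
<li>In the shared namespace<ul>
<li>the mount point created by docker <code>18.06.3-ce</code> is shared</li>
<li>whereas the mount point created by docker <code>1.13.1</code> is private</li>
</ul>
</li>
<li>In the httpd process's namespace<ul>
<li>the mount point created by docker <code>18.06.3-ce</code> is still shared, and it also receives mount and unmount events propagated from peer group 109 (<code>master:109</code>). Note that peer group 109 is exactly the corresponding mount point in the shared namespace</li>
<li>the mount point created by docker <code>1.13.1</code> is also shared, but it has no relationship with the corresponding mount point in the shared namespace</li>
</ul>
</li>
</ul>
<p>Some readers may ask how to tell which type a mount point is, and how each type propagates events. See the <a href="https://man7.org/linux/man-pages/man7/mount_namespaces.7.html">mount namespaces manual page</a>.</p>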
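<p>As a rough aid for reading such output, the propagation type can be pulled out of a mountinfo line programmatically. This is a minimal sketch, assuming the documented layout in which the optional fields (<code>shared:N</code>, <code>master:N</code>, and so on) sit between the sixth field and the <code>-</code> separator; the function name <code>propagation</code> is invented for this example, and the sample lines are abridged from the output above.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// propagation returns the optional fields of one mountinfo line, for example
// "shared:122 master:109". A mount with no optional fields is private.
func propagation(line string) string {
	f := strings.Fields(line)
	var tags []string
	// Optional fields start at index 6 and end at the "-" separator.
	for i := 6; i < len(f) && f[i] != "-"; i++ {
		tags = append(tags, f[i])
	}
	if len(tags) == 0 {
		return "private"
	}
	return strings.Join(tags, " ")
}

func main() {
	// Paths shortened for readability; tags are the ones observed above.
	lines := []string{
		"223 1143 0:40 / /merged rw,relatime - overlay overlay rw",                       // 1.13.1, shared namespace
		"257 235 0:40 / /merged rw,relatime shared:123 - overlay overlay rw",             // 1.13.1, httpd namespace
		"218 43 0:105 / /merged rw,relatime shared:109 - overlay overlay rw",             // 18.06.3-ce, shared namespace
		"435 376 0:105 / /merged rw,relatime shared:122 master:109 - overlay overlay rw", // 18.06.3-ce, httpd namespace
	}
	for _, l := range lines {
		fmt.Println(propagation(l))
	}
}
```

<p>Only the last line carries a <code>master:</code> tag, which is what ties the httpd copy of the mount to the original and lets the unmount reach it.</p>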
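<p>The picture is now clear: the read-write layer mount points created by the two docker versions have different propagation properties, and that is what makes them behave differently.</p>
<h3 id="刨根问底"><a href="#刨根问底" class="headerlink" title="刨根问底"></a>Getting to the bottom of it</h3><p>If you followed the previous section, you already understand the essence of the problem. In this section we keep digging for the root cause.</p>
<p>Why do the two docker versions behave differently? There are two main candidates:</p>
<ol>
<li>docker's handling logic changed</li>
<li>The host environments differ, mainly the kernel</li>
</ol>
<p>The second is easy to rule out: we compared the kernel versions of the two test hosts and they are identical. So the difference almost certainly comes from changes in the docker code. In theory, we could go through every commit between docker <code>1.13.1</code> and docker <code>18.06.3-ce</code> one by one and would be sure to find the key commit; brute force does work miracles.</p>
<p>Still, we would rather find useful information on the scene and narrow the search.</p>
<p>Starting again from the mount point: since the mount points created by the two versions already differ in the shared namespace, we follow the trail and check whether the chain of mount points above the container's read-write layer differs as well:</p>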
<figure class="highlight awk"><table><tr><td class="code"><pre><span class="line"><span class="regexp">//</span> docker <span class="number">1.13</span>.<span class="number">1</span></span><br><span class="line"><span class="regexp">//</span> 本挂载点</span><br><span class="line">[stupig@hostname2 ~]$ grep -rw <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">4</span>e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97<span class="regexp">/merged /</span>proc<span class="regexp">/$$/m</span>ountinfo</span><br><span class="line"><span class="number">223</span> <span class="number">1143</span> <span class="number">0</span>:<span class="number">40</span> <span class="regexp">/ /</span>home<span class="regexp">/docker_rt/</span>overlay2<span class="regexp">/4e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97/m</span>erged rw,relatime - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> 定位本挂载点的父挂载点</span><br><span class="line">[stupig@hostname2 ~]$ grep -rw <span class="number">1143</span> <span class="regexp">/proc/</span>$$/mountinfo</span><br><span class="line"><span class="number">1143</span> <span class="number">44</span> <span class="number">8</span>:<span class="number">4</span> <span class="regexp">/docker_rt/</span>overlay2 <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2 rw,relatime - xfs /</span>dev/sda4 rw,attr2,inode64,logbsize=<span class="number">256</span>k,sunit=<span class="number">512</span>,swidth=<span class="number">512</span>,prjquota</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> 继续定位祖父挂载点</span><br><span class="line">[stupig@hostname2 ~]$ grep -rw <span class="number">44</span> <span class="regexp">/proc/</span>$$/mountinfo</span><br><span class="line"><span class="number">44</span> <span 
class="number">39</span> <span class="number">8</span>:<span class="number">4</span> <span class="regexp">/ /</span>home rw,relatime shared:<span class="number">28</span> - xfs <span class="regexp">/dev/</span>sda4 rw,attr2,inode64,logbsize=<span class="number">256</span>k,sunit=<span class="number">512</span>,swidth=<span class="number">512</span>,prjquota</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> 继续往上</span><br><span class="line">[stupig@hostname2 ~]$ grep -rw <span class="number">39</span> <span class="regexp">/proc/</span>$$/mountinfo</span><br><span class="line"><span class="number">39</span> <span class="number">1</span> <span class="number">8</span>:<span class="number">3</span> <span class="regexp">/ /</span> rw,relatime shared:<span class="number">1</span> - ext4 <span class="regexp">/dev/</span>sda3 rw,stripe=<span class="number">64</span>,data=ordered</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> docker <span class="number">18.06</span>.<span class="number">3</span>-ce</span><br><span class="line"><span class="regexp">//</span> 本挂载点</span><br><span class="line">[stupig@hostname ~]$ grep -rw <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76<span class="regexp">/merged /</span>proc<span class="regexp">/$$/m</span>ountinfo</span><br><span class="line"><span class="number">218</span> <span class="number">43</span> <span class="number">0</span>:<span class="number">105</span> <span class="regexp">/ /</span>home<span class="regexp">/docker_rt/</span>overlay2<span class="regexp">/a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76/m</span>erged rw,relatime shared:<span class="number">109</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> 
定位本挂在点的父挂载点</span><br><span class="line">[stupig@hostname ~]$ grep -rw <span class="number">43</span> <span class="regexp">/proc/</span>$$/mountinfo</span><br><span class="line"><span class="number">43</span> <span class="number">61</span> <span class="number">8</span>:<span class="number">17</span> <span class="regexp">/ /</span>home rw,noatime shared:<span class="number">29</span> - xfs <span class="regexp">/dev/</span>sdb1 rw,attr2,nobarrier,inode64,prjquota</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> 继续定位祖父挂载点</span><br><span class="line">[stupig@hostname ~]$ grep -rw <span class="number">61</span> <span class="regexp">/proc/</span>$$/mountinfo</span><br><span class="line"><span class="number">61</span> <span class="number">1</span> <span class="number">8</span>:<span class="number">3</span> <span class="regexp">/ /</span> rw,relatime shared:<span class="number">1</span> - ext4 <span class="regexp">/dev/</span>sda3 rw,data=ordered</span><br></pre></td></tr></table></figure>
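<p>The chains above the read-write layer mount points created by the two docker versions differ quite clearly:</p>
<ul>
<li>The parent mount point of the read-write layer mount point is different<ul>
<li>with docker <code>18.06.3-ce</code>, the parent of the read-write layer mount point is <code>/home/</code>, which is shared</li>
<li>with docker <code>1.13.1</code>, the parent of the read-write layer mount point is <code>/home/docker_rt/overlay2</code>, which is private</li>
</ul>
</li>
</ul>
<p>Some background: during initialization, Elastic Cloud machines set up <code>/home</code> as an xfs filesystem, so the <code>/home</code> mount point has the same properties on every host.</p>
<p>So the problem is most likely caused by the extra mount layer <code>/home/docker_rt/overlay2</code> in docker <code>1.13.1</code>.</p>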
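<p>The manual walk up the parent IDs can be automated as well. The sketch below is an illustration only: the type and function names are made up for this example, and the sample data is abridged from the docker <code>1.13.1</code> output above (the long overlay directory name is shortened to <code>4e09</code>).</p>

```go
package main

import (
	"fmt"
	"strings"
)

// mountEntry keeps the few mountinfo fields needed to walk the mount tree.
type mountEntry struct {
	parent string // parent mount ID (field 2)
	point  string // mount point (field 5)
	tags   string // optional propagation fields, or "private" when there are none
}

// parseMountinfo indexes mountinfo lines by mount ID (field 1).
func parseMountinfo(mountinfo string) map[string]mountEntry {
	table := make(map[string]mountEntry)
	for _, line := range strings.Split(mountinfo, "\n") {
		f := strings.Fields(line)
		if len(f) < 7 {
			continue // blank or malformed line
		}
		var tags []string
		for i := 6; i < len(f) && f[i] != "-"; i++ {
			tags = append(tags, f[i])
		}
		tag := "private"
		if len(tags) > 0 {
			tag = strings.Join(tags, " ")
		}
		table[f[0]] = mountEntry{parent: f[1], point: f[4], tags: tag}
	}
	return table
}

// ancestry walks from mount ID id up through parent IDs until a parent is
// missing from the table, which is the case for the root mount's parent.
func ancestry(table map[string]mountEntry, id string) []string {
	var chain []string
	seen := make(map[string]bool) // guards against malformed, cyclic input
	for {
		e, ok := table[id]
		if !ok || seen[id] {
			return chain
		}
		seen[id] = true
		chain = append(chain, e.point+" ["+e.tags+"]")
		id = e.parent
	}
}

// sample is abridged from the docker 1.13.1 host shown above.
const sample = `223 1143 0:40 / /home/docker_rt/overlay2/4e09/merged rw,relatime - overlay overlay rw
1143 44 8:4 /docker_rt/overlay2 /home/docker_rt/overlay2 rw,relatime - xfs /dev/sda4 rw
44 39 8:4 / /home rw,relatime shared:28 - xfs /dev/sda4 rw
39 1 8:3 / / rw,relatime shared:1 - ext4 /dev/sda3 rw`

func main() {
	for _, hop := range ancestry(parseMountinfo(sample), "223") {
		fmt.Println(hop)
	}
}
```

<p>On a real host, feed it the contents of <code>/proc/self/mountinfo</code> and the mount ID of the <code>merged</code> directory; a private hop between the overlay mount and <code>/home</code> is the signature of the <code>1.13.1</code> behavior described here.</p>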
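<p>How do we verify this hypothesis? We now have a concrete target for searching the code: docker <code>1.13.1</code> sets the propagation property of the root directory of the container image layers. With that prior knowledge, finding the code took almost no effort, so here it is directly:</p>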
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// filepath: daemon/graphdriver/overlay2/overlay.go</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">init</span><span class="params">()</span></span> {</span><br><span class="line"> graphdriver.Register(driverName, Init)</span><br><span class="line">}</span><br><span class="line"> </span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">Init</span><span class="params">(home <span class="keyword">string</span>, options []<span class="keyword">string</span>, uidMaps, gidMaps []idtools.IDMap)</span> <span class="params">(graphdriver.Driver, error)</span></span> {</span><br><span class="line"> <span class="keyword">if</span> err := mount.MakePrivate(home); err != <span class="literal">nil</span> {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> supportsDType, err := fsutils.SupportsDType(home)</span><br><span class="line"> <span class="keyword">if</span> err != <span class="literal">nil</span> {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> !supportsDType {</span><br><span class="line"> <span class="comment">// not a fatal error until v1.16 (#27443)</span></span><br><span class="line"> logrus.Warn(overlayutils.ErrDTypeNotSupported(<span class="string">"overlay2"</span>, backingFs))</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> d := &Driver{</span><br><span class="line"> home: home,</span><br><span class="line"> uidMaps: uidMaps,</span><br><span class="line"> gidMaps: gidMaps,</span><br><span class="line"> ctr: 
graphdriver.NewRefCounter(graphdriver.NewFsChecker(graphdriver.FsMagicOverlay)),</span><br><span class="line"> supportsDType: supportsDType,</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> d.naiveDiff = graphdriver.NewNaiveDiffDriver(d, uidMaps, gidMaps)</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">if</span> backingFs == <span class="string">"xfs"</span> {</span><br><span class="line"> <span class="comment">// Try to enable project quota support over xfs.</span></span><br><span class="line"> <span class="keyword">if</span> d.quotaCtl, err = quota.NewControl(home); err == <span class="literal">nil</span> {</span><br><span class="line"> projectQuotaSupported = <span class="literal">true</span></span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">return</span> d, <span class="literal">nil</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure>
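<p>Clearly, the problem lies in the call to <code>mount.MakePrivate</code>.</p>
<p>Upstream marked the <code>GraphDriver</code> root directory as <code>Private</code> in order to keep the mount points of container read-write layers from leaking. Why, then, was this logic removed in later versions? Evidently upstream also realized that it does not achieve the intended goal, and the <a href="https://github.com/moby/moby/pull/36047">fix</a> explains the reasoning in detail.</p>
<p>In fact, leaving the propagation type of the <code>GraphDriver</code> root directory untouched avoids the vast majority of mount point leaks.</p>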
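<p>To see which propagation type a mount point currently has, one option is to read <code>/proc/self/mountinfo</code>. The sketch below is only an illustration: the function name <code>propagation</code> and the sample lines are made up for this post, and a mount that is both shared and a slave is simply reported as shared.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// propagation reports the mount point described by one line of
// /proc/<pid>/mountinfo together with its propagation type.
// Hypothetical helper, not part of docker.
func propagation(line string) (mountPoint, mode string) {
	fields := strings.Fields(line)
	if len(fields) < 7 {
		return "", "unknown"
	}
	mountPoint = fields[4]
	mode = "private" // no optional field means a private mount
	// Optional fields start at index 6 and end at the "-" separator.
	for _, f := range fields[6:] {
		if f == "-" {
			break
		}
		switch {
		case strings.HasPrefix(f, "shared:"):
			return mountPoint, "shared"
		case strings.HasPrefix(f, "master:"):
			mode = "slave"
		case f == "unbindable":
			mode = "unbindable"
		}
	}
	return mountPoint, mode
}

func main() {
	// Made-up sample lines in mountinfo format.
	samples := []string{
		"36 25 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw",
		"97 36 8:1 /var/lib/docker /var/lib/docker rw,relatime - ext4 /dev/sda1 rw",
		"98 36 8:1 /data /data rw,relatime master:1 - ext4 /dev/sda1 rw",
	}
	for _, line := range samples {
		mp, mode := propagation(line)
		fmt.Printf("%-16s %s\n", mp, mode)
	}
}
```

<p>On a real host the same function can be fed the lines of <code>/proc/self/mountinfo</code> to check whether the <code>GraphDriver</code> root shows up as a private mount.</p>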
<h3 id="结语"><a href="#结语" class="headerlink" title="结语"></a>Closing remarks</h3><p>Now that we understand the whole story, let us summarize the solutions:</p>
<ul>
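<li>For docker <code>1.13.1</code>, which runs on a large number of existing hosts, we can ignore the <code>device or resource busy</code> error, since it has practically no impact on online services</li>
<li>For docker <code>18.06.3-ce</code>, which runs on only a few existing hosts, we remove the <code>MountFlags</code> option from the systemd configuration of the docker service and rely on automatic fault recovery to resolve docker getting stuck</li>
<li>For newly added hosts, we remove the <code>MountFlags</code> option from the systemd configuration of the docker service everywhere</li>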
</ul>
<p>Finally, a word of caution: do not blindly trust solutions found online, not even official ones.</p>
]]></content>
<categories>
<category>Troubleshooting</category>
</categories>
<tags>
<tag>docker</tag>
<tag>linux</tag>
</tags>
</entry>
<entry>
<title>A docker hang troubleshooting journey</title>
<url>/docker-hang-%E6%AD%BB%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/</url>
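<content><![CDATA[<h3 id="背景"><a href="#背景" class="headerlink" title="背景"></a>Background</h3><p>Recently, while upgrading kubelet, we found docker hanging on some hosts. For how we discovered it, see <a href="https://plpan.github.io/docker-hang-%E6%AD%BB%E9%98%BB%E5%A1%9E-kubelet-%E5%88%9D%E5%A7%8B%E5%8C%96%E6%B5%81%E7%A8%8B/">docker hang blocks the kubelet initialization flow</a>.</p>
<p>Here we focus on docker itself and analyze the symptoms of the hang, its cause, how we located it, and the corresponding fix. This post records the whole process in detail.</p>
<h3 id="docker-hang死"><a href="#docker-hang死" class="headerlink" title="docker hang死"></a>docker hang</h3><p>docker hangs are nothing new to us. We have seen quite a few, and the symptoms vary widely. Earlier investigations on docker 1.13.1 turned up some clues but never reached a root cause, and most cases ended up being resolved by restarting docker. This time the hang occurred on docker 18.06.3, and after nearly a week of examination by our four-person team we finally pinned down the cause. Note that docker can hang for many different reasons, so the method and result described here are not universally applicable.</p>
<p>Before starting the investigation, let us list what we know so far:</p>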
<ul>
<li>A specific container is abnormal and does not respond to docker inspect</li>
</ul>
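<p>Beyond that, we know nothing.</p>
<p>When investigating an unknown problem, the usual approach is to find an entry point, follow the trail, narrow the scope step by step, and finally locate the problem in the details. Here docker is obviously our only entry point.</p>
<h4 id="链路跟踪"><a href="#链路跟踪" class="headerlink" title="链路跟踪"></a>Call-path tracing</h4><p>First we wanted a rough picture of what docker was doing overall. Anyone familiar with Go development will naturally think of pprof. With pprof we captured a snapshot of docker's state at the time:</p>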
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line">goroutine profile: total <span class="number">722373</span></span><br><span class="line"><span class="number">717594</span> @ <span class="number">0x7fe8bc202980</span> <span class="number">0x7fe8bc202a40</span> <span class="number">0x7fe8bc2135d8</span> <span class="number">0x7fe8bc2132ef</span> <span class="number">0x7fe8bc238c1a</span> <span class="number">0x7fe8bd56f7fe</span> <span class="number">0x7fe8bd56f6bd</span> <span class="number">0x7fe8bcea8719</span> <span class="number">0x7fe8bcea938b</span> <span class="number">0x7fe8bcb726ca</span> <span class="number">0x7fe8bcb72b01</span> <span class="number">0x7fe8bc71c26b</span> <span class="number">0x7fe8bcb85f4a</span> <span class="number">0x7fe8bc4b9896</span> <span class="number">0x7fe8bc72a438</span> <span class="number">0x7fe8bcb849e2</span> <span class="number">0x7fe8bc4bc67e</span> <span class="number">0x7fe8bc4b88a3</span> <span class="number">0x7fe8bc230711</span></span><br><span class="line"># <span class="number">0x7fe8bc2132ee</span> sync.runtime_SemacquireMutex+<span class="number">0x3e</span> /usr/local/<span class="keyword">go</span>/src/runtime/sema.<span class="keyword">go</span>:<span class="number">71</span></span><br><span class="line"># <span class="number">0x7fe8bc238c19</span> sync.(*Mutex).Lock+<span class="number">0x109</span> /usr/local/<span class="keyword">go</span>/src/sync/mutex.<span class="keyword">go</span>:<span class="number">134</span></span><br><span class="line"># <span class="number">0x7fe8bd56f7fd</span> github.com/docker/docker/daemon.(*Daemon).ContainerInspectCurrent+<span class="number">0x8d</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.<span class="keyword">go</span>:<span class="number">40</span></span><br><span class="line"># <span class="number">0x7fe8bd56f6bc</span> github.com/docker/docker/daemon.(*Daemon).ContainerInspect+<span 
class="number">0x11c</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.<span class="keyword">go</span>:<span class="number">29</span></span><br><span class="line"># <span class="number">0x7fe8bcea8718</span> github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersByName+<span class="number">0x118</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/inspect.<span class="keyword">go</span>:<span class="number">15</span></span><br><span class="line"># <span class="number">0x7fe8bcea938a</span> github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersByName)-fm+<span class="number">0x6a</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.<span class="keyword">go</span>:<span class="number">39</span></span><br><span class="line"># <span class="number">0x7fe8bcb726c9</span> github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+<span class="number">0xd9</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.<span class="keyword">go</span>:<span class="number">26</span></span><br><span class="line"># <span class="number">0x7fe8bcb72b00</span> github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+<span class="number">0x400</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.<span class="keyword">go</span>:<span class="number">62</span></span><br><span class="line"># <span class="number">0x7fe8bc71c26a</span> github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+<span class="number">0x7aa</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.<span 
class="keyword">go</span>:<span class="number">59</span></span><br><span class="line"># <span class="number">0x7fe8bcb85f49</span> github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+<span class="number">0x199</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.<span class="keyword">go</span>:<span class="number">141</span></span><br><span class="line"># <span class="number">0x7fe8bc4b9895</span> net/http.HandlerFunc.ServeHTTP+<span class="number">0x45</span> /usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1947</span></span><br><span class="line"># <span class="number">0x7fe8bc72a437</span> github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+<span class="number">0x227</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.<span class="keyword">go</span>:<span class="number">103</span></span><br><span class="line"># <span class="number">0x7fe8bcb849e1</span> github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+<span class="number">0x71</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.<span class="keyword">go</span>:<span class="number">29</span></span><br><span class="line"># <span class="number">0x7fe8bc4bc67d</span> net/http.serverHandler.ServeHTTP+<span class="number">0xbd</span> /usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">2694</span></span><br><span class="line"># <span class="number">0x7fe8bc4b88a2</span> net/http.(*conn).serve+<span class="number">0x652</span> /usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1830</span></span><br><span class="line"></span><br><span class="line"><span class="number">4175</span> @ <span class="number">0x7fe8bc202980</span> <span 
class="number">0x7fe8bc202a40</span> <span class="number">0x7fe8bc2135d8</span> <span class="number">0x7fe8bc2132ef</span> <span class="number">0x7fe8bc238c1a</span> <span class="number">0x7fe8bcc2eccf</span> <span class="number">0x7fe8bd597af4</span> <span class="number">0x7fe8bcea2456</span> <span class="number">0x7fe8bcea956b</span> <span class="number">0x7fe8bcb73dff</span> <span class="number">0x7fe8bcb726ca</span> <span class="number">0x7fe8bcb72b01</span> <span class="number">0x7fe8bc71c26b</span> <span class="number">0x7fe8bcb85f4a</span> <span class="number">0x7fe8bc4b9896</span> <span class="number">0x7fe8bc72a438</span> <span class="number">0x7fe8bcb849e2</span> <span class="number">0x7fe8bc4bc67e</span> <span class="number">0x7fe8bc4b88a3</span> <span class="number">0x7fe8bc230711</span></span><br><span class="line"># <span class="number">0x7fe8bc2132ee</span> sync.runtime_SemacquireMutex+<span class="number">0x3e</span> /usr/local/<span class="keyword">go</span>/src/runtime/sema.<span class="keyword">go</span>:<span class="number">71</span></span><br><span class="line"># <span class="number">0x7fe8bc238c19</span> sync.(*Mutex).Lock+<span class="number">0x109</span> /usr/local/<span class="keyword">go</span>/src/sync/mutex.<span class="keyword">go</span>:<span class="number">134</span></span><br><span class="line"># <span class="number">0x7fe8bcc2ecce</span> github.com/docker/docker/container.(*State).IsRunning+<span class="number">0x2e</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/state.<span class="keyword">go</span>:<span class="number">240</span></span><br><span class="line"># <span class="number">0x7fe8bd597af3</span> github.com/docker/docker/daemon.(*Daemon).ContainerStats+<span class="number">0xb3</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/stats.<span class="keyword">go</span>:<span class="number">30</span></span><br><span class="line"># <span 
class="number">0x7fe8bcea2455</span> github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersStats+<span class="number">0x1e5</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container_routes.<span class="keyword">go</span>:<span class="number">115</span></span><br><span class="line"># <span class="number">0x7fe8bcea956a</span> github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersStats)-fm+<span class="number">0x6a</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.<span class="keyword">go</span>:<span class="number">42</span></span><br><span class="line"># <span class="number">0x7fe8bcb73dfe</span> github.com/docker/docker/api/server/router.cancellableHandler.func1+<span class="number">0xce</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/local.<span class="keyword">go</span>:<span class="number">92</span></span><br><span class="line"># <span class="number">0x7fe8bcb726c9</span> github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+<span class="number">0xd9</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.<span class="keyword">go</span>:<span class="number">26</span></span><br><span class="line"># <span class="number">0x7fe8bcb72b00</span> github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+<span class="number">0x400</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.<span class="keyword">go</span>:<span class="number">62</span></span><br><span class="line"># <span class="number">0x7fe8bc71c26a</span> github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+<span class="number">0x7aa</span> 
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.<span class="keyword">go</span>:<span class="number">59</span></span><br><span class="line"># <span class="number">0x7fe8bcb85f49</span> github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+<span class="number">0x199</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.<span class="keyword">go</span>:<span class="number">141</span></span><br><span class="line"># <span class="number">0x7fe8bc4b9895</span> net/http.HandlerFunc.ServeHTTP+<span class="number">0x45</span> /usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1947</span></span><br><span class="line"># <span class="number">0x7fe8bc72a437</span> github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+<span class="number">0x227</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.<span class="keyword">go</span>:<span class="number">103</span></span><br><span class="line"># <span class="number">0x7fe8bcb849e1</span> github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+<span class="number">0x71</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.<span class="keyword">go</span>:<span class="number">29</span></span><br><span class="line"># <span class="number">0x7fe8bc4bc67d</span> net/http.serverHandler.ServeHTTP+<span class="number">0xbd</span> /usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">2694</span></span><br><span class="line"># <span class="number">0x7fe8bc4b88a2</span> net/http.(*conn).serve+<span class="number">0x652</span> /usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1830</span></span><br><span 
class="line"></span><br><span class="line"><span class="number">1</span> @ <span class="number">0x7fe8bc202980</span> <span class="number">0x7fe8bc202a40</span> <span class="number">0x7fe8bc2135d8</span> <span class="number">0x7fe8bc2131fb</span> <span class="number">0x7fe8bc239a3b</span> <span class="number">0x7fe8bcbb679d</span> <span class="number">0x7fe8bcc26774</span> <span class="number">0x7fe8bd570b20</span> <span class="number">0x7fe8bd56f81c</span> <span class="number">0x7fe8bd56f6bd</span> <span class="number">0x7fe8bcea8719</span> <span class="number">0x7fe8bcea938b</span> <span class="number">0x7fe8bcb726ca</span> <span class="number">0x7fe8bcb72b01</span> <span class="number">0x7fe8bc71c26b</span> <span class="number">0x7fe8bcb85f4a</span> <span class="number">0x7fe8bc4b9896</span> <span class="number">0x7fe8bc72a438</span> <span class="number">0x7fe8bcb849e2</span> <span class="number">0x7fe8bc4bc67e</span> <span class="number">0x7fe8bc4b88a3</span> <span class="number">0x7fe8bc230711</span></span><br><span class="line"># <span class="number">0x7fe8bc2131fa</span> sync.runtime_Semacquire+<span class="number">0x3a</span> /usr/local/<span class="keyword">go</span>/src/runtime/sema.<span class="keyword">go</span>:<span class="number">56</span></span><br><span class="line"># <span class="number">0x7fe8bc239a3a</span> sync.(*RWMutex).RLock+<span class="number">0x4a</span> /usr/local/<span class="keyword">go</span>/src/sync/rwmutex.<span class="keyword">go</span>:<span class="number">50</span></span><br><span class="line"># <span class="number">0x7fe8bcbb679c</span> github.com/docker/docker/daemon/exec.(*Store).List+<span class="number">0x4c</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec/exec.<span class="keyword">go</span>:<span class="number">140</span></span><br><span class="line"># <span class="number">0x7fe8bcc26773</span> github.com/docker/docker/container.(*Container).GetExecIDs+<span 
class="number">0x33</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/container.<span class="keyword">go</span>:<span class="number">423</span></span><br><span class="line"># <span class="number">0x7fe8bd570b1f</span> github.com/docker/docker/daemon.(*Daemon).getInspectData+<span class="number">0x5cf</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.<span class="keyword">go</span>:<span class="number">178</span></span><br><span class="line"># <span class="number">0x7fe8bd56f81b</span> github.com/docker/docker/daemon.(*Daemon).ContainerInspectCurrent+<span class="number">0xab</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.<span class="keyword">go</span>:<span class="number">42</span></span><br><span class="line"># <span class="number">0x7fe8bd56f6bc</span> github.com/docker/docker/daemon.(*Daemon).ContainerInspect+<span class="number">0x11c</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.<span class="keyword">go</span>:<span class="number">29</span></span><br><span class="line"># <span class="number">0x7fe8bcea8718</span> github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersByName+<span class="number">0x118</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/inspect.<span class="keyword">go</span>:<span class="number">15</span></span><br><span class="line"># <span class="number">0x7fe8bcea938a</span> github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersByName)-fm+<span class="number">0x6a</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.<span class="keyword">go</span>:<span class="number">39</span></span><br><span class="line"># <span class="number">0x7fe8bcb726c9</span> 
github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+<span class="number">0xd9</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.<span class="keyword">go</span>:<span class="number">26</span></span><br><span class="line"># <span class="number">0x7fe8bcb72b00</span> github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+<span class="number">0x400</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.<span class="keyword">go</span>:<span class="number">62</span></span><br><span class="line"># <span class="number">0x7fe8bc71c26a</span> github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+<span class="number">0x7aa</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.<span class="keyword">go</span>:<span class="number">59</span></span><br><span class="line"># <span class="number">0x7fe8bcb85f49</span> github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+<span class="number">0x199</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.<span class="keyword">go</span>:<span class="number">141</span></span><br><span class="line"># <span class="number">0x7fe8bc4b9895</span> net/http.HandlerFunc.ServeHTTP+<span class="number">0x45</span> /usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1947</span></span><br><span class="line"># <span class="number">0x7fe8bc72a437</span> github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+<span class="number">0x227</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.<span class="keyword">go</span>:<span class="number">103</span></span><br><span class="line"># <span 
class="number">0x7fe8bcb849e1</span> github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+<span class="number">0x71</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.<span class="keyword">go</span>:<span class="number">29</span></span><br><span class="line"># <span class="number">0x7fe8bc4bc67d</span> net/http.serverHandler.ServeHTTP+<span class="number">0xbd</span> /usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">2694</span></span><br><span class="line"># <span class="number">0x7fe8bc4b88a2</span> net/http.(*conn).serve+<span class="number">0x652</span> /usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1830</span></span><br><span class="line"></span><br><span class="line"><span class="number">1</span> @ <span class="number">0x7fe8bc202980</span> <span class="number">0x7fe8bc212946</span> <span class="number">0x7fe8bc8b6881</span> <span class="number">0x7fe8bc8b699d</span> <span class="number">0x7fe8bc8e259b</span> <span class="number">0x7fe8bc8e1695</span> <span class="number">0x7fe8bc8c47d5</span> <span class="number">0x7fe8bd2e0c06</span> <span class="number">0x7fe8bd2eda96</span> <span class="number">0x7fe8bc8c42fb</span> <span class="number">0x7fe8bc8c4613</span> <span class="number">0x7fe8bd2a6474</span> <span class="number">0x7fe8bd2e6976</span> <span class="number">0x7fe8bd3661c5</span> <span class="number">0x7fe8bd56842f</span> <span class="number">0x7fe8bcea7bdb</span> <span class="number">0x7fe8bcea9f6b</span> <span class="number">0x7fe8bcb726ca</span> <span class="number">0x7fe8bcb72b01</span> <span class="number">0x7fe8bc71c26b</span> <span class="number">0x7fe8bcb85f4a</span> <span class="number">0x7fe8bc4b9896</span> <span class="number">0x7fe8bc72a438</span> <span class="number">0x7fe8bcb849e2</span> <span class="number">0x7fe8bc4bc67e</span> <span 
class="number">0x7fe8bc4b88a3</span> <span class="number">0x7fe8bc230711</span></span><br><span class="line"># <span class="number">0x7fe8bc8b6880</span> github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader+<span class="number">0x100</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.<span class="keyword">go</span>:<span class="number">222</span></span><br><span class="line"># <span class="number">0x7fe8bc8b699c</span> github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress+<span class="number">0x2c</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.<span class="keyword">go</span>:<span class="number">233</span></span><br><span class="line"># <span class="number">0x7fe8bc8e259a</span> github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg+<span class="number">0x63a</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.<span class="keyword">go</span>:<span class="number">515</span></span><br><span class="line"># <span class="number">0x7fe8bc8e1694</span> github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg+<span class="number">0x44</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.<span class="keyword">go</span>:<span class="number">395</span></span><br><span class="line"># <span class="number">0x7fe8bc8c47d4</span> github.com/docker/docker/vendor/google.golang.org/grpc.invoke+<span class="number">0x184</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.<span class="keyword">go</span>:<span class="number">83</span></span><br><span class="line"># <span class="number">0x7fe8bd2e0c05</span> 
github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary+<span class="number">0xf5</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.<span class="keyword">go</span>:<span class="number">35</span></span><br><span class="line"># <span class="number">0x7fe8bd2eda95</span> github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm+<span class="number">0xf5</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.<span class="keyword">go</span>:<span class="number">51</span></span><br><span class="line"># <span class="number">0x7fe8bc8c42fa</span> github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke+<span class="number">0x10a</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.<span class="keyword">go</span>:<span class="number">35</span></span><br><span class="line"># <span class="number">0x7fe8bc8c4612</span> github.com/docker/docker/vendor/google.golang.org/grpc.Invoke+<span class="number">0xc2</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.<span class="keyword">go</span>:<span class="number">60</span></span><br><span class="line"># <span class="number">0x7fe8bd2a6473</span> github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Start+<span class="number">0xd3</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.<span class="keyword">go</span>:<span class="number">421</span></span><br><span class="line"># <span class="number">0x7fe8bd2e6975</span> 
github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Start+<span class="number">0xf5</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.<span class="keyword">go</span>:<span class="number">109</span></span><br><span class="line"># <span class="number">0x7fe8bd3661c4</span> github.com/docker/docker/libcontainerd.(*client).Exec+<span class="number">0x4b4</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.<span class="keyword">go</span>:<span class="number">381</span></span><br><span class="line"># <span class="number">0x7fe8bd56842e</span> github.com/docker/docker/daemon.(*Daemon).ContainerExecStart+<span class="number">0xb4e</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec.<span class="keyword">go</span>:<span class="number">251</span></span><br><span class="line"># <span class="number">0x7fe8bcea7bda</span> github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart+<span class="number">0x34a</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/exec.<span class="keyword">go</span>:<span class="number">125</span></span><br><span class="line"># <span class="number">0x7fe8bcea9f6a</span> github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.postContainerExecStart)-fm+<span class="number">0x6a</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.<span class="keyword">go</span>:<span class="number">59</span></span><br><span class="line"># <span class="number">0x7fe8bcb726c9</span> github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+<span class="number">0xd9</span> 
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.<span class="keyword">go</span>:<span class="number">26</span></span><br><span class="line"># <span class="number">0x7fe8bcb72b00</span> github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+<span class="number">0x400</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.<span class="keyword">go</span>:<span class="number">62</span></span><br><span class="line"># <span class="number">0x7fe8bc71c26a</span> github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+<span class="number">0x7aa</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.<span class="keyword">go</span>:<span class="number">59</span></span><br><span class="line"># <span class="number">0x7fe8bcb85f49</span> github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+<span class="number">0x199</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.<span class="keyword">go</span>:<span class="number">141</span></span><br><span class="line"># <span class="number">0x7fe8bc4b9895</span> net/http.HandlerFunc.ServeHTTP+<span class="number">0x45</span> /usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1947</span></span><br><span class="line"># <span class="number">0x7fe8bc72a437</span> github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+<span class="number">0x227</span> /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.<span class="keyword">go</span>:<span class="number">103</span></span><br><span class="line"># <span class="number">0x7fe8bcb849e1</span> github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+<span class="number">0x71</span> 
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.<span class="keyword">go</span>:<span class="number">29</span></span><br><span class="line"># <span class="number">0x7fe8bc4bc67d</span> net/http.serverHandler.ServeHTTP+<span class="number">0xbd</span> /usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">2694</span></span><br><span class="line"># <span class="number">0x7fe8bc4b88a2</span> net/http.(*conn).serve+<span class="number">0x652</span> /usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1830</span></span><br></pre></td></tr></table></figure>
<p>Note that the docker goroutine dump above has been trimmed. From it we can draw the following conclusions:</p>
<ul>
<li>717594 goroutines are blocked in docker inspect</li>
<li>4175 goroutines are blocked in docker stats</li>
<li>1 goroutine is blocked while fetching the docker exec task IDs</li>
<li>1 goroutine is blocked while executing docker exec</li>
</ul>
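<p>These conclusions largely explain why the abnormal container hangs. The container ran docker exec (item 4) and the call never returned, which blocked the goroutine fetching the docker exec task IDs (item 3). Because (3) was holding the container lock, docker inspect (item 1) and docker stats (item 2) got stuck as well. So the culprit is docker exec, not docker inspect.</p>
<p>To dig further we need some background. The full call path when kubelet starts a container or runs a command inside one is as follows:</p>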
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line">+--------------------------------------------------------------+</span><br><span class="line">| |</span><br><span class="line">| +------------+ |</span><br><span class="line">| | | |</span><br><span class="line">| | kubelet | |</span><br><span class="line">| | | |</span><br><span class="line">| +------|-----+ |</span><br><span class="line">| | |</span><br><span class="line">| | |</span><br><span class="line">| +------v-----+ +---------------+ |</span><br><span class="line">| | | | | |</span><br><span class="line">| | dockerd ------->| containerd | |</span><br><span class="line">| | | | | |</span><br><span class="line">| +------------+ +-------|-------+ |</span><br><span class="line">| | |</span><br><span class="line">| | |</span><br><span class="line">| +-------v-------+ +-----------+ |</span><br><span class="line">| | | | | |</span><br><span class="line">| |containerd-shim----->| runc | |</span><br><span class="line">| | | | | |</span><br><span class="line">| +---------------+ +-----------+ |</span><br><span class="line">| |</span><br><span class="line">+--------------------------------------------------------------+</span><br></pre></td></tr></table></figure>
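<p>dockerd and containerd can be thought of as two layers of nginx-style proxying, containerd-shim is the container's guardian, and runc is the workhorse that actually starts containers and executes commands. What runc does is simple: it creates namespaces (NS) according to the user-specified configuration, or enters existing ones, and then executes the user's command. Put plainly, creating a container means creating new namespaces and running the user-specified command inside them.</p>
<p>With this background, we continue down to containerd. Fortunately, pprof also lets us sketch containerd's runtime blueprint at that moment:</p>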
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line">goroutine profile: total <span class="number">430</span></span><br><span class="line"><span class="number">1</span> @ <span class="number">0x7f6e55f82740</span> <span class="number">0x7f6e55f92616</span> <span class="number">0x7f6e56a8412c</span> <span class="number">0x7f6e56a83d6d</span> <span class="number">0x7f6e56a911bf</span> <span class="number">0x7f6e56ac6e3b</span> <span class="number">0x7f6e565093de</span> <span class="number">0x7f6e5650dd3b</span> <span class="number">0x7f6e5650392b</span> <span class="number">0x7f6e56b51216</span> <span class="number">0x7f6e564e5909</span> <span class="number">0x7f6e563ec76a</span> <span class="number">0x7f6e563f000a</span> <span class="number">0x7f6e563f6791</span> <span class="number">0x7f6e55fb0151</span></span><br><span class="line"># <span class="number">0x7f6e56a8412b</span> github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).dispatch+<span class="number">0x24b</span> /<span class="keyword">go</span>/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.<span class="keyword">go</span>:<span class="number">102</span></span><br><span class="line"># <span class="number">0x7f6e56a83d6c</span> github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).Call+<span class="number">0x15c</span> /<span class="keyword">go</span>/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.<span class="keyword">go</span>:<span class="number">73</span></span><br><span class="line"># <span class="number">0x7f6e56a911be</span> github.com/containerd/containerd/linux/shim/v1.(*shimClient).Start+<span class="number">0xbe</span> /<span class="keyword">go</span>/src/github.com/containerd/containerd/linux/shim/v1/shim.pb.<span class="keyword">go</span>:<span class="number">1745</span></span><br><span class="line"># <span class="number">0x7f6e56ac6e3a</span> 
github.com/containerd/containerd/linux.(*Process).Start+<span class="number">0x8a</span> /<span class="keyword">go</span>/src/github.com/containerd/containerd/linux/process.<span class="keyword">go</span>:<span class="number">125</span></span><br><span class="line"># <span class="number">0x7f6e565093dd</span> github.com/containerd/containerd/services/tasks.(*local).Start+<span class="number">0x14d</span> /<span class="keyword">go</span>/src/github.com/containerd/containerd/services/tasks/local.<span class="keyword">go</span>:<span class="number">187</span></span><br><span class="line"># <span class="number">0x7f6e5650dd3a</span> github.com/containerd/containerd/services/tasks.(*service).Start+<span class="number">0x6a</span> /<span class="keyword">go</span>/src/github.com/containerd/containerd/services/tasks/service.<span class="keyword">go</span>:<span class="number">72</span></span><br><span class="line"># <span class="number">0x7f6e5650392a</span> github.com/containerd/containerd/api/services/tasks/v1._Tasks_Start_Handler.func1+<span class="number">0x8a</span> /<span class="keyword">go</span>/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.<span class="keyword">go</span>:<span class="number">624</span></span><br><span class="line"># <span class="number">0x7f6e56b51215</span> github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/<span class="keyword">go</span>-grpc-prometheus.UnaryServerInterceptor+<span class="number">0xa5</span> /<span class="keyword">go</span>/src/github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/<span class="keyword">go</span>-grpc-prometheus/server.<span class="keyword">go</span>:<span class="number">29</span></span><br><span class="line"># <span class="number">0x7f6e564e5908</span> github.com/containerd/containerd/api/services/tasks/v1._Tasks_Start_Handler+<span class="number">0x168</span> /<span 
class="keyword">go</span>/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.<span class="keyword">go</span>:<span class="number">626</span></span><br><span class="line"># <span class="number">0x7f6e563ec769</span> github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC+<span class="number">0x849</span> /<span class="keyword">go</span>/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.<span class="keyword">go</span>:<span class="number">920</span></span><br><span class="line"># <span class="number">0x7f6e563f0009</span> github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).handleStream+<span class="number">0x1319</span> /<span class="keyword">go</span>/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.<span class="keyword">go</span>:<span class="number">1142</span></span><br><span class="line"># <span class="number">0x7f6e563f6790</span> github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1<span class="number">.1</span>+<span class="number">0xa0</span> /<span class="keyword">go</span>/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.<span class="keyword">go</span>:<span class="number">637</span></span><br></pre></td></tr></table></figure>
<p>Again, only the key goroutine is kept. The stack shows that containerd is blocked waiting for the result of the exec; the relevant code confirms this:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(c *Client)</span> <span class="title">dispatch</span><span class="params">(ctx context.Context, req *Request, resp *Response)</span> <span class="title">error</span></span> {</span><br><span class="line"> errs := <span class="built_in">make</span>(<span class="keyword">chan</span> error, <span class="number">1</span>)</span><br><span class="line"> call := &callRequest{</span><br><span class="line"> req: req,</span><br><span class="line"> resp: resp,</span><br><span class="line"> errs: errs,</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">select</span> {</span><br><span class="line"> <span class="keyword">case</span> c.calls <- call:</span><br><span class="line"> <span class="keyword">case</span> <-c.done:</span><br><span class="line"> <span class="keyword">return</span> c.err</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">select</span> { <span class="comment">// this select is the frame in the goroutine stack above: /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:102</span></span><br><span class="line"> <span class="keyword">case</span> err := <-errs:</span><br><span class="line"> <span class="keyword">return</span> filterCloseErr(err)</span><br><span class="line"> <span class="keyword">case</span> <-c.done:</span><br><span class="line"> <span class="keyword">return</span> c.err</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>After passing the request to containerd-shim, containerd waits indefinitely for containerd-shim to reply.</p>
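<p>Neither <code>select</code> in the quoted <code>dispatch</code> watches the caller's context, so a peer that accepts a request and then goes silent blocks the caller for as long as the connection stays open. The sketch below shows one way such a wait could be bounded. It is an illustration under assumptions, not the ttrpc implementation: <code>client</code>, <code>callRequest</code>, and <code>hungCallTimesOut</code> are simplified, hypothetical types and helpers.</p>

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// callRequest and client are simplified stand-ins for the types used by the
// quoted dispatch function; they are assumptions made for this sketch.
type callRequest struct {
	req  string
	errs chan error
}

type client struct {
	calls chan *callRequest
	done  chan struct{}
	err   error
}

// dispatch mirrors the two-step select from the quoted code, with an extra
// ctx.Done() case in each select so the caller can give up.
func (c *client) dispatch(ctx context.Context, req string) error {
	errs := make(chan error, 1)
	call := &callRequest{req: req, errs: errs}

	select {
	case <-ctx.Done():
		return ctx.Err()
	case c.calls <- call:
	case <-c.done:
		return c.err
	}

	select {
	case <-ctx.Done():
		return ctx.Err()
	case err := <-errs:
		return err
	case <-c.done:
		return c.err
	}
}

// hungCallTimesOut simulates a peer that accepts a request but never answers
// and reports whether dispatch returned with a deadline error.
func hungCallTimesOut(timeoutMillis int) bool {
	c := &client{calls: make(chan *callRequest), done: make(chan struct{})}
	go func() {
		<-c.calls // accept the request, then stay silent like the stuck shim
	}()
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()
	return errors.Is(c.dispatch(ctx, "exec"), context.DeadlineExceeded)
}

func main() {
	fmt.Println("hung call returned with a deadline error:", hungCallTimesOut(50))
}
```

<p>A timeout like this only releases the caller. The stuck process further down the chain still has to be found and fixed, which is what the rest of the investigation does.</p>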
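<p>Normally, if we could analyze the goroutine stacks of each component along the call chain one by one, we would locate the problem quickly. Unfortunately, docker was not running in debug mode in production, so we could not collect pprof data from containerd-shim, and runc does not expose pprof either. Locating the problem purely by following goroutine stacks along the call chain was therefore a dead end.</p>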
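<p>For reference, the "goroutine profile: total N" dumps shown for dockerd and containerd are the text form of Go's goroutine profile. Any Go process can produce one for itself through the standard <code>runtime/pprof</code> package, as in the minimal sketch below. The helper names <code>goroutineDump</code> and <code>hasProfileHeader</code> are made up for this example, and how each docker component wires up its own pprof endpoint is not shown here.</p>

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
	"strings"
)

// goroutineDump returns the text form (debug=1) of this process's goroutine
// profile, the same format as the dumps quoted in this article.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// hasProfileHeader reports whether the dump starts with the familiar
// "goroutine profile: total N" header line.
func hasProfileHeader() bool {
	dump, err := goroutineDump()
	return err == nil && strings.HasPrefix(dump, "goroutine profile: total ")
}

func main() {
	dump, err := goroutineDump()
	if err != nil {
		fmt.Fprintln(os.Stderr, "goroutine dump failed:", err)
		os.Exit(1)
	}
	fmt.Print(dump)
}
```

<p>A process that exposes no such hook gives an outside observer nothing comparable, which is the situation with containerd-shim and runc here.</p>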
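<p>So far we have collected some key information and narrowed the investigation down to containerd-shim and runc. Next, we take a different approach.</p>
<h4 id="进程排查"><a href="#进程排查" class="headerlink" title="进程排查"></a>Process inspection</h4><p>When the runtime state of the components can no longer be obtained, we change perspective and look at the state of the container itself, that is, the state of the abnormal container's processes at that moment.</p>
<p>Since docker ps works normally while docker inspect hangs, we first locate the abnormal container with the following command. The loop stalls on the container whose docker inspect never returns, so the last ID printed is the abnormal one:</p>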
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line">docker ps -q | while read -r cid; do echo "$cid"; docker inspect -f <span class="string">'{{.State.Pid}}'</span> "$cid"; done</span><br></pre></td></tr></table></figure>
<p>With the abnormal container's ID in hand, we can list all processes related to that container:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line">UID PID PPID C STIME TTY TIME CMD</span><br><span class="line">root <span class="number">11646</span> <span class="number">6655</span> <span class="number">0</span> Jun17 ? <span class="number">00</span>:<span class="number">01</span>:<span class="number">04</span> docker-containerd-shim -namespace moby -workdir /home/docker_rt/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5 -address /<span class="keyword">var</span>/run/docker/containerd/docker-containerd.sock -containerd-binary /usr/bin/docker-containerd -runtime-root /<span class="keyword">var</span>/run/docker/runtime-runc</span><br><span class="line">root <span class="number">11680</span> <span class="number">11646</span> <span class="number">0</span> Jun17 ? <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> /dockerinit</span><br><span class="line">root <span class="number">15581</span> <span class="number">11646</span> <span class="number">0</span> Jun17 ? <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> docker-runc --root /<span class="keyword">var</span>/run/docker/runtime-runc/moby --log /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/log.json --log-format json exec --process /tmp/runc-process616674997 --detach --pid-file /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/<span class="number">0594</span>c5897a41d401e4d1d7ddd44dacdd316c7e7d53bfdae7f16b0f6b26fcbcda.pid bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5</span><br><span class="line">root <span class="number">15638</span> <span class="number">15581</span> <span class="number">0</span> Jun17 ? 
<span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> docker-runc init</span><br></pre></td></tr></table></figure>
<p>The key processes are listed above; a few notes:</p>
<ul>
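<li>6655: the containerd process</li>
<li>11646: the containerd-shim process of the abnormal container</li>
<li>11680: the init process of the abnormal container. Seen from inside the container, its PID is 1 because of PID namespace isolation</li>
<li>15581: the process executing the user command in the abnormal container (runc exec); it has not yet entered the container</li>
<li>15638: the process that enters the container's namespaces when a user command is executed in the abnormal container (runc init)</li>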
</ul>
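<p>One more piece of background: when a container is started, a runc init process is created first, which creates and enters the new container namespaces; when a command is executed inside a container, a runc init process is likewise created first to enter the container's namespaces. Only after entering the container's isolated namespaces is the user-specified command executed.</p>
<p>From the process list alone we cannot tell which process is causing the problem, so we also need to know what each process is currently doing. Using strace, we show the activity of each process in turn:</p>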
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// 11646 (containerd-shim)</span></span><br><span class="line">Process <span class="number">11646</span> attached with <span class="number">10</span> threads</span><br><span class="line">[pid <span class="number">37342</span>] epoll_pwait(<span class="number">5</span>, <unfinished ...></span><br><span class="line">[pid <span class="number">11656</span>] futex(<span class="number">0x818cc0</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">11655</span>] restart_syscall(<... resuming interrupted call ...> <unfinished ...></span><br><span class="line">[pid <span class="number">11654</span>] futex(<span class="number">0x818bd8</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">11653</span>] futex(<span class="number">0x7fc730</span>, FUTEX_WAKE, <span class="number">1</span> <unfinished ...></span><br><span class="line">[pid <span class="number">11651</span>] futex(<span class="number">0xc4200b4148</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">11650</span>] futex(<span class="number">0xc420082948</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">11649</span>] futex(<span class="number">0xc420082548</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">11647</span>] restart_syscall(<... resuming interrupted call ...> <unfinished ...></span><br><span class="line">[pid <span class="number">11646</span>] futex(<span class="number">0x7fd008</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">11653</span>] <... 
futex resumed> ) = <span class="number">0</span></span><br><span class="line">[pid <span class="number">11647</span>] <... restart_syscall resumed> ) = <span class="number">-1</span> EAGAIN (Resource temporarily unavailable)</span><br><span class="line">[pid <span class="number">11653</span>] epoll_wait(<span class="number">4</span>, <unfinished ...></span><br><span class="line">[pid <span class="number">11647</span>] pselect6(<span class="number">0</span>, NULL, NULL, NULL, {<span class="number">0</span>, <span class="number">20000</span>}, <span class="number">0</span>) = <span class="number">0</span> (Timeout)</span><br><span class="line">[pid <span class="number">11647</span>] futex(<span class="number">0x7fc730</span>, FUTEX_WAIT, <span class="number">0</span>, {<span class="number">60</span>, <span class="number">0</span>}</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">// 15581 (runc exec)</span></span><br><span class="line">Process <span class="number">15581</span> attached with <span class="number">7</span> threads</span><br><span class="line">[pid <span class="number">15619</span>] read(<span class="number">6</span>, <unfinished ...></span><br><span class="line">[pid <span class="number">15592</span>] futex(<span class="number">0xc4200be148</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">15591</span>] futex(<span class="number">0x7fd6d25f6238</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">15590</span>] futex(<span class="number">0xc420084d48</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">15586</span>] futex(<span class="number">0x7fd6d25f6320</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span 
class="number">15584</span>] restart_syscall(<... resuming interrupted call ...> <unfinished ...></span><br><span class="line">[pid <span class="number">15581</span>] futex(<span class="number">0x7fd6d25d9b28</span>, FUTEX_WAIT, <span class="number">0</span>, NULL</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">// 15638 (runc init)</span></span><br><span class="line">Process <span class="number">15638</span> attached with <span class="number">7</span> threads</span><br><span class="line">[pid <span class="number">15648</span>] futex(<span class="number">0x7f512cea5320</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">15647</span>] futex(<span class="number">0x7f512cea5238</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">15645</span>] futex(<span class="number">0xc4200bc148</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">15643</span>] futex(<span class="number">0xc420082d48</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">15642</span>] futex(<span class="number">0xc420082948</span>, FUTEX_WAIT, <span class="number">0</span>, NULL <unfinished ...></span><br><span class="line">[pid <span class="number">15639</span>] restart_syscall(<... resuming interrupted call ...> <unfinished ...></span><br><span class="line">[pid <span class="number">15638</span>] write(<span class="number">2</span>, <span class="string">"/usr/local/go/src/runtime/proc.g"</span>..., <span class="number">33</span></span><br></pre></td></tr></table></figure>
<p>From the activity of the related processes, we can conclude:</p>
<ul>
<li>runc exec is waiting to read data from FD 6</li>
<li>runc init is waiting to write data to FD 2</li>
</ul>
<p>Which files do these FDs actually correspond to? lsof can tell us:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// 15638 (runc init)</span></span><br><span class="line">COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root cwd DIR <span class="number">0</span>,<span class="number">41</span> <span class="number">192</span> <span class="number">1066743071</span> /</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root rtd DIR <span class="number">0</span>,<span class="number">41</span> <span class="number">192</span> <span class="number">1066743071</span> /</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root txt REG <span class="number">0</span>,<span class="number">4</span> <span class="number">7644224</span> <span class="number">1070360467</span> /memfd:runc_cloned:/proc/self/exe (deleted)</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root mem REG <span class="number">8</span>,<span class="number">3</span> <span class="number">2107816</span> <span class="number">1053962</span> /usr/lib64/libc<span class="number">-2.17</span>.so</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root mem REG <span class="number">8</span>,<span class="number">3</span> <span class="number">19512</span> <span class="number">1054285</span> /usr/lib64/libdl<span class="number">-2.17</span>.so</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root mem REG <span class="number">8</span>,<span class="number">3</span> <span class="number">266688</span> <span class="number">1050626</span> /usr/lib64/libseccomp.so<span class="number">.2</span><span class="number">.3</span><span class="number">.1</span></span><br><span class="line">runc:[<span 
class="number">2</span>:I <span class="number">15638</span> root mem REG <span class="number">8</span>,<span class="number">3</span> <span class="number">142296</span> <span class="number">1055698</span> /usr/lib64/libpthread<span class="number">-2.17</span>.so</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root mem REG <span class="number">8</span>,<span class="number">3</span> <span class="number">27168</span> <span class="number">3024893</span> /usr/local/gundam/gundam_client/preload/lib64/gundam_preload.so</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root mem REG <span class="number">8</span>,<span class="number">3</span> <span class="number">164432</span> <span class="number">1054515</span> /usr/lib64/ld<span class="number">-2.17</span>.so</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root <span class="number">0</span>r FIFO <span class="number">0</span>,<span class="number">8</span> <span class="number">0</span>t0 <span class="number">1070361745</span> pipe</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root <span class="number">1</span>w FIFO <span class="number">0</span>,<span class="number">8</span> <span class="number">0</span>t0 <span class="number">1070361746</span> pipe</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root <span class="number">2</span>w FIFO <span class="number">0</span>,<span class="number">8</span> <span class="number">0</span>t0 <span class="number">1070361747</span> pipe</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root <span class="number">3</span>u unix <span class="number">0xffff881ff8273000</span> <span class="number">0</span>t0 <span class="number">1070361341</span> socket</span><br><span 
class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root <span class="number">5</span>u a_inode <span class="number">0</span>,<span class="number">9</span> <span class="number">0</span> <span class="number">7180</span> [eventpoll]</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">// 15581 (runc exec)</span></span><br><span class="line">COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME</span><br><span class="line">docker-ru <span class="number">15581</span> root cwd DIR <span class="number">0</span>,<span class="number">18</span> <span class="number">120</span> <span class="number">1066743076</span> /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5</span><br><span class="line">docker-ru <span class="number">15581</span> root rtd DIR <span class="number">8</span>,<span class="number">3</span> <span class="number">4096</span> <span class="number">2</span> /</span><br><span class="line">docker-ru <span class="number">15581</span> root txt REG <span class="number">8</span>,<span class="number">3</span> <span class="number">7644224</span> <span class="number">919775</span> /usr/bin/docker-runc</span><br><span class="line">docker-ru <span class="number">15581</span> root mem REG <span class="number">8</span>,<span class="number">3</span> <span class="number">2107816</span> <span class="number">1053962</span> /usr/lib64/libc<span class="number">-2.17</span>.so</span><br><span class="line">docker-ru <span class="number">15581</span> root mem REG <span class="number">8</span>,<span class="number">3</span> <span class="number">19512</span> <span class="number">1054285</span> /usr/lib64/libdl<span class="number">-2.17</span>.so</span><br><span class="line">docker-ru <span class="number">15581</span> root mem REG <span class="number">8</span>,<span class="number">3</span> <span 
class="number">266688</span> <span class="number">1050626</span> /usr/lib64/libseccomp.so<span class="number">.2</span><span class="number">.3</span><span class="number">.1</span></span><br><span class="line">docker-ru <span class="number">15581</span> root mem REG <span class="number">8</span>,<span class="number">3</span> <span class="number">142296</span> <span class="number">1055698</span> /usr/lib64/libpthread<span class="number">-2.17</span>.so</span><br><span class="line">docker-ru <span class="number">15581</span> root mem REG <span class="number">8</span>,<span class="number">3</span> <span class="number">27168</span> <span class="number">3024893</span> /usr/local/gundam/gundam_client/preload/lib64/gundam_preload.so</span><br><span class="line">docker-ru <span class="number">15581</span> root mem REG <span class="number">8</span>,<span class="number">3</span> <span class="number">164432</span> <span class="number">1054515</span> /usr/lib64/ld<span class="number">-2.17</span>.so</span><br><span class="line">docker-ru <span class="number">15581</span> root <span class="number">0</span>r FIFO <span class="number">0</span>,<span class="number">8</span> <span class="number">0</span>t0 <span class="number">1070361745</span> pipe</span><br><span class="line">docker-ru <span class="number">15581</span> root <span class="number">1</span>w FIFO <span class="number">0</span>,<span class="number">8</span> <span class="number">0</span>t0 <span class="number">1070361746</span> pipe</span><br><span class="line">docker-ru <span class="number">15581</span> root <span class="number">2</span>w FIFO <span class="number">0</span>,<span class="number">8</span> <span class="number">0</span>t0 <span class="number">1070361747</span> pipe</span><br><span class="line">docker-ru <span class="number">15581</span> root <span class="number">3</span>w REG <span class="number">0</span>,<span class="number">18</span> <span class="number">5456</span> <span class="number">1066709902</span> 
/run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/log.json</span><br><span class="line">docker-ru <span class="number">15581</span> root <span class="number">4</span>u a_inode <span class="number">0</span>,<span class="number">9</span> <span class="number">0</span> <span class="number">7180</span> [eventpoll]</span><br><span class="line">docker-ru <span class="number">15581</span> root <span class="number">6</span>u unix <span class="number">0xffff881ff8275400</span> <span class="number">0</span>t0 <span class="number">1070361342</span> socket</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">// 11646 (containerd-shim)</span></span><br><span class="line">COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME</span><br><span class="line">docker-co <span class="number">11646</span> root cwd DIR <span class="number">0</span>,<span class="number">18</span> <span class="number">120</span> <span class="number">1066743076</span> /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5</span><br><span class="line">docker-co <span class="number">11646</span> root rtd DIR <span class="number">8</span>,<span class="number">3</span> <span class="number">4096</span> <span class="number">2</span> /</span><br><span class="line">docker-co <span class="number">11646</span> root txt REG <span class="number">8</span>,<span class="number">3</span> <span class="number">4173632</span> <span class="number">919772</span> /usr/bin/docker-containerd-shim</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">0</span>r CHR <span class="number">1</span>,<span class="number">3</span> <span class="number">0</span>t0 <span class="number">2052</span> /dev/null</span><br><span class="line">docker-co <span class="number">11646</span> root <span 
class="number">1</span>w CHR <span class="number">1</span>,<span class="number">3</span> <span class="number">0</span>t0 <span class="number">2052</span> /dev/null</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">2</span>w CHR <span class="number">1</span>,<span class="number">3</span> <span class="number">0</span>t0 <span class="number">2052</span> /dev/null</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">3</span>r FIFO <span class="number">0</span>,<span class="number">8</span> <span class="number">0</span>t0 <span class="number">1070361745</span> pipe</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">4</span>u a_inode <span class="number">0</span>,<span class="number">9</span> <span class="number">0</span> <span class="number">7180</span> [eventpoll]</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">5</span>u a_inode <span class="number">0</span>,<span class="number">9</span> <span class="number">0</span> <span class="number">7180</span> [eventpoll]</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">6</span>u unix <span class="number">0xffff881e8cac2800</span> <span class="number">0</span>t0 <span class="number">1066743079</span> @/containerd-shim/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/shim.sock</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">7</span>u unix <span class="number">0xffff881e8cac3400</span> <span class="number">0</span>t0 <span class="number">1066743968</span> @/containerd-shim/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/shim.sock</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">8</span>r FIFO <span class="number">0</span>,<span class="number">8</span> 
<span class="number">0</span>t0 <span class="number">1066743970</span> pipe</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">9</span>w FIFO <span class="number">0</span>,<span class="number">8</span> <span class="number">0</span>t0 <span class="number">1070361745</span> pipe</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">10</span>r FIFO <span class="number">0</span>,<span class="number">8</span> <span class="number">0</span>t0 <span class="number">1066743971</span> pipe</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">11</span>u FIFO <span class="number">0</span>,<span class="number">18</span> <span class="number">0</span>t0 <span class="number">1066700778</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">12</span>r FIFO <span class="number">0</span>,<span class="number">8</span> <span class="number">0</span>t0 <span class="number">1066743972</span> pipe</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">13</span>w FIFO <span class="number">0</span>,<span class="number">18</span> <span class="number">0</span>t0 <span class="number">1066700778</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">14</span>u FIFO <span class="number">0</span>,<span class="number">18</span> <span class="number">0</span>t0 <span class="number">1066700778</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">15</span>r FIFO <span 
class="number">0</span>,<span class="number">18</span> <span class="number">0</span>t0 <span class="number">1066700778</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">16</span>u FIFO <span class="number">0</span>,<span class="number">18</span> <span class="number">0</span>t0 <span class="number">1066700779</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">17</span>w FIFO <span class="number">0</span>,<span class="number">18</span> <span class="number">0</span>t0 <span class="number">1066700779</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">18</span>u FIFO <span class="number">0</span>,<span class="number">18</span> <span class="number">0</span>t0 <span class="number">1066700779</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">19</span>r FIFO <span class="number">0</span>,<span class="number">18</span> <span class="number">0</span>t0 <span class="number">1066700779</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">20</span>r FIFO <span class="number">0</span>,<span class="number">8</span> <span class="number">0</span>t0 <span class="number">1070361746</span> pipe</span><br><span class="line">docker-co <span class="number">11646</span> root <span class="number">26</span>r FIFO <span 
class="number">0</span>,<span class="number">8</span> <span class="number">0</span>t0 <span class="number">1070361747</span> pipe</span><br></pre></td></tr></table></figure>
<p>Attentive readers can already draw the conclusion themselves by combining the strace and lsof results.</p>
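<p>runc init is blocked while writing to FD 2, and FD 2 is a pipe. A Linux pipe has a fixed buffer capacity (64 KiB by default on modern kernels). Once the buffer is full and the read end is not draining it, any further write blocks the writer.</p>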
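<p>To summarize: containerd-shim launches runc exec to run the user's command inside the container, and runc exec in turn starts runc init to enter the container. runc init wrote more data to FD 2 than the pipe could hold and blocked. With runc init at the bottom of the stack blocked, every process up the call chain blocked as well:</p>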
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line">runc init → runc exec → containerd-shim exec → containerd exec → dockerd exec</span><br></pre></td></tr></table></figure>
<p>At this point we know why docker hung. Two questions remain unanswered, though:</p>
<ul>
<li>Why did runc init write more data to FD 2 (os.Stderr in Go) than a Linux pipe can hold?</li>
<li>Why does the runc init problem occur only in one particular container?</li>
</ul>
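<p>If runc init routinely needed to write a lot of data to os.Stdout or os.Stderr, creating any container would run into this problem. So we can conclude that something unexpected happened in this particular container and caused runc init to write a large amount of data to os.Stderr. Whatever runc init was writing to os.Stderr is therefore very likely to reveal that unexpected failure.</p>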
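<p>So we need to capture the data that runc init is currently writing. Because FD 2 of runc init is an anonymous pipe, we cannot read its contents the way we would read a regular file. Thanks to 鹤哥, who did the digging, we found a way to read the contents of an anonymous pipe:</p>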
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"># cat /proc/<span class="number">15638</span>/fd/<span class="number">2</span></span><br><span class="line">runtime/cgo: pthread_create failed: Resource temporarily unavailable</span><br><span class="line">SIGABRT: abort</span><br><span class="line">PC=<span class="number">0x7f512b7365f7</span> m=<span class="number">0</span> sigcode=<span class="number">18446744073709551610</span></span><br><span class="line"></span><br><span class="line">goroutine <span class="number">0</span> [idle]:</span><br><span class="line">runtime: unknown pc <span class="number">0x7f512b7365f7</span></span><br><span class="line">stack: frame={sp:<span class="number">0x7ffe1121a658</span>, fp:<span class="number">0x0</span>} stack=[<span class="number">0x7ffe0ae1bb28</span>,<span class="number">0x7ffe1121ab50</span>)</span><br><span class="line"><span class="number">00007</span>ffe1121a558: <span class="number">00007</span>ffe1121a6d8 <span class="number">00007</span>ffe1121a6b0</span><br><span class="line"><span class="number">00007</span>ffe1121a568: <span class="number">0000000000000001</span> <span class="number">00007</span>f512c527660</span><br><span class="line"><span class="number">00007</span>ffe1121a578: <span class="number">00007</span>f512c54d560 <span class="number">00007</span>f512c54d208</span><br><span class="line"><span class="number">00007</span>ffe1121a588: <span class="number">00007</span>f512c333e6f <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a598: <span class="number">00007</span>f512c527660 <span class="number">0000000000000005</span></span><br><span class="line"><span class="number">00007</span>ffe1121a5a8: <span class="number">0000000000000000</span> <span class="number">0000000000000001</span></span><br><span class="line"><span class="number">00007</span>ffe1121a5b8: <span class="number">00007</span>f512c54d208 
<span class="number">00007</span>f512c528000</span><br><span class="line"><span class="number">00007</span>ffe1121a5c8: <span class="number">00007</span>ffe1121a600 <span class="number">00007</span>f512b704b0c</span><br><span class="line"><span class="number">00007</span>ffe1121a5d8: <span class="number">00007</span>f512b7110c0 <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a5e8: <span class="number">00007</span>f512c54d560 <span class="number">00007</span>ffe1121a620</span><br><span class="line"><span class="number">00007</span>ffe1121a5f8: <span class="number">00007</span>ffe1121a610 <span class="number">000000000</span>f11ed7d</span><br><span class="line"><span class="number">00007</span>ffe1121a608: <span class="number">00007</span>f512c550153 <span class="number">00000000</span>ffffffff</span><br><span class="line"><span class="number">00007</span>ffe1121a618: <span class="number">00007</span>f512c550a9b <span class="number">00007</span>f512b707d00</span><br><span class="line"><span class="number">00007</span>ffe1121a628: <span class="number">00007</span>f512babc868 <span class="number">00007</span>f512c9e9e5e</span><br><span class="line"><span class="number">00007</span>ffe1121a638: <span class="number">00007</span>f512d3bb080 <span class="number">00000000000000</span>f1</span><br><span class="line"><span class="number">00007</span>ffe1121a648: <span class="number">0000000000000011</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a658: <<span class="number">00007</span>f512b737ce8 <span class="number">0000000000000020</span></span><br><span class="line"><span class="number">00007</span>ffe1121a668: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a678: <span class="number">0000000000000000</span> <span 
class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a688: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a698: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6a8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6b8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6c8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6d8: <span class="number">0000000000000000</span> <span class="number">00007</span>f512babc868</span><br><span class="line"><span class="number">00007</span>ffe1121a6e8: <span class="number">00007</span>f512c9e9e5e <span class="number">00007</span>f512d3bb080</span><br><span class="line"><span class="number">00007</span>ffe1121a6f8: <span class="number">00007</span>f512c33f260 <span class="number">00007</span>f512babc1c0</span><br><span class="line"><span class="number">00007</span>ffe1121a708: <span class="number">00007</span>f512babc1c0 <span class="number">0000000000000001</span></span><br><span class="line"><span class="number">00007</span>ffe1121a718: <span class="number">00007</span>f512babc243 <span class="number">00000000000000</span>f1</span><br><span class="line"><span class="number">00007</span>ffe1121a728: <span class="number">00007</span>f512b7787ec <span class="number">0000000000000001</span></span><br><span class="line"><span class="number">00007</span>ffe1121a738: <span class="number">00007</span>f512babc1c0 <span 
class="number">000000000000000</span>a</span><br><span class="line"><span class="number">00007</span>ffe1121a748: <span class="number">00007</span>f512b7e8a4d <span class="number">000000000000000</span>a</span><br><span class="line">runtime: unknown pc <span class="number">0x7f512b7365f7</span></span><br><span class="line">stack: frame={sp:<span class="number">0x7ffe1121a658</span>, fp:<span class="number">0x0</span>} stack=[<span class="number">0x7ffe0ae1bb28</span>,<span class="number">0x7ffe1121ab50</span>)</span><br><span class="line"><span class="number">00007</span>ffe1121a558: <span class="number">00007</span>ffe1121a6d8 <span class="number">00007</span>ffe1121a6b0</span><br><span class="line"><span class="number">00007</span>ffe1121a568: <span class="number">0000000000000001</span> <span class="number">00007</span>f512c527660</span><br><span class="line"><span class="number">00007</span>ffe1121a578: <span class="number">00007</span>f512c54d560 <span class="number">00007</span>f512c54d208</span><br><span class="line"><span class="number">00007</span>ffe1121a588: <span class="number">00007</span>f512c333e6f <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a598: <span class="number">00007</span>f512c527660 <span class="number">0000000000000005</span></span><br><span class="line"><span class="number">00007</span>ffe1121a5a8: <span class="number">0000000000000000</span> <span class="number">0000000000000001</span></span><br><span class="line"><span class="number">00007</span>ffe1121a5b8: <span class="number">00007</span>f512c54d208 <span class="number">00007</span>f512c528000</span><br><span class="line"><span class="number">00007</span>ffe1121a5c8: <span class="number">00007</span>ffe1121a600 <span class="number">00007</span>f512b704b0c</span><br><span class="line"><span class="number">00007</span>ffe1121a5d8: <span class="number">00007</span>f512b7110c0 <span 
class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a5e8: <span class="number">00007</span>f512c54d560 <span class="number">00007</span>ffe1121a620</span><br><span class="line"><span class="number">00007</span>ffe1121a5f8: <span class="number">00007</span>ffe1121a610 <span class="number">000000000</span>f11ed7d</span><br><span class="line"><span class="number">00007</span>ffe1121a608: <span class="number">00007</span>f512c550153 <span class="number">00000000</span>ffffffff</span><br><span class="line"><span class="number">00007</span>ffe1121a618: <span class="number">00007</span>f512c550a9b <span class="number">00007</span>f512b707d00</span><br><span class="line"><span class="number">00007</span>ffe1121a628: <span class="number">00007</span>f512babc868 <span class="number">00007</span>f512c9e9e5e</span><br><span class="line"><span class="number">00007</span>ffe1121a638: <span class="number">00007</span>f512d3bb080 <span class="number">00000000000000</span>f1</span><br><span class="line"><span class="number">00007</span>ffe1121a648: <span class="number">0000000000000011</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a658: <<span class="number">00007</span>f512b737ce8 <span class="number">0000000000000020</span></span><br><span class="line"><span class="number">00007</span>ffe1121a668: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a678: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a688: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a698: <span class="number">0000000000000000</span> <span 
class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6a8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6b8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6c8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6d8: <span class="number">0000000000000000</span> <span class="number">00007</span>f512babc868</span><br><span class="line"><span class="number">00007</span>ffe1121a6e8: <span class="number">00007</span>f512c9e9e5e <span class="number">00007</span>f512d3bb080</span><br><span class="line"><span class="number">00007</span>ffe1121a6f8: <span class="number">00007</span>f512c33f260 <span class="number">00007</span>f512babc1c0</span><br><span class="line"><span class="number">00007</span>ffe1121a708: <span class="number">00007</span>f512babc1c0 <span class="number">0000000000000001</span></span><br><span class="line"><span class="number">00007</span>ffe1121a718: <span class="number">00007</span>f512babc243 <span class="number">00000000000000</span>f1</span><br><span class="line"><span class="number">00007</span>ffe1121a728: <span class="number">00007</span>f512b7787ec <span class="number">0000000000000001</span></span><br><span class="line"><span class="number">00007</span>ffe1121a738: <span class="number">00007</span>f512babc1c0 <span class="number">000000000000000</span>a</span><br><span class="line"><span class="number">00007</span>ffe1121a748: <span class="number">00007</span>f512b7e8a4d <span class="number">000000000000000</span>a</span><br><span class="line"></span><br><span class="line">goroutine <span class="number">1</span> [running, locked to thread]:</span><br><span 
class="line">runtime.systemstack_switch()</span><br><span class="line"> /usr/local/<span class="keyword">go</span>/src/runtime/asm_amd64.s:<span class="number">363</span> fp=<span class="number">0xc4200a3ed0</span> sp=<span class="number">0xc4200a3ec8</span> pc=<span class="number">0x7f512c7281d0</span></span><br><span class="line">runtime.startTheWorld()</span><br><span class="line"> /usr/local/<span class="keyword">go</span>/src/runtime/proc.<span class="keyword">go</span>:<span class="number">978</span> +<span class="number">0x2f</span> fp=<span class="number">0xc4200a3ee8</span> sp=<span class="number">0xc4200a3ed0</span> pc=<span class="number">0x7f512c70221f</span></span><br><span class="line">runtime.GOMAXPROCS(<span class="number">0x1</span>, <span class="number">0xc42013d9a0</span>)</span><br><span class="line"> /usr/local/<span class="keyword">go</span>/src/runtime/debug.<span class="keyword">go</span>:<span class="number">30</span> +<span class="number">0xa0</span> fp=<span class="number">0xc4200a3f10</span> sp=<span class="number">0xc4200a3ee8</span> pc=<span class="number">0x7f512c6d9810</span></span><br><span class="line">main.init<span class="number">.0</span>()</span><br><span class="line"> /<span class="keyword">go</span>/src/github.com/opencontainers/runc/init.<span class="keyword">go</span>:<span class="number">14</span> +<span class="number">0x61</span> fp=<span class="number">0xc4200a3f30</span> sp=<span class="number">0xc4200a3f10</span> pc=<span class="number">0x7f512c992801</span></span><br><span class="line">main.init()</span><br><span class="line"> <autogenerated>:<span class="number">1</span> +<span class="number">0x624</span> fp=<span class="number">0xc4200a3f88</span> sp=<span class="number">0xc4200a3f30</span> pc=<span class="number">0x7f512c9a1014</span></span><br><span class="line">runtime.main()</span><br><span class="line"> /usr/local/<span class="keyword">go</span>/src/runtime/proc.<span class="keyword">go</span>:<span 
class="number">186</span> +<span class="number">0x1d2</span> fp=<span class="number">0xc4200a3fe0</span> sp=<span class="number">0xc4200a3f88</span> pc=<span class="number">0x7f512c6ff962</span></span><br><span class="line">runtime.goexit()</span><br><span class="line"> /usr/local/<span class="keyword">go</span>/src/runtime/asm_amd64.s:<span class="number">2361</span> +<span class="number">0x1</span> fp=<span class="number">0xc4200a3fe8</span> sp=<span class="number">0xc4200a3fe0</span> pc=<span class="number">0x7f512c72ad71</span></span><br><span class="line"></span><br><span class="line">goroutine <span class="number">6</span> [syscall]:</span><br><span class="line">os/signal.signal_recv(<span class="number">0x0</span>)</span><br><span class="line"> /usr/local/<span class="keyword">go</span>/src/runtime/sigqueue.<span class="keyword">go</span>:<span class="number">139</span> +<span class="number">0xa8</span></span><br><span class="line">os/signal.loop()</span><br><span class="line"> /usr/local/<span class="keyword">go</span>/src/os/signal/signal_unix.<span class="keyword">go</span>:<span class="number">22</span> +<span class="number">0x24</span></span><br><span class="line">created by os/signal.init<span class="number">.0</span></span><br><span class="line"> /usr/local/<span class="keyword">go</span>/src/os/signal/signal_unix.<span class="keyword">go</span>:<span class="number">28</span> +<span class="number">0x43</span></span><br><span class="line"></span><br><span class="line">rax <span class="number">0x0</span></span><br><span class="line">rbx <span class="number">0x7f512babc868</span></span><br><span class="line">rcx <span class="number">0xffffffffffffffff</span></span><br><span class="line">rdx <span class="number">0x6</span></span><br><span class="line">rdi <span class="number">0x271</span></span><br><span class="line">rsi <span class="number">0x271</span></span><br><span class="line">rbp <span class="number">0x7f512c9e9e5e</span></span><br><span 
class="line">rsp <span class="number">0x7ffe1121a658</span></span><br><span class="line">r8 <span class="number">0xa</span></span><br><span class="line">r9 <span class="number">0x7f512c524740</span></span><br><span class="line">r10 <span class="number">0x8</span></span><br><span class="line">r11 <span class="number">0x206</span></span><br><span class="line">r12 <span class="number">0x7f512d3bb080</span></span><br><span class="line">r13 <span class="number">0xf1</span></span><br><span class="line">r14 <span class="number">0x11</span></span><br><span class="line">r15 <span class="number">0x0</span></span><br><span class="line">rip <span class="number">0x7f512b7365f7</span></span><br><span class="line">rflags <span class="number">0x206</span></span><br><span class="line">cs <span class="number">0x33</span></span><br><span class="line">fs <span class="number">0x0</span></span><br><span class="line">gs <span class="number">0x0</span></span><br><span class="line">exec failed: container_linux.<span class="keyword">go</span>:<span class="number">348</span>: starting container process caused <span class="string">"read init-p: connection reset by peer"</span></span><br></pre></td></tr></table></figure>
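<p>Why does the cat command at the top of the listing above work, given that FD 2 is the write end of the pipe? On Linux, opening a /proc/&lt;pid&gt;/fd entry that refers to a pipe yields a new descriptor on the same pipe, and opening it read-only gives a read end. The sketch below demonstrates this inside a single process; readPipeViaProc is a hypothetical helper. Note that reading this way consumes the data, so it also frees space in the pipe buffer.</p>

```go
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
)

// readPipeViaProc (hypothetical helper) writes msg into an anonymous pipe and
// reads it back by opening the WRITE end's /proc entry read-only, the same
// trick as `cat /proc/<pid>/fd/2`.
func readPipeViaProc(msg string) (string, error) {
	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return "", err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	if _, err := syscall.Write(fds[1], []byte(msg)); err != nil {
		return "", err
	}

	// Opening the procfs link creates a fresh read end on the same pipe.
	f, err := os.Open(fmt.Sprintf("/proc/self/fd/%d", fds[1]))
	if err != nil {
		return "", err
	}
	defer f.Close()

	buf := make([]byte, len(msg))
	n, err := io.ReadFull(f, buf)
	return string(buf[:n]), err
}

func main() {
	got, err := readPipeViaProc("hello from the pipe")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```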
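<p>Hmm, runc init failed to create a thread because of insufficient resources? This output clearly does not come from runc itself; it is an unexpected crash dump from the Go runtime. So which kind of resource ran out? Let's correlate this with the /var/log/messages log:</p>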
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: runc:[<span class="number">2</span>:INIT] invoked oom-killer: gfp_mask=<span class="number">0xd0</span>, order=<span class="number">0</span>, oom_score_adj=<span class="number">997</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: CPU: <span class="number">14</span> PID: <span class="number">12788</span> Comm: runc:[<span class="number">2</span>:INIT] Tainted: G W OE ------------ T <span class="number">3.10</span><span class="number">.0</span><span class="number">-514.16</span><span class="number">.1</span>.el7.stable.v1<span class="number">.4</span>.x86_64 #<span class="number">1</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Hardware name: Inspur SA5212M4/YZMB<span class="number">-00370</span><span class="number">-107</span>, BIOS <span class="number">4.1</span><span class="number">.10</span> <span class="number">11</span>/<span class="number">14</span>/<span class="number">2016</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: ffff88103841dee0 <span class="number">00000000</span>c4394691 ffff880263e4bcb8 ffffffff8168863d</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: ffff880263e4bd50 ffffffff81683585 ffff88203cc5e300 ffff880ee02b2380</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span 
class="number">18</span>:<span class="number">17</span> host-xx kernel: <span class="number">0000000000000001</span> <span class="number">0000000000000000</span> <span class="number">0000000000000000</span> <span class="number">0000000000000046</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Call Trace:</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<ffffffff8168863d>] dump_stack+<span class="number">0x19</span>/<span class="number">0x1b</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<ffffffff81683585>] dump_header+<span class="number">0x85</span>/<span class="number">0x27f</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<ffffffff81185b06>] ? find_lock_task_mm+<span class="number">0x56</span>/<span class="number">0xc0</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<ffffffff81185fbe>] oom_kill_process+<span class="number">0x24e</span>/<span class="number">0x3c0</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<ffffffff81093c2e>] ? 
has_capability_noaudit+<span class="number">0x1e</span>/<span class="number">0x30</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<ffffffff811f4d91>] mem_cgroup_oom_synchronize+<span class="number">0x551</span>/<span class="number">0x580</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<ffffffff811f41b0>] ? mem_cgroup_charge_common+<span class="number">0xc0</span>/<span class="number">0xc0</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<ffffffff81186844>] pagefault_out_of_memory+<span class="number">0x14</span>/<span class="number">0x90</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<ffffffff816813fa>] mm_fault_error+<span class="number">0x68</span>/<span class="number">0x12b</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<ffffffff81694405>] __do_page_fault+<span class="number">0x395</span>/<span class="number">0x450</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<ffffffff816944f5>] do_page_fault+<span class="number">0x35</span>/<span class="number">0x90</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<ffffffff81690708>] page_fault+<span 
class="number">0x28</span>/<span class="number">0x30</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: memory: usage <span class="number">3145728</span>kB, limit <span class="number">3145728</span>kB, failcnt <span class="number">14406932</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: memory+swap: usage <span class="number">3145728</span>kB, limit <span class="number">9007199254740988</span>kB, failcnt <span class="number">0</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: kmem: usage <span class="number">3143468</span>kB, limit <span class="number">9007199254740988</span>kB, failcnt <span class="number">0</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Memory cgroup stats <span class="keyword">for</span> /kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda: cache:<span class="number">0</span>KB rss:<span class="number">0</span>KB rss_huge:<span class="number">0</span>KB mapped_file:<span class="number">0</span>KB swap:<span class="number">0</span>KB inactive_anon:<span class="number">0</span>KB active_anon:<span class="number">0</span>KB inactive_file:<span class="number">0</span>KB active_file:<span class="number">0</span>KB unevictable:<span class="number">0</span>KB</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Memory cgroup stats <span 
class="keyword">for</span> /kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda/b761e05249245695278b3f409d2d6e5c6a5bff6995ff0cf44d03af4aa9764a30: cache:<span class="number">0</span>KB rss:<span class="number">40</span>KB rss_huge:<span class="number">0</span>KB mapped_file:<span class="number">0</span>KB swap:<span class="number">0</span>KB inactive_anon:<span class="number">0</span>KB active_anon:<span class="number">40</span>KB inactive_file:<span class="number">0</span>KB active_file:<span class="number">0</span>KB unevictable:<span class="number">0</span>KB</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Memory cgroup stats <span class="keyword">for</span> /kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda/<span class="number">1</span>d1750ecc627cc5d60d80c071b2eb4d515ee8880c5b5136883164f08319869b0: cache:<span class="number">0</span>KB rss:<span class="number">0</span>KB rss_huge:<span class="number">0</span>KB mapped_file:<span class="number">0</span>KB swap:<span class="number">0</span>KB inactive_anon:<span class="number">0</span>KB active_anon:<span class="number">0</span>KB inactive_file:<span class="number">0</span>KB active_file:<span class="number">0</span>KB unevictable:<span class="number">0</span>KB</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Memory cgroup stats <span class="keyword">for</span> /kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5: cache:<span class="number">0</span>KB rss:<span class="number">2220</span>KB rss_huge:<span 
class="number">0</span>KB mapped_file:<span class="number">0</span>KB swap:<span class="number">0</span>KB inactive_anon:<span class="number">0</span>KB active_anon:<span class="number">2140</span>KB inactive_file:<span class="number">0</span>KB active_file:<span class="number">0</span>KB unevictable:<span class="number">0</span>KB</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Memory cgroup stats <span class="keyword">for</span> /kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/super-agent: cache:<span class="number">0</span>KB rss:<span class="number">0</span>KB rss_huge:<span class="number">0</span>KB mapped_file:<span class="number">0</span>KB swap:<span class="number">0</span>KB inactive_anon:<span class="number">0</span>KB active_anon:<span class="number">0</span>KB inactive_file:<span class="number">0</span>KB active_file:<span class="number">0</span>KB unevictable:<span class="number">0</span>KB</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<span class="number">30598</span>] <span class="number">0</span> <span class="number">30598</span> <span class="number">255</span> <span class="number">1</span> <span class="number">4</span> <span class="number">0</span> <span class="number">-998</span> pause</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span 
class="number">17</span> host-xx kernel: [<span class="number">11680</span>] <span class="number">0</span> <span class="number">11680</span> <span class="number">164833</span> <span class="number">1118</span> <span class="number">20</span> <span class="number">0</span> <span class="number">997</span> dockerinit</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<span class="number">12788</span>] <span class="number">0</span> <span class="number">12788</span> <span class="number">150184</span> <span class="number">1146</span> <span class="number">23</span> <span class="number">0</span> <span class="number">997</span> runc:[<span class="number">2</span>:INIT]</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: oom-kill:,cpuset=bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5,mems_allowed=<span class="number">0</span><span class="number">-1</span>,oom_memcg=/kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda,task_memcg=/kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5,task=runc:[<span class="number">2</span>:INIT],pid=<span class="number">12800</span>,uid=<span class="number">0</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Memory cgroup out of memory: Kill process <span class="number">12800</span> (runc:[<span class="number">2</span>:INIT]) score <span class="number">997</span> or sacrifice child</span><br><span class="line">Jun <span class="number">17</span> <span 
class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Killed process <span class="number">12788</span> (runc:[<span class="number">2</span>:INIT]) total-vm:<span class="number">600736</span>kB, anon-rss:<span class="number">3296</span>kB, file-rss:<span class="number">276</span>kB, shmem-rss:<span class="number">1012</span>kB</span><br></pre></td></tr></table></figure>
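<p>/var/log/messages shows a large number of OOM log entries for this container from about a month ago, which roughly matches the start time of the abnormal runc init process.</p>
<p>To summarize why runc init blocked: at a critical moment, runc init failed to create a thread because memory was insufficient. That triggered unexpected output from the Go runtime, which in turn left runc init blocked on a pipe write.</p>
<p>At this point the overall picture is basically clear. One question remains, though: if runc init is writing to the pipe, is no other process reading from it?</p>
<p>Remember the lsof output above? Attentive readers will have spotted who holds the read end of that pipe: yes, it is containerd-shim, and the pipe's inode number is 1070361747. So why did containerd-shim not read the pipe? Let's look at the code:</p>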
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(e *execProcess)</span> <span class="title">start</span><span class="params">(ctx context.Context)</span> <span class="params">(err error)</span></span> {</span><br><span class="line"> ......</span><br><span class="line"> <span class="keyword">if</span> err := e.parent.runtime.Exec(ctx, e.parent.id, e.spec, opts); err != <span class="literal">nil</span> { <span class="comment">// runs runc init</span></span><br><span class="line"> <span class="built_in">close</span>(e.waitBlock)</span><br><span class="line"> <span class="keyword">return</span> e.parent.runtimeError(err, <span class="string">"OCI runtime exec failed"</span>)</span><br><span class="line"> }</span><br><span class="line"> ......</span><br><span class="line"> <span class="keyword">else</span> <span class="keyword">if</span> !e.stdio.IsNull() {</span><br><span class="line"> fifoCtx, cancel := context.WithTimeout(ctx, <span class="number">15</span>*time.Second)</span><br><span class="line"> <span class="keyword">defer</span> cancel()</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> err := copyPipes(fifoCtx, e.io, e.stdio.Stdin, e.stdio.Stdout, e.stdio.Stderr, &e.wg, &copyWaitGroup); err != <span class="literal">nil</span> { <span class="comment">// reads the pipe</span></span><br><span class="line"> <span class="keyword">return</span> errors.Wrap(err, <span class="string">"failed to start io pipe copy"</span>)</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> ......</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(r *Runc)</span> <span class="title">Exec</span><span class="params">(context context.Context, id <span 
class="keyword">string</span>, spec specs.Process, opts *ExecOpts)</span> <span class="title">error</span></span> {</span><br><span class="line"> ......</span><br><span class="line"> cmd := r.command(context, <span class="built_in">append</span>(args, id)...)</span><br><span class="line"> <span class="keyword">if</span> opts != <span class="literal">nil</span> && opts.IO != <span class="literal">nil</span> {</span><br><span class="line"> opts.Set(cmd)</span><br><span class="line"> }</span><br><span class="line"> ......</span><br><span class="line"> ec, err := Monitor.Start(cmd)</span><br><span class="line"> ......</span><br><span class="line"> status, err := Monitor.Wait(cmd, ec)</span><br><span class="line"> ......</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
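<p>Well, containerd-shim is designed to read the pipe only after runc init has finished running. But runc init, having written an unexpectedly large amount of data, is blocked on the pipe write... a perfect deadlock.</p>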
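<p>At last, the core thread of this docker hang has been untangled. Next, let's talk about solutions.</p>
<h3 id="解决方案"><a href="#解决方案" class="headerlink" title="Solutions"></a>Solutions</h3><p>Now that we understand what causes the docker hang, we can propose the following targeted fixes.</p>
<h4 id="最直观的办法"><a href="#最直观的办法" class="headerlink" title="The most obvious fix"></a>The most obvious fix</h4><p>Since docker exec can cause docker to hang, we could simply disable every docker exec operation in the system. The most typical one is the kubelet probe: we currently add a ReadinessProbe to every Pod by default, and it runs a command inside the container via exec. We would only need to change kubelet's probing to a tcp or http probe.</p>
<p>The component change itself is small, but the cost of adapting business containers is far too high, and migrating existing clusters is a big problem.</p>
<h4 id="最根本的办法"><a href="#最根本的办法" class="headerlink" title="The most fundamental fix"></a>The most fundamental fix</h4><p>Since containerd-shim currently waits for runc exec to finish before reading the pipe, moving the pipe read ahead of the runc exec invocation should, in theory, also avoid the deadlock.</p>
<p>Likewise, the upgrade cost of this option is too high: upgrading containerd-shim requires restarting every existing container, so this option would almost certainly not pass review.</p>
<h4 id="最简单的办法"><a href="#最简单的办法" class="headerlink" title="The simplest fix"></a>The simplest fix</h4><p>Since runc init is blocked writing to the pipe, we can read the pipe ourselves, which also lets runc init exit cleanly.</p>
<p>When automating this fix, identifying whether a given docker hang is really caused by a blocked pipe write is a small challenge. Compared with the two options above, though, I think it is worth trying, since its impact is negligible.</p>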
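<h3 id="后续"><a href="#后续" class="headerlink" title="Follow-up"></a>Follow-up</h3><p>In fact, reading the pipe caused yet another problem. See: <a href="https://plpan.github.io/%E4%B8%80%E6%AC%A1%E8%AF%BB-pipe-%E5%BC%95%E5%8F%91%E7%9A%84%E8%A1%80%E6%A1%88/">A disaster caused by one pipe read</a>.</p>
<p>Also, this is far from the only cause of docker hangs, and the findings here do not apply to every scenario. Please investigate based on what you see in your own environment.</p>
<p>This concludes our investigation of the docker hang.</p>
<p>These conclusions are the result of several days of investigation by a four-person squad: @飞哥 @鹤哥 @博哥 and me. Likes, shares, and bookmarks are welcome as a show of support.</p>
<p>If anything in this investigation is wrong, corrections are welcome.</p>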
]]></content>
<categories>
<category>Troubleshooting</category>
</categories>
<tags>
<tag>docker</tag>
<tag>containerd</tag>
<tag>runc</tag>
</tags>
</entry>
<entry>
<title>A Troubleshooting Journey: docker exec Failures</title>
<url>/docker-exec-%E5%A4%B1%E8%B4%A5%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/</url>
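<content><![CDATA[<p>Hoeing grain beneath the noonday sun, on-call duty is no fun;</p>
<p>Sweat drips down into the soil, one investigation, a whole afternoon of toil.</p>
<h3 id="问题描述"><a href="#问题描述" class="headerlink" title="Problem description"></a>Problem description</h3><p>Today, while on call and investigating a production issue, I noticed that the system log was being flooded with docker error messages:</p>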
<figure class="highlight apache"><table><tr><td class="code"><pre><span class="line"><span class="attribute">May</span> <span class="number">12</span> <span class="number">09</span>:<span class="number">08</span>:<span class="number">40</span> HOSTNAME dockerd[<span class="number">4085</span>]: time=<span class="string">"2021-05-12T09:08:40.642410594+08:00"</span> level=error msg=<span class="string">"stream copy error: reading from a closed fifo"</span></span><br><span class="line"><span class="attribute">May</span> <span class="number">12</span> <span class="number">09</span>:<span class="number">08</span>:<span class="number">40</span> HOSTNAME dockerd[<span class="number">4085</span>]: time=<span class="string">"2021-05-12T09:08:40.642418571+08:00"</span> level=error msg=<span class="string">"stream copy error: reading from a closed fifo"</span></span><br><span class="line"><span class="attribute">May</span> <span class="number">12</span> <span class="number">09</span>:<span class="number">08</span>:<span class="number">40</span> HOSTNAME dockerd[<span class="number">4085</span>]: time=<span class="string">"2021-05-12T09:08:40.663754355+08:00"</span> level=error msg=<span class="string">"Error running exec 110deb1c1b2a2d2671d7368bd02bfc18a968e4712a3c771dedf0b362820e73cb in container: OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused \"read init-p: connection reset by peer\": unknown"</span></span><br></pre></td></tr></table></figure>
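<p>From a system risk perspective, we need to find out why these error logs appear and determine whether they affect the business.</p>
<p>The rest of this post briefly walks through the investigation and the root cause.</p>
<h3 id="问题排查"><a href="#问题排查" class="headerlink" title="Investigation"></a>Investigation</h3><p>Right now the only information we have is the system log telling us that dockerd failed to run an exec.</p>
<p>Before analyzing the problem itself, let's review how docker works and what its call path looks like:</p>
<p><img src="docker-call-path.png" alt="docker call path"></p>
<p>As you can see, docker's call path is very long and involves many components. Our investigation therefore has two main steps:</p>
<ul>
<li>Identify the component that causes the failure</li>
<li>Identify why that component fails</li>
</ul>
<h4 id="定位组件"><a href="#定位组件" class="headerlink" title="Locating the component"></a>Locating the component</h4><p>Users familiar with docker can spot the faulty component at a glance. Still, let's walk through the usual procedure:</p>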
<figure class="highlight angelscript"><table><tr><td class="code"><pre><span class="line"><span class="comment">// 1. Find the problematic container</span></span><br><span class="line"># sudo docker ps | grep -v pause | grep -v NAMES | awk <span class="string">'{print $1}'</span> | xargs -ti sudo docker exec {} sleep <span class="number">1</span></span><br><span class="line">sudo docker exec aa1e331ec24f sleep <span class="number">1</span></span><br><span class="line">OCI runtime exec failed: exec failed: container_linux.go:<span class="number">348</span>: starting container process caused <span class="string">"read init-p: connection reset by peer"</span>: unknown</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">// 2. Rule out docker</span></span><br><span class="line"># docker-containerd-ctr -a /var/run/docker/containerd/docker-containerd.sock -n moby t exec --exec-id stupig1 aa1e331ec24f621ab3152ebe94f1e533734164af86c9df0f551eab2b1967ec4e sleep <span class="number">1</span></span><br><span class="line">ctr: OCI runtime exec failed: exec failed: container_linux.go:<span class="number">348</span>: starting container process caused <span class="string">"read init-p: connection reset by peer"</span>: unknown</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">// 3. 
Rule out containerd and containerd-shim</span></span><br><span class="line"># docker-runc --root /var/run/docker/runtime-runc/moby/ exec aa1e331ec24f621ab3152ebe94f1e533734164af86c9df0f551eab2b1967ec4e sleep</span><br><span class="line">runtime/cgo: pthread_create failed: Resource temporarily unavailable</span><br><span class="line">SIGABRT: abort</span><br><span class="line">PC=<span class="number">0x6b657e</span> m=<span class="number">0</span> sigcode=<span class="number">18446744073709551610</span></span><br><span class="line"></span><br><span class="line">goroutine <span class="number">0</span> [idle]:</span><br><span class="line">runtime: unknown pc <span class="number">0x6b657e</span></span><br><span class="line">stack: frame={sp:<span class="number">0x7ffd30f0d218</span>, fp:<span class="number">0x0</span>} stack=[<span class="number">0x7ffd2ab0e738</span>,<span class="number">0x7ffd30f0d760</span>)</span><br><span class="line"><span class="number">00007f</span>fd30f0d118: <span class="number">0000000000000002</span> <span class="number">00007f</span>fd30f7f184</span><br><span class="line"><span class="number">00007f</span>fd30f0d128: <span class="number">000000000069</span>c31c <span class="number">00007f</span>fd30f0d1a8</span><br><span class="line"><span class="number">00007f</span>fd30f0d138: <span class="number">000000000045814</span>e <runtime.callCgoMmap+<span class="number">62</span>> <span class="number">00007f</span>fd30f0d140</span><br><span class="line"><span class="number">00007f</span>fd30f0d148: <span class="number">00007f</span>fd30f0d190 <span class="number">0000000000411</span>a88 <runtime.persistentalloc1+<span class="number">456</span>></span><br><span class="line"><span class="number">00007f</span>fd30f0d158: <span class="number">0000000000</span>bf6dd0 <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d168: <span class="number">0000000000010000</span> <span 
class="number">0000000000000008</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d178: <span class="number">0000000000</span>bf6dd8 <span class="number">0000000000</span>bf7ca0</span><br><span class="line"><span class="number">00007f</span>fd30f0d188: <span class="number">00007f</span>dcbb4b7000 <span class="number">00007f</span>fd30f0d1c8</span><br><span class="line"><span class="number">00007f</span>fd30f0d198: <span class="number">0000000000451205</span> <runtime.persistentalloc.func1+<span class="number">69</span>> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d1a8: <span class="number">0000000000000000</span> <span class="number">0000000000</span>c1c080</span><br><span class="line"><span class="number">00007f</span>fd30f0d1b8: <span class="number">00007f</span>dcbb4b7000 <span class="number">00007f</span>fd30f0d1e0</span><br><span class="line"><span class="number">00007f</span>fd30f0d1c8: <span class="number">00007f</span>fd30f0d210 <span class="number">00007f</span>fd30f0d220</span><br><span class="line"><span class="number">00007f</span>fd30f0d1d8: <span class="number">0000000000000000</span> <span class="number">00000000000000f</span>1</span><br><span class="line"><span class="number">00007f</span>fd30f0d1e8: <span class="number">0000000000000011</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d1f8: <span class="number">000000000069</span>c31c <span class="number">0000000000</span>c1c080</span><br><span class="line"><span class="number">00007f</span>fd30f0d208: <span class="number">000000000045814</span>e <runtime.callCgoMmap+<span class="number">62</span>> <span class="number">00007f</span>fd30f0d210</span><br><span class="line"><span class="number">00007f</span>fd30f0d218: <<span class="number">00007f</span>fd30f0d268 fffffffe7fffffff</span><br><span class="line"><span 
class="number">00007f</span>fd30f0d228: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d238: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d248: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d258: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d268: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d278: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d288: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d298: ffffffffffffffff <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2a8: <span class="number">00000000006</span>b68ba <span class="number">0000000000000020</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2b8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2c8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2d8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2e8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2f8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d308: <span class="number">0000000000000000</span> <span 
class="number">0000000000000000</span></span><br><span class="line">runtime: unknown pc <span class="number">0x6b657e</span></span><br><span class="line">stack: frame={sp:<span class="number">0x7ffd30f0d218</span>, fp:<span class="number">0x0</span>} stack=[<span class="number">0x7ffd2ab0e738</span>,<span class="number">0x7ffd30f0d760</span>)</span><br><span class="line"><span class="number">00007f</span>fd30f0d118: <span class="number">0000000000000002</span> <span class="number">00007f</span>fd30f7f184</span><br><span class="line"><span class="number">00007f</span>fd30f0d128: <span class="number">000000000069</span>c31c <span class="number">00007f</span>fd30f0d1a8</span><br><span class="line"><span class="number">00007f</span>fd30f0d138: <span class="number">000000000045814</span>e <runtime.callCgoMmap+<span class="number">62</span>> <span class="number">00007f</span>fd30f0d140</span><br><span class="line"><span class="number">00007f</span>fd30f0d148: <span class="number">00007f</span>fd30f0d190 <span class="number">0000000000411</span>a88 <runtime.persistentalloc1+<span class="number">456</span>></span><br><span class="line"><span class="number">00007f</span>fd30f0d158: <span class="number">0000000000</span>bf6dd0 <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d168: <span class="number">0000000000010000</span> <span class="number">0000000000000008</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d178: <span class="number">0000000000</span>bf6dd8 <span class="number">0000000000</span>bf7ca0</span><br><span class="line"><span class="number">00007f</span>fd30f0d188: <span class="number">00007f</span>dcbb4b7000 <span class="number">00007f</span>fd30f0d1c8</span><br><span class="line"><span class="number">00007f</span>fd30f0d198: <span class="number">0000000000451205</span> <runtime.persistentalloc.func1+<span class="number">69</span>> <span 
class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d1a8: <span class="number">0000000000000000</span> <span class="number">0000000000</span>c1c080</span><br><span class="line"><span class="number">00007f</span>fd30f0d1b8: <span class="number">00007f</span>dcbb4b7000 <span class="number">00007f</span>fd30f0d1e0</span><br><span class="line"><span class="number">00007f</span>fd30f0d1c8: <span class="number">00007f</span>fd30f0d210 <span class="number">00007f</span>fd30f0d220</span><br><span class="line"><span class="number">00007f</span>fd30f0d1d8: <span class="number">0000000000000000</span> <span class="number">00000000000000f</span>1</span><br><span class="line"><span class="number">00007f</span>fd30f0d1e8: <span class="number">0000000000000011</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d1f8: <span class="number">000000000069</span>c31c <span class="number">0000000000</span>c1c080</span><br><span class="line"><span class="number">00007f</span>fd30f0d208: <span class="number">000000000045814</span>e <runtime.callCgoMmap+<span class="number">62</span>> <span class="number">00007f</span>fd30f0d210</span><br><span class="line"><span class="number">00007f</span>fd30f0d218: <<span class="number">00007f</span>fd30f0d268 fffffffe7fffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d228: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d238: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d248: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d258: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d268: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span 
class="number">00007f</span>fd30f0d278: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d288: ffffffffffffffff ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d298: ffffffffffffffff <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2a8: <span class="number">00000000006</span>b68ba <span class="number">0000000000000020</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2b8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2c8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2d8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2e8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2f8: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d308: <span class="number">0000000000000000</span> <span class="number">0000000000000000</span></span><br><span class="line"></span><br><span class="line">goroutine <span class="number">1</span> [running]:</span><br><span class="line">runtime.systemstack_switch()</span><br><span class="line"> /usr/local/go/src/runtime/asm_amd64.s:<span class="number">363</span> fp=<span class="number">0xc4200fe788</span> sp=<span class="number">0xc4200fe780</span> pc=<span class="number">0x454120</span></span><br><span class="line">runtime.main()</span><br><span class="line"> /usr/local/go/src/runtime/proc.go:<span 
class="number">128</span> +<span class="number">0x63</span> fp=<span class="number">0xc4200fe7e0</span> sp=<span class="number">0xc4200fe788</span> pc=<span class="number">0x42bb83</span></span><br><span class="line">runtime.goexit()</span><br><span class="line"> /usr/local/go/src/runtime/asm_amd64.s:<span class="number">2361</span> +<span class="number">0x1</span> fp=<span class="number">0xc4200fe7e8</span> sp=<span class="number">0xc4200fe7e0</span> pc=<span class="number">0x456c91</span></span><br><span class="line"></span><br><span class="line">rax <span class="number">0x0</span></span><br><span class="line">rbx <span class="number">0xbe2978</span></span><br><span class="line">rcx <span class="number">0x6b657e</span></span><br><span class="line">rdx <span class="number">0x0</span></span><br><span class="line">rdi <span class="number">0x2</span></span><br><span class="line">rsi <span class="number">0x7ffd30f0d1a0</span></span><br><span class="line">rbp <span class="number">0x8347ce</span></span><br><span class="line">rsp <span class="number">0x7ffd30f0d218</span></span><br><span class="line">r8 <span class="number">0x0</span></span><br><span class="line">r9 <span class="number">0x6</span></span><br><span class="line">r10 <span class="number">0x8</span></span><br><span class="line">r11 <span class="number">0x246</span></span><br><span class="line">r12 <span class="number">0x2bedc30</span></span><br><span class="line">r13 <span class="number">0xf1</span></span><br><span class="line">r14 <span class="number">0x11</span></span><br><span class="line">r15 <span class="number">0x0</span></span><br><span class="line">rip <span class="number">0x6b657e</span></span><br><span class="line">rflags <span class="number">0x246</span></span><br><span class="line">cs <span class="number">0x33</span></span><br><span class="line">fs <span class="number">0x0</span></span><br><span class="line">gs <span class="number">0x0</span></span><br><span class="line">exec failed: 
container_linux.go:<span class="number">348</span>: starting container process caused <span class="string">"read init-p: connection reset by peer"</span></span><br></pre></td></tr></table></figure>
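<p>From the above, the error is returned by runc.</p>
<h4 id="定位原因"><a href="#定位原因" class="headerlink" title="Locating the cause"></a>Locating the cause</h4><p>Besides pointing us to the faulty component, runc also gave us a pleasant surprise: a detailed error log.</p>
<p>The error log shows that runc exec failed because of <code>Resource temporarily unavailable</code>, a typical resource exhaustion problem. Common kinds of resource limits (see ulimit -a) include:</p>
<ul>
<li>Thread count limit reached</li>
<li>Open file limit reached</li>
<li>Memory limit reached</li>
</ul>
<p>So we need to look at the business container's monitoring to determine which resource ran out.</p>
<p><img src="thread-monitor.png" alt="Business container thread count metrics"></p>
<p>The chart above shows the thread count of the business containers. Every container has reached 10,000 threads, which is exactly the default per-container thread limit on Elastic Cloud. The limit exists so that a thread leak in a single container cannot exhaust the host's thread resources.</p>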
<figure class="highlight gradle"><table><tr><td class="code"><pre><span class="line"># cat <span class="regexp">/sys/</span>fs<span class="regexp">/cgroup/</span>pids<span class="regexp">/kubepods/</span>burstable<span class="regexp">/pod64a6c0e7-830c-11eb-86d6-b8cef604db88/</span>aa1e331ec24f621ab3152ebe94f1e533734164af86c9df0f551eab2b1967ec4e/pids.max</span><br><span class="line"><span class="number">10000</span></span><br></pre></td></tr></table></figure>
<p>At this point the cause is clear. Yes, it really is that simple.</p>
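<h3 id="runc梳理"><a href="#runc梳理" class="headerlink" title="A walkthrough of runc"></a>A walkthrough of runc</h3><p>Although we have found the cause of the error logs, we have only ever had a vague idea of how runc actually works.</p>
<p>Let's take this opportunity to walk through runc's workflow, using runc exec as the example.</p>
<ul>
<li>runc exec first starts a child process, runc init</li>
<li>runc init initializes the container namespaces<ul>
<li>runc init uses the C constructor feature to set up the container namespaces before the Go code starts</li>
<li>The C code nsexec performs two clones, giving three processes in total (parent, child, and grandchild), which together complete the namespace initialization</li>
<li>The parent and the child exit once their initialization work is done. By then the grandchild is already inside the container namespaces; it starts the Go initialization and waits for runc exec to send the config</li>
</ul>
</li>
<li>runc exec adds the grandchild to the container cgroup</li>
<li>runc exec sends the config to the grandchild; the config mainly contains the exec command and its arguments</li>
<li>The grandchild calls system.Execv to run the user command</li>
</ul>
<p>Notes:</p>
<ul>
<li>Step 2.c and step 3 run concurrently</li>
<li>runc exec and runc init communicate over a socket pair (init-p and init-c)</li>
</ul>
<p>The interaction among the processes during runc exec, together with the namespace and cgroup initialization, is shown in the figure below:</p>
<p><img src="runc-detail.png" alt="runc workflow"></p>
<p>Combining this walkthrough of runc exec with the error it returned, we can pretty much pinpoint the code where runc exec returns the error:</p>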
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *setnsProcess)</span> <span class="title">start</span><span class="params">()</span> <span class="params">(err error)</span></span> {</span><br><span class="line"> <span class="keyword">defer</span> p.parentPipe.Close()</span><br><span class="line"> err = p.cmd.Start()</span><br><span class="line"> p.childPipe.Close()</span><br><span class="line"> <span class="keyword">if</span> err != <span class="literal">nil</span> {</span><br><span class="line"> <span class="keyword">return</span> newSystemErrorWithCause(err, <span class="string">"starting setns process"</span>)</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> p.bootstrapData != <span class="literal">nil</span> {</span><br><span class="line"> <span class="keyword">if</span> _, err := io.Copy(p.parentPipe, p.bootstrapData); err != <span class="literal">nil</span> { <span class="comment">// clone flags and namespace config</span></span><br><span class="line"> <span class="keyword">return</span> newSystemErrorWithCause(err, <span class="string">"copying bootstrap data to pipe"</span>)</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> err = p.execSetns(); err != <span class="literal">nil</span> {</span><br><span class="line"> <span class="keyword">return</span> newSystemErrorWithCause(err, <span class="string">"executing setns process"</span>)</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> <span class="built_in">len</span>(p.cgroupPaths) > <span class="number">0</span> {</span><br><span class="line"> <span class="keyword">if</span> err := cgroups.EnterPid(p.cgroupPaths, p.pid()); err != <span class="literal">nil</span> { <span class="comment">// adds runc init to the container cgroup</span></span><br><span class="line"> 
<span class="keyword">return</span> newSystemErrorWithCausef(err, <span class="string">"adding pid %d to cgroups"</span>, p.pid())</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> err := utils.WriteJSON(p.parentPipe, p.config); err != <span class="literal">nil</span> { <span class="comment">// sends the config: command, environment variables, etc.</span></span><br><span class="line"> <span class="keyword">return</span> newSystemErrorWithCause(err, <span class="string">"writing config to pipe"</span>)</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> ierr := parseSync(p.parentPipe, <span class="function"><span class="keyword">func</span><span class="params">(sync *syncT)</span> <span class="title">error</span></span> { <span class="comment">// this is where "read init-p: connection reset by peer" is returned</span></span><br><span class="line"> <span class="keyword">switch</span> sync.Type {</span><br><span class="line"> <span class="keyword">case</span> procReady:</span><br><span class="line"> <span class="comment">// This shouldn't happen.</span></span><br><span class="line"> <span class="built_in">panic</span>(<span class="string">"unexpected procReady in setns"</span>)</span><br><span class="line"> <span class="keyword">case</span> procHooks:</span><br><span class="line"> <span class="comment">// This shouldn't happen.</span></span><br><span class="line"> <span class="built_in">panic</span>(<span class="string">"unexpected procHooks in setns"</span>)</span><br><span class="line"> <span class="keyword">default</span>:</span><br><span class="line"> <span class="keyword">return</span> newSystemError(fmt.Errorf(<span class="string">"invalid JSON payload from child"</span>))</span><br><span class="line"> }</span><br><span class="line"> })</span><br><span class="line"> <span class="keyword">if</span> ierr != <span class="literal">nil</span> {</span><br><span class="line"> p.wait()</span><br><span class="line"> 
<span class="keyword">return</span> ierr</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>The root cause and the code analysis are now complete.</p>
<h3 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h3><ol>
<li><a href="https://www.kernel.org/doc/Documentation/cgroup-v1/pids.txt">https://www.kernel.org/doc/Documentation/cgroup-v1/pids.txt</a></li>
<li><a href="https://github.com/opencontainers/runc">https://github.com/opencontainers/runc</a></li>
</ol>
]]></content>
<categories>
<category>Troubleshooting</category>
</categories>
<tags>
<tag>docker</tag>
<tag>containerd</tag>
<tag>runc</tag>
</tags>
</entry>
<entry>
<title>A Troubleshooting Journey: Pod Stuck in Terminating (Part 2)</title>
<url>/pod-terminating-%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85-%E4%BA%8C/</url>
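<content><![CDATA[<h1 id="背景"><a href="#背景" class="headerlink" title="Background"></a>Background</h1><p>Recently, several cases of failed Pod deletion were reported in production, with users' repeated deletion requests all ending in failure. A failed Pod deletion has two main impacts:</p>
<ul>
<li>For users: a worse experience, and no further release operations can be performed on the Pod</li>
<li>For Elastic Cloud: a higher failure rate, so the SLA cannot be met</li>
</ul>
<p>Moreover, as the Elastic Cloud Docker upgrade (1.13 → 18.06) progresses, Pod deletion failures in production seem to be on the rise.</p>
<p>No production issue is a small one! Whichever angle you look at it from, we should give the production environment a check-up and see which system is at fault.</p>
<h1 id="问题定位"><a href="#问题定位" class="headerlink" title="Locating the problem"></a>Locating the problem</h1><p>Because these cases are fairly frequent in production, showing up basically every week, locating the problem is actually easier ^_^.</p>
<p>Investigating a production issue generally takes two steps:</p>
<ul>
<li>Identify the faulty component</li>
<li>Identify what is wrong with that component</li>
</ul>
<h2 id="组件定位"><a href="#组件定位" class="headerlink" title="Locating the component"></a>Locating the component</h2><p>The Elastic Cloud team already has a standard procedure for finding the faulty component: go from top to bottom and see which component causes the problem. [If you are not familiar with how the components communicate, see <a href="https://plpan.github.io/pod-terminating-%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">Troubleshooting container deletion failures</a>]</p>
<p>1) First suspect: kubelet</p>
<p>In the kubernetes architecture, kubelet is the one that actually deletes Pods, so making it the first to be interrogated is only fair.</p>
<p>Resigned as it may be, kubelet is used to this by now, and after taking plenty of beatings it has become very good at passing the buck:</p>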
<figure class="highlight angelscript"><table><tr><td class="code"><pre><span class="line">I0624 <span class="number">11</span>:<span class="number">00</span>:<span class="number">26.658872</span> <span class="number">21280</span> kubelet.go:<span class="number">1923</span>] skipping pod synchronization - [PLEG <span class="keyword">is</span> <span class="keyword">not</span> healthy: pleg was last seen active <span class="number">3</span>m0<span class="number">.439656895</span>s ago; threshold <span class="keyword">is</span> <span class="number">3</span>m0s]</span><br></pre></td></tr></table></figure>
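<p>What does that mean? Better you than me.</p>
<p>The PLEG module is unhealthy? <code>PLEG</code> is a kubelet submodule that manages the runtime state of the underlying containers in one place.</p>
<p>kubelet's confession: I'm innocent, don't pin this on me, it's all docker's fault!</p>
<p>2) Second suspect: dockerd</p>
<p>Based on kubelet's testimony, we quickly brought in the second suspect in the case: dockerd.</p>
<p>To prove its innocence, dockerd promptly ran through its routine:</p>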
<figure class="highlight awk"><table><tr><td class="code"><pre><span class="line"><span class="comment"># docker ps</span></span><br><span class="line"><span class="regexp">//</span> runs fine, dockerd breathes a sigh of relief</span><br><span class="line"><span class="comment"># docker ps -a | grep -v NAMES | awk '{print $1}' | xargs -ti docker inspect -f {{.State.Pid}} {}</span></span><br><span class="line"><span class="regexp">//</span> the command hangs when it reaches docker inspect -f {{.State.Pid}} <span class="number">60</span>f253d59f26</span><br></pre></td></tr></table></figure>
<p>That container happens to belong to the Pod the user failed to delete.</p>
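<p>You should have seen dockerd's face at that moment: ashen, muttering to itself over and over, could this really be my fault?</p>
<p>It took dockerd several minutes to recover. It collected its thoughts and came up with a way to pass the buck: the problem may show up on my side, but your evidence is insufficient and cannot prove that I did it myself. I have a whole crew working under me, and one of them may have done it behind my back.</p>
<p>Oh, so now you have a point? Fine, I'll keep gathering evidence so you know exactly how you went down.</p>
<p>3) Third suspect: containerd</p>
<p>As dockerd's second-in-command, containerd was the first to be summoned. It looks honest and dependable. Reading dockerd's testimony, it gave a wry smile and complained: after all these years following the boss, I still haven't earned his trust (is that why kubernetes put containerd in the boss's seat in version 1.20? haha).</p>
<p>containerd calmly brought out its three go-to moves:</p>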
<figure class="highlight gradle"><table><tr><td class="code"><pre><span class="line"># docker-containerd-ctr -a <span class="regexp">/var/</span>run<span class="regexp">/docker/</span>containerd/docker-containerd.sock -n moby c ls</span><br><span class="line"><span class="comment">// runs fine</span></span><br><span class="line"># docker-containerd-ctr -a <span class="regexp">/var/</span>run<span class="regexp">/docker/</span>containerd/docker-containerd.sock -n moby t ls</span><br><span class="line"><span class="comment">// the command hangs; containerd's face stiffens, but it quickly regains its composure</span></span><br><span class="line"># docker-containerd-ctr -a <span class="regexp">/var/</span>run<span class="regexp">/docker/</span>containerd<span class="regexp">/docker-containerd.sock -n moby c ls | grep -v IMAGE | awk '{print $1}' | xargs -ti docker-containerd-ctr -a /</span>var<span class="regexp">/run/</span>docker<span class="regexp">/containerd/</span>docker-containerd.sock -n moby t ps {}</span><br><span class="line"><span class="comment">// the command hangs when it reaches docker-containerd-ctr -a /var/run/docker/containerd/docker-containerd.sock -n moby t ps 60f253d59f26e1c573d4ba5f824e73b3a4b1bb1629edace85caba4c620755d4d</span></span><br></pre></td></tr></table></figure>
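<p>Looking uneasy, containerd said: sorry, officer, this may be the work of my good-for-nothing kid, who has been in trouble before (see <a href="https://plpan.github.io/docker-hang-%E6%AD%BB%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">Troubleshooting a docker hang</a>).</p>
<p>4) Fourth suspect: containerd-shim</p>
<p>Before containerd could finish, its son containerd-shim, who had been standing by, jumped up, pointed at containerd, and said defiantly: what makes you so sure it was me? I could say plenty about the mess you have made yourself.</p>
<p>Even an upright judge struggles with family disputes. At this point in the investigation, the available evidence does not tell us whether the father or the son is the guilty one.</p>
<p>5) Fifth suspect: runc</p>
<p>Out of options, we brought in the last and youngest suspect in the case: runc. All runc could do was babble: it wasn't me, it wasn't me!</p>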
<figure class="highlight apache"><table><tr><td class="code"><pre><span class="line"><span class="comment"># docker-runc --root /var/run/docker/runtime-runc/moby/ list</span></span><br><span class="line"><span class="attribute">60f253d59f26e1c573d4ba5f824e73b3a4b1bb1629edace85caba4c620755d4d</span> <span class="number">0</span> stopped /run/docker/containerd/daemon/io.containerd.runtime.v<span class="number">1</span>.linux/moby/<span class="number">60</span>f<span class="number">253</span>d<span class="number">59</span>f<span class="number">26</span>e<span class="number">1</span>c<span class="number">573</span>d<span class="number">4</span>ba<span class="number">5</span>f<span class="number">824</span>e<span class="number">73</span>b<span class="number">3</span>a<span class="number">4</span>b<span class="number">1</span>bb<span class="number">1629</span>edace<span class="number">85</span>caba<span class="number">4</span>c<span class="number">620755</span>d<span class="number">4</span>d <span class="number">2021</span>-<span class="number">05</span>-<span class="number">07</span>T<span class="number">12</span>:<span class="number">43</span>:<span class="number">02</span>.<span class="number">62261156</span>Z root</span><br></pre></td></tr></table></figure>
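<p>The evidence collected at the scene shows that runc really has little to do with this case: the container is listed as <code>stopped</code> with PID 0, so runc had already left the scene by the time of the incident, leaving a mess for someone else to clean up.</p>
<p>Judging from everyone's statements, the suspects are narrowed down to containerd and containerd-shim, but it is still hard to say which of the two.</p>
<p>Just as everyone was at a loss, a veteran detective extracted some new clues from the scene, and the case finally moved forward.</p>
<p>6) New clues</p>
<p>The veteran detective walked everyone through the new clues he had collected:</p>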
<figure class="highlight yaml"><table><tr><td class="code"><pre><span class="line"><span class="comment"># sudo docker ps -a | grep PODNAME</span></span><br><span class="line"><span class="string">60f253d59f26</span> <span class="string">image</span> <span class="string">"/dockerinit"</span> <span class="number">6</span> <span class="string">weeks</span> <span class="string">ago</span> <span class="string">Up</span> <span class="number">6</span> <span class="string">weeks</span> <span class="string">k8s_CNAME_PODNAME_default_013e3b0e-8d17-11eb-8ef7-246e9693e13c_10</span></span><br><span class="line"><span class="string">6e6fc586dc12</span> <span class="string">pause:3.1</span> <span class="string">"/pause"</span> <span class="number">6</span> <span class="string">weeks</span> <span class="string">ago</span> <span class="string">Exited</span> <span class="string">(0)</span> <span class="string">About</span> <span class="string">an</span> <span class="string">hour</span> <span class="string">ago</span> <span class="string">k8s_POD_PODNAME_default_013e3b0e-8d17-11eb-8ef7-246e9693e13c_1</span></span><br><span class="line"> </span><br><span class="line"> </span><br><span class="line"><span class="comment"># ps -ef|grep 60f253d59f26</span></span><br><span class="line"><span class="string">root</span> <span class="number">119820</span> <span class="number">3608 </span><span class="number">0</span> <span class="string">May07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:11:16</span> <span class="string">docker-containerd-shim</span> <span class="string">-namespace</span> <span class="string">moby</span> <span class="string">-workdir</span> <span class="string">/docker/docker_rt/containerd/daemon/io.containerd.runtime.v1.linux/moby/60f253d59f26e1c573d4ba5f824e73b3a4b1bb1629edace85caba4c620755d4d</span> <span class="string">-address</span> <span class="string">/var/run/docker/containerd/docker-containerd.sock</span> <span 
class="string">-containerd-binary</span> <span class="string">/usr/bin/docker-containerd</span> <span class="string">-runtime-root</span> <span class="string">/var/run/docker/runtime-runc</span></span><br><span class="line"><span class="string">stupig</span> <span class="number">1629698</span> <span class="number">1599793</span> <span class="number">0</span> <span class="number">11</span><span class="string">:44</span> <span class="string">pts/0</span> <span class="number">00</span><span class="string">:00:00</span> <span class="string">grep</span> <span class="string">--color=auto</span> <span class="string">60f253d59f26</span></span><br><span class="line"> </span><br><span class="line"> </span><br><span class="line"><span class="comment"># ps -ef|grep 119820</span></span><br><span class="line"><span class="string">root</span> <span class="number">40825</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun04</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">40833</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun04</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">119820</span> <span class="number">3608 </span><span class="number">0</span> <span class="string">May07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:11:16</span> <span class="string">docker-containerd-shim</span> <span class="string">-namespace</span> <span class="string">moby</span> <span class="string">-workdir</span> <span 
class="string">/docker/docker_rt/containerd/daemon/io.containerd.runtime.v1.linux/moby/60f253d59f26e1c573d4ba5f824e73b3a4b1bb1629edace85caba4c620755d4d</span> <span class="string">-address</span> <span class="string">/var/run/docker/containerd/docker-containerd.sock</span> <span class="string">-containerd-binary</span> <span class="string">/usr/bin/docker-containerd</span> <span class="string">-runtime-root</span> <span class="string">/var/run/docker/runtime-runc</span></span><br><span class="line"><span class="string">root</span> <span class="number">119886</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">May07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:04:29</span> [<span class="string">dockerinit</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">568896</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">568898</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">695031</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">May08</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">74647</span> 
<span class="number">695037</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">May08</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">802705</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun09</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">74647</span> <span class="number">802709</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun09</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">865131</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">69099</span> <span class="number">865133</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">1073407</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun23</span> <span class="string">?</span> <span 
class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">1073428</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun23</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">1375526</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun22</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">69099</span> <span class="number">1375561</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun22</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">1397568</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun16</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">69099</span> <span class="number">1397570</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun16</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span 
class="string">root</span> <span class="number">1483339</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun23</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">1483341</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun23</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">stupig</span> <span class="number">1631234</span> <span class="number">1599793</span> <span class="number">0</span> <span class="number">11</span><span class="string">:44</span> <span class="string">pts/0</span> <span class="number">00</span><span class="string">:00:00</span> <span class="string">grep</span> <span class="string">--color=auto</span> <span class="number">119820</span></span><br><span class="line"><span class="string">root</span> <span class="number">1692888</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">1692903</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">1882984</span> <span class="number">119820</span> <span 
class="number">0</span> <span class="string">Jun21</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">1882985</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun21</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">1964311</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">1964318</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">2019760</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">2019784</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span 
class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">2122420</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">2122434</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">2288703</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun09</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">69099</span> <span class="number">2288705</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun09</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">2330164</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">2330166</span> <span 
class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">2406740</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">May27</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">74647</span> <span class="number">2406745</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">May27</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">2421050</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">2421069</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">2445918</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun22</span> <span class="string">?</span> <span class="number">00</span><span 
class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">2445927</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun22</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">2487600</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun22</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">2487602</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun22</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="string">root</span> <span class="number">2897660</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string"><defunct></span></span><br><span class="line"><span class="number">76183</span> <span class="number">2897662</span> <span class="number">119820</span> <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span> <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string"><defunct></span></span><br></pre></td></tr></table></figure>
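<p>Within the same Pod, the pause container has already exited but the business container has not. Moreover, the containerd-shim process attached to the business container (PID 119820) has a long list of defunct (zombie) children that it has not reaped, as if it were stuck.</p>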
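<p>Counting those zombies by eye is tedious. As a small sketch, the helper below counts the defunct children of a given parent in <code>ps -ef</code> output. The function name <code>countDefunct</code> is made up for this example, and the sample text is a few lines copied from the listing above.</p>

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// countDefunct counts the lines of `ps -ef` output whose PPID column
// (the third field) equals ppid and that are marked <defunct>.
func countDefunct(psOutput string, ppid int) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(psOutput))
	for sc.Scan() {
		line := sc.Text()
		fields := strings.Fields(line)
		// ps -ef columns: UID PID PPID C STIME TTY TIME CMD...
		if len(fields) < 8 {
			continue
		}
		parent, err := strconv.Atoi(fields[2])
		if err != nil || parent != ppid {
			continue
		}
		if strings.Contains(line, "<defunct>") {
			n++
		}
	}
	return n
}

func main() {
	sample := `root 40825 119820 0 Jun04 ? 00:00:00 [su] <defunct>
76183 40833 119820 0 Jun04 ? 00:00:00 [bash] <defunct>
root 119820 3608 0 May07 ? 00:11:16 docker-containerd-shim -namespace moby
root 119886 119820 0 May07 ? 00:04:29 [dockerinit] <defunct>`
	fmt.Println(countDefunct(sample, 119820)) // prints 3
}
```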
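<p>Confronted with this new evidence, containerd-shim broke down without warning and wailed: why is it always me?</p>
<h2 id="问题定位-1"><a href="#问题定位-1" class="headerlink" title="问题定位"></a>Pinpointing the Problem</h2><p>Joking aside, how do we quickly pinpoint the problem from the symptoms above? There are three approaches:</p>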
<ol>
<li>Take the symptoms to Google</li>
<li>Read the code with the problem in mind</li>
<li>Dig into the live scene to pin down the problem</li>
</ol>
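<p>1) Asking Google</p>
<p>Searching Google for the symptoms we observed did turn up a related report: <a href="https://github.com/containerd/containerd/issues/2709">Exec process may cause shim hang</a>.</p>
<p>What that issue describes matches our problem almost exactly. The problem is caused by the channel defined at <a href="https://github.com/containerd/containerd/blob/v1.1.2/reaper/reaper.go#L34">reaper.Default</a> having too small a buffer; enlarging the channel buffer works around it.</p>
<p>2) Understanding the code</p>
<p>Although the problem can be worked around, we still need to understand what goes wrong in the code.</p>
<p>containerd-shim's main job is to run runc commands. Its main purpose is to host the real container process and to expose a service through which external callers interact with the container.</p>
<p>The internal processing logic of containerd-shim is shown in the figure below:</p>
<p><img src="containerd-shim.png" alt="Simplified containerd-shim architecture"></p>
<p>GRPC service: the core service of containerd-shim. It exposes many interfaces, such as creating and starting a task, and invokes runc to run the corresponding commands. (In this version the server is a <code>ttrpc.Server</code>, as the <code>handleSignals</code> signature shows.)</p>
<p>In addition, containerd-shim runs three goroutines (counting the main goroutine) that together handle exit events of processes inside the container. The first is the main goroutine, <code>handleSignals</code>:</p>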
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// simplified excerpt: signal setup and the handling loop are shown together</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">handleSignals</span><span class="params">(logger *logrus.Entry, server *ttrpc.Server, sv *shim.Service)</span> <span class="title">error</span></span> {</span><br><span class="line"> signals := <span class="built_in">make</span>(<span class="keyword">chan</span> os.Signal, <span class="number">32</span>)</span><br><span class="line"> signal.Notify(signals, unix.SIGTERM, unix.SIGINT, unix.SIGCHLD, unix.SIGPIPE)</span><br><span class="line"> runc.Monitor = reaper.Default</span><br><span class="line"> <span class="comment">// set the shim as the subreaper for all orphaned processes created by the container</span></span><br><span class="line"> <span class="keyword">if</span> err := system.SetSubreaper(<span class="number">1</span>); err != <span class="literal">nil</span> {</span><br><span class="line"> <span class="keyword">return</span> err</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> {</span><br><span class="line"> <span class="keyword">select</span> {</span><br><span class="line"> <span class="keyword">case</span> s := <-signals:</span><br><span class="line"> <span class="keyword">switch</span> s {</span><br><span class="line"> <span class="keyword">case</span> unix.SIGCHLD:</span><br><span class="line"> <span class="keyword">if</span> err := reaper.Reap(); err != <span class="literal">nil</span> {</span><br><span class="line"> logger.WithError(err).Error(<span class="string">"reap exit status"</span>)</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">case</span> unix.SIGTERM, unix.SIGINT:</span><br><span class="line"> <span class="comment">// shim shutdown handling</span></span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
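<p>containerd-shim calls <code>system.SetSubreaper</code> to make itself the reaper for processes in the container, so that orphaned processes created by the container are reparented to the shim. PID 1 inside the container usually reaps zombies as well, so in practice containerd-shim mostly reaps the processes that were started inside the container through <code>runc exec</code>.</p>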
<p>When zombie processes show up (signalled by <code>SIGCHLD</code>), the reaping logic runs:</p>
<figure class="highlight css"><table><tr><td class="code"><pre><span class="line"><span class="selector-tag">func</span> <span class="selector-tag">Reap</span>() <span class="selector-tag">error</span> {</span><br><span class="line"> <span class="attribute">now </span>:= time.<span class="built_in">Now</span>()</span><br><span class="line"> exits, err := sys.<span class="built_in">Reap</span>(false) // collect zombie processes via the wait system call</span><br><span class="line"> Default.<span class="built_in">Lock</span>()</span><br><span class="line"> for c := range Default.subscribers { // send the exit events to every subscriber</span><br><span class="line"> for _, e := range exits {</span><br><span class="line"> c <- runc.Exit{ // blocks when c's buffer is full, with the lock still held</span><br><span class="line"> Timestamp: now,</span><br><span class="line"> Pid: e.Pid,</span><br><span class="line"> Status: e.Status,</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> }</span><br><span class="line"> <span class="selector-tag">Default</span><span class="selector-class">.Unlock</span>()</span><br><span class="line"> <span class="selector-tag">return</span> <span class="selector-tag">err</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>So who are the subscribers? The shim subscribes to process exit events when it initializes:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewService</span><span class="params">(config Config, publisher events.Publisher)</span> <span class="params">(*Service, error)</span></span> {</span><br><span class="line"> s := &Service{</span><br><span class="line"> processes: <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="keyword">string</span>]proc.Process),</span><br><span class="line"> events: <span class="built_in">make</span>(<span class="keyword">chan</span> <span class="keyword">interface</span>{}, <span class="number">128</span>),</span><br><span class="line"> ec: reaper.Default.Subscribe(), <span class="comment">// subscribe to process exit events</span></span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">go</span> s.processExits() <span class="comment">// handle exit events</span></span><br><span class="line"> <span class="keyword">go</span> s.forward(publisher) <span class="comment">// forward exit events</span></span><br><span class="line"> <span class="keyword">return</span> s, <span class="literal">nil</span></span><br><span class="line">}</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<p>The exit-event handling logic is as follows:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *Service)</span> <span class="title">processExits</span><span class="params">()</span></span> {</span><br><span class="line"> <span class="keyword">for</span> e := <span class="keyword">range</span> s.ec {</span><br><span class="line"> s.checkProcesses(e)</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *Service)</span> <span class="title">checkProcesses</span><span class="params">(e runc.Exit)</span></span> {</span><br><span class="line"> s.mu.Lock()</span><br><span class="line"> <span class="keyword">defer</span> s.mu.Unlock()</span><br><span class="line"> <span class="keyword">for</span> _, p := <span class="keyword">range</span> s.processes {</span><br><span class="line"> <span class="keyword">if</span> p.Pid() == e.Pid {</span><br><span class="line"> s.events <- &eventstypes.TaskExit{}</span><br><span class="line"> <span class="keyword">return</span></span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
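<p>Only exits of processes found in <code>s.processes</code> are handled. What kinds of objects does <code>s.processes</code> hold? Mainly two: the container's initial process, registered in <code>Create</code>, and additional processes, registered in <code>Exec</code>:</p>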
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// Create a new initial process and container with the underlying OCI runtime</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *Service)</span> <span class="title">Create</span><span class="params">(ctx context.Context, r *shimapi.CreateTaskRequest)</span> <span class="params">(*shimapi.CreateTaskResponse, error)</span></span> {</span><br><span class="line"> ......</span><br><span class="line"> s.processes[r.ID] = process</span><br><span class="line"> ......</span><br><span class="line">}</span><br><span class="line"><span class="comment">// Exec an additional process inside the container</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *Service)</span> <span class="title">Exec</span><span class="params">(ctx context.Context, r *shimapi.ExecProcessRequest)</span> <span class="params">(*ptypes.Empty, error)</span></span> {</span><br><span class="line"> ......</span><br><span class="line"> s.processes[r.ID] = process</span><br><span class="line"> ......</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
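<p>That is all the code we will show here; feel free to read the rest yourself if you are interested.</p>
<p>Now let us go back to the symptoms. They tell us the faulty container had many zombie processes waiting to be reaped, certainly more than 32. When the shim reaps that many zombies and sends their exits to a subscriber channel (buffer size 32), the send blocks once the buffer is full. The blocking point is <a href="https://github.com/containerd/containerd/blob/v1.1.2/reaper/reaper.go#L44">the channel send</a>, and at that moment the shim is still holding the big <code>Default.Lock</code>.</p>
<p>If anyone else now tries to acquire that lock, they wait forever, and we have a deadlock.</p>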
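<p>Who, then, would try to acquire that lock? This is where a goroutine stack dump of containerd-shim would come in handy.</p>
<p>3) Analyzing the scene</p>
<p>As it happens, containerd-shim starts a goroutine that lets users dump its goroutine stacks. Let us see whether it works:</p>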
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">executeShim</span><span class="params">()</span> <span class="title">error</span></span> {</span><br><span class="line"> <span class="comment">// dump all goroutine stacks whenever SIGUSR1 arrives</span></span><br><span class="line"> dump := <span class="built_in">make</span>(<span class="keyword">chan</span> os.Signal, <span class="number">32</span>)</span><br><span class="line"> signal.Notify(dump, syscall.SIGUSR1)</span><br><span class="line"> <span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> {</span><br><span class="line"> <span class="keyword">for</span> <span class="keyword">range</span> dump {</span><br><span class="line"> dumpStacks(logger)</span><br><span class="line"> }</span><br><span class="line"> }()</span><br><span class="line"> ......</span><br><span class="line">}</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">dumpStacks</span><span class="params">(logger *logrus.Entry)</span></span> {</span><br><span class="line"> <span class="keyword">var</span> (</span><br><span class="line"> buf []<span class="keyword">byte</span></span><br><span class="line"> stackSize <span class="keyword">int</span></span><br><span class="line"> )</span><br><span class="line"> bufferLen := <span class="number">16384</span></span><br><span class="line"> <span class="comment">// double the buffer until the whole dump fits</span></span><br><span class="line"> <span class="keyword">for</span> stackSize == <span class="built_in">len</span>(buf) {</span><br><span class="line"> buf = <span class="built_in">make</span>([]<span class="keyword">byte</span>, bufferLen)</span><br><span class="line"> stackSize = runtime.Stack(buf, <span class="literal">true</span>)</span><br><span class="line"> bufferLen *= <span class="number">2</span></span><br><span class="line"> }</span><br><span class="line"> buf = buf[:stackSize]</span><br><span class="line"> logger.Infof(<span class="string">"=== BEGIN goroutine stack dump ===\n%s\n=== END goroutine stack dump ==="</span>, buf)</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Everything looks fine: sending <code>SIGUSR1</code> to the process should give us a goroutine stack dump.</p>
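<p>To try the pattern in isolation, here is a minimal self-contained sketch. It is not the containerd-shim source: <code>stackDump</code> and <code>writeDump</code> are hypothetical helper names, the process signals itself once so the example terminates, and the dump goes to <code>os.Stderr</code> rather than to a logrus logger.</p>

```go
package main

import (
	"fmt"
	"io"
	"os"
	"os/signal"
	"runtime"
	"syscall"
)

// stackDump returns the stacks of all goroutines. It doubles the buffer
// until runtime.Stack no longer fills it completely, which is the same
// idea as the loop in the excerpt above.
func stackDump() []byte {
	bufferLen := 16384
	for {
		buf := make([]byte, bufferLen)
		n := runtime.Stack(buf, true) // true: include every goroutine
		if n < len(buf) {
			return buf[:n]
		}
		bufferLen *= 2
	}
}

// writeDump frames the dump with BEGIN/END markers and writes it to w.
// (Hypothetical helper; the original logs through logrus instead.)
func writeDump(w io.Writer) error {
	_, err := fmt.Fprintf(w,
		"=== BEGIN goroutine stack dump ===\n%s\n=== END goroutine stack dump ===\n",
		stackDump())
	return err
}

func main() {
	dump := make(chan os.Signal, 32)
	signal.Notify(dump, syscall.SIGUSR1)

	// Assumption for the demo only: signal ourselves once so that the
	// program exits instead of waiting for an external kill -USR1.
	if err := syscall.Kill(os.Getpid(), syscall.SIGUSR1); err != nil {
		fmt.Fprintln(os.Stderr, "kill:", err)
		os.Exit(1)
	}

	<-dump
	if err := writeDump(os.Stderr); err != nil {
		os.Exit(1)
	}
}
```

<p>Because the destination is an <code>io.Writer</code>, the same helper could write to a file instead, which matters once the standard streams turn out to be unusable.</p>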
<p>When we actually tried it in production, however, the dump came up completely empty. The root causes were:</p>
<ul>
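<li><code>logger.Infof()</code> writes to <code>os.Stdout</code>.</li>
<li>Because docker was not running in <code>debug</code> mode in production, the <code>os.Stdout</code> of the production containerd-shim was set to <code>/dev/null</code>.</li>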
</ul>
<figure class="highlight apache"><table><tr><td class="code"><pre><span class="line"><span class="attribute">COMMAND</span> PID USER FD TYPE DEVICE SIZE/<span class="literal">OFF</span> NODE NAME</span><br><span class="line"><span class="attribute">docker</span>-co <span class="number">119820</span> root cwd DIR <span class="number">0</span>,<span class="number">18</span> <span class="number">120</span> <span class="number">916720892</span> /run/docker/containerd/daemon/io.containerd.runtime.v<span class="number">1</span>.linux/moby/<span class="number">60</span>f<span class="number">253</span>d<span class="number">59</span>f<span class="number">26</span>e<span class="number">1</span>c<span class="number">573</span>d<span class="number">4</span>ba<span class="number">5</span>f<span class="number">824</span>e<span class="number">73</span>b<span class="number">3</span>a<span class="number">4</span>b<span class="number">1</span>bb<span class="number">1629</span>edace<span class="number">85</span>caba<span class="number">4</span>c<span class="number">620755</span>d<span class="number">4</span>d</span><br><span class="line"><span class="attribute">docker</span>-co <span class="number">119820</span> root rtd DIR <span class="number">8</span>,<span class="number">3</span> <span class="number">4096</span> <span class="number">2</span> /</span><br><span class="line"><span class="attribute">docker</span>-co <span class="number">119820</span> root txt REG <span class="number">8</span>,<span class="number">3</span> <span class="number">4173632</span> <span class="number">392525</span> /usr/bin/docker-containerd-shim</span><br><span class="line"><span class="attribute">docker</span>-co <span class="number">119820</span> root <span class="number">0</span>r CHR <span class="number">1</span>,<span class="number">3</span> <span class="number">0</span>t<span class="number">0</span> <span class="number">2052</span> /dev/null</span><br><span class="line"><span 
class="attribute">docker</span>-co <span class="number">119820</span> root <span class="number">1</span>w CHR <span class="number">1</span>,<span class="number">3</span> <span class="number">0</span>t<span class="number">0</span> <span class="number">2052</span> /dev/null</span><br><span class="line"><span class="attribute">docker</span>-co <span class="number">119820</span> root <span class="number">2</span>w CHR <span class="number">1</span>,<span class="number">3</span> <span class="number">0</span>t<span class="number">0</span> <span class="number">2052</span> /dev/null</span><br></pre></td></tr></table></figure>
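<p>At this point the investigation seemed stuck again. Fortunately, I had read a little about kernel troubleshooting before. Although the production containerd-shim sends all of its goroutine stack output to <code>/dev/null</code>, there are still ways to recover it.</p>
<p>We quickly asked a kernel expert on our team for help and soon settled on a plan: use a kprobe so that whenever the operating system writes the goroutine stacks to the <code>/dev/null</code> device, a copy of the content is written to the kernel log. The implementation is not complicated:</p>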
<figure class="highlight cpp"><table><tr><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/kernel.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/module.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/kprobes.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/sched.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><asm/uaccess.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/slab.h></span></span></span><br><span class="line"> </span><br><span class="line"><span class="keyword">static</span> <span class="keyword">int</span> pid;</span><br><span class="line">module_param(pid, <span class="keyword">int</span>, <span class="number">0</span>);</span><br><span class="line"> </span><br><span class="line"><span class="keyword">static</span> <span class="class"><span class="keyword">struct</span> <span class="title">kprobe</span> <span class="title">kp</span> =</span> {</span><br><span class="line"> .symbol_name = <span class="string">"write_null"</span>,</span><br><span class="line">};</span><br><span class="line"> </span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> SEGMENT 512</span></span><br><span class="line"> </span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">int</span> <span class="title">handler_pre</span><span class="params">(struct kprobe *p, struct pt_regs *regs)</span></span></span><br><span class="line"><span 
class="function"></span>{</span><br><span class="line"> <span class="keyword">char</span> *wbuf;</span><br><span class="line"> <span class="keyword">size_t</span> count, place = <span class="number">0</span>;</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">if</span> (pid != current->tgid) {</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> count = (<span class="keyword">size_t</span>)(regs->dx);</span><br><span class="line"> </span><br><span class="line"> printk(KERN_INFO <span class="string">"%u call write_null count: %lu\n"</span>, current->tgid, count);</span><br><span class="line"> </span><br><span class="line"> wbuf = (<span class="keyword">char</span> *)kmalloc(count + <span class="number">1</span>, <span class="number">0</span>);</span><br><span class="line"> <span class="keyword">if</span> (wbuf == <span class="literal">NULL</span>) {</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">memset</span>(wbuf, <span class="number">0x0</span>, count + <span class="number">1</span>);</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">if</span> (copy_from_user(wbuf, (<span class="keyword">void</span> *)regs->si, count)) {</span><br><span class="line"> printk(KERN_ERR <span class="string">"copy_from_user fail\n"</span>);</span><br><span class="line"> kfree(wbuf); <span class="comment">/* release the buffer before bailing out */</span></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">while</span> (place < count) {</span><br><span class="line"> <span class="keyword">char</span> tmp[SEGMENT + <span class="number">1</span>];</span><br><span class="line"> <span 
class="built_in">memset</span>(tmp, <span class="number">0x0</span>, SEGMENT + <span class="number">1</span>);</span><br><span class="line"> </span><br><span class="line"> <span class="built_in">snprintf</span>(tmp, SEGMENT + <span class="number">1</span>, <span class="string">"%s"</span>, wbuf + place);</span><br><span class="line"> <span class="keyword">if</span> ((count - place) >= SEGMENT) {</span><br><span class="line"> place += SEGMENT;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> place = count;</span><br><span class="line"> }</span><br><span class="line"> printk(KERN_INFO <span class="string">"%s\n"</span>, tmp);</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">if</span> (wbuf) {</span><br><span class="line"> kfree(wbuf);</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br><span class="line"> </span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">int</span> __init <span class="title">kprobe_init</span><span class="params">(<span class="keyword">void</span>)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="keyword">int</span> ret;</span><br><span class="line"> kp.pre_handler = handler_pre;</span><br><span class="line"> </span><br><span class="line"> ret = register_kprobe(&kp);</span><br><span class="line"> <span class="keyword">if</span> (ret < <span class="number">0</span>) {</span><br><span class="line"> printk(KERN_INFO <span class="string">"register_kprobe failed, returned %d\n"</span>, ret);</span><br><span class="line"> <span class="keyword">return</span> ret;</span><br><span class="line"> }</span><br><span class="line"> printk(KERN_INFO <span class="string">"Planted 
kprobe at %p\n"</span>, kp.addr);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br><span class="line"> </span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">void</span> __exit <span class="title">kprobe_exit</span><span class="params">(<span class="keyword">void</span>)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> unregister_kprobe(&kp);</span><br><span class="line"> printk(KERN_INFO <span class="string">"kprobe at %p unregistered\n"</span>, kp.addr);</span><br><span class="line">}</span><br><span class="line"> </span><br><span class="line">module_init(kprobe_init)</span><br><span class="line">module_exit(kprobe_exit)</span><br><span class="line">MODULE_LICENSE(<span class="string">"GPL"</span>);</span><br></pre></td></tr></table></figure>
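<p>Special thanks to 睿哥 (Rui) for providing the kprobe code. Readers who are interested in kprobes can dig into them on their own.</p>
<p>After deploying this kprobe kernel module in production, we sent <code>SIGUSR1</code> to containerd-shim and successfully obtained the goroutine stacks. The last piece of the troubleshooting puzzle was finally in place.</p>
<p>Addendum: special thanks to 飞哥 (Fei), a true veteran troubleshooter, for pointing out that there is a much simpler way to get the goroutine stacks: use strace to trace the process's system calls. Even when the output goes to <code>/dev/null</code>, the dump still passes through a <code>write</code> system call, so strace can show the buffer contents as long as its string size limit (<code>-s</code>) is set large enough.</p>
<p>The goroutine stacks of the production containerd-shim (with many unimportant goroutines removed and the formatting adjusted) are shown below:</p>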
<figure class="highlight yaml"><table><tr><td class="code"><pre><span class="line"><span class="string">time="2021-06-23T16:35:07+08:00"</span> <span class="string">level=info</span> <span class="string">msg="===</span> <span class="string">BEGIN</span> <span class="string">goroutine</span> <span class="string">stack</span> <span class="string">dump</span> <span class="string">===</span></span><br><span class="line"><span class="string">goroutine</span> <span class="number">22</span> [<span class="string">running</span>]<span class="string">:</span></span><br><span class="line"><span class="string">main.dumpStacks(0xc4201c81e0)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:228</span> <span class="string">+0x8a</span></span><br><span class="line"><span class="string">main.executeShim.func1(0xc42012c300,</span> <span class="number">0xc4201c81e0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:148</span> <span class="string">+0x3d</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">main.executeShim</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:146</span> <span class="string">+0x5de</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">1</span> [<span class="string">chan</span> <span class="string">send</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/reaper.Reap(0xc420243be0,</span> <span class="number">0x1</span><span class="string">)</span></span><br><span 
class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/reaper/reaper.go:44</span> <span class="string">+0x168</span></span><br><span class="line"><span class="string">main.handleSignals(0xc4201c81e0,</span> <span class="number">0xc42012c240</span><span class="string">,</span> <span class="number">0xc4201d6090</span><span class="string">,</span> <span class="number">0xc4201e4000</span><span class="string">,</span> <span class="number">0xc420117ea0</span><span class="string">,</span> <span class="number">0x86</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:197</span> <span class="string">+0x2a1</span></span><br><span class="line"><span class="string">main.executeShim(0x2,</span> <span class="number">0x60</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:151</span> <span class="string">+0x616</span></span><br><span class="line"><span class="string">main.main()</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:96</span> <span class="string">+0x81</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">5</span> [<span class="string">syscall</span>]<span class="string">:</span></span><br><span class="line"><span class="string">os/signal.signal_recv(0x6c14c0)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/runtime/sigqueue.go:139</span> <span class="string">+0xa6</span></span><br><span class="line"><span class="string">os/signal.loop()</span></span><br><span class="line"> <span class="string">/usr/local/go/src/os/signal/signal_unix.go:22</span> <span 
class="string">+0x22</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">os/signal.init.0</span></span><br><span class="line"> <span class="string">/usr/local/go/src/os/signal/signal_unix.go:28</span> <span class="string">+0x41</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">6</span> [<span class="string">chan</span> <span class="string">receive</span>]<span class="string">:</span></span><br><span class="line"><span class="string">main.main.func1()</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:81</span> <span class="string">+0x7b</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">main.main</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:80</span> <span class="string">+0x46</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">7</span> [<span class="string">select</span>, <span class="number">92427</span> <span class="string">minutes</span>, <span class="string">locked</span> <span class="string">to</span> <span class="string">thread</span>]<span class="string">:</span></span><br><span class="line"><span class="string">runtime.gopark(0x6a70a0,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x696b74</span><span class="string">,</span> <span class="number">0x6</span><span class="string">,</span> <span class="number">0x18</span><span class="string">,</span> <span class="number">0x1</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/runtime/proc.go:291</span> <span 
class="string">+0x11a</span></span><br><span class="line"><span class="string">runtime.selectgo(0xc420104f50,</span> <span class="number">0xc42014a180</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/runtime/select.go:392</span> <span class="string">+0xe50</span></span><br><span class="line"><span class="string">runtime.ensureSigM.func1()</span></span><br><span class="line"> <span class="string">/usr/local/go/src/runtime/signal_unix.go:549</span> <span class="string">+0x1f4</span></span><br><span class="line"><span class="string">runtime.goexit()</span></span><br><span class="line"> <span class="string">/usr/local/go/src/runtime/asm_amd64.s:2361</span> <span class="string">+0x1</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">18</span> [<span class="string">semacquire</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">sync.runtime_SemacquireMutex(0xc4201e4004,</span> <span class="number">0x403200</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/runtime/sema.go:71</span> <span class="string">+0x3d</span></span><br><span class="line"><span class="string">sync.(*Mutex).Lock(0xc4201e4000)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/sync/mutex.go:134</span> <span class="string">+0x108</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/shim.(*Service).checkProcesses(0xc4201e4000,</span> <span class="number">0xc02cd591b40305ef</span><span class="string">,</span> <span class="number">0x13af4280e2630f</span><span class="string">,</span> <span class="number">0x7fc440</span><span class="string">,</span> <span class="number">0x1c11b</span><span class="string">,</span> <span class="number">0x0</span><span 
class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:470</span> <span class="string">+0x45</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/shim.(*Service).processExits(0xc4201e4000)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:465</span> <span class="string">+0xd0</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/linux/shim.NewService</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:86</span> <span class="string">+0x3e9</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">19</span> [<span class="string">syscall</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">syscall.Syscall6(0xe8,</span> <span class="number">0x4</span><span class="string">,</span> <span class="number">0xc4201189b8</span><span class="string">,</span> <span class="number">0x80</span><span class="string">,</span> <span class="number">0xffffffffffffffff</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0xc420118938</span><span class="string">,</span> <span class="number">0x45b793</span><span class="string">,</span> <span class="number">0xc42047add0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/syscall/asm_linux_amd64.s:44</span> <span class="string">+0x5</span></span><br><span class="line"><span 
class="string">github.com/containerd/containerd/vendor/golang.org/x/sys/unix.EpollWait(0x4,</span> <span class="number">0xc4201189b8</span><span class="string">,</span> <span class="number">0x80</span><span class="string">,</span> <span class="number">0x80</span><span class="string">,</span> <span class="number">0xffffffffffffffff</span><span class="string">,</span> <span class="number">0x5</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/golang.org/x/sys/unix/zsyscall_linux_amd64.go:1518</span> <span class="string">+0x7a</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/containerd/console.(*Epoller).Wait(0xc4201be060,</span> <span class="number">0xc420117aa8</span><span class="string">,</span> <span class="number">0xc420117ab0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/containerd/console/console_linux.go:110</span> <span class="string">+0x7a</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/linux/shim.(*Service).initPlatform</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service_linux.go:109</span> <span class="string">+0xc6</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">20</span> [<span class="string">chan</span> <span class="string">receive</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span 
class="string">github.com/containerd/containerd/linux/shim.(*Service).forward(0xc4201e4000,</span> <span class="number">0x6c0600</span><span class="string">,</span> <span class="number">0xc4201d4010</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:514</span> <span class="string">+0x62</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/linux/shim.NewService</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:90</span> <span class="string">+0x49b</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">21</span> [<span class="string">IO</span> <span class="string">wait</span>, <span class="number">92427</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">internal/poll.runtime_pollWait(0x7f75331fcf00,</span> <span class="number">0x72</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/runtime/netpoll.go:173</span> <span class="string">+0x57</span></span><br><span class="line"><span class="string">internal/poll.(*pollDesc).wait(0xc4201e6118,</span> <span class="number">0x72</span><span class="string">,</span> <span class="number">0xc420010100</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/internal/poll/fd_poll_runtime.go:85</span> <span class="string">+0x9b</span></span><br><span class="line"><span 
class="string">internal/poll.(*pollDesc).waitRead(0xc4201e6118,</span> <span class="number">0xffffffffffffff00</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/internal/poll/fd_poll_runtime.go:90</span> <span class="string">+0x3d</span></span><br><span class="line"><span class="string">internal/poll.(*FD).Accept(0xc4201e6100,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/internal/poll/fd_unix.go:372</span> <span class="string">+0x1a8</span></span><br><span class="line"><span class="string">net.(*netFD).accept(0xc4201e6100,</span> <span class="number">0xc4201d60a0</span><span class="string">,</span> <span class="number">0xc4201d6060</span><span class="string">,</span> <span class="number">0xc4201563c0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/net/fd_unix.go:238</span> <span class="string">+0x42</span></span><br><span class="line"><span class="string">net.(*UnixListener).accept(0xc4201d6270,</span> <span class="number">0x451c70</span><span class="string">,</span> <span class="number">0xc42011cea8</span><span class="string">,</span> <span class="number">0xc42011ceb0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/net/unixsock_posix.go:162</span> <span class="string">+0x32</span></span><br><span class="line"><span 
class="string">net.(*UnixListener).Accept(0xc4201d6270,</span> <span class="number">0x6a6b10</span><span class="string">,</span> <span class="number">0xc4201563c0</span><span class="string">,</span> <span class="number">0x6c3840</span><span class="string">,</span> <span class="number">0xc420012018</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/net/unixsock.go:253</span> <span class="string">+0x49</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Server).Serve(0xc4201d6090,</span> <span class="number">0x6c3440</span><span class="string">,</span> <span class="number">0xc4201d6270</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:69</span> <span class="string">+0x106</span></span><br><span class="line"><span class="string">main.serve.func1(0x6c3440,</span> <span class="number">0xc4201d6270</span><span class="string">,</span> <span class="number">0xc4201d6090</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:176</span> <span class="string">+0x71</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">main.serve</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:174</span> <span class="string">+0x1be</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">8</span> [<span class="string">select</span>, <span class="number">83</span> 
<span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run(0xc4201563c0,</span> <span class="number">0x6c3840</span><span class="string">,</span> <span class="number">0xc420012018</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:398</span> <span class="string">+0x3f0</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Server).Serve</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:109</span> <span class="string">+0x25c</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">9</span> [<span class="string">IO</span> <span class="string">wait</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">internal/poll.runtime_pollWait(0x7f75331fce30,</span> <span class="number">0x72</span><span class="string">,</span> <span class="number">0xc42011ea48</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/runtime/netpoll.go:173</span> <span class="string">+0x57</span></span><br><span class="line"><span class="string">internal/poll.(*pollDesc).wait(0xc4201a0118,</span> <span class="number">0x72</span><span class="string">,</span> <span class="number">0xffffffffffffff00</span><span class="string">,</span> <span class="number">0x6c0a40</span><span class="string">,</span> <span class="number">0x7e11b8</span><span 
class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/internal/poll/fd_poll_runtime.go:85</span> <span class="string">+0x9b</span></span><br><span class="line"><span class="string">internal/poll.(*pollDesc).waitRead(0xc4201a0118,</span> <span class="number">0xc4201ed000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/internal/poll/fd_poll_runtime.go:90</span> <span class="string">+0x3d</span></span><br><span class="line"><span class="string">internal/poll.(*FD).Read(0xc4201a0100,</span> <span class="number">0xc4201ed000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/internal/poll/fd_unix.go:157</span> <span class="string">+0x17d</span></span><br><span class="line"><span class="string">net.(*netFD).Read(0xc4201a0100,</span> <span class="number">0xc4201ed000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0xc42011eb30</span><span class="string">,</span> <span class="number">0x451430</span><span class="string">,</span> <span class="number">0xc420001b00</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/net/fd_unix.go:202</span> <span class="string">+0x4f</span></span><br><span class="line"><span class="string">net.(*conn).Read(0xc42000c050,</span> <span class="number">0xc4201ed000</span><span 
class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/net/net.go:176</span> <span class="string">+0x6a</span></span><br><span class="line"><span class="string">bufio.(*Reader).Read(0xc42012c480,</span> <span class="number">0xc4200105a0</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x2</span><span class="string">,</span> <span class="number">0x2</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/bufio/bufio.go:216</span> <span class="string">+0x238</span></span><br><span class="line"><span class="string">io.ReadAtLeast(0x6c02c0,</span> <span class="number">0xc42012c480</span><span class="string">,</span> <span class="number">0xc4200105a0</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0x6c62e0</span><span class="string">,</span> <span class="number">0xc42011ef50</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/io/io.go:309</span> <span class="string">+0x86</span></span><br><span class="line"><span class="string">io.ReadFull(0x6c02c0,</span> <span class="number">0xc42012c480</span><span class="string">,</span> <span class="number">0xc4200105a0</span><span 
class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0xc42011eef0</span><span class="string">,</span> <span class="number">0x3</span><span class="string">,</span> <span class="number">0x3</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/io/io.go:327</span> <span class="string">+0x58</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.readMessageHeader(0xc4200105a0,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0x6c02c0</span><span class="string">,</span> <span class="number">0xc42012c480</span><span class="string">,</span> <span class="number">0xc42011ee70</span><span class="string">,</span> <span class="number">0x2</span><span class="string">,</span> <span class="number">0x2</span><span class="string">,</span> <span class="number">0xc42011eed0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/channel.go:38</span> <span class="string">+0x60</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*channel).recv(0xc420010580,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0xc420392940</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x73</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> 
<span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/channel.go:86</span> <span class="string">+0x6d</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run.func1(0xc42014a2a0,</span> <span class="number">0xc4201563c0</span><span class="string">,</span> <span class="number">0xc42014a360</span><span class="string">,</span> <span class="number">0xc420010580</span><span class="string">,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc42014a300</span><span class="string">,</span> <span class="number">0xc42012c4e0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:329</span> <span class="string">+0x1bf</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:299</span> <span class="string">+0x247</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">22661144</span> [<span class="string">chan</span> <span class="string">receive</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/reaper.(*Monitor).Wait(0x7fb5d0,</span> <span class="number">0xc42017a580</span><span 
class="string">,</span> <span class="number">0xc420674360</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x68c3c0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/reaper/reaper.go:82</span> <span class="string">+0x52</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/containerd/go-runc.cmdOutput(0xc42017a580,</span> <span class="number">0x1</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/containerd/go-runc/runc.go:693</span> <span class="string">+0x110</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/containerd/go-runc.(*Runc).runOrError(0xc42018b6c0,</span> <span class="number">0xc42017a580</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc4201d6c30</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/containerd/go-runc/runc.go:673</span> <span class="string">+0x19b</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/containerd/go-runc.(*Runc).Kill(0xc42018b6c0,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span 
class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc420014480</span><span class="string">,</span> <span class="number">0x40</span><span class="string">,</span> <span class="number">0xf</span><span class="string">,</span> <span class="number">0xc42054bbc7</span><span class="string">,</span> <span class="number">0xc420393080</span><span class="string">,</span> <span class="number">0xc42012d9e0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/containerd/go-runc/runc.go:320</span> <span class="string">+0x1e2</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/proc.(*Init).kill(0xc42018c3c0,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xf</span><span class="string">,</span> <span class="number">0x40</span><span class="string">,</span> <span class="number">0xc42054bc01</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/proc/init.go:341</span> <span class="string">+0x78</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/proc.(*runningState).Kill(0xc4201ca030,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xf</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/proc/init_state.go:331</span> <span class="string">+0xa5</span></span><br><span class="line"><span 
class="string">github.com/containerd/containerd/linux/shim.(*Service).Kill(0xc4201e4000,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc42000a940</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:356</span> <span class="string">+0x271</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/shim/v1.RegisterShimService.func10(0x6c3800,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc42000a920</span><span class="string">,</span> <span class="number">0xc4201ce968</span><span class="string">,</span> <span class="number">0x4</span><span class="string">,</span> <span class="number">0xc4201be1a0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/v1/shim.pb.go:1670</span> <span class="string">+0xc5</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serviceSet).dispatch(0xc4201ca008,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc4204e6570</span><span class="string">,</span> <span class="number">0x25</span><span class="string">,</span> <span class="number">0xc4201ce968</span><span class="string">,</span> <span class="number">0x4</span><span class="string">,</span> <span class="number">0xc420226c30</span><span 
class="string">,</span> <span class="number">0x44</span><span class="string">,</span> <span class="number">0x44</span><span class="string">,</span> <span class="string">...)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/services.go:71</span> <span class="string">+0x10e</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serviceSet).call(0xc4201ca008,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc4204e6570</span><span class="string">,</span> <span class="number">0x25</span><span class="string">,</span> <span class="number">0xc4201ce968</span><span class="string">,</span> <span class="number">0x4</span><span class="string">,</span> <span class="number">0xc420226c30</span><span class="string">,</span> <span class="number">0x44</span><span class="string">,</span> <span class="number">0x44</span><span class="string">,</span> <span class="string">...)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/services.go:44</span> <span class="string">+0xb5</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run.func2(0xc4201563c0,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xace455</span><span class="string">,</span> <span class="number">0xc420392900</span><span class="string">,</span> <span class="number">0xc42014a2a0</span><span class="string">,</span> <span class="number">0xc42014a360</span><span class="string">,0xc400ace455)</span></span><br><span class="line"> <span 
class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:402</span> <span class="string">+0xaa</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:401</span> <span class="string">+0x763</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">22661145</span> [<span class="string">semacquire</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">sync.runtime_SemacquireMutex(0xc4201e4004,</span> <span class="number">0x64b800</span><span class="string">)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/runtime/sema.go:71</span> <span class="string">+0x3d</span></span><br><span class="line"><span class="string">sync.(*Mutex).Lock(0xc4201e4000)</span></span><br><span class="line"> <span class="string">/usr/local/go/src/sync/mutex.go:134</span> <span class="string">+0x108</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/shim.(*Service).State(0xc4201e4000,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc420120af0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span 
class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:271</span> <span class="string">+0x59</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/shim/v1.RegisterShimService.func1(0x6c3800,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc42000a980</span><span class="string">,</span> <span class="number">0xc4201ce988</span><span class="string">,</span> <span class="number">0x5</span><span class="string">,</span> <span class="number">0xc4201be080</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/v1/shim.pb.go:1607</span> <span class="string">+0xc8</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serviceSet).dispatch(0xc4201ca008,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc4204e65a0</span><span class="string">,</span> <span class="number">0x25</span><span class="string">,</span> <span class="number">0xc4201ce988</span><span class="string">,</span> <span class="number">0x5</span><span class="string">,</span> <span class="number">0xc420226c80</span><span class="string">,</span> <span class="number">0x42</span><span class="string">,</span> <span class="number">0x42</span><span class="string">,</span> <span class="string">...)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/services.go:71</span> <span class="string">+0x10e</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serviceSet).call(0xc4201ca008,</span> 
<span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc4204e65a0</span><span class="string">,</span> <span class="number">0x25</span><span class="string">,</span> <span class="number">0xc4201ce988</span><span class="string">,</span> <span class="number">0x5</span><span class="string">,</span> <span class="number">0xc420226c80</span><span class="string">,</span> <span class="number">0x42</span><span class="string">,</span> <span class="number">0x42</span><span class="string">,</span> <span class="string">...)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/services.go:44</span> <span class="string">+0xb5</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run.func2(0xc4201563c0,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xace457</span><span class="string">,</span> <span class="number">0xc420392940</span><span class="string">,</span> <span class="number">0xc42014a2a0</span><span class="string">,</span> <span class="number">0xc42014a360</span><span class="string">,</span> <span class="number">0xc400ace457</span><span class="string">)</span></span><br><span class="line"> <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:402</span> <span class="string">+0xaa</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run</span></span><br><span class="line"> <span 
class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:401</span> <span class="string">+0x763</span></span><br><span class="line"> </span><br><span class="line"><span class="string">===</span> <span class="string">END</span> <span class="string">goroutine</span> <span class="string">stack</span> <span class="string">dump</span> <span class="string">==="</span> <span class="string">namespace=moby</span> <span class="string">path="/run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/60f253d59f26e1c573d4ba5f824e73b3a4b1bb1629edace85caba4c620755d4d"</span> <span class="string">pid=119820</span></span><br></pre></td></tr></table></figure>
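<p>The goroutine stacks above tell us the following:</p>
<ul>
<li>goroutine 1: <code>handleSignals</code> is indeed blocked at <code>reaper.go:44</code>, so no subsequent process can be reaped</li>
<li>goroutine 18: <code>checkProcesses</code> is blocked at <code>service.go:470</code> waiting to acquire a lock, but that lock is not the big <code>reaper.Default</code> lock</li>
<li>goroutine 22661144: <code>shim.(*Service).Kill</code> is blocked at <code>reaper.go:82</code></li>
</ul>
<p>The most abnormal of these is <code>goroutine 22661144</code>. The code at its blocking point is:</p>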
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *Monitor)</span> <span class="title">Wait</span><span class="params">(c *exec.Cmd, ec <span class="keyword">chan</span> runc.Exit)</span> <span class="params">(<span class="keyword">int</span>, error)</span></span> {</span><br><span class="line"> <span class="keyword">for</span> e := <span class="keyword">range</span> ec { <span class="comment">// reaper.go:82</span></span><br><span class="line"> <span class="keyword">if</span> e.Pid == c.Process.Pid {</span><br><span class="line"> c.Wait()</span><br><span class="line"> m.Unsubscribe(ec)</span><br><span class="line"> <span class="keyword">return</span> e.Status, <span class="literal">nil</span></span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="number">-1</span>, ErrNoSuchProcess</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
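<p>Here <code>ec</code> is one of the subscribers of <code>reaper.Default</code>.</p>
<p>The deadlock forms as follows:</p>
<ul>
<li>goroutine 22661144: waits for the exit event of its associated process to arrive while holding the <code>Service.mu</code> lock</li>
<li>goroutine 18: waits to acquire <code>Service.mu</code> before it goes on to handle the events it subscribed to</li>
<li>goroutine 1: sends each event to every subscriber, so it blocks once the channel of a subscriber that has stopped draining it (goroutine 18) fills up</li>
</ul>
<p>Together, these three goroutines form a textbook deadlock.</p>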
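<h1 id="解决方案"><a href="#解决方案" class="headerlink" title="Solution"></a>Solution</h1><p>With the cause understood, the fix is straightforward: adjust the default size of the subscriber channel. The community addressed the issue in two ways:</p>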
<ul>
<li>Resize the channel and silently drop events that overflow it: <a href="https://github.com/containerd/containerd/pull/2748/files">https://github.com/containerd/containerd/pull/2748/files</a></li>
<li>Rework the locking logic: <a href="https://github.com/containerd/containerd/pull/2743">https://github.com/containerd/containerd/pull/2743</a> (several commits)</li>
</ul>
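<p>The first approach (a buffered channel plus dropping on overflow) can be sketched as follows. This is only an illustration of the idea, not the code from the pull request: the <code>Exit</code> type and the <code>notify</code> function are hypothetical names, and the buffer sizes are arbitrary.</p>

```go
package main

import "fmt"

// Exit is a hypothetical stand-in for the exit event delivered to subscribers.
type Exit struct {
	Pid    int
	Status int
}

// notify offers e to every subscriber without ever blocking the sender.
// A subscriber whose buffer is full simply misses the event; the return
// value counts how many subscribers were skipped.
func notify(subscribers []chan Exit, e Exit) (dropped int) {
	for _, ch := range subscribers {
		select {
		case ch <- e:
			// Delivered into the subscriber's buffer.
		default:
			// Buffer full: drop instead of stalling every other subscriber.
			dropped++
		}
	}
	return dropped
}

func main() {
	fast := make(chan Exit, 2)  // has room for both events
	stuck := make(chan Exit, 1) // nobody drains it, so it fills after one event
	subs := []chan Exit{fast, stuck}

	fmt.Println("dropped:", notify(subs, Exit{Pid: 100}))
	fmt.Println("dropped:", notify(subs, Exit{Pid: 101}))
}
```

<p>The trade-off is that a slow subscriber can lose events, so this only suits consumers that can tolerate a missed notification or recover the state some other way.</p>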
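<p>However, replacing containerd-shim only affects containers created afterwards; containers created earlier can still hit this problem.</p>
<p>This can be handled with alert-driven self-healing: kill the affected containerd-shim process directly. All of its processes are then reaped by the init process.</p>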
]]></content>
<categories>
<category>Troubleshooting</category>
</categories>
<tags>
<tag>kubernetes</tag>
<tag>docker</tag>
<tag>containers</tag>
</tags>
</entry>
<entry>
<title>Troubleshooting a client-go informer cache failure</title>
<url>/client-go-informer-%E7%BC%93%E5%AD%98%E5%A4%B1%E6%95%88%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5/</url>
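<content><![CDATA[<h1 id="背景"><a href="#背景" class="headerlink" title="Background"></a>Background</h1><p>For a long time, the online services of Elastic Cloud (弹性云) have been plagued by cache inconsistency.</p>
<p>Cache inconsistency usually accompanies an upgrade or restart of kube-apiserver. When it occurs, users notice it fairly clearly, and in severe cases it causes production incidents. Common failures include:</p>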
<ul>
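<li>Inconsistent platform data: a Pod's status flips back and forth between healthy and unhealthy</li>
<li>Lost service management events: when a service changes, service management does not work properly, for example the service tree is not mounted or traffic is not attached</li>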
</ul>
<p>Before the root cause was located, Elastic Cloud put a number of detection and damage-control measures in place:</p>
<ul>
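<li>Detection:<ul>
<li>Manual: when kube-apiserver is upgraded or restarted, the affected teams are notified by hand to restart their platform services as well</li>
<li>Automated: monitoring and alerting rules send an alert when no change events for k8s objects have been received for a period of time</li>
</ul>
</li>
<li>Damage control:<ul>
<li>Restart: when cache inconsistency occurs, restart the service and pull the latest data in full from kube-apiserver</li>
<li>Self-healing: in some scenarios even a restart does not fully recover the service, so self-healing policies are added to actively detect and handle abnormal states</li>
</ul>
</li>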
</ul>
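<p>These detection and damage-control measures do not truly solve the problem. They only attempt to restore service in known scenarios, and they have to be adjusted as more abnormal scenarios are discovered.</p>
<h1 id="问题定位"><a href="#问题定位" class="headerlink" title="Locating the problem"></a>Locating the problem</h1><p>Detection and damage control amount to mending the fold after the sheep are lost. What we want is a solution that eliminates the problem, so we start by tracking down the root cause of the cache inconsistency.</p>
<p>We chose notifier for the investigation. notifier is a collection of controllers for cluster management services, and its main functions are:</p>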
<ul>
<li>Mounting the service tree</li>
<li>DNS registration</li>
<li>Attaching and detaching LVS traffic, and so on</li>
</ul>
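<p>We chose notifier because it is fairly simple: it uses the client-go informer and registers handlers for events on core resources, and it has no complex business logic to get in the way of the investigation.</p>
<h2 id="问题复现"><a href="#问题复现" class="headerlink" title="Reproducing the problem"></a>Reproducing the problem</h2><p>Testing in an offline environment, we found that the problem reproduces reliably after kube-apiserver restarts, which made the investigation much easier. The reproduction steps are:</p>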
<ul>
<li>Start the notifier service</li>
<li>Restart the kube-apiserver service</li>
</ul>
<h2 id="状态分析"><a href="#状态分析" class="headerlink" title="State analysis"></a>State analysis</h2><p>When the problem occurs, we first run some basic checks on the service state:</p>
<figure class="highlight bash"><table><tr><td class="code"><pre><span class="line"><span class="comment"># # Service liveness</span></span><br><span class="line"><span class="comment"># ps -ef|grep notifier</span></span><br><span class="line">stupig 1007922 1003335 2 13:41 pts/1 00:00:08 ./notifier -c configs/notifier-test.toml</span><br><span class="line"> </span><br><span class="line"><span class="comment"># # File descriptors opened by the service</span></span><br><span class="line"><span class="comment"># lsof -nP -p 1007922</span></span><br><span class="line">COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME</span><br><span class="line">nobody 1007922 stupig 0u CHR 136,1 0t0 4 /dev/pts/1</span><br><span class="line">nobody 1007922 stupig 1u CHR 136,1 0t0 4 /dev/pts/1</span><br><span class="line">nobody 1007922 stupig 2u CHR 136,1 0t0 4 /dev/pts/1</span><br><span class="line">nobody 1007922 stupig 3u unix 0xffff8810a3132400 0t0 4254094659 socket</span><br><span class="line">nobody 1007922 stupig 4u a_inode 0,9 0 8548 [eventpoll]</span><br><span class="line">nobody 1007922 stupig 5r FIFO 0,8 0t0 4253939077 pipe</span><br><span class="line">nobody 1007922 stupig 6w FIFO 0,8 0t0 4253939077 pipe</span><br><span class="line">nobody 1007922 stupig 8u IPv4 4254094660 0t0 UDP *:37087</span><br><span class="line">nobody 1007922 stupig 9r CHR 1,9 0t0 2057 /dev/urandom</span><br><span class="line">nobody 1007922 stupig 10u IPv4 4253939079 0t0 TCP *:4397 (LISTEN)</span><br><span class="line">nobody 1007922 stupig 11u REG 8,17 12538653 8604570895 ./logs/notifier.stupig.log.INFO.20211127-134138.1007922</span><br><span class="line">nobody 1007922 stupig 15u IPv4 4254204931 0t0 TCP 127.0.0.1:43566->127.0.0.1:2479 (ESTABLISHED) <span class="comment"># ETCD</span></span><br><span class="line">nobody 1007922 stupig 19u REG 8,5 252384 821 /tmp/notifier.stupig.log.ERROR.20211127-134505.1007922</span><br><span class="line">nobody 1007922 stupig 20u REG 8,5 252384 822 
/tmp/notifier.stupig.log.WARNING.20211127-134505.1007922</span><br><span class="line">nobody 1007922 stupig 21u REG 8,17 414436 8606917935 ./logs/notifier.stupig.log.WARNING.20211127-134139.1007922</span><br><span class="line">nobody 1007922 stupig 24u REG 8,17 290725 8606917936 ./logs/notifier.stupig.log.ERROR.20211127-134238.1007922</span><br><span class="line">nobody 1007922 stupig 30u REG 8,5 252384 823 /tmp/notifier.stupig.log.INFO.20211127-134505.1007922</span><br></pre></td></tr></table></figure>
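<p>Comparing this with the service state before the problem occurred, we found something serious: the connection between notifier and kube-apiserver (service address: <a href="https://localhost:6443/">https://localhost:6443</a>) had disappeared.</p>
<p>As a result, notifier's data is no longer synchronized with kube-apiserver. notifier stops seeing business change events and eventually loses its ability to manage services.</p>
<h2 id="日志分析"><a href="#日志分析" class="headerlink" title="Log analysis"></a>Log analysis</h2><p>Next we analyze notifier's logs, focusing on what notifier printed while kube-apiserver was restarting. The key entries are:</p>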
<figure class="highlight routeros"><table><tr><td class="code"><pre><span class="line">E1127 14:08:19.728515 1041482 reflector.go:251] notifier/monitor/endpointInformer.go:140: Failed <span class="keyword">to</span> watch *v1.Endpoints: <span class="builtin-name">Get</span> <span class="string">"https://127.0.0.1:6443/api/v1/endpoints?resourceVersion=276025109&timeoutSeconds=395&watch=true"</span>: http2: <span class="literal">no</span> cached<span class="built_in"> connection </span>was available</span><br><span class="line">E1127 14:08:20.731407 1041482 reflector.go:134] notifier/monitor/endpointInformer.go:140: Failed <span class="keyword">to</span> list *v1.Endpoints: <span class="builtin-name">Get</span> <span class="string">"https://127.0.0.1:6443/api/v1/endpoints?limit=500&resourceVersion=0"</span>: http2: <span class="literal">no</span> cached<span class="built_in"> connection </span>was available</span><br><span class="line">E1127 14:08:21.733509 1041482 reflector.go:134] notifier/monitor/endpointInformer.go:140: Failed <span class="keyword">to</span> list *v1.Endpoints: <span class="builtin-name">Get</span> <span class="string">"https://127.0.0.1:6443/api/v1/endpoints?limit=500&resourceVersion=0"</span>: http2: <span class="literal">no</span> cached<span class="built_in"> connection </span>was available</span><br><span class="line">E1127 14:08:22.734679 1041482 reflector.go:134] notifier/monitor/endpointInformer.go:140: Failed <span class="keyword">to</span> list *v1.Endpoints: <span class="builtin-name">Get</span> <span class="string">"https://127.0.0.1:6443/api/v1/endpoints?limit=500&resourceVersion=0"</span>: http2: <span class="literal">no</span> cached<span class="built_in"> connection </span>was available</span><br></pre></td></tr></table></figure>
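<p>The logs above show the key error <code>http2: no cached connection was available</code> , and the operation it is tied to is the EndpointInformer's ListAndWatch.</p>
<p>With this key clue in hand, the next step is to pin down the root cause by reading the code.</p>
<h2 id="代码分析"><a href="#代码分析" class="headerlink" title="Code analysis"></a>Code analysis</h2><p>How informers work is not the focus of this article; we only look at the following snippet:</p>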
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// Run starts a watch and handles watch events. Will restart the watch if it is closed.</span></span><br><span class="line"><span class="comment">// Run will exit when stopCh is closed.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(r *Reflector)</span> <span class="title">Run</span><span class="params">(stopCh <-<span class="keyword">chan</span> <span class="keyword">struct</span>{})</span></span> {</span><br><span class="line"> glog.V(<span class="number">3</span>).Infof(<span class="string">"Starting reflector %v (%s) from %s"</span>, r.expectedType, r.resyncPeriod, r.name)</span><br><span class="line"> wait.Until(<span class="function"><span class="keyword">func</span><span class="params">()</span></span> {</span><br><span class="line"> <span class="keyword">if</span> err := r.ListAndWatch(stopCh); err != <span class="literal">nil</span> {</span><br><span class="line"> utilruntime.HandleError(err)</span><br><span class="line"> }</span><br><span class="line"> }, r.period, stopCh)</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>The informer's Reflector component runs in its own goroutine and calls ListAndWatch in a loop to receive notification events from kube-apiserver.</p>
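<p>To make the loop concrete, here is a minimal standard-library sketch of the same pattern: call a function, wait for a period, and repeat until a stop channel is closed. The <code>until</code> helper is a simplified stand-in written for this example, not the apimachinery implementation, and <code>listAndWatch</code> is a fake that always fails with the error seen in the logs.</p>

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// until calls f, waits for period, and repeats until stopCh is closed.
func until(f func(), period time.Duration, stopCh <-chan struct{}) {
	for {
		// Check for shutdown before every call.
		select {
		case <-stopCh:
			return
		default:
		}

		f()

		// Sleep for one period, but wake up early on shutdown.
		select {
		case <-stopCh:
			return
		case <-time.After(period):
		}
	}
}

func main() {
	stopCh := make(chan struct{})
	attempts := 0

	// Fake ListAndWatch: it always fails, and stops the loop after three tries.
	listAndWatch := func() error {
		attempts++
		if attempts == 3 {
			close(stopCh)
		}
		return errors.New("http2: no cached connection was available")
	}

	until(func() {
		if err := listAndWatch(); err != nil {
			fmt.Println("attempt", attempts, "failed:", err)
		}
	}, 10*time.Millisecond, stopCh)

	fmt.Println("total attempts:", attempts)
}
```

<p>The point of the sketch is that the loop only logs the error and retries. If every retry fails in the same way, the loop keeps running but the cache is never refreshed.</p>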
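<p>Combined with the logs, we can conclude that after kube-apiserver restarted, every ListAndWatch call in notifier returned the <code>http2: no cached connection was available</code> error.</p>
<p>So we turn our attention to this error.</p>
<p>Searching the code, we found where this error is defined and where it is returned:</p>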
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/transport.go:L301</span></span><br><span class="line"><span class="keyword">var</span> ErrNoCachedConn = errors.New(<span class="string">"http2: no cached connection was available"</span>)</span><br><span class="line"> </span><br><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/client_conn_pool.go:L55~80</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *clientConnPool)</span> <span class="title">getClientConn</span><span class="params">(req *http.Request, addr <span class="keyword">string</span>, dialOnMiss <span class="keyword">bool</span>)</span> <span class="params">(*ClientConn, error)</span></span> {</span><br><span class="line"> <span class="keyword">if</span> isConnectionCloseRequest(req) && dialOnMiss {</span><br><span class="line"> <span class="comment">// It gets its own connection.</span></span><br><span class="line"> <span class="keyword">const</span> singleUse = <span class="literal">true</span></span><br><span class="line"> cc, err := p.t.dialClientConn(addr, singleUse)</span><br><span class="line"> <span class="keyword">if</span> err != <span class="literal">nil</span> {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> cc, <span class="literal">nil</span></span><br><span class="line"> }</span><br><span class="line"> p.mu.Lock()</span><br><span class="line"> <span class="keyword">for</span> _, cc := <span class="keyword">range</span> p.conns[addr] {</span><br><span class="line"> <span class="keyword">if</span> cc.CanTakeNewRequest() {</span><br><span class="line"> p.mu.Unlock()</span><br><span class="line"> <span class="keyword">return</span> cc, <span 
class="literal">nil</span></span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> !dialOnMiss {</span><br><span class="line"> p.mu.Unlock()</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span>, ErrNoCachedConn</span><br><span class="line"> }</span><br><span class="line"> call := p.getStartDialLocked(addr)</span><br><span class="line"> p.mu.Unlock()</span><br><span class="line"> <-call.done</span><br><span class="line"> <span class="keyword">return</span> call.res, call.err</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
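<p>The code above returns <code>ErrNoCachedConn</code> only when both of the following hold:</p>
<ul>
<li>the <code>dialOnMiss</code> argument is <code>false</code></li>
<li>the pool entry <code>p.conns[addr]</code> holds no connection for which <code>CanTakeNewRequest()</code> returns true</li>
</ul>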
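<p>In theory, when an HTTP request is sent and the connection pool is empty, a connection is established first and the request is sent over it; the pool is also able to evict connections that are in an abnormal state. So how does the problem discussed in this article arise?</p>
<p>Let us look at the call chains that reach <code>getClientConn</code>. There are two main ones:</p>
<p>Stack one:</p>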
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"> <span class="number">0</span> <span class="number">0x0000000000a590b8</span> in notifier/vendor/golang.org/x/net/http2.(*clientConnPool).getClientConn</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/client_conn_pool.<span class="keyword">go</span>:<span class="number">55</span></span><br><span class="line"> <span class="number">1</span> <span class="number">0x0000000000a5aea6</span> in notifier/vendor/golang.org/x/net/http2.noDialClientConnPool.GetClientConn</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/client_conn_pool.<span class="keyword">go</span>:<span class="number">255</span></span><br><span class="line"> <span class="number">2</span> <span class="number">0x0000000000a6c4f9</span> in notifier/vendor/golang.org/x/net/http2.(*Transport).RoundTripOpt</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/transport.<span class="keyword">go</span>:<span class="number">345</span></span><br><span class="line"> <span class="number">3</span> <span class="number">0x0000000000a6bd0e</span> in notifier/vendor/golang.org/x/net/http2.(*Transport).RoundTrip</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/transport.<span class="keyword">go</span>:<span class="number">313</span></span><br><span class="line"> <span class="number">4</span> <span class="number">0x0000000000a5b97e</span> in notifier/vendor/golang.org/x/net/http2.noDialH2RoundTripper.RoundTrip</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/configure_transport.<span class="keyword">go</span>:<span class="number">75</span></span><br><span class="line"> <span class="number">5</span> <span class="number">0x0000000000828e45</span> in 
net/http.(*Transport).roundTrip</span><br><span class="line"> at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/transport.<span class="keyword">go</span>:<span class="number">537</span></span><br><span class="line"> <span class="number">6</span> <span class="number">0x00000000008016de</span> in net/http.(*Transport).RoundTrip</span><br><span class="line"> at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/roundtrip.<span class="keyword">go</span>:<span class="number">17</span></span><br><span class="line"> <span class="number">7</span> <span class="number">0x00000000016a1ef8</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/transport.(*userAgentRoundTripper).RoundTrip</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/transport/round_trippers.<span class="keyword">go</span>:<span class="number">162</span></span><br><span class="line"> <span class="number">8</span> <span class="number">0x00000000007a3aa2</span> in net/http.send</span><br><span class="line"> at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">251</span></span><br><span class="line"> <span class="number">9</span> <span class="number">0x00000000007a324b</span> in net/http.(*Client).send</span><br><span class="line"> at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">175</span></span><br><span class="line"><span class="number">10</span> <span class="number">0x00000000007a6ed5</span> in net/http.(*Client).do</span><br><span class="line"> at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">717</span></span><br><span 
class="line"><span class="number">11</span> <span class="number">0x00000000007a5d9e</span> in net/http.(*Client).Do</span><br><span class="line"> at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">585</span></span><br><span class="line"><span class="number">12</span> <span class="number">0x00000000016b9487</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest.(*Request).request</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest/request.<span class="keyword">go</span>:<span class="number">732</span></span><br><span class="line"><span class="number">13</span> <span class="number">0x00000000016b9f2d</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest.(*Request).Do</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest/request.<span class="keyword">go</span>:<span class="number">804</span></span><br><span class="line"><span class="number">14</span> <span class="number">0x00000000017093bb</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/kubernetes/typed/core/v1.(*endpoints).List</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/kubernetes/typed/core/v1/endpoints.<span class="keyword">go</span>:<span class="number">83</span></span><br><span class="line">……</span><br></pre></td></tr></table></figure>
<p>Stack two:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"> <span class="number">0</span> <span class="number">0x0000000000a590b8</span> in notifier/vendor/golang.org/x/net/http2.(*clientConnPool).getClientConn</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/client_conn_pool.<span class="keyword">go</span>:<span class="number">55</span></span><br><span class="line"> <span class="number">1</span> <span class="number">0x0000000000a5aea6</span> in notifier/vendor/golang.org/x/net/http2.noDialClientConnPool.GetClientConn</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/client_conn_pool.<span class="keyword">go</span>:<span class="number">255</span></span><br><span class="line"> <span class="number">2</span> <span class="number">0x0000000000a6c4f9</span> in notifier/vendor/golang.org/x/net/http2.(*Transport).RoundTripOpt</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/transport.<span class="keyword">go</span>:<span class="number">345</span></span><br><span class="line"> <span class="number">3</span> <span class="number">0x0000000000a6bd0e</span> in notifier/vendor/golang.org/x/net/http2.(*Transport).RoundTrip</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/transport.<span class="keyword">go</span>:<span class="number">313</span></span><br><span class="line"> <span class="number">4</span> <span class="number">0x00000000008296ed</span> in net/http.(*Transport).roundTrip</span><br><span class="line"> at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/transport.<span class="keyword">go</span>:<span class="number">590</span></span><br><span class="line"> <span class="number">5</span> <span class="number">0x00000000008016de</span> in 
net/http.(*Transport).RoundTrip</span><br><span class="line"> at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/roundtrip.<span class="keyword">go</span>:<span class="number">17</span></span><br><span class="line"> <span class="number">6</span> <span class="number">0x00000000016a1ef8</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/transport.(*userAgentRoundTripper).RoundTrip</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/transport/round_trippers.<span class="keyword">go</span>:<span class="number">162</span></span><br><span class="line"> <span class="number">7</span> <span class="number">0x00000000007a3aa2</span> in net/http.send</span><br><span class="line"> at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">251</span></span><br><span class="line"> <span class="number">8</span> <span class="number">0x00000000007a324b</span> in net/http.(*Client).send</span><br><span class="line"> at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">175</span></span><br><span class="line"> <span class="number">9</span> <span class="number">0x00000000007a6ed5</span> in net/http.(*Client).do</span><br><span class="line"> at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">717</span></span><br><span class="line"><span class="number">10</span> <span class="number">0x00000000007a5d9e</span> in net/http.(*Client).Do</span><br><span class="line"> at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">585</span></span><br><span class="line"><span 
class="number">11</span> <span class="number">0x00000000016b9487</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest.(*Request).request</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest/request.<span class="keyword">go</span>:<span class="number">732</span></span><br><span class="line"><span class="number">12</span> <span class="number">0x00000000016b9f2d</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest.(*Request).Do</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest/request.<span class="keyword">go</span>:<span class="number">804</span></span><br><span class="line"><span class="number">13</span> <span class="number">0x00000000017093bb</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/kubernetes/typed/core/v1.(*endpoints).List</span><br><span class="line"> at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/kubernetes/typed/core/v1/endpoints.<span class="keyword">go</span>:<span class="number">83</span></span><br><span class="line">……</span><br></pre></td></tr></table></figure>
<p>After tracing both call stacks, we can quickly rule out stack one:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// file: net/http/transport.go:L502~620</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *Transport)</span> <span class="title">roundTrip</span><span class="params">(req *Request)</span> <span class="params">(*Response, error)</span></span> {</span><br><span class="line"> <span class="keyword">if</span> altRT := t.alternateRoundTripper(req); altRT != <span class="literal">nil</span> { <span class="comment">// L537</span></span><br><span class="line"> <span class="keyword">if</span> resp, err := altRT.RoundTrip(req); err != ErrSkipAltProtocol {</span><br><span class="line"> <span class="keyword">return</span> resp, err</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">var</span> err error</span><br><span class="line"> req, err = rewindBody(req)</span><br><span class="line"> <span class="keyword">if</span> err != <span class="literal">nil</span> {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"> </span><br><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/configure_transport.go:L74~80</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(rt noDialH2RoundTripper)</span> <span class="title">RoundTrip</span><span class="params">(req *http.Request)</span> <span class="params">(*http.Response, error)</span></span> {</span><br><span class="line"> res, err := rt.t.RoundTrip(req) <span class="comment">// L75</span></span><br><span class="line"> <span class="keyword">if</span> err == ErrNoCachedConn {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span>, http.ErrSkipAltProtocol</span><br><span 
class="line"> }</span><br><span class="line"> <span class="keyword">return</span> res, err</span><br><span class="line">}</span><br><span class="line"> </span><br><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/transport.go:L312~314</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *Transport)</span> <span class="title">RoundTrip</span><span class="params">(req *http.Request)</span> <span class="params">(*http.Response, error)</span></span> {</span><br><span class="line"> <span class="keyword">return</span> t.RoundTripOpt(req, RoundTripOpt{}) <span class="comment">// L313</span></span><br><span class="line">}</span><br><span class="line"> </span><br><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/transport.go:L337~379</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *Transport)</span> <span class="title">RoundTripOpt</span><span class="params">(req *http.Request, opt RoundTripOpt)</span> <span class="params">(*http.Response, error)</span></span> {</span><br><span class="line"> addr := authorityAddr(req.URL.Scheme, req.URL.Host)</span><br><span class="line"> <span class="keyword">for</span> retry := <span class="number">0</span>; ; retry++ {</span><br><span class="line"> cc, err := t.connPool().GetClientConn(req, addr) <span class="comment">// L345</span></span><br><span class="line"> <span class="keyword">if</span> err != <span class="literal">nil</span> {</span><br><span class="line"> t.vlogf(<span class="string">"http2: Transport failed to get client conn for %s: %v"</span>, addr, err)</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"> </span><br><span class="line"><span class="comment">// 
file: vendor/golang.org/x/net/http2/client_conn_pool.go:L254~256</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p noDialClientConnPool)</span> <span class="title">GetClientConn</span><span class="params">(req *http.Request, addr <span class="keyword">string</span>)</span> <span class="params">(*ClientConn, error)</span></span> {</span><br><span class="line"> <span class="keyword">return</span> p.getClientConn(req, addr, noDialOnMiss) <span class="comment">// L255</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>In stack one, <code>getClientConn</code> returns <code>ErrNoCachedConn</code>, which <code>noDialH2RoundTripper.RoundTrip</code> replaces with <code>http.ErrSkipAltProtocol</code>. Control then returns to <code>roundTrip</code>, which carries on with the rest of its flow and enters stack two.</p>
<p>We therefore focus on stack two:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// file: net/http/transport.go:L502~620</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *Transport)</span> <span class="title">roundTrip</span><span class="params">(req *Request)</span> <span class="params">(*Response, error)</span></span> {</span><br><span class="line"> <span class="keyword">for</span> {</span><br><span class="line"> <span class="keyword">var</span> resp *Response</span><br><span class="line"> <span class="keyword">if</span> pconn.alt != <span class="literal">nil</span> {</span><br><span class="line"> <span class="comment">// HTTP/2 path.</span></span><br><span class="line"> t.setReqCanceler(cancelKey, <span class="literal">nil</span>) <span class="comment">// not cancelable with CancelRequest</span></span><br><span class="line"> resp, err = pconn.alt.RoundTrip(req) <span class="comment">// L590</span></span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> err == <span class="literal">nil</span> {</span><br><span class="line"> resp.Request = origReq</span><br><span class="line"> <span class="keyword">return</span> resp, <span class="literal">nil</span></span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Failed. 
Clean up and determine whether to retry.</span></span><br><span class="line"> <span class="keyword">if</span> http2isNoCachedConnError(err) {</span><br><span class="line"> <span class="keyword">if</span> t.removeIdleConn(pconn) {</span><br><span class="line"> t.decConnsPerHost(pconn.cacheKey)</span><br><span class="line"> }</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> !pconn.shouldRetryRequest(req, err) {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"> </span><br><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/transport.go:L312~314</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *Transport)</span> <span class="title">RoundTrip</span><span class="params">(req *http.Request)</span> <span class="params">(*http.Response, error)</span></span> {</span><br><span class="line"> <span class="keyword">return</span> t.RoundTripOpt(req, RoundTripOpt{}) <span class="comment">// L313</span></span><br><span class="line">}</span><br><span class="line"> </span><br><span class="line"><span class="comment">// The inner frames are the same as in stack one and are not repeated here</span></span><br></pre></td></tr></table></figure>
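<p>Unlike stack one, stack two does not translate the returned error; <code>ErrNoCachedConn</code> is passed back as-is, and the error handling in <code>roundTrip</code> treats this kind of error specially. If <code>http2isNoCachedConnError</code> returns true, the stale connection is removed from the connection pool.</p>
<p>Everything looks reasonable, so where does it go wrong? The problem lies in <code>http2isNoCachedConnError</code>:</p>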
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// file: net/http/h2_bundle.go:L6922~6928</span></span><br><span class="line"><span class="comment">// isNoCachedConnError reports whether err is of type noCachedConnError</span></span><br><span class="line"><span class="comment">// or its equivalent renamed type in net/http2's h2_bundle.go. Both types</span></span><br><span class="line"><span class="comment">// may coexist in the same running program.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">http2isNoCachedConnError</span><span class="params">(err error)</span> <span class="title">bool</span></span> {</span><br><span class="line"> _, ok := err.(<span class="keyword">interface</span>{ IsHTTP2NoCachedConnError() })</span><br><span class="line"> <span class="keyword">return</span> ok</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
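<p>If the <code>err</code> value implements the anonymous interface (which declares a single method, <code>IsHTTP2NoCachedConnError</code>), the function returns true; otherwise it returns false.</p>
<p>So does the error returned by <code>getClientConn</code> implement that interface? Clearly not:</p>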
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/transport.go:L301</span></span><br><span class="line"><span class="keyword">var</span> ErrNoCachedConn = errors.New(<span class="string">"http2: no cached connection was available"</span>)</span><br></pre></td></tr></table></figure>
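<p>A small standalone program makes the difference concrete. It is a sketch, not the library code: <code>plainErr</code>, <code>markedErr</code> and <code>hasMarker</code> are hypothetical names; the assertion inside <code>hasMarker</code> is the same anonymous-interface check shown above.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// plainErr is built the same way as the old ErrNoCachedConn: the value
// returned by errors.New has an Error method and nothing else.
var plainErr = errors.New("http2: no cached connection was available")

// markedErr is a hypothetical type that adds the marker method.
type markedErr struct{}

func (markedErr) IsHTTP2NoCachedConnError() {}
func (markedErr) Error() string             { return "http2: no cached connection was available" }

// hasMarker performs the anonymous-interface type assertion: it succeeds
// only if the dynamic type of err has an IsHTTP2NoCachedConnError method.
func hasMarker(err error) bool {
	_, ok := err.(interface{ IsHTTP2NoCachedConnError() })
	return ok
}

func main() {
	fmt.Println(hasMarker(plainErr))    // false
	fmt.Println(hasMarker(markedErr{})) // true

	// The messages are identical, so the two cases look the same in logs.
	fmt.Println(plainErr.Error() == markedErr{}.Error()) // true
}
```

<p>The check looks at the method set of the error's dynamic type, not at its message, so two errors with identical text can still be classified differently.</p>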
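<p>At this point the cause is essentially clear: the check returns false for this error, so <code>roundTrip</code> never calls <code>removeIdleConn</code> for the stale connection.</p>
<h1 id="解决方案"><a href="#解决方案" class="headerlink" title="解决方案"></a>Solution</h1><p>Since the problem is caused by the error <code>ErrNoCachedConn</code> returned from <code>getClientConn</code> having a type that does not implement the <code>IsHTTP2NoCachedConnError</code> method, the natural fix is to change the type of the returned error so that it implements that method.</p>
<p>Note that this code belongs to an external library we depend on. When we checked the latest <code>golang.org/x/net</code> source, we found that the problem had already been fixed back in January 2018. For details, see <a href="https://github.com/golang/net/commit/ab555f366c4508dbe0802550b1b20c46c5c18aa0">the golang.org/x/net fix</a>.</p>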
<figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">// noCachedConnError is the concrete type of ErrNoCachedConn, which</span></span><br><span class="line"><span class="comment">// needs to be detected by net/http regardless of whether it's its</span></span><br><span class="line"><span class="comment">// bundled version (in h2_bundle.go with a rewritten type name) or</span></span><br><span class="line"><span class="comment">// from a user's x/net/http2. As such, as it has a unique method name</span></span><br><span class="line"><span class="comment">// (IsHTTP2NoCachedConnError) that net/http sniffs for via func</span></span><br><span class="line"><span class="comment">// isNoCachedConnError.</span></span><br><span class="line"><span class="keyword">type</span> noCachedConnError <span class="keyword">struct</span>{}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(noCachedConnError)</span> <span class="title">IsHTTP2NoCachedConnError</span><span class="params">()</span></span> {}</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(noCachedConnError)</span> <span class="title">Error</span><span class="params">()</span> <span class="title">string</span></span> { <span class="keyword">return</span> <span class="string">"http2: no cached connection was available"</span> }</span><br><span class="line"></span><br><span class="line"><span class="comment">// isNoCachedConnError reports whether err is of type noCachedConnError</span></span><br><span class="line"><span class="comment">// or its equivalent renamed type in net/http2's h2_bundle.go. 
Both types</span></span><br><span class="line"><span class="comment">// may coexist in the same running program.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">isNoCachedConnError</span><span class="params">(err error)</span> <span class="title">bool</span></span> {</span><br><span class="line"> _, ok := err.(<span class="keyword">interface</span>{ IsHTTP2NoCachedConnError() })</span><br><span class="line"> <span class="keyword">return</span> ok</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">var</span> ErrNoCachedConn error = noCachedConnError{}</span><br></pre></td></tr></table></figure>
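<p>The version we run in production, however, is still 1c05540f6.</p>
<p>Our fix is therefore even simpler: upgrade the dependency in vendor.</p>
<p>The production notifier service has now been upgraded to the newer dependency and rolled out to all data centers. We have also verified that restarting kube-apiserver no longer causes the notifier service to misbehave.</p>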
]]></content>
<categories>
<category>问题排查</category>
</categories>
<tags>
<tag>kubernetes</tag>
<tag>client-go</tag>
<tag>informer</tag>
</tags>
</entry>
</search>