[Help] The service suddenly **hangs** and becomes unresponsive. #968

Open

rambo-panda opened this issue Dec 27, 2024 · 1 comment

rambo-panda (Contributor) commented Dec 27, 2024

Recently, I deployed a Node.js service in a Docker container, and it occasionally becomes unresponsive. Since I'm not very familiar with Skia, I'm not sure whether this is a Skia issue, so I'm asking for ideas on how to approach it (willing to pay for advice).

  • Phenomenon: The service does not respond to any signals. The process sits in the `Ssl` state, and JavaScript execution inside it appears to be suspended, as if in a stop-the-world pause (for example, the ZooKeeper heartbeat is no longer sent).

  • Suspicious business process

    1. Caching images (most of the images are the same, so the cache is reused) [the biggest suspect, in my opinion]; a safer variant is sketched at the end of this comment.
       const cache = new Map();
       const getImg = async (url) => {
         if (cache.has(url)) {
         const ret = cache.get(url);
           ret._timer();
           return ret;
         }
       
         cache.set(
           url,
           get(url)
             .then(loadImage)
             .then((img) => {
               // This is a unified encapsulation, similar to a debounce algorithm.
               let _timer = () => {};
               img._timer = () => {
                  _timer = clearTimeout.bind(
                     null,
                     setTimeout(cache.delete.bind(cache, url), lts).unref()
                   );
               };
               cache.set(url, img);
               return img;
             }),
         );
       
         return getImg(url);
       };
       I suspect that when the `lts` timeout fires and the cache entry is deleted, the `img` currently passed to `drawImage(img)` gets garbage collected while Skia is still using it, leading to the mutex wait; or that binding a private method (`_timer`) onto `img` itself is the problem.
    2. Because there are many line segments to draw, I borrowed the idea of fibers and force the loop to yield periodically (see the sketch at the end of this comment):
       for (let i = 0; i < dots.length; i++) {
         if (i % 7_000 === 0) {
           await sleep(2);
         }
         ctx.lineTo(...dots[i]);
       }
  • env info

    System:
    OS: Linux 3.10, Ubuntu 22.04.1 LTS (Jammy Jellyfish)
    CPU: (40) x64 Intel Xeon Processor (Cascadelake)
    Memory: 60.21 GB / 78.66 GB
    Container: Yes
    Shell: 5.1.16 - /bin/bash
    Glibc: 2.35 (Ubuntu GLIBC 2.35-0ubuntu3.1)
    Binaries:
    Node: 20.16.0 - /usr/bin/node
    npm: 10.8.1 - /usr/bin/npm
    
     @napi-rs/[email protected]
    
  • Call stack:

    cat /proc/14972/stack
    [<ffffffffaa90cfa6>] futex_wait_queue_me+0xc6/0x130
    [<ffffffffaa90dc8b>] futex_wait+0x17b/0x280
    [<ffffffffaa90f9d6>] do_futex+0x106/0x5a0
    [<ffffffffaa90fef0>] SyS_futex+0x80/0x190
    [<ffffffffaaf75d9b>] system_call_fastpath+0x22/0x27
    [<ffffffffffffffff>] 0xffffffffffffffff
    
  • gdb stack:

    (gdb) bt
    #0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
    #1  0x0000000000000000 in ?? ()
    
    
    (gdb) thread apply all bt
    
    Thread 11 (LWP 14982 "node"):
    #0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
    #1  0x0000000000000000 in ?? ()
    
    Thread 10 (LWP 14981 "node"):
    #0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
    #1  0x0000000000000000 in ?? ()
    
    Thread 9 (LWP 14980 "node"):
    #0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
    #1  0x0000000000000000 in ?? ()
    
    Thread 8 (LWP 14979 "node"):
    #0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
    #1  0x0000000000000000 in ?? ()
    
    Thread 7 (LWP 14978 "node"):
    #0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
    #1  0x0000000000000000 in ?? ()
    
    Thread 6 (LWP 14977 "node"):
    #0  0x00007fe0b14da059 in __GI___pthread_attr_copy (target=0x690a550, source=0x690a548) at ./nptl/pthread_attr_copy.c:47
    #1  0x0000000000000002 in ?? ()
    #2  0x00003d6343705391 in ?? ()
    #3  0x00007fe0aaffbdb0 in ?? ()
    #4  0x00000000013b0dd3 in v8::internal::FindClosestElementsTransition(v8::internal::Isolate*, v8::internal::Map, v8::internal::ElementsKind, v8::internal::ConcurrencyMode) ()
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)
    
    Thread 5 (LWP 14976 "node"):
    #0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
    #1  0x0000000000000000 in ?? ()
    
    Thread 4 (LWP 14975 "node"):
    #0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
    #1  0x0000000000000000 in ?? ()
    
    Thread 3 (LWP 14974 "node"):
    #0  0x00007fe0b14da059 in __GI___pthread_attr_copy (target=0x690a550, source=0x690a548) at ./nptl/pthread_attr_copy.c:47
    #1  0x0000000000000002 in ?? ()
    #2  0x00003d6343705391 in ?? ()
    #3  0x00007fe0b0c40d80 in ?? ()
    #4  0x00000000013b0dd3 in v8::internal::FindClosestElementsTransition(v8::internal::Isolate*, v8::internal::Map, v8::internal::ElementsKind, v8::internal::ConcurrencyMode) ()
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)
    
    Thread 2 (LWP 14973 "node"):
    #0  0x00007fe0b156ed18 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x0000000ab143efc0 in ?? ()
    #2  0x00007fe0b143fc70 in ?? ()
    #3  0xffffffff00000400 in ?? ()
    #4  0x0000000000000000 in ?? ()
    
    Thread 1 (LWP 14972 "node /var/www/w"):
    #0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
    #1  0x0000000000000000 in ?? ()
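
For reference, here is a minimal sketch of a safer variant of the cache in item 1. It stores a wrapper object instead of attaching `_timer` to the image, debounces eviction by clearing the previous timer, and returns the cached Promise directly instead of recursing. The `get` helper, the `lts` value, and the `touch` name are placeholders standing in for my real code:

    const { loadImage } = require("@napi-rs/canvas");

    // Placeholder for the HTTP helper in the snippet above (assumption:
    // it resolves to bytes that loadImage accepts).
    const get = (url) =>
      fetch(url).then((r) => r.arrayBuffer()).then((b) => Buffer.from(b));

    const lts = 60_000; // idle TTL — placeholder value
    const cache = new Map(); // url -> { promise, timer }

    // Debounce eviction: drop the previous timer before arming a new one,
    // so each entry has at most one pending eviction timer.
    const touch = (url, entry) => {
      clearTimeout(entry.timer);
      entry.timer = setTimeout(() => cache.delete(url), lts);
      entry.timer.unref();
    };

    const getImg = (url) => {
      let entry = cache.get(url);
      if (!entry) {
        // Cache a wrapper, not the Image itself: pending and resolved
        // states are handled uniformly, and nothing is ever attached to
        // the native `img` object.
        entry = { promise: get(url).then(loadImage), timer: undefined };
        entry.promise.catch(() => cache.delete(url)); // don't cache failures
        cache.set(url, entry);
      }
      touch(url, entry);
      return entry.promise;
    };

Note that deleting the Map entry only drops one reference: while `drawImage(img)` still holds `img`, V8 cannot collect it, so eviction by itself should not free memory Skia is using. What the original snippet does do, as written, is throw `ret._timer is not a function` whenever `cache.get(url)` returns the still-pending Promise, and every `_timer()` call arms a new eviction timeout without clearing the previous one, so the earliest timeout still deletes the entry even if it was touched recently.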
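
For item 2, a sketch of the same loop that yields with `setImmediate` from `timers/promises` (available on Node 20) instead of `sleep(2)`, so it yields to the event loop without the fixed 2 ms delay; `tracePath` and `yieldToLoop` are illustrative names:

    const { setImmediate: yieldToLoop } = require("timers/promises");

    async function tracePath(ctx, dots) {
      for (let i = 0; i < dots.length; i++) {
        if (i > 0 && i % 7_000 === 0) {
          // Let timers and heartbeats run. The half-built path on `ctx`
          // survives this await, so nothing else must draw on the same
          // context in the meantime.
          await yieldToLoop();
        }
        ctx.lineTo(...dots[i]);
      }
    }

One caveat with either version: awaiting inside the loop leaves the context's current path half-built across event-loop turns, so any other request that touches the same `ctx` while the loop is parked would interleave with it.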
    
valamidev commented

The `getImg` snippet in the issue body looks very suspicious. Separately, if you use this library heavily on a back end, I'd recommend restarting the service from time to time, since it still has multiple memory leaks. A minimal restart guard is sketched below.

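
A minimal sketch of such a periodic-restart guard (the 4 GiB budget and 30 s interval are placeholders): exit once RSS exceeds the budget and let the container runtime or a process manager bring the service back up.

    // Exit when resident memory exceeds a budget; the container's restart
    // policy (or a process manager) starts a fresh instance.
    const RSS_BUDGET = 4 * 1024 ** 3; // 4 GiB — placeholder

    setInterval(() => {
      if (process.memoryUsage().rss > RSS_BUDGET) {
        console.error("RSS over budget, exiting for a clean restart");
        process.exit(1);
      }
    }, 30_000).unref();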
