
Neolink Memory Leak / Errors #286

Open
KSti56 opened this issue Jul 22, 2024 · 13 comments
Labels
bug Something isn't working

Comments

@KSti56

KSti56 commented Jul 22, 2024

Describe the bug
This software is really amazing, but unfortunately I've been having a lot of issues with it, specifically with memory leaks. Memory usage is at 3.3G after just an hour. I have read a couple of the issues pertaining to memory leaks (like this one), and it seems like the main recommendation was to update. I updated a couple of weeks ago (currently on the b37b3ee release). Unfortunately, the memory leaks continue and the logs are full of errors. I'm no expert at this stuff, so any help would be greatly appreciated.

Additionally, I checked today and saw a bunch of these errors:

gst_poll_read_control: assertion 'set != NULL' failed
INFO  neolink::rtsp::stream] Buffer full on audsrc
INFO  neolink::rtsp::stream] Buffer full on vidsrc

To Reproduce
No specific steps to reproduce. I believe it has been happening since installation.

Expected behavior
No memory leaks (I would expect maybe 200-300MB of usage?).

Versions
NVR software: BlueIris v5.9.3.4
Neolink software: b37b3ee
Reolink camera model and firmware: 3 B800s -- Firmware on 2 of them is v3.0.0.82_20080600 and the other one is v3.0.0.183_21012800

@KSti56 KSti56 added the bug Something isn't working label Jul 22, 2024
@KSti56 KSti56 changed the title Neolink Issues / Errors Neolink Memory Leak / Errors Jul 25, 2024
@QuantumEntangledAndy
Owner

QuantumEntangledAndy commented Aug 3, 2024

I'm well aware of this issue (there are multiple open issues for it, too). I've used valgrind, massif, and all sorts of other tools to track it down, but I can't seem to squash this one.

Also, Buffer full on audsrc means that your client (VLC or whatever) is not pulling the frames and the gstreamer buffer is full. This is mostly harmless; it just means we stop sending frames to gstreamer for a while.

The program works roughly like this:

  • Neolink
    • Gets frames from camera
    • Buffers for reorganising packets (maximum of about 200 frames)
    • Buffers for paused playback (15s of frames)
    • Hands frames to gstreamer
  • Gstreamer
    • Gets frames from neolink
    • Has its own buffer to hold the frames
    • Waits for right time/request from rtsp client to deliver frames

Most of the app's memory allocations happen in the gstreamer part, which I don't have direct control over. I'll keep looking when I can, but nothing obvious stands out.
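
Purely to illustrate that "Buffer full" behaviour, here is a hand-written sketch (not the actual Neolink code; the frame type, buffer size, and timings are made up): a bounded queue sits between the camera side and the gstreamer side, and when the consumer stops pulling, the producer skips frames instead of letting memory grow.

use std::time::Duration;
use tokio::sync::mpsc;

struct Frame {
    data: Vec<u8>,
}

#[tokio::main]
async fn main() {
    // Roughly analogous to the ~200-frame buffer mentioned above.
    let (tx, mut rx) = mpsc::channel::<Frame>(200);

    // Producer: frames arriving from the camera at roughly 30 fps.
    tokio::spawn(async move {
        loop {
            let frame = Frame { data: vec![0u8; 4096] }; // placeholder payload
            match tx.try_send(frame) {
                Ok(()) => {}
                Err(mpsc::error::TrySendError::Full(_)) => {
                    // This is the situation behind "Buffer full on vidsrc/audsrc":
                    // the frame is skipped rather than queued without bound.
                    eprintln!("Buffer full, skipping frame");
                }
                Err(mpsc::error::TrySendError::Closed(_)) => break,
            }
            tokio::time::sleep(Duration::from_millis(33)).await;
        }
    });

    // Consumer: stands in for gstreamer / the RTSP client pulling frames.
    // Deliberately slow, to simulate a client that stops pulling.
    while let Some(frame) = rx.recv().await {
        tokio::time::sleep(Duration::from_millis(500)).await;
        let _ = frame.data.len();
    }
}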

@KSti56
Author

KSti56 commented Aug 3, 2024

Thanks for the reply! I really appreciate your work on this project. Sorry about the duplicate issue; I didn't see any other ones relating to this exact problem (but now that I've gone through the older issues, I see what you are referring to). I'm glad to hear that you're aware of the issue and trying to fix it; I know it's probably frustrating that you can't find the root cause. I'll keep monitoring the issues in hopes of an eventual fix.

As for the Buffer full on audsrc, I'm using BlueIris. Is there a certain configuration parameter that I can adjust to fix this? I have noticed that the video feed (and therefore recordings) cut out somewhat often, which I assume is related to this.

And as for the memory leak issue, I assume the only temporary fixes are either allocating more memory or setting up automatic restarts every x hours?

Thanks again for the help and work on this project! Let me know if there is any information that would be helpful for the debugging/troubleshooting process.

@RutgerDiehard

I've seen memory leaks when running with 2 x RLC-CX810. If it helps troubleshooting, I've reverted back to image: quantumentangledandy/neolink:v0.5.17, which has been running consistently at about 120MB for several days now. So it would suggest that changes after this version caused the leak. I didn't test every image in between, but this was the first one that worked consistently.

@Pytonballoon810

For those using neolink in a Docker configuration, here is a docker compose workaround for the memory leak:

...
  neolink:
    container_name: neolink
    restart: always
    deploy:
      resources:
        limits:
          memory: 512M
    image: quantumentangledandy/neolink
...
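
Side note (standard Docker behaviour, nothing Neolink-specific): with a hard limit like the 512M above, the container gets OOM-killed once it crosses the limit and restart: always brings it straight back, so the leak turns into an occasional container restart instead of host-wide memory pressure. Whether that is actually happening can be checked with, for example:

docker stats --no-stream neolink
docker inspect -f '{{.RestartCount}}' neolink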

Also, I came up with this quick bash script for restarting the container when the process itself takes up too much memory:

#!/bin/bash

# The name of the Docker container
CONTAINER_NAME="neolink"

# Define the memory threshold (in percentage)
THRESHOLD=20

# Get the total system memory in kilobytes
TOTAL_MEM=$(grep MemTotal /proc/meminfo | awk '{print $2}')

# Function to calculate the total memory usage of processes with the word "neolink"
get_neolink_mem_usage() {
    ps aux | grep -i "neolink" | grep -v "grep" | awk '{mem+=$6} END {print mem}'
}

# Get the current memory usage of neolink processes (defaults to 0 if none are running)
NEOLINK_MEM=$(get_neolink_mem_usage)
NEOLINK_MEM=${NEOLINK_MEM:-0}

# Convert the memory usage to percentage
MEM_PERCENT=$(echo "$NEOLINK_MEM $TOTAL_MEM" | awk '{printf "%.2f", ($1/$2)*100}')

# Restart the Docker container if memory usage exceeds the threshold
if (( $(echo "$MEM_PERCENT > $THRESHOLD" | bc -l) )); then
    echo "Memory usage of neolink processes is $MEM_PERCENT%, which exceeds the threshold of $THRESHOLD%"
    echo "Restarting the Docker container $CONTAINER_NAME..."
    docker restart $CONTAINER_NAME
else
    echo "Memory usage of neolink processes is $MEM_PERCENT%, which is within the safe limit."
fi

You could probably use a cron job to run this script every minute, and that should also work.
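
For illustration, a minimal crontab entry for that could look like the line below (the script path and log file are assumptions; point them at wherever you saved the script above):

* * * * * /usr/local/bin/neolink-mem-check.sh >> /var/log/neolink-mem-check.log 2>&1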

@KSti56
Author

KSti56 commented Sep 20, 2024

I've seen memory leaks when running with 2 x RLC-CX810. If it helps troubleshooting, I've reverted back to image: quantumentangledandy/neolink:v0.5.17 which has been running consistently at about 120MB for several days now. So it would suggest changes after this version has caused the leak. I didn't test every image between but this was the first one that worked consistently.

I switched to this Docker version, and while it did seem to be running pretty smoothly, it eventually crashed after 7-8 hours (I think it was a memory leak issue, but I wasn't monitoring it). Interestingly, I also received these errors right when the recording stopped. I haven't seen these messages before, so if anyone has any ideas of what they mean, I would appreciate the help.

neolink_core::bc_protocol::connection::bcconn] Reaching limit of channel
neolink_core::bc_protocol::connection::bcconn] Remaining: 0 of 100 message space for 4 (ID: 3)
neolink::rtsp] cam19: Join Pause
neolink::rtsp] cam19: Retryable error: Timed out waiting to send Media Frame (Caused by: deadline has elapsed)

@RutgerDiehard

RutgerDiehard commented Oct 2, 2024

I've seen memory leaks when running with 2 x RLC-CX810. If it helps troubleshooting, I've reverted back to image: quantumentangledandy/neolink:v0.5.17 which has been running consistently at about 120MB for several days now. So it would suggest changes after this version has caused the leak. I didn't test every image between but this was the first one that worked consistently.

I switched to this Docker version, and while it did seem to be running pretty smoothly, it eventually crashed after 7-8 hours (I think it was a memory leak issue, but I wasn't monitoring it). Interestingly, I also received these errors right when the recording stopped. I haven't seen these messages before, so if anyone has any ideas of what they mean, I would appreciate the help.

neolink_core::bc_protocol::connection::bcconn] Reaching limit of channel
neolink_core::bc_protocol::connection::bcconn] Remaining: 0 of 100 message space for 4 (ID: 3)
neolink::rtsp] cam19: Join Pause
neolink::rtsp] cam19: Retryable error: Timed out waiting to send Media Frame (Caused by: deadline has elapsed)

Although quantumentangledandy/neolink:0.5.17 ran with no memory issues and I didn't have to babysit it, I wasn't able to run Frigate with clean logs: I would get regular stream errors and FFMPEG crashes from multiple cameras. So I set about testing newer versions of neolink to see if that would solve the Frigate errors. quantumentangledandy/neolink:0.5.18 spammed the container logs with messages to do with encryption, so I tried quantumentangledandy/neolink:0.6.0. This has been perfectly stable for the last 12 hours, Frigate shows no stream errors and no FFMPEG crashes, and it runs with the resource usage shown below. This is with 4 x 4K cameras (2 x Reolink RLC-811a, 2 x RLC-CX810) and an RLC-410 via a Reolink NVR.
(screenshot: container resource usage)

I have limited the resources available to neolink and Frigate in the Portainer stack (which runs on bare-metal Ubuntu Server 24.04) just so they don't impact other containers I use on the system. So far, I'm very happy with how it's running.

@jmoney7823956789378

Testing with 0.5.17, 0.6.0, 0.6.3rc2, using 15x D800 Reolink cameras:
(screenshot: memory usage graph)

Hosted on a generic Ubuntu 22.02 LXC on Proxmox.
I'm forced to use a watchdog process to terminate and restart the service at 75% total memory consumption.

@federicotravaini

Adding myself to the users who have this buffer problem. I run neolink with 5 cameras (B800 and D800) and can't get it to work properly with Frigate. The main reason I want neolink is that the NVR doesn't provide the main stream for the cameras via RTSP/HTTPS.

@cincodenada

cincodenada commented Dec 29, 2024

Edit: I did some more digging and realized that my PR just reverts an attempted fix from August 17 (so, after 0.6.3rc2). So it's unlikely to fix the underlying issues above, but if someone here (@federicotravaini?) happens to be running master, it might at least improve things.

@wizmo2

wizmo2 commented Jan 2, 2025

Having similar issues with a standard Debian Docker install on an RPi4.
(screenshot: resource usage)

My Lumus (firmware v2.0.0.705) is remotely mounted and the Wi-Fi is poor. Resource usage is pretty stable with a good connection, but it will eat memory and CPU if the camera repeatedly disconnects or the user account gets locked.

I'm testing out an automation in HA to remotely restart the Docker container, triggered when memory usage gets too high (see the sketch below).

I would try to help, but it looks like this is a known issue and my Rust is "rusty" ;)

Update 1/28/25: I noticed that network traffic is also increasing. This may indicate that failed sessions are still active and continue receiving data.
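
For what it's worth, a rough sketch of that kind of HA automation is below. The memory sensor and the shell command are hypothetical; they depend on how the host exposes memory stats and on whether Home Assistant can reach the Docker daemon at all.

shell_command:
  restart_neolink: "docker restart neolink"

automation:
  - alias: "Restart neolink container on high memory"
    trigger:
      - platform: numeric_state
        entity_id: sensor.neolink_memory_percent  # hypothetical sensor
        above: 50
        for: "00:05:00"
    action:
      - service: shell_command.restart_neolink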

@keithkmyers

@QuantumEntangledAndy Andy -- could there be some sort of band-aid applied for this until the leak is resolved? Or, come at it from a different angle?

For example --

  • internally manage gstreamer subprocess instances. Kill them and transition to new ones in a way that doesn't end user sessions.
  • Or, switch from gstreamer to go2RTC or another solution

This is a great project, but this problem is a show stopper. You and others have put countless hours of good work into this project, only to have this one thing make it nearly unusable! I mean, look at the graph jmoney shared a few posts above; he's having to restart the process every hour. I have a server with 512GB of RAM and Neolink will consume darn near all of it within 24 hours. It's really serious!!

@QuantumEntangledAndy
Owner

go2RTC will not work in the way that you seem to suggest: 1. it's written in Go, so it doesn't directly integrate with this Rust project; 2. it still requires the source to be in a somewhat usable format (RTSP, RTP, or something similar), so gstreamer would still be required to handle it.

Internally managing it is also not something I'd want to integrate; there are dedicated watchdog programs for such things.
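
For anyone running Neolink natively rather than in Docker, one way to get that external-watchdog behaviour without touching Neolink itself is to let systemd enforce a memory cap and restart the service. A rough sketch of a unit file is below; the binary path, config path, and the usual neolink rtsp --config invocation are assumptions to adjust for your install, and MemoryMax requires cgroup v2.

[Unit]
Description=Neolink RTSP bridge
After=network-online.target

[Service]
ExecStart=/usr/local/bin/neolink rtsp --config /etc/neolink.toml
MemoryMax=512M
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target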

@mrspouse

mrspouse commented Feb 1, 2025

I've been having similar problems with a Docker container in a Proxmox LXC connected to one Lumus camera. Reading the comment by @RutgerDiehard above, I thought I would give v0.6.0 a go. It's been running continuously for 12 hours with consistent output throughout. With the newer versions it was getting to 1GB within a couple of hours and continuing to rise.

(screenshot: resource usage)
