InfluxDB restarting #357

Open
przemo opened this issue Nov 6, 2024 · 7 comments


przemo commented Nov 6, 2024

Problem/Motivation

The InfluxDB engine is stuck in a stop/start loop, which increases CPU usage.
I set the log level to TRACE, but I still cannot infer what the issue is.
It had been working flawlessly for the last couple of years.

HA runs in a Proxmox VM (OVA). The allocated storage is 70% used.
I want to avoid removing the add-on because I don't want to lose the data.

Expected behavior

The opposite of the above.

Actual behavior

        end
[08:13:59] INFO: Starting the InfluxDB...
[08:14:04] INFO: InfluxDB stopped, restarting...
[08:14:05] TRACE: bashio::config: envvars|keys
[08:14:05] TRACE: bashio::addon.config
[08:14:05] TRACE: bashio::cache.exists: addons.self.options.config
[08:14:05] TRACE: bashio::fs.file_exists: /tmp/.bashio/addons.self.options.config.cache
[08:14:05] TRACE: bashio::cache.get: addons.self.options.config
[08:14:05] TRACE: bashio::cache.exists: addons.self.options.config
[08:14:05] TRACE: bashio::fs.file_exists: /tmp/.bashio/addons.self.options.config.cache
[08:14:05] TRACE: bashio::jq: {"auth":false,"reporting":true,"ssl":false,"certfile":"fullchain.pem","keyfile":"privkey.pem","envvars":[],"log_level":"trace"} if (.envvars|keys == null) then
            null
        elif (.envvars|keys | type == "string") then
            .envvars|keys // empty
        elif (.envvars|keys | type == "boolean") then
            .envvars|keys // false
        elif (.envvars|keys | type == "array") then
            if (.envvars|keys == []) then
                empty
            else
                .envvars|keys[]
            end
        elif (.envvars|keys | type == "object") then
            if (.envvars|keys == {}) then
                empty
            else
                .envvars|keys
            end
        else
            .envvars|keys
        end
[08:14:05] INFO: Starting the InfluxDB...
[08:14:11] INFO: InfluxDB stopped, restarting...
[08:14:12] TRACE: bashio::config: envvars|keys
[08:14:12] TRACE: bashio::addon.config
[08:14:12] TRACE: bashio::cache.exists: addons.self.options.config
[08:14:12] TRACE: bashio::fs.file_exists: /tmp/.bashio/addons.self.options.config.cache
[08:14:12] TRACE: bashio::cache.get: addons.self.options.config
[08:14:12] TRACE: bashio::cache.exists: addons.self.options.config
[08:14:12] TRACE: bashio::fs.file_exists: /tmp/.bashio/addons.self.options.config.cache
[08:14:12] TRACE: bashio::jq: {"auth":false,"reporting":true,"ssl":false,"certfile":"fullchain.pem","keyfile":"privkey.pem","envvars":[],"log_level":"trace"} if (.envvars|keys == null) then
            null
        elif (.envvars|keys | type == "string") then
            .envvars|keys // empty
        elif (.envvars|keys | type == "boolean") then
            .envvars|keys // false
        elif (.envvars|keys | type == "array") then
            if (.envvars|keys == []) then
                empty
            else
                .envvars|keys[]
            end
        elif (.envvars|keys | type == "object") then
            if (.envvars|keys == {}) then
                empty
            else
                .envvars|keys
            end
        else
            .envvars|keys
        end
[08:14:12] INFO: Starting the InfluxDB...
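
To put a number on the loop, here is a minimal Python sketch that measures how long each start attempt lasts before the add-on reports a stop. It only looks at the add-on's own `INFO` lines as they appear in the log above; the function name `attempt_durations` and the `SAMPLE` excerpt are illustrative, not part of the add-on.

```python
import re
from datetime import datetime, timedelta

# Matches the add-on's own log lines, e.g. "[08:13:59] INFO: Starting the InfluxDB..."
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] INFO: (.+)$")


def attempt_durations(log_text):
    """Return the seconds between each 'Starting' line and the next 'stopped' line."""
    durations = []
    started_at = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # TRACE output, jq fragments, InfluxDB's own lines
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(2)
        if message.startswith("Starting the InfluxDB"):
            started_at = stamp
        elif message.startswith("InfluxDB stopped") and started_at is not None:
            delta = stamp - started_at
            if delta < timedelta(0):
                delta += timedelta(days=1)  # the log crossed midnight
            durations.append(int(delta.total_seconds()))
            started_at = None
    return durations


# Excerpt of the log above, reduced to the relevant lines.
SAMPLE = """\
[08:13:59] INFO: Starting the InfluxDB...
[08:14:04] INFO: InfluxDB stopped, restarting...
[08:14:05] TRACE: bashio::config: envvars|keys
[08:14:05] INFO: Starting the InfluxDB...
[08:14:11] INFO: InfluxDB stopped, restarting...
[08:14:12] INFO: Starting the InfluxDB...
"""

print(attempt_durations(SAMPLE))  # -> [5, 6]
```

For this excerpt each attempt lasts only 5 to 6 seconds, so whatever ends the process does so very early in startup.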

Steps to reproduce

No idea.

Proposed changes

@rjodwyer

Same issue here.

Hassio image on bare metal.

Tried "ha core rebuild" with no luck.

@HeedfulCrayon

I just encountered the same issue. It is causing major problems with HA because some of my sensor queries no longer work either.

@lubarb

lubarb commented Dec 13, 2024

Similar issue (?) on Home Assistant Yellow

In the InfluxDB add-on log I continuously see entries like:
`[15:25:28] INFO: InfluxDB stopped, restarting...
ts=2024-12-13T15:25:29.343-05:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="Get \"http://localhost:8086/ping\": dial tcp [::1]:8086: connect: connection refused"
[15:25:29] INFO: Starting the InfluxDB...
[15:25:48] INFO: InfluxDB stopped, restarting...
[15:25:49] INFO: Starting the InfluxDB...
ts=2024-12-13T15:25:58.572-05:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="Get \"http://localhost:8086/ping\": dial tcp [::1]:8086: connect: connection refused"
[15:26:08] INFO: InfluxDB stopped, restarting...
[15:26:10] INFO: Starting the InfluxDB...
[15:26:30] INFO: InfluxDB stopped, restarting...
[15:26:31] INFO: Starting the InfluxDB...
ts=2024-12-13T15:26:43.607-05:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="Get \"http://localhost:8086/ping\": dial tcp [::1]:8086: connect: connection refused"
[15:26:53] INFO: InfluxDB stopped, restarting...
[15:26:54] INFO: Starting the InfluxDB...`

I can't log items to InfluxDB or access data stored in it. Restarting Home Assistant sometimes fixes it and sometimes does not.
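
The "connection refused" errors above come from something polling `http://localhost:8086/ping`. A rough Python sketch of the same kind of readiness check, useful for seeing whether the database ever answers between restarts, could look like this. The URL is taken from the log; the function names, the retry count, and the interval are my own assumptions, and with `ssl: true` in the add-on the scheme would presumably need to be `https`.

```python
import http.client
import time
import urllib.error
import urllib.request


def ping_ok(url="http://localhost:8086/ping", timeout_s=2.0):
    """Return True if the endpoint answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeouts, HTTP errors, or a malformed reply.
        return False


def wait_until_ready(probe=ping_ok, attempts=30, interval_s=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times.

    Returns the 1-based attempt number that succeeded, or None if none did.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(interval_s)
    return None
```

Calling `wait_until_ready()` from a shell inside the container returns the attempt number on which the ping succeeded, or `None` if the database never came up within the allowed attempts.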

Home Assistant configuration.yaml has:
`influxdb:
  host: 192.168.1.25
  port: 8086
  database: homeassistant
  username: homeassistant
  password: xyzzy1234
  max_retries: 3
  default_measurement: state`

The add-on configuration (YAML) page has:
`auth: true
reporting: true
ssl: true
certfile: fullchain.pem
keyfile: privkey.pem
envvars: []
`

@jackik1410

Same thing happening here.
I suspect a different HA component is interfering somehow, considering that the only changes this add-on received were version bumps.

@ghulleman

Same here... after the 2025.1 update of HA. I did not install any additional add-ons. I updated some, but there was nothing noteworthy in the changelogs for those add-ons.

@rjodwyer

I added some code to output the InfluxDB process's debug logging (from inside the container: sed -i 's/level="warn"/level="debug"/' /etc/influxdb/influxdb.conf) and got nothing useful. It seems to be a timeout: the process doesn't start fast enough, so something (the watchdog is disabled in HA) stops the InfluxDB process and we start all over again. It never starts successfully because there isn't enough time.

Output from the logs:
ts=2025-01-12T05:23:53.935376Z lvl=info msg="Opened shard" log_id=0u2FKJK0000 service=store trace_id=0u2FKa3l000 op_name=tsdb_open index_version=inmem path=/data/influxdb/data/homeassistant/autogen/1573 duration=819.578ms
ts=2025-01-12T05:23:53.945275Z lvl=info msg="Opened file" log_id=0u2FKJK0000 engine=tsm1 service=filestore path=/data/influxdb/data/homeassistant/autogen/1528/000000067-000000002.tsm id=0 duration=145.535ms
ts=2025-01-12T05:23:53.959687Z lvl=info msg="Opened shard" log_id=0u2FKJK0000 service=store trace_id=0u2FKa3l000 op_name=tsdb_open index_version=inmem path=/data/influxdb/data/homeassistant/autogen/1564 duration=962.834ms
[16:23:54] INFO: InfluxDB stopped, restarting...
[16:23:54] INFO: InfluxDB exited with code 256
[16:23:55] TRACE: bashio::config: envvars|keys
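
To check the "not enough time" theory against the debug output, here is a small Python sketch that adds up the `duration=` values on the "Opened shard" lines. The helper name `open_time_ms` is made up for illustration, and the parser only handles simple single-unit durations such as `819.578ms` or `1.2s`; compound values like `1m2.5s` are skipped.

```python
import re

# Pull the message and the duration out of InfluxDB's key=value log lines.
MSG_RE = re.compile(r'msg="([^"]*)"')
DURATION_RE = re.compile(r"duration=([0-9.]+)(µs|us|ms|s)\b")

# Conversion factors to milliseconds for the units handled above.
UNIT_TO_MS = {"µs": 0.001, "us": 0.001, "ms": 1.0, "s": 1000.0}


def open_time_ms(log_text, message="Opened shard"):
    """Return (count, total milliseconds) for lines whose msg equals `message`."""
    total = 0.0
    count = 0
    for line in log_text.splitlines():
        msg = MSG_RE.search(line)
        dur = DURATION_RE.search(line)
        if not msg or not dur or msg.group(1) != message:
            continue
        total += float(dur.group(1)) * UNIT_TO_MS[dur.group(2)]
        count += 1
    return count, total


# The three InfluxDB lines quoted above.
SAMPLE = """\
ts=2025-01-12T05:23:53.935376Z lvl=info msg="Opened shard" log_id=0u2FKJK0000 service=store trace_id=0u2FKa3l000 op_name=tsdb_open index_version=inmem path=/data/influxdb/data/homeassistant/autogen/1573 duration=819.578ms
ts=2025-01-12T05:23:53.945275Z lvl=info msg="Opened file" log_id=0u2FKJK0000 engine=tsm1 service=filestore path=/data/influxdb/data/homeassistant/autogen/1528/000000067-000000002.tsm id=0 duration=145.535ms
ts=2025-01-12T05:23:53.959687Z lvl=info msg="Opened shard" log_id=0u2FKJK0000 service=store trace_id=0u2FKa3l000 op_name=tsdb_open index_version=inmem path=/data/influxdb/data/homeassistant/autogen/1564 duration=962.834ms
"""

count, total_ms = open_time_ms(SAMPLE)
print(count, round(total_ms, 3))
```

The two "Opened shard" lines quoted here already add up to roughly 1.78 seconds. Shards may be opened in parallel, so the sum is not wall-clock time, but run over a full startup log it gives a feel for how much work happens before the process is stopped.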

@ghulleman

ghulleman commented Jan 12, 2025

I did find something in the Home Assistant logs:

Logger: homeassistant.components.influxdb
Source: components/influxdb/__init__.py:578
Integration: InfluxDB (documentation, issues)
First occurred: January 8, 2025 at 17:47:15 (3 occurrences)
Last logged: January 9, 2025 at 15:30:22

Resumed, lost 8246 events.
Resumed, lost 5933 events.
Resumed, lost 22571 events.

Could it be that InfluxDB is performing some kind of 'restoring' action during startup? As rjodwyer suggests, the process might not start fast enough. I have not made any changes to the InfluxDB setup in the past months, and everything was running fine. The only major change was the update to 2025.1. But if that were the cause, I believe more people would have issues.

I am going to try to dig further into the Docker containers and how I can access them in HA (I am running HassOS on a RB5), and I am going to restore a backup from before the first messages on a test rig.

Update: after enabling SSH root access to get to the Docker container, the add-on suddenly started. I did have to reboot for the SSH keys to be added, but I had already rebooted the entire system earlier with no success.
