Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Statistics component causes significant slowdown #132597

Closed
dominikandreas opened this issue Dec 8, 2024 · 25 comments
Closed

Statistics component causes significant slowdown #132597

dominikandreas opened this issue Dec 8, 2024 · 25 comments

Comments

@dominikandreas
Copy link

dominikandreas commented Dec 8, 2024

The problem

The _stat_average_linear function in the statistics component causes high CPU load usage when source sensors update frequently. Other functions from the component are likely to be affected as well (I only used _stat_average_linear)

What version of Home Assistant Core has the issue?

likely independent of any particular version

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Statistics

Link to integration documentation on our website

https://www.home-assistant.io/integrations/statistics/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

I have a few sensors that update very frequently (e.g. current grid power at ~10Hz) for which I use an average_linear statistics sensor. Here's an example taken from .storage/core.config_entries:

{
  "created_at": "2024-10-22T11:19:49.252106+00:00",
  "data": {},
  "disabled_by": null,
  "discovery_keys": {},
  "domain": "statistics",
  "entry_id": "01JASZ8Q84GPBRK2W2NB180JZ1",
  "minor_version": 1,
  "modified_at": "2024-10-22T11:19:49.252115+00:00",
  "options": {
    "entity_id": "sensor.ac_power",
    "keep_last_sample": false,
    "max_age": {
      "hours": 0,
      "minutes": 5,
      "seconds": 0
    },
    "name": "Grid 5m Average",
    "percentile": 50,
    "precision": 2,
    "state_characteristic": "average_linear"
  },
  "pref_disable_new_entities": false,
  "pref_disable_polling": false,
  "source": "user",
  "title": "Grid 5m Average",
  "unique_id": null,
  "version": 1
}

The issue seems to be related to these lines in the code:

https://github.com/home-assistant/core/blob/dev/homeassistant/components/statistics/sensor.py#L797-L809

When my source sensor started updating more frequently (switched from wifi to lan to connect to the source), home assistant became barely useable and it took me a long time to figure out what the issue was. I thought at first it was related to a version upgrade of home assistant which happened at at the same time, but everything runs smoothly again after disabling the statistics entities.

In particular, it seems to spend a lot of time within total_seconds. I wonder if a simple fix would be to keep track of timestamps and not datetime objects, but maybe also limit the updates to a certain frequency.

image

Apart from the marked line, also notice the number of calls to total_seconds of datetime.timedelta at the bottom. Since the profiler ran for 65 seconds, that's ~57k total_seconds calls per second!

@home-assistant
Copy link

home-assistant bot commented Dec 8, 2024

Hey there @ThomDietrich, mind taking a look at this issue as it has been labeled with an integration (statistics) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of statistics can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign statistics Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


statistics documentation
statistics source
(message by IssueLinks)

@vukisz
Copy link

vukisz commented Dec 9, 2024

@dominikandreas I think I have same issue. Updated to 2024.12.1 on Sunday. Then noticed, that host - Intel NUC fans kicks in regularly. Added CPU sensor in HA and can see, that CPU spikes from 2% usage to 7-8% usage every 5 minutes.
Tried profiler with qcachegrind but haven't noticed anything in particular. Which app do you use for profiler parsing?
Mine troubleshooting which lead me to this topic: disabled devices on by one and noticed, that disabling shelly device my issue got resolved. But I don't have integral sensors from shelly sensor entities. But I do use them in template sensor calculations

image

P.S. my initial topic with initial troubleshooting: https://community.home-assistant.io/t/small-cpu-spikings-after-os-14-0-and-core-2024-12-01/807509/3

@dominikandreas
Copy link
Author

dominikandreas commented Dec 9, 2024

@vukisz I used the profiler integration to generate a cprof file (this gets stored in your config directory) with the profiler start action (I used 60 seconds as the duration), then copied it over to my local machine and used runsnakerun (pip install runsnakerun) for visualization (there are other alternatives as well, see documentation link)

Btw your spikes could also be caused by recent changes of the supervisor, also causing high CPU load:
home-assistant/supervisor#5425

@bdraco
Copy link
Member

bdraco commented Dec 9, 2024

In particular, it seems to spend a lot of time within total_seconds. I wonder if a simple fix would be to keep track of timestamps and not datetime objects, but maybe also limit the updates to a certain frequency.

If someone wants to finish #110438 I suspect that will solve the issue

@dominikandreas
Copy link
Author

@bdraco I wasn't aware of this PR, thanks! To me this looks good, pretty much what I had in mind as well. Not sure if I would actually change anything, probably makes sense to just reopen the PR and maybe add some minor comments / naming changes to make @ThomDietrich happy :)

@bdraco
Copy link
Member

bdraco commented Dec 9, 2024

@dominikandreas Feel free to start a fresh PR with that as a base commit and adjust as needed

@gjohansson-ST
Copy link
Member

I started looking at this yesterday given this issue, so I started with #132648 to move it around a bit first.
Let me know if you intend to address this issue with a PR or not 👍

@dominikandreas
Copy link
Author

@gjohansson-ST thanks!
I have very limited time at the moment and I'm happy with whatever gets the job done. I will probably not get around to implementing something until after Christmas. The PR above doesn't seem to address the issue with storing timestamps instead of datetime objects in self.ages (timedelta.total_seconds was the main overhead in my analysis), so I'm not sure how it improves performance, but I only had a quick glance and probably missed something.

@gjohansson-ST
Copy link
Member

Yeah, my PR was only a prep to move the methods outside of the class (very minor). I will look into the substitution of datetime to timestamp once that has been merged in.

@gjohansson-ST
Copy link
Member

The relevant code has been merged so either if you can test your setup using dev code as a custom component or we leave the issue open until 2025.1
Would be interesting to see a new profile after

@vukisz
Copy link

vukisz commented Dec 11, 2024

@gjohansson-ST I would really like to test it and provide profiler results. But I have never tried testing unreleased code as custom component :-)
How can I do that?

@gjohansson-ST
Copy link
Member

Simplest is to run a script which @bdraco has thankfully created.
Open a terminal to your Home Assistant and go to the /config directory.
Then run this which will download and create the statistics integration as a custom component on your system:
curl -o- -sSL https://gist.githubusercontent.com/bdraco/43f8043cb04b9838383fd71353e99b18/raw/core_integration_pr | bash /dev/stdin -d statistics -p 132746
Then restart Home Assistant and it should run statistics including those merged changes.

@vukisz
Copy link

vukisz commented Dec 11, 2024

Thanks @gjohansson-ST
Did that from terminal add on. Created statistics custom component. Restarted. Waited till 22:05 mark (local time) and still happened :-(
Attaching profiler results and SS of a spike.
Also worth mentioning, that got another user in HA community. He also has slowdowns at 5 minute mark and he claims, that it started when he did update from 2024.12.0b1 to 2024.12.0b2. So maybe it's something to do with 2024.12.0b2 changes? Source https://community.home-assistant.io/t/small-cpu-spikings-after-os-14-0-and-core-2024-12-01/807509/12

image
profile.1733947490124625.zip

@bdraco
Copy link
Member

bdraco commented Dec 11, 2024

Either that profile was for a long time (more than the 60s default) or there is a massive number of websocket commands being sent from a client

websocket

@bdraco
Copy link
Member

bdraco commented Dec 11, 2024

You can check by opening inspect in your browser and looking at the websocket. You may need to reload for it to appear

@vukisz
Copy link

vukisz commented Dec 12, 2024

Hi. Thanks for support @bdraco I don't know what was those websocket calls. Can it be google assistant? Ran another profiler round and it did not produce those _async_websocket (http.py) commands. I can provide new profile, but I don't want to overwhelm with too much data.
I did another interesting exercise: started new separate HA instance (latest 2024.12.2). Added shelly 3em (energy tracking), added solax integration (solar consumption), configured energy grid and that's all. No CPU spiking every 5 minutes. But then I thought that my DB is quite big (~7GB). So also copied DB. And then it started happening. Not as high spikings, but still goes from 0% to 3% for 30 sec every 5 minutes. Worth investigating further, or it is considered as negligible cpu usage increase?
image

EDIT: just found out, that I can downgrade core. So I did downgraded new testing instance to 2024.11.3 and CPU spikes disappeared. Then tried: 2024.12.0b1 (no cpu spikes) and 2024.12.0b2 cpu started spiking:
image
I can now provide any required profiles, or destroy that test system in various ways helping to resolve this.
Also downgraded my live to 2024.11.3 no more fan! :-)

@gjohansson-ST
Copy link
Member

In your previous profile there wasn't a single call from statistics so I think it would be good to get a new one (or two with spike).
But I'm anyhow leaning towards that this isn't a problem of statistics at all but something else.

@vukisz
Copy link

vukisz commented Dec 12, 2024

No problems from my side @gjohansson-ST which way you prefer:
Two 1 minute profile sessions with one recorded CPU spike of /5 minute of an hour
One longer 10-11 minute profile session with two CPU spikes in one recorded session

Which core version do you prefer for that? Latest or 2024.12.0b2? If latest then with yesterday's PR applied or not?

I can even do 2024.12.0b1 vs 2024.12.0b2 :-)

@gjohansson-ST
Copy link
Member

Maybe to start with, including yesterday's PR applied, one with and one w/o spike.

It's not written here but are these based on shelly? And are you using the native integration, MQTT or other (I believe I saw tasmota)?

@vukisz
Copy link

vukisz commented Dec 12, 2024

O.k. here you go @gjohansson-ST one 2024.12.2 (with new config\custom_components\statistics) one 60 sec calm period and one 60 sec period with spikes:
2024.12.2.profilerResults.zip
Let me know if you need me to try with different Core versions.
As for MQTT or other devices, it's a fresh HA install without any additional integrations. Only shelly integration, file editor and samba share plugins + 7GB database from live instance. Other then that - 0 helpers, automations or any other additions.

@gjohansson-ST
Copy link
Member

I will look into it a bit later but then we can rule out statistics as the cause as you're simply not using it in your test instance.

@vukisz
Copy link

vukisz commented Dec 12, 2024

You are right. Probably I hijacked wrong issue then :-(
It's not statistics. Probably something with a recorder as it's affected only with noisy sensor being active and my whole 7 GB DB present.
And as mentioned before 2024.12.0b2 ad one change #131702 which mentions changes in recorder. But I don't know python too well to understand what's was changed there

@bdraco
Copy link
Member

bdraco commented Dec 12, 2024

Maybe #132865

@gjohansson-ST
Copy link
Member

@dominikandreas this went a bit sideways.
It would be a bit interesting to get a profiler again with the change implemented, should we leave the issue open until after 2025.1 release and then update it?

@dominikandreas
Copy link
Author

Sorry for the late response. I think #132746 should have fixed the issue, thanks!

One further improvement could be rate limiting for very high frequency sensors, but this can also be achieved via
filters so I think it's fine to close the issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 20, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants