Healthcheck for the compute nodes #185
Ideally we create a single health check that executes checks based on the installed packages; otherwise we will end up with 3 health checks.
Indeed, the checks on a master node are different from the checks on a compute node, an extra node or a storage node.
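To make the "one health check, driven by installed packages" idea concrete, here is a minimal sketch. It is not the existing health check code. The package names are taken from this thread and may not match the real Debian package names, and the check names are purely hypothetical labels. Querying `dpkg-query` assumes a Debian/Ubuntu node.

```python
import subprocess

# Assumption: package names as mentioned in this thread; check names are
# hypothetical labels, not functions of the real health check.
CHECKS_BY_PACKAGE = {
    "libovsvolumedriver": ["log_ownership", "volumedriver_connection"],
    "blktap": ["log_ownership"],
    "qemu": ["vdisk_health"],
}


def parse_dpkg_output(text):
    """Return the set of fully installed packages from dpkg-query output.

    Expects one "<package>\t<status>" pair per line, e.g.
    "qemu\tinstall ok installed".
    """
    installed = set()
    for line in text.splitlines():
        name, _, status = line.partition("\t")
        if status.split()[-1:] == ["installed"]:
            installed.add(name)
    return installed


def installed_packages():
    """Ask dpkg which packages are installed (Debian/Ubuntu only)."""
    output = subprocess.run(
        ["dpkg-query", "-W", "-f", "${Package}\\t${Status}\\n"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_dpkg_output(output)


def select_checks(installed, checks_by_package=CHECKS_BY_PACKAGE):
    """Pick every check whose package is installed, without duplicates."""
    selected = []
    for package, checks in checks_by_package.items():
        if package in installed:
            for check in checks:
                if check not in selected:
                    selected.append(check)
    return selected
```

On a node, `select_checks(installed_packages())` would give the list of checks to run there, so a master, compute or storage node each gets its own subset from the same entry point.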
We faced an issue yesterday that the logfiles on a compute node did not have the correct ownership. A separate health check will detect these issues.
Why would we need a health check for the compute nodes? The only OVS components a compute node contains are the libovsvolumedriver and/or blktap packages.
Checking for correct ownership isn't something monitoring can do (you just can't). Also, on the compute nodes we need to make sure that /var/log/syslog, kern.log ... have the correct ownership.
Personally I don't see why we would need a health check for a compute node. These are the customer's machines, so I'd think it's their own responsibility to make sure they are configured as they should be. If we do want a health check on a compute node, why not extend it so we can make sure the firewalls are set up correctly, or that the soap dispenser in the toilets is sufficiently filled?
@wimpers
I do think we need a health check for the compute nodes:
Maybe test the connection to the voldrv? OPS will have more requirements.
In case QEMU is installed we can check the health of a vdisk through qemu info. If there are errors with the vdisk it will either time out or report I/O errors. Add the option to verify the health of a single vdisk so you can test a subset of vdisks (e.g. a random 5% of the vdisks).
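A rough sketch of the sampling and the timeout handling, under stated assumptions: the function names are hypothetical, and the probe command is passed in by the caller because the exact invocation for an OVS vdisk is not given here. Something like `["qemu-img", "info", path]` is one guess at what "qemu info" would be in practice.

```python
import random
import subprocess


def sample_vdisks(vdisks, fraction=0.05, rng=None):
    """Return a random subset of vdisks (at least one if any exist)."""
    vdisks = list(vdisks)
    if not vdisks:
        return []
    rng = rng or random.Random()
    count = int(round(len(vdisks) * fraction))
    count = min(len(vdisks), max(1, count))
    return rng.sample(vdisks, count)


def check_vdisk(command, timeout=10):
    """Run one probe command and classify the outcome.

    Returns "ok" on exit code 0, "timeout" if the command did not finish
    within `timeout` seconds, and "error" otherwise (non-zero exit code
    or the command could not be started).
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"
    except OSError:
        return "error"
    return "ok" if result.returncode == 0 else "error"
```

Checking a single vdisk is then just one `check_vdisk` call, and a subset run loops over `sample_vdisks(all_vdisks)` and collects everything that is not "ok".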
Currently the health check can only run on nodes with the framework. In ticket #181 we already raised the need to run it on ASD nodes. We should also be able to run the health check on CPU nodes and check the relevant items there (e.g. permission settings, edge client).