Problems with host DNS resolution when configuring services #469

Closed
verdurin opened this issue Nov 7, 2024 · 5 comments · Fixed by #473

Comments

@verdurin

verdurin commented Nov 7, 2024

Following the README procedure, I found that the stackhpc.nfs role failed when trying to mount the NFS share, because it couldn't resolve the hostname of the Slurm control node:

TASK [stackhpc.nfs : mount the filesystem] ***************************************************************************************************************************************************************************************************
fatal: [slurmtest03-login-0]: FAILED! => {
    "changed": false
}

MSG:

Error mounting /home: mount.nfs: Failed to resolve server slurmtest03-control: Name or service not known

fatal: [slurmtest03-compute-1]: FAILED! => {
    "changed": false
}

MSG:

Error mounting /home: mount.nfs: Failed to resolve server slurmtest03-control: Name or service not known

fatal: [slurmtest03-compute-0]: FAILED! => {
    "changed": false
}

MSG:

Error mounting /home: mount.nfs: Failed to resolve server slurmtest03-control: Name or service not known

I see there has been previous discussion about whether to use IPs or hostnames.

I can work around this for now with a custom SSH config, but I think this should be documented more explicitly.
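One way to confirm that this is purely a name-resolution problem, independent of the NFS role, is to check resolution directly on each affected node. The sketch below is a hypothetical helper script (the name check_resolve.py is made up) built on Python's socket.getaddrinfo, which goes through the system resolver and so consults both /etc/hosts and DNS. On glibc, a missing name produces the same "Name or service not known" text seen in the mount.nfs errors above.

```python
import socket
import sys


def resolves(hostname: str) -> bool:
    """Return True if the system resolver can map hostname to at least one address."""
    try:
        # getaddrinfo goes through the system resolver (glibc NSS on Linux),
        # so both /etc/hosts and DNS are consulted, per /etc/nsswitch.conf.
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror as exc:
        # glibc reports an unknown name as "Name or service not known",
        # the same text as in the mount.nfs errors.
        print(f"{hostname}: {exc}", file=sys.stderr)
        return False


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_resolve.py slurmtest03-control
    names = sys.argv[1:]
    failed = [n for n in names if not resolves(n)]
    for n in names:
        print(n, "does NOT resolve" if n in failed else "resolves")
    if failed:
        sys.exit(1)
```

Running it on a login or compute node with slurmtest03-control as the argument should report the name as unresolvable until either internal DNS or /etc/hosts provides an entry for it.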

@verdurin
Author

verdurin commented Nov 7, 2024

A similar error occurred with MySQL, and I applied the same fix, i.e. adding an IP address to the environment vars.

@verdurin changed the title from "NFS host resolution fails" to "Problems with host DNS resolution when configuring services" on Nov 7, 2024
@sjpb
Collaborator

sjpb commented Nov 7, 2024

The workaround for a lack of working internal DNS is to enable the etc_hosts role.
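As a rough illustration of what managing /etc/hosts achieves here: every node gets static lines mapping each cluster hostname to its IP address, so names such as slurmtest03-control resolve without DNS. This Python sketch is not the etc_hosts role's actual template; the function name is hypothetical and the IP addresses are invented.

```python
from collections.abc import Mapping


def render_etc_hosts(hosts: Mapping[str, str]) -> str:
    """Render /etc/hosts-style lines ("<ip> <hostname>") for every cluster host.

    Illustration only: the real etc_hosts role's template and variables may differ.
    """
    lines = ["127.0.0.1 localhost"]
    # Sort by hostname so the output is stable between runs.
    for name, ip in sorted(hosts.items()):
        lines.append(f"{ip} {name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hostnames from the error log in this issue; the IP addresses are made up.
    cluster = {
        "slurmtest03-control": "10.0.0.10",
        "slurmtest03-login-0": "10.0.0.11",
        "slurmtest03-compute-0": "10.0.0.20",
        "slurmtest03-compute-1": "10.0.0.21",
    }
    print(render_etc_hosts(cluster), end="")
```

With lines like these present on every node, the "Failed to resolve server slurmtest03-control" error would not occur, because the lookup is answered from /etc/hosts.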

In environments/$your_env/inventory/groups, around line 54, you will probably have this:

[etc_hosts]
# Hosts to manage /etc/hosts e.g. if no internal DNS. See ansible/roles/etc_hosts/README.md

You need to enable it by adding hosts to this group, e.g.:

[etc_hosts]
# Hosts to manage /etc/hosts e.g. if no internal DNS. See ansible/roles/etc_hosts/README.md
cluster
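To check whether the etc_hosts group is actually populated in a given groups file, a small hypothetical helper like the one below can list the entries under [etc_hosts] (or [etc_hosts:children]). It only understands simple INI inventory lines, not Ansible's full inventory syntax, so treat it as a sketch; ansible-inventory --graph is the authoritative way to see group membership.

```python
import re

# Matches "[name]", "[name:children]" or "[name:vars]" section headers.
_SECTION = re.compile(r"^\[([^\]:]+)(?::(\w+))?\]$")


def group_entries(inventory_text: str, group: str) -> list[str]:
    """Return hosts/child groups listed under [group] or [group:children]."""
    entries: list[str] = []
    current = None  # (group name, suffix) of the section being read
    for raw in inventory_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank lines and comments add no members
        m = _SECTION.match(line)
        if m:
            current = (m.group(1), m.group(2))
            continue
        if current and current[0] == group and current[1] in (None, "children"):
            # First token is the host or child-group name; ignore key=value vars.
            entries.append(line.split()[0])
    return entries


if __name__ == "__main__":
    example = """\
[etc_hosts]
# Hosts to manage /etc/hosts e.g. if no internal DNS. See ansible/roles/etc_hosts/README.md
cluster
"""
    members = group_entries(example, "etc_hosts")
    print("etc_hosts populated:" if members else "etc_hosts empty:", members)
```

An empty list means the group has no members, i.e. the role is effectively disabled. For a real environment, pass in the text of environments/$your_env/inventory/groups (e.g. read with pathlib.Path.read_text()) instead of the inline example.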

Edit: we should probably turn this on by default, TBH.

@verdurin
Author

verdurin commented Nov 7, 2024

Ah, I thought there should be provision for this, and I just hadn't found it. Will try that now, thanks.

@sjpb
Collaborator

sjpb commented Nov 7, 2024

You can see all the available groups in https://github.com/stackhpc/ansible-slurm-appliance/blob/main/environments/common/inventory/groups. Usually there is a role of the same name with documentation.
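The convention described here, where a group in environments/common/inventory/groups usually has a role of the same name with its own documentation, can be turned into a quick lookup. The helper below is hypothetical: it assumes the layout implied in this thread (the common groups file, and role docs at ansible/roles/<name>/README.md as referenced for etc_hosts). A missing README does not necessarily mean a group is undocumented, since some roles, such as stackhpc.nfs, come from outside the repository.

```python
import re
from pathlib import Path
from typing import Optional

# Section headers: "[name]", "[name:children]" or "[name:vars]".
_SECTION = re.compile(r"^\[([^\]:]+)(?::\w+)?\]$")


def group_names(groups_file: Path) -> list[str]:
    """Unique group names declared in an INI inventory file, in file order."""
    names: list[str] = []
    for line in groups_file.read_text().splitlines():
        m = _SECTION.match(line.strip())
        if m and m.group(1) not in names:
            names.append(m.group(1))
    return names


def role_readmes(repo_root: Path) -> dict[str, Optional[Path]]:
    """Map each common group to ansible/roles/<group>/README.md, or None if absent."""
    groups_file = repo_root / "environments" / "common" / "inventory" / "groups"
    result: dict[str, Optional[Path]] = {}
    for name in group_names(groups_file):
        readme = repo_root / "ansible" / "roles" / name / "README.md"
        result[name] = readme if readme.is_file() else None
    return result


if __name__ == "__main__":
    # Assumes it is run from the root of a checkout of the appliance repository.
    root = Path(".")
    if (root / "environments" / "common" / "inventory" / "groups").is_file():
        for group, readme in role_readmes(root).items():
            print(f"{group:20} {readme or '(no role README found)'}")
```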

@verdurin
Author

verdurin commented Nov 7, 2024

I think it would be friendlier to newbies to have this on by default, yes.

@wtripp180901 linked a pull request on Nov 11, 2024 that will close this issue
@sjpb closed this as completed in #473 on Jan 8, 2025