Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/rustic: init module #373042

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

nixos/rustic: init module #373042

wants to merge 2 commits into from

Conversation

Ekleog
Copy link
Member

@Ekleog Ekleog commented Jan 11, 2025

Second attempt at #372935.

This has been running smoothlessly on my 4 systems for quite a while, on both aarch64 and x86-64 linux.

Considering the complexity of the module, I'll wait for (either a very long time or) a review from another committer before landing. But I can already say I'm very happy with it personally :)

The only future work I'd see on it would be to add easy toggles to automatically run a FS-level snapshot before backing up, which I'd likely implement for ZFS and maybe BTRFS. Or maybe I'd PR that upstream directly, idk, it's not on the short-term todo-list anyway.

cc @emilylange @happysalada @NobbZ @philipmw

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: documentation This PR adds or changes documentation 8.has: changelog 8.has: module (update) This PR changes an existing module in `nixos/` labels Jan 11, 2025
@Ekleog Ekleog mentioned this pull request Jan 11, 2025
13 tasks
@github-actions github-actions bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Jan 11, 2025
@happysalada
Copy link
Contributor

@emilylange you had concerns about the code , could you point them out ?

Comment on lines +20 to +21
else if v ? postgres then
v.postgres
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration with PostgreSQL seems overly specific. Nixpkgs has 20+ database packages; should this module grow to include them all? To me, it would make more sense for the operator to compose this module with the existing postgresqlBackup module. The latter would dump the databases on a schedule, then this one would back them up. Wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm… While it makes sense, it looks to me like the postgresqlBackup can only write to a file. Whereas here I'm using the rustic ability of streaming from stdin.

To give an example, I have one database of 13G (a pretty old matrix-synapse database), and I know some people have much bigger databases. I don't really want to write it to disk each day and then delete it for the backups — especially as I'm considering increasing the frequency of backups, thanks to rustic's deduplication.

Modifying the postgresqlBackup module to add a rustic backup location would be possible, but would likely be pretty complex (from an end-user UI perspective). And I'd argue it'd be even less well-placed there.

As for other databases, I already checked and mysql/mariadb seem to only allow running a global mysqldump without the postgresql-specific pg_dumpall + pg_dump * db; so I don't think it'd make sense to add it as people can just use the command backup type to backup all at once. I unfortunately don't know enough about the other databases to comment.

WDYT? If you have any other idea to support no-write-to-disk backup of databases I'd be happy to try it out!

The only alternative I can think of is something like a dynamic backup type that'd allow users to give one command that generates a JSON of backups, and then it'd generate one backup per such value. But that'd be pretty hard to use, as exhibited by the current postgres backup type — it'd lose the plug-and-play aspect of the nixos expert system :/

Comment on lines +57 to +60
else if o.pipeJsonInto ? command then
" --json | ${o.pipeJsonInto.command} \"$1\""
else if o.pipeJsonInto ? prometheus then
" --json | ${prometheusWriteScript o.pipeJsonInto.prometheus.nodeExporterCollectorFolder} \"$1\""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with PostgreSQL, in my opinion the integration with Prometheus is overly specific for this module. There are other observability clients, such as Amazon CloudWatch Agent added after 24.11; should this module know about them all?

How easy would it be for the operator to integrate this module with Prometheus themselves using the generic constructs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT it'd be very hard: the JSON output of rustic would only be put in journalctl, and parsing metrics from there would be of significant complexity.

This being said, if we do allow the pipeJsonInto.command variant, then the complexity is basically this:

{
  prometheusNumericMetrics = [
    "files_new"
    "files_changed"
    "files_unmodified"
    "total_files_processed"
    "total_bytes_processed"
    "dirs_new"
    "dirs_changed"
    "dirs_unmodified"
    "total_dirs_processed"
    "total_dirsize_processed"
    "data_blobs"
    "tree_blobs"
    "data_added"
    "data_added_packed"
    "data_added_files"
    "data_added_files_packed"
    "data_added_trees"
    "data_added_trees_packed"
    "backup_duration"
    "total_duration"
  ];

  prometheusTimeMetrics = [
    "backup_start"
    "backup_end"
  ];

  prometheusAllMetrics = prometheusNumericMetrics ++ prometheusTimeMetrics;

  prometheusJqTemplate = lib.concatMapStrings (m: ''
    rustic_backup_${m}{id=\"{{id}}\"} {{${m}}}
  '') prometheusAllMetrics;

  prometheusWriteScript =
    f:
    pkgs.writeScript "rustic-write-to-prometheus" ''
      #!${pkgs.bash}/bin/bash
      set -euo pipefail

      ${pkgs.jq}/bin/jq -r --arg template "${prometheusJqTemplate}" '
        . as $json
        | $template
        | gsub("{{id}}"; "'"$1"'")
        ${lib.concatMapStrings (
          m: ''| gsub("{{${m}}}"; $json.summary.${m} | tostring)''
        ) prometheusNumericMetrics}
        ${lib.concatMapStrings
          # jq's `fromdate` builtins do not support sub-second accuracy (as of early 2025)
          (
            m:
            ''| gsub("{{${m}}}"; $json.summary.${m} | sub(".[0-9]{0,9}Z"; "Z") | fromdateiso8601 | tostring)''
          )
          prometheusTimeMetrics
        }
      ' > "${f}/rustic-backup-$1.prom.next"
      mv "${f}/rustic-backup-$1.prom.next" "${f}/rustic-backup-$1.prom"
    '';

  # Then pipeJsonInto.command = prometheusWriteScript myFolder
}

This requires quite a bit of knowledge of rustic-specific information, and I thought it'd make sense to integrate that directly into NixOS.

I also do believe that we can accept any further PR to add support for Amazon CloudWatch Agent or similar: just like for postgresql I believe that we can reuse the generic infrastructure to have the target-specific cost be relatively small, and thus we can avoid each user of such system having to re-pay the cost of implementing it by upstreaming everything into nixpkgs.

#!${pkgs.bash}/bin/bash
set -euo pipefail

TZ="" ${cfg.package}/bin/rustic backup${ifFilesArgs + ifCommandArgs + genericArgs + pipeJsonIntoCmd}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why clear TZ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required for the JSON output to be relatively easily parsable: otherwise, the dates in the JSON output are given in a TZ-local timezone, and makes parsing unbelievably hard.

Thinking more about it, maybe it'd make sense to add that only when pipeJsonInto is non-empty?

enable = lib.mkEnableOption "rustic";
package = lib.mkPackageOption pkgs "rustic" { };

checkProfiles = lib.mkOption {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would the operator not want this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed that other modules that provide automated checks also provide a way to disable them, so I followed suit 😅 I also don't really see a point, but I guess automated checks are sometimes broken and need to be bypassed? That's also why it's defaulted to true

'';
};

profiles = lib.mkOption {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea that the operator can specify profiles and/or backups, with profiles being files the operator writes himself while backups is generated from a config?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that profiles determines the configuration of rustic, whereas backups defines a specific backup target.

The split allows a user to eg. define a profile that's not used for any automated backup, to trigger on-demand backups with sudo rustic -P my-manual-backup.

OTOH, having backups without profiles would likely be pretty useless.

@@ -99,6 +99,8 @@

- [git-worktree-switcher](https://github.com/mateusauler/git-worktree-switcher), switch between git worktrees with speed. Available as [programs.git-worktree-switcher](#opt-programs.git-worktree-switcher.enable)

- [rustic](https://github.com/rustic-rs/rustic), a Rust-based backup tool. Available as [services.rustic](#opt-services.rustic.enable).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this module is quite complex, IMO it necessitates some unit tests.

Copy link
Member Author

@Ekleog Ekleog Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I actually just got some motivation/time and added a basic integration test, validating at least the main functionality (including prometheus metrics generation, though checking the exact syntax would be hard) :)

backupConfigToCmd k {
command = v.postgres // {
filename = "${v.postgres.prefix}/globals.sql";
command = "${pkgs.sudo}/bin/sudo -u postgres ${config.services.postgresql.package}/bin/pg_dumpall --globals-only";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw the ${pkgs.sudo}/bin/sudo is wrong here: you'd need /run/wrappers/bin/sudo given that the store must not have setuid binaries in it.
However, why is it done like this in the first place? I'd argue it's preferable to reduce the permissions here as far as possible to not run all of the units as root.
Ideally, we'd have a readonly postgres user here, but unfortunately the postgresql ensure part is pretty ugly already and I don't see something like "create a readonly superuser" happening there for now[1].

[1] OK, actually I'm starting a bachelor's thesis that tries to get some ideas on state management with NixOS soon. So we may end up with answers eventually, but that's the status quo to work with.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this means it might make sense to make the user configurable here and document in the manual what to do (i.e. something like create unix user auth, create corresponding DB user with read-only permissions). Does this make sense?

Copy link
Member Author

@Ekleog Ekleog Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the ${pkgs.sudo}/bin/sudo is actually on purpose: this service does not require gaining any privileges, and will already run as root. So we don't need a wrapper to use sudo only to lower privileges.

The service needs to be run as root anyway, because the password to the backup repo will (should) be readable only by root.

Based on that, the only part we could additionally drop permissions for would be just running the pg_dump/pg_dumpall command. And TBH I'm not sure it'd be worth adding infrastructure just to run these very specific commands without read rights, as they're made to be run as the postgres superuser.

Does that make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the password to the backup repo will (should) be readable only by root.

This sounds like a use for LoadCredential though.

Copy link
Member Author

@Ekleog Ekleog Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh I hadn't known about that! It's very interesting, thank you :)

Here are the changes that come to mind now that you mention it:

  • Rework the backups set to not be attrsOf (sum type), and instead be backups.files = attrsOf submodule; backup.postgres = attrsOf submodule and backup.command = attrsOf submodule. This will allow to define them in terms of one another (esp. if we ever get a mysql/mariadb backup option that'd literally be an alias for command with a predefined command), and the sum type doesn't actually bring much value.
  • Add to the common backup options a runAs and a credentialsProfile option, that'd get loaded by systemd credentials and passed as an additional -P argument. runAs runs the systemd service as this user.
  • credentialsProfile is always mandatory, unless backups.credentialsProfile is set, in which case it gets automatically used. This is to discourage people from having their credentials in the nix store
  • For files backups, runAs defaults to root
  • For command backups, runAs is mandatory
  • For postgres backups, runAs defaults to postgres

Now, the main question I'm left with is: how to make it as easy as possible for the user to restore/run manual commands like snapshot listing? They can always use -P /root/the-password, but eg. the absence of .toml is pretty counter-intuitive. But OTOH each backup could have different creds, so I can't just add a wrapped rustic to systemPackages. But then maybe we can just ignore this problem anyway.

WDYT, both about the changes ideas and the "run manually" problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I am not part of the postgres maintainers, so take the following with a grain of salt.

A handful of modules across nixpkgs implement either of the approaches documented in https://nixos.org/manual/nixos/unstable/#module-services-postgres-initializing-extra-permissions to bootstrap users, database tables, or enable extensions.

You can find most by searching for either User = "postgres" or $PSQL.

A read-only user for backups should, based on my reading I did a while ago, simply use the pg_read_all_data role from https://www.postgresql.org/docs/15/predefined-roles.html.

For example:

systemd.services.postgresql.postStart = lib.mkAfter ''
  $PSQL -c 'GRANT "pg_read_all_data" TO "rustic-postgres-ro";'
'';

Note that this restarts postgres.service on initial deploy unlike the oneshot/sidecar approach.

Then your actual service that does the backup could use

serviceConfig = {
  User = "rustic-postgres-ro";
  DynamicUser = true;
};

and whatever credentials you may need should likely use the above-mentioned LoadCredentials.

You may also want to set and configure CacheDirectory for the local repo cache.

Copy link
Member Author

@Ekleog Ekleog Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your research about pg_read_all_data!

However, I'd disagree about both DynamicUser and CacheDirectory. By using a regular user with the default rustic repo cache, we allow the user to easily run sudo rustic restore, sudo -u rustic-postgres-ro rustic snapshots, etc.

I do believe this is an important use case, and would feel quite sad if restoring or listing snapshots were any more complex than it could be — good restoring UX is much more important than good backup UX, as restores usually happen in a rush and under stress.

With this in mind, here is the new plan for changes I'm thinking of:

  1. Rework the backups set to not be attrsOf (sum type), and instead be backups.files = attrsOf submodule; backup.postgres = attrsOf submodule; backup.command = attrsOf submodule. This will allow to define them in terms of one another (esp. if we ever get a mysql/mariadb backup option that'd literally be an alias for command with a predefined command), and the sum type doesn't actually bring much value but does add complexity.
  2. Add to the common backup options a runAs and a credentialsProfile option, that'd get loaded by systemd credentials and passed as an additional -P argument. runAs runs the systemd service as this user.
  3. credentialsProfile is always mandatory, unless backups.credentialsProfile is set, in which case it gets automatically used. This is to discourage people from having their credentials in the nix store
  4. For files backups, runAs defaults to root
  5. For command backups, runAs is mandatory
  6. For postgres backups, runAs defaults to rustic-postgres-ro, which gets pg_read_all_data permissions on the postgres databases (I'll test this role on my database and confirm bit-perfect equivalence with a backup made with postgres; if it's not I'll just keep postgres for now)

WDYT about this updated plan?

#!${pkgs.bash}/bin/bash
set -euo pipefail

${systemctl} start --no-block rustic-postgres-globals-${k}.service
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first line in the ExecStart, isn't it? And also the reason for the root requirement, correct?
So I'm wondering why not

  • give less permissions where possible
  • use Wants for this use-case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the "give less permissions" part, I think I answered above.

For Wants here, the only reason I did it this way here is for consistency with the rest of the rustic-postgres-db-${k}@$db.service deps — this way all the "deps" of rustic-backup-${k} are handled in the same way.

Do you think it'd be better to handle the only static dep with Wants? If yes I'll be happy to switch to it!

@h7x4 h7x4 added 8.has: module (new) This PR adds a module in `nixos/` 8.has: tests This PR has tests labels Jan 17, 2025
@Ekleog
Copy link
Member Author

Ekleog commented Jan 20, 2025

Great news! I received a greenlight from rustic maintainers to upstream prometheus pushgateway support; so I'll be able to remove all that part from the PR.

Here is my current plan for the future of this PR, please advise (even if you think it's good, just a 👍 helps a lot in making sure I'm going the right way) :)

  1. Rework the backups set to not be attrsOf (sum type), and instead be backups.files = attrsOf submodule; backup.postgres = attrsOf submodule; backup.command = attrsOf submodule. This will allow to define them in terms of one another (esp. if we ever get a mysql/mariadb backup option that'd literally be an alias for command with a predefined command), and the sum type doesn't actually bring much value but does add complexity.
  2. Add to the common backup options a runAs and a credentialsProfile option, that'd get loaded by systemd credentials and passed as an additional -P argument. runAs runs the systemd service as this user.
  3. credentialsProfile is always mandatory, unless backups.credentialsProfile is set, in which case it gets automatically used. This is to discourage people from having their credentials in the nix store
  4. For files backups, runAs defaults to root
  5. For command backups, runAs is mandatory
  6. For postgres backups, runAs defaults to rustic-postgres-ro, which gets pg_read_all_data permissions on the postgres databases (I'll test this role on my database and confirm bit-perfect equivalence with a backup made with postgres; if it's not I'll just keep postgres for now). Use the database postStart option
  7. Remove all the prometheus support code, as it should be upstream in rustic hopefully soon (I'll wait for it to land before updating this PR, so we'll have to wait a bit before landing it)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: changelog 8.has: documentation This PR adds or changes documentation 8.has: module (new) This PR adds a module in `nixos/` 8.has: module (update) This PR changes an existing module in `nixos/` 8.has: tests This PR has tests 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants