nixos/rustic: init module #373042
base: master
Conversation
@emilylange you had concerns about the code, could you point them out?

```nix
else if v ? postgres then
  v.postgres
```
The integration with PostgreSQL seems overly specific. Nixpkgs has 20+ database packages; should this module grow to include them all? To me, it would make more sense for the operator to compose this module with the existing `postgresqlBackup` module. The latter would dump the databases on a schedule, then this one would back them up. Wdyt?
Hmm… While it makes sense, it looks to me like `postgresqlBackup` can only write to a file, whereas here I'm using rustic's ability to stream from stdin.
To give an example, I have one database of 13G (a pretty old matrix-synapse database), and I know some people have much bigger ones. I don't really want to write it to disk each day and then delete it after the backup, especially as I'm considering increasing the frequency of backups thanks to rustic's deduplication.
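Concretely, the streaming setup could look roughly like this with the `command` backup type (a hypothetical sketch; the option names follow this PR's discussion and are not a stable API):

```nix
# Hypothetical sketch: stream pg_dump straight into rustic via stdin,
# so the 13G dump never touches disk. Names are illustrative.
services.rustic.backups.synapse-db = {
  command = {
    command = "${config.services.postgresql.package}/bin/pg_dump matrix-synapse";
    filename = "matrix-synapse.sql"; # name under which the stdin stream is stored
  };
};
```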
Modifying the `postgresqlBackup` module to add a `rustic` backup location would be possible, but would likely be pretty complex (from an end-user UI perspective). And I'd argue it'd be even less well-placed there.
As for other databases, I already checked, and mysql/mariadb seem to only allow running a global `mysqldump`, without the postgresql-specific `pg_dumpall` + one-`pg_dump`-per-database split; so I don't think it'd make sense to add it, as people can just use the `command` backup type to back everything up at once. I unfortunately don't know enough about the other databases to comment.
WDYT? If you have any other idea to support no-write-to-disk backup of databases, I'd be happy to try it out!
The only alternative I can think of is something like a `dynamic` backup type that'd allow users to give one command that generates a JSON list of backups, from which one backup per entry would be generated. But that'd be pretty hard to use, as exhibited by the current `postgres` backup type; it'd lose the plug-and-play aspect of the NixOS expert system :/
```nix
else if o.pipeJsonInto ? command then
  " --json | ${o.pipeJsonInto.command} \"$1\""
else if o.pipeJsonInto ? prometheus then
  " --json | ${prometheusWriteScript o.pipeJsonInto.prometheus.nodeExporterCollectorFolder} \"$1\""
```
As with PostgreSQL, in my opinion the integration with Prometheus is overly specific for this module. There are other observability clients, such as the Amazon CloudWatch Agent (added after 24.11); should this module know about them all?
How easy would it be for the operator to integrate this module with Prometheus themselves using the generic constructs?
AFAICT it'd be very hard: the JSON output of rustic would only end up in `journalctl`, and parsing metrics from there would be of significant complexity.
This being said, if we do allow the `pipeJsonInto.command` variant, then the complexity is basically this:
```nix
rec {
  prometheusNumericMetrics = [
    "files_new"
    "files_changed"
    "files_unmodified"
    "total_files_processed"
    "total_bytes_processed"
    "dirs_new"
    "dirs_changed"
    "dirs_unmodified"
    "total_dirs_processed"
    "total_dirsize_processed"
    "data_blobs"
    "tree_blobs"
    "data_added"
    "data_added_packed"
    "data_added_files"
    "data_added_files_packed"
    "data_added_trees"
    "data_added_trees_packed"
    "backup_duration"
    "total_duration"
  ];
  prometheusTimeMetrics = [
    "backup_start"
    "backup_end"
  ];
  prometheusAllMetrics = prometheusNumericMetrics ++ prometheusTimeMetrics;
  prometheusJqTemplate = lib.concatMapStrings (m: ''
    rustic_backup_${m}{id=\"{{id}}\"} {{${m}}}
  '') prometheusAllMetrics;
  prometheusWriteScript =
    f:
    pkgs.writeScript "rustic-write-to-prometheus" ''
      #!${pkgs.bash}/bin/bash
      set -euo pipefail
      ${pkgs.jq}/bin/jq -r --arg template "${prometheusJqTemplate}" '
        . as $json
        | $template
        | gsub("{{id}}"; "'"$1"'")
        ${lib.concatMapStrings (
          m: ''| gsub("{{${m}}}"; $json.summary.${m} | tostring)''
        ) prometheusNumericMetrics}
        ${lib.concatMapStrings
          # jq's `fromdate` builtins do not support sub-second accuracy (as of early 2025)
          (
            m:
            ''| gsub("{{${m}}}"; $json.summary.${m} | sub(".[0-9]{0,9}Z"; "Z") | fromdateiso8601 | tostring)''
          )
          prometheusTimeMetrics
        }
      ' > "${f}/rustic-backup-$1.prom.next"
      mv "${f}/rustic-backup-$1.prom.next" "${f}/rustic-backup-$1.prom"
    '';
  # Then pipeJsonInto.command = prometheusWriteScript myFolder
}
```
This requires quite a bit of rustic-specific knowledge, and I thought it'd make sense to integrate that directly into NixOS.
I also do believe that we can accept further PRs to add support for the Amazon CloudWatch Agent or similar: just like for `postgresql`, I believe we can reuse the generic infrastructure to keep the target-specific cost relatively small, and thus avoid each user of such a system having to re-pay the cost of implementing it, by upstreaming everything into nixpkgs.
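For context, the end product of that pipeline is just Prometheus textfile-collector lines, one per summary metric. A small standalone Python sketch of the output format (purely illustrative; the summary values are made up, not real rustic output):

```python
# Illustrative only: build metric lines in the same shape the jq template
# above emits, from a (hypothetical) rustic JSON summary.
summary = {"files_new": 3, "total_bytes_processed": 1048576}
backup_id = "home"

lines = [
    f'rustic_backup_{metric}{{id="{backup_id}"}} {value}'
    for metric, value in summary.items()
]
print("\n".join(lines))
# rustic_backup_files_new{id="home"} 3
# rustic_backup_total_bytes_processed{id="home"} 1048576
```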
```bash
#!${pkgs.bash}/bin/bash
set -euo pipefail

TZ="" ${cfg.package}/bin/rustic backup${ifFilesArgs + ifCommandArgs + genericArgs + pipeJsonIntoCmd}
```
Why clear TZ?
This is required for the JSON output to be relatively easily parsable: otherwise, the dates in the JSON output are given in the local timezone, which makes parsing unbelievably hard.
Thinking more about it, maybe it'd make sense to set that only when `pipeJsonInto` is non-empty?
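To illustrate the parsing issue (a standalone Python sketch, not the module's code): with `TZ` cleared, timestamps come out in a fixed UTC format that round-trips trivially, whereas local-time output would need per-host offset handling.

```python
from datetime import datetime, timezone

# A fixed UTC "Z" suffix makes the format unambiguous and easy to parse:
ts = datetime(2025, 1, 10, 12, 0, 0, tzinfo=timezone.utc)
rendered = ts.strftime("%Y-%m-%dT%H:%M:%SZ")
print(rendered)  # 2025-01-10T12:00:00Z

# Round-tripping the fixed format is trivial; no offset handling needed.
parsed = datetime.strptime(rendered, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
print(parsed == ts)  # True
```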
```nix
enable = lib.mkEnableOption "rustic";
package = lib.mkPackageOption pkgs "rustic" { };

checkProfiles = lib.mkOption {
```
Why would the operator not want this?
I've noticed that other modules that provide automated checks also provide a way to disable them, so I followed suit 😅 I also don't really see a point, but I guess automated checks are sometimes broken and need to be bypassed? That's also why it defaults to `true`.
```nix
  '';
};

profiles = lib.mkOption {
```
Is the idea that the operator can specify `profiles` and/or `backups`, with `profiles` being files the operator writes himself while `backups` is generated from a config?
The idea is that `profiles` determines the configuration of `rustic`, whereas `backups` defines a specific backup target.
The split allows a user to e.g. define a profile that's not used for any automated backup, to trigger on-demand backups with `sudo rustic -P my-manual-backup`.
OTOH, having `backups` without `profiles` would likely be pretty useless.
- [git-worktree-switcher](https://github.com/mateusauler/git-worktree-switcher), switch between git worktrees with speed. Available as [programs.git-worktree-switcher](#opt-programs.git-worktree-switcher.enable)

- [rustic](https://github.com/rustic-rs/rustic), a Rust-based backup tool. Available as [services.rustic](#opt-services.rustic.enable).
As this module is quite complex, IMO it necessitates some unit tests.
Makes sense, I actually just got some motivation/time and added a basic integration test, validating at least the main functionality (including prometheus metrics generation, though checking the exact syntax would be hard) :)
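For reference, a NixOS integration test for such a module typically has roughly this shape (a minimal hypothetical skeleton, not the PR's actual test; only `services.rustic.enable` is taken from this PR):

```nix
# Minimal hypothetical skeleton of a NixOS VM test for this module.
import ./make-test-python.nix (
  { pkgs, ... }:
  {
    name = "rustic";
    nodes.machine = {
      services.rustic.enable = true;
      # …profiles/backups configuration would go here…
    };
    testScript = ''
      machine.wait_for_unit("multi-user.target")
      # e.g. trigger a backup unit, then assert on the resulting snapshots
    '';
  }
)
```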
```nix
backupConfigToCmd k {
  command = v.postgres // {
    filename = "${v.postgres.prefix}/globals.sql";
    command = "${pkgs.sudo}/bin/sudo -u postgres ${config.services.postgresql.package}/bin/pg_dumpall --globals-only";
```
Fwiw the `${pkgs.sudo}/bin/sudo` is wrong here: you'd need `/run/wrappers/bin/sudo`, given that the store must not contain setuid binaries.
However, why is it done like this in the first place? I'd argue it's preferable to reduce the permissions here as far as possible to not run all of the units as root.
Ideally, we'd have a readonly postgres user here, but unfortunately the postgresql ensure part is pretty ugly already and I don't see something like "create a readonly superuser" happening there for now[1].
[1] OK, actually I'm starting a bachelor's thesis that tries to get some ideas on state management with NixOS soon. So we may end up with answers eventually, but that's the status quo to work with.
Hmm, this means it might make sense to make the user configurable here and document in the manual what to do (i.e. something like create unix user auth, create corresponding DB user with read-only permissions). Does this make sense?
So the `${pkgs.sudo}/bin/sudo` is actually on purpose: this service does not require gaining any privileges, and will already run as root, so we don't need the setuid wrapper just to use sudo to lower privileges.
The service needs to be run as root anyway, because the password to the backup repo will (should) be readable only by `root`.
Based on that, the only part we could additionally drop permissions for would be running the `pg_dump`/`pg_dumpall` commands. And TBH I'm not sure it'd be worth adding infrastructure just to run these very specific commands with reduced rights, as they're made to be run as the postgres superuser.
Does that make sense?
> because the password to the backup repo will (should) be readable only by root.

This sounds like a use case for `LoadCredential` though.
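For illustration, `LoadCredential` usage could look like this (a sketch; the unit name and the secret path are hypothetical):

```nix
# Hypothetical sketch: hand the repo password to the unit via systemd
# credentials instead of a root-readable path baked into the command line.
systemd.services."rustic-backup-home".serviceConfig = {
  LoadCredential = [ "repo-password:/etc/secrets/rustic-repo-password" ];
  # The service then reads it from "$CREDENTIALS_DIRECTORY/repo-password".
};
```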
Ooh I hadn't known about that! It's very interesting, thank you :)
Here are the changes that come to mind now that you mention it:
- Rework the `backups` set to not be `attrsOf (sum type)`, and instead be `backups.files = attrsOf submodule; backups.postgres = attrsOf submodule; backups.command = attrsOf submodule`. This will allow defining them in terms of one another (esp. if we ever get a mysql/mariadb backup option that'd literally be an alias for `command` with a predefined command), and the sum type doesn't actually bring much value.
- Add to the common backup options a `runAs` and a `credentialsProfile` option, the latter loaded via systemd credentials and passed as an additional `-P` argument.
- `runAs` runs the systemd service as this user.
- `credentialsProfile` is always mandatory, unless `backups.credentialsProfile` is set, in which case it gets automatically used. This is to discourage people from keeping their credentials in the nix store.
- For files backups, `runAs` defaults to `root`.
- For command backups, `runAs` is mandatory.
- For postgres backups, `runAs` defaults to `postgres`.

Now, the main question I'm left with is: how to make it as easy as possible for the user to restore or to run manual commands like snapshot listing? They can always use `-P /root/the-password`, but e.g. the absence of `.toml` is pretty counter-intuitive. OTOH each backup could have different creds, so I can't just add a wrapped rustic to `systemPackages`. But then maybe we can just ignore this problem anyway.
WDYT, both about the change ideas and the "run manually" problem?
Note that I am not part of the postgres maintainers, so take the following with a grain of salt.
A handful of modules across nixpkgs implement either of the approaches documented in https://nixos.org/manual/nixos/unstable/#module-services-postgres-initializing-extra-permissions to bootstrap users, database tables, or enable extensions.
You can find most by searching for either `User = "postgres"` or `$PSQL`.
A read-only user for backups should, based on reading I did a while ago, simply use the `pg_read_all_data` role from https://www.postgresql.org/docs/15/predefined-roles.html.
For example:

```nix
systemd.services.postgresql.postStart = lib.mkAfter ''
  $PSQL -c 'GRANT "pg_read_all_data" TO "rustic-postgres-ro";'
'';
```
Note that this restarts `postgres.service` on initial deploy, unlike the oneshot/sidecar approach.
Then your actual service that does the backup could use

```nix
serviceConfig = {
  User = "rustic-postgres-ro";
  DynamicUser = true;
};
```

and whatever credentials you may need should likely use the above-mentioned `LoadCredential`.
You may also want to set and configure `CacheDirectory` for the local repo cache.
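A `CacheDirectory` setup could be sketched as follows (the unit name is hypothetical, and how rustic is pointed at the directory is an assumption; check rustic's cache options):

```nix
# Hypothetical sketch: let systemd manage the local repository cache.
systemd.services."rustic-backup-home".serviceConfig = {
  CacheDirectory = "rustic-backup-home"; # becomes /var/cache/rustic-backup-home
  # rustic would then need to be told to use "$CACHE_DIRECTORY" as its cache dir.
};
```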
Thank you for your research about `pg_read_all_data`!
However, I'd disagree about both `DynamicUser` and `CacheDirectory`. By using a regular user with the default rustic repo cache, we allow the user to easily run `sudo rustic restore`, `sudo -u rustic-postgres-ro rustic snapshots`, etc.
I do believe this is an important use case, and would feel quite sad if restoring or listing snapshots were any more complex than they could be; good restore UX is much more important than good backup UX, as restores usually happen in a rush and under stress.
With this in mind, here is the new plan for changes I'm thinking of:

- Rework the `backups` set to not be `attrsOf (sum type)`, and instead be `backups.files = attrsOf submodule; backups.postgres = attrsOf submodule; backups.command = attrsOf submodule`. This will allow defining them in terms of one another (esp. if we ever get a mysql/mariadb backup option that'd literally be an alias for `command` with a predefined command), and the sum type doesn't actually bring much value but does add complexity.
- Add to the common backup options a `runAs` and a `credentialsProfile` option, the latter loaded via systemd credentials and passed as an additional `-P` argument.
- `runAs` runs the systemd service as this user.
- `credentialsProfile` is always mandatory, unless `backups.credentialsProfile` is set, in which case it gets automatically used. This is to discourage people from keeping their credentials in the nix store.
- For files backups, `runAs` defaults to `root`.
- For command backups, `runAs` is mandatory.
- For postgres backups, `runAs` defaults to `rustic-postgres-ro`, which gets `pg_read_all_data` permissions on the postgres databases (I'll test this role on my database and confirm bit-perfect equivalence with a backup made as `postgres`; if it's not, I'll just keep `postgres` for now).
WDYT about this updated plan?
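To make the plan concrete, the proposed layout might read like this (hypothetical, mirroring the names in the plan above; not an existing API):

```nix
# Hypothetical sketch of the reworked per-kind backups layout.
services.rustic.backups = {
  files.home = {
    # runAs defaults to "root" for files backups
    credentialsProfile = "repo-creds";
  };
  postgres.main = {
    # runAs defaults to "rustic-postgres-ro" for postgres backups
    credentialsProfile = "repo-creds";
  };
};
```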
```bash
#!${pkgs.bash}/bin/bash
set -euo pipefail

${systemctl} start --no-block rustic-postgres-globals-${k}.service
```
This is the first line in the ExecStart, isn't it? And also the reason for the `root` requirement, correct?
So I'm wondering why not:

- give fewer permissions where possible
- use `Wants` for this use case
For the "give less permissions" part, I think I answered above.
For Wants
here, the only reason I did it this way here is for consistency with the rest of the rustic-postgres-db-${k}@$db.service
deps — this way all the "deps" of rustic-backup-${k}
are handled in the same way.
Do you think it'd be better to handle the only static dep with Wants
? If yes I'll be happy to switch to it!
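The `Wants` variant for the one static dependency would be roughly (a sketch; `k` stands for the backup name as in the discussion):

```nix
# Hypothetical sketch: express the static dep via unit relationships
# instead of an explicit `systemctl start` inside ExecStart.
systemd.services."rustic-backup-${k}" = {
  wants = [ "rustic-postgres-globals-${k}.service" ];
  after = [ "rustic-postgres-globals-${k}.service" ];
};
```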
Great news! I received a greenlight from Here is my current plan for the future of this PR, please advise (even if you think it's good, just a 👍 helps a lot in making sure I'm going the right way) :)
Second attempt at #372935.
This has been running smoothly on my 4 systems for quite a while, on both aarch64 and x86-64 Linux.
Considering the complexity of the module, I'll wait for (either a very long time or) a review from another committer before landing. But I can already say I'm very happy with it personally :)
The only future work I'd see on it would be to add easy toggles to automatically run a FS-level snapshot before backing up, which I'd likely implement for ZFS and maybe BTRFS. Or maybe I'd PR that upstream directly, idk, it's not on the short-term todo-list anyway.
cc @emilylange @happysalada @NobbZ @philipmw
Things done

- Is sandboxing enabled in `nix.conf`? (See Nix manual): `sandbox = relaxed` / `sandbox = true`
- Ran `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"`. Note: all changes have to be committed, also see nixpkgs-review usage
- Tested basic functionality of all binary files (usually in `./result/bin/`)

Add a 👍 reaction to pull requests you find important.