Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs suggestion: pydantic models for borg's JSON output #8338

Open
a-gn opened this issue Aug 17, 2024 · 6 comments · May be fixed by #8343
Open

Docs suggestion: pydantic models for borg's JSON output #8338

a-gn opened this issue Aug 17, 2024 · 6 comments · May be fixed by #8343
Milestone

Comments

@a-gn
Copy link

a-gn commented Aug 17, 2024

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

It's an ISSUE (suggestion for docs improvement actually).
I don't think system info is needed.

Your borg version (borg -V).

1.4.0.

Describe the problem you're observing.

I wrote a small borg automation project for myself. Parsing borg's JSON output with pydantic was a bit of a pain because I had to write models for this by hand.

I suggest adding these Pydantic v2 models to the same docs, to make it easier to write frontends:

import json
import logging
import typing
from datetime import datetime
from pathlib import Path

import pydantic

_log = logging.getLogger(__name__)


class BaseBorgLogLine(pydantic.BaseModel):
    def get_level(self) -> int:
        """Get the log level for this line as a `logging` level value.

        If this is a log message with a levelname, use it.
        Otherwise, progress messages get `DEBUG` level, and other messages get `INFO`.
        """
        return logging.DEBUG


class ArchiveProgressLogLine(BaseBorgLogLine):
    original_size: int
    compressed_size: int
    deduplicated_size: int
    nfiles: int
    path: Path
    time: float


class FinishedArchiveProgress(BaseBorgLogLine):
    """JSON object printed on stdout when an archive is finished."""

    time: float
    type: typing.Literal["archive_progress"]
    finished: bool


class ProgressMessage(BaseBorgLogLine):
    operation: int
    msgid: typing.Optional[str]
    finished: bool
    message: typing.Optional[str]
    time: float


class ProgressPercent(BaseBorgLogLine):
    operation: int
    msgid: str | None = pydantic.Field(None)
    finished: bool
    message: str | None = pydantic.Field(None)
    current: float | None = pydantic.Field(None)
    info: list[str] | None = pydantic.Field(None)
    total: float | None = pydantic.Field(None)
    time: float

    @pydantic.model_validator(mode="after")
    def fields_depending_on_finished(self) -> typing.Self:
        if self.finished:
            if self.message is not None:
                raise ValueError("message must be None if finished is True")
            if self.current != self.total:
                raise ValueError("current must be equal to total if finished is True")
            if self.info is not None:
                raise ValueError("info must be None if finished is True")
            if self.total is not None:
                raise ValueError("total must be None if finished is True")
        else:
            if self.message is None:
                raise ValueError("message must not be None if finished is False")
            if self.current is None:
                raise ValueError("current must not be None if finished is False")
            if self.info is None:
                raise ValueError("info must not be None if finished is False")
            if self.total is None:
                raise ValueError("total must not be None if finished is False")
        return self


class FileStatus(BaseBorgLogLine):
    status: str
    path: Path


class LogMessage(BaseBorgLogLine):
    time: float
    levelname: typing.Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]
    name: str
    message: str
    msgid: typing.Optional[str]

    def get_level(self) -> int:
        try:
            return getattr(logging, self.levelname)
        except AttributeError:
            _log.warning(
                "could not find log level %s, giving the following message WARNING level: %s",
                self.levelname,
                json.dumps(self),
            )
            return logging.WARNING


_BorgLogLinePossibleTypes = (
    ArchiveProgressLogLine
    | FinishedArchiveProgress
    | ProgressMessage
    | ProgressPercent
    | FileStatus
    | LogMessage
)


class BorgLogLine(pydantic.RootModel[_BorgLogLinePossibleTypes]):
    """A log line from Borg with the `--log-json` argument."""

    def get_level(self) -> int:
        return self.root.get_level()


class _BorgArchive(pydantic.BaseModel):
    """Basic archive attributes."""

    name: str
    id: str
    start: datetime


class _BorgArchiveStatistics(pydantic.BaseModel):
    """Statistics of an archive."""

    original_size: int
    compressed_size: int
    deduplicated_size: int
    nfiles: int


class _BorgLimitUsage(pydantic.BaseModel):
    """Usage of borg limits by an archive."""

    max_archive_size: float


class _BorgDetailedArchive(_BorgArchive):
    """Archive attributes, as printed by `json info` or `json create`."""

    end: datetime
    duration: float
    stats: _BorgArchiveStatistics
    limits: _BorgLimitUsage
    command_line: typing.List[str]
    chunker_params: typing.Any | None = None


class BorgCreateResult(pydantic.BaseModel):
    """JSON object printed at the end of `borg create`."""

    archive: _BorgDetailedArchive


class BorgListResult(pydantic.BaseModel):
    """JSON object printed at the end of `borg list`."""

    archives: typing.List[_BorgArchive]

I think they are correct, I can parse all of borg's outputs in my runs.

Let me know if this is out of scope here and I should suggest it somewhere else :)

@ThomasWaldmann
Copy link
Member

Interesting idea, but I'ld rather would like to have them in the code and also have unit tests that they actually work. So we'll know when something breaks.

I don't use pedantic myself, but I'ld review a PR for such an addition.

@a-gn
Copy link
Author

a-gn commented Aug 23, 2024

Are you saying you'd prefer them to be importable from borg's code? I'm asking because the docs say that the internals aren't stable and that users should use the CLI. Or do you just mean that they should be in the code for tests, and automatically copied in the docs too?

@ThomasWaldmann
Copy link
Member

Yeah, internal apis are not stable. Guess not even the JSON is not fully stable, there might be quite some changes coming in borg2...

But we could have the models in the code and tests that they actually work for the current version.

@RonnyPfannschmidt
Copy link
Contributor

It might be a good reason to introduce private naming and public naming

@a-gn
Copy link
Author

a-gn commented Aug 24, 2024

It might be a good reason to introduce private naming and public naming

I'd suggest having a very very small public stable API, and the models could be in a module per version? from borg.public.json_models.v1 import BorgLogLine, from borg.public.json_models.v2 import BorgLogLine. Maybe later with small util types that expose information in Python with a common API.

a-gn added a commit to a-gn/borg that referenced this issue Aug 26, 2024
a-gn added a commit to a-gn/borg that referenced this issue Aug 26, 2024
@a-gn a-gn linked a pull request Aug 26, 2024 that will close this issue
@a-gn
Copy link
Author

a-gn commented Aug 26, 2024

I opened a pull request there; I can add more tests if you validate those.

I'm on macOS and don't use homebrew so installing borg here is kind of a mess (missing pkg-config). I will try later.

a-gn added a commit to a-gn/borg that referenced this issue Aug 31, 2024
@ThomasWaldmann ThomasWaldmann added this to the 1.4.1 milestone Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants