Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to document submodules without runtime implications #757

Open
aaronsteers opened this issue Nov 9, 2024 · 3 comments
Open

Ability to document submodules without runtime implications #757

aaronsteers opened this issue Nov 9, 2024 · 3 comments

Comments

@aaronsteers
Copy link

aaronsteers commented Nov 9, 2024

Problem Description

In general, I understand and agree with the pdoc use of __all__ to determine what to document.

However, when it comes to submodules within the __init__.py of a parent module, there are drawbacks to importing all submodules to the parent level.

Specifically, an import of submodules can increase memory footprint (in theory) and can cause otherwise unrepresented circular dependency issues (confirmed through my own experience).

There may also be implications when considering submodules that have imports that are only valid given the inclusion of certain python "extras". There's a risk that recursively importing submodules at runtime could cause unexpected failures in those cases.

I have loosely/tentatively confirmed that putting submodule imports in if typing.TYPE_CHECKING: seems to avoid these issues, but I'm not sure of the other implications.

Proposal

Either by promoting the TYPE_CHECKING workaround, or by another method, I'd like to find a path forward where submodules would be available for docs generation without risking adverse runtime impacts on the package itself.

Alternatives

A second option would be for pdoc to adopt a behavior of always documenting submodules if they are not prefixed with "_".

A third option would be for me to cleverly adapt my own docs/generate.py script to accomplish the same effect. (Not sure if this is possible.)

A fourth option would be to come up with html template to accomplish the same effect. (I'm pretty sure this is not possible.)

Additional context

Here is a ChatCPT conversation that explores implications and options: https://chatgpt.com/share/672ecbd4-47e0-8004-bf55-269227d96b9f

@aaronsteers
Copy link
Author

aaronsteers commented Nov 9, 2024

Workaround...

It's a lot of code, but basically we recurse all subdirectories manually and use naming conventions to add submodule files and folders explicitly in the docs/generate.py script.

Note: Submodules that don't have __all__ declared at all will receive a warning notification for their sub-submodules are being declared twice. This is because when __all__ is omitted, pdoc does auto-document submodules.

docs/generate.py

import os
import pathlib
import shutil
from typing import cast

import pdoc


def run() -> None:
    public_modules = [
        "airbyte_cdk",
    ]

    # Walk all subdirectories and add them to the `public_modules` list
    # if they do not begin with a "_" character.
    for parent_dir, dirs, files in os.walk(pathlib.Path("airbyte_cdk")):
        for dir_name in dirs:
            if "/." in parent_dir or "/_" in parent_dir:
                continue

            if dir_name.startswith((".", "_")):
                continue

            print(f"Found module dir: {parent_dir + '|' + dir_name}")

            # Check if the directory name does not begin with a "_"
            module = (parent_dir + "." + dir_name).replace("/", ".")
            if "._" not in module and not module.startswith("_"):
                public_modules.append(module)

        for file_name in files:
            if not file_name.endswith(".py"):
                continue
            if file_name in ["py.typed"]:
                continue
            if file_name.startswith((".", "_")):
                continue

            print(f"Found module file: {'|'.join([parent_dir, file_name])}")
            module = cast(str, ".".join([parent_dir, file_name])).replace("/", ".").removesuffix(".py")
            public_modules.append(module)

    # recursively delete the docs/generated folder if it exists
    if pathlib.Path("docs/generated").exists():
        shutil.rmtree("docs/generated")

    pdoc.render.configure(
        template_directory="docs",
        show_source=True,
        search=True,
        logo="https://docs.airbyte.com/img/logo-dark.png",
        favicon="https://docs.airbyte.com/img/favicon.png",
        mermaid=True,
        docformat="google",
    )
    nl = "\n"
    print(f"Generating docs for public modules: {nl.join(public_modules)}")
    pdoc.pdoc(
        *set(public_modules),
        output_directory=pathlib.Path("docs/generated"),
    )


if __name__ == "__main__":
    run()

@aaronsteers
Copy link
Author

aaronsteers commented Dec 7, 2024

@mhils - Returning here after some time testing. My workaround above only works to generate the submodules, but I realized after posting that it still doesn't make them appear in the left-hand nav of submodules. Is there any workaround so that my documentation readers can freely navigate submodules without my importing them directly to the parent module? As noted above, the import of submodules into their parent modules causes circular dependencies and is problematic in terms of runtime implications.

Another option I considered (but I don't know how to implement) would be to have a left-hand nav where the first section of the nav was the index of all submodules, ideally with our current position bolded or similar.

Basically - any way to show submodules and/or module hierarchy navigation without having to import submodules into parent modules would be amazing. I'm also very willing to contribute time if you can point me in the right direction. And, worth noting, the behavior of a missing __all__ is exactly what I want in regards to submodules - but some of my modules already have an __all__ to import classes and functions - and modifying that to include submodules seems untenable at this point. 😕

Thanks in advance for your help on this!

@elliot-100
Copy link
Contributor

elliot-100 commented Dec 16, 2024

Not sure if I've missed something obvious, but can't you use __all__ (perhaps built dynamically) in a documentation build process, even if you don't use it for regular use of the modules?

I guess I don't get why 'runtime impact' is a problem in itself, as documentation is not generated in a production environment (or is it, for some reason?)

But as I say I may be missing something obvious. Perhaps a minimal example would help get across what the actual issue is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants