Generate pyodide repodata, add more mocks #27

Closed
wants to merge 16 commits into from

bollwyvl
Contributor

@bollwyvl bollwyvl commented Mar 11, 2023

References

Changes

beyond jupyterlite/jupyterlite#979

  • rename package.json piplite key to pyodideKernel
  • add schema for package.json
  • move install_on_import to PyodideAddon
  • refactor some more common Warehouse/Repodata tools
  • update prompt_toolkit shims
  • restore consistency tests previously handled by doit
  • add (temporary) requirements-dev.txt, go back to RTD wheels
    • too many warnings otherwise

  • add repodata "totality" to check
    • look at the union of all repodataUrls (including Pyodide's) to ensure all packages have a wheel
  • use install_on_import on the demo site
    • ideally against pyodide-core
    • also use example in local/CI tests
  • add/generate some more "badly behaved" wheels
    • ugly metadata
    • weird names
  • handle non-wheels found in repodata (e.g. sqlite)
  • add I (isort) to ruff
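
The repodata "totality" check and the non-wheel handling listed above could be sketched roughly as follows. This is only an illustration, not the code in this PR: it assumes each repodata.json follows the Pyodide layout (a top-level `packages` mapping whose entries carry `file_name` and `depends`), and the names `union_packages` and `find_gaps`, as well as the sample data, are hypothetical.

```python
from typing import Any

Repodata = dict[str, Any]


def union_packages(repodatas: list[Repodata]) -> dict[str, dict[str, Any]]:
    """Merge the ``packages`` of several repodata documents; later ones win."""
    merged: dict[str, dict[str, Any]] = {}
    for repodata in repodatas:
        merged.update(repodata.get("packages", {}))
    return merged


def find_gaps(repodatas: list[Repodata]) -> dict[str, list[str]]:
    """Report dependencies with no entry, and entries that are not wheels."""
    packages = union_packages(repodatas)
    missing = sorted(
        {
            dep
            for pkg in packages.values()
            for dep in pkg.get("depends", [])
            if dep not in packages
        }
    )
    non_wheels = sorted(
        name
        for name, pkg in packages.items()
        if not pkg.get("file_name", "").endswith(".whl")
    )
    return {"missing": missing, "non_wheels": non_wheels}


# Hypothetical data: an upstream index plus a site-local one.
upstream = {
    "packages": {
        "sqlite3": {"file_name": "sqlite3-1.0.0.tar", "depends": []},
        "ipython": {"file_name": "ipython-1.0.0-py3-none-any.whl", "depends": ["jedi"]},
    }
}
local = {
    "packages": {
        "jedi": {"file_name": "jedi-0.0.0-py3-none-any.whl", "depends": ["parso"]},
    }
}

gaps = find_gaps([upstream, local])
print(gaps)
```

With this sample data the check would flag `parso` as a dependency without any entry, and `sqlite3` as an entry that is not a wheel.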

@github-actions
Contributor

lite-badge 👈 Try it on ReadTheDocs

@bollwyvl bollwyvl added the enhancement New feature or request label Mar 11, 2023
@bollwyvl bollwyvl force-pushed the gh-14-repodata branch 3 times, most recently from ad6ad52 to c22fd31 Compare March 12, 2023 18:20
@bollwyvl bollwyvl marked this pull request as ready for review March 13, 2023 02:17
@bollwyvl
Contributor Author

I think this is ready for review. Landing this without docs feels kinda bad, not sure what to do about that.

I am still not 100% convinced on:

  • what to call the feature at the command line/config file (currently --pyodide-install-on-import)
  • whether the piplite wheels and pyodide wheels+ should be mixed in the same folder

@bollwyvl
Contributor Author

Of note on benchmarking this vs the current latest: by going to the smaller pyodide-core to really prove out the offline stuff, the savings on .whl downloads are nearly cancelled out by the difference in uncompressed pyodide assets as served by RTD vs jsdelivr:

                   main     #27
wheels             4.5 MB   2.4 MB
pyodide.asm.data   3.2 MB   5.0 MB
pyodide_py.tar     36 kB    165 kB

Both .data and .tar are served as application/octet-stream, which appears to not be something most servers/proxies/CDN will compress with br or gzip 😿 .

@@ -5,7 +5,7 @@ node_modules/
 *.egg-info/
 .ipynb_checkpoints
 *.tsbuildinfo
-jupyterlite_pyodide_kernel/labextension
+src/jupyterlite_pyodide_kernel/labextension
Member

Do we need to move this folder in this PR? Are we expecting more Python packages to be added to the repo later?

Would need to check how the releaser handles that. It probably breaks it without extra config.

Member

It would be nice to keep jupyterlite_pyodide_kernel at the top level for now until we really need to move it down to a nested src folder or similar.

Contributor Author

The src layout is widely used in a lot of packages, but i'll grant i haven't used it with the releaser.

This avoids the possibility of it shadowing the as-installed package in site-packages. For some of the coverage stuff to work, it took weird junk like deleting the source and forcing a non-editable install (i.e. not -e .).

This layout is supported by all the build backends at this point, and releaser probably needs to be updated, if it can't yet do that.

It is weird that PEP 621 doesn't have a canonical place for that information, but that's the brave new world.

Member

@jtpio jtpio Mar 15, 2023

Sure. For now it's probably just going to create more issues. Happy to reconsider it if we track that in another issue. So the releaser config can properly be handled.

But for now it would be nice to keep it at the top-level so we can continue making releases (also it's not really related to the main change of this PR).

@jtpio
Member

jtpio commented Mar 13, 2023

Thanks @bollwyvl for porting this PR.

Looking at the diff and its relative complexity, I'm wondering if this is the right approach going forward.

Taking a step back, it looks like we are trying to handle package management for Pyodide here downstream? Since a couple of these packages are built into Pyodide by default, maybe there is a way to upstream some of that?

Otherwise maybe the Pyodide kernel should provide the right hooks for building and bundling the right packages at build time. Similar to what the xeus python kernel does. Probably the drawback here would be an added complexity of the build requirements as building a Pyodide distribution might not be as straightforward as pulling packages from emscripten forge and conda forge.

Just thinking out loud here.

@bollwyvl
Contributor Author

we are trying to handle package management for Pyodide here downstream?

We're working within the well-structured JSON format of repodata.json. This is similar to, and compatible with, emulating the Warehouse API, is extensible, and also works with dynamically-installed PyPI packages.

It also:

right hooks for building and bundling the right packages at build time.

That's what this does. It just doesn't change the upstream repodata.json at rest, and merges them at runtime, including from federated extensions.

maybe there is a way to upstream some of that?

Upstream has already made this mostly possible, by making the live repodata part of the JS and python API. Unfortunately, getting this out is still locked behind an enormous install, which is not really their problem, as they are generally concerned with full custom distributions. pyodide/pyodide#3573

what the xeus python kernel does.

which doesn't work with dynamically-installed packages (from PyPI or otherwise), last i checked.

as building a Pyodide distribution might not be as straightforward

Yep, it's pretty big, and while i am maintaining the pyodide-build-feedstock, I don't think making that even an optional dependency is going to be a good play. The docs probably still stand for folk that are doing heavy things (c/c++, rust, nodejs, whatever build chains). Given this problem can almost be solved in pure python, I'm hesitant to add any of those.

The complexity for resolving which packages are needed is locked in the dodo.py over on core (or was taken out?), and not a site builder feature at this time, because pip doesn't have a stable API. poetry was better (what conda-lock uses), but it also kept changing so much, conda-lock had to vendor parts of it.

micropip doesn't really answer the mail as an out-of-band solver, either, last i checked, but there have been some improvements that will land with the forthcoming major pyodide release... but still probably not composable.

micropip.freeze in the browser is closer: it emits the dynamically installed packages (with file_name: https://...) plus all packages in the pyodide distribution, which isn't super helpful.

We could add a piplite.freeze which generated a smaller subset (e.g. everything served from PyPI or this repo), so that a content builder could:

  • write some content with %pip install
  • run piplite.freeze (it's not async 🎉)
  • copy out the JSON repodata.json to {lite_dir}/pypi/repodata.json
  • rebuild the site
  • remove all the %pip and just import
  • rebuild the site
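
A rough sketch of what such a piplite.freeze subset might do. It assumes the frozen lock file is a JSON document shaped like repodata.json, and that dynamically installed packages are the ones whose `file_name` is an absolute URL, as described above. `freeze_subset` is a hypothetical name, not an existing piplite or micropip API, and the sample data is made up.

```python
import json


def freeze_subset(lock_json: str) -> str:
    """Keep only packages whose ``file_name`` is an absolute http(s) URL."""
    lock = json.loads(lock_json)
    kept = {
        name: pkg
        for name, pkg in lock.get("packages", {}).items()
        if pkg.get("file_name", "").startswith(("http://", "https://"))
    }
    # Preserve every other top-level key (e.g. "info") unchanged.
    return json.dumps({**lock, "packages": kept}, indent=2, sort_keys=True)


# Hypothetical frozen output: one PyPI wheel, one package from the distribution.
frozen = json.dumps(
    {
        "info": {"version": "0.0.0"},
        "packages": {
            "example-pkg": {
                "file_name": "https://files.example.invalid/example_pkg-1.0.0-py3-none-any.whl",
                "depends": [],
            },
            "micropip": {"file_name": "micropip-0.0.0-py3-none-any.whl", "depends": []},
        },
    }
)

subset = json.loads(freeze_subset(frozen))
print(sorted(subset["packages"]))
```

The string returned by `freeze_subset` would be what gets copied to {lite_dir}/pypi/repodata.json in the third step above.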

@bollwyvl
Contributor Author

Ok, RTD, back to building with an explicit jlpm dev

@jtpio
Member

jtpio commented Mar 13, 2023

Unfortunately, getting this out is still locked behind an enormous install, which is not really their problem

Well it doesn't mean it should be ours. While improving the loading time is definitely something we want for lite users, I'm not sure doing it here at the kernel level is the right place. Of course this could be a way to iterate more easily, but likely these mocks and the handling of packages are something other users of Pyodide in general will want to have.

@bollwyvl
Contributor Author

bollwyvl commented Mar 13, 2023

doing it here at the kernel level

Open to any other alternative that answers all the same mail. This package is only a kernel now, and provides the entire this-WASM-thing-is-a-python hallucination: where do you propose something should change?

@jtpio
Member

jtpio commented Mar 15, 2023

I guess something that builds just on top of Pyodide would be nice. Maybe it could be a side thing for now, not sure.

Was also wondering about the state of reusing Emscripten Forge packages in Pyodide, or if they were still planning to move the packages out of tree (need to look up the right issues).

@bollwyvl
Contributor Author

bollwyvl commented Mar 15, 2023

builds just on top of Pyodide

I dunno: this kernel builds on top of pyodide in a lower-level way than almost any other package (maybe p*script, but i don't track what they're doing, having given all the polite feedback I could). I don't know how much more basal we can get, as we control the worker, the pyodide, the filesystem, and everything else. This kernel is the edge case for all of the upstream and downstream design problems, and no one is going to go very far out of their way to solve our problems until this package can solve their problems.

Newer micropip (not compatible with our pyodide) has some new features that will be helpful, but won't solve the jedi "problem" directly without building an entire distribution... on the next release. pyodide-pack is... crazy, experimental, uses hella nodejs, and, like xeus, doesn't leave the system in a state that's always appropriate for interactive, exploratory computing. The issue i linked above is likely all I can do at this point, and would rather have something that works today, which we can swap out later once it has demonstrated its value, and can be consumed upstream.

Leaving the repodataUrls in the schema, and the ability to work with it in the .ts, is fine, and moving the CLI to another package in this repo is probably fine, e.g. jupyterlite-pyodide-repodata, if that's the way we want to play it. Moving it to another repo is.... yet another repo, another docs site, another build chain, another test suite, and more releases... and if it's not part of the default install, every user will keep pulling/parsing that 1.5mb of jedi every kernel start.

Indeed: the packages I'm patching aren't really a problem for upstream IPython on a real machine, and it should certainly not make jedi or prompt-toolkit optional because... [extras] are broken. And jedi works fine in a fully-controlled WASM environment where you can pre-warm the cache... but not feasible with dynamic package installation, which is still one of the marquee features of this kernel.

As for jupyterlab_widgets: I broached the topic of an ipywidgets-base (e.g. the kernel only pieces, without a dependency on client packages), and they were receptive to the idea... in ipywidgets 9.

So basically: we either sit here, waiting for things to get more to our liking upstream and downstream, or just accept the complexity here in a single codebase, test the hell out of it, and get back to shipping new features.

the state of reusing Emscripten Forge packages in Pyodide

I've barked up that tree a lot, and they're pretty set on wheels... even though PyPI hasn't budged on allowing uploading wasm-32 wheels. And they have a lot of stuff that isn't (and shouldn't be) wheels. So dunno.

From a site owner/site user perspective: I think the reverse (xeus being able to use packages from PyPI, both in environment.yml#pip: and dynamically with a semi-portable %pip) is a bit of an easier ship to turn. These could point at pyodide-built packages, on CDN, where necessary, or a full pyodide distribution could go on emscripten-forge.

@jtpio
Member

jtpio commented Mar 15, 2023

So basically: we either sit here, waiting for things to get more to our liking upstream and downstream, or just accept the complexity here in a single codebase, test the hell out of it, and get back to shipping new features.

Yeah that's it. I mean this kernel is becoming quite complex now and as mentioned above it looks like a lot of what it is doing should belong somewhere else.

Now that the Pyodide kernel can be iterated on and released independently from JupyterLite, maybe we could just bump major versions more freely if we decide to make drastic changes or move some logic out of the repo.

I'll have another look at the PR soon.

I think the reverse (xeus being able to use packages from PyPI, both in environment.yml#pip: and dynamically with a semi-portable %pip) is a bit of an easier ship to turn

Yep, I think it's close:

@bollwyvl
Contributor Author

Welp, again: nothing changed upstream (or in neighboring packages) in the last week: I'd still like to be able to ship packages this way, still allowing for unplanned imports. Open to other suggestions, but still see this as something the kernel has to manage, or at least be aware of, for the time being.

bollwyvl added a commit to bollwyvl/pyodide-kernel that referenced this pull request Mar 31, 2023
@bollwyvl bollwyvl mentioned this pull request Mar 31, 2023
8 tasks
jtpio added a commit that referenced this pull request Mar 31, 2023
* Update to Pyodide 0.23.0a1

* Update CDN

* bump bottom jupyterlite-core

* bump to pyodide 0.23.0

* bump python versions in subpackages

* replace upstream fixtures

* fix proxy instance check

* backport repo checks from #27

* more pytest config

* ignore libarchive warning by env var

* roll back pytest warning settings

---------

Co-authored-by: Jeremy Tuloup <[email protected]>
@bollwyvl bollwyvl mentioned this pull request Mar 31, 2023
5 tasks
@rth

rth commented Mar 31, 2023

Both .data and .tar are served as application/octet-stream, which appears to not be something most servers/proxies/CDN will compress with br or gzip

This was fixed in Pyodide 0.23.0: if you set the MIME type to wasm, jsDelivr will happily br-compress binary files. Though it also depends on where you deploy, if it's not jsDelivr, and whether you can set the MIME type manually.

I'll try to have a look in more detail at this PR to understand what exactly you need and whether we could share more of the implementation. I mean we have been talking about micropip-related changes for some time, but someone needs to work on the implementation :) and this hasn't exactly been at the top of my priorities. Will try to look more into this now that 0.23 is out.

@bollwyvl
Contributor Author

you set MIME type

Welp, we're strictly not in the hosting business, but enable and encourage others to do so, ideally for free, and triggered by something as simple as dragging a notebook into the Git(Lab/Hub) UI and waiting to get a site built.

So hosting features that need flags, configuration, or extra accounts are not going to help those folks. I don't know enough about the inner plumbing of Git(Hub|Lab) Pages or ReadTheDocs to even document how one might force them to do fancy MIME tricks.

what exactly you need

Well, need is of course a big word: we already take plenty, and of course are very thankful 🤗

Basically: we'd like an owner of a jupyterlite site to be able to, offline, calculate how to replace one or more individual packages in an otherwise-full repodata.json, while using the professionally-managed jsdelivr distribution for everything else. By having it pre-indexed, it would know imports, which would make it play nice-nice with the auto import mechanism. Actually building packages is mostly out of scope: usually folk just need their wheel, maybe a dependency or two, and then a bunch of wheels from the internet.

Concretely, we'd use this feature to replace just the 1mb of jedi (and parso, by just not loading it).

share more of the implementation.

There's some more detail over on pyodide/pyodide#3573

Spitballing if micropip owned the feature, something like this, on the server/build machine:

pip install micropip

installs micropip and maybe packaging, but not the whole pyodide-build stack, or even pydantic

Running, still on the server, in a directory that had my jedi-0.0.0-none-any.whl:

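import json
import micropip

# proposed, hypothetical API: repodata_patch is the feature being spitballed here
repodata_patch = micropip.repodata_patch(["./jedi-0.0.0-none-any.whl"])

# open for writing so json.dump can serialize the patch
with open("repodata-patch.json", "w") as fd:
    json.dump(repodata_patch, fd)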

At runtime, after starting pyodide, the data in repodata-patch.json would be merged on top of the separately-configured and -loaded repodata.json data so that loadPackage, micropip.install, and loadOnImport would all agree that this is, in fact, the jedi we are looking for, and the user could get to interactive computing faster.
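
The merge step could be sketched like this, in plain Python for illustration only (where it would actually live is not settled in this thread). It assumes both documents use the Pyodide repodata layout, with a per-package `imports` list; `merge_repodata` and `import_index` are hypothetical names, and the versions and file names are placeholders.

```python
import copy
from typing import Any

Repodata = dict[str, Any]


def merge_repodata(base: Repodata, patch: Repodata) -> Repodata:
    """Return a copy of ``base`` with the patch's packages laid on top."""
    merged = copy.deepcopy(base)
    merged.setdefault("packages", {}).update(copy.deepcopy(patch.get("packages", {})))
    return merged


def import_index(repodata: Repodata) -> dict[str, str]:
    """Map each importable module name to the package that provides it."""
    return {
        module: name
        for name, pkg in repodata.get("packages", {}).items()
        for module in pkg.get("imports", [])
    }


# Placeholder data standing in for the CDN repodata.json and a site-local patch.
base = {
    "packages": {
        "jedi": {"version": "1.0.0", "file_name": "jedi-1.0.0-py3-none-any.whl", "imports": ["jedi"]},
        "numpy": {"version": "1.0.0", "file_name": "numpy-1.0.0-py3-none-any.whl", "imports": ["numpy"]},
    }
}
patch = {
    "packages": {
        "jedi": {"version": "0.0.0", "file_name": "./jedi-0.0.0-none-any.whl", "imports": ["jedi"]},
    }
}

merged = merge_repodata(base, patch)
print(merged["packages"]["jedi"]["file_name"])
print(import_index(merged))
```

Because the patch only replaces the `jedi` entry, every other package would still resolve to whatever the base repodata.json points at, and an import-triggered lookup of `jedi` would land on the patched wheel.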

@jtpio
Member

jtpio commented Sep 15, 2023

As a comparison the Xeus Python kernel for JupyterLite offloads most of the dependency management and the creation of the environment to empack: https://github.com/emscripten-forge/empack

empack can be used to pack Wasm environments for the web so they can be used outside Jupyter too: https://github.com/emscripten-forge/sample-python-repl

Could there be something similar for Pyodide? So the Pyodide kernel does not have to deal with shims, mocks and all the complexity that comes with package management?

@bollwyvl
Contributor Author

Yes, there have been a number of features added in micropip and pyodide-lock that would improve some of these things, but were not available until pyodide 0.24.0 without a full compiler stack. This complexity, inherited by the site owner, is probably just as onerous as the xeus-*/empack one.
