From fdf7c6d81544070aaf9ce63aaf34a73bd7eb2227 Mon Sep 17 00:00:00 2001
From: Phillip Mobley
Date: Tue, 11 Feb 2025 14:34:17 -0500
Subject: [PATCH] Updated documentation to include packaging requirements (#526)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* ci: Pin twine in release workflow (#512)

* ci: Pin twine in release workflow

Signed-off-by: oliver könig

* maybe fix?

Signed-off-by: oliver könig

* fix

Signed-off-by: oliver könig

---------

Signed-off-by: oliver könig
Signed-off-by: Phillip Mobley

* ci: Version bump to 0.7.0rc1.dev0 (#513)

Signed-off-by: oliver könig
Co-authored-by: oliver könig
Signed-off-by: Phillip Mobley

* Enforce Dataframe Backend Checks (#514)

* Add module and to backend

Signed-off-by: Ryan Wolf

* Add backend tests

Signed-off-by: Ryan Wolf

* Fix tests

Signed-off-by: Ryan Wolf

* Add switch backend tests

Signed-off-by: Ryan Wolf

* Update modules to use module interface

Signed-off-by: Ryan Wolf

* Directly invoke module init

Signed-off-by: Ryan Wolf

* Fix call method

Signed-off-by: Ryan Wolf

* Fix shuffle call method

Signed-off-by: Ryan Wolf

* Add docs and more tests

Signed-off-by: Ryan Wolf

* Fix list formatting in docs

Signed-off-by: Ryan Wolf

* Address Sarah and Praateek's reviews

Signed-off-by: Ryan Wolf

* Fix modifier get_backend to backend

Signed-off-by: Ryan Wolf

* Address Ayush's review

Signed-off-by: Ryan Wolf

---------

Signed-off-by: Ryan Wolf
Signed-off-by: Phillip Mobley

* Updated documentation to include packaging requirements

Signed-off-by: Phillip Mobley

* Fixed formatting issues.
Signed-off-by: Phillip Mobley
Signed-off-by: Phillip Mobley

* Enable ADD ID to work with CPU/GPU both (#479)

* Enable ADD ID to work with CPU/GPU both

Signed-off-by: Vibhu Jawa

* Make Test runable in a CPU only environment

Signed-off-by: Vibhu Jawa

* Fix pytest skipping behavior in CPU/GPU environment

Signed-off-by: Vibhu Jawa

* Raise error instead of skipping test

Signed-off-by: Vibhu Jawa

---------

Signed-off-by: Vibhu Jawa
Signed-off-by: Phillip Mobley

* Add Pooling Strategy Option for embedding creation (#491)

* Add pooling stratedgy

Signed-off-by: Vibhu Jawa

* Ensure pytest is importable in a CPU only environment

Signed-off-by: Vibhu Jawa

* Fix last token based on Avinash's feedback

Signed-off-by: Vibhu Jawa

* Fix indexing issues

Signed-off-by: Vibhu Jawa

* Merge in main

Signed-off-by: Vibhu Jawa

* Fix Doc-string

Signed-off-by: Vibhu Jawa

* Address Sarah's reviews

Signed-off-by: Vibhu Jawa

---------

Signed-off-by: Vibhu Jawa
Signed-off-by: Phillip Mobley

* Add Partition On Logic (#519)

* add partition_on logic

Signed-off-by: Vibhu Jawa

* Add Docstring based on Sarah's review

Signed-off-by: Vibhu Jawa

* Apply Praateek's suggestion and skip test with using pytest.mark.gpu

Signed-off-by: Vibhu Jawa

* Apply Praateek's suggestion and force index=False

Signed-off-by: Vibhu Jawa

---------

Signed-off-by: Vibhu Jawa
Signed-off-by: Phillip Mobley

* Add improved cleaning methods from Nemotron-CC (#517)

* Add improved cleaning features

Signed-off-by: Ryan Wolf

* Fix cleaning tests

Signed-off-by: Ryan Wolf

* Update documentation and CLI scripts

Signed-off-by: Ryan Wolf

* Address Sarah and Lawrence's reviews

Signed-off-by: Ryan Wolf

---------

Signed-off-by: Ryan Wolf
Signed-off-by: Phillip Mobley

* Update model nomenclature (#497)

* Update model nomenclature

Signed-off-by: Sarah Yurick

* minor notebook grammar

Signed-off-by: Sarah Yurick

* add lawrence's suggestion

Signed-off-by: Sarah Yurick

---------

Signed-off-by: Sarah Yurick
Signed-off-by: Phillip Mobley

* small add_id backend fix (#525)

Signed-off-by: Vibhu Jawa
Signed-off-by: Phillip Mobley

* benchmark readme updates (#508)

* benchmark readme updates

Signed-off-by: Lawrence Lane

* benchmark image update

Signed-off-by: Lawrence Lane

* benchmark text update

Signed-off-by: Lawrence Lane

---------

Signed-off-by: Lawrence Lane
Signed-off-by: Phillip Mobley

* Removal logic for fuzzy / exact (no class abstraction) (#509)

Signed-off-by: Phillip Mobley

* ci: Limit unit-test duration (#534)

Signed-off-by: oliver könig
Signed-off-by: Phillip Mobley

* Enforce Dataframe Backend Checks (#514)

* Add module and to backend

Signed-off-by: Ryan Wolf

* Add backend tests

Signed-off-by: Ryan Wolf

* Fix tests

Signed-off-by: Ryan Wolf

* Add switch backend tests

Signed-off-by: Ryan Wolf

* Update modules to use module interface

Signed-off-by: Ryan Wolf

* Directly invoke module init

Signed-off-by: Ryan Wolf

* Fix call method

Signed-off-by: Ryan Wolf

* Fix shuffle call method

Signed-off-by: Ryan Wolf

* Add docs and more tests

Signed-off-by: Ryan Wolf

* Fix list formatting in docs

Signed-off-by: Ryan Wolf

* Address Sarah and Praateek's reviews

Signed-off-by: Ryan Wolf

* Fix modifier get_backend to backend

Signed-off-by: Ryan Wolf

* Address Ayush's review

Signed-off-by: Ryan Wolf

---------

Signed-off-by: Ryan Wolf

* small add_id backend fix (#525)

Signed-off-by: Vibhu Jawa
Signed-off-by: Phillip Mobley

* Removal logic for fuzzy / exact (no class abstraction) (#509)

Signed-off-by: Phillip Mobley

---------

Signed-off-by: oliver könig
Signed-off-by: Phillip Mobley
Signed-off-by: Ryan Wolf
Signed-off-by: Vibhu Jawa
Signed-off-by: Sarah Yurick
Signed-off-by: Lawrence Lane
Co-authored-by: oliver könig
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ryan Wolf
Co-authored-by: Vibhu Jawa
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Co-authored-by: L.B.
Co-authored-by: Praateek Mahajan
---
 README.md                                | 1 +
 docs/user-guide/image/gettingstarted.rst | 1 +
 2 files changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 55c54faf2..e5fc03c0c 100644
--- a/README.md
+++ b/README.md
@@ -70,6 +70,7 @@ This section explains how to install NeMo Curator and use the Python library, Py
 Before installing NeMo Curator, ensure that the following requirements are met:
 
 - Python 3.10 or higher
+- packaging >= 22.0
 - Ubuntu 22.04/20.04
 - NVIDIA GPU (optional)
   - Volta™ or higher ([compute capability 7.0+](https://developer.nvidia.com/cuda-gpus))
diff --git a/docs/user-guide/image/gettingstarted.rst b/docs/user-guide/image/gettingstarted.rst
index 49248bc70..075b30404 100644
--- a/docs/user-guide/image/gettingstarted.rst
+++ b/docs/user-guide/image/gettingstarted.rst
@@ -13,6 +13,7 @@ Install NeMo Curator
 To install the image curation modules of NeMo Curator, ensure you meet the following requirements:
 
 * Python 3.10 or higher
+* packaging >= 22.0
 * Ubuntu 22.04/20.04
 * NVIDIA GPU
     * Volta™ or higher (compute capability 7.0+)