Updated documentation to include packaging requirements (#526)
* ci: Pin twine in release workflow (#512)

* ci: Pin twine in release workflow

Signed-off-by: oliver könig <[email protected]>

* maybe fix?

Signed-off-by: oliver könig <[email protected]>

* fix

Signed-off-by: oliver könig <[email protected]>

---------

Signed-off-by: oliver könig <[email protected]>
Signed-off-by: Phillip Mobley <[email protected]>

* ci: Version bump to 0.7.0rc1.dev0 (#513)

Signed-off-by: oliver könig <[email protected]>
Co-authored-by: oliver könig <[email protected]>
Signed-off-by: Phillip Mobley <[email protected]>

* Enforce Dataframe Backend Checks (#514)

* Add module and to backend

Signed-off-by: Ryan Wolf <[email protected]>

* Add backend tests

Signed-off-by: Ryan Wolf <[email protected]>

* Fix tests

Signed-off-by: Ryan Wolf <[email protected]>

* Add switch backend tests

Signed-off-by: Ryan Wolf <[email protected]>

* Update modules to use module interface

Signed-off-by: Ryan Wolf <[email protected]>

* Directly invoke module init

Signed-off-by: Ryan Wolf <[email protected]>

* Fix call method

Signed-off-by: Ryan Wolf <[email protected]>

* Fix shuffle call method

Signed-off-by: Ryan Wolf <[email protected]>

* Add docs and more tests

Signed-off-by: Ryan Wolf <[email protected]>

* Fix list formatting in docs

Signed-off-by: Ryan Wolf <[email protected]>

* Address Sarah and Praateek's reviews

Signed-off-by: Ryan Wolf <[email protected]>

* Fix modifier get_backend to backend

Signed-off-by: Ryan Wolf <[email protected]>

* Address Ayush's review

Signed-off-by: Ryan Wolf <[email protected]>

---------

Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Phillip Mobley <[email protected]>

* Updated documentation to include packaging requirements

Signed-off-by: Phillip Mobley <[email protected]>

* Fixed formatting issues.

Signed-off-by: Phillip Mobley <[email protected]>

* Enable ADD ID to work with CPU/GPU both (#479)

* Enable ADD ID to work with CPU/GPU both

Signed-off-by: Vibhu Jawa <[email protected]>

* Make test runnable in a CPU-only environment

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix pytest skipping behavior in CPU/GPU environment

Signed-off-by: Vibhu Jawa <[email protected]>

* Raise error instead of skipping test

Signed-off-by: Vibhu Jawa <[email protected]>

---------

Signed-off-by: Vibhu Jawa <[email protected]>
Signed-off-by: Phillip Mobley <[email protected]>

* Add Pooling Strategy Option for embedding creation (#491)

* Add pooling strategy

Signed-off-by: Vibhu Jawa <[email protected]>

* Ensure pytest is importable in a CPU only environment

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix last token based on Avinash's feedback

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix indexing issues

Signed-off-by: Vibhu Jawa <[email protected]>

* Merge in main

Signed-off-by: Vibhu Jawa <[email protected]>

* Fix Doc-string

Signed-off-by: Vibhu Jawa <[email protected]>

* Address Sarah's reviews

Signed-off-by: Vibhu Jawa <[email protected]>

---------

Signed-off-by: Vibhu Jawa <[email protected]>
Signed-off-by: Phillip Mobley <[email protected]>

* Add Partition On Logic  (#519)

* add partition_on logic

Signed-off-by: Vibhu Jawa <[email protected]>

* Add Docstring based on Sarah's review

Signed-off-by: Vibhu Jawa <[email protected]>

* Apply Praateek's suggestion and skip test with using pytest.mark.gpu

Signed-off-by: Vibhu Jawa <[email protected]>

* Apply Praateek's suggestion and force index=False

Signed-off-by: Vibhu Jawa <[email protected]>

---------

Signed-off-by: Vibhu Jawa <[email protected]>
Signed-off-by: Phillip Mobley <[email protected]>

* Add improved cleaning methods from Nemotron-CC (#517)

* Add improved cleaning features

Signed-off-by: Ryan Wolf <[email protected]>

* Fix cleaning tests

Signed-off-by: Ryan Wolf <[email protected]>

* Update documentation and CLI scripts

Signed-off-by: Ryan Wolf <[email protected]>

* Address Sarah and Lawrence's reviews

Signed-off-by: Ryan Wolf <[email protected]>

---------

Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Phillip Mobley <[email protected]>

* Update model nomenclature (#497)

* Update model nomenclature

Signed-off-by: Sarah Yurick <[email protected]>

* minor notebook grammar

Signed-off-by: Sarah Yurick <[email protected]>

* add lawrence's suggestion

Signed-off-by: Sarah Yurick <[email protected]>

---------

Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Phillip Mobley <[email protected]>

* small add_id backend fix (#525)

Signed-off-by: Vibhu Jawa <[email protected]>
Signed-off-by: Phillip Mobley <[email protected]>

* benchmark readme updates (#508)

* benchmark readme updates

Signed-off-by: Lawrence Lane <[email protected]>

* benchmark image update

Signed-off-by: Lawrence Lane <[email protected]>

* benchmark text update

Signed-off-by: Lawrence Lane <[email protected]>

---------

Signed-off-by: Lawrence Lane <[email protected]>
Signed-off-by: Phillip Mobley <[email protected]>

* Removal logic for fuzzy / exact (no class abstraction) (#509)

Signed-off-by: Phillip Mobley <[email protected]>

* ci: Limit unit-test duration (#534)

Signed-off-by: oliver könig <[email protected]>
Signed-off-by: Phillip Mobley <[email protected]>

---------

Signed-off-by: oliver könig <[email protected]>
Signed-off-by: Phillip Mobley <[email protected]>
Signed-off-by: Ryan Wolf <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Lawrence Lane <[email protected]>
Co-authored-by: oliver könig <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ryan Wolf <[email protected]>
Co-authored-by: Vibhu Jawa <[email protected]>
Co-authored-by: Sarah Yurick <[email protected]>
Co-authored-by: L.B. <[email protected]>
Co-authored-by: Praateek Mahajan <[email protected]>
8 people authored Feb 11, 2025
1 parent 334a331 commit fdf7c6d
Showing 2 changed files with 2 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -70,6 +70,7 @@ This section explains how to install NeMo Curator and use the Python library, Py
Before installing NeMo Curator, ensure that the following requirements are met:

- Python 3.10 or higher
- packaging >= 22.0
- Ubuntu 22.04/20.04
- NVIDIA GPU (optional)
- Volta™ or higher ([compute capability 7.0+](https://developer.nvidia.com/cuda-gpus))
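
The Python and `packaging` requirements listed above could be checked before installation with something like the following. This is a minimal sketch, not part of NeMo Curator: `release_tuple`, `meets_minimum`, and `check_requirements` are hypothetical helper names, and the comparison looks only at the numeric release segment, ignoring pre-release tags and epochs.

```python
import re
import sys
from importlib import metadata


def release_tuple(version: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '22.0rc1' -> (22, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release segments only (hypothetical helper, simplified rules)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that '22' compares equal to '22.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements() -> dict[str, bool]:
    """Report whether the interpreter and `packaging` satisfy the listed minimums."""
    results = {"python>=3.10": sys.version_info >= (3, 10)}
    try:
        installed = metadata.version("packaging")
        results["packaging>=22.0"] = meets_minimum(installed, "22.0")
    except metadata.PackageNotFoundError:
        # `packaging` is not installed in this environment.
        results["packaging>=22.0"] = False
    return results


if __name__ == "__main__":
    for requirement, ok in check_requirements().items():
        print(f"{requirement}: {'ok' if ok else 'not satisfied'}")
```

If `packaging` is already installed, its own `packaging.version.Version` class would be the more robust way to compare versions. The standard-library approach here is used only so the check runs even when `packaging` is absent.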
1 change: 1 addition & 0 deletions docs/user-guide/image/gettingstarted.rst
@@ -13,6 +13,7 @@ Install NeMo Curator
To install the image curation modules of NeMo Curator, ensure you meet the following requirements:

* Python 3.10 or higher
* packaging >= 22.0
* Ubuntu 22.04/20.04
* NVIDIA GPU
* Volta™ or higher (compute capability 7.0+)