Welcome to the Apache Hudi community! We appreciate your interest in contributing to this open-source data lake platform. This guide will walk you through the process of making your first contribution.
If you are new to the project, we recommend starting with issues listed in https://github.com/apache/hudi-rs/contribute.
Testing and reporting bugs are also valueable contributions. Please follow the issue template to file bug reports.
All issues tagged for a release can be found in the corresponding milestone page, see https://github.com/apache/hudi-rs/milestones.
Features, bugs, and p0
issues that are targeting the next release can be found in
this project view. Pull requests won't be tracked in the project
view, instead, they will be linked to the corresponding issues.
- Install Rust, e.g. as described here
- Have a compatible Python version installed (check
python/pyproject.toml
for current requirement)
For most of the time, use dev commands specified in python/Makefile
, it applies to both Python
and Rust modules. You don't need to cd
to the root directory and run cargo
commands.
To setup python virtual env, run
make setup-venv
Note
This will run python
command to setup the virtual environment. You can either change that to python3.X
,
or simply alias python
to your local python3.X
installation, for example:
echo "alias python=/Library/Frameworks/Python.framework/Versions/3.12/bin/python3" >> ~/.zshrc`
Once activate virtual env, build the project for development by
make develop
This will install hudi
dependency built from your local repo to the virtual env.
For Rust,
# For all tests
make test-rust
# or
cargo test --workspace
# For all tests in a crate / package
cargo test -p hudi-core
# For a specific test case
cargo test -p hudi-core table::tests::hudi_table_get_schema
For Python,
# For all tests
make test-python
# or
pytest -s
# For a specific test case
pytest tests/test_table_read.py -s -k "test_read_table_has_correct_schema"
Run test commands to make sure the code is working as expected:
make test
Run check commands and follow the suggestions to fix the code:
make check
When submitting a pull request, please follow these guidelines:
- Title Format: The pull request title must follow the format outlined in
the conventional commits spec. This is a standardized format for commit
messages, and also allows us to auto-generate change logs and release notes. Since only the
main
branch requires this format, and we always squash commits and then merge the PR, incremental commits' messages do not need to conform to it. - Line Count: A general guideline is to keep the PR's diff, i.e., max(added lines, deleted lines), less than 1000 lines. Keeping PRs concise makes it easier for reviewers to thoroughly examine changes without experiencing fatigue. If your changes exceed this limit, consider breaking them down into smaller, logical PRs that address specific aspects of the feature or bug fix.
- Coverage Requirements: All new features and bug fixes must include appropriate unit tests to ensure functionality and prevent regressions. Tests should cover both typical use cases and edge cases. Ensure that new tests pass locally before submitting the PR.
- Code Comments: Properly designed APIs and code should be self-explanatory and make in-code comments redundant. In case that complex logic or non-obvious implementations are absolutely unavoidable, please add comments to explain the code's purpose and behavior.
We use codecov to generate code coverage report and enforce code coverage rate. See codecov.yml for the configuration.
To help with contributing to the project, please explore Hudi's documentation for further learning.
We expect all community members to follow our Code of Conduct.