Releases: iterative/datachain
0.3.1
What's Changed
- Fix typo in filter method docstrings by @mnrozhkov in #250
- Skip if not SQLite Improvements by @dtulga in #254
- Autodetect Studio branch by @dreadatour in #253
- Autodetect Studio branch fix for 'main' branch by @dreadatour in #257
- Removing metastore argument from Client.parse_url() by @ilongin in #256
- Autodetect Studio branch fix for 'main' branch by @dreadatour in #258
- Parallel UDF optimizations by @dreadatour in #211
Full Changelog: 0.3.0...0.3.1
0.3.0
0.2.18
What's Changed
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #238
- Adding DataChain.column(...) and fixing functions and types by @ilongin in #226
Full Changelog: 0.2.17...0.2.18
0.2.17
What's Changed
- Update readme by @dmpetrov in #233
- Arrow nrows fix by @dberenbaum in #221
- only combine final step for limit by @mattseddon in #230
- Fix renaming object or normal signal with .mutate() by @ilongin in #217
- Fixing too many open files and adding reconnect by @dtulga in #229
Full Changelog: 0.2.16...0.2.17
0.2.16
What's Changed
- improve efficiency of examples by @mattseddon in #214
- fix select then distinct chain by @mattseddon in #213
- rename DataChain's create_empty to from_records by @mattseddon in #215
- do not modify datachain max limit in show by @mattseddon in #225
- Rename cleanup_temp_tables to cleanup_tables in warehouse and catalog by @amritghimire in #218
Full Changelog: 0.2.15...0.2.16
0.2.15
What's Changed
- Arrow improvements by @dberenbaum in #126
- prevent cryptic error messages when running llm claude examples by @mattseddon in #194
- remove reference to missing notebook by @mattseddon in #193
- Fix for nested lists of models in schema by @ilongin in #195
- Support for DataChain.batch_map() by @dberenbaum in #191
- renamed clip.py by @dberenbaum in #201
- validation error handling improved by @volkfox in #203
- modified Mistral prompt and changed to the DataModel by @volkfox in #205
- make datachain show respect existing limit by @mattseddon in #206
- JSON tutorial by @volkfox in #207
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #197
- Move 'create_pre_udf_table' function to warehouse module by @dreadatour in #187
New Contributors
- @pre-commit-ci made their first contribution in #197
Full Changelog: 0.2.14...0.2.15
0.2.14
What's Changed
- Fixing test warnings by @dtulga in #158
- Update DataChain.subtract() to work without legacy file signals by @rlamy in #157
- Optimizations: low-hanging fruits by @dreadatour in #178
- from_values with array of arrays by @dmpetrov in #183
- remove collect_one from example by @mattseddon in #186
- fixing a missed import for codegen schemas by @volkfox in #184
- avoid instantiating filesystem for path operations by @skshetry in #176
- Remove get_possibly_stale_jobs from metastore by @amritghimire in #189
New Contributors
- @amritghimire made their first contribution in #189
Full Changelog: 0.2.13...0.2.14
0.2.13
What's Changed
- DataChain.from_storage: add last_modified and is_latest to the columns by @skshetry in #165
- fix for using new column from .mutate() in .order_by() by @ilongin in #171
- Renaming File.write() to File.save() by @ilongin in #172
- storage: index as a dir if no glob by @shcheklein in #108
Docs
- first shot at LLM eval tutorial by @volkfox in #145
- Update README.rst by @volkfox in #161
- docs: update README.rst by @eltociear in #168
Maintenance
- skip mypy hook on pre-commit.ci by @skshetry in #164
- ci: disable azure and gs remote tests on macOS by @skshetry in #174
- ci: run s3 tests on Windows, be more careful while skipping by @skshetry in #175
- fix test for ch wrt datetime precision by @skshetry in #169
- Adding tests for exporting image files and File.write() by @ilongin in #149
New Contributors
- @eltociear made their first contribution in #168
Full Changelog: 0.2.12...0.2.13
0.2.12
What's Changed
- Python API to manage the dataset registry by @dreadatour in #29
- cli: hide subcommands from the listing by @skshetry in #79
- datachain: rename include_sys kwarg to sys by @skshetry in #69
- Adding DataChain.export_files(...) by @ilongin in #30
- Update cv tutorial: fashion_product_images by @mnrozhkov in #62
- Add and clean up docstrings in datachain api by @dberenbaum in #63
- docs: fix invalid python code inside docstrings by @skshetry in #85
- Hide traceback for xfails in Studio test runs by @rlamy in #87
- Rename UDF to UDFStep for clarity, and remove from root namespace by @rlamy in #88
- Fix mutate() by @dmpetrov in #78
- update pytest-servers to 0.5.5 by @mattseddon in #94
- Remove vendored-code-specific folders by @dtulga in #95
- Rename repository references to datachain by @dtulga in #93
- do not overwrite version with None in DatasetQuery constructor by @mattseddon in #92
- always include sys signals by @skshetry in #81
- Add more UniqueId fields by @rlamy in #90
- Added more generalized SignalsSchema.get_signals() method instead of get_file_signals(...) by @ilongin in #86
- Added input params to distinct() by @ilongin in #96
- Fix for order_by with sub signals by @ilongin in #82
- Remove legacy signals in from_storage() by @rlamy in #72
- Updates to examples by @dberenbaum in #77
- More docs updates by @dberenbaum in #100
- Add 'update' param to DataChain.from_storage method by @dreadatour in #99
- Fix repository reference in Notebook by @dtulga in #105
- fix(ux): remove reference to DatasetQuery by @shcheklein in #104
- datachain: implement to_parquet by @skshetry in #97
- File refactor by @dberenbaum in #102
- fixing regressions from switching to ModelStore.add() by @volkfox in #109
- add ModelStore to top level imports by @dmpetrov in #112
- add truncate option to show and update default width of output by @mattseddon in #116
- merge/join: exclude sys signals by @skshetry in #120
- Added descending parameter to DataChain.order_by(...) by @ilongin in #122
- remove get_value() from DataModel by @dmpetrov in #119
- Add file modes for binary/text by @dberenbaum in #107
- remove docstring from DataModel.__pydantic_init_subclass__ by @skshetry in #123
- Examples cleanup by @dberenbaum in #111
- rename ModelStore.add() to register() by @dmpetrov in #113
- datachain: generalize data access functions into collect(), and collect_flatten by @skshetry in #121
- Add nrows for partial parsing of csv/parquet by @dberenbaum in #124
- Update index.md by @volkfox in #128
- Picture for getting started by @volkfox in #127
- moving pic to the right place by @volkfox in #131
- cleanup signal refs in examples by @dberenbaum in #129
- cleanup api reference index by @dberenbaum in #130
- Fix for text and images files export by @ilongin in #135
- update computer vision quick start example by @mattseddon in #136
- update computer vision image example by @mattseddon in #139
- Huggingface test updates and bug fix by @dberenbaum in #140
- Readme update by @dmpetrov in #133
- readme: fix link to image by @dmpetrov in #143
- Update badge by @skshetry in #144
- don't depend on datachain from PATH to exec processes by @skshetry in #118
- dc: try to fix dataset_stats for DataChain.from_storage() generated dataset by @skshetry in #151
New Contributors
- @dreadatour made their first contribution in #29
- @mnrozhkov made their first contribution in #62
Full Changelog: 0.2.11...0.2.12
0.2.11
What's Changed
- cleanup model store/registry by @dberenbaum in #74
- slice nested signals by @dberenbaum in #75
- To pandas - hierarchical multi header by @dmpetrov in #22
- Use cloudpickle for parallel UDF processing by @dtulga in #65
Full Changelog: 0.2.10...0.2.11