Releases: lancedb/lance
v0.2.3 Bugfix release; breaks dataset proto schema
What's Changed
- [C++] Project schema via field Ids and Schema intersection by @eddyxu in #305
- when writing in batches, handle all na arrays properly by @changhiskhan in #306
- [C++] Use LanceFragment to build I/O exec plan by @eddyxu in #307
- [CI] Fix Github Action warning to upgrade nodejs 12 based actions by @eddyxu in #309
- Update README.md by @changhiskhan in #310
- Temporarily pin duckdb to 0.5.1 by @changhiskhan in #313
- Notebook for new blog post on versioning by @changhiskhan in #311
- [C++] Fix reading dictionary values from manifest files by @eddyxu in #314
Full Changelog: v0.2.2...v0.2.3
v0.2.2 Python notebooks and CV dataset conversion.
What's Changed
- [DOC] Update README.md by @jaichopra in #294
- [DUCKDB] Script to upload lance extension zip by @changhiskhan in #295
- [C++] Scan Node reads multiple files by @eddyxu in #300
- [Python] Add lance.util.duckdb to help install the extension transparently by @changhiskhan in #301
- [Python] Notebook fixes by @changhiskhan in #303
- [Python] Make dataset conversion a feature by @changhiskhan in #304
Full Changelog: v0.2.1...v0.2.2
v0.2.1 Bug fix release
Fixed a bug affecting writes of fixed-size list arrays, as well as a bug in the datagen code for COCO.
Updated to the newly released Arrow 10.0.
What's Changed
- remove duplicate test_mac.sh by @changhiskhan in #284
- Fix build on intel mac by @eddyxu in #286
- [C++] Fix write fixed list array bug by @eddyxu in #288
- Upgrade Apache Arrow to 10.0 by @eddyxu in #266
- temporary hack to fix pytorch loader until it can handle a versioned … by @changhiskhan in #293
- fix image_id alignment in coco datagen by @changhiskhan in #289
Full Changelog: v0.2.0...v0.2.1
v0.2.0 Dataset Versioning, DuckDB extension built with CUDA
Highlights
- Lance Dataset versioning support
- DuckDB extension supports building against PyTorch with CUDA
- Revamp README and documentation.
What's Changed
- Fetch Dataset Versions by @eddyxu in #272
- Readability improvement for metadata class by @Renkai in #275
- [DuckDB] Enable DuckDB extension to build/run on CUDA-enabled PyTorch by @changhiskhan in #273
- [Python] Support multi-versioned dataset by @eddyxu in #278
- [Document] Add logo/README refresh by @jaichopra in #279
- [Python] Fetch dataset versions. by @eddyxu in #280
- [Python] Support group size and rows_per_file customization via write_dataset API by @eddyxu in #281
- [Python] use new write API in python benchmark by @eddyxu in #282
Full Changelog: v0.1.5...v0.2.0
v0.1.5 Pandas Extension Type, Jupyter Notebook and Document Improvements
What's Changed
- Add model inference notebook by @changhiskhan in #244
- update README.md to simplify comms, prior to blog post by @jaichopra in #248
- Jaichopra/rebrand lance by @jaichopra in #249
- Exclude jupyter notebook from github language stats by @changhiskhan in #251
- linguist fix by @changhiskhan in #253
- restore skipped test since extension types are working on mac by @changhiskhan in #256
- ingestion example by @changhiskhan in #252
- Update README.md by @jaichopra in #261
- [CI] pin arrow 9.0 in GHA by @eddyxu in #268
- Update README.md by @jaichopra in #264
- Pandas extension dtype for image by @changhiskhan in #267
- Reorganize tutorial notebooks by @changhiskhan in #265
- Merge two Schemas by @eddyxu in #263
- Versioning support with Appending Dataset by @eddyxu in #262
- Change datagen to use public https image urls by @changhiskhan in #271
Full Changelog: v0.1.4...v0.1.5
v0.1.4: Var-length binary decoder performance improvements, Open Discord server for community.
What's Changed
- CLI to inspect lance dataset by @eddyxu in #231
- Generate primary key for Oxford Pet dataset by @eddyxu in #233
- Fix datagen test by @eddyxu in #234
- Add discord link and fix typo in README by @eddyxu in #236
- Improve VarBinaryDecoder::Take performance by accumulating small batches by @eddyxu in #239
Full Changelog: v0.1.3...v0.1.4
Document improvements and bug fixes
v0.1.2
- Lance now supports projection for nested column (e.g., "annotations.name")
- There's also a fast path for CountRows to get the record count by looking at metadata
- Finally, Lance now supports writing optional key-value metadata (pa.Table.schema.metadata)
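To illustrate what projecting a nested column such as "annotations.name" means, here is a minimal conceptual sketch in plain Python. It is not Lance's implementation (which operates on Arrow data in C++); the `project` helper and the sample record are hypothetical and exist only for this example.

```python
from typing import Any


def project(record: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as "annotations.name" into a nested record.

    A list of structs is projected element-wise, so the result keeps the
    list shape but contains only the requested sub-field.
    """
    value: Any = record
    for key in path.split("."):
        if isinstance(value, list):
            # List of structs: pick the same field out of every element.
            value = [item[key] for item in value]
        else:
            value = value[key]
    return value


# Hypothetical record shaped like an image with a list of annotations.
row = {
    "image_id": 1,
    "annotations": [
        {"name": "cat", "bbox": [0, 0, 10, 10]},
        {"name": "dog", "bbox": [5, 5, 20, 20]},
    ],
}

names = project(row, "annotations.name")
print(names)
```

The point of projecting only the sub-field is that the other fields of each struct (here, `bbox`) never need to be read.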
What's Changed
- GH release automation improvement by @changhiskhan in #212
- Refactor benchmark and dataset generation by @changhiskhan in #205
- install fsspec and s3fs by @changhiskhan in #213
- Add ArrayFromJSON and TableFromJSON help functions for easy testing. by @eddyxu in #214
- [C++] Merge two schemas by @eddyxu in #217
- Support nested column projection by @eddyxu in #220
- Implement CountRows for fast path of countings by @eddyxu in #221
- Write schema metadata to Manifest by @eddyxu in #222
Full Changelog: v0.1.1...v0.1.2
v0.1.1
Fixed the Mac wheel to enable extension types on macOS.
What's Changed
- run quickstart notebook on every commit by @changhiskhan in #209
- update README.md to reflect new URI and bindings by @jaichopra in #211
- [Python] Exclude relocating packages in homebrew by @eddyxu in #210
New Contributors
- @jaichopra made their first contribution in #211
Full Changelog: v0.1.0...v0.1.1
v0.1.0
Highlights
- Documentation is now live and a Quickstart Notebook is available
- Lance is now integrated with PyTorch and supports multiple workers.
- Vision-specific extension types: Box2d provides vectorized IoU, and Image types make it easy to perform I/O and convert between bytes, PIL, NumPy, and tensors.
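For readers unfamiliar with IoU (intersection over union), the following is a minimal scalar sketch of the computation. It assumes boxes in [x0, y0, x1, y1] form and is not the Box2d API, which is vectorized over arrays of boxes; the `iou` function here is a hypothetical helper for illustration only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x0, y0, x1, y1]."""
    # Corners of the overlapping rectangle, if any.
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1).
print(iou([0, 0, 2, 2], [1, 1, 3, 3]))
```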
What's Changed
- Setting BatchSize via ScanBuilder by @eddyxu in #135
- Move Expression based schema project to Schema class by @eddyxu in #137
- Refactor I/O exec nodes by @eddyxu in #136
- Simplify RecordBatchReader to use Project.next() by @eddyxu in #139
- Convert bdd100k dataset in python benchmarks by @eddyxu in #131
- Fix the condition of Scan advancing batch id by @eddyxu in #143
- Initial PyTorch Dataset support by @eddyxu in #134
- Example training code over oxford pet dataset by @eddyxu in #144
- Test writing fixed size list and fixed size binary via WriteTable by @eddyxu in #151
- Fix fixed size length calculation by @eddyxu in #152
- Provide binary to profiling scans by @eddyxu in #149
- Multi-worker support in Pytorch Dataset by @eddyxu in #147
- Vision specific extension types by @changhiskhan in #146
- lance dataset that overrides Dataset.scanner and Dataset.head by @changhiskhan in #158
- Pickle Image by @changhiskhan in #160
- Only load manifest once within the dataset and share Manifest among the readers by @eddyxu in #155
- Improve ergonomic of the Pytorch dataset and Generate embeddings for oxford pet by @eddyxu in #157
- Fix PlainEncoder to read empty page by @eddyxu in #164
- Convert coco annotations from the list of structs to struct of lists by @eddyxu in #166
- Convert coco bounding box format to [x0,y0,x1,y1] format. by @eddyxu in #169
- Image Array by @changhiskhan in #168
- Fix writing and reading extension type by @eddyxu in #172
- Coco improvements by @changhiskhan in #174
- Support partitioning and group size control in coco dataset generation. by @eddyxu in #175
- Extension type improvements to support 3d types by @changhiskhan in #173
- Support converting PIL from Image in pytorch Dataset by @eddyxu in #176
- Minor fix for 3d extension types by @changhiskhan in #177
- MS coco dataset training by @eddyxu in #163
- Change version import to relative import by @eddyxu in #181
- [python] Mix of minor improvements by @changhiskhan in #182
- Automatically build document and publish to Github Pages by @eddyxu in #180
- [benchmarks] simplify the datagen code and remove partitioning for now by @changhiskhan in #183
- Fix PlainDecoder handle empty filtered array by @eddyxu in #187
- [python] minor improvements by @changhiskhan in #190
- Fix bug that attempts to partition on columns which do not exist in the file. by @eddyxu in #189
- Pass filter indices via Limit and Return empty array in GetListArray by @eddyxu in #191
- Exclude filter columns from projection by @eddyxu in #194
- action to bump version for new release by @changhiskhan in #199
- [C++] [BUG] Adjust offset when the batch size is set for reading by @eddyxu in #201
- GH action to upload wheels and also make reusable yml by @changhiskhan in #200
- [Python] Test projection in Python Torch Dataset by @eddyxu in #202
- Fix typo of calculating offsets for slicing index by @eddyxu in #206
- Changhiskhan/tutorial by @changhiskhan in #167
Full Changelog: v0.0.5...v0.1.0
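Two of the COCO changes listed for v0.1.0 (#166 and #169) describe data-layout conversions. The sketch below shows what such conversions look like in plain Python. It assumes the source boxes use the standard COCO [x, y, width, height] layout, and both function names are hypothetical; this is not the code from those pull requests.

```python
def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x0, y0, x1, y1]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def list_of_structs_to_struct_of_lists(records):
    """Turn [{"a": 1, "b": 2}, {"a": 3, "b": 4}] into {"a": [1, 3], "b": [2, 4]}.

    Assumes every record has the same keys as the first one.
    """
    if not records:
        return {}
    return {key: [record[key] for record in records] for key in records[0]}


# Hypothetical annotations for one image.
annotations = [
    {"name": "cat", "bbox": [10, 20, 30, 40]},
    {"name": "dog", "bbox": [0, 0, 5, 5]},
]

columns = list_of_structs_to_struct_of_lists(annotations)
columns["bbox"] = [xywh_to_xyxy(b) for b in columns["bbox"]]
print(columns)
```

A struct-of-lists layout stores each field contiguously, which suits a columnar format: reading only the names does not require touching the boxes.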