Releases · lancedb/lance

04 May 18:28

v0.4.5

2972ae2

v0.4.5 Preview private API for merging columns

Welcome @Mause as our newest contributor! Also, a big thank you for your work on the duckdb extension framework.

This release adds a preview of distributed column addition. It is now possible to distribute Lance Fragments across nodes, add a new column to each Fragment, and then write a new Lance dataset version manifest with the updated schema and files.
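
The flow described above can be sketched in plain Python. This is not the private Lance API: the fragment and manifest dictionaries, `add_column`, and `commit` are hypothetical stand-ins, and threads stand in for separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: a "fragment" maps column name -> list of values,
# and a "manifest" records the schema and fragments of one dataset version.
def add_column(fragment, name, fn):
    """Return a copy of the fragment with a new column computed row by row."""
    if name in fragment:
        raise ValueError(f"column {name!r} already exists")
    columns = list(fragment)
    rows = zip(*(fragment[c] for c in columns))
    new_values = [fn(dict(zip(columns, row))) for row in rows]
    return {**fragment, name: new_values}

def commit(manifest, new_fragments, new_column):
    """Build the next version, whose schema includes the added column."""
    return {
        "version": manifest["version"] + 1,
        "schema": manifest["schema"] + [new_column],
        "fragments": new_fragments,
    }

manifest = {
    "version": 1,
    "schema": ["id", "price"],
    "fragments": [
        {"id": [1, 2], "price": [10.0, 20.0]},
        {"id": [3], "price": [30.0]},
    ],
}

# Each fragment is processed independently; in a real deployment this step
# would run on different nodes rather than in a thread pool.
with ThreadPoolExecutor() as pool:
    updated = list(pool.map(
        lambda frag: add_column(frag, "price_doubled", lambda r: r["price"] * 2),
        manifest["fragments"],
    ))

new_manifest = commit(manifest, updated, "price_doubled")
```

The original manifest is left untouched, which mirrors the idea that the new column only becomes visible once the new version manifest is written.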

What's Changed

add support for aws profile by @Renkai in #807
Upgrade Arrow to 37 by @changhiskhan in #810
Schema intersection by @eddyxu in #814
Add a check to make sure field names don't contain periods by @changhiskhan in #816
fix(docs): correct link to docs.rs by @Mause in #819
update arrow version in duckdb extension by @changhiskhan in #817
Do not use lifetime on FileWriter by @eddyxu in #820
Setting field ID after merging the fields. by @eddyxu in #821
[Rust] Project schema by schema by @eddyxu in #822
Merge batches from multiple datafiles in the same Fragment by @eddyxu in #815
Update README.md by @jaichopra in #809
[Python] Provide a private / distributed add column api in Python by @eddyxu in #823

New Contributors

@Mause made their first contribution in #819

Full Changelog: v0.4.4...v0.4.5

Contributors

eddyxu, changhiskhan, and 3 other contributors

25 Apr 20:53

changhiskhan

v0.4.4

5c550e1

v0.4.4 Various bug fixes

#805 fixed an integer overflow bug in the plain decoder that resulted in high latency for Take (and consequently high latency for the vector search). We'll be adding continuous performance benchmarks soon to prevent issues like this from being released in the future.
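
The release note does not show the faulty code, so the following only illustrates the general class of bug: a byte offset computed in 32-bit arithmetic wraps around once the product exceeds the signed 32-bit range. The function names and the example sizes are assumptions made for illustration.

```python
def wrap_i32(value):
    """Simulate two's-complement wraparound of a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

def byte_offset_i32(row_index, byte_width):
    # Hypothetical buggy version: the product is truncated to 32 bits.
    return wrap_i32(row_index * byte_width)

def byte_offset_wide(row_index, byte_width):
    # Widened version: Python integers do not overflow, which here plays
    # the role of doing the multiplication in 64-bit arithmetic.
    return row_index * byte_width

# Assumed example: row 1,000,000 of a column whose values are 4096 bytes wide
# (for instance 1024-dimensional float32 vectors).
row, width = 1_000_000, 4096
narrow = byte_offset_i32(row, width)
wide = byte_offset_wide(row, width)
```

With these inputs the 32-bit result is negative while the widened result is the expected 4,096,000,000 bytes.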

We also fixed a gap in cosine similarity that appeared when the vectors do not line up perfectly with the SIMD stride on the platform.
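
As a rough illustration of the alignment issue, a kernel that works in fixed-width strides has to handle the leftover elements separately when the vector length is not a multiple of the stride. The sketch below is plain Python rather than the Rust SIMD code, and the stride of 8 lanes is an assumption.

```python
import math

LANES = 8  # assumed stride, e.g. eight float32 values per 256-bit register

def cosine_similarity(a, b, lanes=LANES):
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = norm_a = norm_b = 0.0
    full = len(a) - len(a) % lanes  # number of elements covered by whole strides

    # Main loop: one iteration per whole stride, as a SIMD kernel would do.
    for start in range(0, full, lanes):
        for x, y in zip(a[start:start + lanes], b[start:start + lanes]):
            dot += x * y
            norm_a += x * x
            norm_b += y * y

    # Tail: elements that do not fill a whole stride must not be dropped.
    for x, y in zip(a[full:], b[full:]):
        dot += x * y
        norm_a += x * x
        norm_b += y * y

    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / math.sqrt(norm_a * norm_b)
```

A vector of length 9 exercises both paths: one full stride of 8 plus a tail of 1.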

DiskANN progress is continuing. First milestone will be an in-memory version to support smaller datasets. A compressed, disk-based version will follow soon after that.

What's Changed

Fix L2 simd benchmark by @eddyxu in #793
bugfix for dataset overwrite method by @gsilvestrin in #794
[Rust] Minor SIMD benchmark fix set minimal CPU target for AVX2 by @eddyxu in #795
Persist simple diskann index by @eddyxu in #787
Fix offset overflow in plain decoder by @eddyxu in #805
Fix cosine similarity when missing simd alignment by @changhiskhan in #808

Full Changelog: v0.4.3...v0.4.4

Contributors

eddyxu, gsilvestrin, and changhiskhan

20 Apr 06:16

changhiskhan

v0.4.3

b5a7a68

v0.4.3 Bug fixes and code cleanup

What's Changed

[Rust] L2 distance on not aligned data by @eddyxu in #779
[Rust] Move L2 to linalg module by @eddyxu in #781
[Rust] Build DiskANN index by @eddyxu in #763
Refactor cosine distance into linalg module by @eddyxu in #786
google cloud storage fixes by @gsilvestrin in #782
Fix unaligned normalization bug on arm64 by @eddyxu in #789
Speed up vector index tests by reducing dataset size by @changhiskhan in #790

Full Changelog: v0.4.2...v0.4.3

Contributors

eddyxu, gsilvestrin, and changhiskhan

14 Apr 17:57

changhiskhan

v0.4.2

2adbb2f

v0.4.2 Polars, GCS, and distributed lances

A warm welcome to @hzhang86 as Lance's newest contributor. Thanks for adding TPCH benchmarks for Lance to establish a baseline. This really helps us focus our performance optimization roadmap.

This release is packed with valuable features:

Direct Polars scans are now supported, without needing to pull everything into memory.
FileFragments are now exposed so that distributed processing engines like Spark can easily access parts of a Lance dataset.
Last but not least, we've added support for reading Lance data directly from Google Cloud Storage (GCS) buckets.

What's Changed

[Rust] FileReader read range API by @eddyxu in #752
Support direct polars scan by @changhiskhan in #755
[Rust] Persist graph using lance file format. by @eddyxu in #756
Refactor PQ and OPQ training function to make it usable widely by @eddyxu in #758
Matrix::centroids method by @eddyxu in #759
[Python] Set minimal version of Polars for python tests by @eddyxu in #765
[Rust] Refactor RecordBatchStream trait by @eddyxu in #766
[Rust] Expose DataFragment as public dataset api. by @eddyxu in #769
Revert "[Python] Set minimal version of Polars for python tests (#765)" by @gsilvestrin in #770
add python script to compare lance performance vs parquet TPCH by @hzhang86 in #749
Expose index metadata by @changhiskhan in #768
Google Cloud Storage support. by @gsilvestrin in #773
[Python] Expose DataFragment via dataset by @eddyxu in #774
Get S3 credentials from_env by @changhiskhan in #775
Fix duckdb build by @eddyxu in #776
[Rust] A arrow kernel to compute hash value of the array. by @eddyxu in #777

New Contributors

@hzhang86 made their first contribution in #749

Full Changelog: v0.4.1...v0.4.2

Contributors

eddyxu, gsilvestrin, and 2 other contributors

05 Apr 21:30

changhiskhan

v0.4.1

ecc1d18

v0.4.1 Support Append in Vector Search

Vector search in Lance now supports live updates. Previously, adding new vectors to the dataset required rebuilding the index. Now the index is "inherited", and vector search results combine an ANN search on the indexed data with a KNN search on the newly appended data. This adds a small amount of latency, and recall should be the same or better.

This provides a smooth performance curve until you have inserted enough new data that re-indexing is warranted.
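
The combination step can be sketched as follows. This is a simplified illustration, not Lance's implementation: `toy_index_search` is a hypothetical stand-in for the ANN index lookup (it brute-forces a small dictionary), and the appended rows are scanned exhaustively before the two candidate lists are merged.

```python
import heapq

def l2(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(query, k, ann_index_search, appended_rows):
    """Merge index results with a brute-force scan of unindexed rows."""
    indexed_hits = ann_index_search(query, k)  # list of (distance, row_id)
    fresh_hits = [(l2(query, vec), row_id) for row_id, vec in appended_rows]
    return heapq.nsmallest(k, indexed_hits + fresh_hits)

# Toy data: rows 0..2 were present when the index was built, row 3 was appended later.
indexed_rows = {0: [0.0, 0.0], 1: [1.0, 1.0], 2: [5.0, 5.0]}
appended_rows = [(3, [0.9, 1.0])]

def toy_index_search(query, k):
    # Hypothetical stand-in for an ANN lookup over the indexed rows only.
    hits = [(l2(query, vec), row_id) for row_id, vec in indexed_rows.items()]
    return heapq.nsmallest(k, hits)

result = search([1.0, 1.0], 2, toy_index_search, appended_rows)
row_ids = [row_id for _, row_id in result]
```

The cost of the brute-force part grows with the number of appended rows, which is why re-indexing eventually becomes worthwhile.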

What's Changed

Adding secret to publish task by @gsilvestrin in #742
[Rust] make distance function to take slice instead of Float32Array by @eddyxu in #748
Vector search should support appending new rows by @changhiskhan in #593
windows lapack support by @gsilvestrin in #743
Fix LanceDataset.to_batches by @changhiskhan in #751

Full Changelog: v0.4.0...v0.4.1

Contributors

eddyxu, gsilvestrin, and changhiskhan

30 Mar 22:22

changhiskhan

v0.4.0

2922f54

v0.4.0 Windows support

A warm welcome to @gsajko! Thanks for making our tutorial notebook easier to use and understand!

Note: OPQ is disabled on Windows for the vector index. This will be addressed once LAPACK support is added.

What's Changed

small fixes by @gsajko in #725
Windows support by @gsilvestrin in #724

New Contributors

@gsajko made their first contribution in #725

Full Changelog: v0.3.19...v0.4.0

Contributors

gsilvestrin and gsajko

27 Mar 17:58

changhiskhan

v0.3.19

8aa5345

v0.3.19 Bug fix for filter predicates on large-utf8 type

Also fix publishing to crates.io

What's Changed

Make contract clear for KNN nodes by @eddyxu in #729
Refactor Scan I/O plan by @eddyxu in #731
[Rust] Use forked sqlparser to unblock rust crate release by @eddyxu in #732
[Rust] Fix filter on large UTF8 columns by @eddyxu in #733

Full Changelog: v0.3.18...v0.3.19

Contributors

eddyxu

24 Mar 07:45

changhiskhan

v0.3.18

369850d

v0.3.18 Bug fix release for binary offsets

Fix for incorrect offset for string/variable list columns as reported in #720 (comment)

Thanks @lucazanna for the feedback!

What's Changed

Train OPQ and write rotation matrix to index file by @eddyxu in #713
removing warnings by @gsilvestrin in #721
[Bug] Fix IVF merge sort when refine factor is presented. by @eddyxu in #722
Add input / output schema contract to Global Take by @eddyxu in #728
Fix offsets for Binary/Lists/LargeLists by @gsilvestrin in #727

Full Changelog: v0.3.17...v0.3.18

Contributors

eddyxu, gsilvestrin, and lucazanna

22 Mar 02:05

changhiskhan

v0.3.17

ec02352

v0.3.17 Support for nested dict columns

A warm welcome to @haoxins, a new contributor who has helped improve the Lance documentation.

This release adds support for list-of-dict columns (thanks @lucazanna for reporting the bug in #715).

Also included in this release are various vector index improvements for scalability and more progress towards OPQ implementation.

What's Changed

docs: fix the links by @haoxins in #701
repair macos build for duckdb extension by @changhiskhan in #705
filter evaluation with flat search by @changhiskhan in #704
fix flaky test by @changhiskhan in #706
[Bug] Fix transpose in MatrixView.data() by @eddyxu in #711
Refactored variable length encoders by @gsilvestrin in #710
add notebook for q&a bot by @changhiskhan in #707
Allow iteratively train PQ by @eddyxu in #712
Use relative eq and fix a compiling warning by @eddyxu in #714
docs: fix the mod path by @haoxins in #718
Composable vector search pipeline by @eddyxu in #716
Fix CI failure by increasing epsilon for test_train_pq_iteratively by @eddyxu in #719
Implement support for list of Dictionaries by @gsilvestrin in #664

New Contributors

@haoxins made their first contribution in #701

Full Changelog: v0.3.16...v0.3.17

Contributors

eddyxu, gsilvestrin, and 3 other contributors

18 Mar 06:48

changhiskhan

v0.3.16

27d36e8

v0.3.16 Filter pushdown improvements

Welcome @wangfenjin to the Lance contributors. Thanks for submitting a bug fix for the Lance DuckDB extensions 🔥

This release contains 2 workarounds for Arrow limitations:

Lance datasets now accept <field> LIKE '%' and <field> IN (<values>) filters passed in as strings. Generic SQL syntax supported by DataFusion is now accepted. This is a break from standard pyarrow Dataset behavior, which only accepts Arrow compute Expressions. Those are not available in Rust and do not support introspection in Python, so developers cannot build a custom adapter on top of them.
When Arrow dictionary arrays are concatenated, the dictionary values are duplicated. There are currently no concrete plans to change this behavior in Arrow, so Lance fixes it at write time instead.
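
To make the second workaround concrete, here is a minimal sketch of merging dictionary-encoded chunks without duplicating values. It uses plain Python lists rather than Arrow arrays, and `unify_dictionaries` is a hypothetical helper, not the function Lance uses.

```python
def unify_dictionaries(chunks):
    """Merge dictionary-encoded chunks into one dictionary with unique values.

    chunks: list of (values, indices) pairs, where each index points into
    that chunk's own values list and None marks a null entry.
    """
    unified_values = []
    position = {}  # value -> its index in unified_values
    unified_indices = []
    for values, indices in chunks:
        # Map this chunk's local dictionary slots to slots in the unified one.
        remap = []
        for value in values:
            if value not in position:
                position[value] = len(unified_values)
                unified_values.append(value)
            remap.append(position[value])
        unified_indices.extend(None if i is None else remap[i] for i in indices)
    return unified_values, unified_indices

chunks = [
    (["a", "b"], [0, 1, 0]),
    (["b", "c"], [1, 0, None]),
]
values, indices = unify_dictionaries(chunks)
```

A naive concatenation of the two dictionaries above would store "b" twice; the unified dictionary stores it once and rewrites the indices of the second chunk accordingly.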

What's Changed

Changed encoders to handle multiple Arrays by @gsilvestrin in #681
Train kmeans iteratively by @eddyxu in #688
Changed writers to handle multiple Arrays by @gsilvestrin in #691
Streaming PQ by @eddyxu in #689
[Bug] PQ training generates empty centroids by @eddyxu in #693
Allow append mode even if dataset doesn't already exist by @ananis25 in #690
Support "LIKE" and "IN" in filters by @eddyxu in #696
fix typo by @wangfenjin in #697
Improve indexing performance by @eddyxu in #699
Compute PQ distortion. by @eddyxu in #695
Bugfix for BinaryEncoder positions by @gsilvestrin in #698

New Contributors

@wangfenjin made their first contribution in #697

Full Changelog: v0.3.15...v0.3.16

Contributors

eddyxu, gsilvestrin, and 2 other contributors
