v0.20.0
What's Changed
Breaking Changes 🛠
- feat!: allow passing down existing dataset for write by @wjones127 in #3119
- fix!: low recall with cosine/dot on v3 index types by @BubbleCal in #3141
New Features 🎉
- feat: start recording index details in the manifest, cache index type lookup by @westonpace in #3131
- feat: make dataset version serializable by @albertlockett in #3143
- feat: support 4bit PQ on new IVF_PQ by @BubbleCal in #3144
- feat: add commit_batch API by @wjones127 in #3142
- feat: allow async stream for writing and appending to a dataset by @HoKim98 in #3146
- feat: add dictionary encoding by @broccoliSpicy in #3134
- feat(rust): make JSON serialization of DataType and Field public by @wjones127 in #3161
- feat: expose the table provider by @westonpace in #3162
- feat: support write multi fragments or empty fragment in one spark task by @SaintBacchus in #3183
- feat: add drop to dataset by @chenkovsky in #3184
- feat: upgrade arrow (to 53) & datafusion (to 42) by @westonpace in #3201
Bug Fixes 🐛
- fix: fix error about schema is not writable pd to pa by @Jay-ju in #3109
- fix: handle filter on empty partition by @eddyxu in #3151
- fix: fix dynamodb drop table by @LuQQiu in #3152
- fix: full text search index broken after optimize_indices() by @BubbleCal in #3145
- fix: fix performance regression introduced during reader refactor by @westonpace in #3170
- fix: panic if all docs are deleted in a posting list by @BubbleCal in #3163
- fix: full text search may produce dup results when search over multiple columns by @BubbleCal in #3189
- fix: fix typing for _write_fragment by @chenkovsky in #3171
- fix: fix storage options for dataset builder by @chenkovsky in #3156
- fix: fix storage options for ray by @chenkovsky in #3164
Performance Improvements 🚀
- perf: optimize reading transactions in commit loop by @wjones127 in #3117
- perf: improve PQ computing distances by @BubbleCal in #3150
- perf: improve constructing dist table by @BubbleCal in #3155
- perf: improve dot distance computing by @BubbleCal in #3169
Other Changes
- refactor: remove the queue in LanceArrowWriter to reduce memory usage for spark sink by @SaintBacchus in #3110
New Contributors
- @Jay-ju made their first contribution in #3109
- @chenkovsky made their first contribution in #3171
- @imotai made their first contribution in #3078
- @yanghua made their first contribution in #3193
Full Changelog: v0.19.2...v0.20.0