Releases · lnx-search/lnx
🐛 0.5.1 - Fix sorting
This is a simple bug fix for date field sorting.
🚀 0.5.0 - QOL document handling changes
This update is a breaking change from 0.4.0 in how documents are removed.
What's new
- Multi-value and single-value inserts are now supported, so payloads like `{"title": "foo"}` no longer need array wrapping (see the sketch after this list).
- Deleting documents now expects single-value entries rather than multi-value ones. This doesn't change behaviour in practice, since the old version only ever used the first value anyway, which led to some confusing behaviour.
- Lax values are now supported, so `date` fields can be given as any of an i64 timestamp, a u64 timestamp, or an RFC 3339 formatted string.
- If a field's value is incompatible with the schema and cannot be converted, an error is now returned instead of being logged while the API still responds 200 OK.
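A minimal sketch of a 0.5.0 document upload under these rules. The field names, index name, and endpoint path are illustrative assumptions rather than taken from the release notes:

```python
import requests

# Single values no longer need array wrapping, multi-value fields still work,
# and date fields accept i64/u64 timestamps or RFC 3339 strings.
doc = {
    "title": "foo",                               # single-value insert
    "tags": ["action", "sci-fi"],                 # multi-value insert
    "release_date": "2021-09-01T00:00:00+00:00",  # RFC 3339; a unix timestamp also works
}

# The endpoint path is an assumption based on lnx's REST-style API.
resp = requests.post("http://localhost:8000/indexes/movies/documents", json=doc)
resp.raise_for_status()  # 0.5.0 returns an error for schema-incompatible values
```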
What's fixed
- Date fields are now correctly handled on upload.
- The writer-actor no longer panics when a document field's type differs from the schema-defined type.
🐛 0.4.1 - Schema behaviour fix
This is a small fix that prevents unexpected behaviour when fast-fuzzy is disabled on the server but an index attempts to enable it.
Fixed
- This fixes situations like #14.
🚀 0.4.0 - Fast-Fuzzy update
0.4 is out! This brings with it a massive set of performance improvements and relevancy options to tune to your liking 😄
What's new
- Fast-Fuzzy: a hyper-optimised mode for search-as-you-type experiences. It uses pre-computed spell-correction for high-speed correction, improving performance by roughly 10x (both throughput and latency). This is an opt-in feature: start the server with `--enable-fast-fuzzy`, then set `"use_fast_fuzzy": true` in the index creation payload (see the sketch after this list).
- Stop words: introduced to improve search relevancy. Previously the system could match 17,000 results out of 20,000 simply because the query included words like "the". Now, if the query contains more than one word and the words are not all stop words, any stop words are removed from the query. (This can be toggled per index via `strip_stop_words`, which defaults to `false`.)
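A sketch of an index creation payload using both new options. Only `use_fast_fuzzy`, `strip_stop_words`, and the `--enable-fast-fuzzy` flag come from the release notes; the endpoint path and remaining fields are assumptions:

```python
import requests

# Assumes the server was started with `--enable-fast-fuzzy`.
index = {
    "name": "movies",
    "use_fast_fuzzy": True,    # opt in to the pre-computed spell-correction mode
    "strip_stop_words": True,  # defaults to false
    # ...plus the rest of your schema and storage options...
}

resp = requests.post("http://localhost:8000/indexes", json=index)
resp.raise_for_status()
```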
Breaking behaviour
- Fast-fuzzy and more-like-this queries now have very different performance characteristics, so some common results you might have used for testing will no longer be valid.
- Memory usage is much higher when fast-fuzzy is enabled.
Notes on relevancy
- The fast-fuzzy system's relevancy is almost on par with the current default (Levenshtein distance), and in places a little better, especially in non-English languages.
Details for nerds 🤓
- We used the SymSpell algorithm with pre-computed frequency dictionaries to do spell correction instead of Levenshtein distance; it corrects entire sentences in the time the traditional method takes for a single word. (A toy sketch of the idea follows this list.)
- The frequency dictionaries are built from traditional word dictionaries merged with the Google n-gram corpus, which gives us frequency dictionaries of correctly spelt words.
- The jump in performance is roughly from 400 searches a second to 4,000 searches a second (measured on the small movies dataset; a larger dataset of around 2 million documents showed a similar improvement).
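To make that concrete, here is a toy sketch of the delete-based lookup idea behind SymSpell. It is not lnx's actual implementation, and the words and frequencies are made up:

```python
from collections import defaultdict
from itertools import combinations

def deletes(word: str, max_distance: int = 1) -> set[str]:
    """All strings reachable from `word` by removing up to `max_distance` characters."""
    out = {word}
    for d in range(1, max_distance + 1):
        for idxs in combinations(range(len(word)), d):
            out.add("".join(c for i, c in enumerate(word) if i not in idxs))
    return out

# Toy frequency dictionary: word -> corpus frequency.
frequencies = {"apple": 50, "apply": 30, "ample": 10}

# Index every dictionary word by its delete-variants, once, up front.
index = defaultdict(set)
for word in frequencies:
    for variant in deletes(word):
        index[variant].add(word)

def correct(term: str) -> str | None:
    """Correct `term` via shared delete-variants instead of scanning the dictionary."""
    candidates: set[str] = set()
    for variant in deletes(term):
        candidates |= index.get(variant, set())
    return max(candidates, key=frequencies.__getitem__, default=None)

print(correct("aple"))  # -> "apple" (shares a delete-variant and is most frequent)
```

Because the variant index is built once up front, each lookup is a handful of hash probes rather than an edit-distance pass over the whole dictionary, which is where the speed-up comes from.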
🖖 Experimental Release
This is the first experimental release of 0.4; it includes a couple of breaking changes and some new features.
WARNING: This is an experimental build and should not be relied upon for production systems.
What's new
- Fast-Fuzzy: a hyper-optimised mode for search-as-you-type experiences. It uses pre-computed spell-correction for high-speed correction, improving performance by roughly 10x (both throughput and latency). (This is an opt-in feature: start the server with `--enable-fast-fuzzy`, then set `"use_fast_fuzzy": true` in the index creation payload.)
- Stop words: introduced to improve search relevancy. Previously the system could match 17,000 results out of 20,000 simply because the query included words like "the". Now, if the query contains more than one word and the words are not all stop words, any stop words are removed from the query. (This is currently not configurable.)
Breaking behaviour
- Fast-fuzzy and more-like-this queries now have very different performance characteristics, so some common results you might have used for testing will no longer be valid.
- Memory usage is much higher when fast-fuzzy is enabled.
Notes on relevancy
- The fast-fuzzy system's relevancy is almost on par with the current default (Levenshtein distance), and in places a little better, especially in non-English languages.
Details for nerds 🤓
- We used the SymSpell algorithm with pre-computed frequency dictionaries to do spell correction instead of Levenshtein distance; it corrects entire sentences in the time the traditional method takes for a single word.
- The frequency dictionaries are built from traditional word dictionaries merged with the Google n-gram corpus, which gives us frequency dictionaries of correctly spelt words.
- The jump in performance is roughly from 400 searches a second to 4,000 searches a second (measured on the small movies dataset; a larger dataset of around 2 million documents showed a similar improvement).
Version 0.3.0
What's changed
- Getting a document directly has changed from an `int-int` format to a single u64 integer, returned as a string for compatibility with languages where parsing may overflow (e.g. JavaScript). The same change applies anywhere else you supply a document id.
- Searching via `mode=more-like-this` now expects a query parameter called `document` (previously `ref_document`) containing a document id.
- Returned results now contain a `document_id` instead of a `ref_address`, to make its purpose easier to understand.
- A specialised field name, `_id`, has been added; if you define it in your schema, the system will ignore it and add its own. (See the sketch after this list.)
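A sketch of the renamed parameters in use. `mode`, `document`, and `document_id` come from the release notes; the endpoint paths, index name, port, and response shape are assumptions:

```python
import requests

BASE = "http://localhost:8000"  # assumed address; adjust to your deployment

# Search results now carry a `document_id` (a u64 serialised as a string so
# JavaScript clients don't overflow while parsing it).
results = requests.get(f"{BASE}/indexes/movies/search", params={"query": "hero"}).json()
doc_id = results["data"]["hits"][0]["document_id"]  # response shape is an assumption

# More-like-this searches now take the id via `document` (previously `ref_document`).
similar = requests.get(
    f"{BASE}/indexes/movies/search",
    params={"mode": "more-like-this", "document": doc_id},
).json()
```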
What's been fixed
- Searching by document / more-like-this queries now work; previously this would panic if the searcher was not the original searcher that retrieved the document.
- Getting a document directly no longer panics.
🚀 First Release
The first release of lnx!
This release includes:
- Standard Queries, Fuzzy Queries, More-like-this queries.
- Token-based authorization.
- TLS support.
- Multiple storage backend choices.
- Order-by sorting.
Beta Build
This is the base implementation of lnx; things are subject to change, but it serves as a good baseline.