
Releases: lnx-search/lnx

🐛 0.5.1 - Fix sorting

31 Aug 19:18

This is a simple bug fix for the date field sorting.

🚀 0.5.0 - QOL document handling changes

28 Aug 13:35
0d16c72

This update introduces a breaking change from 0.4.0 in how documents are removed.

What's new

  • Single-value inserts are now supported alongside multi-value inserts, so payloads like {"title": "foo"} now work.
  • Deleting documents now expects single-value entries rather than multi-value ones. In practice this doesn't change behaviour, since the old version only ever used the first value, which led to some confusing behaviour.
  • Lax values are now supported: date fields, for example, accept an i64 timestamp, a u64 timestamp, or an RFC 3339 formatted string.
  • If a field's value is incompatible with the schema and cannot be converted, an error is now returned instead of logging the error but still responding 200 OK.
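The document-handling changes above can be sketched as payload shapes. This is an illustrative sketch: the field names and values are assumptions; only the single-vs-multi-value behaviour, lax date parsing, and single-value deletes come from the release notes.

```python
import json

# Single-value and multi-value inserts are both accepted now:
single_value_doc = {"title": "foo"}
multi_value_doc = {"title": ["foo", "bar"]}

# Date fields are lax; any of these representations should be accepted
# (field name "release_date" is hypothetical):
date_variants = [
    {"release_date": 1630152000},              # i64/u64 unix timestamp
    {"release_date": "2021-08-28T13:35:00Z"},  # RFC 3339 string
]

# Deletes now expect single-value entries, e.g. delete by one field value:
delete_payload = {"title": "foo"}

print(json.dumps(single_value_doc))
```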

What's fixed

  • Date fields are now correctly handled on upload.
  • The writer-actor no longer panics when a document contains a value whose type differs from the type defined in the schema.

🐛 0.4.1 - Schema behaviour fix

27 Aug 20:50
de23872

This is a small fix that prevents unexpected behaviour when fast-fuzzy is disabled on the server but an index attempts to enable it.

Fixed

  • This fixes situations like #14.

🚀 0.4.0 - Fast-Fuzzy update

26 Aug 12:33

0.4 is out! This brings with it a massive set of performance improvements and relevancy options to tune to your liking 😄

What's new

  • Fast-Fuzzy: A hyper-optimised mode for search-as-you-type experiences. It uses pre-computed spell-correction for high-speed correction, improving performance by about 10x in both throughput and latency. This is an opt-in feature: start the server with --enable-fast-fuzzy, then set "use_fast_fuzzy": true in the index creation payload.
  • Stop words: Introduced to improve search relevancy. Previously the system could match 17,000 results out of 20,000 simply because the query included words like "the". Now, if the query contains more than one word and the words are not all stop words, any stop words are removed from the query. (This can be toggled per index via strip_stop_words, which defaults to false.)
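A sketch of an index-creation payload using the two new options above. Only "use_fast_fuzzy" and "strip_stop_words" come from these notes; the index name and any other fields you would normally send are assumptions.

```python
import json

index_payload = {
    "name": "movies",            # hypothetical index name
    "use_fast_fuzzy": True,      # requires the server flag --enable-fast-fuzzy
    "strip_stop_words": True,    # per-index toggle; defaults to false when omitted
}

print(json.dumps(index_payload, indent=2))
```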

Breaking behaviour

  • Both fast-fuzzy and more-like-this queries now have very different performance characteristics, so some common results you may have relied on for testing will no longer be valid.
  • Memory usage is much higher when fast-fuzzy is enabled.

Notes on relevancy

  • The fast-fuzzy system's relevancy is almost at the same level as the current default (Levenshtein distance), if not a little better in places, especially for non-English languages.

Details for nerds 🤓

  • We use the SymSpell algorithm along with pre-computed frequency dictionaries for spell correction instead of Levenshtein distance; it corrects entire sentences in the time the traditional method takes to correct one word.
  • The frequency dictionaries are built by merging traditional word dictionaries with the Google n-gram corpus, giving us correctly spelt frequency dictionaries.
  • Performance jumps from roughly 400 searches per second to 4,000 (measured on the small movies dataset; a larger dataset of around 2 million documents showed a similar improvement).
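To illustrate the SymSpell idea, here is a minimal, self-contained sketch of the delete-only candidate-generation trick — not lnx's implementation, and limited to edit distance 1 on each side: pre-compute every single-character deletion of each dictionary word, then correct an input by matching its deletions against that index and picking the most frequent candidate.

```python
from collections import defaultdict

def deletes(word: str) -> set:
    """All strings obtainable by deleting exactly one character."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}

def build_index(frequency_dict: dict) -> dict:
    """Map each word and each of its deletions to candidate dictionary words."""
    index = defaultdict(list)
    for word in frequency_dict:
        index[word].append(word)
        for d in deletes(word):
            index[d].append(word)
    return index

def correct(word: str, index: dict, frequency_dict: dict) -> str:
    # Candidates: exact/deletion matches for the input and its own deletions.
    candidates = set(index.get(word, []))
    for d in deletes(word):
        candidates.update(index.get(d, []))
    if not candidates:
        return word  # nothing close enough; leave the word unchanged
    # A frequency dictionary lets us rank candidates by how common they are.
    return max(candidates, key=lambda w: frequency_dict.get(w, 0))

freqs = {"search": 1000, "the": 5000, "movie": 800}
index = build_index(freqs)
print(correct("serch", index, freqs))  # -> search
```

The speed comes from lookups being pure hash-table hits: no per-word edit-distance computation against the whole dictionary is needed at query time.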

🖖 Experimental Release

25 Aug 23:19
Pre-release

This is the first experimental release of 0.4; it includes a couple of breaking changes and some new features.

WARNING: This is an experimental build, and should not be relied upon for production systems

What's new

  • Fast-Fuzzy: A hyper-optimised mode for search-as-you-type experiences. It uses pre-computed spell-correction for high-speed correction, improving performance by about 10x in both throughput and latency. (This is an opt-in feature: start the server with --enable-fast-fuzzy, then set "use_fast_fuzzy": true in the index creation payload.)
  • Stop words: Introduced to improve search relevancy. Previously the system could match 17,000 results out of 20,000 simply because the query included words like "the". Now, if the query contains more than one word and the words are not all stop words, any stop words are removed from the query. (This is currently not changeable.)

Breaking behaviour

  • Both fast-fuzzy and more-like-this queries now have very different performance characteristics, so some common results you may have relied on for testing will no longer be valid.
  • Memory usage is much higher when fast-fuzzy is enabled.

Notes on relevancy

  • The fast-fuzzy system's relevancy is almost at the same level as the current default (Levenshtein distance), if not a little better in places, especially for non-English languages.

Details for nerds 🤓

  • We use the SymSpell algorithm along with pre-computed frequency dictionaries for spell correction instead of Levenshtein distance; it corrects entire sentences in the time the traditional method takes to correct one word.
  • The frequency dictionaries are built by merging traditional word dictionaries with the Google n-gram corpus, giving us correctly spelt frequency dictionaries.
  • Performance jumps from roughly 400 searches per second to 4,000 (measured on the small movies dataset; a larger dataset of around 2 million documents showed a similar improvement).

Version 0.3.0

20 Aug 14:49

What's changed

  • Getting a document directly has changed from an int-int format to a single u64 integer, returned as a string for compatibility with languages (such as JavaScript) where parsing may overflow. This change is also reflected anywhere else you supply a document id.
  • Searching via mode=more-like-this now expects a query parameter named document (previously ref_document) containing a document id.
  • Results now return a document_id instead of a ref_address, making its purpose easier to understand.
  • A specialised field name, _id, has been added; if you define this field in your schema, the system will ignore your definition and add its own.
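A small sketch of the 0.3.0 changes above. The exact request shapes are assumptions; only the string-encoded document id and the ref_document → document rename come from these notes.

```python
# Document ids are now a single u64, returned as a string so that
# languages like JavaScript can parse results without overflowing.
result = {"document_id": "9223372036854775999"}  # illustrative value > i64 max
doc_id = int(result["document_id"])              # safe to parse in Python

# more-like-this queries now take `document` (previously `ref_document`):
query_params = {"mode": "more-like-this", "document": result["document_id"]}

print(query_params["document"])
```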

What's been fixed

  • Searching by document / more-like-this queries now works; previously this would panic if the searcher was not the original searcher that retrieved the document.
  • Getting a document directly no longer panics.

🚀 First Release

18 Aug 21:39
9643294

The first release of lnx!

This release includes:

  • Standard Queries, Fuzzy Queries, More-like-this queries.
  • Token-based authorization.
  • TLS support.
  • Multiple storage backend choices.
  • Order-by sorting.

Beta Build

18 Aug 08:57
18064ec

This is the base implementation of lnx. Things are subject to change, but this serves as a good baseline.