0.6.8 (2023-12-14)
- Produce builds for Python 3.12 (#236)
- Add a simple configuration API
- Add surface projections (#230)
  - For chiTra compatibility, SudachiPy can now directly produce different tokens in the surface field.
  - The original surface is accessible via the `Morpheme.raw_surface()` method.
  - Projection can be customized per dictionary (via a Config object passed on dictionary creation) or for a single pre-tokenizer.
0.6.7 (2023-02-16)
- Provide binary wheels for Python 3.11
- Add `Dictionary.lookup()` method which allows you to enumerate morphemes from the dictionary without performing analysis.
0.6.6 (2022-07-25)
- Add boundary matching mode to regex oov handler
- macOS binary builds are now universal2 (arm+x64)
  - Caveat: we don't run tests on arm because there are no public arm instances, so builds may be broken without any warning
0.6.5 (2022-06-21)
- Fixed invalid POS tags which appeared when using user-defined POS tags both in user dictionaries and OOV handlers. You are not affected by this bug if you did not use user-defined POS in OOV handlers.
0.6.4 (2022-06-16)
- Remove Python 3.6 support which reached end-of-life status on 2021-12-23
- OOV handler plugins support user-defined POS, similar to Java version
- Added Regex OOV handler
  - For details, see the Java version changelog
  - In Rust/Python, regexes do not support backtracking and backreferences
  - `maxLength` setting defines maximum length in unicode codepoints, not in utf-8 bytes as in Java (will be changed to codepoints later)
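
The difference between the two length units mentioned for `maxLength` can be seen with a small standalone sketch. It does not use SudachiPy at all; the helper names, including `within_max_length`, are hypothetical and only illustrate how the same text measures differently in codepoints and in UTF-8 bytes.

```python
def length_in_codepoints(text: str) -> int:
    # Python strings are sequences of Unicode codepoints.
    return len(text)


def length_in_utf8_bytes(text: str) -> int:
    # Encoding to UTF-8 gives the byte-based length.
    return len(text.encode("utf-8"))


def within_max_length(text: str, max_length: int) -> bool:
    # Hypothetical check (an assumption, not the handler's code):
    # accept text whose codepoint count fits the limit.
    return length_in_codepoints(text) <= max_length


sample = "東京タワー"
print(length_in_codepoints(sample))  # 5
print(length_in_utf8_bytes(sample))  # 15
```

A limit of 5 therefore admits this sample when counted in codepoints, but would reject it if the same number were interpreted as UTF-8 bytes.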
0.6.3 (2022-02-10)
- Fixed path resolution algorithm for resources. They are now resolved in the following order (first existing file wins):
  - Absolute paths stay as they are
  - Relative to the "path" value of the config file
  - Relative to the "resource_dir" parameter of the config object during creation
    - For SudachiPy it is the parameter of the `Dictionary` constructor
  - Relative to the location of the configuration file
  - Relative to the current directory
- `Dictionary` now has a `__repr__()` function which displays absolute paths to dictionaries in use.
- `Dictionary` now has a `pos_of()` function which returns a POS tuple for a given POS id.
- `PosMatcher` supports set operations
  - union (`m1 | m2`)
  - intersection (`m1 & m2`)
  - difference (`m1 - m2`)
  - negation (`~m1`)
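
The resource resolution order listed in this release can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Sudachi.rs code: the function `resolve_resource` and its parameter names are hypothetical, and returning `None` when nothing is found is a choice made for the sketch.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_resource(
    name: PathLike,
    config_path: Optional[PathLike] = None,   # "path" value of the config file
    resource_dir: Optional[PathLike] = None,  # "resource_dir" parameter
    config_file: Optional[PathLike] = None,   # location of the config file
    cwd: Optional[PathLike] = None,           # current directory override
) -> Optional[Path]:
    candidate = Path(name)
    if candidate.is_absolute():
        return candidate  # absolute paths stay as they are
    bases = [
        config_path,
        resource_dir,
        Path(config_file).parent if config_file is not None else None,
        cwd if cwd is not None else Path.cwd(),
    ]
    for base in bases:
        if base is None:
            continue
        full = Path(base) / candidate
        if full.exists():
            return full  # first existing file wins
    return None  # assumption: report "not found" as None


# Demo: the file exists only under "b", so the second base wins.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a").mkdir()
    (root / "b").mkdir()
    (root / "b" / "char.def").write_text("demo")
    found = resolve_resource(
        "char.def", config_path=root / "a", resource_dir=root / "b", cwd=root
    )
    print(found.parent.name)  # b
```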
0.6.2 (2021-12-09)
- Fix analysis differences with 0.5.4
0.6.1 (2021-12-08)
- Added fuzzing (see the `sudachi-fuzz` subdirectory). Sudachi.rs seems to be pretty robust towards arbitrary inputs (no crashes or panics)
  - Issues like WorksApplications#182 should not occur any more
- ~5% analysis speed improvement over 0.6.0
- Added support for Unicode combining symbols. Sudachi.rs/py should now be much better with emoji (🎅🏾) and more complex Unicode (İstanbul)
- Added partial dictionary read functionality: it is now possible to skip reading certain fields if they are not needed
- Improved startup times, especially for debug builds
- See Python changelog
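
The two examples given for combining symbols are useful because both expand into more than one codepoint. The standalone sketch below uses only Python's standard library and does not show how Sudachi.rs handles them internally. The helper names are hypothetical.

```python
import unicodedata
from typing import List


def codepoints(text: str) -> List[str]:
    # List every codepoint in the text as U+XXXX.
    return [f"U+{ord(ch):04X}" for ch in text]


def combining_marks(text: str) -> List[str]:
    # Keep only characters with a non-zero canonical combining class.
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.combining(ch) != 0]


# Emoji with a skin tone modifier: one visible symbol, two codepoints.
santa = "\U0001F385\U0001F3FE"
print(codepoints(santa))  # ['U+1F385', 'U+1F3FE']

# Lowercasing "İstanbul" turns U+0130 into "i" plus a combining dot above.
lowered = "\u0130stanbul".lower()
print(len("\u0130stanbul"), len(lowered))  # 8 9
print(combining_marks(lowered))  # ['U+0307']
```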
0.6.0 (2021-11-11)
- Full feature parity with Java version
- ~15% analysis speed improvement over 0.6.0-rc1
- Added dictionary build functionality
- Added an option to perform analysis without sentence splitting
  - Use it with `--split-sentences=no`
- Added bindings for dictionary build (undocumented and not supported as API).
  - `sudachipy build` and `sudachipy ubuild` should work once more
  - Report on build times and dictionary part sizes can differ from the original SudachiPy
0.6.0-rc1 (2021-10-26)
- First release of Sudachi.rs
- SudachiPy compatible Python bindings
- ~30x speed improvement over original SudachiPy
- Dictionary build mode will be done before 0.6.0 final (See #13)
- Analysis: feature parity with Python and Java version
- Dictionary build is not supported in rc1
- ~2x faster than Java version (with sentence splitting)
- No public API at the moment (contact us if you want to use Rust version directly, internals will significantly change and names are not finalized)
- Mostly compatible with SudachiPy 0.5.4
- We provide binary wheels for popular platforms
- ~30x faster than 0.5.4
- IgnoreYomigana input text plugin is now supported (and enabled by default)
- We provide binary wheels for convenience (and additional speed on Linux)
- List of deprecated SudachiPy API:
  - `MorphemeList.empty(dict: Dictionary)`
    - This also needs a dictionary as an argument.
  - `Morpheme.split(mode: SplitMode)`
  - `Morpheme.get_word_info()`
- Most instance attributes are not exported, e.g. `Dictionary.grammar`, `Dictionary.lexicon`.
  - See API reference page for supported APIs.
- Dictionary build is not supported: `sudachipy build` and `sudachipy ubuild` will not work. Please use 0.5.3 in another virtual environment until the feature is implemented: #13