Releases: EricLBuehler/mistral.rs
v0.1.6
What's Changed
- Causal Masking and model selection from `.toml` files by @EricLBuehler in #278
- Remove sliding window mask from quantized phi3 by @EricLBuehler in #280
- Fix Causal Mask by @EricLBuehler in #282
- Fix mask caching by @EricLBuehler in #283
- More intelligent scheduler by @EricLBuehler in #279
- Use `warn!` macro by @EricLBuehler in #289
- Use a public repo for tests tokenizer.json by @EricLBuehler in #290
- Implement Speculative Decoding by @EricLBuehler in #242
- Add X-LoRA support for GGUF by @EricLBuehler in #293
- Add some "senseful" fallbacks for `isq` by @LLukas22 in #272
- Implement dynamic LoRA swapping by @EricLBuehler in #262
- More verbose logging when loading locally by @EricLBuehler in #298
- Make speculative decoding faster without anything fancy by @EricLBuehler in #297
- fix bug with mistralrs cuda by @joshpopelka20 in #299
New Contributors
- @joshpopelka20 made their first contribution in #299
New Features
- Speculative decoding introduced
- GGUF support for Phi 3
- Dynamic LoRA adapter activation support
Full Changelog: v0.1.5...v0.1.6
v0.1.5
What's Changed
- Warmup pass for mistralrs-bench by @EricLBuehler in #270
- Fix short param conflict for LoRA by @EricLBuehler in #271
- Add build.rs to PyO3 to improve compat when extension_module by @EricLBuehler in #274
- Add the quantized phi3 model by @EricLBuehler in #276
Full Changelog: v0.1.4...v0.1.5
v0.1.4
What's Changed
- Major pipeline refactor by @EricLBuehler in #261
- docs: update README.md by @eltociear in #264
- Support EOF in interactive mode by @EricLBuehler in #267
- Fix concat in PhiRotaryEmbedding by @EricLBuehler in #268
- More organized config printing by @EricLBuehler in #269
Full Changelog: v0.1.3...v0.1.4
v0.1.3
What's Changed
- Add automatic pypi upload and docker build on release by @EricLBuehler in #255
- Update PyO3 to take dict by @EricLBuehler in #257
Full Changelog: v0.1.2...v0.1.3
v0.1.2
New features
- Initial `async` integrations (#198, #236) thanks to @lucasavila00.
- More flexibility with `bos` and `eos` tokens (#248)
- Intermediate loading for ISQ models on CPU (#229)
- Fixed Phi 3 128k finally, it is fully working now! (#251)
Changelog
- Update README.md by @KPCOFGS in #224
- Fix api_dir_list! and show better error by @EricLBuehler in #225
- Default to `none` when cannot find token by @EricLBuehler in #226
- docs: update ADAPTER_MODELS.md by @eltociear in #227
- Fix debug log timing of first token by @lucasavila00 in #231
- Implement intermediate loading for ISQ on CPU by @EricLBuehler in #229
- Async sampling by @lucasavila00 in #198
- Fix quantized example by @lucasavila00 in #237
- Source bos, eos tokens from generation_config.json by @EricLBuehler in #243
- Sliding window for phi3 by @EricLBuehler in #244
- Fix docker images by @LLukas22 in #249
- Remove forced max seq len for llama models by @EricLBuehler in #250
- Fix Phi3 128k finally: use position ids to switch between short/long scaling by @EricLBuehler in #251
- Update README.md by @criminact in #253
- Async channels by @lucasavila00 in #236
New Contributors
- @KPCOFGS made their first contribution in #224
- @eltociear made their first contribution in #227
- @criminact made their first contribution in #253
Full Changelog: v0.1.0...v0.1.2
v0.1.0
Update version