0.0.7
What's Changed
- Sync with Triton 24.04
- Bump TRT-LLM version to 0.9.0
- Add support for `llama-2-7b-chat`, `llama-3-8b`, and `llama-3-8b-instruct` for both vLLM and TRT-LLM
- Improve error checking and error messages when building TRT-LLM engines
- Log the underlying `convert_checkpoint.py` and `trtllm-build` commands for reproducibility/visibility
- Don't call `convert_checkpoint.py` if converted weights are already found
- Call `convert_checkpoint.py` via subprocess to improve total memory usage
- Attempt to clean up failed TRT-LLM models in the model repository if import or engine building fails, rather than leaving the model repository in an unfinished state
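The skip-and-subprocess behavior described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the paths, flags, and the `convert_checkpoint_if_needed` helper are hypothetical, and the real `convert_checkpoint.py` arguments vary by model and TRT-LLM version.

```python
import subprocess
import sys
from pathlib import Path

def convert_checkpoint_if_needed(model_dir: Path, converted_dir: Path) -> None:
    """Run convert_checkpoint.py in a child process, skipping if outputs exist.

    Running conversion in a subprocess means its peak memory is returned
    to the OS when the child exits, instead of lingering in the parent
    for the rest of the engine build.
    """
    # Skip conversion entirely if converted weights are already found.
    if converted_dir.exists() and any(converted_dir.iterdir()):
        print(f"Found existing converted weights in {converted_dir}, skipping")
        return

    # Hypothetical invocation; real flags depend on the model and TRT-LLM version.
    cmd = [
        sys.executable, "convert_checkpoint.py",
        "--model_dir", str(model_dir),
        "--output_dir", str(converted_dir),
    ]
    # Log the exact command for reproducibility/visibility.
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```

Isolating the conversion in a subprocess also makes the cleanup-on-failure behavior simpler: a non-zero exit raises `CalledProcessError`, which the caller can catch to remove the half-built model directory.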
- Update tests to wait for both HTTP and GRPC server endpoints to be ready before testing
  - Fixes intermittent `ConnectionRefusedError` in CI tests
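The endpoint-readiness wait can be sketched as a generic polling helper. The `is_ready` predicate here is a stand-in (an assumption, not this project's API) for checks such as tritonclient's `client.is_server_ready()`, run against both the HTTP and GRPC clients before tests start:

```python
import time

def wait_until_ready(is_ready, timeout_s: float = 30.0,
                     interval_s: float = 0.5) -> bool:
    """Poll a readiness predicate until it returns True or the timeout expires.

    Swallows ConnectionRefusedError while the server is still starting,
    which is exactly the intermittent failure seen when tests race the
    server's HTTP/GRPC listeners.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if is_ready():
                return True
        except ConnectionRefusedError:
            pass  # server not listening yet; retry after a short sleep
        time.sleep(interval_s)
    return False
```

A test would call this once per protocol, e.g. with a lambda wrapping each client's readiness check, and fail fast if either endpoint never comes up.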
Full Changelog: 0.0.6...0.0.7