These instructions cover the usage of the latest stable release of SHARK. For a more bleeding-edge release, please install the nightly releases.
Tip
While we prepare the next stable release, please use the nightly releases.
Our current user guide requires that you have:
- Access to a computer with an installed AMD Instinct™ MI300X Series Accelerator
- Installed a compatible version of Linux and ROCm on the computer (see the ROCm compatibility matrix); a quick way to verify the installation is sketched below
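If you want to confirm that ROCm is installed and the accelerator is visible before continuing, a check along the following lines can help. This assumes ROCm's command-line tools (`rocminfo`, `rocm-smi`) are on your `PATH`; it is an optional sanity check, not part of the official steps:

```bash
# Show the installed ROCm version (the file path may vary by ROCm release).
cat /opt/rocm/.info/version

# List the GPU targets ROCm can see; an MI300X typically reports a gfx94x target.
rocminfo | grep -i "gfx"

# Summarize accelerator status (temperature, memory, utilization).
rocm-smi
```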
This section will help you install Python and set up a Python environment with venv.
Officially we support Python versions: 3.11, 3.12, 3.13
The rest of this guide assumes you are using Python 3.11.
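If you are not sure which Python interpreters are already installed, a quick check such as the following will tell you whether 3.11 is available (assuming a standard Ubuntu layout where interpreters live in `/usr/bin`):

```bash
# List any CPython 3.x interpreters already on the system.
ls /usr/bin/python3.* 2>/dev/null

# Print the default python3 version.
python3 --version
```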
To install Python 3.11 on Ubuntu:
```bash
sudo apt install python3.11 python3.11-dev python3.11-venv

which python3.11
# /usr/bin/python3.11
```
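If `python3.11` is not available in your Ubuntu release's default repositories, the deadsnakes PPA is one common way to get it. Whether to add a third-party PPA is your call; this is a hedged sketch, not an official instruction:

```bash
# Add the deadsnakes PPA (third-party) and install Python 3.11 from it.
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11 python3.11-dev python3.11-venv
```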
Set up your Python environment with the following commands:

```bash
# Set up a virtual environment to isolate packages from other envs.
python3.11 -m venv 3.11.venv
source 3.11.venv/bin/activate
```
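Once the environment is active, it can be worth confirming that the shell is using the venv's interpreter and that pip is up to date. This is a small optional check, not part of the official steps:

```bash
# The interpreter should now resolve inside the virtual environment.
which python
# .../3.11.venv/bin/python

# Upgrade pip inside the venv before installing packages.
python -m pip install --upgrade pip
```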
First install a torch version that fulfills your needs:
```bash
# Fast installation of torch with just CPU support.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```
For other options, see https://pytorch.org/get-started/locally/.
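After the install finishes, a one-liner like the following can confirm which torch build you ended up with. This is an illustrative smoke test, not part of the official guide; with the CPU-only wheel, `torch.cuda.is_available()` is expected to print `False`:

```bash
# Confirm the installed torch version and whether a GPU backend is visible.
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
```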
Next install shark-ai:
```bash
pip install shark-ai[apps]
```
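To see which SHARK packages ended up in the environment, a quick listing like this can help (assuming the distribution names contain "shark" or "shortfin", which matches the current package naming):

```bash
# List the installed SHARK-related packages and their versions.
pip list | grep -iE "shark|shortfin"
```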
Tip
To switch from the stable release channel to the nightly release channel, see nightly_releases.md.
To test the installation, check that the SDXL server's help output is available:

```bash
python -m shortfin_apps.sd.server --help
```
As part of our current release we support serving SDXL and Llama 3.1 variants, as well as an initial release of sharktank, SHARK's model development toolkit, which is used to compile these models for high performance.
To get started with SDXL, please follow the SDXL User Guide.
To get started with Llama 3.1, please follow the Llama User Guide.
- Once you've set up the Llama server in the guide above, we recommend that you use the SGLang frontend by following the Using `shortfin` with `sglang` guide.
- If you would like to deploy Llama on a Kubernetes cluster, we also provide a simple set of instructions and deployment configuration to do so here.
- Finally, if you'd like to leverage the instructions above to run against a different variant of Llama 3.1, it's supported. However, you will need to generate a GGUF dataset for that variant (explained in the user guide; a hedged download sketch follows below). In future releases, we plan to streamline these instructions to make it easier for users to compile their own models from Hugging Face.
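As a rough illustration of the first step for a different variant, a GGUF checkpoint can be pulled from Hugging Face with `huggingface-cli`. The repository and file names below are placeholders, so substitute the community GGUF repo for the variant you actually want; the conversion and compilation steps themselves are covered in the Llama user guide:

```bash
# Download a GGUF file for a Llama 3.1 variant (placeholder repo and file names).
huggingface-cli download \
  <gguf-repo-for-your-variant> \
  <model-file>.gguf \
  --local-dir /tmp/llama3.1_gguf
```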