Dockerfiles are available here to help you get started.
Pre-built packages are available at the locations indicated here.
- Checkout the source tree:
git clone --recursive https://github.com/Microsoft/onnxruntime
cd onnxruntime
- Install cmake-3.13 or higher from https://cmake.org/download/.
Open the Developer Command Prompt for the Visual Studio version you are going to use. This will properly set up the environment, including paths to your compiler, linker, utilities and header files.
.\build.bat --config RelWithDebInfo --build_shared_lib --parallel
The default Windows CMake Generator is Visual Studio 2017, but you can also use the newer Visual Studio 2019 by passing --cmake_generator "Visual Studio 16 2019" to .\build.bat.
./build.sh --config RelWithDebInfo --build_shared_lib --parallel
By default, ORT is configured to be built for a minimum target Mac OS X version of 10.12. The shared library in the release Nuget(s) and the Python wheel may be installed on Mac OS X versions of 10.12+.
- Please note that these instructions build the debug build (RelWithDebInfo), which may have performance tradeoffs.
- To build the version from each release (which include Windows, Linux, and Mac variants), see these .yml files for reference: CPU, GPU
- The build script runs all unit tests by default for native builds and skips tests by default for cross-compiled builds.
- If you need to install protobuf 3.6.1 from source code (cmake/external/protobuf), please note:
- CMake flag protobuf_BUILD_SHARED_LIBS must be turned OFF. After the installation, you should have the 'protoc' executable in your PATH. It is recommended to run ldconfig to make sure protobuf libraries are found.
- If you installed your protobuf in a non-standard location, it would be helpful to set the following env var:
export CMAKE_ARGS="-DONNX_CUSTOM_PROTOC_EXECUTABLE=full path to protoc"
so the ONNX build can find it. Also run ldconfig <protobuf lib folder path> so the linker can find protobuf libraries.
- If you'd like to install onnx from source code (cmake/external/onnx), use:
export ONNX_ML=1
python3 setup.py bdist_wheel
pip3 install --upgrade dist/*.whl
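As a rough illustration of the non-standard protobuf location note above, the sketch below builds the CMAKE_ARGS value from a protoc path. The helper name `protoc_cmake_args` and the example path are hypothetical; only the ONNX_CUSTOM_PROTOC_EXECUTABLE define comes from these instructions.

```python
import shutil


def protoc_cmake_args(protoc_path=None):
    """Return the CMAKE_ARGS value that points the ONNX build at protoc.

    If no path is given, fall back to whatever 'protoc' is found on PATH.
    """
    path = protoc_path or shutil.which("protoc")
    if path is None:
        raise FileNotFoundError("protoc not found; pass its full path explicitly")
    return "-DONNX_CUSTOM_PROTOC_EXECUTABLE=" + path


if __name__ == "__main__":
    # Hypothetical install location, used only for illustration.
    print(protoc_cmake_args("/opt/protobuf/bin/protoc"))
```

The returned string is what you would export as CMAKE_ARGS before running the ONNX setup.py step.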
OS | x86_32 | x86_64 | ARM32v7 | ARM64 |
---|---|---|---|---|
Windows | YES | YES | YES | YES |
Linux | YES | YES | YES | YES |
Mac OS X | NO | YES | NO | NO |
OS | Supports CPU | Supports GPU | Notes |
---|---|---|---|
Windows 10 | YES | YES | The latest VS2015 through VS2019 are supported |
Windows 10 Subsystem for Linux | YES | NO | |
Ubuntu 16.x | YES | YES | Also supported on ARM32v7 (experimental) |
Mac OS X | YES | NO | |
- GCC 4.x and below are not supported.
OS/Compiler | Supports VC | Supports GCC | Supports Clang |
---|---|---|---|
Windows 10 | YES | Not tested | Not tested |
Linux | NO | YES(gcc>=4.8) | Not tested |
Mac OS X | NO | Not tested | YES (Minimum version required not ascertained) |
For other system requirements and other dependencies, please see this section.
Description | Command | Additional description |
---|---|---|
Basic build | build.bat (Windows), ./build.sh (Linux) | |
Debug build | --config RelWithDebInfo | Debug build |
Use OpenMP | --use_openmp | OpenMP will parallelize some of the code for potential performance improvements. This is not recommended for running on single threads. |
Build using parallel processing | --parallel | This is strongly recommended to speed up the build. |
Build Shared Library | --build_shared_lib | |
Build Python wheel | --build_wheel | |
Build C# and C packages | --build_csharp | |
Build WindowsML | --use_winml --use_dml --build_shared_lib | WindowsML depends on DirectML and the OnnxRuntime shared library. |
Build Java package | --build_java | Creates an onnxruntime4j.jar in the build directory, implies --build_shared_lib |
The complete list of build options can be found by running ./build.sh (or .\build.bat) --help.
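If you drive the build from a script, the options in the table above can be combined programmatically. The following is a minimal sketch; the function name `build_command` and its keyword arguments are hypothetical, while the flags themselves are taken from the table.

```python
def build_command(script="./build.sh", config="RelWithDebInfo",
                  shared_lib=False, parallel=False, wheel=False, openmp=False):
    """Assemble a build script invocation from the options in the table above."""
    cmd = [script, "--config", config]
    if shared_lib:
        cmd.append("--build_shared_lib")
    if parallel:
        cmd.append("--parallel")
    if wheel:
        cmd.append("--build_wheel")
    if openmp:
        cmd.append("--use_openmp")
    return cmd


if __name__ == "__main__":
    # Reproduces the basic Linux command shown at the top of this document.
    print(" ".join(build_command(shared_lib=True, parallel=True)))
```

The resulting list can be passed to subprocess.run from the root of the source tree.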
Execution Providers
- NVIDIA CUDA
- NVIDIA TensorRT
- Intel DNNL/MKL-ML
- Intel nGraph
- Intel OpenVINO
- Android NNAPI
- Nuphar Model Compiler
- DirectML
- ARM Compute Library
- Rockchip RKNPU
Options
Architectures
- Install CUDA and cuDNN
- ONNX Runtime is built and tested with CUDA 10.1 and cuDNN 7.6 using the Visual Studio 2019 14.12 toolset (i.e. Visual Studio 2019 v16.5). ONNX Runtime can also be built with CUDA versions from 9.1 up to 10.1, and cuDNN versions from 7.1 up to 7.4.
- The path to the CUDA installation must be provided via the CUDA_PATH environment variable, or the --cuda_home parameter.
- The path to the cuDNN installation (include the cuda folder in the path) must be provided via the cuDNN_PATH environment variable, or the --cudnn_home parameter. The cuDNN path should contain bin, include and lib directories.
- The path to the cuDNN bin directory must be added to the PATH environment variable so that cudnn64_7.dll is found.
.\build.bat --use_cuda --cudnn_home <cudnn home path> --cuda_home <cuda home path>
./build.sh --use_cuda --cudnn_home <cudnn home path> --cuda_home <cuda home path>
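Before starting a long build, it can help to confirm that the directory you pass as the cuDNN home has the layout described above (bin, include and lib). This is a small sketch, not part of the build script; `missing_subdirs` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

# Subdirectories the instructions above say the cuDNN path should contain.
REQUIRED_DIRS = ("bin", "include", "lib")


def missing_subdirs(home):
    """Return the required subdirectories that are absent under `home`."""
    root = Path(home)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than a real cuDNN install.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "bin").mkdir()
        (Path(tmp) / "include").mkdir()
        print(missing_subdirs(tmp))  # 'lib' was not created
```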
A Dockerfile is available here.
- Depending on compatibility between the CUDA, cuDNN, and Visual Studio 2017 versions you are using, you may need to explicitly install an earlier version of the MSVC toolset.
- CUDA 10.0 is known to work with toolsets from 14.11 up to 14.16 (Visual Studio 2017 15.9), and should continue to work with future Visual Studio versions.
- CUDA 9.2 is known to work with the 14.11 MSVC toolset (Visual Studio 15.3 and 15.4).
- To install the 14.11 MSVC toolset, see this page.
- To use the 14.11 toolset with a later version of Visual Studio 2017 you have two options:
- Set up the Visual Studio environment variables to point to the 14.11 toolset by running vcvarsall.bat prior to running the build script. For example, if you have VS2017 Enterprise, an x64 build would use the following command:
"C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" amd64 -vcvars_ver=14.11
For convenience, .\build.amd64.1411.bat will do this and can be used in the same way as .\build.bat, e.g. .\build.amd64.1411.bat --use_cuda
- Alternatively, if you have CMake 3.13 or later you can specify the toolset version via the --msvc_toolset build script parameter, e.g. .\build.bat --msvc_toolset 14.11
- If you have multiple versions of CUDA installed on a Windows machine and are building with Visual Studio, CMake will use the build files for the highest version of CUDA it finds in the BuildCustomization folder, e.g. C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\VC\VCTargets\BuildCustomizations. If you want to build with an earlier version, you must temporarily remove the 'CUDA x.y.*' files for later versions from this directory.
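To see which 'CUDA x.y.*' files would need to be moved aside for a given CUDA version, something like the following sketch may help. It assumes the files are named like "CUDA 10.0.props" (the exact names and extensions are an assumption, as are the helper names), and it only lists candidates; it does not remove anything.

```python
import re

# Matches names such as "CUDA 10.0.props"; the exact file names are an assumption.
_CUDA_FILE = re.compile(r"^CUDA (\d+)\.(\d+)\.")


def cuda_version(filename):
    """Return (major, minor) for a 'CUDA x.y.*' file name, or None."""
    m = _CUDA_FILE.match(filename)
    return (int(m.group(1)), int(m.group(2))) if m else None


def files_newer_than(filenames, wanted):
    """List the 'CUDA x.y.*' files whose version is later than `wanted`."""
    return sorted(f for f in filenames
                  if cuda_version(f) is not None and cuda_version(f) > wanted)


if __name__ == "__main__":
    names = ["CUDA 9.2.props", "CUDA 9.2.targets",
             "CUDA 10.0.props", "CUDA 10.0.targets"]
    # Files that would have to be moved away to build with CUDA 9.2.
    print(files_newer_than(names, (9, 2)))
```

Versions are compared as integer tuples because a plain string comparison would rank "10.0" before "9.2".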
See more information on the TensorRT Execution Provider here.
- Install CUDA and cuDNN
- The TensorRT execution provider for ONNX Runtime is built and tested with CUDA 10.2 and cuDNN 7.6.5.
- The path to the CUDA installation must be provided via the CUDA_PATH environment variable, or the --cuda_home parameter. The CUDA path should contain bin, include and lib directories.
- The path to the CUDA bin directory must be added to the PATH environment variable so that nvcc is found.
- The path to the cuDNN installation (path to folder that contains libcudnn.so) must be provided via the cuDNN_PATH environment variable, or the --cudnn_home parameter.
- Install TensorRT
- The TensorRT execution provider for ONNX Runtime is built on TensorRT 7.x and is tested with TensorRT 7.0.0.11.
- The path to the TensorRT installation must be provided via the --tensorrt_home parameter.
.\build.bat --cudnn_home <path to cuDNN home> --cuda_home <path to CUDA home> --use_tensorrt --tensorrt_home <path to TensorRT home>
./build.sh --cudnn_home <path to cuDNN e.g. /usr/lib/x86_64-linux-gnu/> --cuda_home <path to folder for CUDA e.g. /usr/local/cuda> --use_tensorrt --tensorrt_home <path to TensorRT home>
Dockerfile instructions are available here
- ONNX Runtime v1.2.0 or higher requires TensorRT 7 support. At this moment, the compatible TensorRT and CUDA libraries in JetPack 4.4 are still in the developer preview stage. Therefore, we suggest using ONNX Runtime v1.1.2 with JetPack 4.3, which has been validated.
git clone --single-branch --recursive --branch v1.1.2 https://github.com/Microsoft/onnxruntime
- Indicate the CUDA compiler. This step is optional; CMake can automatically find the correct CUDA.
export CUDACXX="/usr/local/cuda/bin/nvcc"
- Modify tools/ci_build/build.py
- "-Donnxruntime_DEV_MODE=" + ("OFF" if args.android else "ON"),
+ "-Donnxruntime_DEV_MODE=" + ("OFF" if args.android else "OFF"),
- Modify cmake/CMakeLists.txt
- set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode=arch=compute_50,code=sm_50") # M series
+ set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode=arch=compute_53,code=sm_53") # Jetson TX1/Nano
+ set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode=arch=compute_62,code=sm_62") # Jetson TX2
- Build onnxruntime with --use_tensorrt flag
./build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu
See instructions for additional information and tips.
See more information on DNNL and MKL-ML here.
./build.sh --use_dnnl
See more information on the nGraph Execution Provider here.
.\build.bat --use_ngraph
./build.sh --use_ngraph
See more information on the OpenVINO Execution Provider here.
- Install the Intel® Distribution of OpenVINO™ Toolkit Release 2020.2 for the appropriate OS and target hardware. Follow documentation for detailed instructions.
- Configure the target hardware with specific follow on instructions:
- To configure Intel® Processor Graphics (GPU) please follow these instructions: Windows, Linux
- To configure Intel® Movidius™ USB, please follow this getting started guide: Linux
- To configure Intel® Vision Accelerator Design based on 8 Movidius™ MyriadX VPUs, please follow this configuration guide: Windows, Linux
- To configure Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA, please follow this configuration guide: Linux
- Initialize the OpenVINO environment by running the setupvars script as shown below:
- For Linux run:
$ source <openvino_install_directory>/bin/setupvars.sh
- For Windows run:
C:\ <openvino_install_directory>\bin\setupvars.bat
.\build.bat --config RelWithDebInfo --use_openvino <hardware_option>
Note: The default Windows CMake Generator is Visual Studio 2017, but you can also use the newer Visual Studio 2019 by passing --cmake_generator "Visual Studio 16 2019" to .\build.bat.
./build.sh --config RelWithDebInfo --use_openvino <hardware_option>
--use_openvino: Builds the OpenVINO Execution Provider in ONNX Runtime.
<hardware_option>: Specifies the default hardware target for building the OpenVINO Execution Provider. This can be overridden dynamically at runtime with another option (refer to OpenVINO-ExecutionProvider.md for more details on dynamic device selection). Below are the options for different Intel target devices.
Hardware Option | Target Device |
---|---|
CPU_FP32 | Intel® CPUs |
GPU_FP32 | Intel® Integrated Graphics |
GPU_FP16 | Intel® Integrated Graphics with FP16 quantization of models |
MYRIAD_FP16 | Intel® Movidius™ USB sticks |
VAD-M_FP16 | Intel® Vision Accelerator Design based on 8 Movidius™ MyriadX VPUs |
VAD-F_FP32 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA |
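When scripting the OpenVINO build, it may be useful to validate the hardware option before invoking the build script. The sketch below copies the options from the table above into a dictionary; `HARDWARE_OPTIONS` and `openvino_build_args` are hypothetical names introduced for illustration.

```python
# Hardware options and target devices, copied from the table above.
HARDWARE_OPTIONS = {
    "CPU_FP32": "Intel CPUs",
    "GPU_FP32": "Intel Integrated Graphics",
    "GPU_FP16": "Intel Integrated Graphics with FP16 quantization of models",
    "MYRIAD_FP16": "Intel Movidius USB sticks",
    "VAD-M_FP16": "Intel Vision Accelerator Design based on 8 Movidius MyriadX VPUs",
    "VAD-F_FP32": "Intel Vision Accelerator Design with an Intel Arria 10 FPGA",
}


def openvino_build_args(hardware_option, config="RelWithDebInfo"):
    """Return build script arguments for a known OpenVINO hardware option."""
    if hardware_option not in HARDWARE_OPTIONS:
        known = ", ".join(sorted(HARDWARE_OPTIONS))
        raise ValueError(f"unknown hardware option {hardware_option!r}; expected one of: {known}")
    return ["--config", config, "--use_openvino", hardware_option]


if __name__ == "__main__":
    print(" ".join(["./build.sh"] + openvino_build_args("CPU_FP32")))
```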
For more information on OpenVINO Execution Provider's ONNX Layer support, Topology support, and Intel hardware enabled, please refer to the document OpenVINO-ExecutionProvider.md in $onnxruntime_root/docs/execution_providers
See more information on the NNAPI Execution Provider here.
To build ONNX Runtime with the NN API EP, first install Android NDK (see Android Build instructions)
The basic build commands are below. There are also some other parameters for building the Android version. See Android Build instructions for more details.
.\build.bat --android --android_sdk_path <android sdk path> --android_ndk_path <android ndk path> --use_dnnlibrary
./build.sh --android --android_sdk_path <android sdk path> --android_ndk_path <android ndk path> --use_dnnlibrary
See more information on the Nuphar Execution Provider here.
- The Nuphar execution provider for ONNX Runtime is built and tested with LLVM 9.0.0. Because of TVM's requirement when building with LLVM, you need to build LLVM from source. To build the debug flavor of ONNX Runtime, you need the debug build of LLVM.
- Windows (Visual Studio 2017):
REM download llvm source code 9.0.0 and unzip to \llvm\source\path, then install to \llvm\install\path
cd \llvm\source\path
mkdir build
cd build
cmake .. -G "Visual Studio 15 2017 Win64" -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_ENABLE_DIA_SDK=OFF
msbuild llvm.sln /maxcpucount /p:Configuration=Release /p:Platform=x64
cmake -DCMAKE_INSTALL_PREFIX=\llvm\install\path -DBUILD_TYPE=Release -P cmake_install.cmake
Note that the following LLVM cmake patch is necessary to make the build work on Windows; Linux does not need the patch. The patch fixes the linking warning LNK4199 caused by this LLVM commit.
diff --git "a/lib\\Support\\CMakeLists.txt" "b/lib\\Support\\CMakeLists.txt"
index 7dfa97c..6d99e71 100644
--- "a/lib\\Support\\CMakeLists.txt"
+++ "b/lib\\Support\\CMakeLists.txt"
@@ -38,12 +38,6 @@ elseif( CMAKE_HOST_UNIX )
endif()
endif( MSVC OR MINGW )
-# Delay load shell32.dll if possible to speed up process startup.
-set (delayload_flags)
-if (MSVC)
- set (delayload_flags delayimp -delayload:shell32.dll -delayload:ole32.dll)
-endif()
-
# Link Z3 if the user wants to build it.
if(LLVM_WITH_Z3)
set(Z3_LINK_FILES ${Z3_LIBRARIES})
@@ -187,7 +181,7 @@ add_llvm_library(LLVMSupport
${LLVM_MAIN_INCLUDE_DIR}/llvm/ADT
${LLVM_MAIN_INCLUDE_DIR}/llvm/Support
${Backtrace_INCLUDE_DIRS}
- LINK_LIBS ${system_libs} ${delayload_flags} ${Z3_LINK_FILES}
+ LINK_LIBS ${system_libs} ${Z3_LINK_FILES}
)
set_property(TARGET LLVMSupport PROPERTY LLVM_SYSTEM_LIBS "${system_libs}")
- Linux: Download llvm source code 9.0.0 and unzip to /llvm/source/path, then install to /llvm/install/path
cd /llvm/source/path
mkdir build
cd build
cmake .. -DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
cmake -DCMAKE_INSTALL_PREFIX=/llvm/install/path -DBUILD_TYPE=Release -P cmake_install.cmake
.\build.bat --use_tvm --use_llvm --llvm_path=\llvm\install\path\lib\cmake\llvm --use_mklml --use_nuphar --build_shared_lib --build_csharp --enable_pybind --config=Release
- These instructions build the release flavor. The Debug build of LLVM would be needed to build with the Debug flavor of ONNX Runtime.
./build.sh --use_tvm --use_llvm --llvm_path=/llvm/install/path/lib/cmake/llvm --use_mklml --use_nuphar --build_shared_lib --build_csharp --enable_pybind --config=Release
Dockerfile instructions are available here
See more information on the DirectML execution provider here.
.\build.bat --use_dml
The DirectML execution provider supports building for both x64 and x86 architectures. DirectML is only supported on Windows.
See more information on the ACL Execution Provider here.
- Supported backend: i.MX8QM Armv8 CPUs
- Supported BSP: i.MX8QM BSP
- Install i.MX8QM BSP:
source fsl-imx-xwayland-glibc-x86_64-fsl-image-qt5-aarch64-toolchain-4*.sh
- Set up the build environment
source /opt/fsl-imx-xwayland/4.*/environment-setup-aarch64-poky-linux
alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/share/cmake/OEToolchainConfig.cmake"
- See Build ARM below for information on building for ARM devices
- Configure ONNX Runtime with ACL support:
cmake ../onnxruntime-arm-upstream/cmake -DONNX_CUSTOM_PROTOC_EXECUTABLE=/usr/bin/protoc -Donnxruntime_RUN_ONNX_TESTS=OFF -Donnxruntime_GENERATE_TEST_REPORTS=ON -Donnxruntime_DEV_MODE=ON -DPYTHON_EXECUTABLE=/usr/bin/python3 -Donnxruntime_USE_CUDA=OFF -Donnxruntime_USE_NSYNC=OFF -Donnxruntime_CUDNN_HOME= -Donnxruntime_USE_JEMALLOC=OFF -Donnxruntime_ENABLE_PYTHON=OFF -Donnxruntime_BUILD_CSHARP=OFF -Donnxruntime_BUILD_SHARED_LIB=ON -Donnxruntime_USE_EIGEN_FOR_BLAS=ON -Donnxruntime_USE_OPENBLAS=OFF -Donnxruntime_USE_ACL=ON -Donnxruntime_USE_DNNL=OFF -Donnxruntime_USE_MKLML=OFF -Donnxruntime_USE_OPENMP=ON -Donnxruntime_USE_TVM=OFF -Donnxruntime_USE_LLVM=OFF -Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF -Donnxruntime_USE_BRAINSLICE=OFF -Donnxruntime_USE_NUPHAR=OFF -Donnxruntime_USE_EIGEN_THREADPOOL=OFF -Donnxruntime_BUILD_UNIT_TESTS=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo
The -Donnxruntime_USE_ACL=ON option will use, by default, the 19.05 version of the Arm Compute Library. To set the right version you can use: -Donnxruntime_USE_ACL_1902=ON, -Donnxruntime_USE_ACL_1905=ON or -Donnxruntime_USE_ACL_1908=ON.
- Build ONNX Runtime library, test and performance application:
make -j 6
- Deploy ONNX runtime on the i.MX 8QM board
libonnxruntime.so.0.5.0
onnxruntime_perf_test
onnxruntime_test_all
- Build ACL Library (skip if already built)
cd ~
git clone https://github.com/Arm-software/ComputeLibrary.git
cd ComputeLibrary
sudo apt install scons
sudo apt install g++-arm-linux-gnueabihf
scons -j8 arch=arm64-v8a Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 build=native
- Set environment variables to set include directory and shared object library path.
export CPATH=~/ComputeLibrary/include/:~/ComputeLibrary/
export LD_LIBRARY_PATH=~/ComputeLibrary/build/
- Build onnxruntime with --use_acl flag
./build.sh --use_acl
See more information on the RKNPU Execution Provider here.
- Supported platform: RK1808 Linux
- See Build ARM below for information on building for ARM devices
- Use gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu instead of gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf, and modify CMAKE_CXX_COMPILER & CMAKE_C_COMPILER in tool.cmake:
set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)
set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
- Download rknpu_ddk to any directory.
- Build ONNX Runtime library and test:
./build.sh --arm --use_rknpu --parallel --build_shared_lib --build_dir build_arm --config MinSizeRel --cmake_extra_defines RKNPU_DDK_PATH=<Path To rknpu_ddk> CMAKE_TOOLCHAIN_FILE=<Path To tool.cmake> ONNX_CUSTOM_PROTOC_EXECUTABLE=<Path To protoc>
- Deploy ONNX runtime and librknpu_ddk.so on the RK1808 board:
libonnxruntime.so.1.2.0
onnxruntime_test_all
rknpu_ddk/lib64/librknpu_ddk.so
.\build.bat --use_openmp
./build.sh --use_openmp
- OpenBLAS
- Windows: See build instructions here
- Linux: Install the libopenblas-dev package
sudo apt-get install libopenblas-dev
.\build.bat --use_openblas
./build.sh --use_openblas
OnnxRuntime supports build options for enabling debugging of intermediate tensor shapes and data.
Dump tensor input/output shapes for all nodes to stdout.
# Linux
./build.sh --cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1
# Windows
.\build.bat --cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1
Dump tensor input/output shapes and output data for all nodes to stdout.
# Linux
./build.sh --cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=2
# Windows
.\build.bat --cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=2
To disable this functionality after previously enabling, set onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=0 or delete CMakeCache.txt.
- Add the --x86 argument when launching .\build.bat
- Must be built on an x86 OS
- Add the --x86 argument to build.sh
We have experimental support for Linux ARM builds. Windows on ARM is well tested.
This method relies on qemu user mode emulation. It allows you to compile using a desktop or cloud VM through instruction level simulation. You'll run the build on an x86 CPU and translate every ARM instruction to x86. This is much faster than compiling natively on a low-end ARM device and avoids out-of-memory issues that may be encountered. The resulting ONNX Runtime Python wheel (.whl) file is then deployed to an ARM device where it can be invoked in Python 3 scripts.
Here we have an example for Raspberry Pi 3 and Raspbian. Please note that it doesn't work for Raspberry Pi 1 or Zero. It also may not work if your operating system is different from the one the Dockerfile uses.
The whole build process may take hours.
You can get the package in minutes, but it's very hard to set up. Cross compiling was never easy. However, if you have a large code base (e.g. you are adding a fancy execution provider to onnxruntime), this is the only practical option.
tldr: Go to https://www.linaro.org/downloads/, get one for "64-bit Armv8 Cortex-A, little-endian" and "Linux Targeted", not "Bare-Metal Targeted". Extract it to your build machine and add the bin folder to your $PATH env. Then skip this part.
You can use GCC or Clang. Both work, but only GCC is covered here.
In GCC's world, we use:
- "build" to describe the type of system on which GCC is being configured and compiled.
- "host" to describe the type of system on which GCC runs.
- "target" to describe the type of system for which GCC produces code.
When not cross compiling, usually "build" = "host" = "target". When you do cross compile, usually "build" = "host" != "target". For example, you may build GCC on x86_64, then run GCC on x86_64, then generate binaries that target aarch64. In this case, "build" = "host" = x86_64 Linux, and "target" is aarch64 Linux.
Then you can either build GCC from source code yourself, or get a prebuilt one from a vendor like Ubuntu or Linaro. Ideally, choose the same compiler version as your target operating system has. If you can't, choose the latest stable one and statically link to the GCC libs.
When you get the compiler, please run
aarch64-linux-gnu-gcc -v
You'll see outputs like:
Using built-in specs.
COLLECT_GCC=/usr/bin/aarch64-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/aarch64-linux-gnu/9/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../gcc-9.2.1-20190827/configure --bindir=/usr/bin --build=x86_64-redhat-linux-gnu --datadir=/usr/share --disable-decimal-float --disable-dependency-tracking --disable-gold --disable-libgcj --disable-libgomp --disable-libmpx --disable-libquadmath --disable-libssp --disable-libunwind-exceptions --disable-shared --disable-silent-rules --disable-sjlj-exceptions --disable-threads --with-ld=/usr/bin/aarch64-linux-gnu-ld --enable-__cxa_atexit --enable-checking=release --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++ --enable-linker-build-id --enable-lto --enable-nls --enable-obsolete --enable-plugin --enable-targets=all --exec-prefix=/usr --host=x86_64-redhat-linux-gnu --includedir=/usr/include --infodir=/usr/share/info --libexecdir=/usr/libexec --localstatedir=/var --mandir=/usr/share/man --prefix=/usr --program-prefix=aarch64-linux-gnu- --sbindir=/usr/sbin --sharedstatedir=/var/lib --sysconfdir=/etc --target=aarch64-linux-gnu --with-bugurl=http://bugzilla.redhat.com/bugzilla/ --with-gcc-major-version-only --with-isl --with-newlib --with-plugin-ld=/usr/bin/aarch64-linux-gnu-ld --with-sysroot=/usr/aarch64-linux-gnu/sys-root --with-system-libunwind --with-system-zlib --without-headers --enable-gnu-indirect-function --with-linker-hash-style=gnu
Thread model: single
gcc version 9.2.1 20190827 (Red Hat Cross 9.2.1-3) (GCC)
Please check the value of "--build", "--host", "--target", and whether it has special args like "--with-arch=armv8-a" or "--with-arch=armv6 --with-tune=arm1176jz-s --with-fpu=vfp --with-float=hard". You must also know what kind of flags your target hardware needs, as they may differ widely. For example, if you take a normal ARMv7 compiler and use it directly for Raspberry Pi 1, it won't work, because Raspberry Pi 1 only has ARMv6. Usually every hardware vendor provides a toolchain; please check how that one was built.
A target env is identified by four things:
- Arch: x86_32, x86_64, armv6, armv7, armv7l, aarch64, ...
- OS: bare-metal or linux.
- Libc: gnu libc/uClibc/musl/...
- ABI: ARM has multiple ABIs like eabi, eabihf...
You can get all of this information from the previous output; please be sure it is all correct.
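The "--build", "--host" and "--target" values can also be pulled out of the `gcc -v` output with a few lines of Python. This is only a sketch under the assumption that the output contains a "Configured with:" line like the one shown above; the function names are hypothetical.

```python
import re


def parse_gcc_triplets(gcc_v_output):
    """Extract --build/--host/--target from the 'Configured with:' line."""
    triplets = {}
    for line in gcc_v_output.splitlines():
        if line.startswith("Configured with:"):
            for key, value in re.findall(r"--(build|host|target)=(\S+)", line):
                triplets[key] = value
    return triplets


def is_cross_compiler(triplets):
    """True when the compiler runs on one system type but emits code for another."""
    return triplets.get("host") != triplets.get("target")


if __name__ == "__main__":
    # Shortened version of the sample output shown above.
    sample = (
        "Target: aarch64-linux-gnu\n"
        "Configured with: ../gcc-9.2.1-20190827/configure "
        "--build=x86_64-redhat-linux-gnu --enable-targets=all "
        "--host=x86_64-redhat-linux-gnu --target=aarch64-linux-gnu\n"
    )
    info = parse_gcc_triplets(sample)
    print(info, is_cross_compiler(info))
```

Note that `gcc -v` writes this information to stderr, so capture stderr rather than stdout when running it from a script.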
You may get it from https://github.com/protocolbuffers/protobuf/releases/download/v3.11.2/protoc-3.11.2-linux-x86_64.zip . Please unzip it after downloading. The version must match the one onnxruntime is using. Currently we are using 3.11.2.
(Skip this part if you don't use python)
Dump the root file system of the target operating system to your build machine. We'll call that folder "sysroot" and use it to build the onnxruntime Python extension. Before doing that, you should install the python3 dev package (which contains the C header files) and the numpy Python package on the target machine.
Below are some examples.
If the target OS is raspbian-buster, please download the RAW image from their website then run:
$ fdisk -l 2020-02-13-raspbian-buster.img
Disk 2020-02-13-raspbian-buster.img: 3.54 GiB, 3787456512 bytes, 7397376 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xea7d04d6
Device                          Boot  Start     End Sectors  Size Id Type
2020-02-13-raspbian-buster.img1        8192  532479  524288  256M  c W95 FAT32 (LBA)
2020-02-13-raspbian-buster.img2      532480 7397375 6864896  3.3G 83 Linux
You'll find the root partition starts at sector 532480, which is 532480 * 512 = 272629760 bytes from the beginning.
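For a different image, the byte offset follows from the same arithmetic: the partition's start sector multiplied by the sector size reported by fdisk. A minimal sketch (the function name is hypothetical):

```python
def partition_offset_bytes(start_sector, sector_size=512):
    """Byte offset of a partition inside a raw disk image."""
    return start_sector * sector_size


if __name__ == "__main__":
    # Start sector of the Linux partition in the fdisk output above.
    print(partition_offset_bytes(532480))
```

The result is the value to pass as the offset option when loop-mounting the image.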
Then run:
$ mkdir /mnt/pi
$ mount -r -o loop,offset=272629760 2020-02-13-raspbian-buster.img /mnt/pi
You'll see all raspbian files at /mnt/pi. However you can't use it yet. Because some of the symlinks are broken, you must fix them first. In /mnt/pi, run
$ find . -type l -exec realpath {} \; |grep 'No such file'
It will show which are broken. Then you can fix them by running:
$ mkdir /mnt/pi2
$ cd /mnt/pi2
$ sudo tar -C /mnt/pi -cf - . | sudo tar --transform 'flags=s;s,^/,/mnt/pi2/,' -xf -
Then /mnt/pi2 is the sysroot folder you'll use in the next step.
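The broken-symlink check above can also be sketched in Python, which may be handy if realpath is not available. Like the find command, it resolves absolute links against the host's root, so links that are only valid inside the target file system are reported as broken. The function name is hypothetical.

```python
import os
import tempfile


def broken_symlinks(root):
    """Return symlinks under `root` whose targets do not resolve."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists follows the link, so it is False for a dangling one.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a mounted image.
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "real"), "w").close()
        os.symlink("real", os.path.join(tmp, "good"))
        os.symlink("missing", os.path.join(tmp, "bad"))
        print(broken_symlinks(tmp))
```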
If the target OS is Ubuntu, you can get an image from https://cloud-images.ubuntu.com/. But that image is in qcow2 format. Please convert it before running fdisk and mount.
qemu-img convert -p -O raw ubuntu-18.04-server-cloudimg-arm64.img ubuntu.raw
The remaining part is similar to raspbian.
If the target OS is manylinux2014, you can get it as follows: install qemu-user-static from apt or dnf, then run the docker image. On Ubuntu:
docker run -v /usr/bin/qemu-aarch64-static:/usr/bin/qemu-aarch64-static -it --rm quay.io/pypa/manylinux2014_aarch64 /bin/bash
The "-v /usr/bin/qemu-aarch64-static:/usr/bin/qemu-aarch64-static" arg is not needed on Fedora.
Then, inside the docker, run
cd /opt/python
./cp35-cp35m/bin/python -m pip install numpy==1.16.6
./cp36-cp36m/bin/python -m pip install numpy==1.16.6
./cp37-cp37m/bin/python -m pip install numpy==1.16.6
./cp38-cp38/bin/python -m pip install numpy==1.16.6
These commands will take a few hours because numpy doesn't have such a prebuilt package yet. When it is finished, open a second window and run
docker ps
From the output:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5a796e98db05 quay.io/pypa/manylinux2014_aarch64 "/bin/bash" 3 minutes ago Up 3 minutes affectionate_cannon
You'll see the docker instance id is: 5a796e98db05. Then please use the following command to export the root filesystem as the sysroot for future use.
docker export 5a796e98db05 -o manylinux2014_aarch64.tar
Save the following content as tool.cmake
SET(CMAKE_SYSTEM_NAME Linux)
SET(CMAKE_SYSTEM_VERSION 1)
SET(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
SET(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)
SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
SET(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
SET(CMAKE_FIND_ROOT_PATH /mnt/pi)
If you don't have a sysroot, you can delete the last line.
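If you need several variants of tool.cmake (with and without a sysroot, or with a different compiler prefix), generating the file may be easier than editing it by hand. The sketch below emits the same lines as the listing above; `toolchain_file` and its parameters are hypothetical names.

```python
def toolchain_file(prefix="aarch64-linux-gnu-", sysroot=None):
    """Return the text of a tool.cmake like the one shown above."""
    lines = [
        "SET(CMAKE_SYSTEM_NAME Linux)",
        "SET(CMAKE_SYSTEM_VERSION 1)",
        f"SET(CMAKE_C_COMPILER {prefix}gcc)",
        f"SET(CMAKE_CXX_COMPILER {prefix}g++)",
        "SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)",
        "SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)",
        "SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)",
        "SET(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)",
    ]
    # The sysroot line is omitted when no sysroot is available.
    if sysroot:
        lines.append(f"SET(CMAKE_FIND_ROOT_PATH {sysroot})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(toolchain_file(sysroot="/mnt/pi"), end="")
```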
- Append -DONNX_CUSTOM_PROTOC_EXECUTABLE=/path/to/protoc -DCMAKE_TOOLCHAIN_FILE=path/to/tool.cmake to your cmake args, then run cmake and make to build it. If you want to build the Python package as well, you can use cmake args like:
-Donnxruntime_GCC_STATIC_CPP_RUNTIME=ON -DCMAKE_BUILD_TYPE=Release -Dprotobuf_WITH_ZLIB=OFF -DCMAKE_TOOLCHAIN_FILE=path/to/tool.cmake -Donnxruntime_ENABLE_PYTHON=ON -DPYTHON_EXECUTABLE=/mnt/pi/usr/bin/python3 -Donnxruntime_BUILD_SHARED_LIB=OFF -Donnxruntime_DEV_MODE=OFF -DONNX_CUSTOM_PROTOC_EXECUTABLE=/path/to/protoc "-DPYTHON_INCLUDE_DIR=/mnt/pi/usr/include;/mnt/pi/usr/include/python3.7m" -DNUMPY_INCLUDE_DIR=/mnt/pi/folder/to/numpy/headers
After running cmake, run
$ make
Copy the setup.py file from the source folder to the build folder and run
python3 setup.py bdist_wheel -p linux_aarch64
However, if you target manylinux, unfortunately its tools don't work in a cross-compiling scenario. You must run it in a docker container like:
docker run -v /usr/bin/qemu-aarch64-static:/usr/bin/qemu-aarch64-static -v `pwd`:/tmp/a -w /tmp/a --rm quay.io/pypa/manylinux2014_aarch64 /opt/python/cp37-cp37m/bin/python3 setup.py bdist_wheel
If you only want to target a specific Linux distro (like Ubuntu), you don't need to do that.
A Docker build run on a Raspberry Pi 3B with Raspbian Stretch Lite OS (the Desktop version will run out of memory when linking the .so file) will take 8-9 hours in total.
sudo apt-get update
sudo apt-get install -y \
sudo \
build-essential \
curl \
libcurl4-openssl-dev \
libssl-dev \
wget \
python3 \
python3-pip \
python3-dev \
git \
tar
pip3 install --upgrade pip
pip3 install --upgrade setuptools
pip3 install --upgrade wheel
pip3 install numpy
# Build the latest cmake
mkdir /code
cd /code
wget https://cmake.org/files/v3.13/cmake-3.13.5.tar.gz;
tar zxf cmake-3.13.5.tar.gz
cd /code/cmake-3.13.5
./configure --system-curl
make
sudo make install
# Prepare onnxruntime Repo
cd /code
git clone --recursive https://github.com/Microsoft/onnxruntime
# Start the basic build
cd /code/onnxruntime
./build.sh --config MinSizeRel --update --build
# Build Shared Library
./build.sh --config MinSizeRel --build_shared_lib
# Build Python Bindings and Wheel
./build.sh --config MinSizeRel --enable_pybind --build_wheel
# Build Output
ls -l /code/onnxruntime/build/Linux/MinSizeRel/*.so
ls -l /code/onnxruntime/build/Linux/MinSizeRel/dist/*.whl
Using Visual C++ compilers
- Download and install Visual C++ compilers and libraries for ARM(64). If you have Visual Studio installed, please use the Visual Studio Installer (look under the section Individual components after choosing to modify Visual Studio) to download and install the corresponding ARM(64) compilers and libraries.
- Use .\build.bat and specify --arm or --arm64 as the build option to start building. Preferably use Developer Command Prompt for VS, or make sure all the installed cross-compilers are findable from the command prompt being used to build using the PATH environment variable.
Install Android NDK in Android Studio or https://developer.android.com/ndk/downloads
.\build.bat --android --android_sdk_path <android sdk path> --android_ndk_path <android ndk path> --android_abi <android abi, e.g., arm64-v8a (default) or armeabi-v7a> --android_api <android api level, e.g., 27 (default)>
./build.sh --android --android_sdk_path <android sdk path> --android_ndk_path <android ndk path> --android_abi <android abi, e.g., arm64-v8a (default) or armeabi-v7a> --android_api <android api level, e.g., 27 (default)>
Android Archive (AAR) files, which can be imported directly in Android Studio, will be generated in your_build_dir/java/build/outputs/aar.
If you want to use NNAPI Execution Provider on Android, see docs/execution_providers/NNAPI-ExecutionProvider.md.