Halide v15.0.0
What's Changed
General Notes
- Support for RISC V Vector architectures.
- Python-related:
  - Halide builds for Python are now being built and provided to PyPI, so it is now possible to use the Halide Python bindings simply by `pip install halide`
  - Major improvements were made to the Python bindings, with many missing or incomplete sections of the API added or filled in.
  - We now support the use of Generators from Python (for both JIT and AOT usage).
  - The standard CMake rules now support generating a Python extension directly.
  - Support for Python was removed from Halide's Makefiles; you must use CMake to build the Python bindings
- Halide::Func now allows you to (optionally) constrain the type(s) of Exprs that the Func can contain, and/or the dimensionality of the Func.
- Added a new way to use the JIT (`compile_to_callable`) that allows calling a jitted function with the same syntax as for AOT-compiled functions, allowing more control over JIT lifespan, as well as thread-safe arguments without requiring ParamMap
- General improvements to SIMD codegen
- Several rarely-used parts of the C++ Generator API were deprecated, and the way that autoschedulers are specified for AOT compilation is now completely different (but better for future expandability).
- CMake builds now require >= v3.22
- WABT usage requires >= v1.0.30
- LLVM 12 is no longer supported
- The target flag disable_llvm_loop_opt is deprecated, as it's now the default behavior. This means that LLVM's autovectorization and loop unrolling are now turned off. This should not affect schedules with manually specified vectorization and unrolling, other than trimming code size a little. However, schedules that do not vectorize or unroll may slow down because they were (intentionally or not) relying on LLVM to do it automatically. If you see a performance regression with Halide 15, try turning on the enable_llvm_loop_opt target flag.
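If a schedule slows down under Halide 15, the enable_llvm_loop_opt flag mentioned above can be appended to the target string. The following is a minimal sketch in plain Python, not part of Halide: `with_feature` is a hypothetical helper, and it assumes the usual dash-separated target-string form (for example `host` or `x86-64-linux-avx2`) and the `HL_JIT_TARGET` environment variable that Halide consults for JIT targets.

```python
import os


def with_feature(target: str, feature: str) -> str:
    """Return `target` with `feature` appended, unless it is already present.

    Hypothetical helper: it only manipulates the dash-separated string and
    does not check that Halide recognizes the resulting target.
    """
    parts = target.split("-")
    if feature not in parts:
        parts.append(feature)
    return "-".join(parts)


# Re-enable LLVM's loop optimizations for JIT-compiled pipelines.
jit_target = with_feature("host", "enable_llvm_loop_opt")
print(jit_target)

# Assumption: this runs before Halide JIT-compiles anything in this process
# (or in a child process that inherits the environment).
os.environ["HL_JIT_TARGET"] = jit_target
```

For AOT builds, the same string would go wherever the generator's target is specified (for example the `HL_TARGET` variable); which mechanism applies depends on your build setup.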
Notable bug fixes
- Make Halide::round behave as documented (#7012)
- Incorrect folding of saturating_sub (#6883)
- The check for race conditions didn't consider where clauses (#6808)
- Performance regression for x86 for certain LLVM versions (#6783)
- Fusing a specialization drops compute_withs from generated code (#6770)
- Incorrect output when realize condition depends on tuple call (#6915)
- Python extensions should default to throwing exceptions rather than calling abort() for errors (#6986)
- Python bindings didn't support `bool` buffers (#7006)
- Python bindings didn't support `float16` buffers (#7060)
- Python extensions that executed on GPU didn't copy back to host properly (#6869)
- Fix bugs in `div_round_to_zero` and `fast_integer_divide_round_to_zero` (#7008)
- Bugs in `add_requirement()` (#7045)
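Two of the fixes above concern integer operations whose intended semantics are easy to state. As a rough reference model in plain Python (not Halide code, and not the library's implementation), `div_round_to_zero` is taken here to mean truncating integer division, and `saturating_sub` to mean subtraction clamped to the range of the operand type. The function names mirror the Halide ones; the `bits` and `signed` parameters are assumptions introduced for illustration, and division by zero is left out of the sketch.

```python
def div_round_to_zero(a: int, b: int) -> int:
    """Integer division that truncates toward zero (b must be nonzero)."""
    q = abs(a) // abs(b)
    # The quotient is negated when the operand signs differ.
    return q if (a < 0) == (b < 0) else -q


def saturating_sub(a: int, b: int, bits: int = 8, signed: bool = True) -> int:
    """Compute a - b, clamped to the range of a `bits`-wide integer type."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, a - b))


# Python's // floors, so the two divisions differ for mixed-sign operands.
print(-7 // 2, div_round_to_zero(-7, 2))
# An int8 result would wrap without saturation; here it pins to the minimum.
print(saturating_sub(-100, 100))
```

A model like this can serve as an oracle when checking a pipeline's output on small inputs, provided the bit width and signedness match the Halide types in use.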
Major changes
- Augment Halide::Func to allow for constraining Type and Dimensionality by @steven-johnson in #6734 and #6735
- Add Target support for architectures with implementation specific vector size. by @zvookin in #6786
- Add support for vscale vector code generation. by @zvookin in #6802
- Remove Python bindings from Makefiles by @alexreinking in #6821
- Add a new, alternate JIT-call convention by @steven-johnson in #6777
- Pip packaging by @alexreinking in #6886 and #6938
- Define a Generator framework in Python by @steven-johnson in #6764
- Make Halide::round behave as documented by @abadams in #7012
Minor changes
- `-mtune=`/`-mcpu=` support for x86 AMD CPU's by @LebedevRI in #6655
- Enable deprecations warnings by @steven-johnson in #6555
- Fix GPU depredication/scalarization by @shoaibkamil in #6669
- Allow PyPipeline and PyFunc to realize() scalar buffers by @steven-johnson in #6674
- Future-proof `processor` to `tune processor` by @LebedevRI in #6673
- Fix ctors for Realization by @steven-johnson in #6675
- `-mtune=native` CPU autodetection for AMD Zen 3 CPU by @LebedevRI in #6648
- Clean up Python extensions in python_bindings by @steven-johnson in #6670
- Halide::Tools::save_image() should accept buffers with `const` types by @steven-johnson in #6679
- Fix "set but not used" warnings/errors by @steven-johnson in #6683
- Drop support for LLVM12 by @steven-johnson in #6686
- Upgrade to clang-format 13 by @steven-johnson in #6689
- Always mark _ucon as 'unused' in Codegen_C by @steven-johnson in #6691
- Add `break` to avoid 'possible unintentional fallthru' warning by @steven-johnson in #6694
- Silence "unknown warning" in Clang 13 by @steven-johnson in #6693
- Fixes for top-of-tree LLVM by @steven-johnson in #6697
- Python: make Func implicitly convertible to Stage (#6702) by @steven-johnson in #6704
- llvm no longer wants a type suffix on vst intrinsics by @abadams in #6701
- Fix type-mangling for vst on arm32 for LLVM15 by @steven-johnson in #6705
- Remove the last remaining call to getPointerElementType() by @steven-johnson in #6715
- ARM vst mangling needs to be conditional on opaque ptrs by @steven-johnson in #6716
- Combine string constants in combine_strings() by @steven-johnson in #6717
- Update CodeGen_PTX_Dev to use new PassManager by @steven-johnson in #6718
- Closure functions for parallel tasks should be internal, not external by @steven-johnson in #6720
- Smarten type_of<> for fn ptrs; fix async_parallel for C backend by @steven-johnson in #6719
- Remove legacy::FunctionPassManager usage in Codegen_PTX_Dev by @steven-johnson in #6722
- `get_amd_processor()`: implement detection for the rest of supported AMD CPU's by @LebedevRI in #6711
- Add Func::output_type() method by @steven-johnson in #6724
- Grab-bag of minor Python fixes by @steven-johnson in #6725
- Remove `rounding_halving_sub` and non-existent arm rhsub instructions by @rootjalex in #6723
- Faster `widening_mul(int16x, int16x) -> int32x` for x86 (AVX2 and SSE2) by @rootjalex in #6677
- Add missing #include in ThreadPool.h by @steven-johnson in #6738
- Fix regression from #6734 by @steven-johnson in #6739
- Add forwarding for the recently-added Func::output_type() method by @steven-johnson in #6741
- Silence "unscheduled update stage" warnings in msan_generator.cpp by @steven-johnson in #6740
- Add `__pycache__` to toplevel .gitignore file by @steven-johnson in #6743
- Silence "may be used uninitialized" in Buffer::for_each_element() by @steven-johnson in #6747
- Update WABT to 1.0.29 by @steven-johnson in #6748
- Update hannk README link to hosted models page by @steven-johnson in #6749
- Add a `HalideError` base class to Python bindings by @steven-johnson in #6750
- Add GeneratorFactoryProvider to generate_filter_main() by @steven-johnson in #6755
- Minor metadata-related cleanups by @steven-johnson in #6759
- Expand the x86 SIMD variants tested in correctness_vector_reductions by @steven-johnson in #6762
- Fix Param::set_estimate for T=void by @steven-johnson in #6766
- add_python_aot_extension should use FUNCTION_NAME for the .so output … by @steven-johnson in #6767
- Fix fundamental confusion about target/tune CPU by @LebedevRI in #6765
- Fix annoying typo in Func.h by @steven-johnson in #6774
- Add execute_generator() API by @steven-johnson in #6771
- Allow overriding of `Generator::init_from_context()` for debug purposes by @steven-johnson in #6760
- Convert some assert-only usage of output_types() -> types() by @steven-johnson in #6779
- [miscompile] Don't de-negate and change direction of shifts-by-unsigned by @LebedevRI in #6782
- Move some options from execute_generator back to generate_filter_main by @steven-johnson in #6787
- LLVM codegen: register AA pipeline if LLVM is older than 14 by @LebedevRI in #6785
- Update the list of fused_pairs and run validate_fused_group for specialization definitions too by @vksnk in #6770
- halide_type_of<>() should always be constexpr by @steven-johnson in #6790
- Define an AbstractGenerator interface by @steven-johnson in #6637
- hexagon_scatter test should run only if target has HVX by @steven-johnson in #6793
- slow tests should support sharding by @steven-johnson in #6780
- Add missing include to test_sharding.h by @steven-johnson in #6795
- Pacify clang-tidy by @steven-johnson in #6796
- Silence a "possibly uninitialized" warning by @steven-johnson in #6797
- Make all tests default to `-fvisibility=hidden` by @steven-johnson in #6799
- Minor typedef cleanup by @steven-johnson in #6800
- Fix auto_schedule/machine_params parsing by @steven-johnson in #6804
- Rewrite strided loads of 4 in AlignLoads by @vksnk in #6806
- Fix two minor bugs triggered by an or reduction with early-out by @abadams in #6807
- [CMake] Mark multi-threaded tests as such by @LebedevRI in #6810
- Rework .gitignore by @alexreinking in #6822
- Update presets to format version 3 by @alexreinking in #6824
- Fix for top-of-tree LLVM by @steven-johnson in #6825
- Tweak python apps for better Blaze/Bazel compatibility by @steven-johnson in #6823
- Apply CMAKE_C_COMPILER_LAUNCHER to initmod clang calls by @alexreinking in #6831
- Scrub Python from Makefile after buildbot update by @alexreinking in #6833
- Remove unused function in callable_generator.cpp by @steven-johnson in #6834
- Disable testing for apps/linear_algebra on x86-32-linux/Make by @steven-johnson in #6836
- Rearrange subdirectories in python_bindings by @steven-johnson in #6835
- Better lowering of halving_sub and rounding_halving_add by @abadams in #6827
- Check RDom::where predicates for race conditions by @alexreinking in #6842
- Remove Generator::value_tracker and friends by @steven-johnson in #6845
- Add placeholder code for bfloat16 in Python (#6849) by @steven-johnson in #6850
- Fix the PLUGINS argument to properly join multiple arguments by @steven-johnson in #6851
- Add autoscheduling to the generator_aot_stubuser test by @steven-johnson in #6855
- Silence Adams2019 Autoscheduler by @steven-johnson in #6854
- [vulkan phase0] Add adts for containers and memory allocation to runtime by @derek-gerstmann in #6829
- Promote Reinterpret Intrinsic into an Reinterpret IR Node by @LebedevRI in #6853
- Python source reorg by @alexreinking in #6867
- Fix simd_op_check for top-of-tree LLVM by @steven-johnson in #6874
- Use pmaddubsw 8-bit horizontal widening adds (Fixes #6859) by @rootjalex in #6873
- [Codegen_LLVM] Radically simplify `visit(const Reinterpret *op)` by @LebedevRI in #6865
- [Codegen] Fail to codegen `Call::undef`, just like `Call::signed_integer_overflow` by @LebedevRI in #6871
- Fix error in Makefile for Adams2019 on OSX by @steven-johnson in #6877
- Refactor/cleanup in Autoscheduler code by @steven-johnson in #6858
- Ensure $CMAKE_{lang}_OUTPUT_EXTENSION is set before using it by @shoaibkamil in #6879
- #6863 - Fixes to make address sanitizer happy for internal runtime classes by @derek-gerstmann in #6880
- [Codegen_LLVM] Define all the things by @LebedevRI in #6866
- Add set-host-dirty/copy-to-host to PythonExtensionGen by @steven-johnson in #6869
- Rewrite PythonExtensionGen to be C++ based by @steven-johnson in #6888
- Fixes to allow compiling with LLVM16 by @steven-johnson in #6889
- Add support for generating x86 sum-of-absolute-difference reductions by @abadams in #6872
- Remove (most) of the env var usage from Adams2019 by @steven-johnson in #6861
- [vulkan phase1] Add SPIR-V IR by @derek-gerstmann in #6882
- Add `auto_schedule` label to Adams2019 and Li2018 tests in CMake by @steven-johnson in #6898
- [Simplify] Drop no-op single-input identity shuffles by @LebedevRI in #6901
- [Codegen_LLVM] Annotate LLVM IR functions with `nounwind`/`mustprogress` attributes by @LebedevRI in #6897
- Don't try to fold saturating_sub of VectorReduce by @rootjalex in #6896
- Upgrade clang-format and clang-tidy to v14 (v2) by @steven-johnson in #6902
- Allow AMX instructions with K dimension larger than 4 bytes by @frengels in #6582
- Fix autoscheduling trivial lut wrappers by @abadams in #6905
- Fix broken Makefile rules for autoschedulers on OSX by @steven-johnson in #6906
- LICENSE.txt: Include full text of Apache 2.0 license (not just the 'header' version) by @steven-johnson in #6912
- LICENSE.txt: add spirv license by @steven-johnson in #6913
- LICENSE.txt: add BLAS license. by @steven-johnson in #6914
- Upgrade CMake minimum version to 3.22 by @steven-johnson in #6916
- Remove unused GHA and packaging workflows. by @alexreinking in #6917
- Fix two warnings found with clang 16 by @steven-johnson in #6918
- Fix bug when realize condition depends on tuple call by @abadams in #6915
- Fix wrong install path for *.py files by @steven-johnson in #6921
- Make use of CMake 3.22 features by @alexreinking in #6919
- Make saturating_cast an intrinsic by @rootjalex in #6900
- Halide::Error should not extend std::runtime_error by @steven-johnson in #6927
- Rework internal PYTHONPATH maintenance by @steven-johnson in #6922
- Tutorial 10 needs to be skipped for Python when targeting Wasm (just as non-Python does) by @steven-johnson in #6932
- Add build & test presets for release and debug CMake builds by @steven-johnson in #6934
- Add ASAN support to CMake via toolchain file by @steven-johnson in #6920
- Fix badly-merged CMakePresets.json file by @steven-johnson in #6936
- Add minimal useful implementation of extracting and concatenating bits by @abadams in #6928
- Export HalidePythonExtensionHelpers.cmake for installs by @steven-johnson in #6941
- Add/update Python Readme by @steven-johnson in #6939
- Don't throw an exception from generate_filter_main by @steven-johnson in #6946
- Handle saturating_cast in compute_expr_cost() by @rootjalex in #6947
- Two quick build fixes by @alexreinking in #6950
- Remove add_python_aot_extension() rule in CMake by @steven-johnson in #6949
- Build fixes for manylinux2014 by @alexreinking in #6953
- Remove add_python_stub_extension(), adding the functionality to add_halide_generator() instead by @steven-johnson in #6952
- [HVX] Fix state_var issue by @rootjalex in #6894
- Fix RPATH for Python wheels on macOS by @alexreinking in #6958
- Python: don't crash for repr(Expr()) by @steven-johnson in #6962
- Some minor top-level CMakeLists.txt reorganization by @alexreinking in #6957
- CMake packaging fixes by @alexreinking in #6966
- Use CMake target to handle vendored SPIRV headers by @alexreinking in #6968
- Don't cache Halide_ASAN_ENABLED by @alexreinking in #6969
- Lower saturating_cast in bounds inference by @rootjalex in #6970
- Small refactor to remove confusion between CodeGen_LLVM and CodeGen_Internal. by @zvookin in #6973
- Fix XCode by wrapping weights in an OBJECT library by @alexreinking in #6977
- Add test for _Halide_target_export_single_symbol by @steven-johnson in #6983
- Fix markdown links by @alexreinking in #6988
- Improve error-handling in Python Extensions by @steven-johnson in #6986
- Refactor buffer-unpacking code in PythonExtensionGen by @steven-johnson in #6991
- Fixes for Xcode "new" build system. by @alexreinking in #6993
- Fix compiler warnings in Elf.cpp by @steven-johnson in #6992
- [Codegen] Adapt ModuleAddressSanitizerPass/ModuleSanitizerCoveragePass renaming by @MaskRay in #6996
- Apply _Halide_place_dll() to _Halide_gengen (#6999) by @steven-johnson in #7000
- Log target info in performance_fast_pow (#6997) by @steven-johnson in #6998
- Clean up Adams2019 CMake file by @steven-johnson in #7003
- Prohibit C99 VLA usage in runtime code by @steven-johnson in #7005
- Couple small fixes to update RISC V to current LLVM flags and enable vscale use. by @zvookin in #6995
- Fix Python handling of boolean buffers by @steven-johnson in #7006
- Fix some bugs in div_round_to_zero by @abadams in #7008
- [HVX] Simplify constant factor before distributing by @rootjalex in #7009
- Add one-sided widening intrinsics. by @rootjalex in #6967
- Rework Python Extension C++ code (again) by @steven-johnson in #7010
- Add minimum GitHub token permissions for workflow by @varunsh-coder in #7011
- Revert "[HVX] Simplify constant factor before distributing" by @steven-johnson in #7013
- Fix SpecificExpr canonicalization by @rootjalex in #7016
- Appease Python linter by @steven-johnson in #7022
- Don't use `-g` for EMCC by @steven-johnson in #7025
- Temporarily disable testing for apps/fft (#7033) by @steven-johnson in #7035
- Add reinterpret simplifications by @rootjalex in #7029
- Codegen_C for user_context by @steven-johnson in #7031
- Fix Wasm BulkMemory Codegen + Minor fixes to apps/HelloWasm by @steven-johnson in #7026
- Add stack-size-canary test to apps/fft's CMake file by @steven-johnson in #7034
- Handle widen_right_* intrinsics in bounds inference by @vksnk in #7039
- Revert "Temporarily disable testing for apps/fft (#7033)" by @steven-johnson in #7040
- Fix PyExt error handling by @steven-johnson in #7042
- add_requirement() maintenance by @steven-johnson in #7045
- Fix false positive use after free warning. by @zvookin in #7046
- Allow call_intrin to call an LLVM intrinsic with void return type. by @zvookin in #7048
- Allow CodeGen_LLVM::codegen_buffer_pointer to support vectors. by @zvookin in #7049
- Don't mutate GeneratorParams in PythonGenerators by @steven-johnson in #7052
- Allow redefinition of Generators when in interactive mode by @steven-johnson in #7053
- Upgrade wabt to 1.0.30 by @steven-johnson in #7058
- Add support for float16 buffer in python extension by @stevesuzuki-arm in #7060
- Add a terminate_handler to try to report unhandled exceptions by @steven-johnson in #7038
- Improve MSAN under JIT by @steven-johnson in #7059
- Autoscheduler test reorg, part 1 by @steven-johnson in #7064
- Autoscheduler test reorg, part 2 by @steven-johnson in #7065
- Autoscheduler test reorg, part 3 by @steven-johnson in #7067
- pacify clang-tidy by removing unused "using" by @steven-johnson in #7071
- Add pip packaging workflow to GHA by @alexreinking in #6938
- [HVX] Fix DistributeShiftsAsMuls by @rootjalex in #7083
- Support added for dot() instructions in Metal backend
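
Several entries in this list concern Halide's fixed-point helper intrinsics (`widening_mul`, `halving_sub`, `rounding_halving_add`). As a hedged reference model, the plain-Python sketch below states what these operations are understood to compute for signed inputs: the arithmetic is carried out as if in a wider type, so intermediate sums and differences cannot overflow. It illustrates the semantics only and is not Halide code; the helper `in_range` and the `bits` parameters are hypothetical.

```python
def in_range(x: int, bits: int) -> bool:
    """True if x fits in a signed integer of the given width."""
    return -(1 << (bits - 1)) <= x < (1 << (bits - 1))


def widening_mul(a: int, b: int, bits: int = 16) -> int:
    """Multiply two `bits`-wide signed values into a 2*bits-wide result."""
    assert in_range(a, bits) and in_range(b, bits)
    product = a * b
    # The result type is twice as wide, so the product always fits.
    assert in_range(product, 2 * bits)
    return product


def halving_sub(a: int, b: int, bits: int = 8) -> int:
    """(a - b) / 2, rounded down, with no intermediate overflow."""
    assert in_range(a, bits) and in_range(b, bits)
    return (a - b) >> 1


def rounding_halving_add(a: int, b: int, bits: int = 8) -> int:
    """(a + b + 1) / 2, rounded down, with no intermediate overflow."""
    assert in_range(a, bits) and in_range(b, bits)
    return (a + b + 1) >> 1


# 127 - (-128) = 255 does not fit in int8, but half of it does.
print(halving_sub(127, -128))
# The product of two int16 values can need up to 32 bits.
print(widening_mul(-32768, -32768))
```

Python integers are unbounded, which is what lets the sketch compute the wide intermediate directly; a backend instead has to pick instructions that give the same answer in fixed-width registers.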
Changes to public API since last release
- Add `add_halide_python_extension_library()` rule by @steven-johnson in #6979
- Add `add_halide_runtime` rule by @steven-johnson in #6985
- Remove deprecated `Halide::Output` type by @steven-johnson in #6685
- Remove deprecated `build()` support from Generators by @steven-johnson in #6684
- Remove deprecated versions of `Func::prefetch()` by @steven-johnson in #6698
- Remove deprecated JIT handler setters by @steven-johnson in #6699
- Drop support for Matlab extensions by @steven-johnson in #6696
- Revise PyStub calling convention for GeneratorParams by @steven-johnson in #6742
- Change stub module names in Python to be `_pystub` rather than `_stub` by @steven-johnson in #6830
New Deprecations (Upcoming API changes)
- Deprecate variadic-template version of Realization ctor by @steven-johnson in #6695
- Deprecate GeneratorContext getters with `get_` prefix by @steven-johnson in #6753
- Deprecate disable_llvm_loop_opt (#4113) by @steven-johnson in #6754
- Add Func::type()/types(), deprecate Func::output_type()/output_types() by @steven-johnson in #6772
- Deprecate/remove Generator::get_externs_map() and friends by @steven-johnson in #6844
- Rework autoscheduler API (#6788) by @steven-johnson in #6838
Other Notes
- Although there are commits relating to a Vulkan backend, this release of Halide doesn't provide Vulkan support (it's still a work in progress)
- The changes in #6754 may cause performance degradation (but usually only for poorly scheduled Halide code).
New Contributors
- @frengels made their first contribution in #6582
- @varunsh-coder made their first contribution in #7011