Releases: bitbank2/JPEGDEC
Odd-width fixes
Fixed a regression in the latest code
Fixed an accidental regression in the last release
Added initial support for Progressive decoding
This release adds initial support for progressive decoding. Progressive decoding normally requires a lot of RAM because the entire image must be kept in memory while each successive "scan" adds more detail to the DCT blocks. For this release, I added support for decoding only the first scan, which normally contains just the DC values. This allows me to offer a thumbnail-sized (1/8 scale) decode without requiring more RAM. I can add full progressive support later, but very few MCU projects would need it (or have the large amount of spare RAM to use it).
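To illustrate why the first (DC-only) scan is enough for a 1/8-scale thumbnail: each 8x8 block collapses to a single pixel, and with the standard JPEG DCT scaling the dequantized DC coefficient equals 8 times the block's mean sample value after the 128 level shift. The C sketch below uses hypothetical helper names (thumb_dim, dc_to_pixel); it is not JPEGDEC's code and ignores chroma and color conversion.

```c
/* Hypothetical helpers for illustration; not part of JPEGDEC's API. */

/* Each 8x8 block becomes one thumbnail pixel, so partial blocks round up. */
static int thumb_dim(int full_size)
{
    return (full_size + 7) / 8;
}

/* With standard JPEG DCT scaling, DC = 8 * mean(sample - 128), so the
   block average is DC / 8 + 128. Integer division truncates toward zero;
   a production decoder would round. The result is clamped to 0..255. */
static unsigned char dc_to_pixel(int dequantized_dc)
{
    int v = dequantized_dc / 8 + 128;
    if (v < 0) v = 0;
    if (v > 255) v = 255;
    return (unsigned char)v;
}
```

For example, a 938x698 image yields a 118x88 thumbnail, and a DC value of 0 maps to mid-gray (128).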
Shoutout to Tobias Butler (Tuneshine) for sponsoring this feature.
Added crop area feature
This release adds the ability to specify a cropped area for decoding. This can save significant time and memory if your program only needs to decode a sub-region of an image. The crop area must lie on MCU boundaries; if it does not, it will be adjusted automatically. Two new methods were added:
setCropArea()
getCropArea()
These are documented in the Wiki.
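Snapping a requested rectangle to MCU boundaries might look like the C sketch below. The crop_rect type and snap_crop_to_mcu function are hypothetical and are not the signatures of setCropArea()/getCropArea(); the policy shown (expand outward so the requested pixels stay covered, then clamp to the image) is an assumption, so check the Wiki for how JPEGDEC actually adjusts the area. MCU size depends on subsampling, e.g. 8x8 for 4:4:4 and 16x16 for 4:2:0.

```c
/* Hypothetical rectangle type for illustration (not JPEGDEC's API). */
typedef struct { int x, y, w, h; } crop_rect;

static int round_down(int v, int m) { return (v / m) * m; }
static int round_up(int v, int m) { return ((v + m - 1) / m) * m; }

/* Expand a requested crop outward to MCU boundaries so the requested pixels
   are still covered, then clamp the right/bottom edges to the image.
   Assumes 0 <= r.x < img_w and 0 <= r.y < img_h.
   mcu_w x mcu_h: 8x8 for 4:4:4 or grayscale, 16x8 for 4:2:2, 16x16 for 4:2:0. */
static crop_rect snap_crop_to_mcu(crop_rect r, int img_w, int img_h,
                                  int mcu_w, int mcu_h)
{
    int x0 = round_down(r.x, mcu_w);
    int y0 = round_down(r.y, mcu_h);
    int x1 = round_up(r.x + r.w, mcu_w);
    int y1 = round_up(r.y + r.h, mcu_h);
    crop_rect out;

    if (x1 > img_w) x1 = img_w;
    if (y1 > img_h) y1 = img_h;
    out.x = x0;
    out.y = y0;
    out.w = x1 - x0;
    out.h = y1 - y0;
    return out;
}
```

With 16x16 MCUs (4:2:0), a request for x=10, y=20, 100x50 becomes x=0, y=16, 112x64.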
Fixes for Arduino targets
Recent changes for Intel and Arm64 broke the MCU reference code. I stabilized the code again, but had to temporarily disable the Cortex-M4/M7 SIMD optimizations; I'll work on a new release to re-enable them. Meanwhile, on x86/x64 and Arm64 the new optimizations make the code considerably faster.
Fix 16-byte alignment of buffers for ESP32-S3 SIMD
I thought I had this working properly in build 1.4.0, but a user alerted me that the 16-byte alignment of the pixel buffer and MCU buffers wasn't always guaranteed, so this release adds specific code to ensure that both of those buffers are always 16-byte aligned.
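A common way to guarantee 16-byte alignment regardless of where the underlying storage lands is to reserve 15 spare bytes and round the pointer up. This is a generic C sketch of that technique with a hypothetical buffer name and size, not JPEGDEC's actual code.

```c
#include <stdint.h>

#define PIXEL_BUF_SIZE 1024  /* hypothetical buffer size in bytes */

/* 15 bytes of slack guarantee an aligned PIXEL_BUF_SIZE window fits inside. */
static uint8_t pixel_storage[PIXEL_BUF_SIZE + 15];

/* Round a pointer up to the next multiple of 16 (unchanged if already aligned). */
static void *align16(void *p)
{
    return (void *)(((uintptr_t)p + 15u) & ~(uintptr_t)15u);
}

/* Returns a 16-byte-aligned pointer into pixel_storage for SIMD loads/stores. */
static uint8_t *get_pixel_buffer(void)
{
    return (uint8_t *)align16(pixel_storage);
}
```

For a plain static array, C11's `_Alignas(16)` achieves the same result; the round-up approach also works when the buffer lives inside a struct or a heap block whose alignment you don't control.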
Added ESP32-S3 SIMD
This release adds initial support for accelerating decoding with the ESP32-S3's SIMD instructions. My measurements show a 20-40% speedup depending on the options. I wrote a short blog post about how I figured out how to use these instructions:
https://bitbanksoftware.blogspot.com/2024/01/surprise-esp32-s3-has-few-simd.html
Fixed SIMD conflict on Cortex-M
In the last release I added NEON optimizations for the output stage; unfortunately, they were also enabled for Cortex-M targets, which caused a compiler error. This release fixes the issue.
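One way to keep NEON code away from Cortex-M targets is to key it off the ACLE feature macro `__ARM_NEON` (or the legacy spelling `__ARM_NEON__`), which is only defined when Advanced SIMD is available. Cortex-M cores don't have Advanced SIMD, so they fall through to the scalar loop instead of failing to compile. The add_sat_u8 function below is a hypothetical example, not JPEGDEC's output stage.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0  /* e.g. Cortex-M, x86: scalar path only */
#endif

/* Saturating add of two byte arrays: dst[i] = min(a[i] + b[i], 255). */
static void add_sat_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if HAVE_NEON
    /* Process 16 bytes at a time with the NEON saturating add. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, vqaddq_u8(va, vb));
    }
#endif
    /* Scalar tail (or the whole array when NEON is unavailable). */
    for (; i < n; i++) {
        unsigned s = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(s > 255u ? 255u : s);
    }
}
```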
RGBA 32-bit and initial aarch64 SIMD
I corrected some errors across the 16 combinations of subsampling and scaling options. I also added experimental code to optimize the color conversion on aarch64 (Arm NEON) for 4:2:0 subsampling at full-size output. On my MacBook Air M1, this nearly doubles the decode speed: a 126KB 938x698 file now decodes in just 8 milliseconds (previously 15 milliseconds). I could optimize this code further for x86 and Arm desktop use, but I need to evaluate whether it's worth the time. I believe my code can beat libjpeg-turbo in certain situations (if I fully deploy SIMD optimizations). Please let me know if you need this code optimized for your desktop application.
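For context, the color conversion being accelerated maps YCbCr to RGB, and with 4:2:0 subsampling each Cb/Cr pair is shared by a 2x2 group of luma samples. The scalar sketch below uses the standard JFIF equations in 16.16 fixed point on planar input; the function names and planar layout are assumptions for illustration, not how JPEGDEC organizes its MCU buffers or its NEON code.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb888;

static uint8_t clamp255(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Round a 16.16 fixed-point value to the nearest integer (halves away from
   zero) without right-shifting a negative number, which is
   implementation-defined in C. */
static int descale16(int32_t v)
{
    return v >= 0 ? (int)((v + 32768) >> 16) : -(int)((-v + 32768) >> 16);
}

/* JFIF YCbCr -> RGB with coefficients scaled by 65536:
   1.402 -> 91881, 0.344136 -> 22554, 0.714136 -> 46802, 1.772 -> 116130.
   int32_t casts keep the products safe on 16-bit-int targets (e.g. AVR). */
static rgb888 ycc_to_rgb(int y, int cb, int cr)
{
    rgb888 p;
    cb -= 128;
    cr -= 128;
    p.r = clamp255(y + descale16((int32_t)91881 * cr));
    p.g = clamp255(y - descale16((int32_t)22554 * cb + (int32_t)46802 * cr));
    p.b = clamp255(y + descale16((int32_t)116130 * cb));
    return p;
}

/* 4:2:0 planar input: y is w*h samples, cb and cr are (w/2)*(h/2).
   Assumes w and h are even. */
static void ycc420_to_rgb(const uint8_t *y, const uint8_t *cb, const uint8_t *cr,
                          int w, int h, rgb888 *out)
{
    for (int row = 0; row < h; row++) {
        for (int col = 0; col < w; col++) {
            /* One chroma sample is shared by each 2x2 group of luma samples. */
            int ci = (row / 2) * (w / 2) + (col / 2);
            out[row * w + col] = ycc_to_rgb(y[row * w + col], cb[ci], cr[ci]);
        }
    }
}
```

The inner loop does the same arithmetic for every pixel with no data-dependent branching beyond the clamps, which is what makes it a good fit for SIMD.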
Fixed compiler warnings for new RGB8888 code
Warnings about mismatched pointer types (uint32_t * vs. uint16_t *) can become errors depending on the compiler settings. This change casts all of the pointers to the correct type.
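The sketch below shows the kind of explicit cast involved: a hypothetical draw_info struct stores its pixel pointer as uint16_t * (sized for RGB565), while a 32-bit mode writes RGBA8888 values through the same memory. Without the cast, GCC and Clang report -Wincompatible-pointer-types in C, which some compiler settings turn into an error, and C++ rejects the implicit conversion outright. The names are illustrative, not JPEGDEC's API, and the buffer is assumed to be allocated as uint32_t (so it is 4-byte aligned).

```c
#include <stdint.h>

/* Hypothetical output descriptor: pPixels is declared for RGB565, but a
   32-bit mode stores RGBA8888 pixels in the same memory. */
typedef struct {
    uint16_t *pPixels;
    int width, height;
} draw_info;

/* 5-6-5 packing used by the 16-bit mode. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

/* On a little-endian CPU this lays out bytes as R, G, B, A in memory. */
static uint32_t pack_rgba8888(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint32_t)r | ((uint32_t)g << 8) | ((uint32_t)b << 16) | ((uint32_t)a << 24);
}

static void fill_32bpp(draw_info *d, uint32_t color)
{
    /* Explicit cast to the type actually stored; assumes 4-byte alignment. */
    uint32_t *p = (uint32_t *)d->pPixels;
    for (int i = 0; i < d->width * d->height; i++)
        p[i] = color;
}
```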