introduce std.heap.SmpAllocator #22808
An allocator intended to be used in -OReleaseFast mode when multi-threading is enabled.
and no need for special handling of wasi and windows since we don't ask for anything more than page-aligned.
In main, now this allocator is chosen by default when compiling without libc in ReleaseFast or ReleaseSmall, and not targeting WebAssembly.
rotate a couple times before resorting to mapping more memory.
it was always returning max_cpu_count
* slab length reduced to 64K
* track freelist length with u8s
* on free(), rotate if freelist length exceeds max_freelist_len

Prevents memory leakage in the scenario where one thread only allocates and another thread only frees.
const cpu_count = @atomicLoad(u32, &global.cpu_count, .unordered);
if (cpu_count != 0) return cpu_count;
const n: u32 = @min(std.Thread.getCpuCount() catch max_thread_count, max_thread_count);
return if (@cmpxchgStrong(u32, &global.cpu_count, 0, n, .monotonic, .monotonic)) |other| other else n;
This could be an atomicStore, unless you expect Thread.getCpuCount() to return different results on different threads.
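The reviewer's suggestion can be sketched roughly as follows. This is a standalone illustration, not the PR's code: `global_cpu_count` and the value of `max_thread_count` are stand-ins chosen here, and the sketch assumes every racing thread computes the same `n`, which is exactly the condition the comment raises.

```zig
const std = @import("std");

// Hypothetical stand-ins for the globals referenced in the diff above.
const max_thread_count = 128;
var global_cpu_count: u32 = 0;

fn getCpuCount() u32 {
    // Fast path: the count was already cached by some thread.
    const cached = @atomicLoad(u32, &global_cpu_count, .unordered);
    if (cached != 0) return cached;
    const n: u32 = @min(std.Thread.getCpuCount() catch max_thread_count, max_thread_count);
    // Assumption: racing threads all compute the same n, so whichever
    // store lands last writes the same value and no compare-exchange is needed.
    @atomicStore(u32, &global_cpu_count, n, .monotonic);
    return n;
}
```

If the OS could report different counts to different threads, the compare-exchange in the diff is the variant that keeps the first published value stable.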
}
const cpu_count = getCpuCount();
assert(cpu_count != 0);
while (true) {
At some point, this should probably use t.mutex.lock(), otherwise this is a spinlock. Maybe after the first for (0..cpu_count) tryLocks?
An allocator that is designed for ReleaseFast optimization mode, with multi-threading enabled.
This allocator is a singleton; it uses global state and only one should be instantiated for the entire process.
This is a "sweet spot" - the implementation is about 200 lines of code and yet competitive with glibc performance.
Basic Design
Each thread gets a separate freelist; however, the data must be recoverable when the thread exits. We do not directly learn when a thread exits, so occasionally, one thread must attempt to reclaim another thread's resources.
Allocations above a certain size are memory mapped directly, with no storage of allocation metadata. This works because the implementation refuses resizes that would move an allocation from the small category to the large category or vice versa.
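A minimal sketch of why no metadata is needed, assuming a hypothetical `large_threshold` (the real cutoff is not stated here): because Zig's allocator interface hands the allocation's length back on resize and free, the category can be recomputed from the length alone, as long as a resize never crosses the boundary.

```zig
const std = @import("std");

// Hypothetical cutoff, chosen only for illustration.
const large_threshold: usize = 64 * 1024;

/// Classify an allocation purely by its length.
fn isLarge(len: usize) bool {
    return len >= large_threshold;
}

/// An in-place resize is only considered when both lengths fall in the
/// same category; otherwise the length seen at free() time would no
/// longer say how the memory was originally obtained.
fn resizeAllowed(old_len: usize, new_len: usize) bool {
    return isLarge(old_len) == isLarge(new_len);
}
```

When the resize is refused, the caller falls back to allocating a new block, copying, and freeing the old one, so each block is always freed through the same path that created it.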
Each allocator operation checks the thread identifier from a threadlocal variable to find out which metadata in the global state to access, and attempts to grab its lock. This will usually succeed without contention, unless another thread has been assigned the same id. In the case of such contention, the thread moves on to the next thread metadata slot and repeats the process of attempting to obtain the lock.
Limiting the thread-local metadata array to the same length as the CPU count ensures that as threads are created and destroyed, they cycle through the full set of freelists.
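The slot selection described above might look roughly like this. It is a simplified sketch rather than the PR's implementation: `Slot`, `slots`, `slot_count`, and `lockSlot` are hypothetical names, the slot count is fixed at 4 instead of being derived from the CPU count, and the freelist data itself is omitted.

```zig
const std = @import("std");

// Hypothetical fixed slot count; the design above sizes this by CPU count.
const slot_count = 4;

const Slot = struct {
    mutex: std.Thread.Mutex = .{},
};

var slots = [_]Slot{.{}} ** slot_count;

// Each thread remembers the slot it last locked successfully.
threadlocal var thread_index: usize = 0;

/// Returns a locked slot; the caller must call `slot.mutex.unlock()`.
fn lockSlot() *Slot {
    var index = thread_index;
    while (true) {
        const slot = &slots[index];
        if (slot.mutex.tryLock()) {
            // Keep this slot so the next operation usually succeeds at once.
            thread_index = index;
            return slot;
        }
        // Contention: another thread holds this slot, try the next one.
        index = (index + 1) % slot_count;
    }
}
```

Because the index wraps at `slot_count`, a newly created thread that collides with an existing one drifts to a free slot and stays there, which is how the freelists of exited threads get picked up again.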
Performance Data Points
This is building hello world with glibc vs SmpAllocator:
(0.14.0-dev.3145+6a6e72fff)
stage3/bin/zig build -p glibc -Doptimize=ReleaseFast -Dno-lib -Dforce-link-libc
stage3/bin/zig build -p SmpAllocator -Doptimize=ReleaseFast -Dno-lib
The second build, without libc, now uses SmpAllocator rather than DebugAllocator with this build configuration.
A particularly allocation-heavy ast-check:
Building the self-hosted compiler:
more performance data points
How to use it
Put something like this in your main function:
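A sketch of such a main function, assuming the `std.heap.smp_allocator` and `std.heap.DebugAllocator` names from this Zig development version. The choice of DebugAllocator for safe build modes and the small allocation at the end are illustrative additions, and target-specific cases such as WebAssembly are left out.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Used only in Debug and ReleaseSafe builds, where leak checking is wanted.
var debug_allocator: std.heap.DebugAllocator(.{}) = .init;

pub fn main() !void {
    // Pick the allocator at compile time based on the optimization mode.
    const gpa, const is_debug = switch (builtin.mode) {
        .Debug, .ReleaseSafe => .{ debug_allocator.allocator(), true },
        .ReleaseFast, .ReleaseSmall => .{ std.heap.smp_allocator, false },
    };
    defer if (is_debug) {
        // Reports leaks in safe modes; SmpAllocator needs no teardown.
        _ = debug_allocator.deinit();
    };

    // Illustrative use of the chosen allocator.
    const buf = try gpa.alloc(u8, 16);
    defer gpa.free(buf);
}
```

Since SmpAllocator is a singleton with global state, the sketch refers to the shared `std.heap.smp_allocator` value rather than constructing an instance.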
Follow-up issues