-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Page-agnostic block trampolines #317
base: master
Are you sure you want to change the base?
Conversation
See #271 |
Seems like the cross build for PowerPC was always running on 4KiB pages. Ready for review. |
Do not assume a certain page size at compile time, as it might change during runtime. Android 15 has support for 16 KiB pages and expects an application to support both 4 KiB and 16 KiB pages.
Instead of performing a PC-relative load with PAGE_SIZE as negative offset, we fix up the trampoline in the `alloc_trampolines` function.
Much like libdispatch project, shim.c implements APIs that are missing on platforms. Currently, this only affects Windows.
Some block trampolines hard-code their page sizes for simplicity. Check if we violate that assumption when building in debug mode.
@davidchisnall would it be possible for you to grant me permission to set up a self-hosted aarch64 runner? I’d like to add an Android CI using the Pre-Release 16 KiB Page Size arm64-v8a System Image. Additionally, we should consider consolidating some runners, possibly removing CIs with Clang versions older than 14. |
Android Test ResultsSystemI've setup an Android Toolchain and Emulator with the following packages:
It correctly reports a 16 KiB page size: emu64a16k:/ # getconf PAGESIZE
16384 Test ResultsMaster (771f7c4)
Log
This branch
Log
|
ping :) |
Sorry, I've been off ill for a couple of weeks and just catching up. If I understand correctly, this is now embedding the displacement in the code and loading it. I don't really like that: it increases the code size, and that wastes more space in the data page, for a fairly niche corner case. On 32-bit platforms, we are very unlikely to ever see page sizes that are not 4 KiB. On 64-bit ones, we typically have, at most, two out of 4, 16, and 64 KiB as the smallest granule size. It would be less code to dynamically select between those cases. Note that the only down side in using larger pages is if we need more instructions to find the address of the data page. This means that, if you need to support both 16 and 64 KiB page, it probably isn't worth supporting 16 KiB, just round up to 64 KiB. On AArch64, I think we just need an extra adrp instruction. That, unfortunately, pushes the code size up to 20 bytes per trampoline. We currently store both the block pointer and the block invoke pointer in the data page. Given that the block pointer is a fixed offset, you possibly could have one page of block pointers and three pages of code, with the displacement calculated by rounding down the current address and subtracting offsets, if you could fit that in three more instructions. If not, we'll just have to waste some space on platforms with 16 KiB pages. |
On AArch64 we can probably patch the imm of the ADR op code, thus not increasing the size of the trampoline. From https://developer.arm.com/documentation/dui0801/h/A64-General-Instructions/ADR
Would this mean that we have to allocate 16 pages when running on a platform with 4KiB pages? |
Yes, but that's still only 128 KiB of total (code + data regions), and then we allocate trampolines from within that. I think these days it's probably fine to waste 120 KiB of memory in a process on a 64-bit system. For systems with 4 KiB pages, it's better to keep the current code, but if a system supports 16 KiB and 64 KiB page sizes, the effort in supporting 16 KiB probably isn't worth it. The size that we care about is the minimum protection granule: we need to be able to |
Got it. So you prefer this approach to patching op code immediates? |
If we have to patch immediates, I guess that's okay. Ideally I'd prefer that we have 1-2 versions per arch and we select the appropriate one. |
Huh that is actually the simplest thing to do: Just have 4k, 16k, 64k trampolines for AArch64 (and probably RISC-V and PowerPC) and copy the correct trampoline based on |
This is a block trampoline implementation which is page-agnostic, instead relying on a fix-up when copied into the RX buffer. It is currently UB to run libobjc2 on platforms with a page size other than 4KiB (except PPC64 which is configured as 64KiB in libobjc2).
There is a also a hard-coded page size in
pool.h
which I have not addressed yet.Before rewriting all block trampolines, I want to ensure my approach is sound, so I have only adjusted the AArch64 trampoline for now. Suggestions are welcome.