JitArm64_Integer: Carry flag optimizations #13251
Open
Sintendo wants to merge 12 commits into dolphin-emu:master from Sintendo:carry-opts
+191
−87
When the immediate is zero, we can load the carry flag from memory directly to the destination register.

Before:
  0x394bd3b8  ldrb w24, [x29, #0x2f4]
  0x2a1803f9  mov  w25, w24
After:
  0x394bd3b9  ldrb w25, [x29, #0x2f4]
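As a rough model of why this shortcut is sound (a sketch only, not Dolphin's code): the carry is kept in memory as a single byte holding 0 or 1, so a zero-extending byte load already equals carry + 0. `GuestState`, its `carry` field, and the function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the emulated CPU state. The field name is an
// assumption; the point is that the carry lives in memory as one byte (0 or 1).
struct GuestState
{
  std::uint8_t carry;
};

// General shape: load the carry byte into a temporary, then add the immediate.
inline std::uint32_t AddImmToStoredCarry(const GuestState& state, std::uint32_t imm)
{
  const std::uint32_t tmp = state.carry;  // ldrb tmp, [state, #offset]
  return tmp + imm;                       // add dest, tmp, #imm (a plain mov when imm == 0)
}

// Zero-immediate shape: the zero-extending byte load already yields the result.
inline std::uint32_t LoadStoredCarry(const GuestState& state)
{
  return state.carry;  // ldrb dest, [state, #offset]
}

// Checks that both shapes agree for either carry value.
inline void CheckZeroImmediateShortcut()
{
  for (std::uint8_t c = 0; c <= 1; ++c)
  {
    const GuestState state{c};
    assert(AddImmToStoredCarry(state, 0u) == LoadStoredCarry(state));
  }
}
```

Calling `CheckZeroImmediateShortcut()` exercises both carry values; for any non-zero immediate the temporary register and the add are still needed.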
The result is either -1 or 0 depending on the state of the carry flag. This can be done with a csetm instruction.

Before:
  0x1280001a  mov   w26, #-0x1 ; =-1
  0x1a1f035a  adc   w26, w26, wzr
After:
  0x5a9f23fa  csetm w26, lo
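The equivalence can be checked with a small C++ model of the two instructions. This is an illustrative sketch: `Adc` and `CsetmLo` are hypothetical helpers that mirror the AArch64 semantics of `adc` and `csetm ..., lo`, not functions from the emitter.

```cpp
#include <cassert>
#include <cstdint>

// Models AArch64 `adc wd, wn, wm`: wd = wn + wm + C, wrapping at 32 bits.
constexpr std::uint32_t Adc(std::uint32_t n, std::uint32_t m, bool carry)
{
  return n + m + (carry ? 1u : 0u);
}

// Models `csetm wd, lo`: all ones when the carry flag is clear (lo), else zero.
constexpr std::uint32_t CsetmLo(bool carry)
{
  return carry ? 0u : 0xFFFFFFFFu;
}

// Checks that -1 + carry matches csetm for both carry states.
inline void CheckCsetmMatchesAdc()
{
  assert(Adc(0xFFFFFFFFu, 0u, false) == CsetmLo(false));  // carry clear: stays -1
  assert(Adc(0xFFFFFFFFu, 0u, true) == CsetmLo(true));    // carry set: wraps to 0
}
```

Since the carry flag has only two states, the two assertions cover every case.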
When both the input register and the carry flag are constants, the result can be precomputed.

Before:
  0x52800016  mov w22, #0x0 ; =0
  0x2a3603f6  mvn w22, w22
After:
Same optimization we did for subfex. Skip loading the carry flag into a temporary register first when we're dealing with zero.

Before:
  0x394bd3b8  ldrb w24, [x29, #0x2f4]
  0x2a1803f9  mov  w25, w24
After:
  0x394bd3b9  ldrb w25, [x29, #0x2f4]
Similar to what we did for subfex, but for 0.

Before:
  0x5280001b  mov  w27, #0x0 ; =0
  0x1a1f037b  adc  w27, w27, wzr
After:
  0x1a9f37fb  cset w27, hs
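Taken together with the csetm case, the choice of sequence when the carry is live in the host flags could be sketched as below. This is a model under assumptions, not the JIT's code: `HostCarrySeq`, `SelectHostCarrySeq`, and `Evaluate` are hypothetical names, and the fallback is the mov + adc pair from the Before listings.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the instruction sequences discussed in these commits.
enum class HostCarrySeq
{
  CsetHs,     // cset  wd, hs                   (imm == 0)
  CsetmLo,    // csetm wd, lo                   (imm == -1)
  MovThenAdc  // mov wd, #imm ; adc wd, wd, wzr (anything else)
};

// Picks a sequence for "imm + carry" when the carry is in the host carry flag.
constexpr HostCarrySeq SelectHostCarrySeq(std::uint32_t imm)
{
  return imm == 0u          ? HostCarrySeq::CsetHs :
         imm == 0xFFFFFFFFu ? HostCarrySeq::CsetmLo :
                              HostCarrySeq::MovThenAdc;
}

// Evaluates what each sequence leaves in the destination register.
constexpr std::uint32_t Evaluate(HostCarrySeq seq, std::uint32_t imm, bool carry)
{
  return seq == HostCarrySeq::CsetHs  ? (carry ? 1u : 0u) :
         seq == HostCarrySeq::CsetmLo ? (carry ? 0u : 0xFFFFFFFFu) :
                                        imm + (carry ? 1u : 0u);
}

// Checks that every selected sequence still computes imm + carry.
inline void CheckHostCarrySelection()
{
  const std::uint32_t imms[] = {0u, 0xFFFFFFFFu, 8u};
  for (std::uint32_t imm : imms)
  {
    for (int c = 0; c <= 1; ++c)
    {
      const bool carry = c != 0;
      assert(Evaluate(SelectHostCarrySeq(imm), imm, carry) == imm + (carry ? 1u : 0u));
    }
  }
}
```

The single-instruction forms avoid materializing the immediate; only other immediates still need the two-instruction fallback.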
Same thing we did for subfex.

Before:
  0x1280001a  mov   w26, #-0x1 ; =-1
  0x1a1f035a  adc   w26, w26, wzr
After:
  0x5a9f23fa  csetm w26, lo
When the input register and carry flags are known, we can always precompute the result. We still materialize the immediate when the condition register needs to be updated, but this seems to be a general problem. I might look into that one day, but for now this'll do.

- ConstantFalse
  Before:
    0x52800119  mov w25, #0x8 ; =8
    0x2a1903fa  mov w26, w25
  After:
    N/A

- ConstantTrue
  Before:
    0x52800119  mov w25, #0x8 ; =8
    0x1100073a  add w26, w25, #0x1
  After:
    N/A
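The folding described here can be sketched as a small C++ function. The four carry states are the ones named in this PR. The enum name `CarryFlag`, the `FoldResult` struct, and `FoldAddCarry` are assumptions made for illustration, and "input + carry" is inferred from the listings above rather than taken from the JIT source.

```cpp
#include <cassert>
#include <cstdint>

// Carry states named in this PR; the enum itself is only a model.
enum class CarryFlag
{
  InPPCState,
  InHostCarry,
  ConstantFalse,
  ConstantTrue
};

// Hypothetical result type: `known` is false when folding is not possible.
struct FoldResult
{
  bool known;
  std::uint32_t value;
};

// Folds "input + carry" when the input is a known constant and the carry
// state is one of the two constant states.
inline FoldResult FoldAddCarry(std::uint32_t input, CarryFlag flag)
{
  switch (flag)
  {
  case CarryFlag::ConstantFalse:
    return {true, input};
  case CarryFlag::ConstantTrue:
    return {true, input + 1u};  // wraps at 32 bits, like the guest register
  default:
    return {false, 0u};  // carry only known at run time
  }
}

// Reproduces the two cases from the commit message (input = 8).
inline void CheckFoldExamples()
{
  assert(FoldAddCarry(8u, CarryFlag::ConstantFalse).value == 8u);
  assert(FoldAddCarry(8u, CarryFlag::ConstantTrue).value == 9u);
}
```

With a folded result, the destination can be tracked as a constant and no instruction needs to be emitted for it, which is consistent with the "After: N/A" entries.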
Before:
  0x52800019  mov  w25, #0x0 ; =0
  0x394bd3b8  ldrb w24, [x29, #0x2f4]
  0x2b180339  adds w25, w25, w24
After:
  0x394bd3b9  ldrb w25, [x29, #0x2f4]
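The `adds` in the Before sequence produces a carry-out as well as a sum, so dropping it is only sound if that carry-out is predictable. A small model suggests it is: adding a 0-or-1 carry byte to zero can never overflow 32 bits, so the carry-out is always clear. How the JIT records this is not shown here; `AddsResult` and `Adds` below are hypothetical helpers for the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair holding what `adds` produces: the sum and the carry flag.
struct AddsResult
{
  std::uint32_t value;
  bool carry_out;
};

// Models AArch64 `adds wd, wn, wm`: 32-bit sum plus the unsigned carry-out.
inline AddsResult Adds(std::uint32_t n, std::uint32_t m)
{
  const std::uint64_t wide = static_cast<std::uint64_t>(n) + m;
  return {static_cast<std::uint32_t>(wide), (wide >> 32) != 0};
}

// Checks that 0 + carry_byte equals the byte itself and never carries out.
inline void CheckZeroPlusCarry()
{
  for (std::uint32_t carry_byte = 0; carry_byte <= 1; ++carry_byte)
  {
    const AddsResult r = Adds(0u, carry_byte);
    assert(r.value == carry_byte);
    assert(!r.carry_out);
  }
}
```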
The last commit can remove the Common/Unreachable.h include. LGTM otherwise.
Before:
  0x5280000d  mov  w13, #0x0 ; =0
  0x1a1f01ae  adc  w14, w13, wzr
After:
  0x1a9f37ee  cset w14, hs
JosJuice approved these changes Dec 29, 2024
This PR implements a number of carry flag optimizations that were previously missed.
Optimize InPPCState for 0
Optimize InHostCarry for 0
Optimize InHostCarry for -1
Optimize ConstantFalse for 0
Optimize ConstantTrue for 0
Optimize InPPCState for 0
Optimize InHostCarry for 0
Optimize InPPCState for 0
Optimize InHostCarry for -1
Constant folding