You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jan 6, 2022. It is now read-only.
In the above, things like REG_EDI would be the unique IDs of values for the registers. So probably raw_operation wouldn't start at 0.
From here, we'd want rules that do some basic things like copy propagation. This means definition the post-instruction register state. These rules would exist in a separare datalog db, with one instance per function. The idea would be to run these, have them publish messages that we'd store back into the main db, then destroy these instances until they're needed again. Key idea: throwaway databases.
; Orchestrated by creator of throwaway db, unique values for `ValOnFuncEntry`.
#message func_entry_reg_state(FuncEA, RegName, ValOnFuncEntry)
; Sent from the users of the main DB to us, based off of `transfer`.
#message local_transfer(FromEA, ToEA)
; !!!!
; Could have a bunch of messages, e.g. for constants or symbolic values that are
; user-provided, for interactive constant folding / symexec during a speculative execution
; or speculative specialization of a function.
; !!!!
; Base case for beginning of a function: each register has a tombstone `VAL_ON_FUNC_ENTRY` value
; on entry to a function.
post_inst_reg_state(FuncEA, RegName, ValOnFuncEntry)
: func_entry_reg_state(FuncEA, RegName, ValOnFuncEntry).
, !raw_operation(_, FuncEA, FuncEA, WRITE_REGISTER_32, RegName, _, _).
; Any operation that writes out the register value submits its value to the post-register state
; of that instruction.
post_inst_reg_state(InsnEA, RegName, WrittenId)
: raw_operation(_, FuncEA, InsnEA, WRITE_REGISTER_32, RegName, WrittenId, _).
; If we have a flow from `PredInsnEA` to `InsnEA`, and if `InsnEA` doesn't write to
; `RegName`, then propagate the value of `WrittenId` for `RegName`.
post_inst_reg_state(FuncEA, InsnEA, RegName, WrittenId)
: local_transfer(PredInsnEA, InsnEA)
, !raw_operation(_, FuncEA, InsnEA, WRITE_REGISTER_32, RegName, _, _)
, post_inst_reg_state(FuncEA, PredInsnEA, RegName, WrittenId).
I think with the simple rules above, execution would converge toward the following:
the datalog system would explore all possible paths through the function
the datalog system /should/ converge because the above doesn't do any actual constant folding, i.e. it never invents or introduces new values, just copied some around. It's tantamount to repeated matrix multiplication.
For every point in a function, we would be able to express register values in terms of register values from another place in the function. This is something that the GrammaTech people mentioned in their ddisasm paper. Basically, we could say: there exists a path such that the register written at instruction EA1 is read by the instruction at EA2.
I think one thing that becomes apparent from this type of "raw operation" representation is that we'd want to represent conditional register writes that either preserve the register contents or alter them, that way we can model those data-centric flows.
The text was updated successfully, but these errors were encountered:
I think it'd also be interesting to do a form of equality saturation over the IRs. This is be beneficial for loads/stores being converted into things like typed field accesses.
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
This issue serves to track ideas and goals related to integrating instruction semantics.
Basic example:
From this, we'd like to create the following approximate relations:
In the above, things like
REG_EDI
would be the unique IDs of values for the registers. So probablyraw_operation
wouldn't start at0
.From here, we'd want rules that do some basic things like copy propagation. This means definition the post-instruction register state. These rules would exist in a separare datalog db, with one instance per function. The idea would be to run these, have them publish messages that we'd store back into the main db, then destroy these instances until they're needed again. Key idea: throwaway databases.
I think with the simple rules above, execution would converge toward the following:
For every point in a function, we would be able to express register values in terms of register values from another place in the function. This is something that the GrammaTech people mentioned in their ddisasm paper. Basically, we could say: there exists a path such that the register written at instruction EA1 is read by the instruction at EA2.
I think one thing that becomes apparent from this type of "raw operation" representation is that we'd want to represent conditional register writes that either preserve the register contents or alter them, that way we can model those data-centric flows.
The text was updated successfully, but these errors were encountered: