Skip to content
Yu-Jye Tung edited this page Mar 27, 2022 · 5 revisions

What Is Zero Footprint Opaque Predicates?

For a detailed explanation, see our paper. But if you just want a simplified explanation, this page is for you! The aim of this page is to simplify understanding the “what”, “why”, and “how” questions regarding this project.

It's called Zero Footprint Opaque Predicates because the opaque predicates it constructs aim to resemble all other predicates typically found in an unobfuscated binary.

Why do we want opaque predicates to resemble real predicates? Because once an opaque predicate is detected, deobfuscation is as simple as changing the conditional branch instruction to an unconditional branch instruction. For opaque predicates' effects to last, they need to be stealthy against manual and automated analysis. To be stealthy against manual analysis (e.g., heuristic attacks, pattern matching attacks), the opaque predicates need to resemble real predicates syntactically. To be stealthy against automated analysis, the opaque predicates need to resemble real predicates semantically.

Resembling real predicates syntactically means that it "looks" like real predicates from the disassembly. Whereas, resembling real predicates semantically means that it "behaves" like real predicates at program runtime.

To syntactically resemble real predicates, the problem is two-folds.

Firstly, the construction of the invariants cannot syntactically stand out. To ensure this, our opaque predicates use naturally occurring invariants. An invariant is just a specific program behavior (e.g., x > 10) that is always true at a particular line of code, and every opaque predicate requires one invariant to disguise itself as a conditional branch instruction. Naturally occurring invariants are just invariants that already exist in the original program. We want to make this distinction because other approaches to constructing opaque predicates inject their code to create the invariants (i.e., synthetic invariants) for their opaque predicates. And because they use synthetic invariants --- which are prone to be constructed in similar manners --- every opaque predicate they insert will likely have distinctive features that they all share and thus can stand out from the other real predicates. Naturally occurring invariants create natural diversity among our opaque predicates' invariants since they may not all be constructed in similar manners.

Secondly, the construction of the opaque predicates using the invariants cannot syntactically stand out as well. To ensure this, we first did a study to identify syntactic features that represent real predicates. We then use the tool Rosette to construct our opaque predicates such that the resulting opaque predicates share the syntactic features we identified.

To semantically resemble real predicates, we use value sets as our invariants, so our opaque predicates may take on different values at program runtime. A value set contains all possible values that can be assigned to a variable at a particular line of code. And we can make sure that the value sets we use DO contain all possible values by using abstract interpretation (for this project we use Frama-C's implementation). To use a value set as an invariant, the corresponding opaque predicate has to evaluate each value in the value set to the same truth value (T/F). Thankfully, Rosette can also make sure the constructed opaque predicates have that behavior. Note that the value sets inferred by abstract interpretation can still be easily detected if the value sets are constructed in a local context (i.e., based on a variable's semantics within a basic block). However, abstract interpretation performs whole-program analysis so the value sets it inferred can also be constructed in a global context (i.e., based on a variable's semantics throughout multiple functions).

Clone this wiki locally