This repository contains a first version of an SCDG (System Call Dependency Graph) extractor. During symbolic analysis of a binary, every system call found is recorded together with its arguments. Once a stop condition for the symbolic analysis is reached, a graph is built as follows: nodes are the recorded system calls, and edges show that some arguments are shared between calls.
First, run the SCDG container with volumes like this:
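The graph construction described above can be illustrated with a minimal sketch. This is not the repository's implementation: the `build_scdg` function, the trace format and the `ignore_zero` flag are assumptions made for illustration only (the flag loosely mirrors the `not_ignore_zero` option documented further down).

```python
from itertools import combinations

def build_scdg(calls, ignore_zero=True):
    """Toy graph: one node per recorded call, one edge per pair of calls sharing an argument.

    `calls` is a list of (name, args) tuples; args must be hashable values.
    """
    nodes = [name for name, _ in calls]
    edges = []
    for (i, (_, args_i)), (j, (_, args_j)) in combinations(enumerate(calls), 2):
        shared = set(args_i) & set(args_j)
        if ignore_zero:
            shared.discard(0)  # assumption: zero is too common to be meaningful
        if shared:
            edges.append((i, j, sorted(shared, key=repr)))
    return nodes, edges

# Hypothetical trace, invented for this example
trace = [
    ("open", ("/etc/passwd", 0)),
    ("read", (3, 4096, 64)),
    ("lseek", (3, 0, 0)),
    ("close", (3,)),
]
nodes, edges = build_scdg(trace)
for src, dst, shared in edges:
    print(f"{nodes[src]} -> {nodes[dst]} (shared: {shared})")
```

In this toy trace the three calls that use file descriptor 3 end up connected, while `open` stays isolated unless zero values are kept.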
docker run --rm --name="sema-scdg" -v ${PWD}/OutputFolder:/sema-scdg/application/database/SCDG -v ${PWD}/ConfigFolder:/sema-scdg/application/configs -v ${PWD}/InputFolder:/sema-scdg/application/database/Binaries -p 5001:5001 -it sema-scdg bash
In this command:
- The first volume is the output folder where the results are written.
- The second volume is the folder containing the configuration files made available to the container.
- The third volume is the folder containing the binaries made available to the container.
For example, to use the files already provided, run the following from inside the sema_toolchain folder:
docker run --rm --name="sema-scdg" \
-v ${PWD}/database/SCDG:/sema-scdg/application/database/SCDG \
-v ${PWD}/sema_scdg/application/configs:/sema-scdg/application/configs \
-v ${PWD}/database/Binaries:/sema-scdg/application/database/Binaries \
-p 5001:5001 -it sema-scdg bash
If you want to be able to modify the code while the container is running, use:
docker run --rm --name="sema-scdg" \
-v ${PWD}/database:/sema-scdg/application/database \
-v ${PWD}/sema_scdg/application:/sema-scdg/application \
-p 5001:5001 -it sema-scdg bash
To run experiments, run the following inside the container:
pyenv local 3.8.10
python SemaSCDG.py configs/config.ini
Or if you want to use pypy3:
pyenv local pypy3.9-7.3.16
python SemaSCDG.py configs/config.ini
The parameters are set in a configuration file: configs/config.ini. Feel free to modify it or create new configuration files to run different experiments.
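As a rough illustration of creating a variant configuration, the sketch below uses Python's standard `configparser` to override a few keys of an existing INI text. The section name `SCDG_arg` and the contents of `BASE` are assumptions made for the example; check the real configs/config.ini for the actual layout. The `derive_config` and `render` helpers are hypothetical, not part of the toolchain.

```python
import configparser
import io

# Hypothetical excerpt; the real config.ini may use different sections and keys.
BASE = """
[SCDG_arg]
expl_method = CDFS
timeout = 600
keep_inter_SCDG = false
"""

def derive_config(base_text, overrides):
    """Return a parser holding base_text with the given keys overridden.

    Each key is updated in whichever section already defines it, so no
    section name has to be hard-coded here.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # preserve key case (e.g. keep_inter_SCDG)
    parser.read_string(base_text)
    for key, value in overrides.items():
        sections = [s for s in parser.sections() if parser.has_option(s, key)]
        if not sections:
            raise KeyError(f"unknown option: {key}")
        for section in sections:
            parser.set(section, key, str(value))
    return parser

def render(parser):
    """Serialize the parser back to INI text."""
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()

new_text = render(derive_config(BASE, {"expl_method": "BFS", "timeout": 60}))
print(new_text)
```

The resulting text could then be saved under configs/ with a new name and passed to SemaSCDG.py in place of configs/config.ini.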
The SCDG output is written to database/SCDG/runs/ by default. If you are not using volumes and want to save some runs from the container to your host machine, use:
make save-scdg-runs ARGS=PATH
SCDG module arguments
expl_method:
DFS Depth First Search
BFS Breadth First Search
CDFS Coverage Depth-First Search Strategy (Default)
CBFS Coverage Breadth First Search
graph_output:
gs .GS format
json .JSON format
EMPTY If left empty, the graph is built in all available formats
packing_type:
symbion Concolic unpacking method (linux | windows [in progress])
unipacker Emulation unpacking method (windows only)
SCDG exploration techniques parameters:
jump_it Number of iterations allowed for a symbolic loop (default : 3)
max_in_pause_stach Number of states allowed in pause stash (default : 200)
max_step Maximum number of steps allowed for a state (default : 50 000)
max_end_state Number of deadended states required to stop (default : 600)
max_simul_state Number of simultaneous states we explore with simulation manager (default : 5)
Binary parameters:
n_args Number of symbolic arguments given to the binary (default : 0)
loop_counter_concrete How many times a loop can loop (default : 10240)
count_block_enable Enable the count of visited blocks and instructions
sim_file Create SimFile
entry_addr Entry address of the binary
SCDG creation parameter:
min_size Minimum size required for a trace to be used in SCDG (default : 3)
disjoint_union Do we merge traces or use disjoint union ? (default : merge)
not_comp_args Do we compare arguments to add new nodes when building graph ? (default : comparison enabled)
three_edges Do we use the three-edges strategy ? (default : False)
not_ignore_zero Do we ignore zero when building graph ? (default : Discard zero)
keep_inter_SCDG Keep intermediate SCDG in file (default : False)
eval_time TODO
Global parameter:
concrete_target_is_local Use a local GDB server instead of using cuckoo (default : False)
print_syscall Print the syscall found
csv_file Name of the csv to save the experiment data
plugin_enable Enable the plugins set to true in the config.ini file
approximate Symbolic approximation
is_packed Is the binary packed ? (default : False, not yet supported)
timeout Timeout in seconds before ending extraction (default : 600)
string_resolve Do we try to resolve string references ? (default : True)
log_level_sema Log level of sema, can be INFO, DEBUG, WARNING, ERROR (default : INFO)
log_level_angr Log level of angr, can be INFO, DEBUG, WARNING, ERROR (default : ERROR)
log_level_claripy Log level of claripy, can be INFO, DEBUG, WARNING, ERROR (default : ERROR)
family Family of the malware (default : Unknown)
exp_dir Name of the directory to save SCDG extracted (default : Default)
binary_path Relative path to the binary or directory (has to be in the database folder)
fast_main Jump directly into the main function
Plugins:
plugin_env_var Enable the env_var plugin
plugin_locale_info Enable the locale_info plugin
plugin_resources Enable the resources plugin
plugin_widechar Enable the widechar plugin
plugin_registry Enable the registry plugin
plugin_atom Enable the atom plugin
plugin_thread Enable the thread plugin
plugin_track_command Enable the track_command plugin
plugin_ioc_report Enable the ioc_report plugin
plugin_hooks Enable the hooks plugin
The binary path has to be a relative path to a binary located in the database directory.
For details on the angr options, see the angr documentation.
A script, MergeGspan.py, is also available in sema_scdg/application/helper. It can merge all .gs files from a directory into a single file.
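To give an idea of what such a merge involves, here is a minimal sketch. It is not the code of MergeGspan.py: it assumes the .gs files follow the usual gSpan text layout, where each graph starts with a `t # <id>` header followed by `v` and `e` lines, and it simply concatenates the files while renumbering the graph headers. The `merge_gs` function and the sample file contents are hypothetical.

```python
import tempfile
from pathlib import Path

def merge_gs(directory, output):
    """Concatenate every .gs file in `directory` into `output`.

    Graph headers ("t # <id>") are renumbered so ids stay unique.
    Returns the number of graphs written.
    """
    graph_id = 0
    merged = []
    for path in sorted(Path(directory).glob("*.gs")):
        for line in path.read_text().splitlines():
            if line.startswith("t #"):
                line = f"t # {graph_id}"
                graph_id += 1
            merged.append(line)
    Path(output).write_text("\n".join(merged) + "\n")
    return graph_id

# Demo on two made-up files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.gs").write_text("t # 0\nv 0 open\nv 1 read\ne 0 1 1\n")
    Path(tmp, "b.gs").write_text("t # 0\nv 0 close\n")
    out = Path(tmp, "merged.txt")
    count = merge_gs(tmp, out)
    merged_text = out.read_text()

print(count)
print(merged_text)
```

The output file is deliberately given a non-.gs name here so that a second run does not pick up its own result as input.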
If you wish to run multiple experiments with different configuration files, the script multiple_experiments.sh is available and can be used inside the SCDG container:
# To show usage
./multiple_experiments.sh -h
# Run example
./multiple_experiments.sh -m python3.10 -c configs/config1 configs/config2
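For readers who prefer to drive several runs from Python, the following sketch shows one possible equivalent. It is not what multiple_experiments.sh does internally; the `build_commands` and `run_all` helpers and the configuration file names are hypothetical, and the sketch only prints the commands unless `dry_run` is disabled.

```python
import shlex
import subprocess

def build_commands(configs, interpreter="python", script="SemaSCDG.py"):
    """Build one command line (as an argument list) per configuration file."""
    return [[interpreter, script, config] for config in configs]

def run_all(commands, dry_run=True):
    """Run the commands one after another and collect their exit codes.

    With dry_run=True the commands are only printed and 0 is recorded.
    """
    exit_codes = []
    for command in commands:
        if dry_run:
            print(shlex.join(command))
            exit_codes.append(0)
        else:
            exit_codes.append(subprocess.run(command, check=False).returncode)
    return exit_codes

# Hypothetical configuration files
commands = build_commands(["configs/config1.ini", "configs/config2.ini"])
run_all(commands)
```

Running the experiments sequentially keeps their outputs separate and makes a failing configuration easy to spot from its exit code.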
To run the tests, run the following inside the Docker container:
source venv/bin/activate
python scdg_tests.py test_data/config_test.ini
A Jupyter notebook provides a tutorial on how to use the SCDG. To launch it, run the following inside the Docker container:
jupyter notebook --ip=0.0.0.0 --port=5001 --no-browser --allow-root --IdentityProvider.token=''
and visit http://127.0.0.1:5001/tree in your browser. Go to /Tutorial and open the Jupyter notebook.