Antonio Emanuele Cinà
Leaderboard: https://attackbench.github.io/
The AttackBench framework aims to fairly compare gradient-based attacks based on their security evaluation curves. To this end, we define a process involving five distinct stages, as described below.
- In stage (1), we construct a list of diverse non-robust and robust models to assess the attacks' impact across various settings, thus testing their adaptability to diverse defensive strategies.
- In stage (2), we define an environment for testing gradient-based attacks under a systematic and reproducible protocol. This step provides common ground with shared assumptions, advantages, and limitations. We then run the attacks against the selected models individually and collect the performance metrics of interest in our analysis, which are perturbation size, execution time, and query usage.
- In stage (3), we gather all the previously obtained results and compare attacks using the novel *local optimality* metric.
- Finally, in stage (4), we aggregate the optimality results from all the considered models, and in stage (5) we rank the attacks based on their average optimality, namely the *global optimality* (see the sketch below).
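To make stages (3)–(5) concrete, below is a minimal Python sketch of how an area-based optimality score could be computed from security evaluation curves (robust accuracy as a function of the perturbation budget). The lower-envelope construction, the ratio-of-areas score, and all names in the snippet are illustrative simplifications, not the exact definitions used in the paper or the benchmark code.

```python
import numpy as np

def local_optimality(curves: dict[str, np.ndarray], attack: str) -> float:
    """Illustrative only: score one attack on one model by comparing the area
    under its security evaluation curve (robust accuracy vs. perturbation
    budget, on a shared budget grid) against the pointwise best curve obtained
    by any attack on that model. 1.0 means the attack matches the best curve."""
    best_curve = np.min(np.stack(list(curves.values())), axis=0)  # lower envelope
    area_attack = curves[attack].mean()  # mean robust accuracy ~ normalized area
    area_best = best_curve.mean()
    return float(area_best / max(area_attack, 1e-12))

def global_optimality(local_scores: list[float]) -> float:
    """Stage (5): rank attacks by their optimality averaged over all models."""
    return float(np.mean(local_scores))

# Toy example with two hypothetical attacks evaluated on one model.
curves = {
    "attack_a": np.linspace(0.90, 0.10, 11),  # stronger: accuracy drops faster
    "attack_b": np.linspace(0.90, 0.30, 11),
}
print(local_optimality(curves, "attack_b"))  # < 1.0: sub-optimal w.r.t. the envelope
```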
Attack implementations currently available from each source library:

| Attack | Original | Advertorch | Adv_lib | ART | CleverHans | DeepRobust | Foolbox | Torchattacks |
|---|---|---|---|---|---|---|---|---|
| DDN | ☒ | ✓ | ☒ | ☒ | ☒ | ✓ | ☒ | |
| ALMA | ☒ | ☒ | ✓ | ☒ | ☒ | ☒ | ☒ | ☒ |
| FMN | ✓ | ☒ | ✓ | ☒ | ☒ | ☒ | ✓ | ☒ |
| PGD | ☒ | ✓ | ✓ | ✓ | ✓ | | | |
| JSMA | ☒ | ☒ | ✓ | ☒ | ☒ | ☒ | ☒ | |
| CW-L2 | ☒ | ✓ | ✓ | ~ | ✓ | ✓ | | |
| CW-LINF | ☒ | ☒ | ✓ | ✓ | ☒ | ☒ | ☒ | ☒ |
| FGSM | ☒ | ☒ | ✓ | ✓ | | | | |
| BB | ☒ | ☒ | ☒ | ✓ | ☒ | ☒ | ✓ | ☒ |
| DF | ✓ | ☒ | ☒ | ✓ | ☒ | ~ | ✓ | ✓ |
| APGD | ✓ | ☒ | ✓ | ✓ | ☒ | ☒ | ☒ | ✓ |
| BIM | ☒ | ☒ | ✓ | ☒ | ☒ | | | |
| EAD | ☒ | ☒ | ✓ | ☒ | ☒ | ✓ | ☒ | |
| PDGD | ☒ | ☒ | ✓ | ☒ | ☒ | ☒ | ☒ | ☒ |
| PDPGD | ☒ | ☒ | ✓ | ☒ | ☒ | ☒ | ☒ | ☒ |
| TR | ✓ | ☒ | ✓ | ☒ | ☒ | ☒ | ☒ | ☒ |
| FAB | ✓ | ✓ | ☒ | ☒ | ☒ | ☒ | ✓ | |
Legend:
- empty : not implemented yet
- ☒ : not available
- ✓ : implemented
- ~ : not functional yet
Dependencies:
- python==3.9
- sacred
- pytorch==1.12.1
- torchvision==0.13.1
- adversarial-robustness-toolbox
- foolbox
- torchattacks
- cleverhans
- deeprobust
- robustbench (https://github.com/RobustBench/robustbench)
- adv_lib (https://github.com/jeromerony/adversarial-library)
Clone the Repository:
git clone https://github.com/attackbench/attackbench.git
cd attackbench
Use the provided environment.yml file to create a Conda environment with the required dependencies:
conda env create -f environment.yml
Activate the Conda environment:
conda activate attackbench
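After activating the environment, a quick way to check that the core dependencies resolved correctly is to import them and print the pinned versions. This is only a convenience check; the import names below (e.g. `art` for adversarial-robustness-toolbox) are the usual ones for these libraries, not something defined by AttackBench.

```python
# Sanity check: the imports below should succeed inside the `attackbench` env.
import torch
import torchvision
import foolbox
import torchattacks
import cleverhans
import sacred
import art  # adversarial-robustness-toolbox

print("torch:", torch.__version__)              # expected 1.12.1
print("torchvision:", torchvision.__version__)  # expected 0.13.1
print("CUDA available:", torch.cuda.is_available())
```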
To run the FMN-$\ell_2$ attack implemented in the adv_lib library against the augustin_2020 robust model on CIFAR-10 and save the results in the results_dir/ directory:
conda activate attackbench
python -m attack_evaluation.run -F results_dir/ with model.augustin_2020 attack.adv_lib_fmn attack.threat_model="l2" dataset.num_samples=1000 dataset.batch_size=64 seed=42
Command Breakdown:
- `-F results_dir/`: Specifies the directory `results_dir/` where the attack results will be saved.
- `with`: Sacred keyword that introduces the configuration updates for the run.
- `model.augustin_2020`: Specifies the target model `augustin_2020` to be attacked.
- `attack.adv_lib_fmn`: Indicates the use of the FMN attack from the adv_lib library.
- `attack.threat_model="l2"`: Sets the threat model to $\ell_2$, constraining adversarial perturbations based on the $\ell_2$ norm.
- `dataset.num_samples=1000`: Specifies the number of samples to use from the CIFAR-10 dataset during the attack.
- `dataset.batch_size=64`: Sets the batch size for processing the dataset during the attack.
- `seed=42`: Sets the random seed for reproducibility.
After the attack completes, you can find the results saved in the specified results_dir/ directory.
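The snippet below is a minimal sketch for inspecting finished runs. It assumes the standard layout written by Sacred's FileStorageObserver (which `-F results_dir/` attaches): one numbered sub-directory per run containing `config.json` and `run.json`. Any additional, attack-specific files that AttackBench writes are not covered here.

```python
import json
from pathlib import Path

results_dir = Path("results_dir")

# Sacred's FileStorageObserver creates one numbered directory per run
# (plus a `_sources` folder, which we skip here).
run_dirs = (p for p in results_dir.iterdir() if p.is_dir() and p.name.isdigit())
for run_dir in sorted(run_dirs, key=lambda p: int(p.name)):
    config = json.loads((run_dir / "config.json").read_text())
    run_info = json.loads((run_dir / "run.json").read_text())
    print(f"run {run_dir.name}: status={run_info.get('status')}")
    print("  attack config:", config.get("attack"))
    print("  model config:", config.get("model"))
```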
The wrappers for all the implementations (including libraries) must have the following format (a minimal skeleton is sketched after this list):
- inputs:
  - `model`: `nn.Module` taking inputs in the [0, 1] range and returning logits in $\mathbb{R}^K$
  - `inputs`: `FloatTensor` representing the input samples in the [0, 1] range
  - `labels`: `LongTensor` representing the labels of the samples
  - `targets`: `LongTensor` or `None` representing the target class associated with each sample
  - `targeted`: `bool` flag indicating if a targeted attack should be performed
- output:
  - `adv_inputs`: `FloatTensor` representing the perturbed inputs in the [0, 1] range
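As a reference point, here is a minimal wrapper skeleton that follows the interface above. The body is only a placeholder (a single FGSM-like step with an arbitrary step size) so that the function runs end to end; a real AttackBench wrapper would instead translate these arguments into a call to the underlying library's attack and return its adversarial examples.

```python
from typing import Optional

import torch
from torch import Tensor, nn


def example_attack_wrapper(model: nn.Module,
                           inputs: Tensor,
                           labels: Tensor,
                           targets: Optional[Tensor] = None,
                           targeted: bool = False) -> Tensor:
    """Skeleton matching the wrapper interface described above."""
    inputs = inputs.clone().requires_grad_(True)
    logits = model(inputs)  # model takes [0, 1] inputs and returns logits
    y = targets if targeted else labels
    loss = nn.functional.cross_entropy(logits, y)
    grad, = torch.autograd.grad(loss, inputs)
    # Placeholder update: ascend the loss for untargeted attacks, descend it
    # for targeted ones; 8/255 is an arbitrary illustrative step size.
    step = -grad.sign() if targeted else grad.sign()
    adv_inputs = (inputs + 8 / 255 * step).clamp(0, 1).detach()
    return adv_inputs
```

How such a wrapper is registered with the benchmark is repository-specific and not shown here.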
If you use the AttackBench leaderboards or implementation, please consider citing our paper:
@article{CinaRony2024AttackBench,
  author  = {Antonio Emanuele Cinà and Jérôme Rony and Maura Pintor and Luca Demetrio and Ambra Demontis and Battista Biggio and Ismail Ben Ayed and Fabio Roli},
  title   = {AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples},
  journal = {arXiv preprint},
  year    = {2024},
}
Feel free to contact us about anything related to AttackBench by opening an issue or a pull request, or by email at [email protected].