Antonio Emanuele Cinà
Leaderboard: https://attackbench.github.io/
The AttackBench framework aims to fairly compare gradient-based attacks based on their security evaluation curves. To this end, we define a process involving five distinct stages, as described below.
- In stage (1), we construct a list of diverse non-robust and robust models to assess the attacks' impact across various settings, thus testing their adaptability to diverse defensive strategies.
- In stage (2), we define an environment for testing gradient-based attacks under a systematic and reproducible protocol. This step provides common ground with shared assumptions, advantages, and limitations. We then run the attacks against the selected models individually and collect the performance metrics of interest in our analysis, which are perturbation size, execution time, and query usage.
- In stage (3), we gather all the previously obtained results and compare attacks using the novel *local optimality* metric.
- Finally, in stage (4), we aggregate the optimality results from all the considered models, and in stage (5) we rank the attacks based on their average optimality, namely the *global optimality* (see the sketch below).
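To make stages (3)–(5) concrete, below is a minimal Python sketch of how an area-based optimality score could be computed from security evaluation curves (robust accuracy as a function of the perturbation budget). The lower-envelope construction, the ratio-of-areas score, and all names in the snippet are illustrative simplifications, not the exact definitions used in the paper or the benchmark code.

```python
import numpy as np

def local_optimality(curves: dict[str, np.ndarray], attack: str) -> float:
    """Illustrative only: score one attack on one model by comparing the area
    under its security evaluation curve (robust accuracy vs. perturbation
    budget, on a shared budget grid) against the pointwise best curve obtained
    by any attack on that model. 1.0 means the attack matches the best curve."""
    best_curve = np.min(np.stack(list(curves.values())), axis=0)  # lower envelope
    area_attack = curves[attack].mean()  # mean robust accuracy ~ normalized area
    area_best = best_curve.mean()
    return float(area_best / max(area_attack, 1e-12))

def global_optimality(local_scores: list[float]) -> float:
    """Stage (5): rank attacks by their optimality averaged over all models."""
    return float(np.mean(local_scores))

# Toy example with two hypothetical attacks evaluated on one model.
curves = {
    "attack_a": np.linspace(0.90, 0.10, 11),  # stronger: accuracy drops faster
    "attack_b": np.linspace(0.90, 0.30, 11),
}
print(local_optimality(curves, "attack_b"))  # < 1.0: sub-optimal w.r.t. the envelope
```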
Attack implementations currently available from each source library:

| Attack | Original | Advertorch | Adv_lib | ART | CleverHans | DeepRobust | Foolbox | Torchattacks |
|---|---|---|---|---|---|---|---|---|
| DDN | ☒ | ✓ | ☒ | ☒ | ☒ | ✓ | ☒ | |
| ALMA | ☒ | ☒ | ✓ | ☒ | ☒ | ☒ | ☒ | ☒ |
| FMN | ✓ | ☒ | ✓ | ☒ | ☒ | ☒ | ✓ | ☒ |
| PGD | ☒ | ✓ | ✓ | ✓ | ✓ | | | |
| JSMA | ☒ | ☒ | ✓ | ☒ | ☒ | ☒ | ☒ | |
| CW-L2 | ☒ | ✓ | ✓ | ~ | ✓ | ✓ | | |
| CW-LINF | ☒ | ☒ | ✓ | ✓ | ☒ | ☒ | ☒ | ☒ |
| FGSM | ☒ | ☒ | ✓ | ✓ | | | | |
| BB | ☒ | ☒ | ☒ | ✓ | ☒ | ☒ | ✓ | ☒ |
| DF | ✓ | ☒ | ☒ | ✓ | ☒ | ~ | ✓ | ✓ |
| APGD | ✓ | ☒ | ✓ | ✓ | ☒ | ☒ | ☒ | ✓ |
| BIM | ☒ | ☒ | ✓ | ☒ | ☒ | | | |
| EAD | ☒ | ☒ | ✓ | ☒ | ☒ | ✓ | ☒ | |
| PDGD | ☒ | ☒ | ✓ | ☒ | ☒ | ☒ | ☒ | ☒ |
| PDPGD | ☒ | ☒ | ✓ | ☒ | ☒ | ☒ | ☒ | ☒ |
| TR | ✓ | ☒ | ✓ | ☒ | ☒ | ☒ | ☒ | ☒ |
| FAB | ✓ | ✓ | ☒ | ☒ | ☒ | ☒ | ✓ | |
Legend:
- empty : not implemented yet
- ☒ : not available
- ✓ : implemented
- ~ : not functional yet
Dependencies:
- python==3.9
- sacred
- pytorch==1.12.1
- torchvision==0.13.1
- adversarial-robustness-toolbox
- foolbox
- torchattacks
- cleverhans
- deeprobust
- robustbench (https://github.com/RobustBench/robustbench)
- adv_lib (https://github.com/jeromerony/adversarial-library)
Clone the Repository:
git clone https://github.com/attackbench/attackbench.git
cd attackbench
Use the provided environment.yml file to create a Conda environment with the required dependencies:
conda env create -f environment.yml
Activate the Conda environment:
conda activate attackbench
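After activating the environment, a quick way to check that the core dependencies resolved correctly is to import them and print the pinned versions. This is only a convenience check; the import names below (e.g. `art` for adversarial-robustness-toolbox) are the usual ones for these libraries, not something defined by AttackBench.

```python
# Sanity check: the imports below should succeed inside the `attackbench` env.
import torch
import torchvision
import foolbox
import torchattacks
import cleverhans
import sacred
import art  # adversarial-robustness-toolbox

print("torch:", torch.__version__)              # expected 1.12.1
print("torchvision:", torchvision.__version__)  # expected 0.13.1
print("CUDA available:", torch.cuda.is_available())
```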
To run the FMN-$\ell_2$ attack implemented in the adv_lib library against the augustin_2020 robust model on CIFAR-10 and save the results in the results_dir/ directory:
conda activate attackbench
python -m attack_evaluation.run -F results_dir/ with model.augustin_2020 attack.adv_lib_fmn attack.threat_model="l2" dataset.num_samples=1000 dataset.batch_size=64 seed=42
Command Breakdown:
- `-F results_dir/`: Specifies the directory `results_dir/` where the attack results will be saved.
- `with`: Sacred keyword that introduces the configuration updates for the run.
- `model.augustin_2020`: Specifies the target model `augustin_2020` to be attacked.
- `attack.adv_lib_fmn`: Indicates the use of the FMN attack from the adv_lib library.
- `attack.threat_model="l2"`: Sets the threat model to $\ell_2$, constraining adversarial perturbations based on the $\ell_2$ norm.
- `dataset.num_samples=1000`: Specifies the number of samples to use from the CIFAR-10 dataset during the attack.
- `dataset.batch_size=64`: Sets the batch size for processing the dataset during the attack.
- `seed=42`: Sets the random seed for reproducibility.
After the attack completes, you can find the results saved in the specified results_dir/ directory.
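The snippet below is a minimal sketch for inspecting finished runs. It assumes the standard layout written by Sacred's FileStorageObserver (which `-F results_dir/` attaches): one numbered sub-directory per run containing `config.json` and `run.json`. Any additional, attack-specific files that AttackBench writes are not covered here.

```python
import json
from pathlib import Path

results_dir = Path("results_dir")

# Sacred's FileStorageObserver creates one numbered directory per run
# (plus a `_sources` folder, which we skip here).
run_dirs = (p for p in results_dir.iterdir() if p.is_dir() and p.name.isdigit())
for run_dir in sorted(run_dirs, key=lambda p: int(p.name)):
    config = json.loads((run_dir / "config.json").read_text())
    run_info = json.loads((run_dir / "run.json").read_text())
    print(f"run {run_dir.name}: status={run_info.get('status')}")
    print("  attack config:", config.get("attack"))
    print("  model config:", config.get("model"))
```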
The wrappers for all the implementations (including libraries) must have the following format (a minimal skeleton is sketched after this list):
- inputs:
  - `model`: `nn.Module` taking inputs in the [0, 1] range and returning logits in $\mathbb{R}^K$
  - `inputs`: `FloatTensor` representing the input samples in the [0, 1] range
  - `labels`: `LongTensor` representing the labels of the samples
  - `targets`: `LongTensor` or `None` representing the target class associated with each sample
  - `targeted`: `bool` flag indicating if a targeted attack should be performed
- output:
  - `adv_inputs`: `FloatTensor` representing the perturbed inputs in the [0, 1] range
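As a reference point, here is a minimal wrapper skeleton that follows the interface above. The body is only a placeholder (a single FGSM-like step with an arbitrary step size) so that the function runs end to end; a real AttackBench wrapper would instead translate these arguments into a call to the underlying library's attack and return its adversarial examples.

```python
from typing import Optional

import torch
from torch import Tensor, nn


def example_attack_wrapper(model: nn.Module,
                           inputs: Tensor,
                           labels: Tensor,
                           targets: Optional[Tensor] = None,
                           targeted: bool = False) -> Tensor:
    """Skeleton matching the wrapper interface described above."""
    inputs = inputs.clone().requires_grad_(True)
    logits = model(inputs)  # model takes [0, 1] inputs and returns logits
    y = targets if targeted else labels
    loss = nn.functional.cross_entropy(logits, y)
    grad, = torch.autograd.grad(loss, inputs)
    # Placeholder update: ascend the loss for untargeted attacks, descend it
    # for targeted ones; 8/255 is an arbitrary illustrative step size.
    step = -grad.sign() if targeted else grad.sign()
    adv_inputs = (inputs + 8 / 255 * step).clamp(0, 1).detach()
    return adv_inputs
```

How such a wrapper is registered with the benchmark is repository-specific and not shown here.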
If you use the AttackBench leaderboards or implementation, please consider citing our paper:
@article{CinaRony2024AttackBench,
  author  = {Antonio Emanuele Cinà and Jérôme Rony and Maura Pintor and Luca Demetrio and Ambra Demontis and Battista Biggio and Ismail Ben Ayed and Fabio Roli},
  title   = {AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples},
  journal = {arXiv preprint},
  year    = {2024},
}
Feel free to contact us about anything related to AttackBench by opening an issue or a pull request, or by email at [email protected].