
Commit

added adversarial examples
LIN Yun committed Aug 2, 2020
1 parent 715221e commit 08823ad
Showing 24 changed files with 1,720 additions and 44 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -4,6 +4,7 @@ wandb/
.vscode/
data
data/
datasets/

# Byte-compiled / optimized / DLL files
__pycache__/
113 changes: 113 additions & 0 deletions REPORT.md
@@ -0,0 +1,113 @@
# Adversarial Examples for Object Detection

This repository contains experiments on generating adversarial examples for object detection, in particular for the logo and input-box detection models used in phishing-website detection.

The primary technique is the Dense Adversary Generation (DAG) algorithm from [Adversarial Examples for Semantic Segmentation and Object Detection (Xie et al., ICCV 2017)](https://arxiv.org/abs/1703.08603), implemented on top of the [Detectron2](https://github.com/facebookresearch/detectron2) framework.


## Introduction
Recent research has exposed the security vulnerabilities of ML models, in particular showing that adding imperceptible perturbations to an image can cause a model's predictions to change drastically.

One of the first and most popular adversarial attacks to date is the Fast Gradient Sign Method (FGSM) described by Goodfellow et al. in [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572). The idea is to abuse the way neural networks learn through gradients: rather than tweaking the model's weights to minimize the loss, the attack uses the backpropagated gradients to adjust the input image so as to maximize the loss.

![panda](figs/panda.png)

In other words, you take the gradient of the classification loss w.r.t. the input image and add its sign, scaled by a small factor, to the image. The result is a misclassification.
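
A minimal FGSM sketch in PyTorch might look like the following (illustrative only, not this repository's code; `model`, an `image` tensor in `[0, 1]`, and its `label` are assumed to be given):

```python
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """Return `image` perturbed by epsilon * sign(d loss / d image)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the classification loss.
    adv_image = image + epsilon * image.grad.sign()
    return adv_image.clamp(0, 1).detach()
```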

Xie et al. propose a natural extension of adversarial attacks from image classification to object detection: instead of tweaking the image to make the model fail at a single task (classification), you tweak it to make the model fail at multiple tasks in tandem, i.e. the localization and classification of the objects of interest in the image.

Hence, it's just a matter of changing the loss function to reflect your adversarial objective.


## Method
The DAG algorithm presented is a **white-box attack**; it assumes the attacker has full knowledge of and access to the model, including its architecture, inputs, outputs and weights. Furthermore, it can perform **source/target misclassification**: the attacker can specify the target class each object should be misclassified as.

DAG maximizes the following objective:

![objective](figs/objective.png)

To break it down, for each target (object) on the image, it will:
- Suppress the confidence of the ground-truth class, while
- Increasing the confidence of the adversarial class

In other words, maximizing this objective will cause every target to be incorrectly predicted as the adversarial label.
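
Concretely, writing $f_c(\mathbf{X}, t_n)$ for the classification score of target $t_n$ on image $\mathbf{X}$ for class $c$, with $l_n$ the ground-truth label and $l'_n$ the adversarial label of the $n$-th target, the maximized objective takes roughly the form

$$
\sum_{n=1}^{N} \Big[ f_{l'_n}(\mathbf{X}, t_n) - f_{l_n}(\mathbf{X}, t_n) \Big]
$$

(up to the sign convention used in the figure above), so each term grows as the adversarial class overtakes the ground-truth class for its target.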

However, singling out the targets on the image to attack (which regions to misclassify) is a non-trivial task, and the authors propose using the region proposals from the Region Proposal Network (RPN) of the object detection model as a starting point. In addition, they suggest increasing the Non-Maximum Suppression (NMS) threshold of the RPN to 0.9, allowing a larger, or **denser**, set of targets to be proposed (hence the name of the algorithm).
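
In Detectron2 this threshold can be raised via the config (a hypothetical illustration; the repository may set it differently):

```python
from detectron2.config import get_cfg

cfg = get_cfg()
# Raise the RPN's NMS threshold (Detectron2 default: 0.7) so that more
# overlapping proposals survive, giving a denser set of attack targets.
cfg.MODEL.RPN.NMS_THRESH = 0.9
```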

A sketch of the complete DAG algorithm follows (a schematic code version is given after the list):
1. Acquire target proposals from the RPN (with NMS threshold of 0.9)
2. Filter for targets that actually overlap with the ground-truth object and have a confidence score for the ground-truth class > 0.1 (i.e. narrow down to a robust set of targets to attack).
3. Randomly assign an adversarial label to each target
4. Then for N iterations (N = 150 suggested by the authors):
1. Compute the objective, only using targets still not misclassified as the adversarial label
2. Take the normalized gradient of the objective and add it to the image
3. Terminate if all targets are misclassified as desired, else repeat
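
As referenced above, here is a rough schematic of that loop (not the repository's implementation: `score_targets` is a hypothetical helper that re-scores the fixed target proposals on the current image, and steps 1–3 are assumed to have produced `targets`, `gt_labels`, and `adv_labels`):

```python
import torch

def dag_attack(score_targets, image, targets, gt_labels, adv_labels,
               gamma=0.5, max_iters=150):
    """Perturb `image` until every target is predicted as its adversarial label."""
    x = image.clone().detach()
    for _ in range(max_iters):
        x.requires_grad_(True)
        scores = score_targets(x, targets)                       # (num_targets, num_classes)
        gt = scores.gather(1, gt_labels[:, None]).squeeze(1)     # ground-truth class scores
        adv = scores.gather(1, adv_labels[:, None]).squeeze(1)   # adversarial class scores
        active = gt > adv                                        # targets not yet fooled
        if not active.any():
            break                                                # step 4.3: all targets misclassified
        # Step 4.1: objective over the not-yet-misclassified targets only
        loss = (adv[active] - gt[active]).sum()
        grad = torch.autograd.grad(loss, x)[0]
        # Step 4.2: add the normalized gradient to the image (ascent on the objective)
        x = (x + gamma * grad / grad.norm()).detach()
    return x.detach()
```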


## Results
All experiments are performed using [Detectron2's implementation](https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md) of Faster R-CNN, with a ResNet-50 backbone and a Feature Pyramid Network (FPN).
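
For context, the pretrained model can be obtained from Detectron2's model zoo along these lines (a standard usage sketch, not necessarily the exact code used here):

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# Faster R-CNN with a ResNet-50 + FPN backbone, pretrained on COCO.
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)  # predictor(image_bgr) returns the detections
```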

Evaluation is performed on a held-out test set (disjoint from the training set).

In all cases, the adversarial attack breaks the model, drastically reducing all reported metrics.

### COCO Dataset
| | Original | Adversarial |
|---------------------|----------|-------------|
| mAP (IoU=0.50:0.95) | 40.2 | 0.0 |

Some qualitative results, with the original prediction on the left and the prediction on the adversarial image on the right.

![coco_eg_1](figs/coco_eg_1.png)

![coco_eg_2](figs/coco_eg_2.png)

![coco_eg_3](figs/coco_eg_3.png)

### Phishing Dataset
| | Original | Adversarial |
|--------------------------|----------|-------------|
| mAP (IoU=0.50:0.95) | 59.7 | 2.9 |
| | | |
| Input AP (IoU=0.50:0.95) | 70.0 | 0.8 |
| Input Recall (IoU=0.50) | 98.1 | 20.8 |
| Input Recall (IoU=0.70) | 95.3 | 3.4 |
| Input Recall (IoU=0.85) | 74.9 | 0.1 |
| | | |
| Logo AP (IoU=0.50:0.95) | 49.3 | 5.0 |
| Logo Recall (IoU=0.50) | 94.1 | 28.7 |
| Logo Recall (IoU=0.70) | 80.9 | 15.9 |
| Logo Recall (IoU=0.85) | 38.0 | 3.5 |

Some qualitative results, with the original prediction on the left and the prediction on the adversarial image on the right.

![phish_eg_1](figs/phish_eg_1.png)

![phish_eg_2](figs/phish_eg_2.png)

![phish_eg_3](figs/phish_eg_3.png)


## Future Work
### Computational Efficiency
While effective, the algorithm requires up to ~150 iterations on every single image. Although this lets us trade computational cost for attack severity, a more efficient algorithm could still be explored.

### More fine-grained control over prediction target
As with adversarial attacks on image classification, the DAG algorithm lets the attacker specify which class each target should be misclassified as. In object detection, however, the model also produces a bounding-box regression output in addition to classification. An interesting direction to explore would be to let the attacker control where these bounding boxes land.

### Defense
Techniques such as adversarial training are well-proven defenses against adversarial attacks in the image classification setting. Whether these approaches transfer to object detection, or whether a more specialized defense is required, is worth investigating.


## Resources
### Adversarial Machine Learning
Here is a good [Reading List (Nicholas Carlini)](https://nicholas.carlini.com/writing/2018/adversarial-machine-learning-reading-list.html) as a starting point to get acquainted with the foundations of adversarial machine learning.

### Object Detection with Deep Learning
If you'd like to refresh your knowledge of deep learning for computer vision and its advanced applications, I recommend [this course](https://web.eecs.umich.edu/~justincj/teaching/eecs498/) by the University of Michigan. It's fairly up-to-date (Fall 2019) and the lecture videos are publicly available.

### Detectron2
When working with


## References
- Goodfellow, I., Shlens, J., Szegedy, C. *Explaining and Harnessing Adversarial Examples.* ICLR 2015. https://arxiv.org/abs/1412.6572
- Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., Yuille, A. *Adversarial Examples for Semantic Segmentation and Object Detection.* ICCV 2017. https://arxiv.org/abs/1703.08603

32 changes: 32 additions & 0 deletions configs/faster_rcnn_bet365.yaml
@@ -0,0 +1,32 @@
_BASE_: "./bases/Base-RCNN-FPN.yaml"
MODEL:
  # COCO ResNet50 weights
  WEIGHTS: "https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl"
  MASK_ON: False  # Not doing segmentation
  RESNETS:
    DEPTH: 50  # ResNet50
  ROI_HEADS:
    NUM_CLASSES: 2  # Change to suit your own task
    # Can be reduced for lower memory use / faster training; default 512
    BATCH_SIZE_PER_IMAGE: 512
  BACKBONE:
    FREEZE_AT: 2  # Default 2
DATASETS:
  TRAIN: ("benign_bet365",)
  TEST: ("benign_test",)
DATALOADER:
  NUM_WORKERS: 0
SOLVER:
  IMS_PER_BATCH: 8  # Batch size; default 16
  BASE_LR: 0.00001
  # (2/3, 8/9)
  STEPS: (600, 900)  # Iterations at which to decrease the learning rate by GAMMA
  MAX_ITER: 1000  # Number of training iterations
  CHECKPOINT_PERIOD: 100  # Save a checkpoint every this many iterations
INPUT:
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)  # Image input sizes
TEST:
  # The period (in terms of iterations) at which to evaluate the model during training.
  # Set to 0 to disable.
  EVAL_PERIOD: 100
OUTPUT_DIR: "./output"  # Specify output directory
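
For reference, training with this config might be launched along these lines (a hypothetical sketch: it assumes the `benign_bet365` and `benign_test` datasets have already been registered with Detectron2's `DatasetCatalog`, and may differ from how this repository actually runs training):

```python
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

# Load the base config plus the overrides above.
cfg = get_cfg()
cfg.merge_from_file("configs/faster_rcnn_bet365.yaml")

# DefaultTrainer builds the model, data loaders and optimizer from the config.
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```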