Detection as a defense #11

Open
oceank opened this issue Aug 22, 2019 · 1 comment

oceank commented Aug 22, 2019

The idea is to differentiate benign samples (BS) and adversarial examples (AE) based on their outputs from distinct transform models.

In Detecing adversarial samples from artifacts.pdf, it is shown that different models make different mistakes when presented with the same AEs. 2018-arXiv-PictureAE-Picture_AE_detection_bimodel.pdf proposes a bi-model approach that concatenates the outputs of an image from two distinct models as its feature representation and then feeds it to a binary classifier. The approach is claimed to reach >90% detection accuracy on MNIST and CIFAR-10.

We can concatenate/stack the outputs of the transform models for an input image, use that as a representation of the image, and feed it into a binary classifier (a sketch is given below). This could yield higher detection accuracy and generalize better across different types of attacks.
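
A minimal sketch of this stacked-representation detector, assuming each transform model exposes a `predict` method returning softmax probabilities of shape (N, n_classes); the names `transform_models` and `stacked_features`, and the use of scikit-learn's LogisticRegression, are illustrative assumptions rather than existing code.

```python
import numpy as np

def stacked_features(transform_models, images):
    """Concatenate the softmax outputs of all transform models for each image.

    Assumes each model's `predict` maps a batch of images to an
    (N, n_classes) array of probabilities.
    """
    outputs = [m.predict(images) for m in transform_models]  # each (N, n_classes)
    return np.concatenate(outputs, axis=1)                   # (N, n_models * n_classes)

# Hypothetical usage: x_train is a mix of benign samples and AEs,
# y_train marks them 0 (benign) / 1 (adversarial).
#   from sklearn.linear_model import LogisticRegression
#   detector = LogisticRegression(max_iter=1000)
#   detector.fit(stacked_features(transform_models, x_train), y_train)
#   is_ae = detector.predict(stacked_features(transform_models, x_test))
```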

Investigation:

  1. Identify patterns in the prediction outputs of BS and AE
    For BS and each type of AE, plot a boxplot of the average, minimum, and maximum accuracy across all transform models.

  2. Detection approach 1: majority voting
    The prediction outputs of AEs are much more diverse than those of BS. That is, the number of transform models that agree with each other on an AE will be much smaller than on a BS. If this number falls below some threshold, say 75% × total_number_of_models, the input image is marked as an AE; otherwise it is considered a benign sample. (A sketch of this rule is given after this list.)

  3. Detection approach 2: distance matrix
    Empirical evidence to collect: the distance between the prediction outputs of a benign sample from two distinct transform models is close to 0, while the distance between the prediction outputs of an AE from two distinct transform models should be much larger than 0.
    Try different distance metrics: L2, entropy, KL divergence, cosine, correlation.

    Distance matrix: for an image, build a distance matrix by computing the distance between its prediction outputs for each pair of transform models. Investigate any properties that differ between AE and BS. (See the sketch after this list.)
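
A minimal sketch of the majority-voting rule from item 2, under the same assumption that each transform model's `predict` returns class probabilities for a batch; the 0.75 default mirrors the 75% figure above, and all names are hypothetical.

```python
import numpy as np

def majority_vote_detect(transform_models, image, threshold=0.75):
    """Flag `image` as an AE when too few transform models agree on a label.

    Assumes each model's `predict` returns a (1, n_classes) probability array.
    Returns (is_ae, voted_label); the voted label can serve as the recovered class.
    """
    labels = np.array([int(np.argmax(m.predict(image[None]))) for m in transform_models])
    counts = np.bincount(labels)
    agreement = counts.max()                      # size of the largest agreeing group
    is_ae = agreement < threshold * len(transform_models)
    voted_label = int(counts.argmax())            # majority-voted class label
    return is_ae, voted_label
```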

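A sketch of the pairwise distance matrix from item 3, again assuming `predict` returns softmax probabilities; the metric helpers come from SciPy, and the KL option is asymmetric (D(p‖q) ≠ D(q‖p)), so the resulting matrix need not be symmetric.

```python
import numpy as np
from scipy.spatial.distance import euclidean, cosine, correlation
from scipy.stats import entropy

def distance_matrix(transform_models, image, metric="l2"):
    """Pairwise distances between the prediction outputs of all transform
    models for one image. Benign samples are expected to give values near 0."""
    probs = [m.predict(image[None])[0] for m in transform_models]  # each (n_classes,)
    metrics = {
        "l2": euclidean,
        "cosine": cosine,
        "correlation": correlation,
        # KL divergence D(p || q); can be inf if q has zeros where p does not.
        "kl": lambda p, q: entropy(p, q),
    }
    dist = metrics[metric]
    n = len(probs)
    mat = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            mat[i, j] = dist(probs[i], probs[j])
    return mat

# The matrix can be visualized as a heatmap, e.g.:
#   import matplotlib.pyplot as plt
#   plt.imshow(distance_matrix(transform_models, x), cmap="viridis"); plt.colorbar()
```
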
@oceank oceank added the enhancement New feature or request label Aug 22, 2019
oceank commented Aug 23, 2019

Based on the initial experiment (see the result details in commit d0190cb), detection approach 1, combined with majority-voting-based classification (recovery), achieves a very impressive success rate of either correctly detecting an AE as an AE or classifying the input image to its true label, regardless of whether it is a benign sample or an AE.

The hypothesis about the distance matrix (using the L2 norm) is confirmed. For different types of AEs, rich pattern information shows up in the heatmaps of their distance matrices. More investigation is encouraged.

@MENG2010 MENG2010 assigned MENG2010 and oceank and unassigned MENG2010 Aug 25, 2019