Detection as a defense #11

Open
oceank opened this issue Aug 22, 2019 · 1 comment

oceank commented Aug 22, 2019

The idea is to differentiate benign samples (BS) and adversarial examples (AE) based on their outputs from distinct transform models.

In Detecing adversarial samples from artifacts.pdf, it is shown that different models make different mistakes when presented with the same AEs. 2018-arXiv-PictureAE-Picture_AE_detection_bimodel.pdf proposes a bi-model approach that concatenates the outputs of an image from two distinct models as its feature representation and then feeds it to a binary classifier. The approach is claimed to reach >90% detection accuracy on MNIST and CIFAR-10.

We can concatenate/stack the outputs of the transform models for an input image, use that as a representation of the image, and feed it into a binary classifier (a sketch is given below). This could yield higher detection accuracy and generalize better across different types of attacks.
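
A minimal sketch of this stacked-representation detector, assuming each transform model exposes a `predict` method returning softmax probabilities of shape (N, n_classes); the names `transform_models` and `stacked_features`, and the use of scikit-learn's LogisticRegression, are illustrative assumptions rather than existing code.

```python
import numpy as np

def stacked_features(transform_models, images):
    """Concatenate the softmax outputs of all transform models for each image.

    Assumes each model's `predict` maps a batch of images to an
    (N, n_classes) array of probabilities.
    """
    outputs = [m.predict(images) for m in transform_models]  # each (N, n_classes)
    return np.concatenate(outputs, axis=1)                   # (N, n_models * n_classes)

# Hypothetical usage: x_train is a mix of benign samples and AEs,
# y_train marks them 0 (benign) / 1 (adversarial).
#   from sklearn.linear_model import LogisticRegression
#   detector = LogisticRegression(max_iter=1000)
#   detector.fit(stacked_features(transform_models, x_train), y_train)
#   is_ae = detector.predict(stacked_features(transform_models, x_test))
```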

Investigation:

  1. Identify patterns in the prediction outputs of BS and AE
    For BS and each type of AE, plot a boxplot of the average, minimum, and maximum accuracy across all transform models.

  2. Detection approach 1: majority voting
    The prediction outputs of AEs are much more diverse than those of BS. That is, the number of transform models that agree with each other on an AE will be much smaller than on a BS. If this number falls below some threshold, say 75% × total_number_of_models, the input image is marked as an AE; otherwise it is considered a benign sample. (A sketch of this rule is given after this list.)

  3. Detection approach 2: distance matrix
    Empirical evidence to collect: the distance between the prediction outputs of a benign sample from two distinct transform models is close to 0, while the distance between the prediction outputs of an AE from two distinct transform models should be much larger than 0.
    Try different distance metrics: L2, entropy, KL divergence, cosine, correlation.

    Distance matrix: for an image, build a distance matrix by computing the distance between its prediction outputs for each pair of transform models. Investigate any properties that differ between AE and BS. (See the sketch after this list.)
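
A minimal sketch of the majority-voting rule from item 2, under the same assumption that each transform model's `predict` returns class probabilities for a batch; the 0.75 default mirrors the 75% figure above, and all names are hypothetical.

```python
import numpy as np

def majority_vote_detect(transform_models, image, threshold=0.75):
    """Flag `image` as an AE when too few transform models agree on a label.

    Assumes each model's `predict` returns a (1, n_classes) probability array.
    Returns (is_ae, voted_label); the voted label can serve as the recovered class.
    """
    labels = np.array([int(np.argmax(m.predict(image[None]))) for m in transform_models])
    counts = np.bincount(labels)
    agreement = counts.max()                      # size of the largest agreeing group
    is_ae = agreement < threshold * len(transform_models)
    voted_label = int(counts.argmax())            # majority-voted class label
    return is_ae, voted_label
```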

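A sketch of the pairwise distance matrix from item 3, again assuming `predict` returns softmax probabilities; the metric helpers come from SciPy, and the KL option is asymmetric (D(p‖q) ≠ D(q‖p)), so the resulting matrix need not be symmetric.

```python
import numpy as np
from scipy.spatial.distance import euclidean, cosine, correlation
from scipy.stats import entropy

def distance_matrix(transform_models, image, metric="l2"):
    """Pairwise distances between the prediction outputs of all transform
    models for one image. Benign samples are expected to give values near 0."""
    probs = [m.predict(image[None])[0] for m in transform_models]  # each (n_classes,)
    metrics = {
        "l2": euclidean,
        "cosine": cosine,
        "correlation": correlation,
        # KL divergence D(p || q); can be inf if q has zeros where p does not.
        "kl": lambda p, q: entropy(p, q),
    }
    dist = metrics[metric]
    n = len(probs)
    mat = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            mat[i, j] = dist(probs[i], probs[j])
    return mat

# The matrix can be visualized as a heatmap, e.g.:
#   import matplotlib.pyplot as plt
#   plt.imshow(distance_matrix(transform_models, x), cmap="viridis"); plt.colorbar()
```
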
@oceank oceank added the enhancement New feature or request label Aug 22, 2019
oceank commented Aug 23, 2019

Based on the initial experiment (see the result details in commit d0190cb), detection approach 1, combined with majority-voting-based classification (recovery), achieves a very impressive success rate of either correctly detecting an AE as an AE or classifying the input image to its true label, regardless of whether it is a benign sample or an AE.

The hypothesis about the distance matrix (using the L2 norm) is confirmed. For different types of AEs, rich pattern information shows up in the heatmaps of their distance matrices. More investigation is encouraged.

@MENG2010 MENG2010 assigned MENG2010 and oceank and unassigned MENG2010 Aug 25, 2019