[Paper Review] EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES, 2015

Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. Linear behavior in high-dimensional spaces is sufficient to cause adversarial examples. If we add a small perturbation η to the original input x, the adversarial example is:

$$\tilde{x} = x + \eta$$

in which η is small enough that the classifier should assign x and x̃ the same class, formalized by the max-norm constraint:

$$\|\eta\|_\infty < \epsilon$$

In a linear model with weight vector w, the activation on the adversarial example is:

$$w^\top \tilde{x} = w^\top x + w^\top \eta$$

so the perturbation increases the output by w^⊤η.

Because of the L∞ constraint, the increase caused by the perturbation is maximized by giving every element of η the absolute value ε, with the sign of each element matching the sign of the corresponding weight in w, i.e. η = ε · sign(w). The weighted perturbation then adds ε‖w‖₁ to the activation.
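A minimal NumPy sketch of this worst-case argument; the dimension, weights, input, and ε below are arbitrary made-up values, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                        # high-dimensional input (illustrative size)
w = rng.standard_normal(n)      # weights of a linear model (made up)
x = rng.standard_normal(n)      # an arbitrary "clean" input
eps = 0.01                      # max-norm budget for the perturbation

eta = eps * np.sign(w)          # worst-case perturbation: each element has magnitude eps
assert np.max(np.abs(eta)) <= eps            # the max-norm constraint holds

print("clean activation:     ", w @ x)
print("perturbed activation: ", w @ (x + eta))
print("growth w^T eta:       ", w @ eta)             # equals eps * ||w||_1
print("eps * ||w||_1:        ", eps * np.abs(w).sum())
```

Although each input component moves by only ε, the activation grows by ε‖w‖₁, which scales with the dimensionality n; this is the sense in which high-dimensional linear behavior alone is enough to turn an imperceptible input change into a large output change.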

For a non-linear model, the optimal perturbation cannot be read off the weights directly. However, if we assume the model behaves approximately linearly around the input, the same idea applies, with the gradient of the cost function taking the role of the weights. Linearizing the cost J(θ, x, y) around x gives the fast gradient sign method (FGSM) perturbation:

$$\eta = \epsilon \,\mathrm{sign}\!\left(\nabla_x J(\theta, x, y)\right)$$
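A minimal sketch of this fast gradient sign step in PyTorch; the model architecture, input, label, and ε are placeholders, not the experimental setup of the paper:

```python
import torch
import torch.nn as nn

# Placeholder classifier: any differentiable model would do.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

loss_fn = nn.CrossEntropyLoss()
eps = 0.05                              # hypothetical perturbation budget

x = torch.rand(1, 784)                  # stand-in for a flattened image in [0, 1]
y = torch.tensor([3])                   # its assumed true label

# Gradient of the cost with respect to the input, not the weights.
x.requires_grad_(True)
loss = loss_fn(model(x), y)
loss.backward()

eta = eps * x.grad.sign()               # eta = eps * sign(grad_x J(theta, x, y))
x_adv = (x + eta).clamp(0, 1).detach()  # adversarial example, clipped to a valid pixel range
```

Every input component is moved by exactly ε in the direction that increases the loss under the local linear approximation, which is why a single gradient computation suffices and no iterative optimization is needed.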

A very interesting empirical finding: the response of a neural network's output to perturbations of its input is remarkably close to linear!

In many cases, a wide variety of models with different architectures trained on different subsets of the training data misclassify the same adversarial example. This suggests that adversarial examples expose fundamental blind spots in our training algorithms. On some datasets, such as ImageNet (Deng et al., 2009), the adversarial examples were so close to the original examples that the differences were indistinguishable to the human eye.

These results suggest that classifiers based on modern machine learning techniques, even those that obtain excellent performance on the test set, are not learning the true underlying concepts that determine the correct output label. Instead, these algorithms have built a Potemkin village that works well on naturally occurring data, but is exposed as a fake when one visits points in space that do not have high probability in the data distribution.

In machine learning courses we mainly study the relationship between a model's parameters and its outputs, which is highly non-linear. The way I make sense of it is that by changing the parameters we can produce essentially arbitrary, very non-linear functions; however, once the parameters are fixed, the mapping from input to output is much more linear than we might expect. In Goodfellow's clarifying-misconceptions post (see References), a graph shows how a model's outputs change as the perturbation magnitude changes, and the curves are nearly linear.
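A rough sketch of how such a graph could be reproduced, again with a made-up model and input; sweeping ε along a fixed sign direction and recording the logits typically yields curves that are close to piecewise linear:

```python
import numpy as np
import torch
import torch.nn as nn

# Placeholder model and input; only the shape of the resulting curves matters.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

x = torch.rand(1, 784)                        # stand-in input
direction = torch.sign(torch.randn(1, 784))   # a fixed sign-vector perturbation direction
epsilons = np.linspace(-0.25, 0.25, 101)      # range of perturbation magnitudes to sweep

with torch.no_grad():
    logits = np.stack([model(x + float(e) * direction).squeeze(0).numpy() for e in epsilons])

# logits has shape (101, 10); plotting each column against epsilon shows the
# near-piecewise-linear response of the outputs to input perturbations.
```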

References:

Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. Explaining and Harnessing Adversarial Examples. arXiv:1412.6572, Dec. 2014 (ICLR 2015).

Goodfellow's lecture on Adversarial Machine Learning: https://www.youtube.com/watch?v=CIfsB_EYsVI&t=1750s

Deep Learning Adversarial Examples – Clarifying Misconceptions: https://www.kdnuggets.com/2015/07/deep-learning-adversarial-examples-misconceptions.html

Perhaps the Simplest Introduction of Adversarial Examples Ever: https://towardsdatascience.com/perhaps-the-simplest-introduction-of-adversarial-examples-ever-c0839a759b8d

Know Your Adversary: Understanding Adversarial Examples (Part 1/2): https://towardsdatascience.com/know-your-adversary-understanding-adversarial-examples-part-1-2-63af4c2f5830
