What is an Adversarial Example and Why Do They Matter?

Pedro Sandoval-Segura
Dec 31, 2020
VGG-16 classifies the image (left) as a piggy bank, but after adding some noise (middle), the network can no longer accurately classify the new image (right).

A neural network classifies the image on the left as a “piggy bank,” but after adding a little noise, the classification changes to “theater curtain.” The image on the right is an adversarial example because it causes the network to erroneously change its classification after a change that is nearly imperceptible to humans.

Any human would tell you that both the left and right images are piggy banks, but this neural network has changed its labeling after a minor tweak to the image pixels — showing us just how brittle the decision boundaries of neural networks actually are.

You may be asking yourself, “Was this particular image hard to find? Is this particular network (VGG-16) the problem?”

Neither this particular image nor this particular network is the problem. It is actually very easy to find more adversarial examples within a dataset, and nearly all neural network architectures share this vulnerability, including high-performing ones: VGG-16 achieves 92.7% top-5 test accuracy on ImageNet and is still easily fooled. Without a fix, applications that rely on neural network models are open to adversarial attacks with serious consequences. Autonomous vehicles could veer off course after erroneous detections, facial recognition systems could be evaded, and more.
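To give a sense of how easy this is, here is a minimal sketch of one of the simplest attacks, the fast gradient sign method (FGSM): compute the gradient of the loss with respect to the input pixels and nudge every pixel a small step in the direction that increases the loss. The sketch assumes PyTorch with a pretrained VGG-16 from torchvision and an already-preprocessed, ImageNet-normalized input tensor; the epsilon value and the stand-in tensors at the bottom are purely illustrative.

import torch
import torch.nn.functional as F
from torchvision import models

# Pretrained VGG-16 from torchvision (weights download on first use).
model = models.vgg16(pretrained=True).eval()

def fgsm(x, y, epsilon=0.01):
    # One signed-gradient step: move every pixel slightly in the
    # direction that increases the loss for the true label y.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()

# Stand-in tensors; in practice x is a normalized 224x224 image.
x = torch.randn(1, 3, 224, 224)
y = torch.tensor([0])
x_adv = fgsm(x, y)
print(model(x).argmax(1).item(), model(x_adv).argmax(1).item())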

In this article, I will describe three interesting and practical adversarial examples developed in recent years.

Fooling Road Sign Classifiers

In 2018, Eykholt et al. [1] demonstrated that by adding a few black and white stickers at carefully chosen locations on a "Stop" sign, they could fool a road sign classifier into labeling it a "Speed Limit 45" sign.

From Figure 1 of “Robust Physical-World Attacks on Deep Learning Visual Classification” by Eykholt et al., 2018

This physical adversarial example exposes a major vulnerability for autonomous car makers. Road signs are critical to transportation safety, and it would be easy for an adversary to place stickers like these on a real sign. Most autonomous car makers will likely need a road sign classifier integrated into their self-driving pipeline, so that classifier will need to be robust against attacks like these.
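The full attack in the paper (RP2) also accounts for printability and for robustness across distances and viewing angles, but its core idea can be sketched in a few lines: optimize a perturbation that is only allowed inside a "sticker" mask and push the model's prediction toward an attacker-chosen target class. The sketch below is my own simplification, assuming a generic pretrained torchvision classifier and images in the [0, 1] range; it is not the authors' pipeline.

import torch
import torch.nn.functional as F
from torchvision import models, transforms

model = models.resnet18(pretrained=True).eval()
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def sticker_attack(x, mask, target, steps=200, lr=0.1):
    # Optimize a perturbation that only lives inside the mask (the "stickers")
    # so that the classifier outputs the attacker-chosen target class.
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = (x + delta * mask).clamp(0, 1)            # stay a valid image
        loss = F.cross_entropy(model(normalize(x_adv)), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta.detach() * mask).clamp(0, 1)

# Illustrative inputs: an image, a binary mask marking sticker locations,
# and the class index the attacker wants the model to predict.
x = torch.rand(1, 3, 224, 224)
mask = torch.zeros_like(x)
mask[..., 60:100, 60:160] = 1.0
target = torch.tensor([7])
x_adv = sticker_attack(x, mask, target)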

Fooling Person Detectors

In 2020, Wu et al. [2] showed they could evade a person detector by wearing a sweatshirt with a carefully optimized pattern.

From Figure 1 of “Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors” by Wu et al., 2020

Notice that the YOLOv2 detector, on which they performed their experiments, is unable to place a bounding box around the central person wearing the "invisibility cloak" sweatshirt. The team ran additional experiments with printable posters and other wearable garments, finding that object detectors are also vulnerable to physical adversarial examples, though the authors note that detectors are harder to fool than classifiers.
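The optimization behind the cloak is similar in spirit to the classifier attacks above, except the loss is built from the detector's outputs. Below is a rough sketch of that idea, using torchvision's pretrained Faster R-CNN as a stand-in for the YOLOv2 detector used in the paper: a patch confined to a mask (say, the torso region of a single 3 x H x W image in [0, 1]) is optimized to suppress every "person" detection score. It omits the printing, fabric warping, and multi-image training that make the real attack survive the physical world.

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

PERSON = 1  # "person" class index in torchvision's COCO-trained detectors
model = fasterrcnn_resnet50_fpn(pretrained=True).eval()

def evade_person_detector(img, mask, steps=50, lr=0.05):
    # Optimize a masked patch so the detector's "person" scores collapse.
    delta = torch.zeros_like(img, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x = (img + delta * mask).clamp(0, 1)   # detector expects [0, 1] images
        out = model([x])[0]                    # dict of boxes, labels, scores
        person_scores = out["scores"][out["labels"] == PERSON]
        loss = person_scores.sum()             # drive all person scores to zero
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (img + delta.detach() * mask).clamp(0, 1)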

Fooling 3D Object Classifiers

A year earlier, in 2019, Zeng et al. [3] developed and tested the efficacy of 3D adversarial examples: scenes where physical properties like translation, rotation, and illumination are changed to cause misclassification of objects. In the following figure, note how the car (left) and train (right) are changed ever so slightly to cause AlexNet and ResNet to misclassify these 3D objects as pillows and vessels.

From Figure 3 of "Adversarial Attacks Beyond the Image Space" by Zeng et al., 2019

Zeng et al. generalized adversarial examples beyond 2D pixels to 3D physical parameters. They found that neural networks are sensitive to physical perturbations, but that 3D adversarial examples are not nearly as effective as those in 2D. I believe the work of Zeng et al. could also have implications for networks which process 3D point clouds for classification and detection tasks. Because depth cameras and LIDAR produce point clouds as raw data, exploring and understanding the threat model in this case will be important as more machine learning models process this kind of data.

Whether it be road sign classification, person detection, or 3D object classification, neural networks have become the model of choice for many visual tasks. Despite their popularity, these models come with a big vulnerability in the form of adversarial examples. More work is needed to determine how to better defend against adversarial attacks, how to fix the models, or how to change our learning algorithms.

A few notes:

  • The adversarial example in the first image of this article (with the piggy bank) was created with a universal adversarial perturbation. The middle perturbation is called universal because the same noise can be added to any image and fool VGG-16 with high probability (a short sketch of how such a perturbation is applied appears below).
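For concreteness, here is a minimal sketch of what "universal" means in practice, assuming a perturbation delta has already been computed (finding one is its own optimization problem over many training images): the same fixed delta is added to every input, with its magnitude capped so it stays nearly imperceptible. The epsilon bound and tensors below are illustrative stand-ins.

import torch

def apply_universal(images, delta, eps=10 / 255):
    # The same small, fixed perturbation is added to every image in the batch.
    delta = delta.clamp(-eps, eps)          # keep the perturbation imperceptible
    return (images + delta).clamp(0, 1)     # keep pixels in a valid range

# Stand-ins: a batch of images and a precomputed universal perturbation.
images = torch.rand(8, 3, 224, 224)
delta = torch.empty(1, 3, 224, 224).uniform_(-10 / 255, 10 / 255)
adv_images = apply_universal(images, delta)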

References:

[1] K. Eykholt et al., “Robust Physical-World Attacks on Deep Learning Visual Classification,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 1625–1634, doi: 10.1109/CVPR.2018.00175.

[2] Z. Wu et al., "Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors," European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, 2020, pp. 1–17.

[3] X. Zeng et al., "Adversarial Attacks Beyond the Image Space," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
