Adversarial Patch Attack (PGD)

How Adversarial Patch Attacks Work

An adversarial patch is a carefully crafted image that, when placed within a scene, causes object detection models to make incorrect predictions.
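As a rough illustration of what "placed within a scene" can mean in code, the sketch below pastes a patch into an image array. The function name `apply_patch`, the `(H, W, C)` array layout, and the fixed `top`/`left` placement are assumptions made for this example; real patch attacks often also randomize position, scale, and rotation.

```python
import numpy as np

def apply_patch(image, patch, top, left):
    """Return a copy of `image` (H, W, C) with `patch` (h, w, C) pasted at (top, left)."""
    h, w = patch.shape[:2]
    if top < 0 or left < 0 or top + h > image.shape[0] or left + w > image.shape[1]:
        raise ValueError("patch does not fit inside the image at this position")
    patched = image.copy()                      # leave the original scene untouched
    patched[top:top + h, left:left + w] = patch
    return patched

# Hypothetical example data: a black scene and a white patch.
scene = np.zeros((64, 64, 3), dtype=np.float32)
patch = np.ones((16, 16, 3), dtype=np.float32)
patched_scene = apply_patch(scene, patch, top=10, left=20)
```

The patched scene, not the bare patch, is what would be fed to the detector when computing the loss.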

The Projected Gradient Descent (PGD) algorithm iteratively modifies the patch to maximize the model's loss:

1. Start with a random or pre-designed patch.
2. Compute the gradient of the model's loss with respect to the patch pixels.
3. Perturb the patch in the direction that increases the loss.
4. Project the perturbed patch back into the constrained set, whose size is controlled by epsilon (typically a bound on how far each pixel may move from its starting value).
5. Repeat for a fixed number of iterations.
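The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation behind this page: a toy linear scorer with a sigmoid (`toy_loss_and_grad`, with made-up weights `w` and bias `b`) stands in for the detector so the gradient can be written analytically, and the constrained set is assumed to be an L-infinity ball of radius `epsilon` around the starting patch. With a real model such as YOLOv3, the gradient would come from an autodiff framework instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def toy_loss_and_grad(patch, w, b):
    """Toy stand-in for a detector: loss = -log(objectness score)."""
    score = sigmoid(float(np.dot(w, patch.ravel()) + b))
    loss = -np.log(score + 1e-12)
    # Analytic gradient of -log(sigmoid(w.x + b)) with respect to x.
    grad = ((score - 1.0) * w).reshape(patch.shape)
    return loss, grad

def pgd_patch(patch0, w, b, epsilon=0.1, step_size=0.02, n_iters=20):
    patch = patch0.copy()                                  # step 1: starting patch
    for _ in range(n_iters):                               # step 5: repeat
        _, grad = toy_loss_and_grad(patch, w, b)           # step 2: gradient
        patch = patch + step_size * np.sign(grad)          # step 3: increase the loss
        # Step 4: project into the epsilon ball, then into the valid pixel range.
        patch = np.clip(patch, patch0 - epsilon, patch0 + epsilon)
        patch = np.clip(patch, 0.0, 1.0)
    return patch

# Hypothetical data: a random 8x8 RGB patch and random scorer weights.
rng = np.random.default_rng(0)
patch0 = rng.uniform(0.0, 1.0, size=(8, 8, 3))
w = rng.normal(size=patch0.size) / np.sqrt(patch0.size)
b = 2.0
loss_before, _ = toy_loss_and_grad(patch0, w, b)
adv_patch = pgd_patch(patch0, w, b)
loss_after, _ = toy_loss_and_grad(adv_patch, w, b)
```

The signed-gradient step is one common choice for L-infinity constraints; other norms would use a different step and projection.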

Targeted attacks aim to make the model predict a specific incorrect class, while untargeted attacks aim for any incorrect output, such as a missed detection of the real object.
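One way to see the difference is in the direction of a single update step. The sketch below assumes a toy linear softmax classifier (weights `W`, input `x`) rather than a real detector, and the names `cross_entropy_grad` and `attack_step` are hypothetical. The untargeted step ascends the loss of the true class, while the targeted step descends the loss of the chosen target class.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def cross_entropy_grad(x, W, label):
    """Gradient of cross-entropy(softmax(W @ x), label) with respect to x."""
    probs = softmax(W @ x)
    probs[label] -= 1.0              # d(loss)/d(logits) = probs - one_hot(label)
    return W.T @ probs

def attack_step(x, W, true_label, target_label=None, step_size=0.01):
    if target_label is None:
        # Untargeted: move away from the true class by increasing its loss.
        return x + step_size * np.sign(cross_entropy_grad(x, W, true_label))
    # Targeted: move toward the target class by decreasing its loss.
    return x - step_size * np.sign(cross_entropy_grad(x, W, target_label))

# Hypothetical data: 3 classes, 16 input features.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=16)
W = 0.5 * rng.normal(size=(3, 16))
probs = softmax(W @ x)
true_label = int(np.argmax(probs))
target_label = int(np.argmin(probs))
probs_untargeted = softmax(W @ attack_step(x, W, true_label))
probs_targeted = softmax(W @ attack_step(x, W, true_label, target_label))
```

In a full attack, either step would be followed by the projection described earlier and repeated over many iterations.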

Adversarial Patch Attack (PGD) Visualization

YOLOv3
