Adversarial Patch Attack (PGD) Visualization

This visualization demonstrates how adversarial patches can fool object detection models.

YOLOv3

How Adversarial Patch Attacks Work

An adversarial patch is a carefully crafted image that, when placed within a scene, causes object detection models to make incorrect predictions.

The Projected Gradient Descent (PGD) algorithm iteratively updates the patch pixels to maximize the model's loss:

  1. Start with a random or pre-designed patch
  2. Compute the gradient of the model's loss with respect to the patch pixels
  3. Take a small step on the patch pixels in the direction that increases the loss
  4. Project the result back into the allowed set, so that the perturbation stays within the epsilon bound and pixel values stay in the valid range
  5. Repeat for multiple iterations
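
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the visualization: the "detector" is a hypothetical logistic model over the patch pixels (a stand-in for a real network such as YOLOv3), the step uses the sign of the gradient (a common PGD variant), and the names `detection_score`, `loss_gradient`, and `pgd_patch_attack`, along with all parameter values, are made up for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def detection_score(patch, weights, bias):
    """Toy stand-in for a detector: confidence that the object is present."""
    return sigmoid(sum(w * p for w, p in zip(weights, patch)) + bias)


def loss_gradient(patch, weights, bias):
    """Gradient of loss = -log(score) with respect to each patch pixel."""
    score = detection_score(patch, weights, bias)
    return [-(1.0 - score) * w for w in weights]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_patch_attack(patch0, weights, bias, epsilon=0.10, step_size=0.01, iterations=40):
    """Untargeted PGD on the toy model; all defaults are illustrative."""
    patch = list(patch0)  # step 1: start from the given patch
    for _ in range(iterations):  # step 5: repeat
        grad = loss_gradient(patch, weights, bias)  # step 2: gradient of the loss
        # Step 3: move each pixel in the direction that increases the loss.
        patch = [p + step_size * sign(g) for p, g in zip(patch, grad)]
        # Step 4: project back into the epsilon-box around the starting patch,
        # then into the valid pixel range [0, 1].
        patch = [min(max(p, p0 - epsilon), p0 + epsilon) for p, p0 in zip(patch, patch0)]
        patch = [min(max(p, 0.0), 1.0) for p in patch]
    return patch


# Usage: a random 64-pixel patch against random toy weights.
random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(64)]
patch0 = [random.random() for _ in range(64)]
bias = 2.0
adv_patch = pgd_patch_attack(patch0, weights, bias)
print("score before:", detection_score(patch0, weights, bias))
print("score after: ", detection_score(adv_patch, weights, bias))
```

With a real detector, the analytic gradient in `loss_gradient` would be replaced by automatic differentiation through the network, but the loop structure stays the same.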

Targeted attacks aim to make the model predict a specific incorrect class, while untargeted attacks only need to cause some error, such as preventing detection of the real object.
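
The difference between the two attack types comes down to the objective that PGD maximizes. The sketch below is a simplified illustration that treats the model as a plain classifier producing one logit per class, which is an assumption made for brevity (a real detector also predicts boxes and objectness). The function names and the example logits are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def untargeted_objective(logits, true_class):
    """Value to maximize: cross-entropy of the true class (push it down)."""
    return -math.log(softmax(logits)[true_class])


def targeted_objective(logits, target_class):
    """Value to maximize: log-probability of the chosen wrong class (pull it up)."""
    return math.log(softmax(logits)[target_class])


# Hypothetical logits for the classes (person, dog, bicycle).
clean = [4.0, 1.0, 0.5]
attacked = [1.0, 3.0, 0.5]

print("untargeted:", untargeted_objective(clean, 0), "->", untargeted_objective(attacked, 0))
print("targeted:  ", targeted_objective(clean, 1), "->", targeted_objective(attacked, 1))
```

Both objectives increase from the clean logits to the attacked ones, but only the targeted objective cares which wrong class ends up on top.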