DTA: Physical Camouflage Attacks using Differentiable Transformation Network


CVPR 2022

1Pusan National University
2SmartM2M
3Agency for Defense Development

Follow-up work is available: ACTIVE-ICCV2023

Differentiable Transformation Attack (DTA) is our proposed framework for generating a robust physical adversarial pattern on a target object to camouflage it against object detection models under a wide range of transformations. Our framework uses legacy photo-realistic renderers to simulate physical-world transformations and employs our novel Differentiable Transformation Network (DTN) to enable texture differentiability. It produces a robust adversarial texture as a repeated pattern that is applicable and transferable even in the real world.

Abstract


To perform adversarial attacks in the physical world, many studies have proposed adversarial camouflage, a method to hide a target object by applying camouflage patterns on 3D object surfaces. For obtaining optimal physical adversarial camouflage, previous studies have utilized the so-called neural renderer, as it supports differentiability. However, existing neural renderers cannot fully represent various real-world transformations due to a lack of control of scene parameters compared to the legacy photo-realistic renderers. In this paper, we propose the Differentiable Transformation Attack (DTA), a framework for generating a robust physical adversarial pattern on a target object to camouflage it against object detection models with a wide range of transformations. It utilizes our novel Differentiable Transformation Network (DTN), which learns the expected transformation of a rendered object when the texture is changed while preserving the original properties of the target object. Using our attack framework, an adversary can gain both the advantages of the legacy photo-realistic renderers including various physical-world transformations and the benefit of white-box access by offering differentiability. Our experiments show that our camouflaged 3D vehicles can successfully evade state-of-the-art object detection models in the photo-realistic environment (i.e., CARLA on Unreal Engine). Furthermore, our demonstration on a scaled Tesla Model 3 proves the applicability and transferability of our method to the real world.


DTA Framework


As shown in the figure above, the DTA Framework consists of four components: a Photo-Realistic Rendering Engine, the Repeated Texture Projection Function, the Differentiable Transformation Network (DTN), and the target Object Detection Model.

Photo-Realistic Rendering Engine


The photo-realistic rendering engine is any software that can produce photo-realistic images closely resembling the real physical world. In our work, we use the CARLA Simulator (ver. 0.9.11) on Unreal Engine (ver. 4.24) to synthesize our dataset as well as to evaluate our generated texture in a photo-realistic simulation setting. We modified the original code to allow modification of the car's texture. The video illustrates the output of the rendering engine we use.
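For illustration, below is a minimal sketch of rendering a vehicle from a chosen viewpoint with the CARLA Python API. This is not the authors' dataset pipeline; the host, port, blueprint IDs, and camera placement are assumed placeholders for a default CARLA 0.9.x setup.

```python
import carla

# Connect to a locally running CARLA server (host/port are defaults, not from the paper).
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Spawn a target vehicle at one of the map's predefined spawn points.
blueprints = world.get_blueprint_library()
vehicle_bp = blueprints.filter("vehicle.tesla.model3")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_point)

# Attach an RGB camera a few meters behind and above the car and save frames to disk.
camera_bp = blueprints.find("sensor.camera.rgb")
camera_tf = carla.Transform(carla.Location(x=-6.0, z=3.0), carla.Rotation(pitch=-15.0))
camera = world.spawn_actor(camera_bp, camera_tf, attach_to=vehicle)
camera.listen(lambda image: image.save_to_disk("out/%06d.png" % image.frame))
```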

Differentiable Transformation Network (DTN)


Our proposed DTN learns the expected transformation of a rendered object when its texture is changed while preserving the original properties of the target object. It takes the photo-realistic image synthesized by a non-differentiable renderer and produces a differentiable version of the reference image with the expected texture applied. DTN is embedded as an extension of the rendering pipeline to provide texture differentiability.
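As a rough illustration of this interface only (the actual model uses a DenseNet-style architecture), a minimal DTN-like network in PyTorch might take the reference image and the expected texture, concatenated channel-wise, and output the predicted re-textured render:

```python
import torch
import torch.nn as nn

class SimpleDTN(nn.Module):
    """Toy stand-in for DTN: reference image + expected texture -> predicted render."""
    def __init__(self, in_ch=6, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 3, 3, padding=1), nn.Sigmoid(),  # predicted render in [0, 1]
        )

    def forward(self, x_ref, tex_exp):
        # Concatenate the reference image (3 ch) and the expected texture (3 ch) along channels.
        return self.net(torch.cat([x_ref, tex_exp], dim=1))
```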




The video illustrates how our DTN can correctly predict the rendered image when the texture (color) is changed. The network retains the original target properties such as material, light reflection, and shadow from other objects.

Repeated Texture Projection Function


We propose a repeated pattern as our final attack camouflage texture. It has several benefits, such as ease of application, because the texture can cover the object without requiring the exact texture (UV) mapping. To apply it, we propose a Repeated Texture Projection Function that projects the pattern through a sequence of operations defined by a transformation matrix M. We use wrap mode to fill points outside the boundaries, which extends the output by wrapping around to the opposite edge and yields a repeated-texture effect.
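A minimal sketch of such a projection is shown below, assuming a 2x3 matrix M that maps output pixel coordinates into pattern coordinates and nearest-neighbor sampling; this is a simplification of the actual sequence of operations, but it shows how wrap-mode indexing tiles the pattern while keeping gradients flowing to the pattern values.

```python
import torch

def project_repeated_texture(pattern, out_h, out_w, M):
    """Tile a small pattern over an (out_h, out_w) canvas via a 2x3 matrix M.

    Each output pixel is mapped through M into pattern coordinates; wrap-around
    (modulo) indexing repeats the tile, and gradients still flow back to the
    pattern values, which is what the attack later optimizes.
    """
    c, ph, pw = pattern.shape
    ys, xs = torch.meshgrid(torch.arange(out_h), torch.arange(out_w), indexing="ij")
    ones = torch.ones_like(xs)
    coords = torch.stack([xs, ys, ones], dim=-1).float()  # homogeneous pixel coords
    src = coords @ M.t()                                   # (out_h, out_w, 2) pattern coords
    u = torch.remainder(src[..., 0].round().long(), pw)    # wrap mode: points outside the
    v = torch.remainder(src[..., 1].round().long(), ph)    # boundary wrap to the opposite edge
    return pattern[:, v, u]                                # (c, out_h, out_w) repeated texture
```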



The video illustrates how our DTN combined with the Repeated Texture Projection Function can mimic the repeated pattern produced by the photo-realistic rendering engine. This gives us a differentiable version of the photo-realistic renderer, allowing us to use gradient-based optimization to find the optimal repeated attack texture.

Framework Procedure


DTN Model Training



Before using DTA to generate the adversarial pattern, we need to train DTN with a dataset generated by the photo-realistic rendering engine (see the first video for how the dataset is generated). First, we select a set of random flat-color textures and predefined transformations. Then, we use the rendering engine to produce the photo-realistic images that will later serve as the reference image x_ref, the expected texture η_exp, and the ground-truth rendered image x_ren. See the figure on the right for the DTN training diagram.
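A minimal training sketch is given below, assuming the SimpleDTN interface above and a loader that yields the (x_ref, η_exp, x_ren) triplets produced by the rendering engine; the L1 reconstruction loss is an assumption for illustration, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def train_dtn(dtn, loader, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(dtn.parameters(), lr=lr)
    dtn.train()
    for _ in range(epochs):
        for x_ref, tex_exp, x_ren in loader:
            pred = dtn(x_ref, tex_exp)     # predicted render with the new texture applied
            loss = F.l1_loss(pred, x_ren)  # compare against the engine's ground-truth render
            opt.zero_grad()
            loss.backward()
            opt.step()
    return dtn
```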

The training histories of DTN with DenseNet architecture and the prediction samples are shown below.

DTA Attacking Phase



In the attack phase, the goal is to minimize the confidence score of the original target class, which prevents the object detector from correctly detecting the target object. We use the differentiability of the complete DTA Framework to find the best adversarial pattern η_adv that minimizes the attack loss L_atk, updating η_adv based on the loss gradient. The figure on the right shows the training history for DTA targeting the EfficientDetD0 model with a Toyota Camry.
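A minimal sketch of this optimization, reusing the sketches above, is shown below. The detector interface, the car class index, and the scene iterator are placeholders; in the full framework the projected texture is also masked to the car region and the loop samples many camera and scene transformations to make the pattern robust.

```python
import torch

def optimize_adversarial_texture(dtn, detector, scenes, car_class,
                                 steps=1000, lr=0.01, tile=64):
    # The repeated pattern tile is the only trainable variable.
    eta_adv = torch.rand(3, tile, tile, requires_grad=True)
    opt = torch.optim.Adam([eta_adv], lr=lr)
    for _ in range(steps):
        x_ref, M = next(scenes)                       # reference render + projection matrix
        h, w = x_ref.shape[-2:]
        tex = project_repeated_texture(eta_adv.clamp(0, 1), h, w, M).unsqueeze(0)
        x_adv = dtn(x_ref, tex)                       # differentiable re-textured render
        scores = detector(x_adv)                      # assumed to return per-class confidences
        loss_atk = scores[..., car_class].max()       # L_atk: target-class confidence to minimize
        opt.zero_grad()
        loss_atk.backward()
        opt.step()
    return eta_adv.detach().clamp(0, 1)
```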

Sample predictions of the DTA Framework for cars with the standard, random (initial), and attack (final) textures are shown below. As we can see, a random texture is not sufficient to camouflage the car from the object detection model.

Evaluation Results


We compare our adversarial camouflage with a random pattern and previous works on 3D physical attacks: CAMOU, ER, UPC, and DAS. We closely follow the original papers to replicate their approaches, but we rebuild the environment and target models based on our evaluation setup (see the supplementary material for details). However, the settings of UPC and DAS are difficult to recreate in our environment; thus, we only evaluate them in the transferability experiment. Finally, we evaluate the transferability and applicability of our camouflage pattern in a real-world setting. We built two 1:10-scale Tesla Model 3 models using a 3D printer, one with the normal texture and one with our camouflage texture. We then evaluate them in real-life locations, both indoors and outdoors.

Photo-Realistic Simulation Evaluation


[Click the links below to view sample demo videos]

Target: EfficientDetD0 | Camera: 5m Distance, 15° Pitch, 360° Rotation

Target: EfficientDetD0 | Camera: 10m Distance, 15° Pitch, 360° Rotation

Target: EfficientDetD0 | Camera: 10m Distance, 30° Pitch, 360° Rotation

Target: EfficientDetD0 | Camera: 15m Distance, 30° Pitch, 360° Rotation

Transferability Evaluation


[Click the links below to view sample demo videos]

Target: SSD | Camera: 5m Distance, 0° Pitch, 360° Rotation

Target: Faster-RCNN | Camera: 10m Distance, 15° Pitch, 360° Rotation

Target: Mask-RCNN | Camera: 10m Distance, 30° Pitch, 360° Rotation

Real-World Evaluation


Citation



@InProceedings{Suryanto_2022_CVPR,
    author    = {Suryanto, Naufal and Kim, Yongsu and Kang, Hyoeun and Larasati, Harashta Tatimma and 
    Yun, Youngyeo and Le, Thi-Thu-Huong and Yang, Hunmin and Oh, Se-Yoon and Kim, Howon},
    title     = {DTA: Physical Camouflage Attacks Using Differentiable Transformation Network},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {15305-15314}
}
            

Paper


DTA: Physical Camouflage Attacks using Differentiable Transformation Network

Naufal Suryanto, Yongsu Kim, Hyoeun Kang, Harashta Tatimma Larasati, Youngyeo Yun, Thi-Thu-Huong Le, Hunmin Yang, Se-Yoon Oh, Howon Kim

Paper
Supplementary
arXiv
BibTeX