Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers
🌈 Abstract
The article discusses SWARM, a "Switchable Backdoor Attack" against pre-trained vision transformers. The attack attaches a switchable trigger token to the learned prompt so that the model behaves normally in clean mode and carries out the backdoor behavior in backdoor mode, which makes the backdoor difficult to detect and remove. The key points are summarized in the Q&A below.
🙋 Q&A
[01] Algorithm Outline
1. What is the algorithm outline for the proposed attack? The document does not provide an explicit algorithm outline; it focuses on implementation details and experimental setups.
2. What are the key components of the proposed attack method? The key components include (a minimal sketch follows this list):
- Using prompts with clean tokens and a switchable trigger token
- Training the model in two modes: clean mode and backdoor mode
- Leveraging trigger learning to balance the performance on benign accuracy and attack success rate
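To make the two-mode design concrete, here is a minimal PyTorch-style sketch of how a switchable prompt could be assembled and trained. The interfaces (`backbone.patch_embed`, `backbone.encoder`, `head`) and the exact token layout are illustrative assumptions, not the authors' released code, and details such as the CLS token and positional embeddings are omitted.

```python
import torch
import torch.nn.functional as F

def forward_with_prompt(backbone, head, images, clean_tokens, switch_token=None):
    """Prepend learnable prompt tokens to the patch embeddings of a frozen ViT.

    clean_tokens: (P, D) learnable clean prompt tokens (P = 50 in the reported setup)
    switch_token: (1, D) or None; appending it switches the model into backdoor mode
    """
    x = backbone.patch_embed(images)                              # (B, N, D) patch tokens
    prompts = clean_tokens.unsqueeze(0).expand(x.size(0), -1, -1)
    if switch_token is not None:                                  # backdoor mode
        switch = switch_token.unsqueeze(0).expand(x.size(0), -1, -1)
        prompts = torch.cat([prompts, switch], dim=1)
    feats = backbone.encoder(torch.cat([prompts, x], dim=1))      # frozen transformer blocks
    return head(feats[:, 0])                                      # classify from the leading token

def swarm_step(backbone, head, images, triggered_images, labels, target,
               clean_tokens, switch_token):
    # Clean mode: no switch token, clean images, ordinary cross-entropy.
    loss_clean = F.cross_entropy(
        forward_with_prompt(backbone, head, images, clean_tokens), labels)
    # Backdoor mode: switch token attached, triggered images, attacker-chosen target label.
    loss_bd = F.cross_entropy(
        forward_with_prompt(backbone, head, triggered_images, clean_tokens, switch_token),
        torch.full_like(labels, target))
    return loss_clean + loss_bd                                   # 1:1 clean/backdoor loss ratio
```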
[02] Implementation Details
1. What upstream backbones were used in the experiments? The experiments used three different upstream backbones: ViT, Swin, and ConvNeXt. These models have different architectures, parameter counts, and feature dimensions, but were all pre-trained on ImageNet-21k.
2. What datasets were used to evaluate the proposed attack and defense methods? The experiments used four datasets: CIFAR100, Flowers102, Pets, and DMLab. These datasets represent natural, specialized, and structured tasks, and have more classes and test samples compared to commonly used datasets.
3. What were the key hyperparameter settings for the proposed attack method? The key hyperparameter settings include (see the initialization sketch after this list):
- 50 clean tokens in the prompts
- Xavier uniform initialization for the prompts
- Cosine learning rate schedule, SGD optimizer with momentum 0.9
- 1:1 ratio between clean loss and backdoor loss
- Equal numbers of clean and triggered images during training
4. How were the baseline attack methods (BadNets, Blended, WaNet, ISSBA) implemented? The baseline attack methods were implemented with the following settings:
- Poison rate of 20%
- Trigger sizes and patterns tailored to the input image sizes
- Specific hyperparameters for each attack method as described in the original papers
5. How were the defense methods (Scale-Up, TeCo, NAD, I-BAU) implemented? The defense methods were implemented with the following settings:
- Scale-Up: Amplifying image pixels 1 to 11 times and checking prediction consistency across the amplified copies (see the sketch after this list)
- TeCo: Using various image corruptions to evaluate prediction consistency
- NAD: Using a teacher network to guide the fine-tuning of the backdoored student network
- I-BAU: Leveraging implicit hypergradient to account for the interdependence between inner and outer optimization
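As a concrete reading of the hyperparameters in item 3 above, the sketch below shows one way the prompt parameters could be initialized and optimized in PyTorch. The embedding dimension, learning rate, and schedule length are illustrative assumptions drawn from common ViT-Base practice, not values stated in the article.

```python
import torch

embed_dim, num_prompts, epochs = 768, 50, 100   # 768 matches ViT-Base; epochs and lr are assumed

# 50 clean prompt tokens plus one switch/trigger token, Xavier-uniform initialized.
clean_tokens = torch.nn.Parameter(torch.empty(num_prompts, embed_dim))
switch_token = torch.nn.Parameter(torch.empty(1, embed_dim))
torch.nn.init.xavier_uniform_(clean_tokens)
torch.nn.init.xavier_uniform_(switch_token)

# SGD with momentum 0.9 and a cosine learning-rate schedule, as listed in item 3.
optimizer = torch.optim.SGD([clean_tokens, switch_token], lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
```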
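For intuition on the first defense, below is a rough sketch of a Scale-Up-style consistency check, assuming images are tensors in [0, 1] and `model` returns logits; the exact scale set and the decision threshold used in the paper may differ.

```python
import torch

def scale_up_score(model, image, scales=tuple(range(1, 12))):
    """Amplify pixel values by each factor in `scales` (1 to 11 here) and measure
    how often the prediction matches the unscaled one. Inputs whose prediction is
    unusually stable under amplification are flagged as likely backdoored.
    """
    with torch.no_grad():
        base_pred = model(image.unsqueeze(0)).argmax(dim=1)
        consistent = 0
        for s in scales:
            scaled = torch.clamp(image * s, 0.0, 1.0)         # amplified, clipped copy
            pred = model(scaled.unsqueeze(0)).argmax(dim=1)
            consistent += int((pred == base_pred).item())
    return consistent / len(scales)                           # fraction of consistent predictions
```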
[03] Extra Experiments
1. What was the impact of trigger learning on the attack performance? The ablation study showed that trigger learning is important for balancing the benign accuracy and attack success rate. Without trigger learning, the performance in either clean or backdoor mode would suffer.
2. How did the choice of the noise limit (ε) affect the attack performance? Increasing the noise limit (ε) improved both the benign accuracy and attack success rate, and the performance became stable once ε was sufficiently large (a sketch of ε-bounded trigger learning appears after this list).
3. How did the prompt length affect the attack performance? Increasing the prompt length from 10 to 50 improved the benign accuracy and attack success rate, with the performance becoming stable at 50 prompts.
4. How robust was the proposed attack to the STRIP defense method? The proposed SWARM attack achieved very high false acceptance rates (over 95%) against the STRIP defense, meaning its triggered inputs were almost always accepted as benign, significantly outperforming the baseline attacks (see the STRIP sketch after this list).
5. How robust was the proposed attack to fine-tuning as a backdoor mitigation method? The SWARM attack maintained high attack success rates (over 95%) even after fine-tuning, outperforming the baseline attacks which suffered significant performance degradation.
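Items 1 and 2 indicate that the image trigger is learned jointly with the prompts under a noise limit ε. The sketch below shows a PGD-style update for such an ε-bounded additive trigger; the L∞ constraint, step size, and update rule are assumptions for illustration rather than the paper's exact procedure (`model_backdoor_mode` stands for the prompted model run with the switch token attached).

```python
import torch
import torch.nn.functional as F

def update_trigger(model_backdoor_mode, trigger, images, target, eps, step=1/255):
    """One gradient step on a learnable additive trigger, kept inside an
    L-infinity ball of radius eps (the "noise limit" from the ablation).
    """
    trigger = trigger.clone().detach().requires_grad_(True)
    triggered = torch.clamp(images + trigger, 0.0, 1.0)       # apply trigger, keep pixels valid
    targets = torch.full((images.size(0),), target,
                         dtype=torch.long, device=images.device)
    loss = F.cross_entropy(model_backdoor_mode(triggered), targets)
    loss.backward()
    with torch.no_grad():
        trigger = trigger - step * trigger.grad.sign()        # move toward the target class
        trigger = torch.clamp(trigger, -eps, eps)             # project back into the eps-ball
    return trigger.detach()
```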
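For context on item 4, the following is a sketch of the STRIP-style entropy test that SWARM evades; the blending weight and the number of clean samples are arbitrary choices here.

```python
import torch
import torch.nn.functional as F

def strip_entropy(model, image, clean_samples, alpha=0.5):
    """Blend the suspect image with held-out clean images and average the entropy
    of the model's predictions. Conventional triggers keep forcing one class,
    producing abnormally low entropy; a high false acceptance rate means the
    attack's triggered inputs still look high-entropy (i.e., benign) to this test.
    """
    entropies = []
    with torch.no_grad():
        for clean in clean_samples:                           # each clean image: (C, H, W)
            blended = alpha * image + (1 - alpha) * clean
            probs = F.softmax(model(blended.unsqueeze(0)), dim=1)
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
            entropies.append(entropy.item())
    return sum(entropies) / len(entropies)                    # low value -> flagged as backdoored
```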
[04] Social Impact
1. What are the potential risks and harms of the proposed backdoor attack? The proposed backdoor attack is easy to implement, resource-efficient, and hard to detect and mitigate. If deployed by adversaries, it could cause significant harm to the reliability and trustworthiness of machine learning systems.
2. What is the author's perspective on the responsible development of pre-trained models? The authors acknowledge the potential negative impact of their work but present it to raise awareness, encouraging the community to pay more attention to the security and trustworthiness of the pre-training paradigm and ultimately to improve the reliability of machine learning systems for the benefit of society.