Summarize by Aili

STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay

🌈 Abstract

The paper introduces the problem of outlier-aware test-time adaptation (TTA), which aims to conduct both sample recognition and outlier rejection during inference when outliers exist in the test data. To address this problem, the authors propose a new approach called STAble Memory rePlay (STAMP), which performs optimization over a stable memory bank instead of the risky mini-batch. STAMP consists of three key components: reliable class-balanced memory, self-weighted entropy minimization, and stable optimization strategy.

🙋 Q&A

[01] Outlier-Aware Test-Time Adaptation

1. What is the key difference between the traditional closed-set TTA and the outlier-aware TTA introduced in this paper? Traditional closed-set TTA assumes that the training and test data share the same label space. In the outlier-aware TTA scenario considered in this paper, the test data can also contain samples from new semantic classes (outliers) that are not present in the training data.

2. What are the two main risks introduced by the presence of outliers in the test data during adaptation? The two main risks are:

Classifying outliers into known categories poses safety risks, especially in critical applications like autonomous driving.
Self-supervised losses (e.g., entropy minimization, pseudo-labeling) calculated with outliers may misguide the optimization process, decreasing the model's recognition ability on known classes.

3. What is the goal of outlier-aware test-time adaptation? The goal is to recognize normal samples and to refuse to make predictions for outliers. The algorithm should generate both a prediction and an out-of-distribution (OOD) score for each sample, and then decide whether to keep the prediction or reject the sample based on its OOD score.
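
The predict-then-reject decision described above can be sketched as follows. This is only an illustration: the summary does not say how the OOD score is computed, so the score used here (one minus the maximum softmax probability), the threshold value, and the function names are all assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_reject(logits, threshold=0.5):
    """Return (predicted_class, ood_score, keep).

    Assumption: the OOD score is 1 - max softmax probability, so a higher
    score means the sample looks more like an outlier. The threshold is a
    hypothetical value chosen for illustration.
    """
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    ood_score = 1.0 - probs[pred]
    keep = ood_score <= threshold  # keep the prediction only if the score is low
    return pred, ood_score, keep
```

A confident sample such as `predict_or_reject([5.0, 0.0, 0.0])` keeps its prediction, while a flat output such as `predict_or_reject([0.0, 0.0, 0.0])` is rejected under this threshold.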

[02] STAMP Methodology

1. What are the three key components of the STAMP algorithm? The three key components of STAMP are:

Reliable class-balanced memory: A memory bank that stores reliable samples for optimization, selected by filtering on the consistency and entropy of their predictions.
Self-weighted entropy minimization: A strategy that assigns higher weights to low-entropy samples in the memory during optimization.
Stable optimization strategy: A combination of sharpness-aware minimization (SAM) and test-time step size decay to ensure the stability of the optimization process.

2. How does STAMP maintain a class-balanced prediction of the model? When the memory is full, STAMP dynamically discards samples from the class that has participated most frequently in optimization, which keeps the model's predictions class-balanced.
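
A minimal sketch of such a memory, under stated assumptions, might look like this. The class name, the use of agreement between a sample's prediction and the prediction on an augmented view as the consistency check, the fixed entropy threshold, and the choice to evict the oldest sample of the most-used class are all illustrative assumptions, not details given in the summary.

```python
from collections import defaultdict


class ClassBalancedMemory:
    """Toy memory bank: filter on consistency and entropy, evict by class usage."""

    def __init__(self, capacity, entropy_threshold):
        self.capacity = capacity
        self.entropy_threshold = entropy_threshold
        self.items = []                # list of (sample, predicted_class)
        self.usage = defaultdict(int)  # class -> times used in optimization

    def add(self, sample, pred, pred_aug, entropy):
        # Assumed reliability filter: predictions agree and entropy is low.
        if pred != pred_aug or entropy > self.entropy_threshold:
            return False
        if len(self.items) >= self.capacity:
            self._evict()
        self.items.append((sample, pred))
        return True

    def _evict(self):
        # Find the stored class that has participated most often in optimization.
        classes = sorted({c for _, c in self.items})
        most_used = max(classes, key=lambda c: self.usage[c])
        # Assumption: drop the oldest stored sample of that class.
        for i, (_, c) in enumerate(self.items):
            if c == most_used:
                del self.items[i]
                break

    def replay(self):
        # Every stored sample takes part in one optimization step.
        for _, c in self.items:
            self.usage[c] += 1
        return list(self.items)
```

In this sketch, `replay()` stands in for one optimization pass over the memory, so classes that dominate the memory accumulate usage faster and are the first to lose samples when space is needed.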

3. What is the purpose of the self-weighted entropy minimization in STAMP? The self-weighted entropy minimization in STAMP serves two purposes:

It assigns greater weights to low-entropy samples in the memory, guiding the model to focus more on reliable samples.
It adjusts the weights for each sample adaptively, allowing the model to learn to assign higher weights to important samples.
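
The idea of weighting low-entropy samples more heavily can be illustrated with a small sketch. The exact weighting function is not given in the summary, so the `exp(-entropy)` weights normalized over the memory, and both function names, are assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def self_weighted_entropy(batch_probs):
    """Return (weighted_loss, weights) for a list of probability vectors.

    Assumption: each sample's weight is exp(-entropy), normalized to sum
    to 1, so low-entropy (more reliable) samples contribute more.
    """
    entropies = [entropy(p) for p in batch_probs]
    raw = [math.exp(-h) for h in entropies]
    total = sum(raw)
    weights = [r / total for r in raw]
    loss = sum(w * h for w, h in zip(weights, entropies))
    return loss, weights
```

Because the weights decrease as entropy grows, the weighted loss is never larger than the plain mean entropy of the same samples, which reduces the pull of uncertain samples on the update.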

4. How does STAMP ensure the stability of the optimization process? STAMP ensures the stability of the optimization process in two ways:

It dynamically adjusts the optimization step size, starting with a larger step size to quickly adapt to the target domain and then reducing it over time to mitigate error accumulation.
It applies the sharpness-aware minimization (SAM) technique to further reduce the impact of noisy gradients.
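
Both mechanisms can be sketched on a toy problem. The two-step SAM update below (perturb the weights along the normalized gradient, then descend using the gradient at the perturbed point) follows the standard formulation of sharpness-aware minimization; the inverse-time decay schedule, the quadratic loss, and all parameter values are assumptions for illustration and are not taken from the paper.

```python
import math


def toy_grad(w):
    """Gradient of the toy loss f(w) = sum(w_i ** 2)."""
    return [2.0 * x for x in w]


def decayed_lr(lr0, step, decay=0.1):
    """Assumed schedule: larger steps early, smaller steps later."""
    return lr0 / (1.0 + decay * step)


def sam_step(w, grad_fn, lr, rho=0.05):
    """One sharpness-aware update on a list of floats."""
    g = grad_fn(w)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0  # avoid division by zero
    # Step 1: move to the nearby point of (approximately) highest loss.
    w_adv = [x + rho * gx / norm for x, gx in zip(w, g)]
    # Step 2: descend from the original point using the gradient found there.
    g_adv = grad_fn(w_adv)
    return [x - lr * gx for x, gx in zip(w, g_adv)]


def adapt(w, grad_fn=toy_grad, lr0=0.2, steps=20):
    """Run SAM updates with a decaying step size."""
    for t in range(steps):
        w = sam_step(w, grad_fn, decayed_lr(lr0, t))
    return w
```

Calling `adapt([1.0, -2.0])` drives the toy weights close to the minimum at the origin, with later updates taking progressively smaller steps.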

[03] Experimental Results

1. How does STAMP perform compared to existing TTA methods in the outlier-aware TTA setting? STAMP outperforms existing TTA methods in terms of both recognition and outlier detection performance under the outlier-aware TTA setting. Across multiple benchmarks, STAMP achieves the highest accuracy and AUC scores, demonstrating its effectiveness in handling outliers during adaptation.

2. How does STAMP perform in the conventional closed-set TTA setting without outliers? Even in the conventional closed-set TTA setting without outliers, STAMP still exhibits the best performance, outperforming the second-best method by an average of 2.9% in classification accuracy on the CIFAR10-C and CIFAR100-C datasets. This shows the versatility of STAMP in both outlier-aware and traditional TTA scenarios.

3. How does the proportion of outliers in the test data affect the performance of STAMP? The results show that the performance of STAMP is robust to the varying proportion of outliers, with the fluctuation in H-score being within 3% across different outlier ratios (from 5% to 50%). This indicates that STAMP can maintain its effectiveness in a wide range of realistic scenarios with different outlier proportions.

Shared by Daniel Chen