AdaptGuard: Defending Against Universal Attacks for Model Adaptation. (arXiv:2303.10594v1 [cs.CR])
Model adaptation aims to solve the domain transfer problem under the constraint of accessing only the pretrained source models. With growing concerns about data privacy and transmission efficiency, this paradigm has recently gained popularity. This paper studies the vulnerability of model adaptation algorithms to universal attacks transferred from the source domain, a risk that arises when the model provider is malicious. We explore both universal adversarial perturbations and backdoor attacks as loopholes on the source side and find that they still survive in the target models after adaptation. To address this issue, we propose a model preprocessing framework,
named AdaptGuard, to improve the security of model adaptation algorithms.
AdaptGuard avoids direct use of the risky source parameters through knowledge distillation and exploits pseudo-adversarial samples under an adjusted perturbation radius to enhance robustness. AdaptGuard is a plug-and-play module that requires neither robust pretrained models nor any changes to the subsequent model adaptation algorithms. Extensive results on three commonly used datasets and
two popular adaptation methods validate that AdaptGuard can effectively defend against universal attacks while maintaining clean accuracy in the target domain. We hope this research will shed light on the safety and
robustness of transfer learning.
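
To make the described preprocessing concrete, the sketch below illustrates the two ingredients named above: distilling a student from the (potentially compromised) source model on unlabeled target data, and mixing in pseudo-adversarial samples crafted within an adjusted radius. This is a minimal PyTorch sketch under our own assumptions, not the authors' released code; all function names, loss choices, and hyperparameters (e.g. pgd_perturb, radius=4/255) are illustrative.

```python
# Illustrative sketch (assumptions, not the paper's implementation):
# distill a clean student from a risky source model on unlabeled target
# data, augmented with adversarial samples crafted within an adjusted radius.
import torch
import torch.nn.functional as F

def pgd_perturb(student, x, radius, step_size=None, steps=5):
    """Craft a pseudo-adversarial sample via PGD, maximizing the KL
    divergence between the student's clean and perturbed predictions."""
    step_size = step_size if step_size is not None else radius / steps
    with torch.no_grad():
        clean_prob = F.softmax(student(x), dim=1)
    delta = torch.zeros_like(x).uniform_(-radius, radius).requires_grad_(True)
    for _ in range(steps):
        loss = F.kl_div(F.log_softmax(student(x + delta), dim=1),
                        clean_prob, reduction="batchmean")
        grad, = torch.autograd.grad(loss, delta)
        # Gradient-ascent step, projected back onto the L-infinity ball.
        delta = (delta + step_size * grad.sign()).clamp(-radius, radius)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).detach()

def preprocess(source_model, student, target_loader,
               radius=4 / 255, epochs=10, lr=1e-3):
    """Distill the source model's knowledge into a fresh student on
    unlabeled target data, matching its outputs on both clean and
    pseudo-adversarial inputs."""
    source_model.eval()
    student.train()
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, _ in target_loader:  # labels unused: unlabeled target data
            with torch.no_grad():
                teacher_prob = F.softmax(source_model(x), dim=1)
            x_adv = pgd_perturb(student, x, radius)
            loss = (F.kl_div(F.log_softmax(student(x), dim=1),
                             teacher_prob, reduction="batchmean")
                    + F.kl_div(F.log_softmax(student(x_adv), dim=1),
                               teacher_prob, reduction="batchmean"))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```

Under this reading, the distilled student, rather than the raw source parameters, is what gets handed to whatever model adaptation algorithm follows, which is why no change to that algorithm is needed.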