Visual Prompt Tuning
Abstract:
The current modus operandi in adapting pre-trained models involves updating
all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual
Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning
for large-scale Transformer models in vision.
Introduction
The most accurate results are now obtained by adapting large
foundation models pre-trained on massive curated or raw data, a finding that
mirrors developments in NLP. The problem is that full fine-tuning of such a
pre-trained model is very expensive: every backbone parameter must be
updated, and a separate copy of the full model must be stored for each
downstream task.
What is the best way to adapt large pre-trained Transformers to
downstream tasks in terms of effectiveness and efficiency?
Visual Prompt Tuning (VPT): freeze the backbone, add a small number of extra
learnable parameters (prompts) in the input space, and train only these prompts
together with the classification head.
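A minimal sketch of what this could look like (VPT-Shallow) on a timm-style ViT. The attribute names `patch_embed`, `cls_token`, `pos_embed`, `blocks`, `norm`, and `embed_dim` follow timm's VisionTransformer and are assumptions here, as are the prompt count and initialization; this is an illustration of the idea, not the paper's released code:

```python
import timm
import torch
import torch.nn as nn

class VPTShallow(nn.Module):
    """Sketch of VPT-Shallow: learnable prompt tokens are inserted into the
    input token sequence; the backbone stays frozen and only the prompts
    plus a new classification head are trained."""

    def __init__(self, backbone, num_prompts=10, num_classes=100):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():      # freeze every backbone weight
            p.requires_grad = False
        d = backbone.embed_dim
        # the "extra parameters in the input space": learnable prompt tokens
        self.prompts = nn.Parameter(torch.empty(1, num_prompts, d))
        nn.init.uniform_(self.prompts, -0.1, 0.1)  # init scheme is an assumption
        self.head = nn.Linear(d, num_classes)     # trainable classification head

    def forward(self, x):
        B = x.shape[0]
        x = self.backbone.patch_embed(x)                 # (B, N, d) patch tokens
        cls = self.backbone.cls_token.expand(B, -1, -1)  # (B, 1, d)
        x = torch.cat([cls, x], dim=1) + self.backbone.pos_embed
        # prepend the prompt tokens right after the [CLS] token
        x = torch.cat([x[:, :1], self.prompts.expand(B, -1, -1), x[:, 1:]], dim=1)
        x = self.backbone.blocks(x)                      # frozen encoder
        x = self.backbone.norm(x)
        return self.head(x[:, 0])                        # classify from [CLS]

# Only the prompts and the head receive gradients:
vit = timm.create_model("vit_base_patch16_224", pretrained=True)
model = VPTShallow(vit)
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)
```

Because the backbone is frozen, only the small prompt tensor and the head need to be stored per task, which is what makes VPT cheap compared with full fine-tuning.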
One alternative is to fine-tune only a subset of parameters, such as the
classification head or the bias terms. However, for Transformers these methods
underperform full fine-tuning in accuracy; a head-only baseline is sketched below.
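For contrast, head-only tuning (linear probing) is the simplest of these partial schemes. A minimal sketch, again assuming a timm-style ViT with `embed_dim` and `head` attributes:

```python
import torch.nn as nn

def linear_probe(backbone, num_classes):
    """Freeze the whole backbone and train only a fresh classification
    head; this is the head-only baseline that VPT is compared against."""
    for p in backbone.parameters():
        p.requires_grad = False
    # the new head is created after freezing, so it stays trainable
    backbone.head = nn.Linear(backbone.embed_dim, num_classes)
    return backbone
```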
Related Work
Transfer Learning:
Essentially, taking a model pre-trained on a large source dataset and fine-tuning it on your specific downstream task.