Be flexible! Learn to debias by sampling and prompting for robust visual question answering
Abstract: Recent studies point out that VQA models tend to rely on language priors in the training data to answer questions, which prevents them from generalizing to out-of-distribution test data. To address this problem, existing approaches reduce the effect of the language prior by constructing negative image–question pairs, but they cannot provide a proper visual reason for answering the question. In this paper, we present a new debiasing framework for VQA that Learns to Sample paired image–question data and Prompt for a given question (LSP). Specifically, we construct negative image–question pairs at a certain sampling rate to prevent the model from overly relying on visual shortcut content. Notably, question types provide a strong hint for answering questions, so we use the question type to constrain the sampling of negative question–image pairs, and further learn a question-type-guided prompt for better question comprehension. Extensive experiments on two public benchmarks, VQA-CP v2 and VQA v2, demonstrate that our model achieves new state-of-the-art overall accuracy, i.e., 61.95% and 65.26%, respectively.
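
The question-type-constrained negative sampling described in the abstract can be illustrated with a short sketch. The Python below is a minimal, hypothetical rendering of that step, not the authors' implementation: the field names (question_type, image), the choice to draw the mismatched image from another sample of the same question type, and the default sampling_rate are all assumptions made for illustration.

    import random
    from collections import defaultdict

    def build_type_index(dataset):
        # Group sample indices by question type (e.g., "what color",
        # "how many"); the "question_type" field name is assumed.
        index = defaultdict(list)
        for i, sample in enumerate(dataset):
            index[sample["question_type"]].append(i)
        return index

    def sample_negative_pairs(dataset, type_index, sampling_rate=0.3, seed=0):
        # For a fraction of training samples (the "certain sampling rate"
        # in the abstract), pair the question with a mismatched image drawn
        # from another sample of the SAME question type -- one plausible
        # reading of the question-type constraint -- yielding a negative pair.
        rng = random.Random(seed)
        negatives = []
        for i, sample in enumerate(dataset):
            if rng.random() >= sampling_rate:
                continue
            candidates = [j for j in type_index[sample["question_type"]] if j != i]
            if not candidates:
                continue
            j = rng.choice(candidates)
            negatives.append({
                "question": sample["question"],
                "image": dataset[j]["image"],  # mismatched image
                "label": 0,                    # mark as a negative pair
            })
        return negatives

Under this reading, a same-type negative keeps the question type consistent while breaking the image–answer correspondence, so the model cannot answer from the question alone. The question-type-guided prompt would then plausibly be a learned embedding indexed by the same type and prepended to the question tokens, though the abstract does not specify its form.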
Keywords: Question debiasing; Visual question answering; Contrastive sampling; Question-type-guided prompt
This article is indexed in ScienceDirect and other databases.