Abstract: Speech-based Visual Question Answering (SBVQA) is a challenging task that aims to answer spoken questions about images. Its challenges include the variability of speakers, the ...
Abstract: Visual question answering (VQA) aims to build an interactive system that infers an answer from an input image and a text-based question. Recently, VQA for remote sensing has ...