
Introduction

In the VTQA challenge, a model is expected to answer a question based on a given image-text pair. To answer VTQA questions, the model needs to: (1) identify the entities in the image and text that the question refers to, (2) align the multimedia representations of the same entity, and (3) perform multi-step reasoning across text and image and output an open-ended answer. The VTQA dataset consists of 10,124 image-text pairs and 23,781 questions. The images are real images from the MSCOCO dataset and contain a variety of entities. Annotators were asked to first write text relevant to the image, then ask questions based on the image-text pair, and finally give open-ended answers to those questions.

Information diversity, multimedia multi-step reasoning, and open-ended answers make this task more challenging than existing tasks. The aim of this challenge is to develop and benchmark models that are capable of multimedia entity alignment, multi-step reasoning, and open-ended answer generation.

Challenge Task

As illustrated in the figure, given an image-text pair and a question, a system is required to answer the question in natural language. Importantly, the system needs to: (1) analyze the question and find the key entities, (2) align the key entities between image and text, and (3) generate the answer according to the question and the aligned entities. For example, in Figure 1, the key entity of Q1 is “Elena”. From the phrase “gold hair” in the text, we can determine that the second person from the right in the image is “Elena”. We then answer “suit” based on the image information. Q2 is a more complex question, and the previous steps need to be repeated several times to answer it.
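The three steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the challenge baseline: the `Region` class, the function names, the wording of Q1, and every attribute other than “gold hair” and “suit” are hypothetical, and hand-written dictionaries stand in for what a real system would obtain from learned visual and textual encoders.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A detected image region with toy attribute labels (hypothetical)."""
    position: str
    attributes: dict


def find_key_entity(question, known_entities):
    """Step 1: return the first known entity name mentioned in the question."""
    for name in known_entities:
        if name.lower() in question.lower():
            return name
    return None


def align_entity(entity, text_facts, regions):
    """Step 2: find the image region matching the entity's textual attributes."""
    wanted = text_facts[entity]
    for region in regions:
        if all(region.attributes.get(k) == v for k, v in wanted.items()):
            return region
    return None


def answer(question, text_facts, regions, target_attribute):
    """Step 3: read the requested attribute from the aligned region."""
    entity = find_key_entity(question, text_facts)
    if entity is None:
        return None
    region = align_entity(entity, text_facts, regions)
    if region is None:
        return None
    return region.attributes.get(target_attribute)


# Toy stand-ins for the text and image of Figure 1 (assumed values).
text_facts = {"Elena": {"hair": "gold"}}
regions = [
    Region("first from the right", {"hair": "black", "clothing": "dress"}),
    Region("second from the right", {"hair": "gold", "clothing": "suit"}),
]

# The exact wording of Q1 is not given in the text; this question is assumed.
print(answer("What is Elena wearing?", text_facts, regions, "clothing"))  # prints "suit"
```

A question like Q2 would chain these calls: the output of one alignment step becomes the key entity for the next.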

Update

26 June 2023

The announcement of the challenge results will be postponed to next Monday (3 July). During this period, model evaluation services will no longer be provided. New submissions or revisions of existing submissions will still be accepted until this Friday (30 June). The ranking will be based on the latest model submitted by each participant.

14 June 2023

The top-3 teams will be invited to submit their papers to ACM MM.

6 June 2023

To facilitate test set submissions, we will provide specific error information when an evaluation run fails and reset the number of submissions to 1.

(Note: we will not retroactively restore weekly submission counts for weeks that have already passed; participants will no longer receive this service after their first successful submission.)

3 April 2023

Released the English version of the dataset and corrected some Chinese annotations. The demo code has also been updated.


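Registration Agreement

[Preamble and Introduction]

To make effective use of QQ number resources and protect users' legitimate rights and interests, the QQ Number Rules (hereinafter "these Rules") have been formulated. Please read carefully and fully understand each clause, especially clauses that exempt or limit liability, as well as the separate agreements for activating or using a particular service, and choose whether or not to accept them. Limitation and exemption clauses may be shown in bold to draw your attention.

Unless you have read and accepted all the terms of these Rules, you have no right to apply for or use a QQ number. Applying for or using a QQ number is deemed to mean that you have read these Rules and agree to be bound by them.

1. [Scope of the Rules]

1.1 These Rules are the rules established by Tencent on obtaining and using QQ numbers. These Rules apply to all software and services provided by Tencent that require registering or using a QQ number.

1.2 These Rules are part of Tencent's business rules and form an integral part of the Tencent Service Agreement.

1.3 When you use Tencent software and services through a QQ number, you must also comply with the separate agreement for each service.

2. [Nature of QQ Numbers]

A QQ number is a numeric identifier created by Tencent to identify a user. Ownership of QQ numbers belongs to Tencent.

3. [Obtaining a QQ Number]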