IconQA

A New Benchmark for Abstract Diagram Understanding
and Visual Language Reasoning


What is IconQA?

[Figure: IconQA example questions (iconqa_examples.png)]

Current visual question answering (VQA) tasks mainly consider answering human-annotated questions about natural images in daily-life contexts. In this work, we propose a new challenging benchmark, icon question answering (IconQA), which aims to highlight the importance of abstract diagram understanding and comprehensive cognitive reasoning in real-world diagram word problems. For this benchmark, we build a large-scale IconQA dataset that consists of three sub-tasks: multi-image-choice, multi-text-choice, and filling-in-the-blank. Compared to existing VQA benchmarks, IconQA requires not only perception skills such as object recognition and text understanding, but also diverse cognitive reasoning skills, such as geometric reasoning, commonsense reasoning, and arithmetic reasoning.


Overview

IconQA provides diverse visual question answering questions that require:

  • basic commonsense knowledge
  • comprehensive visual reasoning skills
  • abstract diagram recognition

There are three different sub-tasks in IconQA:

  • 57,672 multi-image-choice questions
  • 31,578 multi-text-choice questions
  • 18,189 filling-in-the-blank questions

The IconQA dataset can be accessed here.

Tasks                  Train     Validation   Test     Total
Multi-image-choice     34,603    11,535       11,535   57,672
Multi-text-choice      18,946    6,316        6,316    31,578
Filling-in-the-blank   10,913    3,638        3,638    18,189
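
As a rough illustration of how the three sub-tasks might be consumed programmatically, here is a minimal Python sketch that counts the questions in each split. It assumes a hypothetical on-disk layout (iconqa_data/<split>/<sub_task>/<problem_id>/data.json) and hypothetical split and sub-task folder names; adjust them to match the actual release.

import json
from pathlib import Path

# Hypothetical layout: iconqa_data/<split>/<sub_task>/<problem_id>/data.json
DATA_ROOT = Path("iconqa_data")                            # assumed local path to the download
SPLITS = ["train", "val", "test"]                          # assumed split folder names
SUB_TASKS = ["choose_img", "choose_txt", "fill_in_blank"]  # assumed sub-task folder names

def load_problems(split, sub_task):
    """Yield (problem_id, problem_dict) pairs for one split / sub-task."""
    task_dir = DATA_ROOT / split / sub_task
    for problem_dir in sorted(task_dir.glob("*")):
        meta_file = problem_dir / "data.json"
        if meta_file.is_file():
            with open(meta_file, encoding="utf-8") as f:
                yield problem_dir.name, json.load(f)

if __name__ == "__main__":
    # Print per-split, per-sub-task question counts to compare with the table above.
    for split in SPLITS:
        for sub_task in SUB_TASKS:
            count = sum(1 for _ in load_problems(split, sub_task))
            print(f"{split:>5s} / {sub_task:<13s}: {count:6d} questions")

If the assumed layout matches the release, the printed counts should line up with the statistics in the table above.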

Icon645

In addition to IconQA, we also present Icon645, a large-scale dataset of icons that cover a wide range of objects:

  • 645,687 colored icons
  • 377 different icon classes

Check out the dataset here.
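
If you want to explore the icon data programmatically, the Python sketch below tallies icons per class. It assumes a hypothetical layout with one folder per icon class containing the colored icon images (icon645/<class_name>/*.png); adjust the path and file pattern to the actual release.

from collections import Counter
from pathlib import Path

# Hypothetical layout: icon645/<class_name>/<icon_id>.png
ICON_ROOT = Path("icon645")  # assumed local path to the extracted download

def count_icons_per_class(root):
    """Return a Counter mapping icon class name -> number of icon images."""
    counts = Counter()
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for _ in class_dir.glob("*.png"))
    return counts

if __name__ == "__main__":
    counts = count_icons_per_class(ICON_ROOT)
    print(f"{len(counts)} classes, {sum(counts.values())} icons in total")
    for name, n in counts.most_common(5):
        print(f"  {name}: {n}")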


Download

Our dataset is distributed under the CC BY-NC-SA (Attribution-NonCommercial-ShareAlike) license, which allows anyone to use our dataset for free, provided that proper attribution is given, the dataset is not used for commercial purposes, and any derived work is shared under the same license.

If you agree with these terms, you can download our datasets below:

Or check out our GitHub repository.


Citation

If you find the paper or the dataset helpful, please cite us:

@inproceedings{lu2021iconqa,
  title = {IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning},
  author = {Lu, Pan and Qiu, Liang and Chen, Jiaqi and Xia, Tony and Zhao, Yizhou and Zhang, Wei and Yu, Zhou and Liang, Xiaodan and Zhu, Song-Chun},
  booktitle = {The 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks},
  year = {2021}
}