IconQA

A New Benchmark for Abstract Diagram Understanding
and Visual Language Reasoning

(NeurIPS 2021)

 

What is IconQA?

[Figure: example questions from the IconQA dataset]

Current visual question answering (VQA) tasks mainly consider answering human-annotated questions about natural images in daily-life contexts. In this work, we propose a new challenging benchmark, icon question answering (IconQA), which aims to highlight the importance of abstract diagram understanding and comprehensive cognitive reasoning in real-world diagram word problems. For this benchmark, we build a large-scale IconQA dataset that consists of three sub-tasks: multi-image-choice, multi-text-choice, and filling-in-the-blank. Compared to existing VQA benchmarks, IconQA requires not only perception skills like object recognition and text understanding, but also diverse cognitive reasoning skills, such as geometric reasoning, commonsense reasoning, and arithmetic reasoning.

IconQA Dataset

There are three different sub-tasks in IconQA:

Sub-task               Total    Train    Val      Test
Multi-image-choice     57,672   34,603   11,535   11,535
Multi-text-choice      31,578   18,946   6,316    6,316
Filling-in-the-blank   18,189   10,913   3,638    3,638
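The three sub-tasks above can be told apart by a question-type tag on each record. As a minimal sketch of working with such data, the snippet below builds a few in-memory examples and tallies them per sub-task; the field names (`ques_type`, `choices`, etc.) are illustrative assumptions, not the official IconQA schema.

```python
from collections import Counter

# Hypothetical records mimicking one possible IconQA entry layout
# (field names are illustrative, not the official schema).
problems = [
    {"id": 1, "ques_type": "choose_img", "question": "Which shape comes next?", "answer": 0},
    {"id": 2, "ques_type": "choose_txt", "question": "How many dots are there?",
     "choices": ["2", "3"], "answer": 1},
    {"id": 3, "ques_type": "fill_in_blank", "question": "How many hearts are there?",
     "answer": "4"},
    {"id": 4, "ques_type": "choose_txt", "question": "What time does the clock show?",
     "choices": ["7:00", "7:30"], "answer": 0},
]

def count_by_subtask(records):
    """Tally how many questions belong to each sub-task."""
    return Counter(r["ques_type"] for r in records)

counts = count_by_subtask(problems)
print(counts)
```

The same grouping logic applies unchanged to the full train/val/test splits once they are loaded from disk.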

IconQA provides diverse visual question answering questions that require both perception skills (e.g., object recognition and text understanding) and cognitive reasoning skills (e.g., geometric, commonsense, and arithmetic reasoning).

Some examples from the IconQA dataset are shown below:

[Figure: example questions from the IconQA dataset]

For more details, you can explore the dataset and check the visualizations here: Explore and Visualizations.

Icon645 Dataset

[Figure: example icons from the Icon645 dataset]

In addition to IconQA, we also present Icon645, a large-scale dataset of icons that cover a wide range of objects:

  • 645,687 colored icons
  • 377 different icon classes

The collected icon classes are frequently mentioned in the IconQA questions. In this work, we use the icon data to pre-train backbone networks on the icon classification task in order to extract semantic representations from abstract diagrams in IconQA. Beyond pre-training encoders, the large-scale icon data could also contribute to open research on abstract aesthetics and symbolic visual understanding.
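The pre-train-then-reuse scheme described above can be sketched in miniature: train a classifier on labeled "icon" vectors, then keep the learned projection as a feature extractor. The toy data below (3 classes, 8-dimensional vectors) stands in for the real setup (image backbones, 377 icon classes); everything here is an illustrative assumption, not the paper's actual training pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for icon classification pre-training: 3 "icon classes",
# each a tight cluster of 8-dim feature vectors.
n_classes, dim, per_class = 3, 8, 30
centers = rng.normal(size=(n_classes, dim))
X = np.vstack([c + 0.1 * rng.normal(size=(per_class, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), per_class)

# One-layer softmax classifier trained with plain gradient descent
# on the cross-entropy loss.
W = np.zeros((dim, n_classes))
onehot = np.eye(n_classes)[y]
for _ in range(200):
    logits = X @ W
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    W -= 0.5 * X.T @ (probs - onehot) / len(X)

# After "pre-training", the learned projection X @ W can serve as a
# semantic embedding for downstream tasks; here we just check that the
# classification objective was actually learned.
acc = (np.argmax(X @ W, axis=1) == y).mean()
print(f"pre-training accuracy: {acc:.2f}")
```

In the paper's setting, the analogue of `W` is the backbone network's weights, which are frozen or fine-tuned when answering IconQA questions.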

Download

Our dataset is distributed under the CC BY-NC-SA (Attribution-NonCommercial-ShareAlike) license, which allows anyone to use our dataset for free under the following terms: you must give appropriate credit (Attribution), you may not use the material for commercial purposes (NonCommercial), and you must distribute any derivative work under the same license (ShareAlike).

If you agree with the terms listed above, you can download our datasets below:

Paper

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei Zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu
The 35th Conference on Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks, 2021
Paper / PDF / Code

Code

View the code on the GitHub repository.

Citation

If the paper or the dataset is helpful in your work, please cite us:

@inproceedings{lu2021iconqa,
  title = {IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning},
  author = {Lu, Pan and Qiu, Liang and Chen, Jiaqi and Xia, Tony and Zhao, Yizhou and Zhang, Wei and Yu, Zhou and Liang, Xiaodan and Zhu, Song-Chun},
  booktitle = {The 35th Conference on Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks},
  year = {2021}
}

Authors


1Center for Vision, Cognition, Learning, and Autonomy (VCLA), UCLA
2School of Computer Science and Technology, East China Normal University
3Computer Science Department, Columbia University
4School of Intelligent Systems Engineering, Sun Yat-sen University

Contact

Questions about IconQA, or want to get in touch? Contact Pan Lu at the contact page, or open a pull request or issue on GitHub.