IconQA
A New Benchmark for Abstract Diagram Understanding
and Visual Language Reasoning
(NeurIPS 2021)
Current visual question answering (VQA) tasks mainly consider answering human-annotated questions about natural images in daily-life contexts. In this work, we propose a new challenging benchmark, icon question answering (IconQA), which aims to highlight the importance of abstract diagram understanding and comprehensive cognitive reasoning in real-world diagram word problems. For this benchmark, we build a large-scale IconQA dataset of 107,439 questions that consists of three sub-tasks: multi-image-choice, multi-text-choice, and filling-in-the-blank. Compared to existing VQA benchmarks, IconQA requires not only perception skills such as object recognition and text understanding, but also diverse cognitive reasoning skills, such as geometric reasoning, commonsense reasoning, and arithmetic reasoning.
There are three different sub-tasks in IconQA:
| Sub-task | Total | Train | Val | Test |
| --- | --- | --- | --- | --- |
| Multi-image-choice | 57,672 | 34,603 | 11,535 | 11,535 |
| Multi-text-choice | 31,578 | 18,946 | 6,316 | 6,316 |
| Filling-in-the-blank | 18,189 | 10,913 | 3,638 | 3,638 |
IconQA provides diverse visual question answering questions that require both perception skills, such as object recognition and text understanding, and cognitive reasoning skills, such as geometric, commonsense, and arithmetic reasoning.
Some examples in the IconQA dataset are shown below:
For more details, you can explore the dataset and check the visualizations here: Explore and Visualizations.
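For a quick look at the raw data outside the web viewer, the sketch below iterates over downloaded problems. The directory layout and the JSON field names (`data.json`, `question`, `choices`, `answer`) are assumptions made for illustration only; consult the released files for the exact format.

```python
# Minimal sketch for browsing IconQA problems after downloading the dataset.
# NOTE: the directory layout and JSON field names below are assumptions for
# illustration; check the released data for the exact format.
import json
from pathlib import Path

def load_problems(split_dir: str):
    """Yield (problem_id, metadata) pairs from a split directory."""
    for problem_dir in sorted(Path(split_dir).iterdir()):
        meta_file = problem_dir / "data.json"
        if meta_file.is_file():
            with open(meta_file) as f:
                yield problem_dir.name, json.load(f)

if __name__ == "__main__":
    # e.g. iconqa_data/train/choose_txt (this path is an assumption)
    for pid, meta in load_problems("iconqa_data/train/choose_txt"):
        print(pid, meta.get("question"), meta.get("choices"), meta.get("answer"))
        break
```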
In addition to IconQA, we also present Icon645, a large-scale dataset of icons that covers a wide range of objects.
These collected icon classes are frequently mentioned in IconQA questions. In this work, we use the icon data to pre-train backbone networks on an icon classification task in order to extract semantic representations from the abstract diagrams in IconQA. Beyond pre-training encoders, the large-scale icon data could also contribute to open research on abstract aesthetics and symbolic visual understanding.
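As a rough illustration of this pre-training idea, the sketch below trains a ResNet-18 classifier on icon images arranged in an ImageFolder-style layout and then keeps the trunk as a diagram feature extractor. The backbone choice, class count, folder layout, and hyperparameters are assumptions for illustration, not the paper's exact recipe.

```python
# Hedged sketch: pre-train a backbone on icon classification, then reuse the
# trunk as a diagram feature extractor. Not the official training recipe.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_ICON_CLASSES = 377  # assumption; replace with the actual number of icon classes

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Icon data arranged as one folder per icon class (ImageFolder convention; an assumption)
train_set = datasets.ImageFolder("icon645/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

backbone = models.resnet18(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_ICON_CLASSES)

optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

backbone.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(backbone(images), labels)
    loss.backward()
    optimizer.step()

# Drop the classification head and keep the trunk as a feature extractor for diagrams.
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
```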
Our dataset is distributed under the CC BY-NC-SA (Attribution-NonCommercial-ShareAlike) license, which allows anyone to use our dataset for free under the following terms:
Attribution: you must give appropriate credit and indicate if changes were made.
NonCommercial: you may not use the material for commercial purposes.
ShareAlike: if you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
If you agree with the terms listed above, you can download our datasets below:
Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei Zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu
The 35th Conference on Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks, 2021
Paper / PDF / Code
View the code on the GitHub repository.
If the paper or the dataset inspires you, please cite us:
@inproceedings{lu2021iconqa,
  title     = {IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning},
  author    = {Lu, Pan and Qiu, Liang and Chen, Jiaqi and Xia, Tony and Zhao, Yizhou and Zhang, Wei and Yu, Zhou and Liang, Xiaodan and Zhu, Song-Chun},
  booktitle = {The 35th Conference on Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks},
  year      = {2021}
}
1Center for Vision, Cognition, Learning, and Autonomy (VCLA), UCLA
2School of Computer Science and Technology, East China Normal University
3Computer Science Department, Columbia University
4School of Intelligent Systems Engineering, Sun Yat-sen University
Questions about IconQA, or want to get in touch? Contact Pan Lu via the contact page, or open a pull request or issue on GitHub.