Open Graph Benchmark: Datasets for Machine Learning on Graphs

TLDR

OGB is a benchmark suite of diverse, realistic graph datasets designed to support scalable, robust, and reproducible graph machine learning research; it will be regularly updated and welcomes contributions from the community. OGB includes large-scale datasets spanning social, information, biological, molecular, source code, and knowledge graph domains. It offers unified evaluation protocols with application-specific splits and metrics, reports extensive benchmark experiments for each dataset, and supplies an automated end-to-end pipeline for data loading, experimental setup, and model evaluation. The experiments suggest that OGB datasets pose substantial challenges in scaling to large graphs and in out-of-distribution generalization under realistic splits, pointing to promising directions for future research. All datasets, data loaders, evaluation scripts, baseline code, and leaderboards are publicly available at https://ogb.stanford.edu.

Abstract

We present the Open Graph Benchmark (OGB), a diverse set of challenging and realistic benchmark datasets to facilitate scalable, robust, and reproducible graph machine learning (ML) research. OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains, from social and information networks to biological networks, molecular graphs, source code ASTs, and knowledge graphs. For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics. In addition to building the datasets, we also perform extensive benchmark experiments for each dataset. Our experiments suggest that OGB datasets present significant challenges of scalability to large-scale graphs and out-of-distribution generalization under realistic data splits, indicating fruitful opportunities for future research. Finally, OGB provides an automated end-to-end graph ML pipeline that simplifies and standardizes the process of graph data loading, experimental setup, and model evaluation. OGB will be regularly updated and welcomes inputs from the community. OGB datasets as well as data loaders, evaluation scripts, baseline code, and leaderboards are publicly available at https://ogb.stanford.edu.
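The abstract describes each dataset as shipping with fixed, application-specific splits and a standardized evaluator. As a rough illustration only, the plain-Python sketch below shows a time-based split (one kind of realistic split that introduces a shift between training and test data) and an evaluator that takes a dictionary of true and predicted labels and returns a named metric. The names `time_split` and `SimpleEvaluator`, the accuracy metric, and the toy data are hypothetical assumptions for illustration; this is not the `ogb` package's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


def time_split(years: Sequence[int], train_until: int, valid_until: int) -> Dict[str, List[int]]:
    """Hypothetical time-based split: train on older items, evaluate on newer ones.

    year <= train_until                -> train
    train_until < year <= valid_until  -> valid
    year > valid_until                 -> test
    Test items therefore come from a later (possibly shifted) distribution.
    """
    split: Dict[str, List[int]] = {"train": [], "valid": [], "test": []}
    for idx, year in enumerate(years):
        if year <= train_until:
            split["train"].append(idx)
        elif year <= valid_until:
            split["valid"].append(idx)
        else:
            split["test"].append(idx)
    return split


@dataclass
class SimpleEvaluator:
    """Hypothetical evaluator: one fixed metric per dataset and a shared input format."""

    metric: str = "acc"

    def eval(self, input_dict: Dict[str, Sequence[int]]) -> Dict[str, float]:
        y_true, y_pred = input_dict["y_true"], input_dict["y_pred"]
        if len(y_true) != len(y_pred) or len(y_true) == 0:
            raise ValueError("y_true and y_pred must be non-empty and of equal length")
        if self.metric != "acc":
            raise NotImplementedError(f"metric {self.metric!r} is not supported in this sketch")
        # Accuracy: fraction of items whose predicted label matches the true label.
        correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
        return {"acc": correct / len(y_true)}


# Toy usage: six items with publication years and binary labels (made-up data).
years = [2015, 2016, 2017, 2018, 2019, 2020]
labels = [0, 1, 1, 0, 1, 0]
split = time_split(years, train_until=2017, valid_until=2018)
test_idx = split["test"]
preds = [1, 1]  # stand-in model predictions for the two test items
result = SimpleEvaluator().eval({"y_true": [labels[i] for i in test_idx], "y_pred": preds})
print(split)   # {'train': [0, 1, 2], 'valid': [3], 'test': [4, 5]}
print(result)  # {'acc': 0.5}
```

Fixing the split and the metric in code, rather than leaving them to each paper, is what lets results from different models be compared directly on a shared leaderboard.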
