Publication | Open Access
Heterogeneous Graph Transformer
Citations: 1.2K
References: 21
Year: 2020
Unknown Venue
Graph Neural Networks, Graph Representation Learning, Graph Theory, Data Science, Machine Learning, Heterogeneous Graph Transformer, Engineering, Knowledge Graph Embeddings, HGT Model, Computer Engineering, Network Analysis, Graph Signal Processing, Computer Science, Homogeneous Graphs, Graph Analysis, Graph Neural Network, Graph Processing
Graph neural networks have succeeded in modeling structured data, but most are designed for homogeneous graphs and therefore cannot represent heterogeneous structures. This paper introduces the Heterogeneous Graph Transformer (HGT) to model web-scale heterogeneous graphs. HGT uses node- and edge-type dependent parameters to compute heterogeneous attention over each edge and maintains dedicated representations for each node and edge type, while a heterogeneous mini-batch sampling algorithm (HGSampling) enables efficient training on web-scale data. On the Open Academic Graph, HGT outperforms all state-of-the-art GNN baselines by 9–21% across downstream tasks. Dataset and source code are publicly available at https://github.com/acbull/pyHGT.
Recent years have witnessed the emerging success of graph neural networks (GNNs) for modeling structured data. However, most GNNs are designed for homogeneous graphs, in which all nodes and edges belong to the same type, making it infeasible to represent heterogeneous structures. In this paper, we present the Heterogeneous Graph Transformer (HGT) architecture for modeling Web-scale heterogeneous graphs. To model heterogeneity, we design node- and edge-type dependent parameters to characterize the heterogeneous attention over each edge, empowering HGT to maintain dedicated representations for different types of nodes and edges. To handle Web-scale graph data, we design the heterogeneous mini-batch graph sampling algorithm, HGSampling, for efficient and scalable training. Extensive experiments on the Open Academic Graph of 179 million nodes and 2 billion edges show that the proposed HGT model consistently outperforms all the state-of-the-art GNN baselines by 9%–21% on various downstream tasks. The dataset and source code of HGT are publicly available at https://github.com/acbull/pyHGT.
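To make the idea of node- and edge-type dependent parameters concrete, the following is a minimal single-head sketch in NumPy. It is not the pyHGT implementation: the function names (`init_params`, `hetero_attention`), the parameter layout, the toy node types (`paper`, `author`) and edge types (`writes`, `cites`) are all assumptions chosen for illustration, and details such as multiple heads or residual connections are omitted. It only shows how the attention over an edge can depend on the source node type, the edge type, and the target node type.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def init_params(node_types, edge_types, d, seed=0):
    # Hypothetical parameter layout: one Q/K/V projection per node type,
    # one attention matrix and one message matrix per edge type.
    rng = np.random.default_rng(seed)
    mat = lambda: rng.normal(scale=d ** -0.5, size=(d, d))
    return {
        "Q": {t: mat() for t in node_types},
        "K": {t: mat() for t in node_types},
        "V": {t: mat() for t in node_types},
        "W_att": {e: mat() for e in edge_types},
        "W_msg": {e: mat() for e in edge_types},
    }

def hetero_attention(h_target, target_type, neighbors, params):
    # neighbors: list of (h_source, source_type, edge_type) tuples.
    d = h_target.shape[0]
    q = params["Q"][target_type] @ h_target      # target-type projection
    scores, messages = [], []
    for h_src, src_type, edge_type in neighbors:
        k = params["K"][src_type] @ h_src        # source-type projection
        v = params["V"][src_type] @ h_src
        # The edge-type matrix sits between key and query, so the same
        # pair of nodes can score differently under different relations.
        scores.append(k @ params["W_att"][edge_type] @ q / np.sqrt(d))
        messages.append(params["W_msg"][edge_type] @ v)
    weights = softmax(np.array(scores))          # normalize over neighbors
    return weights, weights @ np.stack(messages)

# Toy usage: a paper node attends to one author and one cited paper.
params = init_params(["paper", "author"], ["writes", "cites"], d=4)
rng = np.random.default_rng(1)
h_paper = rng.normal(size=4)
neighbors = [
    (rng.normal(size=4), "author", "writes"),
    (rng.normal(size=4), "paper", "cites"),
]
weights, h_new = hetero_attention(h_paper, "paper", neighbors, params)
```

In this sketch the attention weights are normalized over all incoming neighbors of the target node regardless of type, and the aggregated vector `h_new` has the same dimension as the input features.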