Towards artificial general intelligence via a multimodal foundation model

TLDR

Artificial intelligence aims to replicate the core cognitive activities of humans, yet most current methods have only a single cognitive ability. The authors aim to advance toward artificial general intelligence by creating a foundation model pre‑trained on vast multimodal data that can be rapidly adapted to diverse downstream cognitive tasks. They pre‑train this model using self‑supervised learning on weakly semantically correlated data crawled from the Internet and report promising performance across a broad set of downstream tasks. Using model-interpretability tools, they show that the resulting model has strong imagination ability, and they present the work as a transformative step from narrow AI toward generalized AI.

Abstract

The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods have only a single cognitive ability. To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained on huge multimodal data, which can be quickly adapted for various downstream cognitive tasks. To achieve this goal, we propose to pre-train our foundation model by self-supervised learning on weakly semantically correlated data crawled from the Internet, and we show that promising results can be obtained on a wide range of downstream tasks. In particular, with the model-interpretability tools we developed, we demonstrate that our foundation model now possesses strong imagination ability. We believe that our work makes a transformative stride towards AGI, from our common practice of “weak or narrow AI” to that of “strong or generalized AI”.
