Publication | Open Access

Physion: Evaluating Physical Prediction from Vision in Humans and Machines

Citations: 15

References: 0

Year: 2021

Abstract

While current vision algorithms excel at many challenging tasks, it is unclear how well they understand the physical dynamics of real-world environments. Here we introduce Physion, a dataset and benchmark for rigorously evaluating the ability to predict how physical scenarios will evolve over time. Our dataset features realistic simulations of a wide range of physical phenomena, including rigid and soft-body collisions, stable multi-object configurations, rolling, sliding, and projectile motion, thus providing a more comprehensive challenge than previous benchmarks. We used Physion to benchmark a suite of models varying in their architecture, learning objective, input-output structure, and training data. In parallel, we obtained precise measurements of human prediction behavior on the same set of scenarios, allowing us to directly evaluate how well any model could approximate human behavior. We found that vision algorithms that learn object-centric representations generally outperform those that do not, yet still fall far short of human performance. On the other hand, graph neural networks with direct access to physical state information both perform substantially better and make predictions that are more similar to those made by humans. These results suggest that extracting physical representations of scenes is the main bottleneck to achieving human-level and human-like physical understanding in vision algorithms. We have publicly released all data and code to facilitate the use of Physion to benchmark additional models in a fully reproducible manner, enabling systematic evaluation of progress towards vision algorithms that understand physical environments as robustly as people do.
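The model-to-human comparison described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each scenario has a binary ground-truth outcome, that a model emits one probability per scenario, and that human behavior is summarized as the fraction of participants answering "yes". The abstract does not state the response format or the metrics used, so the function names (`accuracy`, `pearson`), the 0.5 threshold, and the toy numbers below are all hypothetical and are not taken from the Physion release.

```python
from math import sqrt
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of scenarios where the binary prediction matches the outcome."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy data for six scenarios (NOT real Physion results).
    labels = [1, 0, 1, 1, 0, 0]                        # assumed ground-truth outcomes
    model_probs = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]       # assumed model outputs
    human_rates = [0.95, 0.10, 0.55, 0.85, 0.40, 0.05] # assumed fraction of "yes" responses

    # Assumed decision rule: threshold probabilities at 0.5.
    model_preds = [int(p >= 0.5) for p in model_probs]
    human_preds = [int(r >= 0.5) for r in human_rates]

    print("model accuracy:", accuracy(model_preds, labels))
    print("human accuracy:", accuracy(human_preds, labels))
    print("model-human correlation:", pearson(model_probs, human_rates))
```

Accuracy measures how often predictions are correct, while the correlation measures whether a model finds the same scenarios difficult as people do. Reporting both is one way to separate "human-level" from "human-like" performance in the sense the abstract uses.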