Publication | Open Access
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
Citations: 47
References: 25
Year: 2018
Venue: Unknown
Keywords: Engineering, Machine Learning, Dexterous Manipulation, Pixel-level Domain Adaptation, Image Analysis, Domain Adaptation Methods, Data Science, Improve Efficiency, Robot Learning, Synthetic Image Generation, Machine Vision, Robotics, Computer Science, Human Image Synthesis, Computer Vision, Deep Robotic Grasping, Domain Adaptation, Object Manipulation, Scene Modeling
TL;DR: Collecting annotated visual grasping datasets is time‑consuming and costly, and while simulators can automatically generate synthetic data, models trained solely on such data often fail to generalize to real‑world scenarios. The study extends randomized simulated environments and domain adaptation techniques to train a grasping system capable of handling novel objects from raw monocular RGB images. The authors evaluate their methods with over 25,000 physical test grasps, examining various simulation settings and domain adaptation strategies, including a novel pixel‑level adaptation called GraspGAN. Results show that synthetic data combined with domain adaptation can cut the number of required real‑world samples by up to 50×, and that using only unlabeled real data with GraspGAN yields real‑world grasping performance comparable to that achieved with 939,777 labeled samples.
Abstract: Instrumenting and collecting annotated visual grasping datasets to train modern machine learning algorithms can be extremely time-consuming and expensive. An appealing alternative is to use off-the-shelf simulators to render synthetic data for which ground-truth annotations are generated automatically. Unfortunately, models trained purely on simulated data often fail to generalize to the real world. We study how randomized simulated environments and domain adaptation methods can be extended to train a grasping system to grasp novel objects from raw monocular RGB images. We extensively evaluate our approaches with a total of more than 25,000 physical test grasps, studying a range of simulation conditions and domain adaptation methods, including a novel extension of pixel-level domain adaptation that we term the GraspGAN. We show that, by using synthetic data and domain adaptation, we are able to reduce the number of real-world samples needed to achieve a given level of performance by up to 50 times, using only randomly generated simulated objects. We also show that by using only unlabeled real-world data and our GraspGAN methodology, we obtain real-world grasping performance without any real-world labels that is similar to that achieved with 939,777 labeled real-world samples.
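The randomized simulated environments mentioned in the abstract rely on domain randomization: each rendered grasping scene draws visual and physical parameters at random so a model trained on the renders sees wide variety and transfers better to real images. The sketch below is a minimal illustration of that idea only; the parameter names, ranges, and texture choices are hypothetical assumptions, not the paper's actual simulator configuration.

```python
import random

def sample_scene_params(rng):
    """Draw one randomized scene configuration for a simulated grasp episode.

    All fields and ranges are illustrative placeholders for the kinds of
    properties a domain-randomized simulator might vary (textures, lighting,
    camera pose, clutter), not the settings used in the paper.
    """
    return {
        "table_texture": rng.choice(["wood", "checker", "noise", "marble"]),
        "light_intensity": rng.uniform(0.2, 2.0),      # arbitrary units
        "light_azimuth_deg": rng.uniform(0.0, 360.0),  # light direction
        "camera_height_m": rng.uniform(0.4, 0.8),      # monocular RGB camera
        "object_count": rng.randint(1, 10),            # clutter on the table
    }

# Generate a batch of randomized episodes with a fixed seed for repeatability.
rng = random.Random(0)
episodes = [sample_scene_params(rng) for _ in range(1000)]
```

Because every episode is sampled independently, the resulting renders cover many combinations of appearance and layout, which is what discourages a grasping network from overfitting to any single simulated look.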