A Depth Adaptive Feature Extraction and Dense Prediction Network for 6-D Pose Estimation in Robotic Grasping

Abstract

Estimating the 6-D pose of an object is a vital and challenging task for robot vision systems in industrial robotic grasping. With the wide use of 3-D cameras, the additionally acquired depth image provides geometric information about the scene that can improve pose estimation performance, but it also raises the challenge of fully leveraging the two modalities: the color image and the depth image. Previous works usually process the two modalities with separate strategies, which limits accuracy and efficiency because the complementary information in the two modalities is not fully exploited. We therefore propose a depth adaptive feature extraction and dense prediction network that decouples the scale-dependent and the scale-invariant information in the depth image. The former guides the network to perceive the 3-D structure of the scene, and the latter, together with the color image, provides the scene textures for feature extraction. The proposed network thus not only fuses multimodal textures but also retains their 3-D structure. In addition, a dense prediction strategy is adopted to regress the object pose, which mitigates the instability caused by outliers. We conduct various evaluations on a real-world industrial dataset to illustrate the advantages of the proposed approach, and we present a practical robotic grasping platform to demonstrate its performance in application.
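
The abstract does not specify how the dense predictions are combined into a single pose, so the following is only an illustrative sketch of why dense prediction can be robust to outliers. It assumes that each point yields one translation vote and one confidence score, and that the final estimate is the confidence-weighted mean of the most confident votes. The function name `aggregate_dense_translations`, the `keep_ratio` parameter, and the aggregation rule itself are hypothetical and are not taken from the paper. Rotation is omitted because averaging rotations requires additional care.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def aggregate_dense_translations(
    votes: Sequence[Vec3],
    confidences: Sequence[float],
    keep_ratio: float = 0.5,
) -> Vec3:
    """Fuse per-point translation votes into one estimate.

    Hypothetical rule: keep the most confident fraction of votes and
    return their confidence-weighted mean.
    """
    if not votes or len(votes) != len(confidences):
        raise ValueError("votes and confidences must be non-empty and equal in length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Rank votes from most to least confident.
    ranked = sorted(zip(confidences, votes), key=lambda cv: cv[0], reverse=True)

    # Keep at least one vote, discarding the low-confidence tail.
    keep = max(1, int(len(ranked) * keep_ratio))
    kept = ranked[:keep]

    total = sum(c for c, _ in kept)
    if total <= 0.0:
        raise ValueError("kept confidences must sum to a positive value")

    # Confidence-weighted mean along each of the three axes.
    x, y, z = (sum(c * v[i] for c, v in kept) / total for i in range(3))
    return (x, y, z)


if __name__ == "__main__":
    # Three consistent votes and one low-confidence outlier (made-up values).
    votes = [(0.10, 0.20, 0.50), (0.12, 0.18, 0.52), (0.11, 0.21, 0.49), (5.0, -3.0, 9.0)]
    confidences = [0.90, 0.80, 0.85, 0.05]
    print(aggregate_dense_translations(votes, confidences, keep_ratio=0.75))
```

With the made-up values in the usage example, the low-confidence outlier is discarded before averaging, whereas an unweighted mean over all four votes would be pulled far from the three consistent votes.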
