
Chaitanya Mitash

Ph.D. Student, Computer Science
Rutgers University

Improving 6D Pose Estimation of Objects in Clutter via
Physics-aware Monte Carlo Tree Search (ArXiv)

Chaitanya Mitash, Abdeslam Boularias and Kostas E. Bekris

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 2018

Abstract: This work proposes a process for efficiently searching over combinations of individual object 6D pose hypotheses in cluttered scenes, especially in cases involving occlusions and objects resting on each other. The initial set of candidate object poses is generated from state-of-the-art object detection and global point cloud registration techniques. The best-scored pose per object from these techniques may not be accurate due to overlaps and occlusions. Nevertheless, experimental evidence provided in this work shows that poses ranked lower by the registration techniques may be closer to the real poses than those ranked higher. This motivates a global optimization process for improving these poses by taking into account scene-level physical interactions between objects. It also implies that the Cartesian product of candidate poses for interacting objects must be searched to identify the best scene-level hypothesis. To perform the search efficiently, the candidate poses for each object are clustered to reduce their number while keeping sufficient diversity. Then, the search over combinations of candidate object poses is performed through a Monte Carlo Tree Search (MCTS) process, guided by a score that measures the similarity between the observed depth image of the scene and a rendering of the scene given the hypothesized poses. MCTS handles the tradeoff between fine-tuning the most promising poses and exploring new ones in a principled way by using the Upper Confidence Bound (UCB) technique. Experimental results indicate that this process is able to quickly identify physically-consistent object poses in cluttered scenes that are significantly closer to ground truth than poses found by point cloud registration methods.
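
To make the search described in the abstract more concrete, the following is a minimal Python sketch of MCTS with UCB selection over the Cartesian product of per-object candidate poses. It is not the released pipeline: the names `Node`, `ucb`, `mcts_search`, and `toy_score` are hypothetical, the exploration constant is an assumed default, and the toy scoring function stands in for the rendered-versus-observed depth similarity. Candidate clustering and physics simulation are omitted.

```python
import math
import random


class Node:
    """One node per partial assignment of candidate poses to objects."""

    def __init__(self, parent=None, choice=None):
        self.parent = parent
        self.choice = choice   # candidate index picked for this node's object
        self.children = []     # expanded children, in candidate order
        self.visits = 0
        self.total = 0.0       # sum of rewards backed up through this node


def ucb(child, parent_visits, c):
    # UCB1: mean reward plus an exploration bonus for rarely visited children.
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts_search(num_candidates, score_fn, iterations=500, c=math.sqrt(2), seed=0):
    """num_candidates[i] is the number of candidate poses for object i.
    score_fn maps a full tuple of candidate indices to a reward in [0, 1]."""
    rng = random.Random(seed)
    root = Node()
    best_assignment, best_score = None, -math.inf
    for _ in range(iterations):
        node, assignment = root, []
        # Selection and expansion: descend until a new child is created.
        while len(assignment) < len(num_candidates):
            if len(node.children) < num_candidates[len(assignment)]:
                child = Node(node, len(node.children))
                node.children.append(child)
                node = child
                assignment.append(child.choice)
                break
            parent = node
            node = max(parent.children, key=lambda ch: ucb(ch, parent.visits, c))
            assignment.append(node.choice)
        # Rollout: complete the scene hypothesis with random candidates.
        while len(assignment) < len(num_candidates):
            assignment.append(rng.randrange(num_candidates[len(assignment)]))
        reward = score_fn(tuple(assignment))
        if reward > best_score:
            best_assignment, best_score = tuple(assignment), reward
        # Backpropagation: update statistics along the visited path.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return best_assignment, best_score


# Toy stand-in for the depth-similarity score: fraction of objects whose
# chosen candidate matches an assumed "true" candidate index.
truth = (2, 0, 1)


def toy_score(assignment):
    return sum(a == t for a, t in zip(assignment, truth)) / len(truth)


best, score = mcts_search([3, 3, 3], toy_score)
```

In this sketch the tree has one level per object, so each root-to-leaf path is one scene-level hypothesis. A real scoring function would render the hypothesized scene and compare it with the observed depth image, which is far more expensive than the toy score, so the iteration budget would need tuning.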


The code for the entire pose estimation pipeline is shared at:

Rutgers Extended RGBD dataset

Dataset download link: download
For each scene in the dataset, we share:
  • RGB Image
  • Depth Image
  • Segmentation mask
  • Parameters
    • camera_pose: pose of the camera in a global frame.
    • camera_intrinsics: intrinsic parameters of the camera.
    • rest_surface: pose of the resting surface such as a table or shelf bin.
    • dependency_order: physical and visual dependency of objects upon each other.
    • pose: ground-truth object pose in a global frame.
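
As an illustration of how the per-scene parameters above could be combined, here is a minimal Python sketch that back-projects a depth pixel into the camera frame and then maps it into the global frame. The on-disk format of the dataset is not described here, so everything about representation is an assumption: `camera_intrinsics` is taken to be pinhole parameters (`fx`, `fy`, `cx`, `cy`), `camera_pose` is taken to be a 4x4 row-major homogeneous matrix mapping camera-frame points to the global frame, and depth is taken to be in metres. The names `Intrinsics`, `backproject`, `transform_point`, and `invert_rigid` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    # Assumed pinhole model; the dataset's actual field layout may differ.
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u, v, depth, K):
    """Pixel (u, v) with depth along the optical axis -> camera-frame point."""
    x = (u - K.cx) * depth / K.fx
    y = (v - K.cy) * depth / K.fy
    return (x, y, depth)


def transform_point(T, p):
    """Apply a 4x4 homogeneous transform (nested lists) to a 3D point."""
    x, y, z = p
    return tuple(T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3))


def invert_rigid(T):
    """Invert a rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    R_t = [[T[c][r] for c in range(3)] for r in range(3)]
    t = [T[r][3] for r in range(3)]
    new_t = [-sum(R_t[r][c] * t[c] for c in range(3)) for r in range(3)]
    return [R_t[r] + [new_t[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# Made-up values for illustration only (not taken from the dataset).
K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
camera_pose = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 0.0, 1.0],
]

point_cam = backproject(320, 240, 1.0, K)              # pixel at the principal point
point_world = transform_point(camera_pose, point_cam)  # same point, global frame
point_back = transform_point(invert_rigid(camera_pose), point_world)
```

Under the same assumptions, `invert_rigid(camera_pose)` can also be used to express a ground-truth object pose in the camera frame, which is the frame in which a rendered depth image would be compared against the observed one.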

Examples of scenes in the dataset and results of pose estimation with physics-based reasoning.


   title={Improving 6D Pose Estimation of Objects in Clutter via Physics-aware Monte Carlo Tree Search},
   author={Mitash, Chaitanya and Boularias, Abdeslam and Bekris, Kostas E},
   journal={{IEEE} International Conference on Robotics and Automation (ICRA)},

Contact Information

Chaitanya Mitash, Kostas E. Bekris and Abdeslam Boularias
Computer Science Department, Rutgers University, New Brunswick, NJ.
E-mail: {cm1074,kb572,ab1544}