NVIDIA researchers have released an open-access dataset designed to help developers get to grips with robotic manipulation — by providing 3D models, depth imagery, and video for 28 readily-available toy grocery items.
"The NVIDIA HOPE [Household Objects for Pose Estimation] datasets consist of RGBD [Red, Green, Blue, and Depth] images and video sequences with labeled 6-DoF [Degrees of Freedom] poses for 28 toy grocery objects," NVIDIA's Stephen Tyree explains. "The toy grocery objects are readily available for purchase and have ideal size and weight for robotic manipulation. Further, we provide 3D textured meshes for generating synthetic training data."
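The "6-DoF pose" in the labels refers to an object's three translational and three rotational degrees of freedom relative to the camera. As a minimal illustrative sketch (not the dataset's own code), such a pose is conventionally packed into a 4×4 homogeneous transform:

```python
import numpy as np

def pose_matrix(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Compose a 6-DoF pose (3 rotational + 3 translational degrees of
    freedom) as a 4x4 homogeneous transform from object to camera frame."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

# Hypothetical example: object 0.5 m in front of the camera,
# rotated 90 degrees about the camera's Z axis.
c, s = np.cos(np.pi / 2), np.sin(np.pi / 2)
R = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])
T = pose_matrix(R, np.array([0.0, 0.0, 0.5]))

point_obj = np.array([0.1, 0.0, 0.0, 1.0])  # point on the object, homogeneous
point_cam = T @ point_obj                   # same point in camera coordinates
```

Applying the transform to a point on the object's 3D mesh yields its position in the camera frame, which is how labeled poses tie the provided textured meshes to the RGBD imagery.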
The 28 items in the dataset — which range from ranch dressing and mayo to tinned beans, orange juice cartons, and boxes of granola bars — span a variety of shapes and sizes mimicking those of real groceries. The items are captured under visible light with accompanying depth information in a total of 50 different scenes, made up of ten different household and office environments with up to ten lighting variants each and varying amounts of clutter.
Alongside the images is a companion video dataset, totaling ten sequences captured by a camera mounted to a robotic arm. The visible-light videos are accompanied by a three-dimensional point-cloud reconstruction of the corresponding scene, created using Alibaba's Cascade-Stereo system.
Elsewhere in the collection are annotations providing ground-truth poses with a claimed accuracy of "a few millimeters," along with pre-trained pose estimators for every object and baseline performance measurements for the validation and test sets — the latter forming a benchmark against which users of the dataset can compare their own results.
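A widely used way to score a 6-DoF pose estimate against such ground truth is the average distance (ADD) metric: transform the object's model points by both the estimated and the ground-truth pose and average the per-point distance. The sketch below is illustrative only, assuming poses as 4×4 transforms and model points in meters; it is not the dataset's own evaluation code.

```python
import numpy as np

def add_error(pose_est: np.ndarray, pose_gt: np.ndarray,
              model_points: np.ndarray) -> float:
    """Average distance (ADD) between model points transformed by the
    estimated and ground-truth 4x4 poses; lower is better (meters)."""
    pts = np.hstack([model_points, np.ones((len(model_points), 1))])
    est = (pose_est @ pts.T).T[:, :3]
    gt = (pose_gt @ pts.T).T[:, :3]
    return float(np.linalg.norm(est - gt, axis=1).mean())

# Hypothetical check: an estimate offset from ground truth by 3 mm
# along one axis should score an ADD error of exactly 0.003 m.
model = np.array([[0.0, 0.0, 0.0],
                  [0.05, 0.0, 0.0],
                  [0.0, 0.05, 0.0]])
gt_pose = np.eye(4)
est_pose = np.eye(4)
est_pose[0, 3] = 0.003  # 3 mm translation error
err = add_error(est_pose, gt_pose, model)
```

A millimeter-scale ADD error is the regime the dataset's "few millimeters" annotation accuracy targets, which is why per-object baselines are useful as a reference point.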
The dataset and a Python-based preview tool are available on GitHub under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license; a paper detailing its creation is available on Cornell's arXiv preprint server.