Introduction
The Hand and Glove Segmentation Dataset for Department of Energy (DOE) Glovebox Environments (HAGS) is a robot-allocentric perception dataset that aims to improve safety and accuracy in human-robot collaboration (HRC), particularly in glovebox environments. It captures two HRC experiments, building a Jenga block tower and disassembling a small box, under varied conditions, providing a diverse, reproducible, and comprehensive representation of interactions for robust and generalizable studies in human-robot interaction. The dataset supports the advancement of real-time safety systems and robotic assistance, fostering the development of intelligent, reliable solutions for human-robot collaboration.
Dataset Characteristics
As mentioned, the dataset captures two human-robot collaboration experiments. In the first experiment, participants built a Jenga block tower, receiving six blocks from the robot manipulator arm. In the second experiment, participants disassembled a box, with the robot manipulator arm handing them different screwdrivers to remove four screws. Each participant repeated both experiments four times, once per combination of two conditions: a) gloved or ungloved hands, and b) with or without a green screen placed along the bottom of the glovebox. Lastly, each run was recorded from two distinct camera angles: a top view and a side view.
Dataset Contents
The dataset contains:
- Ten participants, each conducting two experiments under four condition combinations and two camera angles, for a total of 16 videos per participant (see the sketch following this list).
- Eight hours of video footage of each experiment.
- 2876 annotated in-distribution and out-of-distribution frames.
- 1438 original, unannotated sampled frames.
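As a quick illustration, the 16 videos per participant follow from crossing the two experiments, two glove states, two background conditions, and two camera angles. The labels below are hypothetical; the actual file-naming convention is documented in the Dataset Report.

```python
# Illustrative only: hypothetical condition labels showing how the
# 16 recordings per participant arise from the experimental design.
from itertools import product

experiments = ["jenga_tower", "box_disassembly"]
glove_states = ["gloved", "ungloved"]
backgrounds = ["green_screen", "no_green_screen"]
camera_angles = ["top", "side"]

recordings = list(product(experiments, glove_states, backgrounds, camera_angles))
print(len(recordings))  # 2 * 2 * 2 * 2 = 16 videos per participant
```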
Data Collection
The data was collected in a standard glovebox commonly used by researchers in the DOE. Each experiment run was recorded from two camera angles: a bird's-eye view captured by a 1080p GoPro, and a side view captured by a 1080p Intel RealSense Development Kit camera positioned to the participant's right. A Universal Robots UR3e robot manipulator arm, equipped with a gripper for object handling, was pre-programmed to perform the two tasks and assist the participants. Two researchers supported each session: one operated the robot arm, and the other helped with object placement. Frames were then sampled from each video and annotated.
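One way to sample a fixed number of evenly spaced frames from a recording is sketched below with OpenCV; this is an assumption about the sampling procedure, not the authors' exact pipeline.

```python
# A minimal sketch of evenly spaced frame sampling from one video.
import cv2

def sample_frames(video_path: str, n_frames: int = 30):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Evenly spaced frame indices across the whole video.
    indices = [round(i * (total - 1) / (n_frames - 1)) for i in range(n_frames)]
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```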
Data Post-Processing for Machine Learning
For machine learning applications, the sampled frames were split into two sets: a) an in-distribution set and b) an out-of-distribution set. The in-distribution set contains the scenarios most likely to occur during human-robot collaboration work in a glovebox, making it suitable for model training. Accordingly, videos without a green screen in the background and with participants wearing gloves were designated in-distribution. The remaining videos, in which the participant is ungloved and/or a green screen is placed in the background, depict scenarios less likely to occur in a glovebox setting, so their frames were placed in the out-of-distribution set, making them suitable for model evaluation. In total, 1440 frames were sampled for labeling, distributed equally across the videos: 120 in-distribution frames and 24 out-of-distribution frames per participant (except Participant 6; see the Data Quality Statement in the Dataset Report for an explanation of this exception).
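A minimal sketch of the split rule described above, assuming per-video condition flags (the dataset's own metadata files record the actual conditions):

```python
def assign_split(gloved: bool, green_screen: bool) -> str:
    """Gloved hands with no green screen is the realistic glovebox
    scenario, so those frames form the in-distribution set."""
    if gloved and not green_screen:
        return "in_distribution"   # used for model training
    return "out_of_distribution"   # used for model evaluation
```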
Data Annotation
The data was manually annotated by four researchers divided into two groups. Three classes were assigned in each image: left hand, right hand, and background. Annotators were instructed to annotate each hand from the fingertips to the wrist and to provide their best estimate of the wrist location when the subject was wearing gloves. To generate an initial annotation for each frame, the annotator supplied the open-source Segment Anything Model with a box or point prompt, working within the open-source Label Studio annotation tool; the annotator then adjusted each annotation for precision. Two annotators labeled each image to promote inter-annotator agreement. Each frame's annotation was converted to a single PNG file recording the three classes: left hand, right hand, and background.
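For illustration, below is a minimal sketch of the prompting step using the public segment-anything API with a locally downloaded checkpoint; this is an approximation of the workflow described above, not the authors' exact annotation pipeline.

```python
# A minimal sketch: prompt SAM with an annotator-drawn box to get an
# initial hand mask, which is then refined manually in Label Studio.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Checkpoint path is a placeholder; download from the SAM release page.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

def initial_hand_mask(image: np.ndarray, box_xyxy: np.ndarray) -> np.ndarray:
    """image: HxWx3 uint8 RGB frame; box_xyxy: annotator box [x0, y0, x1, y1]."""
    predictor.set_image(image)
    masks, _, _ = predictor.predict(box=box_xyxy, multimask_output=False)
    return masks[0]  # boolean HxW mask, adjusted by the annotator afterwards
```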
Models and Data Usage
Multiple semantic segmentation models were trained using the annotated dataset. The official model training code and configurations for this dataset are available on GitHub; the link to the repository is provided in the Software metadata field below.
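To help users get started, here is a minimal sketch of loading frames and their PNG annotations for segmentation training. The directory layout and class-index encoding (0 = background, 1 = left hand, 2 = right hand) are assumptions; consult the Dataset Report and metadata for the actual structure.

```python
# A minimal PyTorch Dataset sketch for image + single-channel PNG masks.
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class HagsSegmentation(Dataset):
    # Assumes masks share filenames with images and use integer class indices.
    def __init__(self, image_dir: str, mask_dir: str):
        self.images = sorted(Path(image_dir).glob("*.png"))
        self.mask_dir = Path(mask_dir)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        img_path = self.images[i]
        image = np.array(Image.open(img_path).convert("RGB"))
        mask = np.array(Image.open(self.mask_dir / img_path.name))
        x = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0  # CxHxW
        y = torch.from_numpy(mask.astype(np.int64))                   # HxW labels
        return x, y
```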
Human Subjects
This study was approved by the University of Texas at Austin Institutional Review Board (IRB) under IRB ID STUDY00003948. Everyone present in the recorded data consented to the observation of their behavior. To provide a comprehensive representation of collaborative scenarios, a diverse pool of participants was selected. To protect their privacy, participants with recognizable features were asked to cover them, for example with makeup. The only characteristic identified by the dataset is race. Any participant who revoked consent was removed from the data and the annotations. Included in this data package are the IRB exempt determination and the Research Information Sheet distributed to the participants.
Dataset Organization
It is recommended that users first inspect the metadata under the metadata directory to understand which files suit their task. For an in-depth explanation of the dataset file structure, refer to the Dataset Report included in this dataset, and use the "tree" view for the metadata to better observe the dataset structure.
Dataset Quality Statement
The research team maintains data quality by adhering to standardized protocols during experimentation, ensuring consistency and reproducibility in participant procedures. Participant adherence to these protocols was monitored throughout each session, including in the videos and pictures captured. Inter-annotator agreement is established during the annotation process, with each image labeled by two individuals to improve accuracy and reliability. Comprehensive documentation is maintained throughout data collection to ensure traceability and facilitate auditing, and all dataset contents are documented for transparency and reproducibility. Finally, known dataset noise is reported in the Dataset Report to flag potential anomalies or discrepancies within the data.
Bulk Data Download
A script named download_data.py is provided for bulk data download. To run it, save the script, navigate to its directory in a terminal, and execute it with the python command, ensuring all required dependencies are installed. A stable internet connection is recommended, as downloading all files from the Dataverse repository takes some time.
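For reference, the sketch below illustrates the kind of bulk download download_data.py performs, using the public Dataverse native API. The server URL and dataset DOI are placeholders, not the actual HAGS values, and this is not necessarily the script's exact implementation.

```python
# A minimal sketch of bulk download via the Dataverse native API.
import os

import requests

SERVER_URL = "https://dataverse.example.edu"  # placeholder host
DATASET_DOI = "doi:10.XXXX/XXXXX"             # placeholder DOI

def download_all(out_dir: str = "hags_data") -> None:
    os.makedirs(out_dir, exist_ok=True)
    # List every file in the latest published version of the dataset.
    resp = requests.get(
        f"{SERVER_URL}/api/datasets/:persistentId",
        params={"persistentId": DATASET_DOI},
        timeout=60,
    )
    resp.raise_for_status()
    for entry in resp.json()["data"]["latestVersion"]["files"]:
        data_file = entry["dataFile"]
        file_id, name = data_file["id"], data_file["filename"]
        # Stream each file to disk to avoid holding it all in memory.
        with requests.get(
            f"{SERVER_URL}/api/access/datafile/{file_id}",
            stream=True,
            timeout=60,
        ) as r:
            r.raise_for_status()
            with open(os.path.join(out_dir, name), "wb") as f:
                for chunk in r.iter_content(chunk_size=1 << 20):
                    f.write(chunk)
        print(f"Downloaded {name}")

if __name__ == "__main__":
    download_all()
```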