Video Understanding

Published: 2023-07-15

2023-07-15 Update

HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding

Authors: Hao Zheng, Regina Lee, Yuqian Lu

Understanding comprehensive assembly knowledge from videos is critical for futuristic ultra-intelligent industry. To enable technological breakthroughs, we present HA-ViD, the first human assembly video dataset that features representative industrial assembly scenarios, a natural procedural knowledge acquisition process, and consistent human-robot shared annotations. Specifically, HA-ViD captures diverse collaboration patterns of real-world assembly as well as natural human behaviors and learning progression during assembly, and it breaks action annotations down into subject, action verb, manipulated object, target object, and tool. We provide 3222 multi-view, multi-modality videos (each video contains one assembly task), 1.5M frames, 96K temporal labels, and 2M spatial labels. We benchmark four foundational video understanding tasks: action recognition, action segmentation, object detection, and multi-object tracking. Importantly, we analyze their performance in comprehending knowledge of assembly progress, process efficiency, task collaboration, skill parameters, and human intention. Details of HA-ViD are available at: https://iai-hrc.github.io/ha-vid.

木子已

https://ipaper.today/2023/07/15/2023-07-15-shi-pin-li-jie/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit 木子已 as the source when reposting.
