Video Understanding

Published: 2022-07-26

2022-07-26 Update

Egocentric scene context for human-centric environment understanding from video

Authors: Tushar Nagarajan, Santhosh Kumar Ramakrishnan, Ruta Desai, James Hillis, Kristen Grauman

First-person video highlights a camera-wearer’s activities in the context of their persistent environment. However, current video understanding approaches reason over visual features from short video clips that are detached from the underlying physical space and only capture what is directly seen. We present an approach that links egocentric video and camera pose over time by learning representations that are predictive of the camera-wearer’s (potentially unseen) local surroundings to facilitate human-centric environment understanding. We train such models using videos from agents in simulated 3D environments where the environment is fully observable, and test them on real-world videos of house tours from unseen environments. We show that by grounding videos in their physical environment, our models surpass traditional scene classification models at predicting which room a camera-wearer is in (where frame-level information is insufficient), and can leverage this grounding to localize video moments corresponding to environment-centric queries, outperforming prior methods. Project page: http://vision.cs.utexas.edu/projects/ego-scene-context/

木子已

https://ipaper.today/2022/07/26/2022-07-26-shi-pin-li-jie/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit the source, 木子已, when reposting.
