Scene Text Detection and Recognition

Published: 2023-02-03

Updated 2023-02-03

SceneScape: Text-Driven Consistent Scene Generation

Authors: Rafail Fridman, Amit Abecasis, Yoni Kasten, Tali Dekel

We propose a method for text-driven perpetual view generation — synthesizing long videos of arbitrary scenes solely from an input text describing the scene and camera poses. We introduce a novel framework that generates such videos in an online fashion by combining the generative power of a pre-trained text-to-image model with the geometric priors learned by a pre-trained monocular depth prediction model. To achieve 3D consistency, i.e., generating videos that depict geometrically-plausible scenes, we deploy an online test-time training to encourage the predicted depth map of the current frame to be geometrically consistent with the synthesized scene; the depth maps are used to construct a unified mesh representation of the scene, which is updated throughout the generation and is used for rendering. In contrast to previous works, which are applicable only for limited domains (e.g., landscapes), our framework generates diverse scenes, such as walkthroughs in spaceships, caves, or ice castles. Project page: https://scenescape.github.io/
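The abstract says the predicted depth maps are used to build a unified mesh of the scene. As a rough illustration of that one step only, here is a minimal sketch that back-projects a depth map into camera-space 3D points with a pinhole camera model and connects neighbouring pixels into triangles. It is not the authors' implementation: the function names `unproject_depth` and `grid_triangles`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the simple two-triangles-per-cell layout are all assumptions made for this example.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Triangle = Tuple[int, int, int]


def unproject_depth(
    depth: List[List[float]], fx: float, fy: float, cx: float, cy: float
) -> List[Point3D]:
    """Back-project a depth map into camera-space points (pinhole model).

    The intrinsics fx, fy, cx, cy are hypothetical inputs for this sketch.
    Points are returned in row-major order, one per pixel.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def grid_triangles(height: int, width: int) -> List[Triangle]:
    """Connect each 2x2 block of neighbouring pixels with two triangles.

    Indices refer to the row-major point list from unproject_depth.
    """
    triangles: List[Triangle] = []
    for v in range(height - 1):
        for u in range(width - 1):
            i = v * width + u
            triangles.append((i, i + 1, i + width))
            triangles.append((i + 1, i + width + 1, i + width))
    return triangles


# Toy 2x2 depth map at constant depth 2.0 (made-up values).
toy_depth = [[2.0, 2.0], [2.0, 2.0]]
toy_points = unproject_depth(toy_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
toy_triangles = grid_triangles(2, 2)
```

A full pipeline of the kind the abstract describes would also need to render the mesh from the next camera pose, fill in unseen regions with the text-to-image model, and fine-tune the depth predictor for consistency; none of that is shown here.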


木子已

https://ipaper.today/2023/02/03/2023-02-03-chang-jing-wen-ben-jian-ce-shi-bie/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit 木子已 as the source when reposting.
