Vision Transformer

Published: 2023-02-09

Updated 2023-02-09

Neural Congealing: Aligning Images to a Joint Semantic Atlas

Authors: Dolev Ofri-Amar, Michal Geyer, Yoni Kasten, Tali Dekel

We present Neural Congealing — a zero-shot self-supervised framework for detecting and jointly aligning semantically-common content across a given set of images. Our approach harnesses the power of pre-trained DINO-ViT features to learn: (i) a joint semantic atlas — a 2D grid that captures the mode of DINO-ViT features in the input set, and (ii) dense mappings from the unified atlas to each of the input images. We derive a new robust self-supervised framework that optimizes the atlas representation and mappings per image set, requiring only a few real-world images as input without any additional input information (e.g., segmentation masks). Notably, we design our losses and training paradigm to account only for the shared content under severe variations in appearance, pose, background clutter or other distracting objects. We demonstrate results on a plethora of challenging image sets including sets of mixed domains (e.g., aligning images depicting sculpture and artwork of cats), sets depicting related yet different object categories (e.g., dogs and tigers), or domains for which large-scale training data is scarce (e.g., coffee mugs). We thoroughly evaluate our method and show that our test-time optimization approach performs favorably compared to a state-of-the-art method that requires extensive training on large-scale datasets.
Project page: https://neural-congealing.github.io/
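
To make the idea of "dense mappings from the unified atlas to each of the input images" more concrete, here is a minimal sketch of how an atlas of features could be compared against one image's feature map through a mapping. It is not the authors' implementation: the affine mapping, the bilinear sampling, the cosine-distance loss, and the names `bilinear_sample` and `atlas_alignment_loss` are all assumptions made for illustration, and random arrays stand in for DINO-ViT features.

```python
import numpy as np


def bilinear_sample(feat, xs, ys):
    """Sample feat (H, W, C) at float pixel coords xs, ys, clamped to the border."""
    h, w, _ = feat.shape
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (xs - x0)[..., None]  # horizontal interpolation weight
    wy = (ys - y0)[..., None]  # vertical interpolation weight
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bot = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bot


def atlas_alignment_loss(atlas, image_feat, affine):
    """Mean (1 - cosine similarity) between atlas features and image features
    sampled through a 2x3 affine atlas-to-image mapping (pixel coordinates).

    Assumption: an affine map and a cosine distance are illustrative choices only.
    """
    h, w, _ = atlas.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)
    mapped = coords @ affine.T  # (h, w, 2): x and y in image space
    sampled = bilinear_sample(image_feat, mapped[..., 0], mapped[..., 1])
    num = (atlas * sampled).sum(-1)
    den = np.linalg.norm(atlas, axis=-1) * np.linalg.norm(sampled, axis=-1) + 1e-8
    return float(np.mean(1.0 - num / den))


rng = np.random.default_rng(0)
atlas = rng.normal(size=(8, 8, 4))      # stand-in for atlas features
image_feat = atlas.copy()               # an image already aligned with the atlas
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
shifted = np.array([[1.0, 0.0, 3.0], [0.0, 1.0, 0.0]])  # 3-pixel shift in x

loss_identity = atlas_alignment_loss(atlas, image_feat, identity)
loss_shifted = atlas_alignment_loss(atlas, image_feat, shifted)
```

In this toy setup the identity mapping gives a loss near zero and the shifted mapping gives a larger one. An optimizer could use that kind of signal to update both the mapping and the atlas for a given image set.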


木子已

https://ipaper.today/2023/02/09/2023-02-09-vision-transformer/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit 木子已 as the source when reposting.
