Domain Adaptation


2023-04-03 更新

SOSR: Source-Free Image Super-Resolution with Wavelet Augmentation Transformer

Authors:Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Lei Zhang, Ran He

Real-world images taken by different cameras with different degradation kernels often result in a cross-device domain gap in image super-resolution. A prevalent attempt to this issue is unsupervised domain adaptation (UDA) that needs to access source data. Considering privacy policies or transmission restrictions of data in many practical applications, we propose a SOurce-free image Super-Resolution framework (SOSR) to address this issue, i.e., adapt a model pre-trained on labeled source data to a target domain with only unlabeled target data. SOSR leverages the source model to generate refined pseudo-labels for teacher-student learning. To better utilize the pseudo-labels, this paper proposes a novel wavelet-based augmentation method, named Wavelet Augmentation Transformer (WAT), which can be flexibly incorporated with existing networks, to implicitly produce useful augmented data. WAT learns low-frequency information of varying levels across diverse samples, which is aggregated efficiently via deformable attention. Furthermore, an uncertainty-aware self-training mechanism is proposed to improve the accuracy of pseudo-labels, with inaccurate predictions being rectified by uncertainty estimation. To acquire better SR results and avoid overfitting pseudo-labels, several regularization losses are proposed to constrain the frequency information between target LR and SR images. Experiments show that without accessing source data, SOSR achieves superior results to the state-of-the-art UDA methods.
PDF 15 pages, 9 figures, 10 tables

点此查看论文截图

Fooling Polarization-based Vision using Locally Controllable Polarizing Projection

Authors:Zhuoxiao Li, Zhihang Zhong, Shohei Nobuhara, Ko Nishino, Yinqiang Zheng

Polarization is a fundamental property of light that encodes abundant information regarding surface shape, material, illumination and viewing geometry. The computer vision community has witnessed a blossom of polarization-based vision applications, such as reflection removal, shape-from-polarization, transparent object segmentation and color constancy, partially due to the emergence of single-chip mono/color polarization sensors that make polarization data acquisition easier than ever. However, is polarization-based vision vulnerable to adversarial attacks? If so, is that possible to realize these adversarial attacks in the physical world, without being perceived by human eyes? In this paper, we warn the community of the vulnerability of polarization-based vision, which can be more serious than RGB-based vision. By adapting a commercial LCD projector, we achieve locally controllable polarizing projection, which is successfully utilized to fool state-of-the-art polarization-based vision algorithms for glass segmentation and color constancy. Compared with existing physical attacks on RGB-based vision, which always suffer from the trade-off between attack efficacy and eye conceivability, the adversarial attackers based on polarizing projection are contact-free and visually imperceptible, since naked human eyes can rarely perceive the difference of viciously manipulated polarizing light and ordinary illumination. This poses unprecedented risks on polarization-based vision, both in the monochromatic and trichromatic domain, for which due attentions should be paid and counter measures be considered.
PDF

点此查看论文截图

STFAR: Improving Object Detection Robustness at Test-Time by Self-Training with Feature Alignment Regularization

Authors:Yijin Chen, Xun Xu, Yongyi Su, Kui Jia

Domain adaptation helps generalizing object detection models to target domain data with distribution shift. It is often achieved by adapting with access to the whole target domain data. In a more realistic scenario, target distribution is often unpredictable until inference stage. This motivates us to explore adapting an object detection model at test-time, a.k.a. test-time adaptation (TTA). In this work, we approach test-time adaptive object detection (TTAOD) from two perspective. First, we adopt a self-training paradigm to generate pseudo labeled objects with an exponential moving average model. The pseudo labels are further used to supervise adapting source domain model. As self-training is prone to incorrect pseudo labels, we further incorporate aligning feature distributions at two output levels as regularizations to self-training. To validate the performance on TTAOD, we create benchmarks based on three standard object detection datasets and adapt generic TTA methods to object detection task. Extensive evaluations suggest our proposed method sets the state-of-the-art on test-time adaptive object detection task.
PDF

点此查看论文截图

One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models

Authors:Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière

Adapting a segmentation model from a labeled source domain to a target domain, where a single unlabeled datum is available, is one the most challenging problems in domain adaptation and is otherwise known as one-shot unsupervised domain adaptation (OSUDA). Most of the prior works have addressed the problem by relying on style transfer techniques, where the source images are stylized to have the appearance of the target domain. Departing from the common notion of transferring only the target ``texture’’ information, we leverage text-to-image diffusion models (e.g., Stable Diffusion) to generate a synthetic target dataset with photo-realistic images that not only faithfully depict the style of the target domain, but are also characterized by novel scenes in diverse contexts. The text interface in our method Data AugmenTation with diffUsion Models (DATUM) endows us with the possibility of guiding the generation of images towards desired semantic concepts while respecting the original spatial context of a single training image, which is not possible in existing OSUDA methods. Extensive experiments on standard benchmarks show that our DATUM surpasses the state-of-the-art OSUDA methods by up to +7.1%. The implementation is available at https://github.com/yasserben/DATUM
PDF 13 pages, 8 figures, 5 tables

点此查看论文截图

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

Authors:Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) or visual ‘foundation models’ for Embodied AI. First, we curate CortexBench, consisting of 17 different tasks spanning locomotion, navigation, dexterous, and mobile manipulation. Next, we systematically evaluate existing PVRs and find that none are universally dominant. To study the effect of pre-training data scale and diversity, we combine over 4,000 hours of egocentric videos from 7 different sources (over 5.6M images) and ImageNet to train different-sized vision transformers using Masked Auto-Encoding (MAE) on slices of this data. Contrary to inferences from prior work, we find that scaling dataset size and diversity does not improve performance universally (but does so on average). Our largest model, named VC-1, outperforms all prior PVRs on average but does not universally dominate either. Finally, we show that task or domain-specific adaptation of VC-1 leads to substantial gains, with VC-1 (adapted) achieving competitive or superior performance than the best known results on all of the benchmarks in CortexBench. These models required over 10,000 GPU-hours to train and can be found on our website for the benefit of the research community.
PDF Project website: https://eai-vc.github.io

点此查看论文截图

文章作者: 木子已
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 木子已 !
  目录