2022-11-22 更新
Boosting Novel Category Discovery Over Domains with Soft Contrastive Learning and All-in-One Classifier
Authors:Zelin Zang, Lei Shang, Senqiao Yang, Baigui Sun, Stan Z. Li
Unsupervised domain adaptation (UDA) has been highly successful in transferring knowledge acquired from a label-rich source domain to a label-scarce target domain. Open-set domain adaptation (ODA) and universal domain adaptation (UNDA) have been proposed as solutions to the problem concerning the presence of additional novel categories in the target domain. Existing ODA and UNDA approaches treat all novel categories as one unified unknown class and attempt to detect this unknown class during the training process. We find that domain variance leads to more significant view-noise in unsupervised data augmentation, affecting the further applications of contrastive learning~(CL), as well as the current closed-set classifier and open-set classifier causing the model to be overconfident in novel class discovery. To address the above two issues, we propose Soft-contrastive All-in-one Network~(SAN) for ODA and UNDA tasks. SAN includes a novel data-augmentation-based CL loss, which is used to improve the representational capability, and a more human-intuitive classifier, which is used to improve the new class discovery capability. The soft contrastive learning~(SCL) loss is used to weaken the adverse effects of the data-augmentation label noise problem, which is amplified in domain transfer. The All-in-One~(AIO) classifier overcomes the overconfidence problem of the current mainstream closed-set classifier and open-set classifier in a more human-intuitive way. The visualization results and ablation experiments demonstrate the importance of the two proposed innovations. Moreover, extensive experimental results on ODA and UNDA show that SAN has advantages over the existing state-of-the-art methods.
PDF 8 pages, 5 figures. arXiv admin note: text overlap with arXiv:2104.03344 by other authors
点此查看论文截图
A Lightweight Domain Adaptive Absolute Pose Regressor Using Barlow Twins Objective
Authors:Praveen Kumar Rajendran, Quoc-Vinh Lai-Dang, Luiz Felipe Vecchietti, Dongsoo Har
Identifying the camera pose for a given image is a challenging problem with applications in robotics, autonomous vehicles, and augmented/virtual reality. Lately, learning-based methods have shown to be effective for absolute camera pose estimation. However, these methods are not accurate when generalizing to different domains. In this paper, a domain adaptive training framework for absolute pose regression is introduced. In the proposed framework, the scene image is augmented for different domains by using generative methods to train parallel branches using Barlow Twins objective. The parallel branches leverage a lightweight CNN-based absolute pose regressor architecture. Further, the efficacy of incorporating spatial and channel-wise attention in the regression head for rotation prediction is investigated. Our method is evaluated with two datasets, Cambridge landmarks and 7Scenes. The results demonstrate that, even with using roughly 24 times fewer FLOPs, 12 times fewer activations, and 5 times fewer parameters than MS-Transformer, our approach outperforms all the CNN-based architectures and achieves performance comparable to transformer-based architectures. Our method ranks 2nd and 4th with the Cambridge Landmarks and 7Scenes datasets, respectively. In addition, for augmented domains not encountered during training, our approach significantly outperforms the MS-transformer. Furthermore, it is shown that our domain adaptive framework achieves better performance than the single branch model trained with the identical CNN backbone with all instances of the unseen distribution.
PDF [draft-v1] 18 pages, 8 figures, and 10 tables
点此查看论文截图
Novel transfer learning schemes based on Siamese networks and synthetic data
Authors:Dominik Stallmann, Philip Kenneweg, Barbara Hammer
Transfer learning schemes based on deep networks which have been trained on huge image corpora offer state-of-the-art technologies in computer vision. Here, supervised and semi-supervised approaches constitute efficient technologies which work well with comparably small data sets. Yet, such applications are currently restricted to application domains where suitable deepnetwork models are readily available. In this contribution, we address an important application area in the domain of biotechnology, the automatic analysis of CHO-K1 suspension growth in microfluidic single-cell cultivation, where data characteristics are very dissimilar to existing domains and trained deep networks cannot easily be adapted by classical transfer learning. We propose a novel transfer learning scheme which expands a recently introduced Twin-VAE architecture, which is trained on realistic and synthetic data, and we modify its specialized training procedure to the transfer learning domain. In the specific domain, often only few to no labels exist and annotations are costly. We investigate a novel transfer learning strategy, which incorporates a simultaneous retraining on natural and synthetic data using an invariant shared representation as well as suitable target variables, while it learns to handle unseen data from a different microscopy tech nology. We show the superiority of the variation of our Twin-VAE architecture over the state-of-the-art transfer learning methodology in image processing as well as classical image processing technologies, which persists, even with strongly shortened training times and leads to satisfactory results in this domain. The source code is available at https://github.com/dstallmann/transfer_learning_twinvae, works cross-platform, is open-source and free (MIT licensed) software. We make the data sets available at https://pub.uni-bielefeld.de/record/2960030.
PDF
点此查看论文截图
Probabilistic Contrastive Learning for Domain Adaptation
Authors:Junjie Li, Yixin Zhang, Zilei Wang, Keyu Tu
The standard contrastive learning acts on the extracted features with $\ell{2}$ normalization. For domain adaptation tasks, however, we find that contrastive learning with the standard paradigm does not perform well. The reason is mainly that the class weights (weights of the final fully connected layer) are not involved during optimization, which does not guarantee the produced features to be clustered around the class weights learned from source data. To tackle this issue, we propose a simple yet powerful probabilistic contrastive learning (PCL) in this paper, which not only produces compact features but also enforces them to be distributed around the class weights. Specifically, we break the traditional contrastive learning paradigm (feature+$\ell{2}$ normalization) by replacing the features with probabilities and removing $\ell_{2}$ normalization. In this way, we can enforce the probability to approximate the one-hot form, thereby narrowing the distance between the features and the class weights. PCL is generic due to conciseness, which can be used for different tasks. In this paper, we conduct extensive experiments on five tasks, \textit{i.e.}, unsupervised domain adaptation (UDA), semi-supervised domain adaptation (SSDA), semi-supervised learning (SSL), UDA detection, and UDA semantic segmentation. The results demonstrate that our PCL can bring significant gains for these tasks. In particular, for segmentation tasks, with the blessing of PCL, our method achieves or even surpasses CPSL-D with a smaller training cost (13090, 5 days vs 4V100, 11 days). Code is available at https://github.com/ljjcoder/Probabilistic-Contrastive-Learning.
PDF 12 pages,4 figures
点此查看论文截图
Beyond Deterministic Translation for Unsupervised Domain Adaptation
Authors:Eleni Chiou, Eleftheria Panagiotaki, Iasonas Kokkinos
In this work we challenge the common approach of using a one-to-one mapping (‘translation’) between the source and target domains in unsupervised domain adaptation (UDA). Instead, we rely on stochastic translation to capture inherent translation ambiguities. This allows us to (i) train more accurate target networks by generating multiple outputs conditioned on the same source image, leveraging both accurate translation and data augmentation for appearance variability, (ii) impute robust pseudo-labels for the target data by averaging the predictions of a source network on multiple translated versions of a single target image and (iii) train and ensemble diverse networks in the target domain by modulating the degree of stochasticity in the translations. We report improvements over strong recent baselines, leading to state-of-the-art UDA results on two challenging semantic segmentation benchmarks. Our code is available at https://github.com/elchiou/Beyond-deterministic-translation-for-UDA.
PDF Accepted at BMVC 2022. Code is available at https://github.com/elchiou/Beyond-deterministic-translation-for-UDA
点此查看论文截图
Domain-Adaptive Self-Supervised Pre-Training for Face & Body Detection in Drawings
Authors:Barış Batuhan Topal, Deniz Yuret, Tevfik Metin Sezgin
Drawings are powerful means of pictorial abstraction and communication. Understanding diverse forms of drawings, including digital arts, cartoons, and comics, has been a major problem of interest for the computer vision and computer graphics communities. Although there are large amounts of digitized drawings from comic books and cartoons, they contain vast stylistic variations, which necessitate expensive manual labeling for training domain-specific recognizers. In this work, we show how self-supervised learning, based on a teacher-student network with a modified student network update design, can be used to build face and body detectors. Our setup allows exploiting large amounts of unlabeled data from the target domain when labels are provided for only a small subset of it. We further demonstrate that style transfer can be incorporated into our learning pipeline to bootstrap detectors using a vast amount of out-of-domain labeled images from natural images (i.e., images from the real world). Our combined architecture yields detectors with state-of-the-art (SOTA) and near-SOTA performance using minimal annotation effort.
PDF Preprint, 8 pages of the paper itself + 7 pages of Supplementary Material. Includes 8 figures and 7 tables
点此查看论文截图
Computational Optics Meet Domain Adaptation: Transferring Semantic Segmentation Beyond Aberrations
Authors:Qi Jiang, Hao Shi, Shaohua Gao, Jiaming Zhang, Kailun Yang, Lei Sun, Kaiwei Wang
Semantic scene understanding with Minimalist Optical Systems (MOS) in mobile and wearable applications remains a challenge due to the corrupted imaging quality induced by optical aberrations. However, previous works only focus on improving the subjective imaging quality through computational optics, i.e. Computational Imaging (CI) technique, ignoring the feasibility in semantic segmentation. In this paper, we pioneer to investigate Semantic Segmentation under Optical Aberrations (SSOA) of MOS. To benchmark SSOA, we construct Virtual Prototype Lens (VPL) groups through optical simulation, generating Cityscapes-ab and KITTI-360-ab datasets under different behaviors and levels of aberrations. We look into SSOA via an unsupervised domain adaptation perspective to address the scarcity of labeled aberration data in real-world scenarios. Further, we propose Computational Imaging Assisted Domain Adaptation (CIADA) to leverage prior knowledge of CI for robust performance in SSOA. Based on our benchmark, we conduct experiments on the robustness of state-of-the-art segmenters against aberrations. In addition, extensive evaluations of possible solutions to SSOA reveal that CIADA achieves superior performance under all aberration distributions, paving the way for the applications of MOS in semantic scene understanding. Code and dataset will be made publicly available at https://github.com/zju-jiangqi/CIADA.
PDF Code and dataset will be made publicly available at https://github.com/zju-jiangqi/CIADA
点此查看论文截图
ProSFDA: Prompt Learning based Source-free Domain Adaptation for Medical Image Segmentation
Authors:Shishuai Hu, Zehui Liao, Yong Xia
The domain discrepancy existed between medical images acquired in different situations renders a major hurdle in deploying pre-trained medical image segmentation models for clinical use. Since it is less possible to distribute training data with the pre-trained model due to the huge data size and privacy concern, source-free unsupervised domain adaptation (SFDA) has recently been increasingly studied based on either pseudo labels or prior knowledge. However, the image features and probability maps used by pseudo label-based SFDA and the consistent prior assumption and the prior prediction network used by prior-guided SFDA may become less reliable when the domain discrepancy is large. In this paper, we propose a \textbf{Pro}mpt learning based \textbf{SFDA} (\textbf{ProSFDA}) method for medical image segmentation, which aims to improve the quality of domain adaption by minimizing explicitly the domain discrepancy. Specifically, in the prompt learning stage, we estimate source-domain images via adding a domain-aware prompt to target-domain images, then optimize the prompt via minimizing the statistic alignment loss, and thereby prompt the source model to generate reliable predictions on (altered) target-domain images. In the feature alignment stage, we also align the features of target-domain images and their styles-augmented counterparts to optimize the source model, and hence push the model to extract compact features. We evaluate our ProSFDA on two multi-domain medical image segmentation benchmarks. Our results indicate that the proposed ProSFDA outperforms substantially other SFDA methods and is even comparable to UDA methods. Code will be available at \url{https://github.com/ShishuaiHu/ProSFDA}.
PDF
点此查看论文截图
Unsupervised Domain Adaptation via Deep Hierarchical Optimal Transport
Authors:Yingxue Xu, Guihua Wen, Yang Hu, Pei Yang
Unsupervised domain adaptation is a challenging task that aims to estimate a transferable model for unlabeled target domain by exploiting source labeled data. Optimal Transport (OT) based methods recently have been proven to be a promising direction for domain adaptation due to their competitive performance. However, most of these methods coarsely aligned source and target distributions, leading to the over-aligned problem where the category-discriminative information is mixed up although domain-invariant representations can be learned. In this paper, we propose a Deep Hierarchical Optimal Transport method (DeepHOT) for unsupervised domain adaptation. The main idea is to use hierarchical optimal transport to learn both domain-invariant and category-discriminative representations by mining the rich structural correlations among domain data. The DeepHOT framework consists of a domain-level OT and an image-level OT, where the latter is used as the ground distance metric for the former. The image-level OT captures structural associations of local image regions that are beneficial to image classification, while the domain-level OT learns domain-invariant representations by leveraging the underlying geometry of domains. However, due to the high computational complexity, the optimal transport based models are limited in some scenarios. To this end, we propose a robust and efficient implementation of the DeepHOT framework by approximating origin OT with sliced Wasserstein distance in image-level OT and using a mini-batch unbalanced optimal transport for domain-level OT. Extensive experiments show that DeepHOT surpasses the state-of-the-art methods in four benchmark datasets. Code will be released on GitHub.
PDF 9 pages, 3 figures
点此查看论文截图
Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey
Authors:Yuecong Xu, Haozhi Cao, Zhenghua Chen, Xiaoli Li, Lihua Xie, Jianfei Yang
Video analysis tasks such as action recognition have received increasing research interest with growing applications in fields such as smart healthcare, thanks to the introduction of large-scale datasets and deep learning-based representations. However, video models trained on existing datasets suffer from significant performance degradation when deployed directly to real-world applications due to domain shifts between the training public video datasets (source video domains) and real-world videos (target video domains). Further, with the high cost of video annotation, it is more practical to use unlabeled videos for training. To tackle performance degradation and address concerns in high video annotation cost uniformly, the video unsupervised domain adaptation (VUDA) is introduced to adapt video models from the labeled source domain to the unlabeled target domain by alleviating video domain shift, improving the generalizability and portability of video models. This paper surveys recent progress in VUDA with deep learning. We begin with the motivation of VUDA, followed by its definition, and recent progress of methods for both closed-set VUDA and VUDA under different scenarios, and current benchmark datasets for VUDA research. Eventually, future directions are provided to promote further VUDA research.
PDF Survey on Video Unsupervised Domain Adaptation (VUDA), 16 pages, 1 figure, 8 tables