Updated 2023-04-21
SAM Fails to Segment Anything? — SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, and More
Authors: Tianrun Chen, Lanyun Zhu, Chaotao Ding, Runlong Cao, Yan Wang, Zejian Li, Lingyun Sun, Papa Mao, Ying Zang
The emergence of large models, also known as foundation models, has brought significant advancements to AI research. One such model is Segment Anything (SAM), which is designed for image segmentation tasks. However, as with other foundation models, our experimental findings suggest that SAM may fail or perform poorly on certain segmentation tasks, such as shadow detection and camouflaged object detection (concealed object detection). This study paves the way for applying the large pre-trained image segmentation model SAM to these downstream tasks, even in situations where SAM performs poorly. Rather than fine-tuning the SAM network, we propose SAM-Adapter, which incorporates domain-specific information or visual prompts into the segmentation network through simple yet effective adapters. Our extensive experiments show that SAM-Adapter can significantly elevate the performance of SAM on challenging tasks; it even outperforms task-specific network models and achieves state-of-the-art performance on the tasks we tested: camouflaged object detection and shadow detection. We believe our work opens up opportunities for utilizing SAM in downstream tasks, with potential applications in various fields, including medical image processing, agriculture, remote sensing, and more.
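To make the adapter idea concrete, below is a minimal PyTorch sketch of a bottleneck adapter that injects a task-specific prompt into a frozen backbone's intermediate features. The module name, bottleneck width, and injection point are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Illustrative bottleneck adapter (an assumption, not the paper's code):
    project a task-specific prompt down, apply a nonlinearity, project back
    up, and add the result to the frozen backbone's features."""
    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, features: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
        # Only the adapter parameters are trained; the SAM backbone itself
        # would stay frozen, which is the point of adapter-style tuning.
        return features + self.up(self.act(self.down(prompt)))
```

In use, one such adapter per transformer block would receive domain-specific cues (e.g., handcrafted texture or frequency features of the input image) as the prompt, while only the adapters and the mask decoder are updated.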
PDF
Click here to view paper screenshots
Adaptive Stylization Modulation for Domain Generalized Semantic Segmentation
Authors: Gabriel Tjio, Ping Liu, Chee-Keong Kwoh, Joey Tianyi Zhou
Obtaining sufficient labelled data for model training is impractical for most real-life applications. We therefore address domain generalization for semantic segmentation to reduce the need to acquire and label additional data. Recent work on domain generalization increases data diversity by varying domain-variant features such as colour, style, and texture in images. However, excessive or even uniform stylization may reduce performance. The performance drop is especially pronounced for pixels from minority classes, which are already harder to classify than pixels from majority classes. We therefore introduce a module, ASH+, that modulates stylization strength for each pixel depending on the pixel's semantic content. We also introduce a learned parameter that balances the element-wise and channel-wise proportions of stylized features against the original source-domain features in the stylized source-domain images. This learned parameter replaces an empirically determined global hyperparameter, allowing finer-grained control over the output stylized image. We conduct multiple experiments to validate the effectiveness of the proposed method, and evaluate our model on publicly available benchmark semantic segmentation datasets (Cityscapes and SYNTHIA). Quantitative and qualitative comparisons indicate that our approach is competitive with the state of the art. Code is made available at https://github.com/placeholder
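As a rough illustration of per-pixel stylization modulation, the sketch below predicts an element-wise gate from the feature map and combines it with a learnable channel-wise gate before blending stylized and original features. All names and the gating architecture are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

class AdaptiveStylizationBlend(nn.Module):
    """Hypothetical sketch: blend stylized and original features with a
    per-pixel gate (driven by the features' semantic content) and a learned
    channel-wise gate that replaces a global hyperparameter."""
    def __init__(self, channels: int):
        super().__init__()
        self.pixel_gate = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1),
                                        nn.Sigmoid())
        self.channel_gate = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, original: torch.Tensor, stylized: torch.Tensor) -> torch.Tensor:
        alpha = self.pixel_gate(original)        # element-wise strength, (B,1,H,W)
        beta = torch.sigmoid(self.channel_gate)  # channel-wise strength, (1,C,1,1)
        mix = alpha * beta                       # combined modulation, (B,C,H,W)
        return mix * stylized + (1.0 - mix) * original
```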
PDF
Click here to view paper screenshots
Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection
Authors: Qianjiang Hu, Daizong Liu, Wei Hu
3D object detection from point clouds is crucial for safety-critical autonomous driving. Although many works have made great progress on this task, most suffer from expensive annotation costs and poor transferability to unseen data due to the domain gap. Recently, a few works have attempted to tackle the domain gap at the object level, but they still fail to adapt to the gap in beam density between two domains, which is critical for mitigating the characteristic differences between LiDAR collectors. To this end, we propose a density-insensitive domain adaptation framework to address the density-induced domain gap. In particular, we first introduce Random Beam Re-Sampling (RBRS) to make 3D detectors trained on the source domain robust to varying beam densities. We then take this pre-trained detector as the backbone model and feed the unlabeled target-domain data into our newly designed task-specific teacher-student framework to predict high-quality pseudo labels. To further transfer density-insensitivity to the target domain, we feed the teacher and student branches with the same sample at different densities, and propose an Object Graph Alignment (OGA) module that constructs object graphs in the two branches to enforce consistency in both the attributes and relations of cross-density objects. Experimental results on three widely adopted 3D object detection datasets demonstrate that our domain adaptation method outperforms state-of-the-art methods, especially on varying-density data. Code is available at https://github.com/WoodwindHu/DTS.
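Random Beam Re-Sampling can be pictured as randomly dropping whole LiDAR beams so the detector sees varying densities during training. The following NumPy sketch assumes each point carries an integer beam index; the keep-ratio range and function name are assumptions for illustration.

```python
import numpy as np

def random_beam_resample(points, beam_ids, keep_ratio_range=(0.5, 1.0), rng=None):
    """Hypothetical RBRS sketch: keep a random subset of LiDAR beams.
    `points` is (N, 4) [x, y, z, intensity]; `beam_ids` is (N,) with the
    emitting-beam index of each point (e.g., 0..63 for a 64-beam sensor)."""
    rng = rng if rng is not None else np.random.default_rng()
    beams = np.unique(beam_ids)
    keep_ratio = rng.uniform(*keep_ratio_range)
    kept = rng.choice(beams, size=max(1, int(len(beams) * keep_ratio)),
                      replace=False)
    return points[np.isin(beam_ids, kept)]
```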
PDF Accepted by CVPR 2023
Click here to view paper screenshots
Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification
Authors: Suncheng Xiang, Jingsheng Gao, Mengyuan Guan, Jiacheng Ruan, Chengfeng Zhou, Ting Liu, Dahong Qian, Yuzhuo Fu
Generalizable person re-identification (Re-ID) is an active research topic in machine learning and computer vision, and it plays a significant role in realistic scenarios thanks to its applications in public security and video surveillance. However, previous methods mainly focus on visual representation learning while neglecting the potential of semantic features during training, which easily leads to poor generalization when adapting to a new domain. In this paper, we propose a Multi-Modal Equivalent Transformer called MMET for more robust visual-semantic embedding learning on visual, textual, and visual-textual tasks. To further enhance robust feature learning in the transformer, a dynamic masking mechanism called the Masked Multimodal Modeling (MMM) strategy is introduced to mask both image patches and text tokens; it works jointly on multimodal or unimodal data and significantly boosts the performance of generalizable person Re-ID. Extensive experiments on benchmark datasets demonstrate the competitive performance of our method over previous approaches. We hope this method advances research on visual-semantic representation learning. Our source code is publicly available at https://github.com/JeremyXSC/MMET.
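The masking step of such a Masked Multimodal Modeling strategy can be sketched in a few lines: replace a random subset of tokens (image patches or text) with a shared learned mask token, and keep the boolean mask so a reconstruction loss can be applied only at masked positions. Names and the masking rate below are assumptions, not MMET's published code.

```python
import torch

def apply_mmm_mask(tokens: torch.Tensor, mask_token: torch.Tensor,
                   mask_prob: float = 0.15):
    """Hypothetical sketch of the masking step: `tokens` is (B, N, D) and can
    hold image-patch or text embeddings; `mask_token` is a learned (D,) vector.
    Returns the masked sequence plus the boolean mask of replaced positions."""
    B, N, _ = tokens.shape
    mask = torch.rand(B, N, device=tokens.device) < mask_prob
    masked = tokens.clone()
    masked[mask] = mask_token.to(tokens.dtype)  # broadcast over masked slots
    return masked, mask
```

Because the same routine applies to either modality, it can drive joint multimodal masking or unimodal masking, matching the abstract's description.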
PDF
Click here to view paper screenshots
CHATTY: Coupled Holistic Adversarial Transport Terms with Yield for Unsupervised Domain Adaptation
Authors: Chirag P, Mukta Wagle, Ravi Kant Gupta, Pranav Jeevan, Amit Sethi
We propose a new technique called CHATTY: Coupled Holistic Adversarial Transport Terms with Yield for Unsupervised Domain Adaptation. Adversarial training is commonly used to learn domain-invariant representations by reversing the gradients from a domain-discriminator head to train the feature-extractor layers of a neural network. We propose significant modifications to the adversarial head, its training objective, and the classifier head. To reduce class confusion, we introduce a sub-network that displaces the classifier outputs of the source- and target-domain samples in a learnable manner. We control this movement using a novel transport loss that spreads class clusters away from each other, making it easier for the classifier to find decision boundaries for both the source and target domains. Adding this new loss to a careful selection of previously proposed losses improves UDA results over previous state-of-the-art methods on benchmark datasets. We show the importance of the proposed loss term through ablation studies and visualizations of the movement of target-domain samples in representation space.
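The cluster-spreading intuition behind the transport loss can be approximated as follows: compute per-class centroids of the (displaced) classifier outputs and penalize centroid pairs that sit closer than a margin. This is a speculative reconstruction for illustration, not the paper's exact loss.

```python
import torch

def cluster_spread_loss(outputs: torch.Tensor, labels: torch.Tensor,
                        num_classes: int, margin: float = 1.0) -> torch.Tensor:
    """Hypothetical sketch: push per-class centroids of classifier outputs
    apart with a margin hinge, so decision boundaries are easier to place."""
    centroids = []
    for c in range(num_classes):
        sel = labels == c
        if sel.any():  # skip classes absent from the batch
            centroids.append(outputs[sel].mean(dim=0))
    if len(centroids) < 2:
        return outputs.new_zeros(())
    centroids = torch.stack(centroids)        # (C', D)
    dists = torch.cdist(centroids, centroids)
    off_diag = ~torch.eye(len(centroids), dtype=torch.bool, device=dists.device)
    return torch.clamp(margin - dists[off_diag], min=0.0).mean()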
PDF 10 pages, 4 figures
Click here to view paper screenshots
Domain Adaptable Self-supervised Representation Learning on Remote Sensing Satellite Imagery
Authors: Muskaan Chopra, Prakash Chandra Chhipa, Gopal Mengi, Varun Gupta, Marcus Liwicki
This work presents a novel domain adaptation paradigm for studying contrastive self-supervised representation learning and knowledge transfer using remote sensing satellite data. Most state-of-the-art efforts in the remote sensing visual domain focus on fully supervised learning approaches that rely entirely on human annotations. However, human annotations in remote sensing satellite imagery are always limited in quantity due to high costs and required domain expertise, making transfer learning a viable alternative. The proposed approach investigates in depth the knowledge transfer of self-supervised representations across distinct source and target data distributions in the remote sensing domain. In this arrangement, self-supervised contrastive pretraining is performed on the source dataset, and downstream tasks are performed on the target datasets in a round-robin fashion. Experiments are conducted on three publicly available datasets, UC Merced Landuse (UCMD), SIRI-WHU, and MLRSNet, for different downstream classification tasks at varying levels of label efficiency. In self-supervised knowledge transfer, the proposed approach achieves state-of-the-art label-efficient performance and outperforms a fully supervised setting. A more in-depth qualitative examination reveals consistent evidence of explainable representation learning. The source code and trained models are published on GitHub.
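For the contrastive pretraining stage, a standard NT-Xent (SimCLR-style) objective over two augmented views is a common choice; the sketch below shows that loss, though the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor,
                 temperature: float = 0.5) -> torch.Tensor:
    """Standard NT-Xent contrastive loss over two views of shape (B, D) each,
    shown here as a plausible stand-in for the paper's contrastive objective."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)         # (2B, D)
    sim = z @ z.t() / temperature          # cosine similarity matrix
    sim.fill_diagonal_(float('-inf'))      # exclude self-similarity
    B = z1.size(0)
    # The positive of sample i is its other view: i+B in the first half, i-B after.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)]).to(sim.device)
    return F.cross_entropy(sim, targets)
```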
PDF Accepted in the International Joint Conference on Neural Networks (IJCNN) 2023. The first three authors contributed equally.
Click here to view paper screenshots
Learning Representative Trajectories of Dynamical Systems via Domain-Adaptive Imitation
Authors: Edgardo Solano-Carrillo, Jannis Stoppe
Domain-adaptive trajectory imitation is a skill that some predators learn for survival, mapping dynamic information from one domain (their speed and steering direction) to a different domain (the current position of the moving prey). An intelligent agent with this skill could be exploited for a variety of tasks, including recognizing abnormal motion in traffic once it has learned to imitate representative trajectories. In this direction, we propose DATI, a deep reinforcement learning agent designed for domain-adaptive trajectory imitation using a cycle-consistent generative adversarial method. Our experiments on a variety of synthetic families of reference trajectories show that DATI outperforms baseline methods for imitation learning and optimal control in this setting while keeping the same per-task hyperparameters. Its generalization to a real-world scenario is shown through the discovery of abnormal motion patterns in maritime traffic, opening the door to deep reinforcement learning methods for spatially unconstrained trajectory data mining.
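The cycle-consistent part of such a method can be summarized by the usual reconstruction term: translating a trajectory A to B and back to A (and likewise B to A to B) should return the original. The sketch below assumes two generator callables and L1 reconstruction; both are assumptions about DATI's internals rather than its released code.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(G_ab, G_ba,
                           traj_a: torch.Tensor,
                           traj_b: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: G_ab maps domain-A trajectories to domain B and
    G_ba maps back; a round trip should reconstruct the input trajectory."""
    loss_a = F.l1_loss(G_ba(G_ab(traj_a)), traj_a)  # A -> B -> A
    loss_b = F.l1_loss(G_ab(G_ba(traj_b)), traj_b)  # B -> A -> B
    return loss_a + loss_b
```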
PDF Code is available at https://github.com/DLR-MI/dati
Click here to view paper screenshots
Noisy Universal Domain Adaptation via Divergence Optimization for Visual Recognition
Authors: Qing Yu, Atsushi Hashimoto, Yoshitaka Ushiku
To transfer knowledge learned from a labeled source domain to an unlabeled target domain, many studies have worked on universal domain adaptation (UniDA), where there is no constraint on the label sets of the source and target domains. However, existing UniDA methods rely on source samples with correct annotations, and due to limited resources in the real world it is difficult to obtain large amounts of perfectly clean labeled source data in some applications. We therefore propose a novel realistic scenario named Noisy UniDA, in which classifiers are trained using noisy labeled data from the source domain together with unlabeled data from the target domain, whose class distribution is uncertain. A multi-head convolutional neural network framework is proposed in this paper to address all of the challenges of Noisy UniDA at once. Our network comprises a single common feature generator and multiple classifiers with various decision boundaries. By optimizing the divergence between the outputs of the different classifiers, we can detect noisy samples in the source domain, identify unknown classes in the target domain, and align the distributions of the source and target domains. The proposed method outperforms existing methods in most settings in a thorough analysis of various domain adaptation scenarios. The source code is available at https://github.com/YU1ut/Divergence-Optimization.
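The divergence term at the core of this design can be sketched as the mean pairwise discrepancy between the heads' softmax outputs: minimizing it on target data aligns the domains, while disagreement between heads flags noisy source samples and unknown target classes. The L1 discrepancy below is an assumption about the exact divergence used.

```python
import torch

def head_divergence(head_probs: list) -> torch.Tensor:
    """Hypothetical sketch: mean pairwise L1 distance between the softmax
    outputs (each of shape (B, C)) of the multiple classifier heads."""
    div = head_probs[0].new_zeros(())
    pairs = 0
    for i in range(len(head_probs)):
        for j in range(i + 1, len(head_probs)):
            div = div + (head_probs[i] - head_probs[j]).abs().mean()
            pairs += 1
    return div / max(pairs, 1)
```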
PDF