2022-10-15 更新
Reducing Annotation Effort by Identifying and Labeling Contextually Diverse Classes for Semantic Segmentation Under Domain Shift
Authors:Sharat Agarwal, Saket Anand, Chetan Arora
In Active Domain Adaptation (ADA), one uses Active Learning (AL) to select a subset of images from the target domain, which are then annotated and used for supervised domain adaptation (DA). Given the large performance gap between supervised and unsupervised DA techniques, ADA allows for an excellent trade-off between annotation cost and performance. Prior art makes use of measures of uncertainty or disagreement of models to identify regions' to be annotated by the human oracle. However, these regions frequently comprise of pixels at object boundaries which are hard and tedious to annotate. Hence, even if the fraction of image pixels annotated reduces, the overall annotation time and the resulting cost still remain high. In this work, we propose an ADA strategy, which given a frame, identifies a set of classes that are hardest for the model to predict accurately, thereby recommending semantically meaningful regions to be annotated in a selected frame. We show that these set of
hard’ classes are context-dependent and typically vary across frames, and when annotated help the model generalize better. We propose two ADA techniques: the Anchor-based and Augmentation-based approaches to select complementary and diverse regions in the context of the current training set. Our approach achieves 66.6 mIoU on GTA to Cityscapes dataset with an annotation budget of 4.7% in comparison to 64.9 mIoU by MADA using 5% of annotations. Our technique can also be used as a decorator for any existing frame-based AL technique, e.g., we report 1.5% performance improvement for CDAL on Cityscapes using our approach.
PDF Accepted WACV2023
点此查看论文截图
3D Reconstruction of Sculptures from Single Images via Unsupervised Domain Adaptation on Implicit Models
Authors:Ziyi Chang, George Alex Koulieris, Hubert P. H. Shum
Acquiring the virtual equivalent of exhibits, such as sculptures, in virtual reality (VR) museums, can be labour-intensive and sometimes infeasible. Deep learning based 3D reconstruction approaches allow us to recover 3D shapes from 2D observations, among which single-view-based approaches can reduce the need for human intervention and specialised equipment in acquiring 3D sculptures for VR museums. However, there exist two challenges when attempting to use the well-researched human reconstruction methods: limited data availability and domain shift. Considering sculptures are usually related to humans, we propose our unsupervised 3D domain adaptation method for adapting a single-view 3D implicit reconstruction model from the source (real-world humans) to the target (sculptures) domain. We have compared the generated shapes with other methods and conducted ablation studies as well as a user study to demonstrate the effectiveness of our adaptation method. We also deploy our results in a VR application.
PDF
点此查看论文截图
Diffusion models as plug-and-play priors
Authors:Alexandros Graikos, Nikolay Malkin, Nebojsa Jojic, Dimitris Samaras
We consider the problem of inferring high-dimensional data $\mathbf{x}$ in a model that consists of a prior $p(\mathbf{x})$ and an auxiliary differentiable constraint $c(\mathbf{x},\mathbf{y})$ on $x$ given some additional information $\mathbf{y}$. In this paper, the prior is an independently trained denoising diffusion generative model. The auxiliary constraint is expected to have a differentiable form, but can come from diverse sources. The possibility of such inference turns diffusion models into plug-and-play modules, thereby allowing a range of potential applications in adapting models to new domains and tasks, such as conditional generation or image segmentation. The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network enriched with different amounts of noise at each step. Considering many noised versions of $\mathbf{x}$ in evaluation of its fitness is a novel search mechanism that may lead to new algorithms for solving combinatorial optimization problems.
PDF NeurIPS 2022; code: https://github.com/AlexGraikos/diffusion_priors
点此查看论文截图
CS-Insights: A System for Analyzing Computer Science Research
Authors:Terry Ruas, Jan Philip Wahle, Lennart Küll, Saif M. Mohammad, Bela Gipp
This paper presents CS-Insights, an interactive web application to analyze computer science publications from DBLP through multiple perspectives. The dedicated interfaces allow its users to identify trends in research activity, productivity, accessibility, author’s productivity, venues’ statistics, topics of interest, and the impact of computer science research on other fields. CS-Insightsis publicly available, and its modular architecture can be easily adapted to domains other than computer science.
PDF
点此查看论文截图
Improving the Sample Efficiency of Prompt Tuning with Domain Adaptation
Authors:Xu Guo, Boyang Li, Han Yu
Prompt tuning, or the conditioning of a frozen pretrained language model (PLM) with soft prompts learned from data, has demonstrated impressive performance on a wide range of NLP tasks. However, prompt tuning requires a large training dataset to be effective and is outperformed by finetuning the entire PLM in data-scarce regimes. Previous work \citep{gu-etal-2022-ppt,vu-etal-2022-spot} proposed to transfer soft prompts pretrained on the source domain to the target domain. In this paper, we explore domain adaptation for prompt tuning, a problem setting where unlabeled data from the target domain are available during pretraining. We propose bOosting Prompt TunIng with doMain Adaptation (OPTIMA), which regularizes the decision boundary to be smooth around regions where source and target data distributions are similar. Extensive experiments demonstrate that OPTIMA significantly enhances the transferability and sample-efficiency of prompt tuning compared to strong baselines. Moreover, in few-shot settings, OPTIMA exceeds full-model tuning by a large margin.
PDF 13 pages, 11 figures, Findings of EMNLP 2022
点此查看论文截图
When Do Flat Minima Optimizers Work?
Authors:Jean Kaddour, Linqing Liu, Ricardo Silva, Matt J. Kusner
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network’s generalization performance over stochastic and adaptive gradient-based optimizers. Two methods have received significant attention due to their scalability: 1. Stochastic Weight Averaging (SWA), and 2. Sharpness-Aware Minimization (SAM). However, there has been limited investigation into their properties and no systematic benchmarking of them across different domains. We fill this gap here by comparing the loss surfaces of the models trained with each method and through broad benchmarking across computer vision, natural language processing, and graph representation learning tasks. We discover several surprising findings from these results, which we hope will help researchers further improve deep learning optimizers, and practitioners identify the right optimizer for their problem.
PDF
点此查看论文截图
Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains
Authors:Pierre Chambon, Christian Bluethgen, Curtis P. Langlotz, Akshay Chaudhari
Multi-modal foundation models are typically trained on millions of pairs of natural images and text captions, frequently obtained through web-crawling approaches. Although such models depict excellent generative capabilities, they do not typically generalize well to specific domains such as medical images that have fundamentally shifted distributions compared to natural images. Building generative models for medical images that faithfully depict clinical context may help alleviate the paucity of healthcare datasets. Thus, in this study, we seek to research and expand the representational capabilities of large pretrained foundation models to medical concepts, specifically for leveraging the Stable Diffusion model to generate domain specific images found in medical imaging. We explore the sub-components of the Stable Diffusion pipeline (the variational autoencoder, the U-Net and the text-encoder) to fine-tune the model to generate medical images. We benchmark the efficacy of these efforts using quantitative image quality metrics and qualitative radiologist-driven evaluations that accurately represent the clinical content of conditional text prompts. Our best-performing model improves upon the stable diffusion baseline and can be conditioned to insert a realistic-looking abnormality on a synthetic radiology image, while maintaining a 95% accuracy on a classifier trained to detect the abnormality.
PDF 17 pages, 8 figures
点此查看论文截图
Local Byte Fusion for Neural Machine Translation
Authors:Makesh Narsimhan Sreedhar, Xiangpeng Wan, Yu Cheng, Junjie Hu
Subword tokenization schemes are the dominant technique used in current NLP models. However, such schemes can be rigid and tokenizers built on one corpus do not adapt well to other parallel corpora. It has also been observed that in multilingual corpora, subword tokenization schemes over-segment low-resource languages leading to a drop in translation performance. A simple alternative to subword tokenizers is byte-based methods i.e. tokenization into byte sequences using encoding schemes such as UTF-8. Byte tokens often represent inputs at a sub-character granularity i.e. one character can be represented by a sequence of multiple byte tokens. This results in byte sequences that are significantly longer than character sequences. Enforcing aggregation of local information in the lower layers can guide the model to build higher-level semantic information. We propose a Local Byte Fusion (LOBEF) method for byte-based machine translation — utilizing byte $n$-gram and word boundaries — to aggregate local semantic information. Extensive experiments on multilingual translation, zero-shot cross-lingual transfer, and domain adaptation reveal a consistent improvement over traditional byte-based models and even over subword techniques. Further analysis also indicates that our byte-based models are parameter-efficient and can be trained faster than subword models.
PDF
点此查看论文截图
On the Theoretical Equivalence of Several Trade-Off Curves Assessing Statistical Proximity
Authors:Rodrigue Siry, Ryan Webster, Loic Simon, Julien Rabin
The recent advent of powerful generative models has triggered the renewed development of quantitative measures to assess the proximity of two probability distributions. As the scalar Frechet inception distance remains popular, several methods have explored computing entire curves, which reveal the trade-off between the fidelity and variability of the first distribution with respect to the second one. Several of such variants have been proposed independently and while intuitively similar, their relationship has not yet been made explicit. In an effort to make the emerging picture of generative evaluation more clear, we propose a unification of four curves known respectively as: the precision-recall (PR) curve, the Lorenz curve, the receiver operating characteristic (ROC) curve and a special case of R\’enyi divergence frontiers. In addition, we discuss possible links between PR / Lorenz curves with the derivation of domain adaptation bounds.
PDF 32 pages, 3 figures
点此查看论文截图
Feature-Adaptive Interactive Thresholding of Large 3D Volumes
Authors:Thomas Lang, Tomas Sauer
Thresholding is the most widely used segmentation method in volumetric image processing, and its pointwise nature makes it attractive for the fast handling of large three-dimensional samples. However, global thresholds often do not properly extract components in the presence of artifacts, measurement noise or grayscale value fluctuations. This paper introduces Feature-Adaptive Interactive Thresholding (FAITH), a thresholding technique that incorporates (geometric) features, local processing and interactive user input to overcome these limitations. Given a global threshold suitable for most regions, FAITH uses interactively selected seed voxels to identify critical regions in which that threshold will be adapted locally on the basis of features computed from local environments around these voxels. The combination of domain expert knowledge and a rigorous mathematical model thus enables a very exible way of local thresholding with intuitive user interaction. A qualitative analysis shows that the proposed model is able to overcome limitations typically occuring in plain thresholding while staying efficient enough to also allow the segmentation of big volumes.
PDF 13 pages, 3 figures, 2 tables