2022-07-16 更新
BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks
Authors:Uddeshya Upadhyay, Shyamgopal Karthik, Yanbei Chen, Massimiliano Mancini, Zeynep Akata
High-quality calibrated uncertainty estimates are crucial for numerous real-world applications, especially for deep learning-based deployed ML systems. While Bayesian deep learning techniques allow uncertainty estimation, training them with large-scale datasets is an expensive process that does not always yield models competitive with non-Bayesian counterparts. Moreover, many of the high-performing deep learning models that are already trained and deployed are non-Bayesian in nature and do not provide uncertainty estimates. To address these issues, we propose BayesCap that learns a Bayesian identity mapping for the frozen model, allowing uncertainty estimation. BayesCap is a memory-efficient method that can be trained on a small fraction of the original dataset, enhancing pretrained non-Bayesian computer vision models by providing calibrated uncertainty estimates for the predictions without (i) hampering the performance of the model and (ii) the need for expensive retraining the model from scratch. The proposed method is agnostic to various architectures and tasks. We show the efficacy of our method on a wide variety of tasks with a diverse set of architectures, including image super-resolution, deblurring, inpainting, and crucial application such as medical image translation. Moreover, we apply the derived uncertainty estimates to detect out-of-distribution samples in critical scenarios like depth estimation in autonomous driving. Code is available at https://github.com/ExplainableML/BayesCap.
PDF Accepted at ECCV 2022. Code is available at https://github.com/ExplainableML/BayesCap
点此查看论文截图
EGSDE: Unpaired Image-to-Image Translation via Energy-Guided Stochastic Differential Equations
Authors:Min Zhao, Fan Bao, Chongxuan Li, Jun Zhu
Score-based diffusion generative models (SDGMs) have achieved the SOTA FID results in unpaired image-to-image translation (I2I). However, we notice that existing methods totally ignore the training data in the source domain, leading to sub-optimal solutions for unpaired I2I. To this end, we propose energy-guided stochastic differential equations (EGSDE) that employs an energy function pretrained on both the source and target domains to guide the inference process of a pretrained SDE for realistic and faithful unpaired I2I. Building upon two feature extractors, we carefully design the energy function such that it encourages the transferred image to preserve the domain-independent features and discard domainspecific ones. Further, we provide an alternative explanation of the EGSDE as a product of experts, where each of the three experts (corresponding to the SDE and two feature extractors) solely contributes to faithfulness or realism. Empirically, we compare EGSDE to a large family of baselines on three widely-adopted unpaired I2I tasks under four metrics. EGSDE not only consistently outperforms existing SDGMs-based methods in almost all settings but also achieves the SOTA realism results (e.g., FID of 65.82 in Cat to Dog and FID of 59.75 in Wild to Dog on AFHQ) without harming the faithful performance.
PDF
点此查看论文截图
Learning to translate by learning to communicate
Authors:C. M. Downey, Leo Z. Liu, Xuhui Zhou, Shane Steinert-Threlkeld
We formulate and test a technique to use Emergent Communication (EC) with a pretrained multilingual model to improve on modern Unsupervised NMT systems, especially for low-resource languages. It has been argued that the currently dominant paradigm in NLP of pretraining on text-only corpora will not yield robust natural language understanding systems, and the need for grounded, goal-oriented, and interactive language learning has been highlighted. In our approach, we embed a modern multilingual model (mBART, Liu et. al. 2020) into an EC image-reference game, in which the model is incentivized to use multilingual generations to accomplish a vision-grounded task, with the hypothesis that this will align multiple languages to a shared task space. We present two variants of EC Fine-Tuning (Steinert-Threlkeld et. al. 2022), one of which outperforms a backtranslation-based baseline in 6/8 translation settings, and proves especially beneficial for the very low-resource languages of Nepali and Sinhala.
PDF
点此查看论文截图
Animatable Implicit Neural Representations for Creating Realistic Avatars from Videos
Authors:Sida Peng, Zhen Xu, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Hujun Bao, Xiaowei Zhou
This paper addresses the challenge of reconstructing an animatable human model from a multi-view video. Some recent works have proposed to decompose a non-rigidly deforming scene into a canonical neural radiance field and a set of deformation fields that map observation-space points to the canonical space, thereby enabling them to learn the dynamic scene from images. However, they represent the deformation field as translational vector field or SE(3) field, which makes the optimization highly under-constrained. Moreover, these representations cannot be explicitly controlled by input motions. Instead, we introduce a pose-driven deformation field based on the linear blend skinning algorithm, which combines the blend weight field and the 3D human skeleton to produce observation-to-canonical correspondences. Since 3D human skeletons are more observable, they can regularize the learning of the deformation field. Moreover, the pose-driven deformation field can be controlled by input skeletal motions to generate new deformation fields to animate the canonical human model. Experiments show that our approach significantly outperforms recent human modeling methods. The code is available at https://zju3dv.github.io/animatable_nerf/.
PDF Project page: https://zju3dv.github.io/animatable_nerf/ and https://zju3dv.github.io/animatable_sdf/. arXiv admin note: substantial text overlap with arXiv:2105.02872
点此查看论文截图
Explaining Image Enhancement Black-Box Methods through a Path Planning Based Algorithm
Authors:Marco Cotogni, Claudio Cusano
Nowadays, image-to-image translation methods, are the state of the art for the enhancement of natural images. Even if they usually show high performance in terms of accuracy, they often suffer from several limitations such as the generation of artifacts and the scalability to high resolutions. Moreover, their main drawback is the completely black-box approach that does not allow to provide the final user with any insight about the enhancement processes applied. In this paper we present a path planning algorithm which provides a step-by-step explanation of the output produced by state of the art enhancement methods, overcoming black-box limitation. This algorithm, called eXIE, uses a variant of the A* algorithm to emulate the enhancement process of another method through the application of an equivalent sequence of enhancing operators. We applied eXIE to explain the output of several state-of-the-art models trained on the Five-K dataset, obtaining sequences of enhancing operators able to produce very similar results in terms of performance and overcoming the huge limitation of poor interpretability of the best performing algorithms.
PDF