I2I Translation

Updated 2022-12-08

Unpaired Image Translation via Vector Symbolic Architectures

Authors: Justin Theiss, Jay Leverett, Daeil Kim, Aayush Prakash

Image-to-image translation has played an important role in enabling synthetic data for computer vision. However, if the source and target domains have a large semantic mismatch, existing techniques often suffer from source content corruption, also known as semantic flipping. To address this problem, we propose a new paradigm for image-to-image translation using Vector Symbolic Architectures (VSA), a theoretical framework that defines algebraic operations in a high-dimensional vector (hypervector) space. We introduce VSA-based constraints on adversarial learning for source-to-target translations by learning a hypervector mapping that inverts the translation to ensure consistency with source content. We show both qualitatively and quantitatively that our method improves over other state-of-the-art techniques.
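
The algebra behind "a hypervector mapping that inverts the translation" can be illustrated with a small sketch. It assumes bipolar hypervectors and element-wise multiplication as the binding operation, which is one common VSA choice; the abstract does not say which operations the authors use. The names `random_hypervector`, `bind`, and `cosine`, the dimension `DIM`, and the random stand-in vectors are all illustrative assumptions, not the paper's code.

```python
import random

DIM = 10_000  # hypervector dimensionality (assumed value, not from the paper)


def random_hypervector(rng, dim=DIM):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Bind two bipolar hypervectors by element-wise multiplication."""
    return [x * y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity; every bipolar vector has norm sqrt(dim)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
source = random_hypervector(rng)   # stand-in for source-image features
mapping = random_hypervector(rng)  # stand-in for a learned source-to-target mapping

target = bind(source, mapping)     # "translate" by binding with the mapping
recovered = bind(target, mapping)  # binding is its own inverse for bipolar vectors

print(cosine(recovered, source))   # 1.0: the source content is recovered exactly
print(cosine(target, source))      # close to 0: the bound vector looks unrelated
```

In this toy setting, unbinding the translated vector with the same mapping returns the source exactly, which is the kind of consistency check the abstract describes. In the paper the mapping is learned during adversarial training rather than drawn at random.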


Unsupervised Domain Adaptation for Semantic Segmentation using One-shot Image-to-Image Translation via Latent Representation Mixing

Authors: Sarmad F. Ismael, Koray Kayabol, Erchan Aptoula

Domain adaptation is one of the prominent strategies for handling both domain shift, which is widely encountered in large-scale land use/land cover map calculation, and the scarcity of pixel-level ground truth that is crucial for supervised semantic segmentation. Studies focusing on adversarial domain adaptation via re-styling source domain samples, commonly through generative adversarial networks, have reported varying levels of success, yet they suffer from semantic inconsistencies and visual corruptions, and often require a large number of target domain samples. In this letter, we propose a new unsupervised domain adaptation method for the semantic segmentation of very high resolution images that i) leads to semantically consistent and noise-free images, ii) operates with a single target domain sample (i.e. one-shot), and iii) uses a fraction of the number of parameters required by state-of-the-art methods. More specifically, an image-to-image translation paradigm is proposed, based on an encoder-decoder principle where latent content representations are mixed across domains, and a perceptual network module and loss function are further introduced to enforce semantic consistency. Cross-city comparative experiments have shown that the proposed method outperforms state-of-the-art domain adaptation methods. Our source code will be available at https://github.com/Sarmadfismael/LRM_I2I.


Cyclically Disentangled Feature Translation for Face Anti-spoofing

Authors: Haixiao Yue, Keyao Wang, Guosheng Zhang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang

Current domain adaptation methods for face anti-spoofing leverage labeled source domain data and unlabeled target domain data to obtain a promising generalizable decision boundary. However, it is usually difficult for these methods to achieve a perfect domain-invariant liveness feature disentanglement, which may degrade the final classification performance by domain differences in illumination, face category, spoof type, etc. In this work, we tackle cross-scenario face anti-spoofing by proposing a novel domain adaptation method called cyclically disentangled feature translation network (CDFTN). Specifically, CDFTN generates pseudo-labeled samples that possess: 1) source domain-invariant liveness features and 2) target domain-specific content features, which are disentangled through domain adversarial training. A robust classifier is trained based on the synthetic pseudo-labeled images under the supervision of source domain labels. We further extend CDFTN for multi-target domain adaptation by leveraging data from more unlabeled target domains. Extensive experiments on several public datasets demonstrate that our proposed approach significantly outperforms the state of the art.
PDF Accepted by AAAI 2023


Exploring Stroke-Level Modifications for Scene Text Editing

Authors: Yadong Qu, Qingfeng Tan, Hongtao Xie, Jianjun Xu, Yuxin Wang, Yongdong Zhang

Scene text editing (STE) aims to replace the text in an image with the desired text while preserving the background and the style of the original text. However, due to complicated background textures and varied text styles, existing methods fall short in generating clear and legible edited text images. In this study, we attribute the poor editing performance to two problems: 1) Implicit decoupling structure. Previous methods that edit the whole image have to learn different translation rules for background and text regions simultaneously. 2) Domain gap. Due to the lack of edited real scene text images, the network can only be well trained on synthetic pairs and performs poorly on real-world images. To handle the above problems, we propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL). Firstly, we generate stroke guidance maps to explicitly indicate regions to be edited. Unlike implicit approaches that directly modify all pixels at the image level, such explicit instructions filter out distractions from the background and guide the network to focus on the editing rules of text regions. Secondly, we propose a Semi-supervised Hybrid Learning scheme to train the network with both labeled synthetic images and unpaired real scene text images. Thus, the STE model is adapted to real-world dataset distributions. Moreover, two new datasets (Tamper-Syn2k and Tamper-Scene) are proposed to fill the gap in public evaluation datasets. Extensive experiments demonstrate that our MOSTEL outperforms previous methods both qualitatively and quantitatively. Datasets and code will be available at https://github.com/qqqyd/MOSTEL.
PDF Accepted to AAAI 2023
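
To make the idea of a stroke guidance map concrete, here is a minimal sketch of how a per-pixel mask can restrict an edit to text strokes. It assumes plain alpha compositing, which is a generic masking technique; the abstract does not state how MOSTEL actually combines the map with the image, so the function `composite` and the toy arrays below are hypothetical.

```python
def composite(background, edited, stroke_mask):
    """Blend per pixel: mask 1.0 takes the edited pixel, mask 0.0 keeps the background."""
    return [
        [m * e + (1.0 - m) * b for b, e, m in zip(b_row, e_row, m_row)]
        for b_row, e_row, m_row in zip(background, edited, stroke_mask)
    ]


# Toy 2x3 grey-level images (assumed values, for illustration only).
background = [[0.2, 0.2, 0.2], [0.2, 0.2, 0.2]]
edited = [[0.9, 0.9, 0.9], [0.9, 0.9, 0.9]]
stroke_mask = [[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]]  # 1.0 marks text strokes

result = composite(background, edited, stroke_mask)
print(result)  # [[0.2, 0.9, 0.2], [0.2, 0.9, 0.9]]
```

Only the pixels flagged by the mask change, so the background texture outside the strokes is left untouched.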


AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation

Authors: Yiyang Ma, Huan Yang, Bei Liu, Jianlong Fu, Jiaying Liu

AI illustrator aims to automatically design visually appealing images for books to provoke rich thoughts and emotions. To achieve this goal, we propose a framework for translating raw descriptions with complex semantics into semantically corresponding images. The main challenge lies in the complexity of the semantics of raw descriptions, which may be hard to visualize (e.g., “gloomy” or “Asian”). Existing methods usually struggle to handle such descriptions. To address this issue, we propose a Prompt-based Cross-Modal Generation Framework (PCM-Frame) to leverage two powerful pre-trained models, CLIP and StyleGAN. Our framework consists of two components: a projection module from Text Embeddings to Image Embeddings based on prompts, and an adapted image generation module built on StyleGAN which takes Image Embeddings as inputs and is trained by combined semantic consistency losses. To bridge the gap between realistic images and illustration designs, we further adopt a stylization model as post-processing in our framework for better visual effects. Benefiting from the pre-trained models, our method can handle complex descriptions and does not require external paired data for training. Furthermore, we have built a benchmark that consists of 200 raw descriptions. We conduct a user study to demonstrate our superiority over the competing methods with complicated texts. We release our code at https://github.com/researchmm/AI_Illustrator.


High-resolution semantically-consistent image-to-image translation

Authors: Mikhail Sokolov, Christopher Henry, Joni Storie, Christopher Storie, Victor Alhassan, Mathieu Turgeon-Pelchat

Deep learning has become one of remote sensing scientists’ most efficient computer vision tools in recent years. However, the lack of training labels for remote sensing datasets means that scientists need to solve the domain adaptation problem to narrow the discrepancy between satellite image datasets. As a result, image segmentation models trained afterwards can generalize better and reuse an existing set of labels instead of requiring new ones. This work proposes an unsupervised domain adaptation model that preserves semantic consistency and per-pixel quality for the images during the style-transferring phase. This paper’s major contribution is an improved architecture of the SemI2I model, which significantly boosts the proposed model’s performance and makes it competitive with the state-of-the-art CyCADA model. A second contribution is testing the CyCADA model on remote sensing multi-band datasets such as WorldView-2 and SPOT-6. Because semantic consistency and per-pixel quality are preserved, the semantic segmentation model trained on the adapted images shows a substantial performance gain compared to the SemI2I model and reaches results similar to the state-of-the-art CyCADA model. Future development of the proposed method could include ecological domain transfer, a priori evaluation of dataset quality in terms of data distribution, or exploration of the inner architecture of the domain adaptation model.
PDF 25 pages, 7 figures


Unsupervised Structure-Consistent Image-to-Image Translation

Authors: Shima Shahfar, Charalambos Poullis

The Swapping Autoencoder achieved state-of-the-art performance in deep image manipulation and image-to-image translation. We improve this work by introducing a simple yet effective auxiliary module based on gradient reversal layers. The auxiliary module’s loss forces the generator to learn to reconstruct an image with an all-zero texture code, encouraging better disentanglement between the structure and texture information. The proposed attribute-based transfer method enables refined control in style transfer while preserving structural information without using a semantic mask. To manipulate an image, we encode both the geometry of the objects and the general style of the input images into two latent codes with an additional constraint that enforces structure consistency. Moreover, due to the auxiliary loss, training time is significantly reduced. The superiority of the proposed model is demonstrated in complex domains such as satellite images, where state-of-the-art methods are known to fail. Lastly, we show that our model improves the quality metrics for a wide range of datasets while achieving comparable results with multi-modal image generation techniques.
PDF Keywords: structure-consistent image-to-image translation, style transfer, training class imbalance
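
A gradient reversal layer, the building block named in this abstract, acts as the identity on the forward pass and negates (optionally scales) the gradient on the backward pass. The sketch below shows that behaviour with a hand-written forward/backward pair in plain Python. The class `GradientReversal`, its `scale` parameter, and the sample numbers are assumptions for illustration; they do not reproduce the authors' auxiliary module, and a deep learning framework would normally express this as a custom autograd operation.

```python
class GradientReversal:
    """Identity on the forward pass; flips and scales the gradient on the backward pass."""

    def __init__(self, scale=1.0):
        self.scale = scale  # reversal strength (hypothetical hyperparameter)

    def forward(self, x):
        # Activations pass through unchanged.
        return list(x)

    def backward(self, grad_output):
        # The gradient coming from the auxiliary loss is negated before it
        # reaches the layers upstream of this one.
        return [-self.scale * g for g in grad_output]


layer = GradientReversal(scale=0.5)
features = [0.2, -1.0, 3.0]

out = layer.forward(features)
grad_in = layer.backward([1.0, 2.0, -4.0])

print(out)      # [0.2, -1.0, 3.0]
print(grad_in)  # [-0.5, -1.0, 2.0]
```

Because the upstream network receives the negated gradient, it is pushed to make the auxiliary objective harder rather than easier, which is how such layers encourage features that do not carry the information the auxiliary branch is looking for.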


Image Translation Based Nuclei Segmentation for Immunohistochemistry Images

Authors: Roger Trullo, Quoc-Anh Bui, Qi Tang, Reza Olfati-Saber

Numerous deep learning based methods have been developed for nuclei segmentation in H&E images and have achieved close to human performance. However, direct application of such methods to another image modality, such as Immunohistochemistry (IHC) images, may not achieve satisfactory performance. Thus, we developed a Generative Adversarial Network (GAN) based approach that translates an IHC image to an H&E image while preserving nuclei location and morphology, and then applies pre-trained nuclei segmentation models to the virtual H&E image. Using two public IHC image datasets, we demonstrated that the proposed methods work better than several baseline methods, including direct application of state-of-the-art nuclei segmentation methods trained on H&E, such as Cellpose and HoVer-Net, and a generative method, DeepLIIF.


Unsupervised Domain Adaptation with Histogram-gated Image Translation for Delayered IC Image Analysis

Authors: Yee-Yang Tee, Deruo Cheng, Chye-Soon Chee, Tong Lin, Yiqiong Shi, Bah-Hwee Gwee

Deep learning has achieved great success in the challenging circuit annotation task by employing Convolutional Neural Networks (CNN) for the segmentation of circuit structures. These approaches require a large amount of manually annotated training data to perform well, and performance can degrade when a model trained on a given dataset is applied to a different dataset. This is commonly known as the domain shift problem for circuit annotation, which stems from the possibly large variations in distribution across different image datasets. The different image datasets could be obtained from different devices or different layers within a single device. To address the domain shift problem, we propose Histogram-gated Image Translation (HGIT), an unsupervised domain adaptation framework which transforms images from a given source dataset to the domain of a target dataset and utilizes the transformed images for training a segmentation network. Specifically, our HGIT performs generative adversarial network (GAN)-based image translation and utilizes histogram statistics for data curation. Experiments were conducted on a single labeled source dataset adapted to three different target datasets (without labels for training), and the segmentation performance was evaluated for each target dataset. We have demonstrated that our method achieves the best performance compared to the reported domain adaptation techniques, and is also reasonably close to the fully supervised benchmark.
PDF 7 pages, 4 figures, To be presented at IEEE PAINE 2022 (oral)
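
The abstract says HGIT uses histogram statistics for data curation but gives no details. As a rough illustration of what histogram-based gating could look like, the sketch below keeps a translated image only if its intensity histogram overlaps enough with a reference histogram from the target domain. The choice of histogram intersection as the statistic, the functions `normalized_histogram`, `histogram_intersection`, and `gate`, the bin count, the threshold, and the toy pixel lists are all assumptions, not the authors' procedure.

```python
def normalized_histogram(pixels, bins=8, max_value=256):
    """Histogram of integer intensities in [0, max_value), normalized to sum to 1."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    total = len(pixels)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms: 1.0 if identical, 0.0 if disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def gate(translated_images, target_hist, threshold=0.7):
    """Keep only translated images whose histogram is close to the target's."""
    return [
        img for img in translated_images
        if histogram_intersection(normalized_histogram(img), target_hist) >= threshold
    ]


# Toy flattened "images": lists of 8-bit grey levels (assumed values).
target_reference = [10, 20, 30, 200, 210, 220, 230, 240]
good = [12, 25, 28, 205, 215, 225, 235, 245]    # similar intensity distribution
bad = [120, 125, 130, 135, 140, 128, 133, 138]  # mid-grey, unlike the target

target_hist = normalized_histogram(target_reference)
kept = gate([good, bad], target_hist)

print(len(kept))  # 1: only the image resembling the target survives
```

The images that pass the gate would then be the ones used to train the segmentation network.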


Post author: 木子已
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit 木子已 as the source when reposting.