Updated 2023-06-27
A Semi-Paired Approach For Label-to-Image Translation
Authors: George Eskandar, Shuai Zhang, Mohamed Abdelsamad, Mark Youssef, Diandian Guo, Bin Yang
Data efficiency, or the ability to generalize from a few labeled samples, remains a major challenge in deep learning. Semi-supervised learning has thrived in traditional recognition tasks, alleviating the need for large amounts of labeled data, yet it remains understudied in image-to-image translation (I2I) tasks. In this work, we introduce the first semi-supervised (semi-paired) framework for label-to-image translation, a challenging subtask of I2I that generates photorealistic images from semantic label maps. In the semi-paired setting, the model has access to a small set of paired data and a larger set of unpaired images and labels. Instead of using geometrical transformations as a pretext task as in previous works, we leverage an input reconstruction task by exploiting the conditional discriminator on the paired data as a reverse generator. We propose a training algorithm for this shared network, and we present a rare-class sampling algorithm to focus on under-represented classes. Experiments on 3 standard benchmarks show that the proposed model outperforms state-of-the-art unsupervised and semi-supervised approaches, as well as some fully supervised approaches, while using a much smaller number of paired samples.
PDF
Improving Panoptic Segmentation for Nighttime or Low-Illumination Urban Driving Scenes
Authors: Ankur Chrungoo
Autonomous vehicles and driving systems use scene parsing as an essential tool to understand the surrounding environment. Panoptic segmentation is a state-of-the-art technique that proves pivotal in this use case. Deep learning-based architectures have recently been used for effective and efficient panoptic segmentation. However, under adverse conditions such as dark scenes with poor illumination or nighttime images, existing methods perform poorly in comparison to daytime images. One of the main factors behind the poor results is the lack of sufficient and accurately annotated nighttime images for urban driving scenes. In this work, we propose two new methods, the first to improve the performance and the second to improve the robustness of panoptic segmentation in nighttime or poor-illumination urban driving scenes, using a domain translation approach. The proposed approach uses CycleGAN (Zhu et al., 2017) to translate daytime images with existing panoptic annotations into nighttime images, which are then used to retrain a panoptic segmentation model to improve performance and robustness under poor illumination and nighttime conditions. In our experiments, Approach-1 demonstrates a significant improvement in panoptic segmentation performance on the converted Cityscapes dataset, with absolute gains of more than +10% PQ, +12% RQ, +2% SQ, +14% mIoU and +10% AP50. Approach-2 demonstrates improved robustness to varied nighttime driving environments. Both approaches are supported by comprehensive quantitative and qualitative analysis.
PDF 12 pages, 6 figures
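The abstract above reports gains in PQ, SQ and RQ without defining them. As a reading aid, here is a minimal sketch of how these three numbers relate under the standard panoptic quality definition from the panoptic segmentation literature (not taken from this paper): a predicted segment counts as a true positive when it overlaps a ground-truth segment with IoU above 0.5, SQ is the mean IoU over true positives, RQ is an F1-style detection score, and PQ = SQ × RQ. The function name `panoptic_quality` and its arguments are hypothetical, and the segment matching step is assumed to have been done already.

```python
from typing import Sequence, Tuple


def panoptic_quality(
    matched_ious: Sequence[float], num_fp: int, num_fn: int
) -> Tuple[float, float, float]:
    """Return (PQ, SQ, RQ) for one class.

    matched_ious: IoU of each matched (true-positive) segment pair,
                  assumed to be already filtered to IoU > 0.5.
    num_fp:       number of unmatched predicted segments.
    num_fn:       number of unmatched ground-truth segments.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        # No segments at all for this class; the metric is undefined,
        # and this sketch simply reports zeros.
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp if tp else 0.0  # segmentation quality
    rq = tp / denom                             # recognition quality
    return sq * rq, sq, rq


if __name__ == "__main__":
    pq, sq, rq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
    print(f"PQ={pq:.3f} SQ={sq:.3f} RQ={rq:.3f}")
```

Because PQ is the product of SQ and RQ, a large RQ gain combined with a small SQ gain, as reported above, suggests that the improvement comes mostly from detecting more segments correctly rather than from sharper masks.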
UAlberta at SemEval-2023 Task 1: Context Augmentation and Translation for Multilingual Visual Word Sense Disambiguation
Authors: Michael Ogezi, Bradley Hauer, Talgat Omarov, Ning Shi, Grzegorz Kondrak
We describe the systems of the University of Alberta team for the SemEval-2023 Visual Word Sense Disambiguation (V-WSD) Task. We present a novel algorithm that leverages glosses retrieved from BabelNet, in combination with text and image encoders. Furthermore, we compare language-specific encoders against the application of English encoders to translated texts. As the contexts given in the task datasets are extremely short, we also experiment with augmenting these contexts with descriptions generated by a language model. This yields substantial improvements in accuracy. We describe and evaluate additional V-WSD methods which use image generation and text-conditioned image segmentation. Overall, the results of our official submission rank us 18 out of 56 teams. Some of our unofficial results are even better than the official ones. Our code is publicly available at https://github.com/UAlberta-NLP/v-wsd.
PDF
Domain-Scalable Unpaired Image Translation via Latent Space Anchoring
Authors: Siyu Huang, Jie An, Donglai Wei, Zudi Lin, Jiebo Luo, Hanspeter Pfister
Unpaired image-to-image translation (UNIT) aims to map images between two visual domains without paired training data. However, given a UNIT model trained on certain domains, it is difficult for current methods to incorporate new domains because they often need to train the full model on both existing and new domains. To address this problem, we propose a new domain-scalable UNIT method, termed latent space anchoring, which can be efficiently extended to new visual domains and does not need to fine-tune the encoders and decoders of existing domains. Our method anchors images of different domains to the same latent space of frozen GANs by learning lightweight encoder and regressor models to reconstruct single-domain images. In the inference phase, the learned encoders and decoders of different domains can be arbitrarily combined to translate images between any two domains without fine-tuning. Experiments on various datasets show that the proposed method achieves superior performance on both standard and domain-scalable UNIT tasks in comparison with state-of-the-art methods.
PDF Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Code is available at https://github.com/siyuhuang/Latent-Space-Anchoring