GAN


2022-07-26 Update

Multimodal Image Synthesis and Editing: A Survey

Authors:Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Lingjie Liu, Adam Kortylewski, Christian Theobalt, Eric Xing

As information exists in various modalities in the real world, effective interaction and fusion among multimodal information plays a key role in the creation and perception of multimodal data in computer vision and deep learning research. With its strong capability to model the interaction among multimodal information, multimodal image synthesis and editing has become a hot research topic in recent years. Instead of providing explicit guidance for network training, multimodal guidance offers an intuitive and flexible means for image synthesis and editing. On the other hand, this field still faces several challenges, such as aligning features across inherent modality gaps, synthesizing high-resolution images, and designing faithful evaluation metrics. In this survey, we comprehensively contextualize recent advances in multimodal image synthesis and editing and formulate taxonomies according to data modality and model architecture. We start with an introduction to the different types of guidance modalities in image synthesis and editing. We then describe multimodal image synthesis and editing approaches in detail, covering frameworks including Generative Adversarial Networks (GANs), autoregressive models, diffusion models, Neural Radiance Fields (NeRF), and other methods. This is followed by a comprehensive description of the benchmark datasets and evaluation metrics widely adopted in multimodal image synthesis and editing, together with detailed comparisons of various synthesis methods and an analysis of their respective advantages and limitations. Finally, we provide insights into the current research challenges and possible directions for future research. We hope this survey can lay a sound and valuable foundation for the future development of multimodal image synthesis and editing. A project associated with this survey is available at https://github.com/fnzhan/MISE.
PDF Under submission to TPAMI

Click here to view paper screenshots
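
To make the idea of multimodal guidance concrete, below is a minimal PyTorch sketch of a text-conditioned GAN generator, the basic pattern that many of the surveyed GAN-based methods build on: a text embedding is fused with a noise code to condition image synthesis. The dimensions, layer choices, and fusion-by-concatenation scheme are illustrative assumptions, not taken from any particular surveyed model.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Toy text-conditioned generator: noise + text embedding -> 32x32 image."""
    def __init__(self, z_dim=128, text_dim=256, img_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # Project the fused (noise + text) vector to a 4x4 feature map.
            nn.ConvTranspose2d(z_dim + text_dim, 256, kernel_size=4),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 8x8
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 16x16
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, img_channels, kernel_size=4, stride=2, padding=1),  # 32x32
            nn.Tanh(),
        )

    def forward(self, z, text_emb):
        # Multimodal guidance: fuse the noise code with the text embedding.
        cond = torch.cat([z, text_emb], dim=1)
        return self.net(cond[:, :, None, None])

g = ConditionalGenerator()
fake = g(torch.randn(2, 128), torch.randn(2, 256))  # -> (2, 3, 32, 32)
```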

Estimación de áreas de cultivo mediante Deep Learning y programación convencional (Crop area estimation using Deep Learning and conventional programming)

Authors:Javier Caicedo, Pamela Acosta, Romel Pozo, Henry Guilcapi, Christian Mejia-Escobar

Artificial Intelligence has enabled more accurate and efficient solutions to problems in many areas. In the agricultural sector, one of the main needs is to know, at all times, how much land is or is not occupied by crops in order to improve production and profitability. Traditional calculation methods require manual, in-person data collection in the field, leading to high labor costs, long execution times, and inaccurate results. This work proposes a new method based on Deep Learning techniques, complemented by conventional programming, to determine the extent of populated and unpopulated crop areas. As a case study, we consider one of the most recognized sugar cane planting and harvesting companies in Ecuador. The strategy combines a Generative Adversarial Network (GAN), trained on a dataset of aerial photographs of natural and urban landscapes to improve image resolution; a Convolutional Neural Network (CNN), trained on a dataset of aerial photographs of sugar cane plots to distinguish populated from unpopulated crop areas; and a standard image-processing module that computes the areas as percentages. The experiments demonstrate a significant improvement in the quality of the aerial photographs as well as a clear differentiation between populated and unpopulated crop areas and, consequently, a more accurate estimate of cultivated and uncultivated areas. The proposed method can be extended to the detection of possible pests, areas of weed vegetation, dynamic crop development, and both qualitative and quantitative quality control.
PDF 21 pages, in Spanish, 17 figures, 3 tables

Click here to view paper screenshots
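
As an illustration of the final "conventional programming" step in the pipeline above, the sketch below computes the populated crop area as a percentage from per-tile CNN predictions (1 = populated, 0 = unpopulated). The tiling scheme, tile size, and the stand-in classifier are assumptions for illustration, not the paper's code; the GAN super-resolution stage is assumed to have already been applied to the input image.

```python
import numpy as np

def populated_area_percentage(aerial_image, classify_tile, tile=64):
    """aerial_image: HxWx3 array; classify_tile: callable returning 0 or 1 per patch."""
    h, w = aerial_image.shape[:2]
    labels = []
    # Slide a non-overlapping tile grid over the (already super-resolved) image.
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            labels.append(classify_tile(aerial_image[y:y + tile, x:x + tile]))
    labels = np.asarray(labels)
    # Fraction of tiles classified as populated, expressed as a percentage.
    return 100.0 * labels.mean() if labels.size else 0.0

# Toy usage with a dummy classifier (mean-green heuristic as a stand-in for the CNN).
dummy_classifier = lambda patch: int(patch[..., 1].mean() > 127)
image = np.random.randint(0, 256, size=(512, 512, 3), dtype=np.uint8)
print(f"Populated crop area: {populated_area_percentage(image, dummy_classifier):.1f}%")
```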

FCL-GAN: A Lightweight and Real-Time Baseline for Unsupervised Blind Image Deblurring

Authors:Suiyi Zhao, Zhao Zhang, Richang Hong, Mingliang Xu, Yi Yang, Meng Wang

Blind image deblurring (BID) remains a challenging and significant task. Benefiting from the strong fitting ability of deep learning, paired data-driven supervised BID methods have made great progress. However, paired data are usually synthesized by hand, and real-world blurs are more complex than synthetic ones, which makes supervised methods inept at modeling realistic blurs and hinders their real-world application. As such, unsupervised deep BID methods without paired data offer certain advantages, but current methods still suffer from drawbacks such as bulky model size, long inference time, and strict image resolution and domain requirements. In this paper, we propose a lightweight and real-time unsupervised BID baseline, termed Frequency-domain Contrastive Loss Constrained Lightweight CycleGAN (FCL-GAN for short), with attractive properties, i.e., no image domain limitation, no image resolution limitation, 25x lighter than SOTA, and 5x faster than SOTA. To guarantee the lightweight property and performance superiority, two new collaboration units, called the lightweight domain conversion unit (LDCU) and the parameter-free frequency-domain contrastive unit (PFCU), are designed. LDCU mainly implements inter-domain conversion in a lightweight manner. PFCU further explores the similarity measure, external difference, and internal connection between blurred-domain and sharp-domain images in the frequency domain, without involving extra parameters. Extensive experiments on several image datasets demonstrate the effectiveness of our FCL-GAN in terms of performance, model size, and inference time.
PDF Please cite this work as: Suiyi Zhao, Zhao Zhang, Richang Hong, Mingliang Xu, Yi Yang and Meng Wang, “FCL-GAN: A Lightweight and Real-Time Baseline for Unsupervised Blind Image Deblurring,” In: Proceedings of the 30th ACM International Conference on Multimedia (ACM MM), Lisbon, Portugal, June 2022

Click here to view paper screenshots
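
The sketch below illustrates one way to realize a parameter-free frequency-domain contrastive objective in the spirit of the PFCU described above: the deblurred output should be close to a sharp-domain image and far from its blurred input when compared by their FFT amplitude spectra. This is an interpretation for illustration (the similarity measure, temperature, and InfoNCE-style form are assumptions), not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def fft_amplitude(x):
    # x: (B, C, H, W) image batch; return flattened log-amplitude spectrum.
    spec = torch.fft.fft2(x, norm="ortho")
    return torch.log1p(spec.abs()).flatten(1)

def freq_contrastive_loss(deblurred, sharp_ref, blurred_in, tau=0.1):
    anchor = fft_amplitude(deblurred)
    pos = F.cosine_similarity(anchor, fft_amplitude(sharp_ref), dim=1) / tau
    neg = F.cosine_similarity(anchor, fft_amplitude(blurred_in), dim=1) / tau
    # InfoNCE-style ratio: pull toward the sharp spectrum, push away from the blurred one.
    return -torch.log(torch.exp(pos) / (torch.exp(pos) + torch.exp(neg))).mean()

# Toy usage with random tensors standing in for generator output and domain samples.
loss = freq_contrastive_loss(torch.rand(2, 3, 64, 64),
                             torch.rand(2, 3, 64, 64),
                             torch.rand(2, 3, 64, 64))
```

Note that the loss itself introduces no learnable parameters, matching the "parameter-free" property claimed for the PFCU.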

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation

Authors:Jing He, Yiyi Zhou, Qi Zhang, Jun Peng, Yunhang Shen, Xiaoshuai Sun, Chao Chen, Rongrong Ji

Pixel synthesis is a promising research paradigm for image generation, as it can exploit pixel-wise prior knowledge for generation. However, existing methods still suffer from excessive memory footprint and computation overhead. In this paper, we propose a progressive pixel synthesis network for efficient image generation, coined PixelFolder. Specifically, PixelFolder formulates image generation as a progressive pixel regression problem and synthesizes images via a multi-stage structure, which greatly reduces the overhead caused by large tensor transformations. In addition, we introduce novel pixel folding operations to further improve model efficiency while maintaining pixel-wise prior knowledge for end-to-end regression. With these designs, we greatly reduce the expenditure of pixel synthesis, e.g., reducing computation by 89% and parameters by 53% compared with the latest pixel synthesis method, CIPS. To validate our approach, we conduct extensive experiments on two benchmark datasets, namely FFHQ and LSUN Church. The experimental results show that, with much less expenditure, PixelFolder achieves new state-of-the-art (SOTA) performance on both benchmark datasets, i.e., 3.77 FID on FFHQ and 2.45 FID on LSUN Church. Meanwhile, PixelFolder is also more efficient than SOTA methods such as StyleGAN2, reducing computation by about 72% and parameters by about 31%. These results validate the effectiveness of the proposed PixelFolder.
PDF Accepted by ECCV2022. The code is available at https://github.com/BlingHe/PixelFolder

Click here to view paper screenshots
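
The "pixel folding" idea can be pictured as a space-to-depth rearrangement: neighbouring pixels are folded into the channel dimension so that subsequent stages operate on smaller spatial maps, and are later unfolded back without losing per-pixel information. The sketch below uses PyTorch's pixel_unshuffle/pixel_shuffle as an illustrative stand-in for this operation; it is not the paper's exact operator.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 64, 64)  # per-pixel features at 64x64 resolution

# Fold 2x2 pixel neighbourhoods into channels: (1, 16, 64, 64) -> (1, 64, 32, 32).
folded = F.pixel_unshuffle(x, downscale_factor=2)
# ... lightweight convolutions could now run on the cheaper 32x32 map ...

# Unfold back to the original resolution: (1, 64, 32, 32) -> (1, 16, 64, 64).
unfolded = F.pixel_shuffle(folded, upscale_factor=2)

assert torch.allclose(x, unfolded)  # folding is lossless: pixels are rearranged, not discarded
```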
