Authors:Shota Hirose, Shiori Maki, Naoki Wada, Heming Sun, Jiro Katto
Spectral Normalization is one of the best methods for stabilizing the training of Generative Adversarial Networks (GANs). Spectral Normalization limits the gradient of the discriminator between the distributions of real and fake data. However, even with this normalization, GAN training sometimes fails. In this paper, we reveal that a more severe restriction is sometimes needed depending on the training dataset, and we propose a novel stabilizer, called ABCAS, which offers an adaptive normalization method. Our method decides the discriminator’s Lipschitz constant adaptively by checking the distance between the distributions of real and fake data. Our method improves the stability of GAN training and achieves a better Fréchet Inception Distance score for generated images. We also investigate suitable spectral norms for three datasets and show the results as an ablation study.
PDF ICCE 2023
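To make the idea of bounding a layer's Lipschitz constant concrete, here is a minimal NumPy sketch of spectral normalization by power iteration, rescaled to a chosen target norm. The function names and the `target` parameter are illustrative assumptions. The abstract does not spell out how ABCAS picks its constant from the distance between real and fake data, so that adaptive rule is not reproduced here.

```python
import numpy as np

def spectral_norm(w, n_iter=100, seed=0):
    """Estimate the largest singular value of a 2-D weight matrix by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(w.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = w @ v
        u /= np.linalg.norm(u) + 1e-12
    # u and v approximate the leading singular vectors, so u^T W v approximates sigma_max.
    return float(u @ w @ v)

def normalize_to(w, target=1.0, n_iter=100):
    """Rescale w so its spectral norm equals `target` (hypothetical knob, not from the paper)."""
    return w * (target / spectral_norm(w, n_iter))

w = np.diag([3.0, 1.0, 0.5])
w_tight = normalize_to(w, target=0.5)  # a stricter bound than the usual target of 1
```

A target below 1 corresponds to the "more severe restriction" the abstract mentions. An adaptive scheme would presumably update this target during training.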
Authors:Joaquim de Curtò, Irene de Zarzà, Hong Yan, Carlos T. Calafate
In this paper, we bring forward the use of the recently developed Signature Transform as a way to measure the similarity between image distributions, and we provide a detailed introduction and extensive evaluations. We are the first to propose RMSE and MAE Signature, along with log-signature, as alternatives for measuring GAN convergence, a problem that has been extensively studied. We are also the first to introduce analytical measures based on statistics to study the goodness of fit of the GAN sample distribution that are both efficient and effective. Current GAN measures involve a large amount of computation, normally done on the GPU, and are very time consuming. In contrast, we reduce the computation time to the order of seconds, and the computation is done on the CPU while achieving the same level of goodness. Lastly, a PCA adaptive t-SNE approach, which is novel in this context, is also proposed for data visualization.
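A rough sketch of what an RMSE/MAE comparison of signatures could look like on the CPU is given below. It rests on several assumptions that the abstract does not confirm: each grayscale image is treated as a path whose rows are time steps, the signature is truncated at depth 2, and the two sample sets are compared through their element-wise mean signatures. The log-signature variant is not covered.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path with shape (T, d)."""
    d = path.shape[1]
    s1 = np.zeros(d)          # level 1: total increment
    s2 = np.zeros((d, d))     # level 2: iterated integrals
    for dx in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment to the path so far.
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)
        s1 += dx
    return np.concatenate([s1, s2.ravel()])

def mean_signature(images):
    """Element-wise mean of the signatures of a collection of 2-D arrays."""
    return np.mean([signature_depth2(img) for img in images], axis=0)

def rmse_mae_signature(real, fake):
    """RMSE and MAE between the mean signatures of two image sets."""
    diff = mean_signature(real) - mean_signature(fake)
    return float(np.sqrt(np.mean(diff ** 2))), float(np.mean(np.abs(diff)))

rng = np.random.default_rng(0)
real = rng.random((4, 8, 8))   # toy stand-ins for real images
fake = rng.random((4, 8, 8))   # toy stand-ins for generated images
rmse, mae = rmse_mae_signature(real, fake)
```

Everything here is plain array arithmetic, which is consistent with the abstract's point that such scores can be computed quickly without a GPU.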
Authors:Mingyang Zhang, Ziqi Di, Maoguo Gong, Yue Wu, Hao Li, Xiangming Jiang
In recent years, research on hyperspectral image (HSI) classification has made continuous progress by introducing deep network models, and recently graph convolutional network (GCN) based models have shown impressive performance. However, these deep learning frameworks based on point estimation suffer from low generalization and an inability to quantify the uncertainty of the classification results. On the other hand, simply applying a Bayesian Neural Network (BNN) based on distribution estimation to classify HSI cannot achieve high classification accuracy due to the large number of parameters. In this paper, we design a Bayesian layer, based on Bayesian ideas, as an insertion layer for point estimation based neural networks, and propose a Bayesian Layer Graph Convolutional Network (BLGCN) model by combining it with graph convolution operations, which can effectively extract graph information and estimate the uncertainty of classification results. Moreover, a Generative Adversarial Network (GAN) is built to solve the sample imbalance problem of HSI datasets. Finally, we design a dynamic control training strategy based on the confidence interval of the classification results, which terminates the training early when the confidence interval reaches the preset threshold. The experimental results show that our model achieves a balance between high classification accuracy and strong generalization. In addition, it can quantify the uncertainty of the classification results.
Authors:René Haas, Stella Graßhof, Sami S. Brandt
In this paper, we present an approach for combining non-rigid structure-from-motion (NRSfM) with deep generative models, and propose an efficient framework for discovering trajectories in the latent space of 2D GANs corresponding to changes in 3D geometry. Our approach uses recent advances in NRSfM and enables editing of the camera and non-rigid shape information associated with the latent codes without needing to retrain the generator. This formulation provides an implicit dense 3D reconstruction as it enables the image synthesis of novel shapes from arbitrary view angles and non-rigid structure. The method is built upon a sparse backbone, where a neural regressor is first trained to regress parameters describing the cameras and sparse non-rigid structure directly from the latent codes. The latent trajectories associated with changes in the camera and structure parameters are then identified by estimating the local inverse of the regressor in the neighborhood of a given latent code. The experiments show that our approach provides a versatile, systematic way to model, analyze, and edit the geometry and non-rigid structures of faces.
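One common way to approximate a local inverse of a regressor is through the pseudo-inverse of its Jacobian. The sketch below illustrates that generic idea with a toy linear regressor. It is an assumption about how such an inverse might be estimated, not the procedure used in the paper, and the names `jacobian` and `latent_step` are hypothetical.

```python
import numpy as np

def jacobian(f, w, eps=1e-5):
    """Central-difference Jacobian of f: R^n -> R^m evaluated at latent code w."""
    m = f(w).size
    J = np.zeros((m, w.size))
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        J[:, i] = (f(w + step) - f(w - step)) / (2 * eps)
    return J

def latent_step(f, w, delta_p):
    """Minimum-norm latent change that yields delta_p in the regressed parameters, to first order."""
    return np.linalg.pinv(jacobian(f, w)) @ delta_p

# Toy regressor standing in for the trained network (latent dim 8, 3 parameters).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 8))
regressor = lambda w: A @ w

w0 = rng.standard_normal(8)
dw = latent_step(regressor, w0, np.array([0.1, 0.0, -0.2]))  # e.g. a small camera change
```

Repeating small steps of this kind and re-evaluating the Jacobian along the way would trace out a trajectory in latent space. For a nonlinear regressor the step is only accurate near `w0`.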
Authors:Abdullah Hayajneh, Mohammad Shaqfeh, Erchin Serpedin, Mitchell A. Stotland
This paper presents a novel machine learning framework to consistently detect, localize and rate congenital cleft lip anomalies in human faces. The goal is to provide a universal, objective measure of facial differences and reconstructive surgical outcomes that matches human judgments. The proposed method employs the StyleGAN2 generative adversarial network with model adaptation to produce normalized transformations of cleft-affected faces in order to allow for subsequent measurement of deformity using a pixel-wise subtraction approach. The complete pipeline of the proposed framework consists of the following steps: image preprocessing, face normalization, color transformation, morphological erosion, heatmap generation and abnormality scoring. Heatmaps that finely discern anatomic anomalies are proposed by exploiting the features of the considered framework. The proposed framework is validated through computer simulations and surveys containing human ratings. The anomaly scores yielded by the proposed computer model correlate closely with the human ratings of facial differences, yielding a Pearson’s r of 0.942.
PDF All the face images used in this study were publicly available on the internet
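The pixel-wise subtraction and erosion steps named in the abstract can be illustrated with a small NumPy sketch. It assumes grayscale inputs, a 3x3 structuring element, and a mean-based score. The exact ordering of steps, the color transformation, and the scoring formula used in the paper are not given in the abstract, so every function here is hypothetical.

```python
import numpy as np

def erode3x3(img):
    """Grayscale morphological erosion with a 3x3 square (minimum over each neighborhood)."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    shifts = [p[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.min(shifts, axis=0)

def anomaly_heatmap(original, normalized):
    """Absolute pixel-wise difference, eroded to suppress isolated speckle."""
    return erode3x3(np.abs(original - normalized))

def anomaly_score(original, normalized):
    """Mean heatmap intensity, used here as a stand-in abnormality score."""
    return float(anomaly_heatmap(original, normalized).mean())

# Toy example: the "normalized" face differs from the original in one 3x3 patch.
original = np.zeros((8, 8))
normalized = original.copy()
normalized[2:5, 2:5] = 1.0
heat = anomaly_heatmap(original, normalized)
```

In this toy case erosion keeps only the center of the differing patch, which shows how it removes thin or isolated differences before scoring.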
Authors:Alex Burnap, John R. Hauser, Artem Timoshenko
Aesthetics are critically important to market acceptance. In the automotive industry, an improved aesthetic design can boost sales by 30% or more. Firms invest heavily in designing and testing aesthetics. A single automotive “theme clinic” can cost over $100,000, and hundreds are conducted annually. We propose a model to augment the commonly used aesthetic design process by predicting aesthetic scores and automatically generating innovative and appealing product designs. The model combines a probabilistic variational autoencoder (VAE) with adversarial components from generative adversarial networks (GAN) and a supervised learning component. We train and evaluate the model with data from an automotive partner: images of 203 SUVs evaluated by targeted consumers and 180,000 high-quality unrated images. Our model predicts the appeal of new aesthetic designs well, with a 43.5% improvement relative to a uniform baseline and substantial improvement over conventional machine learning models and pretrained deep neural networks. New automotive designs are generated in a controllable manner for use by design teams. We empirically verify that automatically generated designs are (1) appealing to consumers and (2) resemble designs which were introduced to the market five years after our data were collected. We provide an additional proof-of-concept application using open-source images of dining room chairs.
Authors:Muyang Li, Ji Lin, Chenlin Meng, Stefano Ermon, Song Han, Jun-Yan Zhu
During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This leads to a significant waste of computation, especially for minor editing operations. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique that selectively performs computation for edited regions and accelerates various generative models, including both conditional GANs and diffusion models. Our key observation is that users tend to make gradual changes to the input image. This motivates us to cache and reuse the feature maps of the original image. Given an edited image, we sparsely apply the convolutional filters to the edited regions while reusing the cached features for the unedited regions. Based on our algorithm, we further propose Sparse Incremental Generative Engine (SIGE) to convert the computation reduction to latency reduction on off-the-shelf hardware. With 1.2%-area edited regions, our method reduces the computation of DDIM by 7.5$\times$ and GauGAN by 18$\times$ while preserving the visual fidelity. With SIGE, we accelerate DDIM by 3.0$\times$ on RTX 3090 and 6.6$\times$ on Apple M1 Pro CPU, and GauGAN by 4.2$\times$ on RTX 3090 and 14$\times$ on Apple M1 Pro CPU.
PDF NeurIPS 2022 Website: https://www.cs.cmu.edu/~sige/ Code: https://github.com/lmxyy/sige
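The cache-and-reuse idea from the abstract can be shown on a single 3x3 convolution. The sketch below recomputes the output only inside the bounding box of the edit (grown by the kernel radius) and copies everything else from the cached result. This is a toy illustration under simplifying assumptions (one channel, one layer, one rectangular region); it is not the SIGE engine, and the function names are hypothetical.

```python
import numpy as np

def conv3x3(x, k):
    """Zero-padded 3x3 cross-correlation on a single-channel image."""
    p = np.pad(x, 1)
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + h, j:j + w]
    return out

def sparse_update(cached_out, original, edited, k):
    """Recompute the conv output only where the edit can change it; reuse the cache elsewhere."""
    ys, xs = np.nonzero(edited != original)
    out = cached_out.copy()
    if ys.size == 0:
        return out
    h, w = edited.shape
    # Output pixels affected by the edit: edit bounding box grown by the kernel radius (1).
    y0, y1 = max(ys.min() - 1, 0), min(ys.max() + 2, h)
    x0, x1 = max(xs.min() - 1, 0), min(xs.max() + 2, w)
    # Input patch needed to compute those outputs: grown by one more pixel.
    py0, py1 = max(y0 - 1, 0), min(y1 + 1, h)
    px0, px1 = max(x0 - 1, 0), min(x1 + 1, w)
    patch_out = conv3x3(edited[py0:py1, px0:px1], k)
    out[y0:y1, x0:x1] = patch_out[y0 - py0:y1 - py0, x0 - px0:x1 - px0]
    return out

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))
original = rng.random((16, 16))
cached = conv3x3(original, kernel)   # computed once for the original image
edited = original.copy()
edited[5:8, 6:9] += 1.0              # a small local edit
fast = sparse_update(cached, original, edited, kernel)
```

The saving comes from the patch being much smaller than the full image when the edit is local. In a deep network the affected region grows with each layer's receptive field, which any practical scheme has to track.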