Vision Transformer


Updated 2022-11-29

Identification of Surface Defects on Solar PV Panels and Wind Turbine Blades using Attention based Deep Learning Model

Authors:Divyanshi Dwivedi, K. Victor Sam Moses Babu, Pradeep Kumar Yemula, Pratyush Chakraborty, Mayukha Pal

According to the Global Electricity Review 2022, electricity generation from renewable energy sources has increased by 20% worldwide, primarily due to the installation of more large green power plants. Monitoring the renewable energy assets in those large power plants is still challenging, as the assets are highly affected by several environmental factors, resulting in issues such as reduced power generation, malfunctioning, and degradation of asset life. Detecting surface defects on renewable energy assets would therefore help maintain the safety and efficiency of green power plants. An innovative detection framework is proposed to achieve an economical renewable energy asset surface monitoring system. First, high-resolution images of the assets are captured on a regular basis and inspected to detect damage. For inspection, this paper presents a unified deep learning-based image inspection model that analyzes the captured images to identify surface or structural damage on the various renewable energy assets in large power plants. We use the Vision Transformer (ViT), a recently developed deep-learning model in computer vision, to detect damage on solar panels and wind turbine blades and to classify the type of defect so that preventive measures can be suggested. With the ViT model, we achieve above 97% accuracy for both assets, outperforming benchmark classification models on input images of varied modalities taken from publicly available sources.
PDF
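
To make the classification setup concrete, below is a minimal sketch, assuming an ImageNet-pretrained ViT-B/16 from torchvision with its head replaced for defect categories; the number of classes, example class names, and hyperparameters are illustrative assumptions, not the authors' configuration.

```python
# A minimal sketch (not the authors' released code) of fine-tuning a pretrained
# ViT-B/16 for surface-defect classification on asset images.
import torch
import torch.nn as nn
from torchvision import models

NUM_DEFECT_CLASSES = 5  # hypothetical: e.g. clean, crack, hot-spot, soiling, erosion

# Load an ImageNet-pretrained ViT-B/16 and replace its classification head.
model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_DEFECT_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimisation step on a batch of 224x224 RGB asset images."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)            # (B, NUM_DEFECT_CLASSES)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with random tensors standing in for a labelled inspection batch.
if __name__ == "__main__":
    dummy_images = torch.randn(4, 3, 224, 224)
    dummy_labels = torch.randint(0, NUM_DEFECT_CLASSES, (4,))
    print(train_step(dummy_images, dummy_labels))
```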


SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation

Authors:Huaishao Luo, Junwei Bao, Youzheng Wu, Xiaodong He, Tianrui Li

Recently, contrastive language-image pre-training, e.g., CLIP, has demonstrated promising results on various downstream tasks. The pre-trained model can capture enriched visual concepts for images by learning from a large scale of text-image data. However, transferring the learned visual knowledge to open-vocabulary semantic segmentation is still under-explored. In this paper, we propose a CLIP-based model named SegCLIP for open-vocabulary segmentation in an annotation-free manner. SegCLIP performs segmentation based on ViT, and its main idea is to gather patches with learnable centers into semantic regions through training on text-image pairs. The gathering operation can dynamically capture the semantic groups, which can be used to generate the final segmentation results. We further propose a reconstruction loss on masked patches and a superpixel-based KL loss with pseudo-labels to enhance the visual representation. Experimental results show that our model achieves comparable or superior segmentation accuracy on PASCAL VOC 2012 (+1.4% mIoU), PASCAL Context (+2.4% mIoU), and COCO (+5.6% mIoU) compared with baselines. We release the code at https://github.com/ArrowLuo/SegCLIP.
PDF
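
Below is a minimal sketch of the patch-aggregation idea: learnable center tokens attend over ViT patch tokens, and the resulting weights group patches into candidate semantic regions. This is an illustrative reimplementation, not the released SegCLIP module; the single-head attention, embedding dimension, and number of centers are simplifying assumptions.

```python
# Illustrative sketch of gathering ViT patch tokens with learnable centers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchAggregation(nn.Module):
    def __init__(self, dim: int = 512, num_centers: int = 8):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_centers, dim) * 0.02)
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, patch_tokens: torch.Tensor):
        # patch_tokens: (B, N, dim) ViT patch embeddings
        B = patch_tokens.size(0)
        q = self.to_q(self.centers).unsqueeze(0).expand(B, -1, -1)    # (B, C, dim)
        k = self.to_k(patch_tokens)                                    # (B, N, dim)
        v = self.to_v(patch_tokens)
        # Each center attends over all patches; the weights act as a soft
        # grouping of patches into regions.
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)   # (B, C, N)
        region_feats = attn @ v                                        # (B, C, dim)
        return region_feats, attn  # attn can be reshaped into a coarse segmentation map

# Usage: 196 patch tokens from a 224x224 image with 16x16 patches.
tokens = torch.randn(2, 196, 512)
regions, assignment = PatchAggregation()(tokens)
print(regions.shape, assignment.shape)  # (2, 8, 512) and (2, 8, 196)
```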


Self-Distilled Self-Supervised Representation Learning

Authors:Jiho Jang, Seonhoon Kim, Kiyoon Yoo, Chaerin Kong, Jangho Kim, Nojun Kwak

State-of-the-art frameworks in self-supervised learning have recently shown that fully utilizing transformer-based models can lead to a performance boost compared to conventional CNN models. Striving to maximize the mutual information of two views of an image, existing works apply a contrastive loss to the final representations. Motivated by self-distillation in the supervised regime, we further exploit this by allowing the intermediate representations to learn from the final layer via the contrastive loss. Through self-distillation, the intermediate layers are better suited for instance discrimination, so the performance of an early-exited sub-network degrades little from that of the full network. This also makes the pretext task easier for the final layer, leading to better representations. Our method, Self-Distilled Self-Supervised Learning (SDSSL), outperforms competitive baselines (SimCLR, BYOL, and MoCo v3) using ViT on various tasks and datasets. Under the linear evaluation and k-NN protocols, SDSSL leads to superior performance not only in the final layers but also in most of the lower layers. Furthermore, qualitative and quantitative analyses show how representations are formed more effectively along the transformer layers. Code is available at https://github.com/hagiss/SDSSL.
PDF WACV 23, 11 pages
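
Below is a minimal sketch of the self-distillation objective, assuming an InfoNCE loss between the two views' final representations plus terms that pull each intermediate representation of one view toward the other view's final-layer output; the choice of layers, temperature, stop-gradient on the target, and equal loss weighting are assumptions, not the exact SDSSL formulation.

```python
# Illustrative sketch of a contrastive self-distillation loss across layers.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.2) -> torch.Tensor:
    """Standard InfoNCE loss between two batches of L2-normalised embeddings."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                 # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def self_distilled_loss(intermediate_feats, final_view1, final_view2):
    """Final-layer contrastive loss plus distillation terms that pull each
    intermediate representation of view 1 toward the final layer of view 2
    (detached here as an assumed stop-gradient on the target)."""
    loss = info_nce(final_view1, final_view2)
    for feats in intermediate_feats:
        loss = loss + info_nce(feats, final_view2.detach())
    return loss / (len(intermediate_feats) + 1)

# Usage with random features standing in for projected ViT outputs.
B, D = 8, 256
inter = [torch.randn(B, D) for _ in range(3)]   # e.g. projections from three intermediate layers
loss = self_distilled_loss(inter, torch.randn(B, D), torch.randn(B, D))
print(loss.item())
```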

