Updated 2022-08-04
Real Time Object Detection System with YOLO and CNN Models: A Review
Authors: Viswanatha V, Chandana R K, Ramachandra A. C.
Object detection is a foundational technique in artificial intelligence. This survey briefly describes the You Only Look Once (YOLO) algorithm and its more evolved versions, focusing on YOLO and convolutional neural networks (CNNs) for real-time object detection. YOLO learns generalized object representations more effectively than other object detection models, without loss of precision. CNN architectures can extract features and identify objects in any given image. When implemented appropriately, CNN models can address problems such as deformity diagnosis and the creation of educational or instructional applications. The analysis yields a number of observations and perspectives. The survey also supports focused visual-information and feature extraction in the financial and other industries, highlights methods of target detection and feature selection, and briefly describes the development of the YOLO algorithm.
PDF
Training a universal instance segmentation network for live cell images of various cell types and imaging modalities
Authors: Tianqi Guo, Yin Wang, Luis Solorio, Jan P. Allebach
We share our recent findings in an attempt to train a universal segmentation network for various cell types and imaging modalities. Our method was built on the generalized U-Net architecture, which allows the evaluation of each component individually. We modified the traditional binary training targets to include three classes for direct instance segmentation. Detailed experiments were performed on the effect of training schemes, training settings, network backbones, and individual modules on the segmentation performance. Our proposed training scheme draws minibatches in turn from each dataset, and the gradients are accumulated before an optimization step. We found that the key to training a universal network is all-time supervision on all datasets, and that it is necessary to sample each dataset in an unbiased way. Our experiments also suggest that there might exist common features to define cell boundaries across cell types and imaging modalities, which could allow application of trained models to totally unseen datasets. A few training tricks can further boost the segmentation performance, including uneven class weights in the cross-entropy loss function, a well-designed learning rate scheduler, larger image crops for contextual information, and additional loss terms for unbalanced classes. We also found that segmentation performance can benefit from a group normalization layer and an Atrous Spatial Pyramid Pooling module, thanks to their more reliable statistics estimation and improved semantic understanding, respectively. We participated in the 6th Cell Tracking Challenge (CTC) held at the IEEE International Symposium on Biomedical Imaging (ISBI) 2021 using one of the developed variants. Our method was evaluated as the best runner up during the initial submission for the primary track, and also secured the 3rd place in an additional round of competition in preparation for the summary publication.
PDF A summary report of participation in the 6th Cell Tracking Challenge (CTC) at IEEE ISBI 2021
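The training scheme in this abstract (one minibatch drawn in turn from each dataset, gradients accumulated, then a single optimization step) can be illustrated with a toy sketch. Everything below is a hypothetical stand-in and not the authors' code: the "network" is one scalar weight fitted by least squares, and the names `minibatches`, `grad_mse`, and `train_round_robin` are invented for illustration. Averaging the accumulated gradient over datasets is also an assumption.

```python
from itertools import cycle, islice


def minibatches(dataset, batch_size):
    """Cycle endlessly through a dataset in fixed-size minibatches."""
    stream = cycle(dataset)
    while True:
        yield list(islice(stream, batch_size))


def grad_mse(w, batch):
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train_round_robin(datasets, steps=200, batch_size=2, lr=0.05):
    """Accumulate one minibatch gradient per dataset, then take one step."""
    w = 0.0  # toy model: a single weight (assumption, stands in for a network)
    loaders = [minibatches(d, batch_size) for d in datasets]
    for _ in range(steps):
        accumulated = 0.0
        for loader in loaders:  # every dataset contributes to every step
            accumulated += grad_mse(w, next(loader))
        # One optimization step on the gradient averaged over datasets.
        w -= lr * accumulated / len(loaders)
    return w


# Two toy "datasets" of different sizes that share the relation y = 2x.
dataset_a = [(1.0, 2.0), (2.0, 4.0)]
dataset_b = [(1.0, 2.0), (3.0, 6.0), (0.5, 1.0)]
weight = train_round_robin([dataset_a, dataset_b])
```

Because each dataset supplies exactly one minibatch per step regardless of its size, the smaller dataset is never starved of supervision, which is the point the abstract makes about unbiased sampling.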
Low Cost Embedded Vision System For Location And Tracking Of A Color Object
Authors: Diego Ayala, Danilo Chavez, Leopoldo Altamirano Robles
This paper describes the development of an embedded vision system for detection, location, and tracking of a color object; it uses a single 32-bit microprocessor to acquire image data, process it, and perform actions according to the interpreted data. The system is intended for applications that need artificial vision for detection, location, and tracking of a color object, and its objective is to achieve this with reduced size, power consumption, and cost.
PDF
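As a rough illustration of what detecting and locating a color object can involve, the sketch below thresholds pixels around a target color and takes the centroid of the matches. It is a generic technique under stated assumptions, not the method of the paper: the function names, the per-channel tolerance, and the list-of-tuples image layout are all hypothetical.

```python
def locate_color_object(image, target, tolerance=40):
    """Return the (row, col) centroid of pixels close to the target color.

    image is a list of rows, each a list of (r, g, b) tuples.
    Returns None when no pixel matches.
    """
    row_sum = col_sum = count = 0
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            # A pixel matches when every channel is within the tolerance.
            if all(abs(p - t) <= tolerance for p, t in zip(pixel, target)):
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        return None
    return row_sum / count, col_sum / count


def tracking_error(centroid, height, width):
    """Offset of the object from the image center, as (d_row, d_col)."""
    return centroid[0] - (height - 1) / 2, centroid[1] - (width - 1) / 2


BLACK, RED = (0, 0, 0), (255, 0, 0)
frame = [[BLACK] * 4 for _ in range(4)]
for r, c in [(0, 2), (0, 3), (1, 2), (1, 3)]:  # red patch in the top right
    frame[r][c] = RED

center = locate_color_object(frame, RED)
error = tracking_error(center, height=4, width=4)
```

A tracking loop could feed the error back to an actuator (for example a pan and tilt mount) so that the object stays centered; that control step is outside this sketch.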
Texture based Prototypical Network for Few-Shot Semantic Segmentation of Forest Cover: Generalizing for Different Geographical Regions
Authors: Gokul P, Ujjwal Verma
Forests play a vital role in reducing greenhouse gas emissions and mitigating climate change, besides maintaining the world’s biodiversity. Existing satellite-based forest monitoring systems use supervised learning approaches that are limited to a particular region and depend on manually annotated data to identify forest. This work frames forest identification as a few-shot semantic segmentation task to achieve generalization across different geographical regions. The proposed few-shot segmentation approach incorporates a texture attention module in the prototypical network to highlight the texture features of the forest. Indeed, forest exhibits a characteristic texture that differs from other classes, such as road and water. In this work, the proposed approach is trained to identify tropical forests of South Asia and adapted to identify the temperate forest of Central Europe with the help of a few (one image for 1-shot) manually annotated support images of the temperate forest. The proposed method obtained an IoU of 0.62 for the forest class (1-way 1-shot), which is significantly higher than the existing few-shot semantic segmentation approach (0.46 for PANet). This result demonstrates that the proposed approach can generalize across geographical regions for forest identification, creating an opportunity to develop a global forest cover identification tool.
PDF 5 pages, 2 figures, includes additional experiments
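The core idea of a prototypical network for 1-way 1-shot segmentation, together with the IoU metric quoted above, can be sketched as follows. This is a simplified illustration and not the authors' model: it omits the texture attention module entirely, uses hand-written 2-D feature vectors instead of a learned encoder, and all function names are hypothetical.

```python
import math


def masked_average(features, mask, keep=1):
    """Prototype: mean feature vector over pixels whose mask equals keep."""
    selected = [vec
                for feat_row, mask_row in zip(features, mask)
                for vec, m in zip(feat_row, mask_row) if m == keep]
    if not selected:
        raise ValueError("mask selects no pixels")
    return [sum(values) / len(selected) for values in zip(*selected)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def segment(query_features, fg_proto, bg_proto):
    """Label a pixel 1 when it is closer to the foreground prototype."""
    return [[1 if cosine(v, fg_proto) > cosine(v, bg_proto) else 0
             for v in row] for row in query_features]


def iou(pred, truth):
    """Intersection over union of two binary masks."""
    pairs = [(p, t) for p_row, t_row in zip(pred, truth)
             for p, t in zip(p_row, t_row)]
    union = sum(1 for p, t in pairs if p or t)
    inter = sum(1 for p, t in pairs if p and t)
    return inter / union if union else 1.0


# One annotated support image (1-shot); features are toy stand-ins.
support_features = [[[1.0, 0.0], [0.0, 1.0]],
                    [[0.8, 0.2], [0.1, 0.9]]]
support_mask = [[1, 0],
                [1, 0]]
forest_proto = masked_average(support_features, support_mask, keep=1)
other_proto = masked_average(support_features, support_mask, keep=0)

query_features = [[[0.9, 0.1], [0.2, 0.8]]]
prediction = segment(query_features, forest_proto, other_proto)
```

Only the support image needs a manual annotation; the query image is segmented purely by comparing its features with the two prototypes.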
Updated 2022-08-04
Bridging the Gap Between Object Detection and User Intent via Query-Modulation
Authors: Marco Fornoni, Chaochao Yan, Liangchen Luo, Kimberly Wilber, Alex Stark, Yin Cui, Boqing Gong, Andrew Howard
When interacting with objects through cameras or pictures, users often have a specific intent. For example, they may want to perform a visual search. With most object detection models relying on image pixels as their sole input, undesired results are not uncommon. Most typically, the detector lacks a high-confidence detection on the object of interest or assigns it a wrong class label. The issue is especially severe when operating capacity-constrained mobile object detectors on-device. In this paper we investigate techniques to modulate mobile detectors to explicitly account for the user intent, expressed as an embedding of a simple query. Compared to standard detectors, query-modulated detectors show superior performance at detecting objects for a given user query. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors also outperform a specialized referring expression recognition system. Query-modulated detectors can also be trained to simultaneously solve for both localizing a user query and standard detection, even outperforming standard mobile detectors at the canonical COCO task.
PDF
SSformer: A Lightweight Transformer for Semantic Segmentation
Authors: Wentao Shi, Jing Xu, Pan Gao
It is widely believed that Transformers perform better in semantic segmentation than convolutional neural networks. Nevertheless, the original Vision Transformer may lack the inductive biases of local neighborhoods and has a high time complexity. Recently, Swin Transformer set a new record in various vision tasks by using a hierarchical architecture and shifted windows while being more efficient. However, as Swin Transformer is specifically designed for image classification, it may achieve suboptimal performance on dense prediction-based segmentation tasks. Further, simply combining Swin Transformer with existing methods would increase the model size and parameter count of the final segmentation model. In this paper, we rethink the Swin Transformer for semantic segmentation and design a lightweight yet effective transformer model, called SSformer. In this model, considering the inherent hierarchical design of Swin Transformer, we propose a decoder to aggregate information from different layers, thus obtaining both local and global attentions. Experimental results show the proposed SSformer yields comparable mIoU performance with state-of-the-art models, while maintaining a smaller model size and lower compute.
PDF
MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training
Authors: De-An Huang, Zhiding Yu, Anima Anandkumar
We propose MinVIS, a minimal video instance segmentation (VIS) framework that achieves state-of-the-art VIS performance with neither video-based architectures nor training procedures. By only training a query-based image instance segmentation model, MinVIS outperforms the previous best result on the challenging Occluded VIS dataset by over 10% AP. Since MinVIS treats frames in training videos as independent images, we can drastically sub-sample the annotated frames in training videos without any modifications. With only 1% of labeled frames, MinVIS outperforms or is comparable to fully-supervised state-of-the-art approaches on YouTube-VIS 2019/2021. Our key observation is that queries trained to be discriminative between intra-frame object instances are temporally consistent and can be used to track instances without any manually designed heuristics. MinVIS thus has the following inference pipeline: we first apply the trained query-based image instance segmentation to video frames independently. The segmented instances are then tracked by bipartite matching of the corresponding queries. This inference is done in an online fashion and does not need to process the whole video at once. MinVIS thus has the practical advantages of reducing both the labeling costs and the memory requirements, while not sacrificing the VIS performance. Code is available at: https://github.com/NVlabs/MinVIS
PDF
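The tracking step described in this abstract, bipartite matching of query embeddings between consecutive frames, can be sketched as below. This is an illustrative toy and not the NVlabs implementation: it assumes every frame yields the same number of queries, scores pairs by cosine similarity, and finds the optimal assignment by brute force over permutations, which is only practical for a handful of queries (a Hungarian-algorithm solver would replace it in practice). All function names are hypothetical.

```python
import math
from itertools import permutations


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_queries(prev_queries, curr_queries):
    """Optimal one-to-one assignment maximizing total cosine similarity.

    Returns a list perm where curr_queries[perm[i]] is matched to
    prev_queries[i]. Brute force: O(n!) in the number of queries.
    """
    best_perm, best_score = None, -math.inf
    for perm in permutations(range(len(prev_queries))):
        score = sum(cosine(prev_queries[i], curr_queries[j])
                    for i, j in enumerate(perm))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(best_perm)


def track(frames):
    """Reorder each frame's queries so index k follows the same instance.

    Processes frames one at a time (online), using only the previous frame.
    """
    tracked = [frames[0]]
    for queries in frames[1:]:
        perm = match_queries(tracked[-1], queries)
        tracked.append([queries[j] for j in perm])
    return tracked


# Two instances whose query order is swapped in the second frame.
frame_0 = [[1.0, 0.0], [0.0, 1.0]]
frame_1 = [[0.1, 0.9], [0.9, 0.1]]
assignment = match_queries(frame_0, frame_1)
tracks = track([frame_0, frame_1])
```

Since each frame is matched only against the previous one, the whole video never has to be held in memory, which mirrors the online inference the abstract describes.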
KD-SCFNet: Towards More Accurate and Efficient Salient Object Detection via Knowledge Distillation
Authors: Jin Zhang, Qiuwei Liang, Yanjiao Shi
Most existing salient object detection (SOD) models are difficult to apply because of their complex and large model structures. Although some lightweight models have been proposed, their accuracy is barely satisfactory. In this paper, we design a novel semantics-guided contextual fusion network (SCFNet) that focuses on the interactive fusion of multi-level features for accurate and efficient salient object detection. Furthermore, we apply knowledge distillation to the SOD task and provide a sizeable dataset, KD-SOD80K. In detail, we transfer the rich knowledge from a seasoned teacher to the untrained SCFNet through unlabeled images, enabling SCFNet to learn a strong generalization ability to detect salient objects more accurately. The knowledge-distillation-based SCFNet (KD-SCFNet) achieves accuracy comparable to the state-of-the-art heavyweight methods with fewer than 1M parameters and a real-time detection speed of 174 FPS. Extensive experiments demonstrate the robustness and effectiveness of the proposed distillation method and SOD framework. Code and data: https://github.com/zhangjinCV/KD-SCFNet.
PDF ECCV2022
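Distillation through unlabeled images, as mentioned in this abstract, generally means that the teacher's predicted saliency map serves as a soft target for the student. The sketch below shows one common form of this, a pixelwise binary cross-entropy against the teacher's probabilities. The abstract does not state the loss that KD-SCFNet uses, so this choice is an assumption; the "student" here is just a list of free logits rather than a network, and all names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def distillation_loss(student_logits, teacher_probs, eps=1e-7):
    """Mean pixelwise binary cross-entropy against teacher soft labels."""
    total = 0.0
    for z, t in zip(student_logits, teacher_probs):
        s = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp for a stable log
        total -= t * math.log(s) + (1.0 - t) * math.log(1.0 - s)
    return total / len(student_logits)


def distillation_grad(student_logits, teacher_probs):
    """Gradient of the loss above with respect to each student logit."""
    n = len(student_logits)
    return [(sigmoid(z) - t) / n for z, t in zip(student_logits, teacher_probs)]


def distill(teacher_probs, steps=500, lr=3.0):
    """Fit free per-pixel logits to the teacher map by gradient descent."""
    logits = [0.0] * len(teacher_probs)  # toy student: one logit per pixel
    for _ in range(steps):
        grads = distillation_grad(logits, teacher_probs)
        logits = [z - lr * g for z, g in zip(logits, grads)]
    return logits


# Teacher saliency for three pixels of an unlabeled image (flattened).
teacher_map = [0.9, 0.1, 0.5]
initial_loss = distillation_loss([0.0, 0.0, 0.0], teacher_map)
student_logits = distill(teacher_map)
final_loss = distillation_loss(student_logits, teacher_map)
```

No ground-truth mask appears anywhere in the loop, which is what allows the student to be trained on unlabeled data.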