Updated 2023-07-01
Weakly Supervised Scene Text Generation for Low-resource Languages
Authors: Yangchen Xie, Xinyuan Chen, Hongjian Zhan, Palaiahankote Shivakum, Bing Yin, Cong Liu, Yue Lu
A large number of annotated training images is crucial for training successful scene text recognition models. However, collecting sufficient datasets is labor-intensive and costly, particularly for low-resource languages. Automatically generating text data has shown promise in alleviating this problem. Unfortunately, existing scene text generation methods typically rely on a large amount of paired data, which is difficult to obtain for low-resource languages. In this paper, we propose a novel weakly supervised scene text generation method that leverages a few recognition-level labels as weak supervision. The proposed method can generate a large number of scene text images with diverse backgrounds and font styles through cross-language generation. Our method disentangles the content and style features of scene text images, with the former representing textual information and the latter representing characteristics such as font, alignment, and background. To preserve the complete content structure of generated images, we introduce an integrated attention module. Furthermore, to bridge the style gap between different languages, we incorporate a pre-trained font classifier. We evaluate our method using state-of-the-art scene text recognition models. Experiments demonstrate that our generated scene text significantly improves scene text recognition accuracy and helps achieve higher accuracy when combined with other generative methods.
Efficient and Accurate Scene Text Detection with Low-Rank Approximation Network
Authors: Yuchen Su
Recently, regression-based methods, which predict parameterized curves for localizing text, have become popular in scene text detection. However, these methods struggle to combine a concise structure with fast post-processing, and the existing parameterized curves are still not ideal for modeling arbitrary-shaped text, which makes it hard to balance speed and accuracy. To tackle these challenges, we first propose a dual matching scheme for positive samples, which accelerates inference through a sparse matching scheme and accelerates model convergence through a dense matching scheme. Then, we propose a novel text contour representation method based on low-rank approximation that exploits the shape correlation between different text contours and is complete, compact, simple, and robust. Based on these designs, we implement an efficient and accurate arbitrary-shaped text detector, named LRANet. Extensive experiments on three challenging datasets demonstrate the accuracy and efficiency of LRANet over state-of-the-art methods. The code will be released soon.
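The idea of a low-rank contour representation can be illustrated with a small sketch. This is not the LRANet implementation: it assumes a plain truncated SVD over a matrix of sampled contours as a stand-in for the low-rank approximation, uses synthetic bent ellipses as toy data, and all names (make_contours, fit_basis, encode, decode) are hypothetical. It shows how correlated contour shapes can be compressed from 2K coordinates into a few coefficients over a shared basis.

```python
import numpy as np


def make_contours(n, k, rng):
    """Toy data (an assumption): n closed contours, each k points, stored as [x..., y...]."""
    t = np.linspace(0.0, 2.0 * np.pi, k, endpoint=False)
    contours = np.empty((n, 2 * k))
    for i in range(n):
        a = rng.uniform(1.0, 4.0)        # half-length of the text instance
        b = rng.uniform(0.3, 1.0)        # half-height
        bend = rng.uniform(-0.2, 0.2)    # mild curvature
        theta = rng.uniform(0.0, np.pi)  # rotation angle
        x = a * np.cos(t)
        y = b * np.sin(t) + bend * x ** 2
        xr = x * np.cos(theta) - y * np.sin(theta)
        yr = x * np.sin(theta) + y * np.cos(theta)
        contours[i] = np.concatenate([xr, yr])
    return contours


def fit_basis(contours, rank):
    """Learn a shared basis: the top `rank` right singular vectors of the centered contour matrix."""
    mean = contours.mean(axis=0)
    _, _, vt = np.linalg.svd(contours - mean, full_matrices=False)
    return mean, vt[:rank]


def encode(contour, mean, basis):
    """Project one contour onto the basis, giving `rank` coefficients."""
    return basis @ (contour - mean)


def decode(coeffs, mean, basis):
    """Rebuild the 2k contour coordinates from the coefficients."""
    return mean + coeffs @ basis


rng = np.random.default_rng(0)
train = make_contours(200, 32, rng)
mean, basis = fit_basis(train, rank=6)

# Encode and decode a contour that was not used to fit the basis.
held_out = make_contours(1, 32, rng)[0]
coeffs = encode(held_out, mean, basis)
recon = decode(coeffs, mean, basis)
rel_err = np.linalg.norm(recon - held_out) / np.linalg.norm(held_out)
print(f"{held_out.size} coordinates -> {coeffs.size} coefficients, relative error {rel_err:.2e}")
```

In a regression-based detector built on such a representation, the network would predict the short coefficient vector instead of every contour point, and the contour would be recovered by a single matrix product. The toy contours here lie in a six-dimensional subspace by construction, so six coefficients suffice; real text contours would need a rank chosen empirically.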
DiffusionSTR: Diffusion Model for Scene Text Recognition
Authors: Masato Fujitake
This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework that uses diffusion models to recognize text in the wild. While existing studies have viewed scene text recognition as an image-to-text transformation, we recast it as a text-to-text transformation conditioned on images within a diffusion model. We show for the first time that the diffusion model can be applied to text recognition. Furthermore, experimental results on publicly available datasets show that the proposed method achieves competitive accuracy compared to state-of-the-art methods.
Accepted to ICIP 2023