Scene Text Detection and Recognition

Published: 2023-03-21

2023-03-21 Update

Scene Graph Based Fusion Network For Image-Text Retrieval

Authors: Guoliang Wang, Yanlei Shang, Yong Chen

A critical challenge in image-text retrieval is learning accurate correspondences between images and texts. Most existing methods focus on coarse-grained correspondences based on the co-occurrence of semantic objects and fail to distinguish fine-grained local correspondences. In this paper, we propose a novel Scene Graph based Fusion Network (dubbed SGFN), which enhances image and text features through intra- and cross-modal fusion for image-text retrieval. Specifically, we design an intra-modal hierarchical attention fusion that incorporates semantic contexts, such as objects, attributes, and relationships, into the image and text feature vectors via scene graphs, and a cross-modal attention fusion that combines the contextual semantics and local fusion via contextual vectors. Extensive experiments on the public datasets Flickr30K and MSCOCO show that SGFN outperforms several state-of-the-art (SOTA) image-text retrieval methods.
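The abstract does not give SGFN's equations, so the following is only a minimal, hypothetical sketch of the two generic ideas it names, written in plain Python so it runs without dependencies. It assumes scaled dot-product attention for both fusion steps, a residual sum for the intra-modal step, and mean cosine similarity between each word and its attended image context as the retrieval score. None of these choices, nor the helper names `attend`, `fuse_with_context`, `cross_modal_score`, and `rank_images`, come from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], keys: Sequence[Sequence[float]],
           values: Sequence[Sequence[float]]) -> Vector:
    # Scaled dot-product attention for a single query vector (assumed, not from SGFN).
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def fuse_with_context(node: Sequence[float],
                      neighbours: Sequence[Sequence[float]]) -> Vector:
    # Hypothetical intra-modal step: add an attention-weighted summary of the
    # node's scene-graph neighbours (attributes, related objects) to the node.
    if not neighbours:
        return list(node)
    context = attend(node, neighbours, neighbours)
    return [n + c for n, c in zip(node, context)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def cross_modal_score(words: Sequence[Sequence[float]],
                      regions: Sequence[Sequence[float]]) -> float:
    # Hypothetical cross-modal step: each word attends over the image regions,
    # producing a word-specific contextual vector; the image-text score is the
    # mean cosine similarity between each word and its contextual vector.
    sims = [cosine(w, attend(w, regions, regions)) for w in words]
    return sum(sims) / len(sims)


def rank_images(words: Sequence[Sequence[float]],
                images: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    # Text-to-image retrieval: return image indices, best match first.
    scores = [cross_modal_score(words, regions) for regions in images]
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 1.0]]
    image_a = [[-1.0, 0.0], [0.0, -1.0]]  # regions opposite to the words
    image_b = [[1.0, 0.0], [0.0, 1.0]]    # regions aligned with the words
    print(rank_images(words, [image_a, image_b]))  # [1, 0]
```

In a real system the word and region vectors would come from learned encoders, and `fuse_with_context` would be applied to each scene-graph node before scoring; the toy 2-D vectors here only illustrate that attention-based matching ranks the aligned image first.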

木子已

https://ipaper.today/2023/03/21/2023-03-21-chang-jing-wen-ben-jian-ce-shi-bie/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit 木子已 as the source when reposting!

