Authors: Yan Ju, Shan Jia, Jialing Cai, Haiying Guan, Siwei Lyu
With the rapid development of deep generative models (such as Generative Adversarial Networks and Auto-encoders), AI-synthesized images of human faces are now of such high quality that humans can hardly distinguish them from pristine ones. Although existing detection methods have shown high performance in specific evaluation settings, e.g., on images from seen models or on images without real-world post-processing, they tend to suffer serious performance degradation in real-world scenarios where testing images can be generated by more powerful generation models or combined with various post-processing operations. To address this issue, we propose a Global and Local Feature Fusion (GLFF) framework that learns rich and discriminative representations for face forgery detection by combining multi-scale global features from the whole image with refined local features from informative patches. GLFF fuses information from two branches: a global branch that extracts multi-scale semantic features, and a local branch that selects informative patches and extracts detailed local artifacts from them. Due to the lack of a face forgery dataset that simulates real-world applications for evaluation, we further create a challenging face forgery dataset, named DeepFakeFaceForensics (DF$^3$), which covers 6 state-of-the-art generation models combined with a variety of post-processing techniques to approximate real-world scenarios. Experimental results demonstrate the superiority of our method over state-of-the-art methods on the proposed DF$^3$ dataset and three other open-source datasets.
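The two-branch fusion idea can be illustrated with a deliberately simplified, framework-free sketch. This is not the authors' implementation: the real GLFF uses learned deep features, whereas here the "global" multi-scale features are stand-in intensity statistics at several strides, the "informative patch" selector is a hypothetical variance-based heuristic, and fusion is plain concatenation. All function names are illustrative assumptions.

```python
from statistics import mean, pvariance

def global_features(image, scales=(1, 2, 4)):
    """Stand-in for multi-scale global semantics: subsample the image
    with stride s at each scale, then average the sampled pixels."""
    feats = []
    for s in scales:
        vals = [image[i][j]
                for i in range(0, len(image), s)
                for j in range(0, len(image[0]), s)]
        feats.append(mean(vals))
    return feats

def local_features(image, patch=2, k=2):
    """Stand-in for informative-patch selection: score non-overlapping
    patches by variance (a proxy for artifact-rich regions) and keep
    statistics of the top-k patches."""
    h, w = len(image), len(image[0])
    patches = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            vals = [image[i + di][j + dj]
                    for di in range(patch) for dj in range(patch)]
            patches.append((pvariance(vals), mean(vals)))
    patches.sort(reverse=True)  # highest-variance ("most informative") first
    return [m for _, m in patches[:k]]

def glff_style_fusion(image):
    """Late fusion by concatenating global and local feature vectors."""
    return global_features(image) + local_features(image)

# Example: a 4x4 toy "image" yields 3 global + 2 local = 5 features.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 0, 0],
       [1, 1, 1, 1]]
fused = glff_style_fusion(img)
```

A real detector would feed the fused representation to a classifier; the point of the sketch is only the structure: two complementary feature streams, one image-wide and one restricted to selected patches, merged into a single vector.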