Updated 2022-11-30
Face Parsing with RoI Tanh-Warping
Authors:Jinpeng Lin, Hao Yang, Dong Chen, Ming Zeng, Fang Wen, Lu Yuan
Face parsing computes pixel-wise label maps for different semantic components (e.g., hair, mouth, eyes) from face images. The existing face parsing literature has demonstrated significant advantages from focusing on individual regions of interest (RoIs) for faces and facial components. However, the traditional crop-and-resize focusing mechanism ignores all contextual area outside the RoIs, and thus is not suitable when the component area is unpredictable, e.g., hair. Inspired by the human physiological vision system, we propose a novel RoI Tanh-warping operator that combines central vision and peripheral vision. It addresses the dilemma between a limited-size RoI for focusing and an unpredictable area of surrounding context for peripheral information. To this end, we propose a novel hybrid convolutional neural network for face parsing. It uses hierarchical local methods for inner facial components and global methods for outer facial components. The whole framework is simple and principled, and can be trained end-to-end. To facilitate future research on face parsing, we also manually relabel the training data of the HELEN dataset and will make it public. Experiments on both the HELEN and LFW-PL benchmarks demonstrate that our method surpasses state-of-the-art methods.
PDF CVPR 2019
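The central-plus-peripheral idea can be sketched as a coordinate mapping. The snippet below is a minimal illustration, not the operator as defined in the paper: it assumes a hypothetical axis-aligned RoI given as `(cx, cy, half_w, half_h)` and applies `tanh` to RoI-normalized offsets, so the RoI interior is mapped almost linearly while arbitrarily distant context is squeezed into a bounded border. The function names are illustrative.

```python
import math

def tanh_warp_point(x, y, roi):
    """Map an image point to warped coordinates inside (-1, 1) x (-1, 1).

    roi = (cx, cy, half_w, half_h) is a hypothetical axis-aligned RoI.
    Offsets are normalized by the RoI half-size, then squashed with tanh.
    """
    cx, cy, half_w, half_h = roi
    u = math.tanh((x - cx) / half_w)
    v = math.tanh((y - cy) / half_h)
    return u, v

def inverse_tanh_warp_point(u, v, roi):
    """Map warped coordinates back to image coordinates (inverse of the above)."""
    cx, cy, half_w, half_h = roi
    return cx + half_w * math.atanh(u), cy + half_h * math.atanh(v)

roi = (100.0, 100.0, 50.0, 50.0)
center = tanh_warp_point(100.0, 100.0, roi)   # RoI center maps to the origin
edge = tanh_warp_point(150.0, 100.0, roi)     # RoI edge maps to tanh(1) on that axis
far = tanh_warp_point(300.0, 100.0, roi)      # distant context stays inside (-1, 1)
```

Under this assumed mapping the RoI boundary lands at `tanh(1)` (about 0.76) on each axis, so most of the warped extent is spent on the RoI and the remainder holds compressed context.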
A High-Efficiency Framework for Constructing Large-Scale Face Parsing Benchmark
Authors:Yinglu Liu, Hailin Shi, Yue Si, Hao Shen, Xiaobo Wang, Tao Mei
Face parsing, which assigns a semantic label to each pixel in face images, has recently attracted increasing interest due to its huge application potential. Although many face-related fields (e.g., face recognition and face detection) have been well studied for many years, the existing datasets for face parsing are still severely limited in scale and quality, e.g., the widely used Helen dataset contains only 2,330 images. This is mainly because pixel-level annotation is costly and time-consuming, especially for facial parts without clear boundaries. The lack of accurately annotated datasets has become a major obstacle to progress in the face parsing task. Using dense facial landmarks to guide the parsing annotation is a feasible approach. However, annotating dense landmarks on human faces encounters the same issues as the parsing annotation. To overcome these problems, in this paper we develop a high-efficiency framework for face parsing annotation, which considerably simplifies and speeds up the parsing annotation through two consecutive modules. Benefiting from the proposed framework, we construct a new Dense Landmark Guided Face Parsing (LaPa) benchmark. It consists of 22,000 face images with large variations in expression, pose, occlusion, etc. Each image is provided with an accurate annotation of an 11-category pixel-level label map along with the coordinates of 106-point landmarks. To the best of our knowledge, it is currently the largest public dataset for face parsing. To make full use of our LaPa dataset with abundant face shape and boundary priors, we propose a simple yet effective Boundary-Sensitive Parsing Network (BSPNet). Our network serves as a baseline model on the proposed LaPa dataset, and it also achieves state-of-the-art performance on the Helen dataset without resorting to extra face alignment.
RoI Tanh-polar Transformer Network for Face Parsing in the Wild
Authors:Yiming Lin, Jie Shen, Yujiang Wang, Maja Pantic
Face parsing aims to predict pixel-wise labels for facial components of a target face in an image. Existing approaches usually crop the target face from the input image with respect to a bounding box calculated during pre-processing, and thus can only parse inner facial Regions of Interest (RoIs). Peripheral regions like hair are ignored, and nearby faces that are partially included in the bounding box can cause distractions. Moreover, these methods are only trained and evaluated on near-frontal portrait images, and thus their performance in in-the-wild cases has been unexplored. To address these issues, this paper makes three contributions. First, we introduce the iBugMask dataset for face parsing in the wild, which consists of 21,866 training images and 1,000 testing images. The training images are obtained by augmenting an existing dataset with large face poses. The testing images are manually annotated with 11 facial regions, and there are large variations in sizes, poses, expressions and background. Second, we propose the RoI Tanh-polar transform, which warps the whole image to a Tanh-polar representation with a fixed ratio between the face area and the context, guided by the target bounding box. The new representation contains all information in the original image, and allows for rotation equivariance in convolutional neural networks (CNNs). Third, we propose a hybrid residual representation learning block, coined HybridBlock, that contains convolutional layers in both the Tanh-polar space and the Tanh-Cartesian space, allowing for receptive fields of different shapes in CNNs. Through extensive experiments, we show that the proposed method improves the state of the art for face parsing in the wild and does not require facial landmarks for alignment.
PDF Accepted at Image and Vision Computing. Code is available on https://github.com/hhj1897/face_parsing
How to Boost Face Recognition with StyleGAN?
Authors:Artem Sevastopolsky, Yury Malkov, Nikita Durasov, Luisa Verdoliva, Matthias Nießner
State-of-the-art face recognition systems require huge amounts of labeled training data. Given the priority of privacy in face recognition applications, the data is limited to celebrity web crawls, which have issues such as skewed distributions of ethnicities and limited numbers of identities. On the other hand, the self-supervised revolution in the industry motivates research on adapting the related techniques to facial recognition. One of the most popular practical tricks is to augment the dataset with samples drawn from high-resolution, high-fidelity models (e.g., StyleGAN-like) while preserving the identity. We show that a simple approach based on fine-tuning an encoder for StyleGAN improves upon state-of-the-art facial recognition and performs better than training on synthetic face identities. We also collect large-scale unlabeled datasets with controllable ethnic constitution, AfricanFaceSet-5M (5 million images of different people) and AsianFaceSet-3M (3 million images of different people), and we show that pretraining on each of them improves recognition of the respective ethnicities (as well as others), while combining all unlabeled datasets results in the biggest performance increase. Our self-supervised strategy is most useful with limited amounts of labeled training data, which can be beneficial for more tailored face recognition tasks and when facing privacy concerns. Evaluation is provided on the standard RFW dataset and a new large-scale RB-WebFace benchmark.
FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild
Authors:Yiming Lin, Jie Shen, Yujiang Wang, Maja Pantic
Image-based age estimation aims to predict a person’s age from facial images. It is used in a variety of real-world applications. Although end-to-end deep models have achieved impressive results for age estimation on benchmark datasets, their performance in the wild still leaves much room for improvement due to the challenges caused by large variations in head pose, facial expressions, and occlusions. To address this issue, we propose a simple yet effective method to explicitly incorporate facial semantics into age estimation, so that the model learns to correctly focus on the most informative facial components from unaligned facial images regardless of head pose and non-rigid deformation. To this end, we design a face parsing-based network to learn semantic information at different scales and a novel face parsing attention module to leverage these semantic features for age estimation. To evaluate our method on in-the-wild data, we also introduce a new challenging large-scale benchmark called IMDB-Clean. This dataset is created by semi-automatically cleaning the noisy IMDB-WIKI dataset using a constrained clustering method. Through comprehensive experiments on IMDB-Clean and other benchmark datasets, under both intra-dataset and cross-dataset evaluation protocols, we show that our method consistently outperforms all existing age estimation methods and achieves new state-of-the-art performance. To the best of our knowledge, our work presents the first attempt at leveraging face parsing attention to achieve semantic-aware age estimation, which may inspire other high-level facial analysis tasks. Code and data are available on https://github.com/ibug-group/fpage.
PDF Accepted by IEEE Transactions on Image Processing. Code and data are available on https://github.com/ibug-group/fpage
An Improved Lightweight YOLOv5 Model Based on Attention Mechanism for Face Mask Detection
Authors:Sheng Xu, Zhanyu Guo, Yuchi Liu, Jingwei Fan, Xuxu Liu
Coronavirus 2019 has brought severe challenges to social stability and public health worldwide. One effective way of curbing the epidemic is to require people to wear masks in public places and to monitor mask-wearing states using suitable automatic detectors. However, existing deep-learning-based models struggle to meet the requirements of both high precision and real-time performance simultaneously. To solve this problem, we propose an improved lightweight face mask detector based on YOLOv5, which achieves an excellent balance of precision and speed. First, a novel network, ShuffleCANet, which combines the ShuffleNetV2 network with the Coordinate Attention mechanism, is proposed as the backbone. Then, an efficient path aggregation network, BiFPN, is applied as the feature fusion neck. Furthermore, the localization loss is replaced with alpha-CIoU in the model training phase to obtain higher-quality anchors. Some valuable strategies such as data augmentation, adaptive image scaling, and anchor cluster operation are also utilized. Experimental results on the AIZOO face mask dataset show the superiority of the proposed model. Compared with the original YOLOv5, the proposed model increases the inference speed by 28.3% while still improving the precision by 0.58%. It achieves the best mean average precision of 95.2% compared with seven other existing models, which is 4.4% higher than the baseline.
PDF Accepted as a conference paper at the 31st International Conference on Artificial Neural Networks (ICANN 2022). The final authenticated publication will be available in the Springer Lecture Notes in Computer Science (LNCS)
DigiFace-1M: 1 Million Digital Face Images for Face Recognition
Authors:Gwangbin Bae, Martin de La Gorce, Tadas Baltrusaitis, Charlie Hewitt, Dong Chen, Julien Valentin, Roberto Cipolla, Jingjing Shen
State-of-the-art face recognition models show impressive accuracy, achieving over 99.8% on the Labeled Faces in the Wild (LFW) dataset. Such models are trained on large-scale datasets that contain millions of real human face images collected from the internet. Web-crawled face images are severely biased (in terms of race, lighting, make-up, etc.) and often contain label noise. More importantly, the face images are collected without explicit consent, raising ethical concerns. To avoid such problems, we introduce a large-scale synthetic dataset for face recognition, obtained by rendering digital faces using a computer graphics pipeline. We first demonstrate that aggressive data augmentation can significantly reduce the synthetic-to-real domain gap. Having full control over the rendering pipeline, we also study how each attribute (e.g., variation in facial pose, accessories and textures) affects the accuracy. Compared to SynFace, a recent method trained on GAN-generated synthetic faces, we reduce the error rate on LFW by 52.5% (accuracy from 91.93% to 96.17%). By fine-tuning the network on a smaller number of real face images that could reasonably be obtained with consent, we achieve accuracy that is comparable to methods trained on millions of real face images.
PDF WACV 2023
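The 52.5% figure in the abstract is a relative reduction in error rate, not a difference in accuracy points. A short check, using only the two accuracies quoted above (the function name is illustrative):

```python
def relative_error_reduction(acc_before, acc_after):
    """Relative reduction in error rate, in percent, given accuracies in percent."""
    err_before = 100.0 - acc_before   # 8.07 for 91.93% accuracy
    err_after = 100.0 - acc_after     # 3.83 for 96.17% accuracy
    return 100.0 * (err_before - err_after) / err_before

reduction = relative_error_reduction(91.93, 96.17)
print(f"{reduction:.1f}%")  # prints 52.5%
```

The error rate drops from 8.07% to 3.83%, and (8.07 - 3.83) / 8.07 is about 0.525.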
Interlinked Convolutional Neural Networks for Face Parsing
Authors:Yisu Zhou, Xiaolin Hu, Bo Zhang
Face parsing is a basic task in face image analysis. It amounts to labeling each pixel with the appropriate facial part, such as eyes or nose. In this paper, we present an interlinked convolutional neural network (iCNN) for solving this problem in an end-to-end fashion. It consists of multiple convolutional neural networks (CNNs) taking input at different scales. A special interlinking layer is designed to allow the CNNs to exchange information, enabling them to integrate local and contextual information efficiently. The hallmark of iCNN is the extensive use of downsampling and upsampling in the interlinking layers, while traditional CNNs usually use downsampling only. A two-stage pipeline is proposed for face parsing, and both stages use iCNN. The first stage localizes facial parts in the size-reduced image and the second stage labels the pixels in the identified facial parts in the original image. On a benchmark dataset we have obtained better results than the state-of-the-art methods.
PDF 11 pages, 4 figures, ISNN2015 Conference
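The exchange of information between scales can be illustrated with a toy interlinking step. This is a sketch under stated assumptions, not the iCNN layer itself: feature maps are plain nested lists, downsampling is assumed to be 2x2 average pooling, upsampling is assumed to be nearest-neighbour, and the exchanged maps are simply appended as extra channels. All function names are hypothetical.

```python
def downsample2(fmap):
    """2x2 average pooling on a 2D list with even height and width."""
    return [[(fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
             for j in range(0, len(fmap[0]), 2)]
            for i in range(0, len(fmap), 2)]

def upsample2(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                             # repeat each row
    return out

def interlink(fine_channels, coarse_channels):
    """Exchange features between a fine-scale and a coarse-scale branch.

    The fine branch receives upsampled coarse maps, and the coarse branch
    receives downsampled fine maps, each appended as additional channels.
    """
    new_fine = fine_channels + [upsample2(c) for c in coarse_channels]
    new_coarse = coarse_channels + [downsample2(f) for f in fine_channels]
    return new_fine, new_coarse

fine = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]  # 1 channel, 4x4
coarse = [[[1, 2], [3, 4]]]                                                # 1 channel, 2x2
new_fine, new_coarse = interlink(fine, coarse)
```

After the step each branch holds two channels at its own resolution, so a following convolution in either branch would see both local and contextual features.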
Prepended Domain Transformer: Heterogeneous Face Recognition without Bells and Whistles
Authors:Anjith George, Amir Mohammadi, Sebastien Marcel
Heterogeneous Face Recognition (HFR) refers to matching face images captured in different domains, such as thermal to visible images (VIS), sketches to visible images, near-infrared to visible, and so on. This is particularly useful in matching visible spectrum images to images captured from other modalities. Though highly useful, HFR is challenging because of the domain gap between the source and target domains. Often, large-scale paired heterogeneous face image datasets are absent, preventing the training of models specifically for the heterogeneous task. In this work, we propose a surprisingly simple yet very effective method for matching face images across different sensing modalities. The core idea of the proposed approach is to add a novel neural network block called the Prepended Domain Transformer (PDT) in front of a pre-trained face recognition (FR) model to address the domain gap. Retraining this new block with a few paired samples in a contrastive learning setup was enough to achieve state-of-the-art performance on many HFR benchmarks. The PDT blocks can be retrained for several source-target combinations using the proposed general framework. The proposed approach is architecture-agnostic, meaning the PDT block can be added to any pre-trained FR model. Further, the approach is modular and the new block can be trained with a minimal set of paired samples, making it much easier to deploy in practice. The source code and protocols will be made publicly available.
PDF 16 pages. Accepted for publication in IEEE TIFS
Is Face Recognition Safe from Realizable Attacks?
Authors:Sanjay Saha, Terence Sim
Face recognition is a popular form of biometric authentication, and due to its widespread use, attacks have become more common as well. Recent studies show that face recognition systems are vulnerable to attacks, which can lead to erroneous identification of faces. Interestingly, most of these attacks are white-box, or they manipulate facial images in ways that are not physically realizable. In this paper, we propose an attack scheme in which the attacker generates realistic synthesized face images with subtle perturbations and physically realizes them on their own face to attack black-box face recognition systems. Comprehensive experiments and analyses show that subtle perturbations realized on the attacker's face can create successful attacks on state-of-the-art face recognition systems in black-box settings. Our study exposes the underlying vulnerability of face recognition systems to realizable black-box attacks.
PDF 2020 IEEE International Joint Conference on Biometrics (IJCB)
Edge-aware Graph Representation Learning and Reasoning for Face Parsing
Authors:Gusi Te, Yinglu Liu, Wei Hu, Hailin Shi, Tao Mei
Face parsing infers a pixel-wise label for each facial component, and has drawn much attention recently. Previous methods have shown their efficiency in face parsing, but they overlook the correlation among different face regions. The correlation is a critical clue about the facial appearance, pose, expression, etc., and should be taken into account for face parsing. To this end, we propose to model and reason about the region-wise relations by learning graph representations, and leverage the edge information between regions for optimized abstraction. Specifically, we encode a facial image onto a global graph representation where a collection of pixels (“regions”) with similar features are projected to each vertex. Our model learns and reasons over relations between the regions by propagating information across vertices on the graph. Furthermore, we incorporate the edge information to aggregate the pixel-wise features onto vertices, which emphasizes the features around edges for fine segmentation along edges. The learned graph representation is finally projected back to pixel grids for parsing. Experiments demonstrate that our model outperforms state-of-the-art methods on the widely used Helen dataset, and also exhibits superior performance on the large-scale CelebAMask-HQ and LaPa datasets. The code is available at https://github.com/tegusi/EAGRNet.
PDF ECCV 2020
MorDeephy: Face Morphing Detection Via Fused Classification
Authors:Iurii Medvedev, Farhad Shadmand, Nuno Gonçalves
Face morphing attack detection (MAD) is one of the most challenging tasks in the field of face recognition today. In this work, we introduce a novel deep learning strategy for single-image face morphing detection, which combines the discrimination of morphed face images with a sophisticated face recognition task in a complex classification scheme. It is directed at learning deep facial features that carry information about the authenticity of these features. Our work also introduces several additional contributions: a public and easy-to-use face morphing detection benchmark and the results of our wild datasets filtering strategy. Our method, which we call MorDeephy, achieved state-of-the-art performance and demonstrated a prominent ability to generalise the task of morphing detection to unseen scenarios.
PDF 10 pages, 5 figures, 4 tables
Development of a face mask detection pipeline for mask-wearing monitoring in the era of the COVID-19 pandemic: A modular approach
Authors:Benjaphan Sommana, Ukrit Watchareeruetai, Ankush Ganguly, Samuel W. F. Earp, Taya Kitiyakara, Suparee Boonmanunt, Ratchainant Thammasudjarit
During the SARS-CoV-2 pandemic, mask-wearing became an effective tool to prevent spreading and contracting the virus. The ability to monitor the mask-wearing rate in the population would be useful for determining public health strategies against the virus. However, artificial intelligence technologies for detecting face masks have not been deployed at a large scale in real life to measure the mask-wearing rate in public. In this paper, we present a two-step face mask detection approach consisting of two separate modules: 1) face detection and alignment and 2) face mask classification. This approach allowed us to experiment with different combinations of face detection and face mask classification modules. More specifically, we experimented with PyramidKey and RetinaFace as face detectors while maintaining a lightweight backbone for the face mask classification module. Moreover, we also provide a relabeled annotation of the test set of the AIZOO dataset, where we rectified the incorrect labels for some face images. The evaluation results on the AIZOO and Moxa 3K datasets showed that the proposed face mask detection pipeline surpassed the state-of-the-art methods. The proposed pipeline also yielded a higher mAP on the relabeled test set of the AIZOO dataset than on the original test set. Since we trained the proposed model using in-the-wild face images, we can successfully deploy our model to monitor the mask-wearing rate using public CCTV images.
PDF Accepted at the 19th International Joint Conference on Computer Science and Software Engineering (JCSSE 2022)
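The modular two-step design described above can be sketched as a pipeline whose detector and classifier are swappable callables. This is a minimal illustration only: the class, the toy detector and the toy classifier are hypothetical stand-ins, the alignment part of step 1 is omitted, and images are toy grayscale grids rather than real frames.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]          # (x1, y1, x2, y2), end-exclusive
Image = Sequence[Sequence[int]]          # toy grayscale image as rows of pixels

@dataclass
class MaskPipeline:
    detect_faces: Callable[[Image], List[Box]]   # step 1: face detection
    classify_mask: Callable[[Image], str]        # step 2: mask classification

    def run(self, image: Image) -> List[Tuple[Box, str]]:
        results = []
        for (x1, y1, x2, y2) in self.detect_faces(image):
            crop = [row[x1:x2] for row in image[y1:y2]]  # cut out the face region
            results.append(((x1, y1, x2, y2), self.classify_mask(crop)))
        return results

def mask_wearing_rate(results: List[Tuple[Box, str]]) -> float:
    """Fraction of detected faces labeled 'mask' (0.0 if no faces were found)."""
    if not results:
        return 0.0
    return sum(label == "mask" for _, label in results) / len(results)

# Hypothetical stand-ins for real models, for illustration only.
def toy_detector(image: Image) -> List[Box]:
    return [(0, 0, 4, 4), (4, 0, 8, 4)]

def toy_classifier(crop: Image) -> str:
    lower = crop[len(crop) // 2:]                     # lower half of the face crop
    pixels = [p for row in lower for p in row]
    return "mask" if sum(pixels) / len(pixels) > 5 else "no_mask"

image = [[0] * 8, [0] * 8, [9, 9, 9, 9, 0, 0, 0, 0], [9, 9, 9, 9, 0, 0, 0, 0]]
pipeline = MaskPipeline(detect_faces=toy_detector, classify_mask=toy_classifier)
results = pipeline.run(image)
rate = mask_wearing_rate(results)
```

Because the two stages only meet at the cropped face, either callable can be replaced (for example, a different detector) without touching the other, which is the property the abstract relies on when comparing detector and classifier combinations.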
3D Face Parsing via Surface Parameterization and 2D Semantic Segmentation Network
Authors:Wenyuan Sun, Ping Zhou, Yangang Wang, Zongpu Yu, Jing Jin, Guangquan Zhou
Face parsing assigns pixel-wise semantic labels as the face representation for computers, and is a fundamental part of many advanced face technologies. Compared with 2D face parsing, 3D face parsing shows more potential for better performance and further applications, but it is still challenging due to the computation required for 3D mesh data. Recent works have introduced different methods for 3D surface segmentation, but their performance is still limited. In this paper, we propose a method based on the “3D-2D-3D” strategy to accomplish 3D face parsing. A topologically disk-like 2D face image containing spatial and textural information is obtained from the sampled 3D face data through the face parameterization algorithm, and a specific 2D network called CPFNet is proposed to achieve the semantic segmentation of the 2D parameterized face data with multi-scale technologies and feature aggregation. The 2D semantic result is then inversely re-mapped to the 3D face data, which finally achieves 3D face parsing. Experimental results show that both CPFNet and the “3D-2D-3D” strategy accomplish high-quality 3D face parsing and outperform state-of-the-art 2D networks as well as 3D methods in both qualitative and quantitative comparisons.
Face Morphing Attacks and Face Image Quality: The Effect of Morphing and the Unsupervised Attack Detection by Quality
Authors:Biying Fu, Naser Damer
Morphing attacks are a form of presentation attack that has gathered increasing attention in recent years. A morphed image can be successfully verified against multiple identities. This operation, therefore, poses serious security issues related to the ability of a travel or identity document to be verified as belonging to multiple persons. Previous works touched on the quality of morphing attack images, but mainly with the goal of quantitatively proving the realistic appearance of the produced morphing attacks. We theorize that the morphing processes might have an effect on both the perceptual image quality and the image utility in face recognition (FR) when compared to bona fide samples. To investigate this theory, this work provides an extensive analysis of the effect of morphing on face image quality, including both general image quality measures and face image utility measures. This analysis is not limited to a single morphing technique, but rather looks at six different morphing techniques and five different data sources using ten different quality measures. This analysis reveals consistent separability between the quality scores of morphing attack and bona fide samples as measured by certain quality measures. Our study goes further to build on this effect and investigate the possibility of performing unsupervised morphing attack detection (MAD) based on quality scores. Our study looks into intra- and inter-dataset detectability to evaluate the generalizability of such a detection concept across different morphing techniques and bona fide sources. Our final results point out that a set of quality measures, such as MagFace and CNNNIQA, can be used to perform unsupervised and generalized MAD with a correct classification accuracy of over 70%.
PDF accepted at IET Biometrics journal
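Detection from quality scores alone can be sketched as a simple threshold rule. The snippet below is only an illustration of that idea: the scores are made up, the threshold is a hypothetical value, and the assumption that morphs score lower than bona fide samples is ours (the abstract reports separability but not its direction for each measure).

```python
from typing import List, Sequence

def flag_morphs(quality_scores: Sequence[float], threshold: float) -> List[bool]:
    """Flag a sample as a suspected morph when its quality score is below threshold.

    Assumes lower quality indicates a morph; flip the comparison for measures
    where the opposite holds.
    """
    return [score < threshold for score in quality_scores]

def accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of samples whose predicted flag matches the ground truth."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Made-up scores for illustration; labels are used only to evaluate, not to train.
scores = [0.82, 0.79, 0.55, 0.61, 0.74, 0.58, 0.66]
is_morph = [False, False, True, True, False, True, False]

flags = flag_morphs(scores, threshold=0.7)
acc = accuracy(flags, is_morph)
```

No attack labels are needed to produce the flags, which is what makes the approach unsupervised; labels enter only when measuring how often the rule is right.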
Face Morphing Attack Detection Using Privacy-Aware Training Data
Authors:Marija Ivanovska, Andrej Kronovšek, Peter Peer, Vitomir Štruc, Borut Batagelj
Images of morphed faces pose a serious threat to face recognition-based security systems, as they can be used to illegally verify the identity of multiple people with a single morphed image. Modern detection algorithms learn to identify such morphing attacks using authentic images of real individuals. This approach raises various privacy concerns and limits the amount of publicly available training data. In this paper, we explore the efficacy of detection algorithms that are trained only on faces of non-existent people and their respective morphs. To this end, two dedicated algorithms are trained with synthetic data and then evaluated on three real-world datasets: FRLL-Morphs, FERET-Morphs and FRGC-Morphs. Our results show that synthetic facial images can be successfully employed to train the detection algorithms, which then generalize well to real-world scenarios.
Face Parsing via Recurrent Propagation
Authors:Sifei Liu, Jianping Shi, Ji Liang, Ming-Hsuan Yang
Face parsing is an important problem in computer vision with numerous applications, including recognition and editing. Recently, deep convolutional neural networks (CNNs) have been applied to image parsing and segmentation with state-of-the-art performance. In this paper, we propose a face parsing algorithm that combines hierarchical representations learned by a CNN with accurate label propagation achieved by a spatially variant recurrent neural network (RNN). The RNN-based propagation approach enables efficient inference over a global space with the guidance of semantic edges generated by a local convolutional model. Since the convolutional architecture can be shallow and the spatial RNN can have few parameters, the framework is much faster and more lightweight than state-of-the-art CNNs for the same task. We apply the proposed model to coarse-grained and fine-grained face parsing. For fine-grained face parsing, we develop a two-stage approach that first identifies the main regions and then segments the detailed components, which achieves better performance in terms of accuracy and efficiency. With a single GPU, the proposed algorithm parses face images accurately at 300 frames per second, which facilitates real-time applications.
PDF 10 pages, 5 figures, BMVC 2017