2022-04-26 更新
Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge
Authors:Sharib Ali, Noha Ghatwary, Debesh Jha, Ece Isik-Polat, Gorkem Polat, Chen Yang, Wuyang Li, Adrian Galdran, Miguel-Ángel González Ballester, Vajira Thambawita, Steven Hicks, Sahadev Poudel, Sang-Woong Lee, Ziyi Jin, Tianyuan Gan, ChengHui Yu, JiangPeng Yan, Doyeob Yeo, Hyunseok Lee, Nikhil Kumar Tomar, Mahmood Haithmi, Amr Ahmed, Michael A. Riegler, Christian Daul, Pål Halvorsen, Jens Rittscher, Osama E. Salem, Dominique Lamarque, Renato Cannizzaro, Stefano Realdon, Thomas de Lange, James E. East
Polyps are well-known cancer precursors identified by colonoscopy. However, variability in their size, location, and surface largely affect identification, localisation, and characterisation. Moreover, colonoscopic surveillance and removal of polyps (referred to as polypectomy ) are highly operator-dependent procedures. There exist a high missed detection rate and incomplete removal of colonic polyps due to their variable nature, the difficulties to delineate the abnormality, the high recurrence rates, and the anatomical topography of the colon. There have been several developments in realising automated methods for both detection and segmentation of these polyps using machine learning. However, the major drawback in most of these methods is their ability to generalise to out-of-sample unseen datasets that come from different centres, modalities and acquisition systems. To test this hypothesis rigorously we curated a multi-centre and multi-population dataset acquired from multiple colonoscopy systems and challenged teams comprising machine learning experts to develop robust automated detection and segmentation methods as part of our crowd-sourcing Endoscopic computer vision challenge (EndoCV) 2021. In this paper, we analyse the detection results of the four top (among seven) teams and the segmentation results of the five top teams (among 16). Our analyses demonstrate that the top-ranking teams concentrated on accuracy (i.e., accuracy > 80% on overall Dice score on different validation sets) over real-time performance required for clinical applicability. We further dissect the methods and provide an experiment-based hypothesis that reveals the need for improved generalisability to tackle diversity present in multi-centre datasets.
PDF 26 pages
论文截图
Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression
Authors:Usma Niyaz, Deepti R. Bathula
Knowledge distillation (KD) is an effective model compression technique where a compact student network is taught to mimic the behavior of a complex and highly trained teacher network. In contrast, Mutual Learning (ML) provides an alternative strategy where multiple simple student networks benefit from sharing knowledge, even in the absence of a powerful but static teacher network. Motivated by these findings, we propose a single-teacher, multi-student framework that leverages both KD and ML to achieve better performance. Furthermore, an online distillation strategy is utilized to train the teacher and students simultaneously. To evaluate the performance of the proposed approach, extensive experiments were conducted using three different versions of teacher-student networks on benchmark biomedical classification (MSI vs. MSS) and object detection (Polyp Detection) tasks. Ensemble of student networks trained in the proposed manner achieved better results than the ensemble of students trained using KD or ML individually, establishing the benefit of augmenting knowledge transfer from teacher to students with peer-to-peer learning between students.
PDF changed the format of paper
论文截图
Detecting, Localising and Classifying Polyps from Colonoscopy Videos using Deep Learning
Authors:Yu Tian, Leonardo Zorron Cheng Tao Pu, Yuyuan Liu, Gabriel Maicas, Johan W. Verjans, Alastair D. Burt, Seon Ho Shin, Rajvinder Singh, Gustavo Carneiro
In this paper, we propose and analyse a system that can automatically detect, localise and classify polyps from colonoscopy videos. The detection of frames with polyps is formulated as a few-shot anomaly classification problem, where the training set is highly imbalanced with the large majority of frames consisting of normal images and a small minority comprising frames with polyps. Colonoscopy videos may contain blurry images and frames displaying feces and water jet sprays to clean the colon — such frames can mistakenly be detected as anomalies, so we have implemented a classifier to reject these two types of frames before polyp detection takes place. Next, given a frame containing a polyp, our method localises (with a bounding box around the polyp) and classifies it into five different classes. Furthermore, we study a method to improve the reliability and interpretability of the classification result using uncertainty estimation and classification calibration. Classification uncertainty and calibration not only help improve classification accuracy by rejecting low-confidence and high-uncertain results, but can be used by doctors to decide how to decide on the classification of a polyp. All the proposed detection, localisation and classification methods are tested using large data sets and compared with relevant baseline approaches.
PDF Preprint to submit to IEEE journals
论文截图
Stack of discriminative autoencoders for multiclass anomaly detection in endoscopy images
Authors:Mohammad Reza Mohebbian, Khan A. Wahid, Paul Babyn
Wireless Capsule Endoscopy (WCE) helps physicians examine the gastrointestinal (GI) tract noninvasively. There are few studies that address pathological assessment of endoscopy images in multiclass classification and most of them are based on binary anomaly detection or aim to detect a specific type of anomaly. Multiclass anomaly detection is challenging, especially when the dataset is poorly sampled or imbalanced. Many available datasets in endoscopy field, such as KID2, suffer from an imbalance issue, which makes it difficult to train a high-performance model. Additionally, increasing the number of classes makes classification more difficult. We proposed a multiclass classification algorithm that is extensible to any number of classes and can handle an imbalance issue. The proposed method uses multiple autoencoders where each one is trained on one class to extract features with the most discrimination from other classes. The loss function of autoencoders is set based on reconstruction, compactness, distance from other classes, and Kullback-Leibler (KL) divergence. The extracted features are clustered and then classified using an ensemble of support vector data descriptors. A total of 1,778 normal, 227 inflammation, 303 vascular, and 44 polyp images from the KID2 dataset are used for evaluation. The entire algorithm ran 5 times and achieved F1-score of 96.3 +- 0.2% and 85.0 +- 0.4% on the test set for binary and multiclass anomaly detection, respectively. The impact of each step of the algorithm was investigated by various ablation studies and the results were compared with published works. The suggested approach is a competitive option for detecting multiclass anomalies in the GI field.
PDF
论文截图
An Extensive Study on Cross-Dataset Bias and Evaluation Metrics Interpretation for Machine Learning applied to Gastrointestinal Tract Abnormality Classification
Authors:Vajira Thambawita, Debesh Jha, Hugo Lewi Hammer, Håvard D. Johansen, Dag Johansen, Pål Halvorsen, Michael A. Riegler
Precise and efficient automated identification of Gastrointestinal (GI) tract diseases can help doctors treat more patients and improve the rate of disease detection and identification. Currently, automatic analysis of diseases in the GI tract is a hot topic in both computer science and medical-related journals. Nevertheless, the evaluation of such an automatic analysis is often incomplete or simply wrong. Algorithms are often only tested on small and biased datasets, and cross-dataset evaluations are rarely performed. A clear understanding of evaluation metrics and machine learning models with cross datasets is crucial to bring research in the field to a new quality level. Towards this goal, we present comprehensive evaluations of five distinct machine learning models using Global Features and Deep Neural Networks that can classify 16 different key types of GI tract conditions, including pathological findings, anatomical landmarks, polyp removal conditions, and normal findings from images captured by common GI tract examination instruments. In our evaluation, we introduce performance hexagons using six performance metrics such as recall, precision, specificity, accuracy, F1-score, and Matthews Correlation Coefficient to demonstrate how to determine the real capabilities of models rather than evaluating them shallowly. Furthermore, we perform cross-dataset evaluations using different datasets for training and testing. With these cross-dataset evaluations, we demonstrate the challenge of actually building a generalizable model that could be used across different hospitals. Our experiments clearly show that more sophisticated performance metrics and evaluation methods need to be applied to get reliable models rather than depending on evaluations of the splits of the same dataset, i.e., the performance metrics should always be interpreted together rather than relying on a single metric.
PDF 30 pages, 12 figures, 8 tables, Accepted for ACM Transactions on Computing for Healthcare
论文截图
Learn like a Pathologist: Curriculum Learning by Annotator Agreement for Histopathology Image Classification
Authors:Jerry Wei, Arief Suriawinata, Bing Ren, Xiaoying Liu, Mikhail Lisovsky, Louis Vaickus, Charles Brown, Michael Baker, Mustafa Nasir-Moin, Naofumi Tomita, Lorenzo Torresani, Jason Wei, Saeed Hassanpour
Applying curriculum learning requires both a range of difficulty in data and a method for determining the difficulty of examples. In many tasks, however, satisfying these requirements can be a formidable challenge. In this paper, we contend that histopathology image classification is a compelling use case for curriculum learning. Based on the nature of histopathology images, a range of difficulty inherently exists among examples, and, since medical datasets are often labeled by multiple annotators, annotator agreement can be used as a natural proxy for the difficulty of a given example. Hence, we propose a simple curriculum learning method that trains on progressively-harder images as determined by annotator agreement. We evaluate our hypothesis on the challenging and clinically-important task of colorectal polyp classification. Whereas vanilla training achieves an AUC of 83.7% for this task, a model trained with our proposed curriculum learning approach achieves an AUC of 88.2%, an improvement of 4.5%. Our work aims to inspire researchers to think more creatively and rigorously when choosing contexts for applying curriculum learning.
PDF
论文截图
On the Efficiency of Subclass Knowledge Distillation in Classification Tasks
Authors:Ahmad Sajedi, Konstantinos N. Plataniotis
This work introduces a novel knowledge distillation framework for classification tasks where information on existing subclasses is available and taken into consideration. In classification tasks with a small number of classes or binary detection (two classes) the amount of information transferred from the teacher to the student network is restricted, thus limiting the utility of knowledge distillation. Performance can be improved by leveraging information about possible subclasses within the available classes in the classification task. To that end, we propose the so-called Subclass Knowledge Distillation (SKD) framework, which is the process of transferring the subclasses’ prediction knowledge from a large teacher model into a smaller student one. Through SKD, additional meaningful information which is not in the teacher’s class logits but exists in subclasses (e.g., similarities inside classes) will be conveyed to the student and boost its performance. Mathematically, we measure how many extra information bits the teacher can provide for the student via SKD framework. The framework developed is evaluated in clinical application, namely colorectal polyp binary classification. In this application, clinician-provided annotations are used to define subclasses based on the annotation label’s variability in a curriculum style of learning. A lightweight, low complexity student trained with the proposed framework achieves an F1-score of 85.05%, an improvement of 2.14% and 1.49% gain over the student that trains without and with conventional knowledge distillation, respectively. These results show that the extra subclasses’ knowledge (i.e., 0.4656 label bits per training sample in our experiment) can provide more information about the teacher generalization, and therefore SKD can benefit from using more information to increase the student performance.
PDF Changing the material of the paper. I will resubmitted again after correction