2022-09-24 更新
CUDA-GHR: Controllable Unsupervised Domain Adaptation for Gaze and Head Redirection
Authors:Swati Jindal, Xin Eric Wang
The robustness of gaze and head pose estimation models is highly dependent on the amount of labeled data. Recently, generative modeling has shown excellent results in generating photo-realistic images, which can alleviate the need for annotations. However, adopting such generative models to new domains while maintaining their ability to provide fine-grained control over different image attributes, \eg, gaze and head pose directions, has been a challenging problem. This paper proposes CUDA-GHR, an unsupervised domain adaptation framework that enables fine-grained control over gaze and head pose directions while preserving the appearance-related factors of the person. Our framework simultaneously learns to adapt to new domains and disentangle visual attributes such as appearance, gaze direction, and head orientation by utilizing a label-rich source domain and an unlabeled target domain. Extensive experiments on the benchmarking datasets show that the proposed method can outperform state-of-the-art techniques on both quantitative and qualitative evaluations. Furthermore, we demonstrate the effectiveness of generated image-label pairs in the target domain for pretraining networks for the downstream task of gaze and head pose estimation. The source code and pre-trained models are available at https://github.com/jswati31/cuda-ghr.
PDF Accepted at WACV2023, Camera-ready version
点此查看论文截图
Domain Adaptive Semantic Segmentation via Regional Contrastive Consistency Regularization
Authors:Qianyu Zhou, Chuyun Zhuang, Ran Yi, Xuequan Lu, Lizhuang Ma
Unsupervised domain adaptation (UDA) for semantic segmentation has been well-studied in recent years. However, most existing works largely neglect the local regional consistency across different domains and are less robust to changes in outdoor environments. In this paper, we propose a novel and fully end-to-end trainable approach, called regional contrastive consistency regularization (RCCR) for domain adaptive semantic segmentation. Our core idea is to pull the similar regional features extracted from the same location of different images, i.e., the original image and augmented image, to be closer, and meanwhile push the features from the different locations of the two images to be separated. We innovatively propose a region-wise contrastive loss with two sampling strategies to realize effective regional consistency. Besides, we present momentum projection heads, where the teacher projection head is the exponential moving average of the student. Finally, a memory bank mechanism is designed to learn more robust and stable region-wise features under varying environments. Extensive experiments on two common UDA benchmarks, i.e., GTAV to Cityscapes and SYNTHIA to Cityscapes, demonstrate that our approach outperforms the state-of-the-art methods.
PDF Accepted to IEEE International Conference on Multimedia and Expo (ICME), 2022
点此查看论文截图
Generative Domain Adaptation for Face Anti-Spoofing
Authors:Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Ran Yi, Kekai Sheng, Shouhong Ding, Lizhuang Ma
Face anti-spoofing (FAS) approaches based on unsupervised domain adaption (UDA) have drawn growing attention due to promising performances for target scenarios. Most existing UDA FAS methods typically fit the trained models to the target domain via aligning the distribution of semantic high-level features. However, insufficient supervision of unlabeled target domains and neglect of low-level feature alignment degrade the performances of existing methods. To address these issues, we propose a novel perspective of UDA FAS that directly fits the target data to the models, i.e., stylizes the target data to the source-domain style via image translation, and further feeds the stylized data into the well-trained source model for classification. The proposed Generative Domain Adaptation (GDA) framework combines two carefully designed consistency constraints: 1) Inter-domain neural statistic consistency guides the generator in narrowing the inter-domain gap. 2) Dual-level semantic consistency ensures the semantic quality of stylized images. Besides, we propose intra-domain spectrum mixup to further expand target data distributions to ensure generalization and reduce the intra-domain gap. Extensive experiments and visualizations demonstrate the effectiveness of our method against the state-of-the-art methods.
PDF Accepted to European Conference on Computer Vision (ECCV), 2022
点此查看论文截图
Rethinking Unsupervised Domain Adaptation for Semantic Segmentation
Authors:Zhijie Wang, Masanori Suganuma, Takayuki Okatani
Unsupervised domain adaptation (UDA) adapts a model trained on one domain (called source) to a novel domain (called target) using only unlabeled data. Due to its high annotation cost, researchers have developed many UDA methods for semantic segmentation, which assume no labeled sample is available in the target domain. We question the practicality of this assumption for two reasons. First, after training a model with a UDA method, we must somehow verify the model before deployment. Second, UDA methods have at least a few hyper-parameters that need to be determined. The surest solution to these is to evaluate the model using validation data, i.e., a certain amount of labeled target-domain samples. This question about the basic assumption of UDA leads us to rethink UDA from a data-centric point of view. Specifically, we assume we have access to a minimum level of labeled data. Then, we ask how much is necessary to find good hyper-parameters of existing UDA methods. We then consider what if we use the same data for supervised training of the same model, e.g., finetuning. We conducted experiments to answer these questions with popular scenarios, {GTA5, SYNTHIA}$\rightarrow$Cityscapes. We found that i) choosing good hyper-parameters needs only a few labeled images for some UDA methods whereas a lot more for others; and ii) simple finetuning works surprisingly well; it outperforms many UDA methods if only several dozens of labeled images are available.
PDF
点此查看论文截图
AcroFOD: An Adaptive Method for Cross-domain Few-shot Object Detection
Authors:Yipeng Gao, Lingxiao Yang, Yunmu Huang, Song Xie, Shiyong Li, Wei-shi Zheng
Under the domain shift, cross-domain few-shot object detection aims to adapt object detectors in the target domain with a few annotated target data. There exists two significant challenges: (1) Highly insufficient target domain data; (2) Potential over-adaptation and misleading caused by inappropriately amplified target samples without any restriction. To address these challenges, we propose an adaptive method consisting of two parts. First, we propose an adaptive optimization strategy to select augmented data similar to target samples rather than blindly increasing the amount. Specifically, we filter the augmented candidates which significantly deviate from the target feature distribution in the very beginning. Second, to further relieve the data limitation, we propose the multi-level domain-aware data augmentation to increase the diversity and rationality of augmented data, which exploits the cross-image foreground-background mixture. Experiments show that the proposed method achieves state-of-the-art performance on multiple benchmarks.
PDF Accepted in ECCV 2022
点此查看论文截图
Viewer-Centred Surface Completion for Unsupervised Domain Adaptation in 3D Object Detection
Authors:Darren Tsai, Julie Stephany Berrio, Mao Shan, Eduardo Nebot, Stewart Worrall
Every autonomous driving dataset has a different configuration of sensors, originating from distinct geographic regions and covering various scenarios. As a result, 3D detectors tend to overfit the datasets they are trained on. This causes a drastic decrease in accuracy when the detectors are trained on one dataset and tested on another. We observe that lidar scan pattern differences form a large component of this reduction in performance. We address this in our approach, SEE-VCN, by designing a novel viewer-centred surface completion network (VCN) to complete the surfaces of objects of interest within an unsupervised domain adaptation framework, SEE. With SEE-VCN, we obtain a unified representation of objects across datasets, allowing the network to focus on learning geometry, rather than overfitting on scan patterns. By adopting a domain-invariant representation, SEE-VCN can be classed as a multi-target domain adaptation approach where no annotations or re-training is required to obtain 3D detections for new scan patterns. Through extensive experiments, we show that our approach outperforms previous domain adaptation methods in multiple domain adaptation settings. Our code and data are available at https://github.com/darrenjkt/SEE-VCN.
PDF
点此查看论文截图
Adapting Pretrained Text-to-Text Models for Long Text Sequences
Authors:Wenhan Xiong, Anchit Gupta, Shubham Toshniwal, Yashar Mehdad, Wen-tau Yih
We present an empirical study of adapting an existing pretrained text-to-text model for long-sequence inputs. Through a comprehensive study along three axes of the pretraining pipeline — model architecture, optimization objective, and pretraining corpus, we propose an effective recipe to build long-context models from existing short-context models. Specifically, we replace the full attention in transformers with pooling-augmented blockwise attention, and pretrain the model with a masked-span prediction task with spans of varying length. In terms of the pretraining corpus, we find that using randomly concatenated short-documents from a large open-domain corpus results in better performance than using existing long document corpora which are typically limited in their domain coverage. With these findings, we build a long-context model that achieves competitive performance on long-text QA tasks and establishes the new state of the art on five long-text summarization datasets, often outperforming previous methods with larger model sizes.
PDF
点此查看论文截图
Adaptation of domain-specific transformer models with text oversampling for sentiment analysis of social media posts on Covid-19 vaccines
Authors:Anmol Bansal, Arjun Choudhry, Anubhav Sharma, Seba Susan
Covid-19 has spread across the world and several vaccines have been developed to counter its surge. To identify the correct sentiments associated with the vaccines from social media posts, we fine-tune various state-of-the-art pre-trained transformer models on tweets associated with Covid-19 vaccines. Specifically, we use the recently introduced state-of-the-art pre-trained transformer models RoBERTa, XLNet and BERT, and the domain-specific transformer models CT-BERT and BERTweet that are pre-trained on Covid-19 tweets. We further explore the option of text augmentation by oversampling using Language Model based Oversampling Technique (LMOTE) to improve the accuracies of these models, specifically, for small sample datasets where there is an imbalanced class distribution among the positive, negative and neutral sentiment classes. Our results summarize our findings on the suitability of text oversampling for imbalanced small sample datasets that are used to fine-tune state-of-the-art pre-trained transformer models, and the utility of domain-specific transformer models for the classification task.
PDF The paper has been accepted for publication in Computer Science journal: http://journals.agh.edu.pl/csci
点此查看论文截图
CARLANE: A Lane Detection Benchmark for Unsupervised Domain Adaptation from Simulation to multiple Real-World Domains
Authors:Julian Gebele, Bonifaz Stuhr, Johann Haselberger
Unsupervised Domain Adaptation demonstrates great potential to mitigate domain shifts by transferring models from labeled source domains to unlabeled target domains. While Unsupervised Domain Adaptation has been applied to a wide variety of complex vision tasks, only few works focus on lane detection for autonomous driving. This can be attributed to the lack of publicly available datasets. To facilitate research in these directions, we propose CARLANE, a 3-way sim-to-real domain adaptation benchmark for 2D lane detection. CARLANE encompasses the single-target datasets MoLane and TuLane and the multi-target dataset MuLane. These datasets are built from three different domains, which cover diverse scenes and contain a total of 163K unique images, 118K of which are annotated. In addition we evaluate and report systematic baselines, including our own method, which builds upon Prototypical Cross-domain Self-supervised Learning. We find that false positive and false negative rates of the evaluated domain adaptation methods are high compared to those of fully supervised baselines. This affirms the need for benchmarks such as CARLANE to further strengthen research in Unsupervised Domain Adaptation for lane detection. CARLANE, all evaluated models and the corresponding implementations are publicly available at https://carlanebenchmark.github.io.
PDF 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks, 22 pages, 11 figures
FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection
Authors:Xinli Xu, Shaocong Dong, Lihe Ding, Jie Wang, Tingfa Xu, Jianan Li
3D object detection with multi-sensors is essential for an accurate and reliable perception system of autonomous driving and robotics. Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm which merely relies on LiDAR point clouds for 3D proposal refinement. Though impressive, the sparsity of point clouds, especially for the points far away, making it difficult for the LiDAR-only refinement module to accurately recognize and locate objects.To address this problem, we propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in the Regions of Interest(RoI). FusionRCNN adaptively integrates both sparse geometry information from LiDAR and dense texture information from camera in a unified attention mechanism. Specifically, it first utilizes RoIPooling to obtain an image set with a unified size and gets the point set by sampling raw points within proposals in the RoI extraction step; then leverages an intra-modality self-attention to enhance the domain-specific features, following by a well-designed cross-attention to fuse the information from two modalities.FusionRCNN is fundamentally plug-and-play and supports different one-stage methods with almost no architectural changes. Extensive experiments on KITTI and Waymo benchmarks demonstrate that our method significantly boosts the performances of popular detectors.Remarkably, FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP on Waymo, and outperforms competing two-stage approaches. Code will be released soon at https://github.com/xxlbigbrother/Fusion-RCNN.
PDF 7 pages, 3 figures
点此查看论文截图
Discrepancy-Based Active Learning for Domain Adaptation
Authors:Antoine de Mathelin, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis
The goal of the paper is to design active learning strategies which lead to domain adaptation under an assumption of Lipschitz functions. Building on previous work by Mansour et al. (2009) we adapt the concept of discrepancy distance between source and target distributions to restrict the maximization over the hypothesis class to a localized class of functions which are performing accurate labeling on the source domain. We derive generalization error bounds for such active learning strategies in terms of Rademacher average and localized discrepancy for general loss functions which satisfy a regularity condition. A practical K-medoids algorithm that can address the case of large data set is inferred from the theoretical bounds. Our numerical experiments show that the proposed algorithm is competitive against other state-of-the-art active learning techniques in the context of domain adaptation, in particular on large data sets of around one hundred thousand images.
PDF 32 pages, 15 figures
点此查看论文截图
AirFi: Empowering WiFi-based Passive Human Gesture Recognition to Unseen Environment via Domain Generalization
Authors:Dazhuo Wang, Jianfei Yang, Wei Cui, Lihua Xie, Sumei Sun
WiFi-based smart human sensing technology enabled by Channel State Information (CSI) has received great attention in recent years. However, CSI-based sensing systems suffer from performance degradation when deployed in different environments. Existing works solve this problem by domain adaptation using massive unlabeled high-quality data from the new environment, which is usually unavailable in practice. In this paper, we propose a novel augmented environment-invariant robust WiFi gesture recognition system named AirFi that deals with the issue of environment dependency from a new perspective. The AirFi is a novel domain generalization framework that learns the critical part of CSI regardless of different environments and generalizes the model to unseen scenarios, which does not require collecting any data for adaptation to the new environment. AirFi extracts the common features from several training environment settings and minimizes the distribution differences among them. The feature is further augmented to be more robust to environments. Moreover, the system can be further improved by few-shot learning techniques. Compared to state-of-the-art methods, AirFi is able to work in different environment settings without acquiring any CSI data from the new environment. The experimental results demonstrate that our system remains robust in the new environment and outperforms the compared systems.
PDF 11 pages, 3 figures, paper under review
点此查看论文截图
Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation
Authors:Lin Chen, Zhixiang Wei, Xin Jin, Huaian Chen, Miao Zheng, Kai Chen, Yi Jin
In unsupervised domain adaptation (UDA), directly adapting from the source to the target domain usually suffers significant discrepancies and leads to insufficient alignment. Thus, many UDA works attempt to vanish the domain gap gradually and softly via various intermediate spaces, dubbed domain bridging (DB). However, for dense prediction tasks such as domain adaptive semantic segmentation (DASS), existing solutions have mostly relied on rough style transfer and how to elegantly bridge domains is still under-explored. In this work, we resort to data mixing to establish a deliberated domain bridging (DDB) for DASS, through which the joint distributions of source and target domains are aligned and interacted with each in the intermediate space. At the heart of DDB lies a dual-path domain bridging step for generating two intermediate domains using the coarse-wise and the fine-wise data mixing techniques, alongside a cross-path knowledge distillation step for taking two complementary models trained on generated intermediate samples as ‘teachers’ to develop a superior ‘student’ in a multi-teacher distillation manner. These two optimization steps work in an alternating way and reinforce each other to give rise to DDB with strong adaptation power. Extensive experiments on adaptive segmentation tasks with different settings demonstrate that our DDB significantly outperforms state-of-the-art methods. Code is available at https://github.com/xiaoachen98/DDB.git.
PDF Accepted at NeurIPS2022
点此查看论文截图
3D-PL: Domain Adaptive Depth Estimation with 3D-aware Pseudo-Labeling
Authors:Yu-Ting Yen, Chia-Ni Lu, Wei-Chen Chiu, Yi-Hsuan Tsai
For monocular depth estimation, acquiring ground truths for real data is not easy, and thus domain adaptation methods are commonly adopted using the supervised synthetic data. However, this may still incur a large domain gap due to the lack of supervision from the real data. In this paper, we develop a domain adaptation framework via generating reliable pseudo ground truths of depth from real data to provide direct supervisions. Specifically, we propose two mechanisms for pseudo-labeling: 1) 2D-based pseudo-labels via measuring the consistency of depth predictions when images are with the same content but different styles; 2) 3D-aware pseudo-labels via a point cloud completion network that learns to complete the depth values in the 3D space, thus providing more structural information in a scene to refine and generate more reliable pseudo-labels. In experiments, we show that our pseudo-labeling methods improve depth estimation in various settings, including the usage of stereo pairs during training. Furthermore, the proposed method performs favorably against several state-of-the-art unsupervised domain adaptation approaches in real-world datasets.
PDF Accepted in ECCV 2022. Project page: https://ccc870206.github.io/3D-PL/
点此查看论文截图
Noise transfer for unsupervised domain adaptation of retinal OCT images
Authors:Valentin Koch, Olle Holmberg, Hannah Spitzer, Johannes Schiefelbein, Ben Asani, Michael Hafner, Fabian J Theis
Optical coherence tomography (OCT) imaging from different camera devices causes challenging domain shifts and can cause a severe drop in accuracy for machine learning models. In this work, we introduce a minimal noise adaptation method based on a singular value decomposition (SVDNA) to overcome the domain gap between target domains from three different device manufacturers in retinal OCT imaging. Our method utilizes the difference in noise structure to successfully bridge the domain gap between different OCT devices and transfer the style from unlabeled target domain images to source images for which manual annotations are available. We demonstrate how this method, despite its simplicity, compares or even outperforms state-of-the-art unsupervised domain adaptation methods for semantic segmentation on a public OCT dataset. SVDNA can be integrated with just a few lines of code into the augmentation pipeline of any network which is in contrast to many state-of-the-art domain adaptation methods which often need to change the underlying model architecture or train a separate style transfer model. The full code implementation for SVDNA is available at https://github.com/ValentinKoch/SVDNA.
PDF published at MICCAI 2022
点此查看论文截图
Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning
Authors:Christian Steinparz, Thomas Schmied, Fabian Paischer, Marius-Constantin Dinu, Vihang Patil, Angela Bitto-Nemling, Hamid Eghbal-zadeh, Sepp Hochreiter
In lifelong learning, an agent learns throughout its entire life without resets, in a constantly changing environment, as we humans do. Consequently, lifelong learning comes with a plethora of research problems such as continual domain shifts, which result in non-stationary rewards and environment dynamics. These non-stationarities are difficult to detect and cope with due to their continuous nature. Therefore, exploration strategies and learning methods are required that are capable of tracking the steady domain shifts, and adapting to them. We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning, and to update the policy correspondingly. To this end, we conduct experiments in order to investigate different exploration strategies. We empirically show that representatives of the policy-gradient family are better suited for lifelong learning, as they adapt more quickly to distribution shifts than Q-learning. Thereby, policy-gradient methods profit the most from Reactive Exploration and show good results in lifelong learning with continual domain shifts. Our code is available at: https://github.com/ml-jku/reactive-exploration.
PDF CoLLAs 2022
点此查看论文截图
Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection
Authors:Tulika Bose, Nikolaos Aletras, Irina Illina, Dominique Fohr
State-of-the-art approaches for hate-speech detection usually exhibit poor performance in out-of-domain settings. This occurs, typically, due to classifiers overemphasizing source-specific information that negatively impacts its domain invariance. Prior work has attempted to penalize terms related to hate-speech from manually curated lists using feature attribution methods, which quantify the importance assigned to input terms by the classifier when making a prediction. We, instead, propose a domain adaptation approach that automatically extracts and penalizes source-specific terms using a domain classifier, which learns to differentiate between domains, and feature-attribution scores for hate-speech classes, yielding consistent improvements in cross-domain evaluation.
PDF COLING 2022 pre-print
点此查看论文截图
Domain Adaptation for Question Answering via Question Classification
Authors:Zhenrui Yue, Huimin Zeng, Ziyi Kou, Lanyu Shang, Dong Wang
Question answering (QA) has demonstrated impressive progress in answering questions from customized domains. Nevertheless, domain adaptation remains one of the most elusive challenges for QA systems, especially when QA systems are trained in a source domain but deployed in a different target domain. In this work, we investigate the potential benefits of question classification for QA domain adaptation. We propose a novel framework: Question Classification for Question Answering (QC4QA). Specifically, a question classifier is adopted to assign question classes to both the source and target data. Then, we perform joint training in a self-supervised fashion via pseudo-labeling. For optimization, inter-domain discrepancy between the source and target domain is reduced via maximum mean discrepancy (MMD) distance. We additionally minimize intra-class discrepancy among QA samples of the same question class for fine-grained adaptation performance. To the best of our knowledge, this is the first work in QA domain adaptation to leverage question classification with self-supervised adaptation. We demonstrate the effectiveness of the proposed QC4QA with consistent improvements against the state-of-the-art baselines on multiple datasets.
PDF Accepted to COLING 2022
点此查看论文截图
Adaptive Boosting for Domain Adaptation: Towards Robust Predictions in Scene Segmentation
Authors:Zhedong Zheng, Yi Yang
Domain adaptation is to transfer the shared knowledge learned from the source domain to a new environment, i.e., target domain. One common practice is to train the model on both labeled source-domain data and unlabeled target-domain data. Yet the learned models are usually biased due to the strong supervision of the source domain. Most researchers adopt the early-stopping strategy to prevent over-fitting, but when to stop training remains a challenging problem since the lack of the target-domain validation set. In this paper, we propose one efficient bootstrapping method, called Adaboost Student, explicitly learning complementary models during training and liberating users from empirical early stopping. Adaboost Student combines the deep model learning with the conventional training strategy, i.e., adaptive boosting, and enables interactions between learned models and the data sampler. We adopt one adaptive data sampler to progressively facilitate learning on hard samples and aggregate “weak” models to prevent over-fitting. Extensive experiments show that (1) Without the need to worry about the stopping time, AdaBoost Student provides one robust solution by efficient complementary model learning during training. (2) AdaBoost Student is orthogonal to most domain adaptation methods, which can be combined with existing approaches to further improve the state-of-the-art performance. We have achieved competitive results on three widely-used scene segmentation domain adaptation benchmarks.
PDF 11 pages, 9 tables, 5 figures
点此查看论文截图
A Novel Framework based on Unknown Estimation via Principal Sub-space for Universal Domain Adaption
Authors:Yifan Wang, Lin Zhang, Ran Song, Lin Ma, Wei Zhang
Universal domain adaptation (UniDA) aims to transfer the knowledge of common classes from source domain to target domain without any prior knowledge on the label set, which requires to distinguish the unknown samples from the known ones in the target domain. Like the traditional unsupervised domain adaptation problem, the misalignment between two domains exists due to the biased and less-discriminative embedding. Recent methods proposed to complete the domain misalignment by clustering target samples with the nearest neighbors or the prototypes. However, it is dangerous to do so since we do not have any prior knowledge about the distributions of unknown samples which can magnify the misalignment especially when the unknown set is big. Meanwhile, other existing classifier-based methods could easily produce overconfident predictions of unknown samples because of the supervised objective in source domain leading the whole model to be biased towards the common classes in the target domain. Therefore, we propose a novel non-parameter unknown samples detection method based on mapping the samples in the original feature space into a reliable linear sub-space which makes data points more sparse to reduce the misalignment between unknown samples and source samples. Moreover, unlike the recent methods applying extra parameters to improve the classification of unknown samples, this paper well balances the confidence values of both known and unknown samples through an unknown-adaptive margin loss which can control the gradient updating of the classifier learning on supervised source samples depending on the confidence level of detected unknown samples at current step. Finally, experiments on four public datasets demonstrate that our method significantly outperforms existing state-of-the-art methods.
PDF 13 pages. arXiv admin note: text overlap with arXiv:2207.09280
点此查看论文截图
Deep Learning for Medical Image Segmentation: Tricks, Challenges and Future Directions
Authors:Dong Zhang, Yi Lin, Hao Chen, Zhuotao Tian, Xin Yang, Jinhui Tang, Kwang Ting Cheng
Over the past few years, the rapid development of deep learning technologies for computer vision has greatly promoted the performance of medical image segmentation (MedISeg). However, the recent MedISeg publications usually focus on presentations of the major contributions (e.g., network architectures, training strategies, and loss functions) while unwittingly ignoring some marginal implementation details (also known as “tricks”), leading to a potential problem of the unfair experimental result comparisons. In this paper, we collect a series of MedISeg tricks for different model implementation phases (i.e., pre-training model, data pre-processing, data augmentation, model implementation, model inference, and result post-processing), and experimentally explore the effectiveness of these tricks on the consistent baseline models. Compared to paper-driven surveys that only blandly focus on the advantages and limitation analyses of segmentation models, our work provides a large number of solid experiments and is more technically operable. With the extensive experimental results on both the representative 2D and 3D medical image datasets, we explicitly clarify the effect of these tricks. Moreover, based on the surveyed tricks, we also open-sourced a strong MedISeg repository, where each of its components has the advantage of plug-and-play. We believe that this milestone work not only completes a comprehensive and complementary survey of the state-of-the-art MedISeg approaches, but also offers a practical guide for addressing the future medical image processing challenges including but not limited to small dataset learning, class imbalance learning, multi-modality learning, and domain adaptation. The code has been released at: https://github.com/hust-linyi/MedISeg
PDF Under consideration