2022-10-21 更新
Robustcaps: a transformation-robust capsule network for image classification
Authors:Sai Raam Venkataraman, S. Balasubramanian, R. Raghunatha Sarma
Geometric transformations of the training data as well as the test data present challenges to the use of deep neural networks to vision-based learning tasks. In order to address this issue, we present a deep neural network model that exhibits the desirable property of transformation-robustness. Our model, termed RobustCaps, uses group-equivariant convolutions in an improved capsule network model. RobustCaps uses a global context-normalised procedure in its routing algorithm to learn transformation-invariant part-whole relationships within image data. This learning of such relationships allows our model to outperform both capsule and convolutional neural network baselines on transformation-robust classification tasks. Specifically, RobustCaps achieves state-of-the-art accuracies on CIFAR-10, FashionMNIST, and CIFAR-100 when the images in these datasets are subjected to train and test-time rotations and translations.
PDF
点此查看论文截图
Uni6Dv3: 5D Anchor Mechanism for 6D Pose Estimation
Authors:Jianqiu Chen, Mingshan Sun, Ye Zheng, Tianpeng Bao, Zhenyu He, Donghai Li, Guoqiang Jin, Rui Zhao, Liwei Wu
Unlike indirect methods that usually require time-consuming post-processing, recent deep learning-based direct methods for 6D pose estimation try to predict the 3D rotation and 3D translation from RGB-D data directly. However, direct methods, regressing the absolute translation of the pose, suffer from diverse object translation distribution between training and test data, which is usually caused by expensive data collection and annotation in practice. To this end, we propose a 5D anchor mechanism by defining the anchor with 3D coordinates in the physical space and 2D coordinates in the image plane. Inspired by anchor-based object detection methods, 5D anchor regresses the offset between the target and anchor, which eliminates the distribution gap and transforms the regression target to a small range. But regressing offset leads to the mismatch between the absolute input and relative output. We build an anchor-based projection model by replacing the absolute input with the relative one, which further improves the performance. By plugging 5D anchor into the latest direct methods, Uni6Dv2 and ES6D obtain 38.7% and 3.5% improvement, respectively. Specifically, Uni6Dv2+5D anchor, dubbed Uni6Dv3, achieves state-of-the-art overall results on datasets including Occlusion LineMOD (79.3%), LineMOD (99.5%), and YCB-Video datasets (91.5%), and requires only 10% of training data to reach comparable performance as full data.
PDF