2022-11-05 更新
Clenshaw Graph Neural Networks
Authors:Yuhe Guo, Zhewei Wei
Graph Convolutional Networks (GCNs), which use a message-passing paradigm with stacked convolution layers, are foundational methods for learning graph representations. Recent GCN models use various residual connection techniques to alleviate the model degradation problem such as over-smoothing and gradient vanishing. Existing residual connection techniques, however, fail to make extensive use of underlying graph structure as in the graph spectral domain, which is critical for obtaining satisfactory results on heterophilic graphs. In this paper, we introduce ClenshawGCN, a GNN model that employs the Clenshaw Summation Algorithm to enhance the expressiveness of the GCN model. ClenshawGCN equips the standard GCN model with two straightforward residual modules: the adaptive initial residual connection and the negative second-order residual connection. We show that by adding these two residual modules, ClenshawGCN implicitly simulates a polynomial filter under the Chebyshev basis, giving it at least as much expressive power as polynomial spectral GNNs. In addition, we conduct comprehensive experiments to demonstrate the superiority of our model over spatial and spectral GNN models.
PDF 10 pages, 2 figures
点此查看论文截图
Synthesizing Programs with Continuous Optimization
Authors:Shantanu Mandal, Todd A. Anderson, Javier Turek, Justin Gottschlich, Abdullah Muzahid
Automatic software generation based on some specification is known as program synthesis. Most existing approaches formulate program synthesis as a search problem with discrete parameters. In this paper, we present a novel formulation of program synthesis as a continuous optimization problem and use a state-of-the-art evolutionary approach, known as Covariance Matrix Adaptation Evolution Strategy to solve the problem. We then propose a mapping scheme to convert the continuous formulation into actual programs. We compare our system, called GENESYS, with several recent program synthesis techniques (in both discrete and continuous domains) and show that GENESYS synthesizes more programs within a fixed time budget than those existing schemes. For example, for programs of length 10, GENESYS synthesizes 28% more programs than those existing schemes within the same time budget.
PDF
点此查看论文截图
Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively
Authors:Haojie Zhang, Ge Li, Jia Li, Zhongjin Zhang, Yuqi Zhu, Zhi Jin
Large-scale pre-trained language models have achieved impressive results on a wide range of downstream tasks recently. However, fine-tuning an extremely large-scale pre-trained language model on limited target datasets is often plagued by overfitting and representation degradation. In this paper, we propose a Dynamic Parameter Selection (DPS) algorithm for the large-scale pre-trained models during fine-tuning, which adaptively selects a more promising subnetwork to perform staging updates based on gradients of back-propagation. Experiments on the GLUE benchmark show that DPS outperforms previous fine-tuning methods in terms of overall performance and stability, and consistently achieves better results with variable pre-trained language models. In addition, DPS brings a large magnitude of improvement in out-of-domain transferring experiments and low-resource scenarios, which shows that it can maintain stable general contextual features and reduce the representation collapse. We release our code at https://github.com/ZhangHaojie077/DPS
PDF NeurIPS 2022
点此查看论文截图
Continual Learning of Neural Machine Translation within Low Forgetting Risk Regions
Authors:Shuhao Gu, Bojie Hu, Yang Feng
This paper considers continual learning of large-scale pretrained neural machine translation model without accessing the previous training data or introducing model separation. We argue that the widely used regularization-based methods, which perform multi-objective learning with an auxiliary loss, suffer from the misestimate problem and cannot always achieve a good balance between the previous and new tasks. To solve the problem, we propose a two-stage training method based on the local features of the real loss. We first search low forgetting risk regions, where the model can retain the performance on the previous task as the parameters are updated, to avoid the catastrophic forgetting problem. Then we can continually train the model within this region only with the new training data to fit the new task. Specifically, we propose two methods to search the low forgetting risk regions, which are based on the curvature of loss and the impacts of the parameters on the model output, respectively. We conduct experiments on domain adaptation and more challenging language adaptation tasks, and the experimental results show that our method can achieve significant improvements compared with several strong baselines.
PDF EMNLP 2020 Main Conference Long Paper
点此查看论文截图
$N$-gram Is Back: Residual Learning of Neural Text Generation with $n$-gram Language Model
Authors:Huayang Li, Deng Cai, Jin Xu, Taro Watanabe
$N$-gram language models (LM) have been largely superseded by neural LMs as the latter exhibits better performance. However, we find that $n$-gram models can achieve satisfactory performance on a large proportion of testing cases, indicating they have already captured abundant knowledge of the language with relatively low computational cost. With this observation, we propose to learn a neural LM that fits the residual between an $n$-gram LM and the real-data distribution. The combination of $n$-gram and neural LMs not only allows the neural part to focus on the deeper understanding of language but also provides a flexible way to customize an LM by switching the underlying $n$-gram model without changing the neural model. Experimental results on three typical language tasks (i.e., language modeling, machine translation, and summarization) demonstrate that our approach attains additional performance gains over popular standalone neural models consistently. We also show that our approach allows for effective domain adaptation by simply switching to a domain-specific $n$-gram model, without any extra training. Our code is released at https://github.com/ghrua/NgramRes.
PDF Accepted to findings of EMNLP 2022
点此查看论文截图
Operator Selection in Adaptive Large Neighborhood Search using Deep Reinforcement Learning
Authors:Robbert Reijnen, Yingqian Zhang, Hoong Chuin Lau, Zaharah Bukhsh
Large Neighborhood Search (LNS) is a popular heuristic for solving combinatorial optimization problems. LNS iteratively explores the neighborhoods in solution spaces using destroy and repair operators. Determining the best operators for LNS to solve a problem at hand is a labor-intensive process. Hence, Adaptive Large Neighborhood Search (ALNS) has been proposed to adaptively select operators during the search process based on operator performances of the previous search iterations. Such an operator selection procedure is a heuristic, based on domain knowledge, which is ineffective with complex, large solution spaces. In this paper, we address the problem of selecting operators for each search iteration of ALNS as a sequential decision problem and propose a Deep Reinforcement Learning based method called Deep Reinforced Adaptive Large Neighborhood Search. As such, the proposed method aims to learn based on the state of the search which operation to select to obtain a high long-term reward, i.e., a good solution to the underlying optimization problem. The proposed method is evaluated on a time-dependent orienteering problem with stochastic weights and time windows. Results show that our approach effectively learns a strategy that adaptively selects operators for large neighborhood search, obtaining competitive results compared to a state-of-the-art machine learning approach while trained with much fewer observations on small-sized problem instances.
PDF
点此查看论文截图
Unsupervised Model Adaptation for Source-free Segmentation of Medical Images
Authors:Serban Stan, Mohammad Rostami
The recent prevalence of deep neural networks has lead semantic segmentation networks to achieve human-level performance in the medical field when sufficient training data is provided. Such networks however fail to generalize when tasked with predicting semantic maps for out-of-distribution images, requiring model re-training on the new distributions. This expensive process necessitates expert knowledge in order to generate training labels. Distribution shifts can arise naturally in the medical field via the choice of imaging device, i.e. MRI or CT scanners. To combat the need for labeling images in a target domain after a model is successfully trained in a fully annotated \textit{source domain} with a different data distribution, unsupervised domain adaptation (UDA) can be used. Most UDA approaches ensure target generalization by creating a shared source/target latent feature space. This allows a source trained classifier to maintain performance on the target domain. However most UDA approaches require joint source and target data access, which may create privacy leaks with respect to patient information. We propose an UDA algorithm for medical image segmentation that does not require access to source data during adaptation, and is thus capable in maintaining patient data privacy. We rely on an approximation of the source latent features at adaptation time, and create a joint source/target embedding space by minimizing a distributional distance metric based on optimal transport. We demonstrate our approach is competitive to recent UDA medical segmentation works even with the added privacy requisite.
PDF