Speech


2024-04-14 更新

Linguistic Changes in Spontaneous Speech for Detecting Parkinsons Disease Using Large Language Models

Authors:Jonathan Crawford

Parkinsons disease is the second most prevalent neurodegenerative disorder with over ten million active cases worldwide and one million new diagnoses per year. Detecting and subsequently diagnosing the disease is challenging because of symptom heterogeneity with respect to complexity, as well as the type and timing of phenotypic manifestations. Typically, language impairment can present in the prodromal phase and precede motor symptoms suggesting that a linguistic-based approach could serve as a diagnostic method for incipient Parkinsons disease. Additionally, improved linguistic models may enhance other approaches through ensemble techniques. The field of large language models is advancing rapidly, presenting the opportunity to explore the use of these new models for detecting Parkinsons disease and to improve on current linguistic approaches with high-dimensional representations of linguistics. We evaluate the application of state-of-the-art large language models to detect Parkinsons disease automatically from spontaneous speech with up to 73% accuracy.
PDF 12 pages, 3 figures

点此查看论文截图

VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain

Authors:Khai Le-Duc

Due to privacy restrictions, there’s a shortage of publicly available speech recognition datasets in the medical domain. In this work, we present VietMed - a Vietnamese speech recognition dataset in the medical domain comprising 16h of labeled medical speech, 1000h of unlabeled medical speech and 1200h of unlabeled general-domain speech. To our best knowledge, VietMed is by far the world’s largest public medical speech recognition dataset in 7 aspects: total duration, number of speakers, diseases, recording conditions, speaker roles, unique medical terms and accents. VietMed is also by far the largest public Vietnamese speech dataset in terms of total duration. Additionally, we are the first to present a medical ASR dataset covering all ICD-10 disease groups and all accents within a country. Moreover, we release the first public large-scale pre-trained models for Vietnamese ASR, w2v2-Viet and XLSR-53-Viet, along with the first public large-scale fine-tuned models for medical ASR. Even without any medical data in unsupervised pre-training, our best pre-trained model XLSR-53-Viet generalizes very well to the medical domain by outperforming state-of-the-art XLSR-53, from 51.8% to 29.6% WER on test set (a relative reduction of more than 40%). All code, data and models are made publicly available here: https://github.com/leduckhai/MultiMed.
PDF LREC-COLING 2024

点此查看论文截图

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

Authors:Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu

Discrete speech tokens have been more and more popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge. Notably, we achieved 1st rank on the leaderboard in the TTS track both with the whole training set and only 1h training data, with the highest UTMOS score and lowest bitrate among all submissions.
PDF 5 pages, 3 figures. Report of a challenge

点此查看论文截图

文章作者: 木子已
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 木子已 !
  目录