References
[1] HOCHREITER S, SCHMIDHUBER J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780.
[2] GRAVES A, SCHMIDHUBER J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures [J]. Neural Networks, 2005, 18(5-6): 602-610.
[3] GULATI A, et al. Conformer: convolution-augmented transformer for speech recognition [EB/OL]. (2020-05-16) [2026-03-13]. https://arxiv.org/abs/2005.08100.
[4] VINCENT E, WATANABE S, NUGRAHA A A, et al. An analysis of the CHiME-4 sound source separation and ASR challenge [J]. Computer Speech & Language, 2017, 46: 535-557.
[5] XU Y, DU J, DAI L R, et al. A regression approach to speech enhancement based on deep neural networks [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(1): 7-19.
[6] SELTZER M L, YU D, WANG Y. An investigation of deep neural networks for noise robust speech recognition [C]//2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Vancouver: IEEE, 2013: 7398-7402.
[7] LI J, et al. Developing far-field speaker system for CHiME-5 challenge [C]//2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Calgary: IEEE, 2018: 5744-5748.
[8] WANG D, et al. On the use of ideal binary mask as a target for supervised speech separation [J]. Speech Communication, 2019, 107: 9-17.
[9] PANAYOTOV V, CHEN G, POVEY D, et al. Librispeech: an ASR corpus based on public domain audio books [C]//2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Brisbane: IEEE, 2015: 5206-5210.
Copyright and Open Access Statement
As an open-access academic journal, all articles are published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits users to freely share and reuse the content provided the original authors are credited. All articles are freely available for readers and institutions to read, download, cite, and distribute, and EWA Publishing does not charge readers or institutions any fees for the journal's publication.