End-to-End Speech Recognition in English and Mandarin

Speech recognition, noise, dialects, algorithm iteration.

https://arxiv.org/abs/1512.02595

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, resulting in a 7x speedup over our previous system. Because of this efficiency, experiments that previously took weeks now run in days. This enables us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
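The abstract names Batch Dispatch without describing it. In the paper, the term refers to assembling the streams of incoming user requests into batches before running them through the network on the GPU. The sketch below is a simplified, hypothetical illustration of that batching idea, not the authors' implementation: `dispatch_batches`, `max_batch_size`, and `run_model` are names invented here, and the "model" is a stand-in function.

```python
from collections import deque
from typing import Callable, Deque, List


def dispatch_batches(
    pending: Deque[int],
    max_batch_size: int,
    run_model: Callable[[List[int]], List[int]],
) -> List[int]:
    """Drain `pending` in arrival order, calling `run_model` once per batch.

    Each batch holds at most `max_batch_size` requests. A real server would
    fill the queue concurrently and dispatch whenever the GPU is free; this
    sketch only shows the grouping step (an assumption for illustration).
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    results: List[int] = []
    while pending:
        size = min(max_batch_size, len(pending))
        batch = [pending.popleft() for _ in range(size)]
        results.extend(run_model(batch))
    return results


# Usage: seven queued requests, processed in batches of at most three.
batch_sizes: List[int] = []


def fake_model(batch: List[int]) -> List[int]:
    # Stand-in for a forward pass; records how many requests shared the call.
    batch_sizes.append(len(batch))
    return [x * 2 for x in batch]


outputs = dispatch_batches(deque(range(7)), 3, fake_model)
```

Grouping requests this way trades a small wait per request for fewer, larger model calls, which is the kind of trade-off the abstract alludes to when it mentions low latency at scale.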

Time: 2024-12-23 06:11:49

Articles related to End-to-End Speech Recognition in English and Mandarin

EE 519: Speech Recognition and Processing for Multimedia

Out: Mar 30 2019. Due: Apr 13 2019. EE 519: Speech Recognition and Processing for Multimedia, Spring 2019, Homework 5. There are 2 problems in this homework, with several questions. Please make sure to show the details of working for each question. Answers witho

Bing Speech Recognition bookmark

Bing Speech Services: Bing Speech Services provide speech capabilities for Windows and Windows Phone. https://msdn.microsoft.com/en-us/library/dn303461.aspx (link no longer works).

5. "Speech recognition with speech synthesis models by marginalising over decision tree leaves" (part 1)

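2.1 Decision Tree Marginalization. The basic procedure of decision tree marginalization is now understood, so here is a brief summary. The decision tree in question is the one used for HMM synthesis. The given triphone label is r-ih+z. Based on that label, the current speech synthesis model is used to infer the speech recognition model: the triphone is run down the current speech synthesis decision tree, starting from the root node. The root node asks: is the right context unvoiced? The right phone is clearly z, which is voiced, so we move to the left node, where the question is: is the syllable stressed? Oops, this question cannot be answered from the context information, so how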

speech recognition resource

sirius http://sirius.clarity-lab.org/sirius/#install
$ tar xzf sirius-1.0.1.tar.gz
$ cd sirius/sirius-application
$ tar xzf question-answer.tar.gz
$ sudo ./get-dependencies.sh
$ sudo ./get-opencv.sh
$ ./get-kaldi.sh
$ ./compile-sirius-servers.sh
$ ./

Course recommendation: Speech Recognition (Columbia University)

Course website: http://www.ee.columbia.edu/~stanchen/fall09/e6870/ Slide downloads: http://www.ee.columbia.edu/~stanchen/fall09/e6870/slides/ Recommended reading: www.cs.bham.ac.uk/~pxc/nlp/NLPA-Phon2.pdf

Awesome Torch

Awesome Torch. This blog from: A curated list of awesome Torch tutorials, projects and communities. Table of Contents: Tutorials; Model Zoo (Recurrent Networks, Convolutional Networks, ETC); Libraries (Model related, GPU related, IDE related, ETC); Links. Tutorials

Summary of Google batchnorm resources

Training on WebFace with the large network proposed by Li Ziqing (李子青) always overfits and performs poorly, so I am trying batchnorm. Reference blogs: http://blog.csdn.net/malefactor/article/details/51549771 How to introduce batchnorm in CNNs and RNNs http://blog.csdn.net/happynear/article/details/44238541 Google paper <Batch Normalization Accelerating Deep Networ

"RECURRENT BATCH NORMALIZATION"

Covariate: in experimental design, a covariate is an independent (explanatory) variable that is not manipulated by the experimenter but still affects the experimental outcome. Whitening: https://blog.csdn.net/elaine_bao/article/details/50890491 <Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift>: https://blog.csdn.net/sinat_

8 automatic speech recognition solutions you should know about in 2019!

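8 automatic speech recognition solutions you should know about in 2019! Original: translated by the AI前线 (AI Front Line) team, AI前线, 1 week ago. Author | Derrick Mwiti. Translator | 核子可乐. Editor | Linda. AI前线 introduction: The ability of computers to recognize and process human speech is collectively known as speech recognition. Today the technology is widely used to authenticate certain users of a system, and to issue commands to smart devices such as Google Assistant, Siri, or Cortana. In essence, we store human voices and train automatic speech recognition systems to discover the vocabulary and expression patterns in speech. In this article, we look at several papers that aim to use machine learning and deep learning techniques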