Implementing Speech Recognition with Microsoft Cognitive Services

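  I have wanted to build speech recognition for a long time and tried many times, but every attempt failed. The reasons were many: recognition quality was poor, my own technical skills fell short, and so on, and I sank a lot of time into it. The engines I have tried are iFLYTEK (科大讯飞), Baidu Speech, and Microsoft's offerings. In the end I prefer Microsoft's for being simple and efficient. (No flames please, this is purely personal opinion.)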

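  My original idea was this: I say a sentence (a console demo for now), the console program recognizes what I said, prints it, and then performs the corresponding action. (A lovely idea; reality was far less kind.) As a newcomer to speech recognition I hit every kind of error, wavered over which company's API to choose, and searched Baidu for all sorts of speech recognition demos to study. Very few of them actually ran on the .NET platform, though perhaps I was searching in the wrong direction. I fell into many pits, never succeeded once, got discouraged and backed off several times, yet could never resist coming back to speech recognition.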

  Here are the speech demos in my Visual Studio solution:

  

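  The first one is today's main subject; more on it later.

  The second and third use Microsoft's System.Speech.dll, which ships with the system, and Microsoft.Speech.dll, which I tried after reading an article on a Microsoft blog. The article was well written, but my attempt failed. I also found a problem: with the English version, speech recognition (Microsoft.Speech.Recognition) did not work, and with the Chinese version, speech synthesis (Microsoft.Speech.Synthesis) did not work. So I had to mix the two DLLs to get the effect I wanted. It did work in the end, but only for very simple cases: once the vocabulary grew, the recognition rate dropped sharply. I have always suspected the sampling rate, but I could not find any property or field for it. If anyone knows, please give me a pointer so that I can take off too, haha.

  The fourth is the Baidu speech recognition demo. The code is much more concise and it is not hard to implement, but there are many small details to watch and plenty of landmines, while documentation to guide you out of the minefield is scarce. I am one of those who stepped on the mines, and it hurt.

  First, let's look at the mainstream approaches to speech recognition on the market today:

  1. Offline speech recognition

  Offline speech recognition is easy to understand: the recognition library lives locally or on the LAN, so no remote connection is needed. This was also my original idea: build my own recognition vocabulary and then design the desired actions around its contents. Using the speech recognition and speech synthesis features in Microsoft's System.Speech.dll, I implemented simple Chinese speech recognition. But as I gradually enlarged the vocabulary, the recognition rate got lower and lower; I don't know whether my computer's microphone was to blame or something else. Discouraged, I finally gave up. When I later tried Baidu Speech, I also found an offline recognition library, but the official documentation gives no concrete workflow or design guidance, and I did not dig deeper. I will study it properly when I have time.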

using System;
//using Microsoft.Speech.Synthesis; // the Chinese-language TTS produced no sound
using Microsoft.Speech.Recognition;
using System.Speech.Synthesis;
//using System.Speech.Recognition;

namespace SAssassin.SpeechDemo
{
    /// <summary>
    /// Microsoft speech recognition, Chinese version; it seems to work somewhat better
    /// </summary>
    class Program
    {
        static SpeechSynthesizer sy = new SpeechSynthesizer();
        static void Main(string[] args)
        {
            // Create the Chinese recognizer
            using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("zh-CN")))
            {
                // List the recognizers installed on this machine
                foreach (var config in SpeechRecognitionEngine.InstalledRecognizers())
                {
                    Console.WriteLine(config.Id);
                }
                // Initialize the command words (kept in Chinese because the recognizer is zh-CN)
                Choices commonds = new Choices();
                string[] commond1 = new string[] { "一", "二", "三", "四", "五", "六", "七", "八", "九" };
                string[] commond2 = new string[] { "很高兴见到你", "识别率", "assassin", "长沙", "湖南", "实习" };
                string[] commond3 = new string[] { "开灯", "关灯", "播放音乐", "关闭音乐", "浇水", "停止浇水", "打开背景灯", "关闭背景灯" };
                // Add the command words
                commonds.Add(commond1);
                commonds.Add(commond2);
                commonds.Add(commond3);
                // Initialize the grammar builder
                GrammarBuilder gBuilder = new GrammarBuilder();
                // Append the command words to the builder
                gBuilder.Append(commonds);
                // Create the grammar from the builder
                Grammar grammar = new Grammar(gBuilder);

                // Load the command grammar (adding command words makes recognition more accurate)
                recognizer.LoadGrammarAsync(grammar);
                // Attach a handler for the SpeechRecognized event
                recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(Recognizer_SpeechRRecongized);
                // Use the default audio device as input
                recognizer.SetInputToDefaultAudioDevice();
                // Start asynchronous, continuous recognition
                recognizer.RecognizeAsync(RecognizeMode.Multiple);
                // Greet the user ("你好" = "Hello") and keep the console window open
                Console.WriteLine("你好");
                sy.Speak("你好");
                Console.ReadLine();
            }
        }

        // Handler for the SpeechRecognized event: print the result and read it back
        static void Recognizer_SpeechRRecongized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("Result: " + e.Result.Text + " " + e.Result.Confidence + " " + DateTime.Now);
            sy.Speak(e.Result.Text);
        }
    }
}
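
The demo above only echoes what it hears. The original idea also called for running an action for each recognized command, and a minimal sketch of that step might look like the following. The `CommandDispatcher` class, its confidence threshold, and the registered actions are assumptions made for illustration and are not part of the original demo; the text and confidence would come from `e.Result.Text` and `e.Result.Confidence` in the event handler.

```csharp
using System;
using System.Collections.Generic;

/// <summary>
/// Hypothetical helper (not part of the original demo): maps recognized
/// phrases to actions and ignores results below a confidence threshold.
/// </summary>
public class CommandDispatcher
{
    private readonly Dictionary<string, Action> actions = new Dictionary<string, Action>();
    private readonly float minConfidence;

    public CommandDispatcher(float minConfidence)
    {
        this.minConfidence = minConfidence;
    }

    // Registers (or replaces) the action for an exact phrase.
    public void Register(string phrase, Action action)
    {
        actions[phrase] = action;
    }

    // Runs the action and returns true only when the phrase is known
    // and the recognizer was confident enough.
    public bool TryDispatch(string text, float confidence)
    {
        if (text == null || confidence < minConfidence)
        {
            return false;
        }

        Action action;
        if (!actions.TryGetValue(text, out action))
        {
            return false;
        }

        action();
        return true;
    }
}
```

Rejecting low-confidence results is one way to soften the drop in accuracy as the vocabulary grows, at the cost of occasionally having to repeat a command. The right threshold is something to tune by experiment.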

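  2. Online speech recognition

  In online speech recognition, our program sends the audio file to a remote service, which matches it and returns the result. These services generally use a RESTful style and exchange recognition results as JSON.

  I started with iFLYTEK's speech recognition. At first I knew nothing; a friend recommended it and my own Baidu searches agreed that iFLYTEK was very good, so I set out to learn it. But on Windows there was only a C++ demo, and I work in C#. Languages may largely be interchangeable, but I did not want the hassle, so I found a demo online that was said to be the most complete C# iFLYTEK speech recognition demo. When I saw its tangled source code my heart sank: it calls the C++ functions directly through some interop mechanism. I ran the demo and it worked, recording briefly and then recognizing the speech, but there were problems in places that I could not find solutions for, so I had to give up.

  Later, Baidu Speech caught my attention. In July I started looking at the Baidu Speech demos again. The official demo is fairly simple, so I tried learning from it. First you go to the Baidu Speech open platform, create an application, and obtain an App Key and a Secret Key. Then you download the demo and put the two keys in the constructor, in fields, or in a configuration file; the program uses them to issue requests. As mentioned at the start, this is online recognition: in RESTful style, the audio file is uploaded to Baidu's recognition service, which returns the result to our program. At first my skills were lacking, configuration went wrong in all sorts of ways, and I started stepping on mines, which was bound to blow up a few times. In the end I managed to recognize the test file in the demo, a small step for me personally. (If you hit the same mines, feel free to discuss them with me; I may not know everything, but I at least understand the ones I stepped on, haha.)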
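
To make the request/response shape concrete, here is a rough sketch of building such an upload request. It is not any vendor's actual API: the endpoint URL, the `lang` query parameter, and the `X-Api-Key` header name are all placeholders invented for illustration, and a real service documents its own URL, authentication scheme, and audio format.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class OnlineRecognitionRequest
{
    // Placeholder endpoint (an assumption); use the URL from your provider's documentation.
    public const string PlaceholderEndpoint = "https://speech.example.com/recognize";

    // Builds a POST request that carries the WAV bytes as the body and
    // asks for a JSON response. Nothing is sent over the network here.
    public static HttpRequestMessage Build(byte[] wavBytes, string apiKey, string language)
    {
        string uri = PlaceholderEndpoint + "?lang=" + Uri.EscapeDataString(language);
        var request = new HttpRequestMessage(HttpMethod.Post, uri);

        // Hypothetical header name; real services define their own authentication.
        request.Headers.Add("X-Api-Key", apiKey);
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

        request.Content = new ByteArrayContent(wavBytes);
        request.Content.Headers.ContentType = new MediaTypeHeaderValue("audio/wav");
        return request;
    }
}
```

The request would then be sent with `HttpClient.SendAsync`, and the JSON body of the response parsed for the recognized text.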

  

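  Next comes the question of design. Speech recognition now worked and so did speech synthesis. Note that speech recognition and speech synthesis have to be enabled separately; each has an App Key and a Secret Key, and although the keys are the same you still need to pay attention, otherwise speech synthesis will fail. The next problem is that Baidu Speech is designed around recognizing files, whereas what we usually want, and what I wanted, is to speak into the microphone and have that recognized. My approach was to save the input as a file and call the recognition method once I had finished speaking. That does work, but it led into another minefield. I used NAudio.dll for recording; the package can be downloaded from NuGet.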

using NAudio.Wave;
using System;

namespace SAssassin.VOC
{
    /// <summary>
    /// Records audio from the microphone to a WAV file
    /// </summary>
    public class RecordWaveToFile
    {
        private WaveFileWriter waveFileWriter = null;
        // WaveInEvent rather than WaveIn, so that recording works without a GUI thread
        private WaveInEvent myWaveIn = null;

        public void StartRecord()
        {
            ConfigWave();
            myWaveIn.StartRecording();
        }

        /// <summary>
        /// Stops recording; the file is finalized in WaveIn_RecordingStopped
        /// </summary>
        public void StopRecord()
        {
            if (myWaveIn != null)
            {
                myWaveIn.StopRecording();
            }
        }

        private void ConfigWave()
        {
            string filePath = AppDomain.CurrentDomain.BaseDirectory + "Temp.wav";
            myWaveIn = new WaveInEvent()
            {
                WaveFormat = new WaveFormat(16000, 16, 1) // 16 kHz, 16-bit, mono
                //WaveFormat = new WaveFormat() // default format; the audio is clear
            };
            myWaveIn.DataAvailable += new System.EventHandler<WaveInEventArgs>(WaveIn_DataAvailable);
            myWaveIn.RecordingStopped += new System.EventHandler<StoppedEventArgs>(WaveIn_RecordingStopped);
            waveFileWriter = new WaveFileWriter(filePath, myWaveIn.WaveFormat);
        }

        // Append each captured buffer to the WAV file
        private void WaveIn_DataAvailable(object sender, WaveInEventArgs e)
        {
            if (waveFileWriter != null)
            {
                waveFileWriter.Write(e.Buffer, 0, e.BytesRecorded);
                waveFileWriter.Flush();
            }
        }

        // Recording has already stopped when this fires, so release the writer
        // and the device here instead of calling StopRecording() again
        private void WaveIn_RecordingStopped(object sender, StoppedEventArgs e)
        {
            if (waveFileWriter != null)
            {
                waveFileWriter.Dispose();
                waveFileWriter = null;
            }
            if (myWaveIn != null)
            {
                myWaveIn.Dispose();
                myWaveIn = null;
            }
        }
    }
}

Using WaveInEvent here in the console program does not throw. Before that, however, I was using the WaveIn class, which immediately threw:

System.InvalidOperationException: "Use WaveInEvent to record on a background thread"

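  I found the fix on Stack Overflow: replace the WaveIn class with WaveInEvent. Looking inside the two classes, both implement the same interface and their structure is almost identical; one is meant for GUI threads and the other for background threads. With that in place recording worked, but when I checked my recordings there was a lot of noise, and the audio was unclear and even outright distorted. It was useless, and Baidu failed to recognize it too. Raising the sampling rate to 44 kHz gave very good recordings, but then the problem was that Baidu speech recognition only accepts 8 kHz, 16-bit PCM files. Frustrating. I considered switching to another file format and saving in compressed form, but as soon as the sampling rate comes down the quality is terrible and recognition becomes a problem again. This will have to be solved slowly.

  Now for today's main event, which is also my first post on cnblogs (博客园), so it is time to get to the point: Microsoft Cognitive Services. In mid-July I came across Bing's speech recognition API on Microsoft's Bing site, and the recognition quality there amazed me; the recognition rate was remarkably high. I then went looking for the API and found that the documentation was all in English. Frustrating. After reading it through, I liked the usage model, which is also a remote call. But I searched the official site for ages and found only documentation; at the time I had not looked at the product pages or the trial offers above it, so I could only look and not use it, which was disheartening. Only in the last few days, rereading the Bing speech recognition documentation, did I come across the term "Microsoft Cognitive Services". Forgive my ignorance, I had never heard of this gem. A quick Baidu search showed how good it is. Microsoft is amazing: it bundles many APIs, of which speech recognition is only a small dish, alongside face recognition, semantic understanding, and other impressive features. Find the API, find the free trial, sign in to obtain the app's secret key, and you can start using it. I downloaded a demo, entered the secret key, and tested it. Wow, the recognition quality was simply superb. Judging from my Baidu searches, very few people write about the speech recognition feature of Microsoft Cognitive Services, which is why I decided to write something.

  I pulled many parts out of the demo and assembled them directly into a console program. The source code is as follows: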
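
Since a recording in the wrong format is rejected only after it has been uploaded, it can help to check the file's header locally first. The sketch below reads the sample rate, bit depth, and channel count from a WAV header. The `WavFormatCheck` class is hypothetical, and it assumes the common layout in which the `fmt ` chunk immediately follows the RIFF header, running on a little-endian machine.

```csharp
using System;

public static class WavFormatCheck
{
    // Reads channels, sample rate and bits per sample from the first 44 bytes
    // of a PCM WAV file. Assumes the "fmt " chunk directly follows the RIFF
    // header (the common layout) and a little-endian platform.
    public static bool TryReadFormat(byte[] header, out int channels, out int sampleRate, out int bitsPerSample)
    {
        channels = 0;
        sampleRate = 0;
        bitsPerSample = 0;

        if (header == null || header.Length < 44)
        {
            return false;
        }

        // "RIFF" at offset 0 and "WAVE" at offset 8 identify a WAV file.
        if (header[0] != 'R' || header[1] != 'I' || header[2] != 'F' || header[3] != 'F' ||
            header[8] != 'W' || header[9] != 'A' || header[10] != 'V' || header[11] != 'E')
        {
            return false;
        }

        channels = BitConverter.ToInt16(header, 22);
        sampleRate = BitConverter.ToInt32(header, 24);
        bitsPerSample = BitConverter.ToInt16(header, 34);
        return true;
    }

    // True only when the header is readable and matches the expected format.
    public static bool Matches(byte[] header, int expectedRate, int expectedBits, int expectedChannels)
    {
        int channels, sampleRate, bitsPerSample;
        return TryReadFormat(header, out channels, out sampleRate, out bitsPerSample)
            && sampleRate == expectedRate
            && bitsPerSample == expectedBits
            && channels == expectedChannels;
    }
}
```

For example, `WavFormatCheck.Matches(headerBytes, 16000, 16, 1)` checks for the 16 kHz, 16-bit, mono format that the recorder above is configured with. The expected values should be whatever the target service documents.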

using System;
using System.ComponentModel;
using System.Configuration;
using System.IO;
using System.IO.IsolatedStorage;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.SpeechRecognition; // namespace of the speech client library used by the demo

public class SpeechConfig : INotifyPropertyChanged
{
    #region Fields
    /// <summary>
    /// The isolated storage subscription key file name.
    /// </summary>
    private const string IsolatedStorageSubscriptionKeyFileName = "Subscription.txt";

    /// <summary>
    /// The default subscription key prompt message
    /// </summary>
    private const string DefaultSubscriptionKeyPromptMessage = "Secret key";

    /// <summary>
    /// You can also put the primary key in app.config, instead of using UI.
    /// string subscriptionKey = ConfigurationManager.AppSettings["primaryKey"];
    /// </summary>
    private string subscriptionKey = ConfigurationManager.AppSettings["primaryKey"];

    /// <summary>
    /// Gets or sets subscription key
    /// </summary>
    public string SubscriptionKey
    {
        get
        {
            return this.subscriptionKey;
        }

        set
        {
            this.subscriptionKey = value;
            this.OnPropertyChanged<string>();
        }
    }

    /// <summary>
    /// The data recognition client
    /// </summary>
    private DataRecognitionClient dataClient;

    /// <summary>
    /// The microphone client
    /// </summary>
    private MicrophoneRecognitionClient micClient;

    #endregion Fields

    #region event
    /// <summary>
    /// Implement INotifyPropertyChanged interface
    /// </summary>
    public event PropertyChangedEventHandler PropertyChanged;

    /// <summary>
    /// Helper function for INotifyPropertyChanged interface
    /// </summary>
    /// <typeparam name="T">Property type</typeparam>
    /// <param name="caller">Property name</param>
    private void OnPropertyChanged<T>([CallerMemberName]string caller = null)
    {
        this.PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(caller));
    }
    #endregion event

    #region Properties
    /// <summary>
    /// Gets the current speech recognition mode.
    /// </summary>
    /// <value>
    /// The speech recognition mode.
    /// </value>
    private SpeechRecognitionMode Mode
    {
        get
        {
            if (this.IsMicrophoneClientDictation ||
                this.IsDataClientDictation)
            {
                return SpeechRecognitionMode.LongDictation;
            }

            return SpeechRecognitionMode.ShortPhrase;
        }
    }

    /// <summary>
    /// Gets the default locale.
    /// </summary>
    /// <value>
    /// The default locale.
    /// </value>
    private string DefaultLocale
    {
        //get { return "en-US"; }
        get { return "zh-CN"; }
    }

    /// <summary>
    /// Gets the Cognitive Service Authentication Uri.
    /// </summary>
    /// <value>
    /// The Cognitive Service Authentication Uri.  Empty if the global default is to be used.
    /// </value>
    private string AuthenticationUri
    {
        get
        {
            return ConfigurationManager.AppSettings["AuthenticationUri"];
        }
    }

    /// <summary>
    /// Gets a value indicating whether or not to use the microphone.
    /// </summary>
    /// <value>
    ///   <c>true</c> if [use microphone]; otherwise, <c>false</c>.
    /// </value>
    private bool UseMicrophone
    {
        get
        {
            return this.IsMicrophoneClientWithIntent ||
                this.IsMicrophoneClientDictation ||
                this.IsMicrophoneClientShortPhrase;
        }
    }

    /// <summary>
    /// Gets the short wave file path.
    /// </summary>
    /// <value>
    /// The short wave file.
    /// </value>
    private string ShortWaveFile
    {
        get
        {
            return ConfigurationManager.AppSettings["ShortWaveFile"];
        }
    }

    /// <summary>
    /// Gets the long wave file path.
    /// </summary>
    /// <value>
    /// The long wave file.
    /// </value>
    private string LongWaveFile
    {
        get
        {
            return ConfigurationManager.AppSettings["LongWaveFile"];
        }
    }
    #endregion Properties

    #region Mode selection settings
    /// <summary>
    /// Gets or sets a value indicating whether this instance is microphone client short phrase.
    /// </summary>
    /// <value>
    /// <c>true</c> if this instance is microphone client short phrase; otherwise, <c>false</c>.
    /// </value>
    public bool IsMicrophoneClientShortPhrase { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is microphone client dictation.
    /// </summary>
    /// <value>
    /// <c>true</c> if this instance is microphone client dictation; otherwise, <c>false</c>.
    /// </value>
    public bool IsMicrophoneClientDictation { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is microphone client with intent.
    /// </summary>
    /// <value>
    /// <c>true</c> if this instance is microphone client with intent; otherwise, <c>false</c>.
    /// </value>
    public bool IsMicrophoneClientWithIntent { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is data client short phrase.
    /// </summary>
    /// <value>
    /// <c>true</c> if this instance is data client short phrase; otherwise, <c>false</c>.
    /// </value>
    public bool IsDataClientShortPhrase { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is data client with intent.
    /// </summary>
    /// <value>
    /// <c>true</c> if this instance is data client with intent; otherwise, <c>false</c>.
    /// </value>
    public bool IsDataClientWithIntent { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this instance is data client dictation.
    /// </summary>
    /// <value>
    /// <c>true</c> if this instance is data client dictation; otherwise, <c>false</c>.
    /// </value>
    public bool IsDataClientDictation { get; set; }

    #endregion

    #region Event handlers
    /// <summary>
    /// Called when the microphone status has changed.
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="MicrophoneEventArgs"/> instance containing the event data.</param>
    private void OnMicrophoneStatus(object sender, MicrophoneEventArgs e)
    {
        Task task = new Task(() =>
        {
            Console.WriteLine("--- Microphone status change received by OnMicrophoneStatus() ---");
            Console.WriteLine("********* Microphone status: {0} *********", e.Recording);
            if (e.Recording)
            {
                Console.WriteLine("Please start speaking.");
            }

            Console.WriteLine();
        });
        task.Start();
    }

    /// <summary>
    /// Called when a partial response is received.
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="PartialSpeechResponseEventArgs"/> instance containing the event data.</param>
    private void OnPartialResponseReceivedHandler(object sender, PartialSpeechResponseEventArgs e)
    {
        Console.WriteLine("--- Partial result received by OnPartialResponseReceivedHandler() ---");
        Console.WriteLine("{0}", e.PartialResult);
        Console.WriteLine();
    }

    /// <summary>
    /// Called when an error is received.
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="SpeechErrorEventArgs"/> instance containing the event data.</param>
    private void OnConversationErrorHandler(object sender, SpeechErrorEventArgs e)
    {
        Console.WriteLine("--- Error received by OnConversationErrorHandler() ---");
        Console.WriteLine("Error code: {0}", e.SpeechErrorCode.ToString());
        Console.WriteLine("Error text: {0}", e.SpeechErrorText);
        Console.WriteLine();
    }

    /// <summary>
    /// Called when a final response is received (microphone, short phrase).
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="SpeechResponseEventArgs"/> instance containing the event data.</param>
    private void OnMicShortPhraseResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
    {
        Task task = new Task(() =>
        {
            Console.WriteLine("--- OnMicShortPhraseResponseReceivedHandler ---");

            // We got the final result, so we can end the mic reco.  No need to do this
            // for dataReco, since we already called EndAudio() on it as soon as we were done
            // sending all the data.
            this.micClient.EndMicAndRecognition();

            this.WriteResponseResult(e);
        });
        task.Start();
    }

    /// <summary>
    /// Called when a final response is received (data client, short phrase).
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="SpeechResponseEventArgs"/> instance containing the event data.</param>
    private void OnDataShortPhraseResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
    {
        Task task = new Task(() =>
        {
            Console.WriteLine("--- OnDataShortPhraseResponseReceivedHandler ---");

            // Nothing to end here: for the data client we already called EndAudio()
            // as soon as we were done sending all the data.
            this.WriteResponseResult(e);
        });
        task.Start();
    }

    /// <summary>
    /// Called when a final response is received (microphone, dictation).
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="SpeechResponseEventArgs"/> instance containing the event data.</param>
    private void OnMicDictationResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
    {
        Console.WriteLine("--- OnMicDictationResponseReceivedHandler ---");
        if (e.PhraseResponse.RecognitionStatus == RecognitionStatus.EndOfDictation ||
            e.PhraseResponse.RecognitionStatus == RecognitionStatus.DictationEndSilenceTimeout)
        {
            Task task = new Task(() =>
            {
                // We got the final result, so we can end the mic reco.  No need to do this
                // for dataReco, since we already called EndAudio() on it as soon as we were done
                // sending all the data.
                this.micClient.EndMicAndRecognition();
            });
            task.Start();
        }

        this.WriteResponseResult(e);
    }

    /// <summary>
    /// Called when a final response is received (data client, dictation).
    /// </summary>
    /// <param name="sender">The sender.</param>
    /// <param name="e">The <see cref="SpeechResponseEventArgs"/> instance containing the event data.</param>
    private void OnDataDictationResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
    {
        Console.WriteLine("--- OnDataDictationResponseReceivedHandler ---");
        if (e.PhraseResponse.RecognitionStatus == RecognitionStatus.EndOfDictation ||
            e.PhraseResponse.RecognitionStatus == RecognitionStatus.DictationEndSilenceTimeout)
        {
            // Nothing to end here: for the data client we already called EndAudio()
            // as soon as we were done sending all the data.
        }

        this.WriteResponseResult(e);
    }

    /// <summary>
    /// Sends the audio helper.
    /// </summary>
    /// <param name="wavFileName">Name of the wav file.</param>
    private void SendAudioHelper(string wavFileName)
    {
        using (FileStream fileStream = new FileStream(wavFileName, FileMode.Open, FileAccess.Read))
        {
            // Note for wave files, we can just send data from the file right to the server.
            // In the case you are not an audio file in wave format, and instead you have just
            // raw data (for example audio coming over bluetooth), then before sending up any
            // audio data, you must first send up an SpeechAudioFormat descriptor to describe
            // the layout and format of your raw audio data via DataRecognitionClient's sendAudioFormat() method.
            int bytesRead = 0;
            byte[] buffer = new byte[1024];

            try
            {
                do
                {
                    // Get more Audio data to send into byte buffer.
                    bytesRead = fileStream.Read(buffer, 0, buffer.Length);

                    // Send of audio data to service.
                    this.dataClient.SendAudio(buffer, bytesRead);
                }
                while (bytesRead > 0);
            }
            finally
            {
                // We are done sending audio.  Final recognition results will arrive in OnResponseReceived event call.
                this.dataClient.EndAudio();
            }
        }
    }
    #endregion Event handlers

    #region Helper methods
    /// <summary>
    /// Gets the subscription key from isolated storage.
    /// </summary>
    /// <returns>The subscription key.</returns>
    private string GetSubscriptionKeyFromIsolatedStorage()
    {
        string subscriptionKey = null;

        using (IsolatedStorageFile isoStore = IsolatedStorageFile.GetStore(IsolatedStorageScope.User | IsolatedStorageScope.Assembly, null, null))
        {
            try
            {
                using (var iStream = new IsolatedStorageFileStream(IsolatedStorageSubscriptionKeyFileName, FileMode.Open, isoStore))
                {
                    using (var reader = new StreamReader(iStream))
                    {
                        subscriptionKey = reader.ReadLine();
                    }
                }
            }
            catch (FileNotFoundException)
            {
                subscriptionKey = null;
            }
        }

        // Fall back to the default key constant when nothing is stored
        if (string.IsNullOrEmpty(subscriptionKey))
        {
            subscriptionKey = DefaultSubscriptionKeyPromptMessage;
        }

        return subscriptionKey;
    }

    /// <summary>
    /// Creates a new microphone reco client without LUIS intent support.
    /// </summary>
    private void CreateMicrophoneRecoClient()
    {
        this.micClient = SpeechRecognitionServiceFactory.CreateMicrophoneClient(
            this.Mode,
            this.DefaultLocale,
            this.SubscriptionKey);

        this.micClient.AuthenticationUri = this.AuthenticationUri;

        // Event handlers for speech recognition results
        this.micClient.OnMicrophoneStatus += this.OnMicrophoneStatus;
        this.micClient.OnPartialResponseReceived += this.OnPartialResponseReceivedHandler;
        if (this.Mode == SpeechRecognitionMode.ShortPhrase)
        {
            this.micClient.OnResponseReceived += this.OnMicShortPhraseResponseReceivedHandler;
        }
        else if (this.Mode == SpeechRecognitionMode.LongDictation)
        {
            this.micClient.OnResponseReceived += this.OnMicDictationResponseReceivedHandler;
        }

        this.micClient.OnConversationError += this.OnConversationErrorHandler;
    }

    /// <summary>
    /// Creates a data client without LUIS intent support.
    /// Speech recognition with data (for example from a file or audio source).
    /// The data is broken up into buffers and each buffer is sent to the Speech Recognition Service.
    /// No modification is done to the buffers, so the user can apply their
    /// own Silence Detection if desired.
    /// </summary>
    private void CreateDataRecoClient()
    {
        this.dataClient = SpeechRecognitionServiceFactory.CreateDataClient(
            this.Mode,
            this.DefaultLocale,
            this.SubscriptionKey);
        this.dataClient.AuthenticationUri = this.AuthenticationUri;

        // Event handlers for speech recognition results
        if (this.Mode == SpeechRecognitionMode.ShortPhrase)
        {
            this.dataClient.OnResponseReceived += this.OnDataShortPhraseResponseReceivedHandler;
        }
        else
        {
            this.dataClient.OnResponseReceived += this.OnDataDictationResponseReceivedHandler;
        }

        this.dataClient.OnPartialResponseReceived += this.OnPartialResponseReceivedHandler;
        this.dataClient.OnConversationError += this.OnConversationErrorHandler;
    }

    /// <summary>
    /// Writes the response result.
    /// </summary>
    /// <param name="e">The <see cref="SpeechResponseEventArgs"/> instance containing the event data.</param>
    private void WriteResponseResult(SpeechResponseEventArgs e)
    {
        if (e.PhraseResponse.Results.Length == 0)
        {
            Console.WriteLine("No phrase response is available.");
        }
        else
        {
            Console.WriteLine("********* Final n-BEST Results *********");
            for (int i = 0; i < e.PhraseResponse.Results.Length; i++)
            {
                Console.WriteLine(
                    "[{0}] Confidence={1}, Text=\"{2}\"",
                    i,
                    e.PhraseResponse.Results[i].Confidence,
                    e.PhraseResponse.Results[i].DisplayText);
                // "关闭。" means "Close." (the comparison includes the trailing full stop)
                if (e.PhraseResponse.Results[i].DisplayText == "关闭。")
                {
                    Console.WriteLine("Command received, closing now");
                }
            }

            Console.WriteLine();
        }
    }
    #endregion Helper methods

    #region Init
    public SpeechConfig()
    {
        // Default mode: microphone input, short phrase
        this.IsMicrophoneClientShortPhrase = true;
        this.IsMicrophoneClientWithIntent = false;
        this.IsMicrophoneClientDictation = false;
        this.IsDataClientShortPhrase = false;
        this.IsDataClientWithIntent = false;
        this.IsDataClientDictation = false;

        this.SubscriptionKey = this.GetSubscriptionKeyFromIsolatedStorage();
    }

    /// <summary>
    /// Starts speech recognition
    /// </summary>
    public void SpeechRecognize()
    {
        if (this.UseMicrophone)
        {
            if (this.micClient == null)
            {
                this.CreateMicrophoneRecoClient();
            }

            this.micClient.StartMicAndRecognition();
        }
        else
        {
            if (null == this.dataClient)
            {
                this.CreateDataRecoClient();
            }

            this.SendAudioHelper((this.Mode == SpeechRecognitionMode.ShortPhrase) ? this.ShortWaveFile : this.LongWaveFile);
        }
    }
    #endregion Init
}
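
`WriteResponseResult` compares `DisplayText` against the literal "关闭。", full stop included, so the match depends on the punctuation the service happens to append. A small sketch of a more tolerant comparison follows. The `CommandText` class and its punctuation list are assumptions made for illustration and are not part of the demo.

```csharp
using System;

public static class CommandText
{
    // Punctuation and whitespace to strip from the end of a result
    // (an illustrative list, not an exhaustive one).
    private static readonly char[] TrailingPunctuation =
        { '。', '.', '!', '!', '?', '?', ',', ',', ' ' };

    // Returns the text without surrounding whitespace or trailing punctuation.
    public static string Normalize(string displayText)
    {
        if (displayText == null)
        {
            return string.Empty;
        }

        return displayText.Trim().TrimEnd(TrailingPunctuation);
    }

    // True when the normalized result is exactly the given command.
    public static bool IsCommand(string displayText, string command)
    {
        return string.Equals(Normalize(displayText), command, StringComparison.Ordinal);
    }
}
```

With this helper the check inside the loop could read `CommandText.IsCommand(e.PhraseResponse.Results[i].DisplayText, "关闭")`, which accepts both "关闭" and "关闭。" but not a longer phrase such as "关闭音乐。".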

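  Several of the referenced assemblies can be downloaded as NuGet packages, with basically no problems.

One thing to note here: when downloading Microsoft.Speech, you must download both packages, otherwise you will get errors, and the version must be 4.5 or higher.

  Just replace the default key and the program will run. The results are really impressive.

The recognition rate is really, really good and I am very satisfied. However, Microsoft's free trial lasts only one month, so I will have to make the most of it this month, haha.

  For my first blog post I am recommending the Microsoft Cognitive Services speech recognition API. Having experienced its power first-hand, I wanted it to be the subject of my very first post.

  2017-08-20. I hope that once my skills have matured I can come back and see my own footprints.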

Time: 2024-10-05 15:42:43
