Audio Fundamentals (Beta 2 SDK)
Content Archived
This content is no longer current. Our recommendation for up-to-date content: http://channel9.msdn.com/Series/KinectQuickstart/Audio-Fundamentals
Please use the newly updated Kinect for Windows SDK Quickstart series. The content below will only work with the Beta 2 version of the Kinect for Windows SDK.
This video covers the basics of reading audio data from the Kinect microphone array, using a demo adapted from the built-in audio recorder sample. The video also covers speech recognition using Kinect. You may find it easier to follow along by downloading the Kinect for Windows SDK Quickstarts samples and slides that have been updated for Beta 2 (Nov, 2011).
The video has not been updated for Beta 2, but the following changes have been made:
The steps below assume you have set up your development environment as explained in the "Setting Up Your Development Environment" video.
We’ll add in a Slider and two Button controls, and we'll also use some stack panels to be sure everything lines up nicely:
XAML
<Window x:Class="AudioRecorder.MainWindow"
xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
Title="Audio Recorder Sample" Height="159" Width="525">
<Grid>
<StackPanel>
<StackPanel Orientation="Horizontal">
<Label Content="Seconds to Record: " />
<Label Content="{Binding ElementName=RecordForTimeSpan, Path=Value}" />
</StackPanel>
<Slider Name="RecordForTimeSpan" Minimum="1" Maximum="25" IsSnapToTickEnabled="True" />
<StackPanel Orientation="Horizontal" HorizontalAlignment="Center">
<Button Content="Record" Height="50" Width="100" Name="RecordButton" />
<Button Content="Play" Height="50" Width="100" Name="PlayButton" />
</StackPanel>
<MediaElement Name="audioPlayer" />
</StackPanel>
</Grid>
</Window>
For each button, we'll want to create a click event handler. Go to the Properties window (F4), select the RecordButton, select the Events tab, and double-click the Click event to create the RecordButton_Click handler. Do the same for the PlayButton so we have the PlayButton_Click handler wired up as well.
The first task is to add in the Kinect Audio library:
C#
using Microsoft.Research.Kinect.Audio;
Visual Basic
Imports Microsoft.Research.Kinect.Audio
There are two ways we can record audio. You can record synchronously, meaning the UI thread is in effect “frozen” until the recording finishes. Alternatively, you can record on a separate thread so that the UI thread remains responsive to events while the recording happens in parallel. Our sample includes both methods so you can choose the one your application requires.
We’ll build variables to hold the amount of time we’ll record, the file name of the recording, and to enable asynchronous recording, we’ll use the FinishedRecording event to notify the UI thread that we're done recording:
C#
double _amountOfTimeToRecord;
string _lastRecordedFileName;
private event RoutedEventHandler FinishedRecording;
Visual Basic
Private _amountOfTimeToRecord As Double
Private _lastRecordedFileName As String
Private Event FinishedRecording As RoutedEventHandler
Next we’ll create the RecordAudio method that will do the actual audio recording.
C#
private void RecordAudio()
{
}
Visual Basic
Private Sub RecordAudio()
End Sub
To create threads, we'll add in the System.Threading namespace:
C#
using System.Threading;
Visual Basic
Imports System.Threading
Now we'll create the thread and do some simple end-user management in the RecordButton_Click event. First we'll disable the two buttons, read the recording length from the slider, and create a unique file name.
Then we have the option of calling the RecordAudio method either synchronously or asynchronously. The code below takes the asynchronous route, starting RecordAudio on a new thread:
C#
private void RecordButton_Click(object sender, RoutedEventArgs e)
{
RecordButton.IsEnabled = false;
PlayButton.IsEnabled = false;
_amountOfTimeToRecord = RecordForTimeSpan.Value;
_lastRecordedFileName = DateTime.Now.ToString("yyyyMMddHHmmss") + "_wav.wav";
var t = new Thread(new ThreadStart(RecordAudio));
t.SetApartmentState(ApartmentState.MTA);
t.Start();
}
Visual Basic
Private Sub RecordButton_Click(ByVal sender As Object, ByVal e As RoutedEventArgs)
RecordButton.IsEnabled = False
PlayButton.IsEnabled = False
_amountOfTimeToRecord = RecordForTimeSpan.Value
_lastRecordedFileName = Date.Now.ToString("yyyyMMddHHmmss") & "_wav.wav"
Dim t = New Thread(New ThreadStart(AddressOf RecordAudio))
t.SetApartmentState(ApartmentState.MTA)
t.Start()
End Sub
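The file name above comes from a custom date format string. Below is a minimal sketch of what that format produces for a fixed example timestamp. BuildRecordingFileName is a hypothetical helper name, and InvariantCulture is an addition not present in the original code, used here so the digits do not depend on the machine's culture settings:

```csharp
using System;
using System.Globalization;

static class FileNameSketch
{
    // Hypothetical helper: builds a name in the same style as the tutorial's,
    // but from a timestamp passed in by the caller instead of DateTime.Now.
    public static string BuildRecordingFileName(DateTime timestamp)
    {
        // yyyyMMddHHmmss = 4-digit year, month, day, 24-hour hour, minute, second.
        return timestamp.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture) + "_wav.wav";
    }
}
```

For example, BuildRecordingFileName(new DateTime(2011, 11, 4, 9, 5, 30)) gives "20111104090530_wav.wav". Because the format stops at seconds, two recordings started within the same second would share a name.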
From here, this sample and the built-in sample are pretty much the same. There are only three differences: the FinishedRecording event, a dynamic recording length, and the dynamic file name. Note that the WriteWavHeader function is exactly the same as the one in the built-in demo. Since we leverage different types of streams, we'll add the System.IO namespace:
C#
using System.IO;
Visual Basic
Imports System.IO
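WriteWavHeader itself is not listed in this walkthrough. Below is a minimal sketch of what such a function might write, assuming 16 kHz, 16-bit, mono PCM (which matches the 2 * 16000 bytes-per-second figure used in RecordAudio) and the canonical 44-byte RIFF layout. The built-in demo's version may differ in detail, and the optional parameters are assumptions added for illustration:

```csharp
using System;
using System.IO;
using System.Text;

static class WavHeaderSketch
{
    // Writes a canonical 44-byte RIFF/WAVE header for uncompressed PCM audio.
    // dataLength is the number of audio bytes that will follow the header.
    public static void WriteWavHeader(Stream stream, int dataLength,
        int sampleRate = 16000, short bitsPerSample = 16, short channels = 1)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8); // bytes per sample frame
        int bytesPerSecond = sampleRate * blockAlign;

        // The writer is deliberately not disposed, so the caller's stream stays open.
        var writer = new BinaryWriter(stream, Encoding.ASCII);
        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        writer.Write(36 + dataLength);                 // size of everything after this field
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));
        writer.Write(Encoding.ASCII.GetBytes("fmt "));
        writer.Write(16);                              // size of the PCM format chunk
        writer.Write((short)1);                        // format tag 1 = PCM
        writer.Write(channels);
        writer.Write(sampleRate);
        writer.Write(bytesPerSecond);
        writer.Write(blockAlign);
        writer.Write(bitsPerSample);
        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(dataLength);                      // size of the audio data
        writer.Flush();
    }
}
```

The header states the data length up front, which is why RecordAudio computes recordingLength before any audio has been captured.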
The entire RecordAudio method:
C#
private void RecordAudio()
{
using (var source = new KinectAudioSource())
{
var recordingLength = (int) _amountOfTimeToRecord * 2 * 16000;
var buffer = new byte[1024];
source.SystemMode = SystemMode.OptibeamArrayOnly;
using (var fileStream = new FileStream(_lastRecordedFileName, FileMode.Create))
{
WriteWavHeader(fileStream, recordingLength);
//Start capturing audio
using (var audioStream = source.Start())
{
//Simply copy the data from the stream down to the file
int count, totalCount = 0;
while ((count = audioStream.Read(buffer, 0, buffer.Length)) > 0 && totalCount < recordingLength)
{
fileStream.Write(buffer, 0, count);
totalCount += count;
}
}
}
if (FinishedRecording != null)
FinishedRecording(null, null);
}
}
Visual Basic
Private Sub RecordAudio()
Using source = New KinectAudioSource
Dim recordingLength = CInt(Fix(_amountOfTimeToRecord)) * 2 * 16000
Dim buffer = New Byte(1023) {}
source.SystemMode = SystemMode.OptibeamArrayOnly
Using fileStream = New FileStream(_lastRecordedFileName, FileMode.Create)
WriteWavHeader(fileStream, recordingLength)
'Start capturing audio
Using audioStream = source.Start()
'Simply copy the data from the stream down to the file
Dim count As Integer, totalCount As Integer = 0
count = audioStream.Read(buffer, 0, buffer.Length)
Do While count > 0 AndAlso totalCount < recordingLength
fileStream.Write(buffer, 0, count)
totalCount += count
count = audioStream.Read(buffer, 0, buffer.Length)
Loop
End Using
End Using
RaiseEvent FinishedRecording(Nothing, Nothing)
End Using
End Sub
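One detail of the copy loop in RecordAudio: it reads in 1024-byte chunks and only compares totalCount with recordingLength after a chunk has been read, so the file can end up slightly longer than the length written into the WAV header. Below is a minimal sketch of a capped copy that works on any readable Stream. CopyAtMost is a hypothetical helper, not part of the Kinect SDK or the built-in demo:

```csharp
using System;
using System.IO;

static class BoundedCopySketch
{
    // Copies at most maxBytes from source to destination and returns the
    // number of bytes actually copied (less if the source ends early).
    public static int CopyAtMost(Stream source, Stream destination, int maxBytes)
    {
        var buffer = new byte[1024];
        int total = 0;
        while (total < maxBytes)
        {
            // Never ask for more than the bytes still needed.
            int toRead = Math.Min(buffer.Length, maxBytes - total);
            int count = source.Read(buffer, 0, toRead);
            if (count <= 0)
                break; // end of stream
            destination.Write(buffer, 0, count);
            total += count;
        }
        return total;
    }
}
```

With this helper, the body of the inner using block could be reduced to a single call such as CopyAtMost(audioStream, fileStream, recordingLength), assuming the Kinect audio stream behaves like an ordinary Stream.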
So we've recorded the audio, saved it, and fired off an event that said we're done—let's hook into it. We'll wire up that event in the MainWindow constructor:
C#
public MainWindow()
{
InitializeComponent();
FinishedRecording += new RoutedEventHandler(MainWindow_FinishedRecording);
}
Visual Basic
Public Sub New()
InitializeComponent()
AddHandler FinishedRecording, AddressOf MainWindow_FinishedRecording
End Sub
Since that event will return on a non-UI thread, we'll need to use the Dispatcher to get us back on a UI thread so we can reenable those buttons:
C#
void MainWindow_FinishedRecording(object sender, RoutedEventArgs e)
{
Dispatcher.BeginInvoke(new ThreadStart(ReenableButtons));
}
private void ReenableButtons()
{
RecordButton.IsEnabled = true;
PlayButton.IsEnabled = true;
}
Visual Basic
Private Sub MainWindow_FinishedRecording(sender As Object, e As RoutedEventArgs)
Dispatcher.BeginInvoke(New ThreadStart(AddressOf ReenableButtons))
End Sub
Private Sub ReenableButtons()
RecordButton.IsEnabled = True
PlayButton.IsEnabled = True
End Sub
And finally, we'll make the MediaElement play back the audio we just saved! Before playing, we'll verify both that a recording has been made and that the file exists:
C#
private void PlayButton_Click(object sender, RoutedEventArgs e)
{
if (!string.IsNullOrEmpty(_lastRecordedFileName) && File.Exists(_lastRecordedFileName))
{
audioPlayer.Source = new Uri(_lastRecordedFileName, UriKind.RelativeOrAbsolute);
audioPlayer.LoadedBehavior = MediaState.Play;
audioPlayer.UnloadedBehavior = MediaState.Close;
}
}
Visual Basic
Private Sub PlayButton_Click(sender As Object, e As RoutedEventArgs)
If (Not String.IsNullOrEmpty(_lastRecordedFileName)) AndAlso File.Exists(_lastRecordedFileName) Then
audioPlayer.Source = New Uri(_lastRecordedFileName, UriKind.RelativeOrAbsolute)
audioPlayer.LoadedBehavior = MediaState.Play
audioPlayer.UnloadedBehavior = MediaState.Close
End If
End Sub
To do speech recognition, we need to bring in the speech recognition namespaces from the speech SDK:
C#
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;
Visual Basic
Imports Microsoft.Speech.AudioFormat
Imports Microsoft.Speech.Recognition
In VB we'll also need to add the MTAThread attribute to Sub Main. C# does not need this.
Visual Basic
<MTAThread()> _
Shared Sub Main(ByVal args() As String)
Next, we need to set up the KinectAudioSource in a way that's compatible with speech recognition:
C#
using (var source = new KinectAudioSource())
{
source.FeatureMode = true;
source.AutomaticGainControl = false; //Important to turn this off for speech recognition
source.SystemMode = SystemMode.OptibeamArrayOnly; //No AEC for this sample
}
Visual Basic
Using source = New KinectAudioSource
source.FeatureMode = True
source.AutomaticGainControl = False 'Important to turn this off for speech recognition
source.SystemMode = SystemMode.OptibeamArrayOnly 'No AEC for this sample
End Using
With that in place, we can initialize the SpeechRecognitionEngine to use the Kinect recognizer, which was downloaded earlier:
C#
// Where and FirstOrDefault are LINQ extension methods (add "using System.Linq;")
private const string RecognizerId = "SR_MS_en-US_Kinect_10.0";
RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers().Where(r => r.Id == RecognizerId).FirstOrDefault();
Visual Basic
' Where and FirstOrDefault are LINQ extension methods (add "Imports System.Linq")
Private Const RecognizerId As String = "SR_MS_en-US_Kinect_10.0"
Dim ri As RecognizerInfo = SpeechRecognitionEngine.InstalledRecognizers().Where(Function(r) r.Id = RecognizerId).FirstOrDefault()
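FirstOrDefault returns null when no installed recognizer matches, so ri is worth checking before ri.Id is used. Below is a minimal sketch of that check, with plain strings standing in for RecognizerInfo objects so that it runs without the Speech SDK. FindRecognizerId and Describe are hypothetical helpers, not SDK calls:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RecognizerLookupSketch
{
    // Returns the matching ID, or null when nothing in the list matches.
    public static string FindRecognizerId(IEnumerable<string> installedIds, string wantedId)
    {
        return installedIds.FirstOrDefault(id => id == wantedId);
    }

    // Turns the lookup result into a message instead of failing later on a null reference.
    public static string Describe(IEnumerable<string> installedIds, string wantedId)
    {
        string found = FindRecognizerId(installedIds, wantedId);
        return found == null
            ? "Could not find recognizer: " + wantedId
            : "Using recognizer: " + found;
    }
}
```

In the real code the equivalent guard would be a null test on ri right after the lookup, with a message to the user and an early return if the Kinect recognizer is not installed.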
Next, a "grammar" needs to be set up, which specifies the words the speech recognition engine should listen for. The following code creates a grammar for the words "red", "green" and "blue".
C#
using (var sre = new SpeechRecognitionEngine(ri.Id))
{
var colors = new Choices();
colors.Add("red");
colors.Add("green");
colors.Add("blue");
var gb = new GrammarBuilder();
//Specify the culture to match the recognizer in case we are running in a different culture.
gb.Culture = ri.Culture;
gb.Append(colors);
// Create the actual Grammar instance, and then load it into the speech recognizer.
var g = new Grammar(gb);
sre.LoadGrammar(g);
}
Visual Basic
Using sre = New SpeechRecognitionEngine(ri.Id)
Dim colors = New Choices
colors.Add("red")
colors.Add("green")
colors.Add("blue")
Dim gb = New GrammarBuilder
'Specify the culture to match the recognizer in case we are running in a different culture
gb.Culture = ri.Culture
gb.Append(colors)
' Create the actual Grammar instance, and then load it into the speech recognizer.
Dim g = New Grammar(gb)
sre.LoadGrammar(g)
End Using
Next, several events are hooked up so you can be notified when a word is recognized, hypothesized, or rejected:
C#
sre.SpeechRecognized += SreSpeechRecognized;
sre.SpeechHypothesized += SreSpeechHypothesized;
sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;
Visual Basic
AddHandler sre.SpeechRecognized, AddressOf SreSpeechRecognized
AddHandler sre.SpeechHypothesized, AddressOf SreSpeechHypothesized
AddHandler sre.SpeechRecognitionRejected, AddressOf SreSpeechRecognitionRejected
Finally, the audio stream source from the Kinect is applied to the speech recognition engine:
C#
using (Stream s = source.Start())
{
sre.SetInputToAudioStream(s,
new SpeechAudioFormatInfo(
EncodingFormat.Pcm, 16000, 16, 1,
32000, 2, null));
Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");
sre.RecognizeAsync(RecognizeMode.Multiple);
Console.ReadLine();
Console.WriteLine("Stopping recognizer ...");
sre.RecognizeAsyncStop();
}
Visual Basic
Using s As Stream = source.Start()
sre.SetInputToAudioStream(s, New SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, Nothing))
Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop")
sre.RecognizeAsync(RecognizeMode.Multiple)
Console.ReadLine()
Console.WriteLine("Stopping recognizer ...")
sre.RecognizeAsyncStop()
End Using
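The last two numeric arguments passed to SpeechAudioFormatInfo above (32000 and 2) follow from the first three (16000 samples per second, 16 bits per sample, 1 channel). Below is a small sketch of that arithmetic for uncompressed PCM. The helper names are hypothetical:

```csharp
using System;

static class PcmFormatSketch
{
    // Bytes in one sample frame: one sample for every channel.
    public static int BlockAlign(int channels, int bitsPerSample)
    {
        return channels * bitsPerSample / 8;
    }

    // Bytes of audio produced per second of recording.
    public static int AverageBytesPerSecond(int samplesPerSecond, int channels, int bitsPerSample)
    {
        return samplesPerSecond * BlockAlign(channels, bitsPerSample);
    }
}
```

For the format used here, BlockAlign(1, 16) is 2 and AverageBytesPerSecond(16000, 1, 16) is 32000, the same bytes-per-second figure that RecordAudio uses to size the recording.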
The event handlers specified earlier display information based on the result of the user's speech being recognized:
C#
static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
Console.WriteLine("\nSpeech Rejected");
if (e.Result != null)
DumpRecordedAudio(e.Result.Audio);
}
static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
Console.Write("\rSpeech Hypothesized: \t{0}\tConf:\t{1}", e.Result.Text, e.Result.Confidence);
}
static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
}
private static void DumpRecordedAudio(RecognizedAudio audio)
{
if (audio == null)
return;
int fileId = 0;
string filename;
while (File.Exists((filename = "RetainedAudio_" + fileId + ".wav")))
fileId++;
Console.WriteLine("\nWriting file: {0}", filename);
using (var file = new FileStream(filename, System.IO.FileMode.CreateNew))
audio.WriteToWaveStream(file);
}
Visual Basic
Private Shared Sub SreSpeechRecognitionRejected(ByVal sender As Object, ByVal e As SpeechRecognitionRejectedEventArgs)
Console.WriteLine(vbLf & "Speech Rejected")
If e.Result IsNot Nothing Then
DumpRecordedAudio(e.Result.Audio)
End If
End Sub
Private Shared Sub SreSpeechHypothesized(ByVal sender As Object, ByVal e As SpeechHypothesizedEventArgs)
Console.Write(vbCr & "Speech Hypothesized: " & vbTab & "{0}" & vbTab & "Conf:" & vbTab & "{1}", e.Result.Text, e.Result.Confidence)
End Sub
Private Shared Sub SreSpeechRecognized(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs)
Console.WriteLine(vbLf & "Speech Recognized: " & vbTab & "{0}", e.Result.Text)
End Sub
Private Shared Sub DumpRecordedAudio(ByVal audio As RecognizedAudio)
If audio Is Nothing Then
Return
End If
Dim fileId As Integer = 0
Dim filename As String
filename = "RetainedAudio_" & fileId & ".wav"
Do While File.Exists(filename)
fileId += 1
filename = "RetainedAudio_" & fileId & ".wav"
Loop
Console.WriteLine(vbLf & "Writing file: {0}", filename)
Using file = New FileStream(filename, System.IO.FileMode.CreateNew)
audio.WriteToWaveStream(file)
End Using
End Sub
In the case of a word being rejected, the audio is written out to a WAV file so it can be listened to later.
We've created an application that can record audio for a variable amount of time with Kinect!
This is awesome!!
Great!
But how about other languages? Like German, French or Spanish?
Are these supported?
How about a brief code sample of how general dictation might be used? When I try to modify the sample code to add
gb.AddDictation()
It crashes on
sre.LoadGrammar(g)
I've searched high and low for a solution but it appears this is a general issue (that the dictation stuff doesn't work) with the speech API so why is it there? Any help is much appreciated.
Thanks
typo: visaul -> visual
@George Birbilis: fixed the typo
@TheZar: are you using the x86 or x64 speech APIs?
Very good the tutorial but, do you have the code for Speech Recognition? Thanks
Is there any good resource out there for learning the SRGS XML format? The W3C specification is too.. specificationy, and all the tutorials I've found so far deal with the BNF format rather than the XML format.
Hi, thanks for sharing us such a good tutorial. But I personally find it is not so difficult to record streaming audio from microphone by standalone audio recorders, not built-in ones.
Hi,
I'm trying to get both speech recognition and Text to speech to work on a WPF app (C#)
I have the Recognition down but the synthesizer part keeps giving an error of "No voice installed on the system or none available with the current security setting."
I have both "Microsoft Speech Platform - Software Development Kit (SDK) (Version 10.2)" and "Microsoft Speech Platform - Server Runtime (Version 10.2)" in X86 and X64 installed on my system.
Can anyone tell me whats wrong? I would really really appreciate it.
Thanks,
Hiva
I am trying to add speech recognition to a WPF C# app. I am receiving video, skeletal, and depth data correctly, but whenever I start capturing the audio I receive the exception below. I can run the demo above correctly. Is there a reference or an extra step needed when using WPF?
System.InvalidCastException was unhandled
Message=Unable to cast COM object of type 'System.__ComObject' to interface type 'Microsoft.Research.Kinect.Audio.IMediaObject'. This operation failed because the QueryInterface call on the COM component for the interface with IID '{D8AD0F58-5494-4102-97C5-EC798E59BCF4}' failed due to the following error: No such interface supported (Exception from HRESULT: 0x80004002 (E_NOINTERFACE)).
Source=mscorlib
StackTrace:
at System.StubHelpers.StubHelpers.GetCOMIPFromRCW(Object objSrc, IntPtr pCPCMD, Boolean& pfNeedsRelease)
at Microsoft.Research.Kinect.Audio.IMediaObject.ProcessOutput(Int32 dwFlags, Int32 cOutputBufferCount, DMO_OUTPUT_DATA_BUFFER[] pOutputBuffers, Int32& pdwStatus)
at Microsoft.Research.Kinect.Audio.KinectAudioStream.RunCapture(Object notused)
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart(Object obj)
InnerException:
For some reason I only have the Microsoft Lightweight Speech Recognizer v11.0 (SR_MS_ZXX_Lightweight_v11.0) showing up as an available speech recognizer. I've double-checked that I have everything installed correctly, and I'm referencing C:\Program Files\Microsoft SDKs\Speech\v11.0\Assembly\Microsoft.Speech.dll. Any ideas why I don't see the Kinect recognizer?