Content Archived

This content is no longer current. Our recommendation for up to date content: http://channel9.msdn.com/Series/KinectQuickstart/Audio-Fundamentals

Kinect for Windows SDK Beta 2 Quickstarts

Audio Fundamentals (Beta 2 SDK)

Jun 16, 2011 at 10:15AM

by Clint Rutkas, Dan Fernandez

Description

Update: Kinect for Windows SDK v1 Quickstart Series Now Available (Feb 1st)

Please use the newly updated Kinect for Windows SDK Quickstart series. The content below will only work with the Beta 2 version of the Kinect for Windows SDK.

This video covers the basics of reading audio data from the Kinect microphone array, using a demo adapted from the built-in audio recorder. The video also covers speech recognition using Kinect. You may find it easier to follow along by downloading the Kinect for Windows SDK Quickstarts samples and slides that have been updated for Beta 2 (Nov, 2011).

[00:35] Kinect microphone information
[01:10] Audio data
[02:15] Speech recognition information
[05:08] Recording audio
[08:17] Speech recognition demo

Updates for Kinect for Windows SDK Beta 2 (Nov, 2011)

The video has not been updated for Beta 2, but the following changes have been made:

Beta 2 now enables you to record audio on a Single-Threaded Apartment (STA) thread, the default thread that is used for WPF applications. Previously, you had to create a new thread marked as a Multi-Threaded Apartment (MTA) for audio processing to work.
Beta 2 includes a new WPF audio example, KinectAudioDemo, that demonstrates speech recognition and calculating the angle of the current sound source.

Setup

The steps below assume you have set up your development environment as explained in the "Setting Up Your Development Environment" video.

Task: Designing Your UI

We’ll add a Slider, two Button controls, and a MediaElement for playback, and we'll use some stack panels to be sure everything lines up nicely:

XAML

<Window x:Class="AudioRecorder.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Audio Recorder Sample" Height="159" Width="525">
    <Grid>
        <StackPanel>
            <StackPanel Orientation="Horizontal">
                <Label Content="Seconds to Record: " />
                <Label Content="{Binding ElementName=RecordForTimeSpan, Path=Value}" />
            </StackPanel>
            <Slider Name="RecordForTimeSpan" Minimum="1"  Maximum="25" IsSnapToTickEnabled="True" />
            <StackPanel Orientation="Horizontal" HorizontalAlignment="Center">
                <Button Content="Record" Height="50" Width="100" Name="RecordButton" />
                <Button Content="Play" Height="50" Width="100" Name="PlayButton" />
            </StackPanel>
            <MediaElement Name="audioPlayer" />
        </StackPanel>
    </Grid>
</Window>

Creating Click events

For each button, we'll want to create a click event. Go to the Properties window (F4), select the RecordButton, select the Events tab, and double-click the Click event to create the RecordButton_Click event. Do the same for the PlayButton so we have the PlayButton_Click event wired up as well.

Task: Working with the KinectAudioSource

The first task is to add in the Kinect Audio library:

C#

using Microsoft.Research.Kinect.Audio;

Visual Basic

Imports Microsoft.Research.Kinect.Audio

Synchronous and asynchronous recording

There are two ways we can record audio. You can record synchronously, meaning that the UI thread is in effect “frozen” until the recording finishes. Alternatively, you can record on a separate thread so that the UI thread remains responsive to events while the recording happens in parallel. Our sample includes both methods so you can choose the one your application requires.

We’ll declare variables to hold the amount of time to record and the file name of the recording. To support asynchronous recording, we’ll also declare the FinishedRecording event, which notifies the UI thread that we're done recording:

C#

double _amountOfTimeToRecord;
string _lastRecordedFileName;
private event RoutedEventHandler FinishedRecording;

Visual Basic

Private _amountOfTimeToRecord As Double
Private _lastRecordedFileName As String
Private Event FinishedRecording As RoutedEventHandler

Next we’ll create the RecordAudio method that will do the actual audio recording.

C#

private void RecordAudio()
{
}

Visual Basic

Private Sub RecordAudio()
End Sub

To create threads, we'll add in the System.Threading namespace:

C#

using System.Threading;

Visual Basic

Imports System.Threading

Now we'll do some simple UI management and create the thread in the RecordButton_Click event. First we'll disable the two buttons, read the recording length from the slider, and create a unique file name.

Then we start the RecordAudio method on a new thread so the UI stays responsive. To record synchronously instead, call RecordAudio() directly in place of the three thread lines; the UI will be frozen until the recording finishes:

C#

private void RecordButton_Click(object sender, RoutedEventArgs e)
{
    RecordButton.IsEnabled = false;
    PlayButton.IsEnabled = false;
    _amountOfTimeToRecord = RecordForTimeSpan.Value; 
    _lastRecordedFileName = DateTime.Now.ToString("yyyyMMddHHmmss") + "_wav.wav";
            
    var t = new Thread(new ThreadStart(RecordAudio));
    t.SetApartmentState(ApartmentState.MTA);
    t.Start();
}

Visual Basic

Private Sub RecordButton_Click(ByVal sender As Object, ByVal e As RoutedEventArgs)

    RecordButton.IsEnabled = False
    PlayButton.IsEnabled = False
    _amountOfTimeToRecord = RecordForTimeSpan.Value
    _lastRecordedFileName = Date.Now.ToString("yyyyMMddHHmmss") & "_wav.wav"

    Dim t = New Thread(New ThreadStart(AddressOf RecordAudio))
    t.SetApartmentState(ApartmentState.MTA)
    t.Start()

End Sub
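
The asynchronous pattern used above (start the work on a background thread, then raise an event when it finishes) can be sketched without the Kinect or WPF dependencies. This is a minimal illustration, not the sample's code: the BackgroundWorkSketch class and its Run method are hypothetical names, the work delegate stands in for RecordAudio, and the apartment-state call is left out to keep the sketch portable.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the window class: runs work on a background
// thread and raises FinishedRecording when that work completes.
public class BackgroundWorkSketch
{
    public event EventHandler FinishedRecording;

    // Starts 'work' (a stand-in for RecordAudio) on a new thread and
    // returns immediately so the calling thread stays responsive.
    public Thread Run(Action work)
    {
        var t = new Thread(() =>
        {
            work();

            // The event is raised on the background thread, which is why a
            // UI handler has to marshal back to the UI thread afterwards.
            var handler = FinishedRecording;
            if (handler != null)
                handler(this, EventArgs.Empty);
        });
        t.Start();
        return t;
    }
}
```

A caller subscribes to FinishedRecording before calling Run, in the same way the window wires up its handler before the Record button is clicked.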

Task: Capturing Audio Data

From here, this sample and the built-in sample are pretty much the same. There are only three differences: the FinishedRecording event, a dynamic recording length, and the dynamic file name. Note that the WriteWavHeader function is exactly the same as the one in the built-in demo. Since we use several types of streams, we'll add the System.IO namespace:

C#

using System.IO;

Visual Basic

Imports System.IO

The entire RecordAudio method:

C#

private void RecordAudio()
{
    using (var source = new KinectAudioSource())
    {
        //Bytes to record: whole seconds x 2 bytes per 16-bit sample x 16,000 samples per second
        var recordingLength = (int) _amountOfTimeToRecord * 2 * 16000;
        var buffer = new byte[1024];
        source.SystemMode = SystemMode.OptibeamArrayOnly;
        using (var fileStream = new FileStream(_lastRecordedFileName, FileMode.Create))
        {
            WriteWavHeader(fileStream, recordingLength);

            //Start capturing audio                               
            using (var audioStream = source.Start())
            {
                //Copy the data from the stream to the file, never writing more than the header declares
                int count, totalCount = 0;
                while (totalCount < recordingLength &&
                       (count = audioStream.Read(buffer, 0, Math.Min(buffer.Length, recordingLength - totalCount))) > 0)
                {
                    fileStream.Write(buffer, 0, count);
                    totalCount += count;
                }
            }
        }

        if (FinishedRecording != null)
            FinishedRecording(null, null);
    }
}

Visual Basic

Private Sub RecordAudio()
    Using source = New KinectAudioSource

        Dim recordingLength = CInt(Fix(_amountOfTimeToRecord)) * 2 * 16000
        Dim buffer = New Byte(1023) {}

        source.SystemMode = SystemMode.OptibeamArrayOnly

        Using fileStream = New FileStream(_lastRecordedFileName, FileMode.Create)

            WriteWavHeader(fileStream, recordingLength)

            'Start capturing audio                               
            Using audioStream = source.Start()

                'Copy the data from the stream to the file, never writing more than the header declares
                Dim count As Integer, totalCount As Integer = 0
                Do While totalCount < recordingLength

                    count = audioStream.Read(buffer, 0, Math.Min(buffer.Length, recordingLength - totalCount))
                    If count <= 0 Then Exit Do

                    fileStream.Write(buffer, 0, count)
                    totalCount += count

                Loop

            End Using

        End Using

        RaiseEvent FinishedRecording(Nothing, Nothing)

    End Using

End Sub
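
RecordAudio calls WriteWavHeader, which is not reproduced in this article. As a rough idea of what such a function does, here is a minimal sketch that writes a standard 44-byte RIFF/WAVE header for 16 kHz, 16-bit, mono PCM audio. It is an illustration written for this walkthrough, not the built-in demo's code: the WavHeaderSketch class and the RecordingLengthInBytes helper are hypothetical names, and the audio format is assumed from the "2 * 16000" arithmetic in RecordAudio.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavHeaderSketch
{
    // Assumed format: 16 kHz, 16-bit, mono PCM.
    const int SampleRate = 16000;
    const short BitsPerSample = 16;
    const short Channels = 1;

    // Number of audio bytes needed for the given whole seconds of recording.
    public static int RecordingLengthInBytes(int seconds)
    {
        return seconds * (BitsPerSample / 8) * Channels * SampleRate;
    }

    // Writes a 44-byte RIFF/WAVE header describing dataLength bytes of PCM audio.
    public static void WriteWavHeader(Stream stream, int dataLength)
    {
        short blockAlign = (short)(Channels * BitsPerSample / 8); // bytes per sample frame
        int avgBytesPerSec = SampleRate * blockAlign;

        // The writer is deliberately not disposed: disposing it would also
        // close the caller's stream, which still needs the audio data.
        var writer = new BinaryWriter(stream);
        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        writer.Write(36 + dataLength);   // RIFF chunk size: remaining header bytes + data
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));
        writer.Write(Encoding.ASCII.GetBytes("fmt "));
        writer.Write(16);                // size of the PCM format chunk
        writer.Write((short)1);          // format tag 1 = PCM
        writer.Write(Channels);
        writer.Write(SampleRate);
        writer.Write(avgBytesPerSec);
        writer.Write(blockAlign);
        writer.Write(BitsPerSample);
        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(dataLength);        // number of audio bytes that follow
        writer.Flush();
    }
}
```

Because the header states the data length up front, the number of bytes written after it should match that length, which is why the recording length is computed before capture starts.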

Task: Playing Back the Audio We Just Captured

So we've recorded the audio, saved it, and fired off an event that said we're done—let's hook into it. We'll wire up that event in the MainWindow constructor:

C#

public MainWindow()
{
    InitializeComponent();

    FinishedRecording += new RoutedEventHandler(MainWindow_FinishedRecording);
}

Visual Basic

Public Sub New()
    InitializeComponent()

    AddHandler FinishedRecording, AddressOf MainWindow_FinishedRecording
End Sub

Since that event is raised on a non-UI thread, we'll need to use the Dispatcher to get back onto the UI thread so we can re-enable those buttons:

C#

void MainWindow_FinishedRecording(object sender, RoutedEventArgs e)
{
    Dispatcher.BeginInvoke(new ThreadStart(ReenableButtons));
}

private void ReenableButtons()
{
    RecordButton.IsEnabled = true;
    PlayButton.IsEnabled = true;
}

Visual Basic

Private Sub MainWindow_FinishedRecording(sender As Object, e As RoutedEventArgs)
    Dispatcher.BeginInvoke(New ThreadStart(AddressOf ReenableButtons))
End Sub

Private Sub ReenableButtons()
    RecordButton.IsEnabled = True
    PlayButton.IsEnabled = True
End Sub

And finally, we'll make the MediaElement play back the audio we just saved! We'll also verify both that the user recorded some audio and that the file exists:

C#

private void PlayButton_Click(object sender, RoutedEventArgs e)
{
    if (!string.IsNullOrEmpty(_lastRecordedFileName) && File.Exists(_lastRecordedFileName))
    {
        audioPlayer.Source = new Uri(_lastRecordedFileName, UriKind.RelativeOrAbsolute);
        audioPlayer.LoadedBehavior = MediaState.Play;
        audioPlayer.UnloadedBehavior = MediaState.Close;
    }
}

Visual Basic

Private Sub PlayButton_Click(sender As Object, e As RoutedEventArgs)

    If (Not String.IsNullOrEmpty(_lastRecordedFileName)) AndAlso File.Exists(_lastRecordedFileName) Then

        audioPlayer.Source = New Uri(_lastRecordedFileName, UriKind.RelativeOrAbsolute)
        audioPlayer.LoadedBehavior = MediaState.Play
        audioPlayer.UnloadedBehavior = MediaState.Close

    End If

End Sub

Task: Speech Recognition

To do speech recognition, we need to bring in the speech recognition namespaces from the speech SDK:

C#

using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;

Visual Basic

Imports Microsoft.Speech.AudioFormat
Imports Microsoft.Speech.Recognition

In VB, we'll also need to mark Sub Main with the MTAThread attribute. C# does not need this.

Visual Basic

<MTAThread()> _
Shared Sub Main(ByVal args() As String)

Next, we need to set up the KinectAudioSource in a way that's compatible with speech recognition:

C#

using (var source = new KinectAudioSource())
{
    source.FeatureMode = true;
    source.AutomaticGainControl = false; //Important to turn this off for speech recognition
    source.SystemMode = SystemMode.OptibeamArrayOnly; //No AEC for this sample
}

Visual Basic

Using source = New KinectAudioSource

source.FeatureMode = True
source.AutomaticGainControl = False 'Important to turn this off for speech recognition
source.SystemMode = SystemMode.OptibeamArrayOnly 'No AEC for this sample

End Using

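With that in place, we can initialize the SpeechRecognitionEngine to use the Kinect recognizer, which was downloaded earlier. The lookup uses LINQ (the System.Linq namespace), and FirstOrDefault returns null (Nothing in VB) if the recognizer is not installed, so check ri before using it: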

C#

private const string RecognizerId = "SR_MS_en-US_Kinect_10.0";
RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers().Where(r => r.Id == RecognizerId).FirstOrDefault();

Visual Basic

Private Const RecognizerId As String = "SR_MS_en-US_Kinect_10.0"
Dim ri As RecognizerInfo = SpeechRecognitionEngine.InstalledRecognizers().Where(Function(r) r.Id = RecognizerId).FirstOrDefault()

Next, a "grammar" needs to be set up, which specifies which words the speech recognition engine should listen for. The following code creates a grammar for the words "red", "green" and "blue".

C#

using (var sre = new SpeechRecognitionEngine(ri.Id))
{                
    var colors = new Choices();
    colors.Add("red");
    colors.Add("green");
    colors.Add("blue");
    var gb = new GrammarBuilder();
    //Specify the culture to match the recognizer in case we are running in a different culture.                                 
    gb.Culture = ri.Culture;
    gb.Append(colors);
  
    // Create the actual Grammar instance, and then load it into the speech recognizer.
    var g = new Grammar(gb);                  
    sre.LoadGrammar(g);
}

Visual Basic

Using sre = New SpeechRecognitionEngine(ri.Id)

Dim colors = New Choices
colors.Add("red")
colors.Add("green")
colors.Add("blue")

Dim gb = New GrammarBuilder
'Specify the culture to match the recognizer in case we are running in a different culture
gb.Culture = ri.Culture
gb.Append(colors)

' Create the actual Grammar instance, and then load it into the speech recognizer.
Dim g = New Grammar(gb)

sre.LoadGrammar(g)

End Using

Next, several events are hooked up so you can be notified when a word is recognized, hypothesized, or rejected:

C#

sre.SpeechRecognized += SreSpeechRecognized;
sre.SpeechHypothesized += SreSpeechHypothesized;
sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;

Visual Basic

AddHandler sre.SpeechRecognized, AddressOf SreSpeechRecognized
AddHandler sre.SpeechHypothesized, AddressOf SreSpeechHypothesized
AddHandler sre.SpeechRecognitionRejected, AddressOf SreSpeechRecognitionRejected

Finally, the audio stream source from the Kinect is applied to the speech recognition engine:

C#

using (Stream s = source.Start())
{
    sre.SetInputToAudioStream(s,
                              new SpeechAudioFormatInfo(
                                  EncodingFormat.Pcm, 16000, 16, 1,
                                  32000, 2, null));
    Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");
    sre.RecognizeAsync(RecognizeMode.Multiple);
    Console.ReadLine();
    Console.WriteLine("Stopping recognizer ...");
    sre.RecognizeAsyncStop();                       
}

Visual Basic

Using s As Stream = source.Start()

sre.SetInputToAudioStream(s, New SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, Nothing))

Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop")

sre.RecognizeAsync(RecognizeMode.Multiple)
Console.ReadLine()
Console.WriteLine("Stopping recognizer ...")
sre.RecognizeAsyncStop()

End Using
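
The numeric arguments passed to SpeechAudioFormatInfo above are not independent: the last two numbers (32000 and 2) follow from the first three (16000 samples per second, 16 bits per sample, 1 channel). The sketch below shows that arithmetic. PcmFormatSketch and its method names are hypothetical helpers written for illustration and are not part of the Speech SDK.

```csharp
using System;

public static class PcmFormatSketch
{
    // Bytes occupied by one sample frame (one sample for every channel).
    public static int BlockAlign(int bitsPerSample, int channels)
    {
        return channels * bitsPerSample / 8;
    }

    // Bytes of audio produced per second of uncompressed PCM.
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channels)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channels);
    }
}
```

For the Kinect format used here, BlockAlign(16, 1) is 2 and AverageBytesPerSecond(16000, 16, 1) is 32000, which matches the values in the call above. If the sample rate or bit depth changes, these two arguments need to change with it.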

The event handlers specified earlier display information based on the result of the user's speech being recognized:

C#

static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    Console.WriteLine("\nSpeech Rejected");
    if (e.Result != null)
        DumpRecordedAudio(e.Result.Audio);
}

static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    Console.Write("\rSpeech Hypothesized: \t{0}\tConf:\t{1}", e.Result.Text, e.Result.Confidence);
}

static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
}

private static void DumpRecordedAudio(RecognizedAudio audio)
{
    if (audio == null)
        return;

    int fileId = 0;
    string filename;
    while (File.Exists((filename = "RetainedAudio_" + fileId + ".wav")))
        fileId++;

    Console.WriteLine("\nWriting file: {0}", filename);
    using (var file = new FileStream(filename, System.IO.FileMode.CreateNew))
        audio.WriteToWaveStream(file);
}

Visual Basic

Private Shared Sub SreSpeechRecognitionRejected(ByVal sender As Object, ByVal e As SpeechRecognitionRejectedEventArgs)

     Console.WriteLine(vbLf & "Speech Rejected")
     If e.Result IsNot Nothing Then
          DumpRecordedAudio(e.Result.Audio)
     End If

End Sub
Private Shared Sub SreSpeechHypothesized(ByVal sender As Object, ByVal e As SpeechHypothesizedEventArgs)

     Console.Write(vbCr & "Speech Hypothesized: " & vbTab & "{0}" & vbTab & "Conf:" & vbTab & "{1}", e.Result.Text, e.Result.Confidence)

End Sub
Private Shared Sub SreSpeechRecognized(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs)

     Console.WriteLine(vbLf & "Speech Recognized: " & vbTab & "{0}", e.Result.Text)

End Sub

Private Shared Sub DumpRecordedAudio(ByVal audio As RecognizedAudio)
     If audio Is Nothing Then
          Return
     End If

     Dim fileId As Integer = 0
     Dim filename As String
     filename = "RetainedAudio_" & fileId & ".wav"
     Do While File.Exists(filename)
          fileId += 1
          filename = "RetainedAudio_" & fileId & ".wav"
     Loop

     Console.WriteLine(vbLf & "Writing file: {0}", filename)
     Using file = New FileStream(filename, System.IO.FileMode.CreateNew)
          audio.WriteToWaveStream(file)
     End Using

End Sub

In the case of a word being rejected, the audio is written out to a WAV file so it can be listened to later.

Recap

We've created an application that can record audio for a variable amount of time with Kinect!

The Discussion

Song

This is awesome!!

Last modified Jun 17, 2011 at 4:40PM
MikeH

Great!

But how about other languages? Like German, French or Spanish?

Are these supported?

Last modified Jun 18, 2011 at 11:45AM
TheZar

How about a brief code sample of how general dictation might be used? When I try to modify the sample code to add
gb.AddDictation()
It crashes on
sre.LoadGrammar(g)
I've searched high and low for a solution but it appears this is a general issue (that the dictation stuff doesn't work) with the speech API so why is it there? Any help is much appreciated.

Thanks

Last modified Jun 20, 2011 at 1:13PM
George Birbilis

typo: visaul -> visual

Last modified Jun 27, 2011 at 5:53AM
Clint

@George Birbilis: fixed the typo

Last modified Jun 28, 2011 at 4:40PM
Clint

@TheZar: are you using the x86 or x64 speech APIs?

Last modified Jun 28, 2011 at 4:41PM
nnarvaez

Very good the tutorial but, do you have the code for Speech Recognition? Thanks

Last modified Jul 02, 2011 at 9:50AM
Bas

Is there any good resource out there for learning the SRGS XML format? The W3C specification is too.. specificationy, and all the tutorials I've found so far deal with the BNF format rather than the XML format.

Last modified Jul 03, 2011 at 2:29AM
Cleveland

Hi, thanks for sharing us such a good tutorial. But I personally find it is not so difficult to record streaming audio from microphone by standalone audio recorders, not built-in ones.

Last modified Aug 11, 2011 at 2:20AM
Hiva Javaher

Hi,
I'm trying to get both speech recognition and Text to speech to work on a WPF app (C#)
I have the Recognition down but the synthesizer part keeps giving an error of "No voice installed on the system or none available with the current security setting."
I have both "Microsoft Speech Platform - Software Development Kit (SDK) (Version 10.2)" and "Microsoft Speech Platform - Server Runtime (Version 10.2)" in X86 and X64 installed on my system.

Can anyone tell me whats wrong? I would really really appreciate it.

Thanks,
Hiva

Last modified Aug 13, 2011 at 12:04AM
DarkImpetus

I am trying to add speech recognition to a WPF C# app. I am receiving video, skeletal, and depth data correctly, but whenever I start capturing the audio I receive the exception error below. I can run the demo above correctly. Is there a reference or an extra step needed when using WPF?

System.InvalidCastException was unhandled
  Message=Unable to cast COM object of type 'System.__ComObject' to interface type 'Microsoft.Research.Kinect.Audio.IMediaObject'. This operation failed because the QueryInterface call on the COM component for the interface with IID '{D8AD0F58-5494-4102-97C5-EC798E59BCF4}' failed due to the following error: No such interface supported (Exception from HRESULT: 0x80004002 (E_NOINTERFACE)).
  Source=mscorlib
  StackTrace:
       at System.StubHelpers.StubHelpers.GetCOMIPFromRCW(Object objSrc, IntPtr pCPCMD, Boolean& pfNeedsRelease)
       at Microsoft.Research.Kinect.Audio.IMediaObject.ProcessOutput(Int32 dwFlags, Int32 cOutputBufferCount, DMO_OUTPUT_DATA_BUFFER[] pOutputBuffers, Int32& pdwStatus)
       at Microsoft.Research.Kinect.Audio.KinectAudioStream.RunCapture(Object notused)
       at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart(Object obj)
  InnerException:

Last modified Sep 26, 2011 at 11:59AM
kramer

For some reason i only have the Microsoft Lightweight Speech Recognizer v11.0 (SR_MS_ZXX_Lightweight_v11.0) showing up as an available speech recognizer. I've double-checked that i have everything installed correctly, and i'm referencing the C:\Program Files\Microsoft SDKs\Speech\v11.0\Assembly\Microsoft.Speech.dll. Any ideas why i don't see the Kinect Recognizer?

Last modified Jan 20, 2012 at 2:07PM