"Speech recognition",
"multi-threading with Kinect",
and "making NtKinect's program as DLL"
are explained in
"NtKinect: How to recognize speech with Kinect V2".,
"NtKinect: How to run Kinect V2 in a multi-thread environment",
"NtKinect: How to make Kinect V2 program as DLL and use it from Unity",
respectively.
In this article, we will explain how to create a DLL file for speech recognition.
The project should have the following file structure. There may be folders and files other than those listed, but ignore them for now.
```
NtKinectDll3/
  NtKinectDll.sln
  x64/Release/
  NtKinectDll/
    dllmain.cpp
    NtKinect.h
    NtKinectDll.h
    NtKinectDll.cpp
    stdafx.cpp
    stdafx.h
    targetver.h
```
Copy KinectAudioStream.cpp, KinectAudioStream.h, and WaveFile.h to the folder where the project source files are located (NtKinectDll3/NtKinectDll/ in this example).
[Notice] KinectAudioStream.cpp is slightly modified so that it includes stdafx.h (a sketch of this change follows the listing below). The contents of the folder are now as follows.
```
NtKinectDll3/
  NtKinectDll.sln
  x64/Release/
  NtKinectDll/
    dllmain.cpp
    NtKinect.h
    NtKinectDll.h
    NtKinectDll.cpp
    stdafx.cpp
    stdafx.h
    targetver.h
    KinectAudioStream.cpp
    KinectAudioStream.h
    WaveFile.h
```
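The change to KinectAudioStream.cpp is minimal. The sketch below assumes the only modification is the added include at the top of the file; everything else is left as distributed.

```cpp
// KinectAudioStream.cpp (top of file) -- sketch of the modification.
#include "stdafx.h"   // added so the file builds against this project's precompiled header
// ... the original includes and the rest of KinectAudioStream.cpp remain unchanged ...
```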
"Solution Explorer" -> Right-click over the "Source Files" -> "Add" -> "Add existing item" -> select "KinectAudioStream.cpp"
"Solution Explorer"-> Right-click over the "Header File" -> "Add" -> "Add existing item" -> select "KinectAudioStream.h"
"Solution Explorer" -> Right-click over "header file" -> 「追加」 -> 「既存の項目の追加」 -> WaveFile.h を選択する
The three files have now been added to the project. (NtKinect.h has been part of the project from the beginning.)
"Configuration Properties" -> "VC++ directory" -> "Include directory" -> Add
$(ProgramW6432)\Microsoft SDKs\Speech\V11.0\Includeto the first place.
"Configuration Properties" -> "VC++ directory" -> "Library directory" -> Add
$(ProgramW6432)\Microsoft SDKs\Speech\V11.0\Libto the first place.
"Configuration Properties" -> "Linker" -> "General" -> "Input" -> Add.
sapi.lib
The following two libraries should already be set up, but please confirm.
Kinect20.lib
opencv_world310.lib
The NTKINECTDLL_API macro (the part shown in green letters in the original listing) handles DLL import/export and has been defined since the project was created. When NtKinectDll.h is included within this project it produces export declarations, and when it is included from other projects it produces import declarations.
The function prototypes (the part shown in blue letters) are the functions we define ourselves. The C++ compiler normally changes a function name into one that also encodes the return and argument types; this is called name mangling. To avoid mangling of the C++ function names, declare the prototypes inside extern "C" { }. This makes it possible to use the DLL from other languages.
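As an illustration of what extern "C" prevents, compare the exported symbol names below. These declarations are hypothetical (not part of NtKinectDll.h), and the mangled spelling is only an example, since it depends on the compiler; exported names can be inspected with a tool such as dumpbin /exports.

```cpp
// Hypothetical declarations for illustration only.
__declspec(dllexport) void* getKinectMangled(void);
// -> exported under a mangled name such as "?getKinectMangled@@YAPEAXXZ",
//    which encodes the return type and argument types.

extern "C" __declspec(dllexport) void* getKinectPlain(void);
// -> exported as plain "getKinectPlain", so C# [DllImport] (or C, etc.) can find it by name.
```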
To avoid name conflicts, we define a namespace NtKinectSpeech and declare the function prototypes and variables inside it.
Speech recognition always runs in a separate thread, and the recognized results are stored in speechQueue. A mutex is used for exclusive access whenever the queue is touched.
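The queue access pattern used in NtKinectDll.cpp (shown later) boils down to the following. This is a minimal, self-contained sketch with renamed variables, not the DLL code itself.

```cpp
#include <mutex>
#include <list>
#include <string>
#include <utility>

std::mutex mtx;                                           // plays the role of NtKinectSpeech::mutex
std::list<std::pair<std::wstring,std::wstring>> queue_;   // plays the role of speechQueue

// Producer side: the recognition thread pushes one (tag, item) pair.
void push_result(const std::pair<std::wstring,std::wstring>& p) {
  mtx.lock();
  queue_.push_back(p);
  mtx.unlock();
}

// Consumer side: the caller pops one entry under the same lock.
bool pop_result(std::pair<std::wstring,std::wstring>& p) {
  mtx.lock();
  bool empty = queue_.empty();
  if (!empty) { p = queue_.front(); queue_.pop_front(); }
  mtx.unlock();
  return !empty;
}
```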
NtKinectDll.h
```cpp
#ifdef NTKINECTDLL_EXPORTS
#define NTKINECTDLL_API __declspec(dllexport)
#else
#define NTKINECTDLL_API __declspec(dllimport)
#endif

#include <mutex>
#include <list>
#include <thread>

namespace NtKinectSpeech {
  extern "C" {
    NTKINECTDLL_API void* getKinect(void);
    NTKINECTDLL_API void initSpeech(void* kinect);
    NTKINECTDLL_API void setSpeechLang(void* kinect, wchar_t*, wchar_t*);
    NTKINECTDLL_API int speechQueueSize(void* kinect);
    NTKINECTDLL_API int getSpeech(void* kinect, wchar_t*& tagPtr, wchar_t*& itemPtr);
    NTKINECTDLL_API void destroySpeech(void* kinect);
  }
  std::mutex mutex;
  std::thread* speechThread;
  std::list<std::pair<std::wstring,std::wstring>> speechQueue;
  bool speechActive;

#define SPEECH_MAX_LENGTH 1024
  wchar_t tagBuffer[SPEECH_MAX_LENGTH];
  wchar_t itemBuffer[SPEECH_MAX_LENGTH];
}
```
"NTKINECTDLL_API" must be written at the beginning of the function declaration. This is a macro defined in NtKinectDll.h to facilitate export/import from DLL.
In the DLL, the object must be allocated on the heap. For this reason, the void* getKinect() function allocates an NtKinect instance on the heap and returns a pointer to it cast to void*.
When a DLL function is called, the pointer to the NtKinect object is passed as an argument of type void*. We cast it back to NtKinect* and call NtKinect's member functions through it; for example, the member function acquire() is invoked as (*kinect).acquire().
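Condensed from the NtKinectDll.cpp listing below: getKinect() is taken verbatim, while someFunction() is a hypothetical entry point added only to illustrate the cast-back pattern; the excerpt assumes the same includes as the full file.

```cpp
// getKinect(): allocate the NtKinect object on the heap and return it as an opaque pointer.
NTKINECTDLL_API void* getKinect(void) {
  NtKinect* kinect = new NtKinect;
  return static_cast<void*>(kinect);
}

// someFunction() is hypothetical: every other exported function casts the void* back like this.
NTKINECTDLL_API void someFunction(void* ptr) {
  NtKinect* kinect = static_cast<NtKinect*>(ptr);
  (*kinect).acquire();     // member functions are reached through the cast pointer
  // ... call other NtKinect members here ...
  (*kinect).release();
}
```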
On the Unity (C#) side, data is managed and may be moved by the garbage collector, so be careful when exchanging data between C# and C++.
The first argument of NtKinect's setSpeechLang() function is of string type, and the second argument is of wstring type. When a Unity (C#) string is passed to a DLL (C++) function, it arrives as wide characters (UTF-16) and can be used directly as a wstring (UTF-16) on the DLL (C++) side. When string data is needed, it must be converted from wstring (UTF-16) to string (UTF-8) in C++ with the WideCharToMultiByte() function. In the definition of void setSpeechLang(void*, wchar_t*, wchar_t*) in NtKinectDll.cpp, the first WideCharToMultiByte() call finds the number of bytes of the converted characters, and the second call converts the characters from UTF-16 to UTF-8 and writes them into langBuffer.
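Isolating just that conversion, the pattern looks like the sketch below. The helper name utf16_to_utf8() is hypothetical; the DLL performs these steps inline inside setSpeechLang().

```cpp
#include <windows.h>
#include <cstring>
#include <string>

// Minimal sketch of the two-call WideCharToMultiByte() pattern used in setSpeechLang().
std::string utf16_to_utf8(const wchar_t* wtext) {
  // 1st call: ask for the required buffer size in bytes.
  int len = WideCharToMultiByte(CP_UTF8, 0, wtext, -1, NULL, 0, NULL, NULL) + 1;
  char* buffer = new char[len];
  memset(buffer, '\0', len);
  // 2nd call: perform the actual UTF-16 -> UTF-8 conversion into the buffer.
  WideCharToMultiByte(CP_UTF8, 0, wtext, -1, buffer, len, NULL, NULL);
  std::string result(buffer);
  delete[] buffer;
  return result;
}
```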
When passing C++ wstring data to Unity (C#), you need to place the wchar_t data in memory on the C++ side that remains valid after the function returns (here, the statically allocated tagBuffer and itemBuffer) and pass its address. The getSpeech() function returns two wstrings, the tag and the item of the recognized word: C# passes two pointer references as arguments, and the C++ side writes the addresses of the wchar_t buffers into those references.
NtKinectDll.cpp
```cpp
#include "stdafx.h"
#include "NtKinectDll.h"

#define USE_THREAD
#define USE_SPEECH
#include "NtKinect.h"

using namespace std;

namespace NtKinectSpeech {
  NTKINECTDLL_API void* getKinect(void) {
    NtKinect* kinect = new NtKinect;
    return static_cast<void*>(kinect);
  }
  NTKINECTDLL_API void setSpeechLang(void* ptr, wchar_t* wlang, wchar_t* grxmlBuffer) {
    NtKinect *kinect = static_cast<NtKinect*>(ptr);
    if (wlang && grxmlBuffer) {
      // convert the UTF-16 language name to a UTF-8 string
      int len = WideCharToMultiByte(CP_UTF8,NULL,wlang,-1,NULL,0,NULL,NULL) + 1;
      char* langBuffer = new char[len];
      memset(langBuffer,'\0',len);
      WideCharToMultiByte(CP_UTF8,NULL,wlang,-1,langBuffer,len,NULL,NULL);
      string lang(langBuffer);
      delete[] langBuffer;   // free the temporary buffer to avoid a leak
      wstring grxml(grxmlBuffer);
      (*kinect).acquire();
      (*kinect).setSpeechLang(lang,grxml);
      (*kinect).release();
    }
  }
  // worker thread: keeps pushing recognition results into speechQueue
  void speechThreadFunc(NtKinect* kinect) {
    ERROR_CHECK(CoInitializeEx(NULL,COINIT_MULTITHREADED));
    (*kinect).acquire();
    (*kinect).startSpeech();
    (*kinect).release();
    while (speechActive) {
      pair<wstring,wstring> p;
      bool flag = (*kinect)._setSpeech(p);
      if (flag) {
        mutex.lock();
        speechQueue.push_back(p);
        mutex.unlock();
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
  }
  NTKINECTDLL_API void initSpeech(void* ptr) {
    NtKinect *kinect = static_cast<NtKinect*>(ptr);
    speechActive = true;
    speechThread = new std::thread(NtKinectSpeech::speechThreadFunc, kinect);
    return;
  }
  NTKINECTDLL_API int speechQueueSize(void* ptr) {
    int n=0;
    mutex.lock();
    n = (int) speechQueue.size();
    mutex.unlock();
    return n;
  }
  NTKINECTDLL_API int getSpeech(void* ptr, wchar_t*& tagPtr, wchar_t*& itemPtr) {
    NtKinect *kinect = static_cast<NtKinect*>(ptr);
    wmemset(tagBuffer,'\0',SPEECH_MAX_LENGTH);
    wmemset(itemBuffer,'\0',SPEECH_MAX_LENGTH);
    pair<wstring,wstring> p;
    mutex.lock();
    bool empty = speechQueue.empty();
    if (! empty) {
      p = speechQueue.front();
      speechQueue.pop_front();
    }
    mutex.unlock();
    if (!empty) {
      // use c_str(): passing a std::wstring object itself through wsprintf's varargs is undefined
      wsprintf(tagBuffer,L"%ls",p.first.c_str());
      wsprintf(itemBuffer,L"%ls",p.second.c_str());
    }
    tagPtr = tagBuffer;
    itemPtr = itemBuffer;
    return !empty;
  }
  NTKINECTDLL_API void destroySpeech(void* ptr) {
    speechActive = false;
    speechThread->join();
    NtKinect *kinect = static_cast<NtKinect*>(ptr);
    (*kinect).acquire();
    (*kinect).stopSpeech();
    (*kinect).release();
    delete speechThread;
    CoUninitialize();
  }
}
```
[Caution] (added Oct 7, 2017) If you encounter a "dllimport ..." error when building with Visual Studio 2017 Update 2, please refer to here and define NTKINECTDLL_EXPORTS in NtKinectDll.cpp.
Since the above zip file may not include the latest NtKinect.h, download the latest version from here and replace the old one with it.
Let's write a Unity program using the above DLL file to recognize speech with Kinect V2.
sapi.lib is used for speech recognition and is usually located in the folder "C:\Program Files\Microsoft SDKs\Speech\v11.0\Lib". It seems that the library only needs to be installed in the Windows 10 environment, and there is no need to put a copy under the individual Unity project.
From the menu at the top, "Game Object"-> "3D Object" -> "Cube"
From the menu at the top, "Assets" -> "Create" -> "C# Script" -> Filename is CubeBehaviour
C++ pointers are treated as System.IntPtr in C#.
Unity (C#) strings are managed, while the string and wstring of the DLL (C++) are unmanaged. When passing a Unity (C#) string as an argument to a DLL (C++) function, convert it to an unmanaged UTF-16 string on the heap using the Marshal.StringToHGlobalUni() function. This removes the danger of the data being moved by the C# garbage collector. When the unmanaged data is no longer needed, you must call the Marshal.FreeHGlobal() function to free the memory.
To return the two recognized strings (UTF-16) from the C++ side to the C# side, pointers declared with ref are passed as arguments. Since the memory returned is allocated on the C++ side, the C# side immediately converts it to a managed string with the Marshal.PtrToStringUni() function.
The character encoding of a Unity string (System.String) is UTF-16.
CubeBehaviour.cs
```csharp
using UnityEngine;
using System.Collections;
using System;
using System.Runtime.InteropServices;

public class CubeBehaviour : MonoBehaviour {
  [DllImport ("NtKinectDll")] private static extern IntPtr getKinect();
  [DllImport ("NtKinectDll")] private static extern void initSpeech(IntPtr kinect);
  [DllImport ("NtKinectDll")] private static extern void setSpeechLang(IntPtr kinect,IntPtr lang,IntPtr grxml);
  [DllImport ("NtKinectDll")] private static extern int getSpeech(IntPtr kinect,ref IntPtr tagPtr,ref IntPtr itemPtr);
  [DllImport ("NtKinectDll")] private static extern void destroySpeech(IntPtr kinect);

  private IntPtr kinect;

  void Start () {
    kinect = getKinect();
    IntPtr lang = Marshal.StringToHGlobalUni("ja-JP");              // "en-US"
    IntPtr grxml = Marshal.StringToHGlobalUni("Grammar_jaJP.grxml"); // "Grammar_enUS.grxml"
    setSpeechLang(kinect,lang,grxml);
    initSpeech(kinect);
    Marshal.FreeHGlobal(lang);
    Marshal.FreeHGlobal(grxml);
  }
  void Update () {
    IntPtr tagPtr = (IntPtr)0;
    IntPtr itemPtr = (IntPtr)0;
    int flag = getSpeech(kinect,ref tagPtr,ref itemPtr);
    string speechTag = Marshal.PtrToStringUni(tagPtr);
    string speechItem = Marshal.PtrToStringUni(itemPtr);
    if (flag > 0) {
      Debug.Log("tag = "+speechTag);
      Debug.Log("item = "+speechItem);
    }
    if (flag>0 && speechTag.CompareTo("RED")==0) {
      gameObject.GetComponent<Renderer>().material.color = new Color(1.0f, 0.0f, 0.0f, 1.0f);
    } else if (flag>0 && speechTag.CompareTo("GREEN")==0) {
      gameObject.GetComponent<Renderer>().material.color = new Color(0.0f, 1.0f, 0.0f, 1.0f);
    } else if (flag>0 && speechTag.CompareTo("BLUE")==0) {
      gameObject.GetComponent<Renderer>().material.color = new Color(0.0f, 0.0f, 1.0f, 1.0f);
    } else if (flag>0 && speechTag.CompareTo("EXIT")==0) {
      Application.Quit();
      UnityEditor.EditorApplication.isPlaying = false;
    }
  }
  void OnApplicationQuit() {
    destroySpeech(kinect);
  }
}
```
You only need one word-definition file for speech recognition, but put both the Japanese file "Grammar_jaJP.grxml" and the English file "Grammar_enUS.grxml" under the Unity project so that you can experiment with switching languages later.
Grammar_jaJP.grxml
```xml
<?xml version="1.0" encoding="utf-8" ?>
<grammar version="1.0" xml:lang="ja-JP" root="rootRule"
         tag-format="semantics/1.0-literals"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="rootRule">
    <one-of>
      <item>
        <tag>RED</tag>
        <one-of>
          <item> 赤 </item>
          <item> 赤色 </item>
        </one-of>
      </item>
      <item>
        <tag>GREEN</tag>
        <one-of>
          <item> 緑 </item>
          <item> 緑色 </item>
        </one-of>
      </item>
      <item>
        <tag>BLUE</tag>
        <one-of>
          <item> 青 </item>
          <item> 青色 </item>
        </one-of>
      </item>
      <item>
        <tag>EXIT</tag>
        <one-of>
          <item> 終わり </item>
          <item> 終了 </item>
        </one-of>
      </item>
    </one-of>
  </rule>
</grammar>
```
Grammar_enUS.grxml
```xml
<?xml version="1.0" encoding="utf-8" ?>
<grammar version="1.0" xml:lang="en-US" root="rootRule"
         tag-format="semantics/1.0-literals"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="rootRule">
    <one-of>
      <item>
        <tag>RED</tag>
        <one-of>
          <item> Red </item>
        </one-of>
      </item>
      <item>
        <tag>GREEN</tag>
        <one-of>
          <item> Green </item>
        </one-of>
      </item>
      <item>
        <tag>BLUE</tag>
        <one-of>
          <item> Blue </item>
        </one-of>
      </item>
      <item>
        <tag>EXIT</tag>
        <one-of>
          <item> Exit </item>
          <item> Quit </item>
          <item> Stop </item>
        </one-of>
      </item>
    </one-of>
  </rule>
</grammar>
```
[Notice] We generate an OpenCV window in the DLL to display the skeleton recognition state. Note that when the OpenCV window has focus, that is, when the Unity game window does not have focus, the Unity screen will not change. Click the top of the Unity window to give it focus, and then try the program's behaviour.
Edit CubeBehaviour.cs and change it as follows.
| from | to |
|---|---|
| "ja-JP" | "en-US" |
| "Grammar_jaJP.grxml" | "Grammar_enUS.grxml" |