Writing a gesture service with the Kinect for Windows SDK


After further experimenting with the Kinect SDK, it became obvious what needed to come next. If you create an application using the Kinect SDK, you will want to be able to control it using gestures (e.g. waving, swiping, motions to access menus). With this in mind, we decided to write a gesture service in C# that would analyse the gestures. This blog post outlines how we did this and how you can implement the same functionality.

To be able to recognize gestures, it is first important to understand what makes a gesture.

We concluded that gestures are made up of parts. Each part of a gesture is a specific movement that, when combined with other gesture parts, makes up the whole gesture. For example, the diagram below shows the two parts of a simple wave gesture and how they can be identified:

[Diagram: the two parts of a simple wave gesture]

However, this is not quite enough to recognize multiple gestures with any degree of accuracy. The problem occurs when you think about multiple gestures being recognized at once.

It’s not as simple as looking for the next part of the gesture. For example, consider the wave gesture shown above. If I were to drop my hand between the two parts, it would still be recognized as a wave, as both parts of the gesture were completed in the order they were defined; yet I clearly did not perform a wave. To solve this problem, we came up with three results that a gesture part can return when it checks to see if it has been completed or not. The diagram below shows these three results and the impact of returning each of them:

[Diagram: the three gesture part results and the impact of each]

A result of ‘Pausing’ allows the system to identify a movement that does not fulfil the gesture but could be a result of the user moving slowly. In short, the three results mean the following:

  • Fail – The gesture failed. The user moved in a way that was inconsistent with the gesture and, as such, the gesture will start again at the beginning.
  • Pausing – The user did not fail the gesture but they did not perform the next part either. The system will check again for this part after a short pause. A result of pausing can only be returned a limited number of times before the gesture fails and recognition starts again at the beginning (the Gesture class below gives up once its frame counter reaches 50).
  • Succeed – The user performed this part of the gesture. After a short pause the system will start looking for the next part of the gesture.
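
The GesturePartResult type itself is not listed in this post. A minimal sketch, assuming it is a plain enum whose member names are the ones used by the code later in this post (including the Suceed spelling), with doc comments that simply restate the list above:

```csharp
/// <summary>
/// Sketch of the result a gesture part can return.
/// Assumed shape: the original declaration is not shown in this post.
/// </summary>
public enum GesturePartResult
{
    /// <summary>The user moved in a way that is inconsistent with the gesture.</summary>
    Fail,

    /// <summary>The part was neither failed nor completed; check again shortly.</summary>
    Pausing,

    /// <summary>The user performed this part (spelling matches the code in this post).</summary>
    Suceed
}
```
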
The Solution

The overall gesture service is made up of three main parts, each of which is detailed below:

[Diagram: the three main parts of the gesture service]

The Gesture Controller:

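The gesture controller manages all of the gestures that a user can perform. It holds the list of gestures, passes each frame of skeleton data to every gesture, and re-raises a gesture's GestureRecognised event to the application. When any gesture is recognised, every gesture is reset so that recognition starts again from the first part. The code for this can be seen below: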

#region using...
using System;
using System.Collections.Generic;
using Microsoft.Research.Kinect.Nui;
#endregion

/// <summary>
/// The gesture controller
/// </summary>
public class GestureControler
{
    /// <summary>
    /// The list of all gestures we are currently looking for
    /// </summary>
    private List<Gesture> gestures = new List<Gesture>();

    /// <summary>
    /// Initializes a new instance of the <see cref="GestureControler"/> class.
    /// </summary>
    public GestureControler()
    {
    }

    /// <summary>
    /// Occurs when a gesture is recognised.
    /// </summary>
    public event EventHandler<GestureEventArgs> GestureRecognised;

    /// <summary>
    /// Updates all gestures.
    /// </summary>
    /// <param name="data">The skeleton data.</param>
    public void UpdateAllGestures(SkeletonData data)
    {
        foreach (Gesture gesture in this.gestures)
        {
            gesture.UpdateGesture(data);
        }
    }

    /// <summary>
    /// Adds the gesture.
    /// </summary>
    /// <param name="type">The gesture type.</param>
    /// <param name="gestureDefinition">The gesture definition.</param>
    public void AddGesture(GestureType type, IRelativeGestureSegment[] gestureDefinition)
    {
        Gesture gesture = new Gesture(type, gestureDefinition);
        gesture.GestureRecognised += new EventHandler<GestureEventArgs>(this.Gesture_GestureRecognised);
        this.gestures.Add(gesture);
    }

    /// <summary>
    /// Handles the GestureRecognised event of a gesture.
    /// </summary>
    /// <param name="sender">The source of the event.</param>
    /// <param name="e">The <see cref="KinectSkeltonTracker.GestureEventArgs"/> instance containing the event data.</param>
    private void Gesture_GestureRecognised(object sender, GestureEventArgs e)
    {
        // Pass the event on to whoever is listening to the controller
        if (this.GestureRecognised != null)
        {
            this.GestureRecognised(this, e);
        }

        // Reset every gesture so that recognition starts again from the first part
        foreach (Gesture g in this.gestures)
        {
            g.Reset();
        }
    }
}

A Gesture:

This controls all of the parts of a gesture and tracks which one is currently being checked. It contains an array of IRelativeGestureSegment objects, each an individual implementation of the IRelativeGestureSegment interface (described later). When a skeleton frame is created, it is passed to each Gesture, which then passes it to its current gesture segment. When the final segment returns a result of ‘Succeed’, the Gesture raises a gesture recognized event, which is caught by the gesture controller. The code for the Gesture class can be seen below:

#region using...
using System;
using Microsoft.Research.Kinect.Nui;
#endregion

/// <summary>
/// A single gesture
/// </summary>
public class Gesture
{
    /// <summary>
    /// The parts that make up this gesture
    /// </summary>
    private IRelativeGestureSegment[] gestureParts;

    /// <summary>
    /// The current gesture part that we are matching against
    /// </summary>
    private int currentGesturePart = 0;

    /// <summary>
    /// The number of frames to pause for when a pause is initiated
    /// </summary>
    private int pausedFrameCount = 10;

    /// <summary>
    /// The current frame that we are on
    /// </summary>
    private int frameCount = 0;

    /// <summary>
    /// Are we paused?
    /// </summary>
    private bool paused = false;

    /// <summary>
    /// The type of gesture that this is
    /// </summary>
    private GestureType type;

    /// <summary>
    /// Initializes a new instance of the <see cref="Gesture"/> class.
    /// </summary>
    /// <param name="type">The type of gesture.</param>
    /// <param name="gestureParts">The gesture parts.</param>
    public Gesture(GestureType type, IRelativeGestureSegment[] gestureParts)
    {
        this.gestureParts = gestureParts;
        this.type = type;
    }

    /// <summary>
    /// Occurs when a gesture is recognised.
    /// </summary>
    public event EventHandler<GestureEventArgs> GestureRecognised;

    /// <summary>
    /// Updates the gesture.
    /// </summary>
    /// <param name="data">The skeleton data.</param>
    public void UpdateGesture(SkeletonData data)
    {
        // While paused, count frames until the pause period has elapsed.
        // Note that the current part is still checked on every frame.
        if (this.paused)
        {
            if (this.frameCount == this.pausedFrameCount)
            {
                this.paused = false;
            }

            this.frameCount++;
        }

        GesturePartResult result = this.gestureParts[this.currentGesturePart].CheckGesture(data);
        if (result == GesturePartResult.Suceed)
        {
            if (this.currentGesturePart + 1 < this.gestureParts.Length)
            {
                // Move on to the next part after a short pause
                this.currentGesturePart++;
                this.frameCount = 0;
                this.pausedFrameCount = 10;
                this.paused = true;
            }
            else
            {
                // Final part completed: notify any listeners
                if (this.GestureRecognised != null)
                {
                    this.GestureRecognised(this, new GestureEventArgs(this.type, data.TrackingID, data.UserIndex));
                }

                // Always start again, even when nobody is listening
                this.Reset();
            }
        }
        else if (result == GesturePartResult.Fail || this.frameCount >= 50)
        {
            // Failed or timed out. ">=" is used because frameCount can be incremented
            // twice in one call (above and in the Pausing branch below), so it can
            // step past 50 without ever being equal to it.
            this.currentGesturePart = 0;
            this.frameCount = 0;
            this.pausedFrameCount = 5;
            this.paused = true;
        }
        else
        {
            // Pausing: keep waiting for the current part
            this.frameCount++;
            this.pausedFrameCount = 5;
            this.paused = true;
        }
    }

    /// <summary>
    /// Resets this instance.
    /// </summary>
    public void Reset()
    {
        this.currentGesturePart = 0;
        this.frameCount = 0;
        this.pausedFrameCount = 5;
        this.paused = true;
    }
}
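
The Gesture class relies on two types that are not listed in this post: GestureType and GestureEventArgs. Below is a minimal sketch of what they might look like, inferred from how they are used above. The constructor takes the gesture type, the skeleton's TrackingID and its UserIndex (both assumed here to be int). Only WaveLeft is confirmed by the code in this post; the other enum members and all of the property names are assumptions.

```csharp
using System;

/// <summary>
/// The gestures the service can report. Only WaveLeft appears in this post;
/// the remaining names are assumed from the gestures mentioned in the text.
/// </summary>
public enum GestureType
{
    WaveLeft,
    WaveRight,
    SwipeLeft,
    SwipeRight,
    Menu
}

/// <summary>
/// Event data raised when a gesture is recognised (assumed shape).
/// </summary>
public class GestureEventArgs : EventArgs
{
    public GestureEventArgs(GestureType type, int trackingId, int userId)
    {
        this.GestureType = type;
        this.TrackingId = trackingId;
        this.UserId = userId;
    }

    /// <summary>The gesture that was recognised.</summary>
    public GestureType GestureType { get; private set; }

    /// <summary>The tracking ID of the skeleton that performed it.</summary>
    public int TrackingId { get; private set; }

    /// <summary>The user index of the skeleton that performed it.</summary>
    public int UserId { get; private set; }
}
```

Deriving from EventArgs keeps the class usable with EventHandler<GestureEventArgs>, which is how both the Gesture and the controller declare their events.
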
The IRelativeGestureSegment:

This is the final part of the gesture service. It represents the individual segments that make up a gesture. Below are the IRelativeGestureSegment interface and its implementations for a wave gesture:

#region using...
using Microsoft.Research.Kinect.Nui;
#endregion

/// <summary>
/// Defines a single gesture segment which uses relative positioning
/// of body parts to detect a gesture
/// </summary>
public interface IRelativeGestureSegment
{
    /// <summary>
    /// Checks the gesture.
    /// </summary>
    /// <param name="skeleton">The skeleton.</param>
    /// <returns>GesturePartResult based on if the gesture part has been completed</returns>
    GesturePartResult CheckGesture(SkeletonData skeleton);
}
 
Wave gesture
#region using...
using Microsoft.Research.Kinect.Nui;
#endregion

/// <summary>
/// The first part of the wave left gesture
/// </summary>
public class WaveLeftSegment1 : IRelativeGestureSegment
{
    /// <summary>
    /// Checks the gesture.
    /// </summary>
    /// <param name="skeleton">The skeleton.</param>
    /// <returns>GesturePartResult based on if the gesture part has been completed</returns>
    public GesturePartResult CheckGesture(SkeletonData skeleton)
    {
        // hand above elbow
        if (skeleton.Joints[JointID.HandLeft].Position.Y > skeleton.Joints[JointID.ElbowLeft].Position.Y)
        {
            // hand right of elbow
            if (skeleton.Joints[JointID.HandLeft].Position.X > skeleton.Joints[JointID.ElbowLeft].Position.X)
            {
                return GesturePartResult.Suceed;
            }

            // hand has not dropped but is not quite where we expect it to be, pausing till next frame
            return GesturePartResult.Pausing;
        }

        // hand dropped - gesture fails
        return GesturePartResult.Fail;
    }
}

/// <summary>
/// The second part of the wave left gesture
/// </summary>
public class WaveLeftSegment2 : IRelativeGestureSegment
{
    /// <summary>
    /// Checks the gesture.
    /// </summary>
    /// <param name="skeleton">The skeleton.</param>
    /// <returns>GesturePartResult based on if the gesture part has been completed</returns>
    public GesturePartResult CheckGesture(SkeletonData skeleton)
    {
        // hand above elbow
        if (skeleton.Joints[JointID.HandLeft].Position.Y > skeleton.Joints[JointID.ElbowLeft].Position.Y)
        {
            // hand left of elbow
            if (skeleton.Joints[JointID.HandLeft].Position.X < skeleton.Joints[JointID.ElbowLeft].Position.X)
            {
                return GesturePartResult.Suceed;
            }

            // hand has not dropped but is not quite where we expect it to be, pausing till next frame
            return GesturePartResult.Pausing;
        }

        // hand dropped - gesture fails
        return GesturePartResult.Fail;
    }
}

NOTE: a wave gesture is made up of two parts that are repeated three times. For example, the code to create a new wave gesture would look like this (gestures is the gesture controller):

IRelativeGestureSegment[] waveLeftSegments = new IRelativeGestureSegment[6];
WaveLeftSegment1 waveLeftSegment1 = new WaveLeftSegment1();
WaveLeftSegment2 waveLeftSegment2 = new WaveLeftSegment2();
waveLeftSegments[0] = waveLeftSegment1;
waveLeftSegments[1] = waveLeftSegment2;
waveLeftSegments[2] = waveLeftSegment1;
waveLeftSegments[3] = waveLeftSegment2;
waveLeftSegments[4] = waveLeftSegment1;
waveLeftSegments[5] = waveLeftSegment2;
this.gestures.AddGesture(GestureType.WaveLeft, waveLeftSegments);
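
To experiment with the segment logic without a sensor attached, the same idea can be exercised against hand-written joint positions. The sketch below is not part of the downloadable source: FakeSkeleton, IFakeSegment, WaveSegment and WaveSketch.Run are hypothetical stand-ins for SkeletonData, IRelativeGestureSegment, the two wave segments and Gesture. It deliberately simplifies the pause handling to a single counter of frames spent on the current part, with the limit of 50 borrowed from the Gesture class.

```csharp
using System;

public enum GesturePartResult { Fail, Pausing, Suceed }

// Hypothetical stand-in for SkeletonData: only the two joints the wave needs.
public struct FakeSkeleton
{
    public float HandX, HandY, ElbowX, ElbowY;

    public FakeSkeleton(float handX, float handY, float elbowX, float elbowY)
    {
        HandX = handX;
        HandY = handY;
        ElbowX = elbowX;
        ElbowY = elbowY;
    }
}

// Hypothetical stand-in for IRelativeGestureSegment.
public interface IFakeSegment
{
    GesturePartResult CheckGesture(FakeSkeleton skeleton);
}

// Same rule as the wave segments: hand above elbow, then right (or left) of it.
public class WaveSegment : IFakeSegment
{
    private readonly bool handRightOfElbow;

    public WaveSegment(bool handRightOfElbow)
    {
        this.handRightOfElbow = handRightOfElbow;
    }

    public GesturePartResult CheckGesture(FakeSkeleton s)
    {
        if (s.HandY <= s.ElbowY)
        {
            return GesturePartResult.Fail; // hand dropped
        }

        bool reached = this.handRightOfElbow ? s.HandX > s.ElbowX : s.HandX < s.ElbowX;
        return reached ? GesturePartResult.Suceed : GesturePartResult.Pausing;
    }
}

public static class WaveSketch
{
    // Returns true if the frames complete every segment in order.
    public static bool Run(IFakeSegment[] segments, FakeSkeleton[] frames)
    {
        int current = 0;      // index of the part being matched
        int framesOnPart = 0; // frames spent waiting on that part

        foreach (FakeSkeleton frame in frames)
        {
            GesturePartResult result = segments[current].CheckGesture(frame);
            if (result == GesturePartResult.Suceed)
            {
                current++;
                framesOnPart = 0;
                if (current == segments.Length)
                {
                    return true;
                }
            }
            else
            {
                framesOnPart++;
                if (result == GesturePartResult.Fail || framesOnPart >= 50)
                {
                    current = 0; // start again at the first part
                    framesOnPart = 0;
                }
            }
        }

        return false;
    }
}
```

Feeding Run six segments (right, left, repeated three times) and six matching frames with the hand above the elbow returns true. Inserting a frame with the hand below the elbow before the final part sends recognition back to the first part, so the same call returns false.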

The full source code for this example (and for skeleton tracking) can be downloaded here. It contains wave gestures for both hands as well as swipe left, swipe right and menu gestures. When writing your own gestures, it is important to consider the amount of checking that is required and to optimize this for each of the parts. Generally, smaller segments work better, as there is less checking to be done, which improves performance.

Written by Michael Tsikkos and James Glading

Comments (10)

  1. Tony says:

    I am going to try this tonight from the code you put in-line, but it would be great if I could find the link to the full source that you mentioned.  I am looking forward to getting some gestures working!  Thanks!

  2. rON says:

    pls share the code

  3. Paul says:

    Carl Franklin (from .NET Rocks) has written a gesture recording application and recognition API for .NET. http://gesturepak.com. No code. You, the developer, pose to create the gestures. $99. Looks good.

  4. Nicholas Pappas says:

    I updated this example to work with v1.5 of the SDK and have moved the source into a more re-usable library structure.  I also wrote a short blog post on how to use the gesture library.

    The blog post is at the link below; which contains a link to the new library and demo application:

    blog.exceptontuesdays.com/…/gestures-with-microsoft-kinect-for-windows-sdk-v1-5

  5. Bharat Sharma says:

    Hi Nicholas,

    Thanks for this example. It is very helpful. I am trying to implement push gesture with both hands. Pushing both your hands in forward direction. Can you please provide some guidance on this?

  6. Evil Closet Monkey says:

    Hi Bharat,

    Apologies for not responding sooner — I don't come back here very often.  Hopefully you haven't given up looking here!

    Are you wanting a gesture, or a real-time action?  A gesture (in this library context) is a discrete action that is acted on only after it is complete, while a real-time gesture would dynamically update as the motion continues (e.g., pinch to zoom).  This library only supports the discrete gestures, and you would need to write your own tracking algorithm to do a real-time gesture.

    The real-time stuff isn't hard.  I've written one that does what you're wanting.

    If you are still needing help, and hopefully have come back here, I would suggest visiting a site like "www.stackoverflow.com" to ask your question.  I frequent that site and watch the "kinect" tag, as do others who can help.  I don't normally come back here. 🙂

  7. Evil Closet Monkey says:

    Please be aware that my blog post about updating this library to v1.5+ has moved:

    http://www.exceptontuesdays.com

  8. Siva says:

    Does anyone have any idea how jump detection could be implemented using this concept?

  9. Ezu says:

    Very good tutorial that helped me in the process of working with the Kinect sensor for robotic applications. Also, I added a link to this tutorial to an article with many more tutorials related to the Kinect sensor: http://www.intorobotics.com/working-with-kinect-3d-sensor-in-robotics-setup-tutorials-applications

  10. Kumar Gaurav says:

    Thanks man! this was really helpful 🙂