Writing a gesture service with the Kinect for Windows SDK

08/08/2011

After further experimenting with the Kinect SDK, it became obvious what needed to come next. If you create an application using the Kinect SDK, you will want to be able to control it using gestures (e.g. waving, swiping, or motions to access menus). With this in mind, we decided to write a gesture service in C# that would analyse the gestures. This blog post outlines how we did this and how you can implement the same functionality.

To be able to recognize gestures, it is first important to understand what makes a gesture.

We concluded that gestures are made up of parts. Each part of a gesture is a specific movement that, when combined with other gesture parts, makes up the whole gesture. For example, the diagram below shows the two parts of a simple wave gesture and how they can be identified:

However, this is not quite enough to recognize multiple gestures with any degree of accuracy. The problem occurs when you think about multiple gestures being recognized at once.

It's not as simple as looking for the next part of the gesture. For example, consider the wave gesture shown above. If I were to drop my hand between the two parts, it would still be recognized as a wave, because both parts of the gesture were completed in the order they were defined; yet I clearly did not perform a wave. To solve this problem we came up with three results that a gesture part can return when it checks whether it has been completed. The diagram below shows these three results and the impact of returning each of them:

A result of ‘Pausing’ allows the system to identify a movement that does not fulfil the gesture part but could simply be the result of the user moving slowly. In short, the three results mean the following:

Fail – The gesture failed. The user moved in a way that was inconsistent with the gesture, so recognition starts again at the beginning.
Pausing – The user did not fail the gesture, but did not perform the next part either. The system will check again for this part after a short pause. Pausing can only be returned a limited number of times (the Gesture class below gives up once its frame counter reaches 50) before the gesture fails and recognition starts again at the beginning.
Succeed – The user performed this part of the gesture. After a short pause the system will start looking for the next part of the gesture.
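
These three results map naturally onto an enumeration. The listings in this post refer to a GesturePartResult type without showing its declaration, so the following is only a minimal sketch of what it might look like. The member names (including the spelling Suceed) are taken from the listings; the ordering and the comments are assumptions.

```csharp
/// <summary>
/// The result of checking a single gesture part against one skeleton frame.
/// Sketch only: the declaration in the downloadable source may differ.
/// </summary>
public enum GesturePartResult
{
    /// <summary>The movement was inconsistent with the gesture; start again.</summary>
    Fail,

    /// <summary>Not failed, but the part is not complete yet; check again shortly.</summary>
    Pausing,

    /// <summary>The part was completed (spelt "Suceed" to match the listings).</summary>
    Suceed
}
```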

The Solution

The overall gesture service is made up of three main parts, each of which is detailed below:

The Gesture Controller:

The gesture controller is a way of controlling all of the gestures that a user can perform. The code for this can be seen below:

    #region using...
    using System;
    using System.Collections.Generic;
    using Microsoft.Research.Kinect.Nui;
    #endregion

    /// <summary>
    /// The gesture controller
    /// </summary>
    public class GestureControler
    {
        /// <summary>
        /// The list of all gestures we are currently looking for
        /// </summary>
        private List<Gesture> gestures = new List<Gesture>();

        /// <summary>
        /// Initializes a new instance of the <see cref="GestureControler"/> class.
        /// </summary>
        public GestureControler()
        {
        }

        /// <summary>
        /// Occurs when [gesture recognized].
        /// </summary>
        public event EventHandler<GestureEventArgs> GestureRecognised;

        /// <summary>
        /// Updates all gestures.
        /// </summary>
        /// <param name="data">The skeleton data.</param>
        public void UpdateAllGestures(SkeletonData data)
        {
            foreach (Gesture gesture in this.gestures)
            {
                gesture.UpdateGesture(data);
            }
        }

        /// <summary>
        /// Adds the gesture.
        /// </summary>
        /// <param name="type">The gesture type.</param>
        /// <param name="gestureDefinition">The gesture definition.</param>
        public void AddGesture(GestureType type, IRelativeGestureSegment[] gestureDefinition)
        {
            Gesture gesture = new Gesture(type, gestureDefinition);
            gesture.GestureRecognised += new EventHandler<GestureEventArgs>(this.Gesture_GestureRecognised);
            this.gestures.Add(gesture);
        }

        /// <summary>
        /// Handles the GestureRecognised event of a Gesture.
        /// </summary>
        /// <param name="sender">The source of the event.</param>
        /// <param name="e">The <see cref="KinectSkeltonTracker.GestureEventArgs"/> instance containing the event data.</param>
        private void Gesture_GestureRecognised(object sender, GestureEventArgs e)
        {
            // Forward the event to anyone listening to the controller
            if (this.GestureRecognised != null)
            {
                this.GestureRecognised(this, e);
            }

            // Once one gesture has been recognised, restart all of them
            foreach (Gesture g in this.gestures)
            {
                g.Reset();
            }
        }
    }
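
The controller relies on two supporting types, GestureType and GestureEventArgs, that are not listed in this post. The sketch below shows one plausible shape for them, inferred from the constructor call in the Gesture class (type, tracking ID, user index). Only the WaveLeft member appears in the listings; the other member names, the property names and the assumption that both IDs are integers are hypothetical.

```csharp
using System;

/// <summary>
/// The kinds of gesture the service can report. Only WaveLeft is named in
/// this post; the remaining members are hypothetical placeholders.
/// </summary>
public enum GestureType
{
    WaveLeft,
    WaveRight,  // hypothetical name
    SwipeLeft,  // hypothetical name
    SwipeRight, // hypothetical name
    Menu        // hypothetical name
}

/// <summary>
/// Event data describing a recognised gesture (sketch, property names assumed).
/// </summary>
public class GestureEventArgs : EventArgs
{
    public GestureEventArgs(GestureType type, int trackingId, int userIndex)
    {
        this.GestureType = type;
        this.TrackingID = trackingId;
        this.UserIndex = userIndex;
    }

    /// <summary>Gets the type of gesture that was recognised.</summary>
    public GestureType GestureType { get; private set; }

    /// <summary>Gets the tracking ID of the skeleton that made the gesture.</summary>
    public int TrackingID { get; private set; }

    /// <summary>Gets the index of the user that made the gesture.</summary>
    public int UserIndex { get; private set; }
}
```

With types like these in place, an application would subscribe to the controller's GestureRecognised event and switch on e.GestureType to decide how to react.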

A Gesture:

This controls all of the parts of a gesture and keeps track of which one is currently being checked. It contains an array of IRelativeGestureSegment objects, each of which is an individual implementation of the IRelativeGestureSegment interface (described later). When a skeleton frame is created it is passed to each Gesture, which then passes it on to the current gesture segment. When the final segment returns a result of ‘Succeed’, the Gesture raises a gesture recognised event, which is caught by the gesture controller. The code for the Gesture class can be seen below:

    #region using...
    using System;
    using Microsoft.Research.Kinect.Nui;
    #endregion

    /// <summary>
    /// A single gesture
    /// </summary>
    public class Gesture
    {
        /// <summary>
        /// The parts that make up this gesture
        /// </summary>
        private IRelativeGestureSegment[] gestureParts;

        /// <summary>
        /// The current gesture part that we are matching against
        /// </summary>
        private int currentGesturePart = 0;

        /// <summary>
        /// the number of frames to pause for when a pause is initiated
        /// </summary>
        private int pausedFrameCount = 10;

        /// <summary>
        /// The current frame that we are on
        /// </summary>
        private int frameCount = 0;

        /// <summary>
        /// Are we paused?
        /// </summary>
        private bool paused = false;

        /// <summary>
        /// The type of gesture that this is
        /// </summary>
        private GestureType type;

        /// <summary>
        /// Initializes a new instance of the <see cref="Gesture"/> class.
        /// </summary>
        /// <param name="type">The type of gesture.</param>
        /// <param name="gestureParts">The gesture parts.</param>
        public Gesture(GestureType type, IRelativeGestureSegment[] gestureParts)
        {
            this.gestureParts = gestureParts;
            this.type = type;
        }

        /// <summary>
        /// Occurs when [gesture recognised].
        /// </summary>
        public event EventHandler<GestureEventArgs> GestureRecognised;

        /// <summary>
        /// Updates the gesture.
        /// </summary>
        /// <param name="data">The skeleton data.</param>
        public void UpdateGesture(SkeletonData data)
        {
            // Count frames while paused. Note that the segment check below
            // still runs on paused frames.
            if (this.paused)
            {
                if (this.frameCount == this.pausedFrameCount)
                {
                    this.paused = false;
                }

                this.frameCount++;
            }

            GesturePartResult result = this.gestureParts[this.currentGesturePart].CheckGesture(data);
            if (result == GesturePartResult.Suceed)
            {
                if (this.currentGesturePart + 1 < this.gestureParts.Length)
                {
                    // Move on to the next part of the gesture
                    this.currentGesturePart++;
                    this.frameCount = 0;
                    this.pausedFrameCount = 10;
                    this.paused = true;
                }
                else
                {
                    // The final part succeeded: the whole gesture is recognised
                    if (this.GestureRecognised != null)
                    {
                        this.GestureRecognised(this, new GestureEventArgs(this.type, data.TrackingID, data.UserIndex));
                    }

                    // Reset even when nobody is subscribed to the event
                    this.Reset();
                }
            }
            else if (result == GesturePartResult.Fail || this.frameCount >= 50)
            {
                // Failed or timed out. ">=" is used because frameCount can be
                // incremented twice in one call and could step over exactly 50.
                this.currentGesturePart = 0;
                this.frameCount = 0;
                this.pausedFrameCount = 5;
                this.paused = true;
            }
            else
            {
                // Pausing: keep waiting for the current part
                this.frameCount++;
                this.pausedFrameCount = 5;
                this.paused = true;
            }
        }

        /// <summary>
        /// Resets this instance.
        /// </summary>
        public void Reset()
        {
            this.currentGesturePart = 0;
            this.frameCount = 0;
            this.pausedFrameCount = 5;
            this.paused = true;
        }
    }
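
In the listing above, the paused flag only influences the frame counting; the current segment is still checked on every frame. If you would rather skip frames entirely while paused, one option is to move the countdown into a small helper. The FramePause class below is a hypothetical sketch and is not part of the original gesture service.

```csharp
/// <summary>
/// Counts down a number of frames to skip. Hypothetical helper, not part of
/// the original gesture service.
/// </summary>
public sealed class FramePause
{
    private int remainingFrames;

    /// <summary>Starts (or restarts) a pause lasting the given number of frames.</summary>
    public void Start(int frames)
    {
        // Negative values are treated as "no pause"
        this.remainingFrames = frames > 0 ? frames : 0;
    }

    /// <summary>
    /// Call once per frame. Returns true if this frame falls inside the pause
    /// and should be skipped; returns false once the pause has run out.
    /// </summary>
    public bool ShouldSkipFrame()
    {
        if (this.remainingFrames == 0)
        {
            return false;
        }

        this.remainingFrames--;
        return true;
    }
}
```

A Gesture could hold one FramePause, call Start(10) after a segment succeeds, and begin UpdateGesture by returning early whenever ShouldSkipFrame() is true. The timeout would then be tracked with a separate count of consecutive Pausing results, so that the pause and the timeout no longer share one counter.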

The IRelativeGestureSegment:

This is the final piece of the gesture service: the individual segments that make up a gesture. Below is the IRelativeGestureSegment interface, followed by its implementations for a wave gesture:

    #region using...
    using Microsoft.Research.Kinect.Nui;
    #endregion

    /// <summary>
    /// Defines a single gesture segment which uses relative positioning
    /// of body parts to detect a gesture
    /// </summary>
    public interface IRelativeGestureSegment
    {
        /// <summary>
        /// Checks the gesture.
        /// </summary>
        /// <param name="skeleton">The skeleton.</param>
        /// <returns>GesturePartResult based on if the gesture part has been completed</returns>
        GesturePartResult CheckGesture(SkeletonData skeleton);
    }

Wave gesture

    #region using...
    using Microsoft.Research.Kinect.Nui;
    #endregion

    /// <summary>
    /// The first part of the wave left gesture
    /// </summary>
    public class WaveLeftSegment1 : IRelativeGestureSegment
    {
        /// <summary>
        /// Checks the gesture.
        /// </summary>
        /// <param name="skeleton">The skeleton.</param>
        /// <returns>GesturePartResult based on if the gesture part has been completed</returns>
        public GesturePartResult CheckGesture(SkeletonData skeleton)
        {
            // hand above elbow
            if (skeleton.Joints[JointID.HandLeft].Position.Y > skeleton.Joints[JointID.ElbowLeft].Position.Y)
            {
                // hand right of elbow
                if (skeleton.Joints[JointID.HandLeft].Position.X > skeleton.Joints[JointID.ElbowLeft].Position.X)
                {
                    return GesturePartResult.Suceed;
                }

                // hand has not dropped but is not quite where we expect it to be, pausing till next frame
                return GesturePartResult.Pausing;
            }

            // hand dropped - gesture fails
            return GesturePartResult.Fail;
        }
    }

    /// <summary>
    /// The second part of the wave left gesture
    /// </summary>
    public class WaveLeftSegment2 : IRelativeGestureSegment
    {
        /// <summary>
        /// Checks the gesture.
        /// </summary>
        /// <param name="skeleton">The skeleton.</param>
        /// <returns>GesturePartResult based on if the gesture part has been completed</returns>
        public GesturePartResult CheckGesture(SkeletonData skeleton)
        {
            // hand above elbow
            if (skeleton.Joints[JointID.HandLeft].Position.Y > skeleton.Joints[JointID.ElbowLeft].Position.Y)
            {
                // hand left of elbow
                if (skeleton.Joints[JointID.HandLeft].Position.X < skeleton.Joints[JointID.ElbowLeft].Position.X)
                {
                    return GesturePartResult.Suceed;
                }

                // hand has not dropped but is not quite where we expect it to be, pausing till next frame
                return GesturePartResult.Pausing;
            }

            // hand dropped - gesture fails
            return GesturePartResult.Fail;
        }
    }

NOTE: a wave gesture is made up of two parts that are repeated three times. For example, the code to create a new wave gesture would look like this (gestures is the gesture controller):

    IRelativeGestureSegment[] waveLeftSegments = new IRelativeGestureSegment[6];
    WaveLeftSegment1 waveLeftSegment1 = new WaveLeftSegment1();
    WaveLeftSegment2 waveLeftSegment2 = new WaveLeftSegment2();
    waveLeftSegments[0] = waveLeftSegment1;
    waveLeftSegments[1] = waveLeftSegment2;
    waveLeftSegments[2] = waveLeftSegment1;
    waveLeftSegments[3] = waveLeftSegment2;
    waveLeftSegments[4] = waveLeftSegment1;
    waveLeftSegments[5] = waveLeftSegment2;
    this.gestures.AddGesture(GestureType.WaveLeft, waveLeftSegments);
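
Because the same two segment instances are simply repeated, the array can also be built with a small helper rather than by hand. The GestureSegments.Repeat method below is a hypothetical convenience, not part of the original source; it reuses the segment instances, as the listing above does, which is only safe because the segments hold no state.

```csharp
using System;

/// <summary>
/// Hypothetical helper for building repeated gesture definitions.
/// </summary>
public static class GestureSegments
{
    /// <summary>
    /// Returns an array containing the given pattern repeated the given
    /// number of times, in order. The same instances are reused.
    /// </summary>
    public static T[] Repeat<T>(int times, params T[] pattern)
    {
        if (times < 0)
        {
            throw new ArgumentOutOfRangeException("times");
        }

        if (pattern == null)
        {
            throw new ArgumentNullException("pattern");
        }

        T[] result = new T[pattern.Length * times];
        for (int i = 0; i < result.Length; i++)
        {
            // Cycle through the pattern: 0, 1, 0, 1, ...
            result[i] = pattern[i % pattern.Length];
        }

        return result;
    }
}

// Possible usage with the wave segments from this post:
// IRelativeGestureSegment[] waveLeftSegments =
//     GestureSegments.Repeat<IRelativeGestureSegment>(
//         3, new WaveLeftSegment1(), new WaveLeftSegment2());
```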

The full source code for this example (and for skeleton tracking) can be downloaded here. It contains wave gestures for both hands as well as swipe left, swipe right and a menu gesture. When writing your own gestures, it is important to consider the amount of checking that each part requires and to optimize it. Generally, smaller segments work better, as there is less checking to be done per frame, which improves performance.

Written by Michael Tsikkos and James Glading