After further experimenting with the Kinect SDK, it became obvious what needed to come next. If you create an application using the Kinect SDK, you will want to be able to control it using gestures (e.g. waving, swiping, or motions to access menus). We therefore decided to write a gesture service in C# that analyses gestures. This blog post outlines how we did this and how you can implement the same functionality.
To be able to recognize gestures, it is first important to understand what makes a gesture.
We concluded that gestures are made up of parts. Each part of a gesture is a specific movement that, when combined with the other parts, makes up the whole gesture. For example, the diagram below shows the two parts of a simple wave gesture and how they can be identified:
However, this is not quite enough to recognize multiple gestures with any degree of accuracy. The problem occurs when you think about multiple gestures being recognized at once.
It’s not as simple as looking for the next part of the gesture. For example, consider the wave gesture shown above. If I were to drop my hand between the two parts, it would still be recognized as a wave, because both parts of the gesture were completed in the order they were defined; yet I clearly did not perform a wave. To solve this problem, we came up with three results that a gesture part can return when it checks whether it has been completed. The diagram below shows these three results and the impact of returning each of them:
A result of ‘Pausing’ allows the system to identify a movement that does not fulfil the gesture but could be the result of the user moving slowly. In short, the three results mean the following:
- Fail – The gesture failed. The user moved in a way that was inconsistent with the gesture and as such the gesture will start again at the beginning.
- Pausing – The user did not fail the gesture but they did not perform the next part either. The system will check again for this part after a short pause. ‘Pausing’ can be returned at most 100 times before the gesture fails and recognition starts again at the beginning.
- Succeed – The user performed this part of the gesture. After a short pause the system will start looking for the next part of the gesture.
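To make the effect of the three results concrete, here is a minimal sketch of how they could drive progress through a gesture. The names `GesturePartResult` and `GestureProgress` are assumptions chosen for illustration, not taken from the downloadable source, and the "short pause" between checks is left out for brevity.

```csharp
using System;

// Hypothetical enum for the three results a gesture part can return.
public enum GesturePartResult { Fail, Pausing, Succeed }

// Hypothetical helper that tracks which part of a gesture is being checked.
public class GestureProgress
{
    // Limit taken from the text: at most 100 'Pausing' results in a row.
    // Whether the 100th or the 101st pause fails is an assumption here.
    private const int MaxPauses = 100;

    private readonly int partCount;

    public int CurrentPart { get; private set; }
    public int PauseCount { get; private set; }

    public GestureProgress(int partCount)
    {
        this.partCount = partCount;
    }

    // Applies one result; returns true when the final part has succeeded.
    public bool Apply(GesturePartResult result)
    {
        switch (result)
        {
            case GesturePartResult.Succeed:
                PauseCount = 0;
                CurrentPart++;
                if (CurrentPart == partCount)
                {
                    Reset();
                    return true;   // whole gesture recognized
                }
                return false;

            case GesturePartResult.Pausing:
                PauseCount++;
                if (PauseCount > MaxPauses)
                {
                    Reset();       // too many pauses counts as a failure
                }
                return false;

            default:               // Fail: start again at the beginning
                Reset();
                return false;
        }
    }

    public void Reset()
    {
        CurrentPart = 0;
        PauseCount = 0;
    }
}
```

With this in place, the dropped-hand case from earlier becomes Succeed, Fail, Succeed: the Fail in the middle resets progress, so the second part on its own no longer completes the wave.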
The overall gesture service is made up of three main parts, each of which is detailed below:
The Gesture Controller:
The gesture controller manages all of the gestures that a user can perform. The code for this can be seen below:
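As a rough sketch of what such a controller might look like: it keeps a list of gestures, forwards each skeleton frame to all of them, and re-raises any recognition event. Every name here (`SkeletonFrameData`, `ITrackedGesture`, `GestureEventArgs`, `AddGesture`, `UpdateAllGestures`) is an assumption for illustration; the real Kinect SDK skeleton type and the actual source may differ.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the Kinect SDK's skeleton data (hypothetical placeholder).
public class SkeletonFrameData { }

// Event payload naming the gesture that was recognized (hypothetical).
public class GestureEventArgs : EventArgs
{
    public string GestureName { get; }

    public GestureEventArgs(string gestureName)
    {
        GestureName = gestureName;
    }
}

// The minimum the controller needs from a gesture (hypothetical contract).
public interface ITrackedGesture
{
    event EventHandler<GestureEventArgs> GestureRecognized;
    void UpdateGesture(SkeletonFrameData data);
}

public class GestureController
{
    private readonly List<ITrackedGesture> gestures = new List<ITrackedGesture>();

    // Raised whenever any registered gesture reports that it was recognized.
    public event EventHandler<GestureEventArgs> GestureRecognized;

    // Registers a gesture and subscribes to its recognition event.
    public void AddGesture(ITrackedGesture gesture)
    {
        gesture.GestureRecognized += OnGestureRecognized;
        gestures.Add(gesture);
    }

    // Call once per skeleton frame; every gesture sees every frame.
    public void UpdateAllGestures(SkeletonFrameData data)
    {
        foreach (ITrackedGesture gesture in gestures)
        {
            gesture.UpdateGesture(data);
        }
    }

    private void OnGestureRecognized(object sender, GestureEventArgs e)
    {
        GestureRecognized?.Invoke(this, e);
    }
}
```

Passing every frame to every gesture is what lets several gestures be tracked at once, each keeping its own progress.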
The Gesture:
The Gesture class controls all of the parts of a gesture and tracks which one is currently being checked. It contains an array of IRelativeGestureSegment objects, which are individual implementations of the IRelativeGestureSegment interface (described later). When a skeleton frame is created, it is passed through to each Gesture, which then passes it through to the current gesture segment. When the final segment returns a result of ‘Succeed’, the Gesture raises a gesture recognized event, which is caught by the gesture controller. The code for the Gesture class can be seen below:
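A minimal sketch of a class with this behaviour follows. Only `IRelativeGestureSegment` is named in the text; `SkeletonData`, `CheckGesture`, `UpdateGesture`, `DelegateSegment` and the enum are assumptions made for illustration, and the short pause between checks is omitted.

```csharp
using System;

public enum GesturePartResult { Fail, Pausing, Succeed }

// Stand-in for the Kinect SDK skeleton type (hypothetical placeholder).
public class SkeletonData { }

// The method name CheckGesture is an assumption.
public interface IRelativeGestureSegment
{
    GesturePartResult CheckGesture(SkeletonData skeleton);
}

// Hypothetical helper: a segment built from a delegate, handy for experiments.
public class DelegateSegment : IRelativeGestureSegment
{
    private readonly Func<SkeletonData, GesturePartResult> check;

    public DelegateSegment(Func<SkeletonData, GesturePartResult> check)
    {
        this.check = check;
    }

    public GesturePartResult CheckGesture(SkeletonData skeleton)
    {
        return check(skeleton);
    }
}

public class Gesture
{
    private const int MaxPauseCount = 100;   // limit stated in the text

    private readonly IRelativeGestureSegment[] segments;
    private int currentSegment;
    private int pauseCount;

    public string Name { get; }
    public event EventHandler GestureRecognized;

    public Gesture(string name, IRelativeGestureSegment[] segments)
    {
        Name = name;
        this.segments = segments;
    }

    // Passes the frame to the current segment only, then acts on its result.
    public void UpdateGesture(SkeletonData skeleton)
    {
        GesturePartResult result = segments[currentSegment].CheckGesture(skeleton);

        if (result == GesturePartResult.Succeed)
        {
            pauseCount = 0;
            if (currentSegment + 1 < segments.Length)
            {
                currentSegment++;            // look for the next part
            }
            else
            {
                Reset();                     // final segment succeeded
                GestureRecognized?.Invoke(this, EventArgs.Empty);
            }
        }
        else if (result == GesturePartResult.Fail || ++pauseCount > MaxPauseCount)
        {
            Reset();                         // failed, or paused too often
        }
    }

    public void Reset()
    {
        currentSegment = 0;
        pauseCount = 0;
    }
}
```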
The Gesture Segments:
The gesture segments are the final part of the gesture service. They are the individual segments that make up a gesture. Below is the IRelativeGestureSegment interface and the implementations of this interface for a wave gesture:
NOTE: A wave gesture is made up of two parts that are repeated three times. For example, the code to create a new wave gesture would look like this (gestures is the gesture controller):
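The following sketch shows one way the segment interface, two wave segments and the six-element segment array could fit together. The joint tests (hand above the elbow, then to its right or left), the coordinate convention (larger Y is higher, larger X is further right) and all type names other than `IRelativeGestureSegment` are assumptions for illustration, not the logic of the downloadable source.

```csharp
using System;

public enum GesturePartResult { Fail, Pausing, Succeed }

// Stand-ins for skeleton joint data; the real Kinect SDK types differ.
public struct Point2
{
    public float X, Y;

    public Point2(float x, float y)
    {
        X = x;
        Y = y;
    }
}

public class SkeletonSnapshot
{
    public Point2 RightHand;
    public Point2 RightElbow;
}

public interface IRelativeGestureSegment
{
    GesturePartResult CheckGesture(SkeletonSnapshot skeleton);
}

// Part 1 (assumed): right hand above the elbow and to the right of it.
public class WaveRightSegment1 : IRelativeGestureSegment
{
    public GesturePartResult CheckGesture(SkeletonSnapshot s)
    {
        if (s.RightHand.Y <= s.RightElbow.Y)
        {
            return GesturePartResult.Fail;      // hand dropped: not a wave
        }
        return s.RightHand.X > s.RightElbow.X
            ? GesturePartResult.Succeed
            : GesturePartResult.Pausing;        // hand raised but not there yet
    }
}

// Part 2 (assumed): right hand above the elbow and to the left of it.
public class WaveRightSegment2 : IRelativeGestureSegment
{
    public GesturePartResult CheckGesture(SkeletonSnapshot s)
    {
        if (s.RightHand.Y <= s.RightElbow.Y)
        {
            return GesturePartResult.Fail;
        }
        return s.RightHand.X < s.RightElbow.X
            ? GesturePartResult.Succeed
            : GesturePartResult.Pausing;
    }
}

public static class WaveGestures
{
    // Two parts repeated three times, as described in the note above.
    public static IRelativeGestureSegment[] CreateWaveRight()
    {
        var part1 = new WaveRightSegment1();
        var part2 = new WaveRightSegment2();
        return new IRelativeGestureSegment[]
        {
            part1, part2, part1, part2, part1, part2
        };
    }
}
```

The segments hold no state, so the same two instances can be reused for all six positions. The resulting array would then be registered with the gesture controller (`gestures` in the text) through whatever method the service exposes for adding a gesture.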
The full source code for this example (and for skeleton tracking) can be downloaded here. It contains wave gestures for both hands, as well as swipe left, swipe right and a menu gesture. When writing your own gestures, it is important to consider the amount of checking that is required and to optimize this for each of the parts. Generally, smaller segments work better, as there is less checking to be done, which improves performance.