Mysteries of Kinect for Windows Face Tracking output explained

01/31/2014

Since the release of Kinect for Windows version 1.5, developers have been able to use the Face Tracking software development kit (SDK) to create applications that can track human faces in real time. Figure 1, an illustration from the Face Tracking documentation, displays 87 of the points used to track the face. Thirteen points are not illustrated here—more on those points later.

Figure 1: Tracked points

You have questions...

Based on feedback we received via comments and forum posts, it is clear there is some confusion regarding the face tracking points and the data values found when using the SDK sample code. The managed sample, FaceTrackingBasics-WPF, demonstrates how to visualize mesh data by displaying a 3D model representation on top of the color camera image.

Figure 2: Screenshot from FaceTrackingBasics-WPF

By exploring this sample source code, you will find a set of helper functions under the Microsoft.Kinect.Toolkit.FaceTracking project, in particular GetProjected3DShape(). Many developers have noticed that this function returns a collection of 121 values. Some have also found an enum, called “FeaturePoint”, that includes 70 items.

We have answers...

As you can see, we have two main sets of numbers that don't seem to add up. That is because the SDK provides two distinct sets of values:

3D Shape Points (mesh representation of the face): 121
Tracked Points: 87 + 13

The 3D Shape Points (121 of them) are the mesh vertices that make a 3D face model based on the Candide-3 wireframe.

Figure 3: Wireframe of the Candide-3 model https://www.icg.isy.liu.se/candide/img/candide3_rot128.gif

These vertices are morphed by the FaceTracking APIs to align with the face. The GetProjected3DShape method returns the vertices as an array of Vector3DF. A subset of these vertices can be looked up by name using the “FeaturePoint” enum, for example TopSkull, LeftCornerMouth, or OuterTopRightPupil. Figure 4 shows these values superimposed on top of the color frame.

Figure 4: Feature Point index mapped on mesh model
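
The relationship between the 121 mesh vertices and the 70 named feature points can be sketched with a small enum whose values are indices into the vertex array. This is only an illustration: DemoFeaturePoint, FeaturePointLookupDemo, and the index values 0, 31, and 66 are hypothetical and are not the values defined by the toolkit's FeaturePoint enum.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical subset of named vertices. The indices are illustrative only
// and are NOT the values used by the toolkit's FeaturePoint enum.
public enum DemoFeaturePoint
{
    TopSkull = 0,
    LeftCornerMouth = 31,
    OuterTopRightPupil = 66
}

public static class FeaturePointLookupDemo
{
    // Number of vertices in the face mesh, as described in the article.
    public const int MeshVertexCount = 121;

    // Pairs every named feature point with the mesh vertex index it refers to.
    public static Dictionary<string, int> NamedIndices()
    {
        var map = new Dictionary<string, int>();
        foreach (DemoFeaturePoint point in Enum.GetValues(typeof(DemoFeaturePoint)))
        {
            map[point.ToString()] = (int)point;
        }

        return map;
    }

    // Lists the mesh vertex indices that have no name in the enum.
    public static List<int> UnnamedIndices()
    {
        var unnamed = new List<int>();
        for (int i = 0; i < MeshVertexCount; i++)
        {
            if (!Enum.IsDefined(typeof(DemoFeaturePoint), i))
            {
                unnamed.Add(i);
            }
        }

        return unnamed;
    }
}
```

With an enum of this kind, a name is simply a readable alias for one position in the vertex array, which is why the enum can have fewer entries than the mesh has vertices.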

To get the 100 tracked points mentioned above, we need to dive more deeply into the APIs. The managed API provides an FtInterop.cs file that defines an interface, IFTResult, which contains a Get2DShapePoints function. FtInterop is a wrapper for the native library that exposes its functionality to managed languages. Users of the unmanaged C++ API may have already seen this and figured it out. Get2DShapePoints is the function that provides the 100 tracked points.

If we have a look at the function, it doesn’t seem to be useful to a managed code developer:

// STDMETHOD(Get2DShapePoints)(THIS_ FT_VECTOR2D** ppPoints, UINT* pPointCount) PURE;
void Get2DShapePoints(out IntPtr pointsPtr, out uint pointCount);

To get a better idea of how you can get a collection of points from IntPtr, we need to dive into the unmanaged function:

/// <summary>
/// Returns 2D (X,Y) coordinates of the key points on the aligned face in video frame coordinates.
/// </summary>
/// <param name="ppPoints">Array of 2D points (as FT_VECTOR2D).</param>
/// <param name="pPointCount">Number of elements in ppPoints.</param>
/// <returns>If the method succeeds, the return value is S_OK. If the method fails, the return value can be E_POINTER.</returns>
STDMETHOD(Get2DShapePoints)(THIS_ FT_VECTOR2D** ppPoints, UINT* pPointCount) PURE;

The function will give us a pointer to the FT_VECTOR2D array. To consume the data from the pointer, we have to create a new function for use with managed code.

The managed code

First, you need to create an array to hold the data that is copied to managed memory. Because FT_VECTOR2D is an unmanaged structure, marshaling it into the managed wrapper requires an equivalent managed data type. The managed version of this structure is PointF (a structure that uses floats for x and y).

Now that we have a data type, we need to convert IntPtr to PointF[]. Searching the code, we see that the FaceTrackFrame class wraps the IFTResult object. This also contains the GetProjected3DShape() function we used before, so this is a good candidate to add a new function, GetShapePoints. It will look something like this:

// Returns the 2D shape points as PointF values, or null if no points are available
public PointF[] GetShapePoints()
{
    // get the 2D tracked shapes
    IntPtr ptBuffer = IntPtr.Zero;
    uint ptCount = 0;
    this.faceTrackingResultPtr.Get2DShapePoints(out ptBuffer, out ptCount);
    if (ptCount == 0)
    {
        return null;
    }

    // create a managed array to hold the values
    PointF[] shapePoints = new PointF[ptCount];
    int sizeInBytes = Marshal.SizeOf(typeof(PointF));
    for (int i = 0; i < shapePoints.Length; i++)
    {
        // step through the native buffer one FT_VECTOR2D at a time
        IntPtr element = new IntPtr(ptBuffer.ToInt64() + ((long)i * sizeInBytes));
        shapePoints[i] = (PointF)Marshal.PtrToStructure(element, typeof(PointF));
    }

    return shapePoints;
}
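
The pointer-walking loop in GetShapePoints can be tried out without a sensor attached. The following is a minimal sketch, assuming a hypothetical Point2F struct that mirrors the two-float layout of FT_VECTOR2D. The names Point2F, ShapePointMarshalDemo, ReadPoints, and RoundTrip are illustration names and are not part of the toolkit. RoundTrip stands in for the native side by writing points into unmanaged memory before reading them back.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the native FT_VECTOR2D layout: two 32-bit floats.
[StructLayout(LayoutKind.Sequential)]
public struct Point2F
{
    public float X;
    public float Y;
}

public static class ShapePointMarshalDemo
{
    // Copies 'count' consecutive Point2F structures out of unmanaged memory.
    public static Point2F[] ReadPoints(IntPtr buffer, int count)
    {
        if (buffer == IntPtr.Zero || count <= 0)
        {
            return new Point2F[0];
        }

        int size = Marshal.SizeOf(typeof(Point2F));
        Point2F[] result = new Point2F[count];
        for (int i = 0; i < count; i++)
        {
            // Advance the pointer by one structure per element.
            IntPtr element = IntPtr.Add(buffer, i * size);
            result[i] = (Point2F)Marshal.PtrToStructure(element, typeof(Point2F));
        }

        return result;
    }

    // Simulates the native side: writes the points into unmanaged memory,
    // reads them back with ReadPoints, and always frees the buffer.
    public static Point2F[] RoundTrip(Point2F[] source)
    {
        int size = Marshal.SizeOf(typeof(Point2F));
        IntPtr buffer = Marshal.AllocHGlobal(size * Math.Max(source.Length, 1));
        try
        {
            for (int i = 0; i < source.Length; i++)
            {
                Marshal.StructureToPtr(source[i], IntPtr.Add(buffer, i * size), false);
            }

            return ReadPoints(buffer, source.Length);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

Calling ShapePointMarshalDemo.RoundTrip with a couple of points should return an array holding the same X and Y values, which confirms that the element size and pointer offsets line up. In the real wrapper the buffer is owned by the native library, so the managed code only reads from it and never frees it.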

To ensure we are using the data correctly, we refer to the documentation on Get2DShapePoints:

IFTResult::Get2DShapePoints Method gets the (x,y) coordinates of the key points on the aligned face in video frame coordinates.

The PointF values are already expressed in color image coordinates. Because they match the color frame, there is no need to apply any additional mapping. You can call the function and use the returned points directly on the color image.

The sample code

The modified version of FaceTrackingBasics-WPF is available in the sample code that can be downloaded from CodePlex. It has been modified to allow you to display the feature points (by name or by index value) and toggle the mesh drawing. Because of the way WPF renders, performance can suffer on machines with lower-end graphics cards. I recommend that you enable only one of these options at a time. If your UI becomes unresponsive, you can block the sensor with your hand to stop face tracking data from being captured. Because the application will not detect any tracked face data, it will not render any points, giving you the opportunity to reset the features you enabled by using the UI controls.

Figure 5: ShapePoints mapped around the face

As you can see in Figure 5, the additional 13 points are the center of the eyes, the tip of the nose, and the areas above the eyebrows on the forehead. Once you enable a feature and tracking begins, you can zoom into the center and see the values more clearly.

A summary of the changes:

MainWindow.xaml/.cs:

UI changes to enable slider and draw selections

FaceTrackingViewer.cs:

Added a Grid control – used for the UI elements
Modified the constructor to initialize grid
Modified the OnAllFramesReady event handler
- For each tracked skeleton, create a canvas and add it to the grid. The canvas is used as the parent for the label controls

public partial class FaceTrackingViewer : UserControl, IDisposable
{
    private Grid grid;

    public FaceTrackingViewer()
    {
        this.InitializeComponent();

        // add grid to the layout
        this.grid = new Grid();
        this.grid.Background = Brushes.Transparent;
        this.Content = this.grid;
    }

    private void OnAllFramesReady(object sender, AllFramesReadyEventArgs allFramesReadyEventArgs)
    {
        ...
        // We want to keep a record of any skeleton, tracked or untracked.
        if (!this.trackedSkeletons.ContainsKey(skeleton.TrackingId))
        {
            // create a new canvas for each tracker
            Canvas canvas = new Canvas();
            canvas.Background = Brushes.Transparent;
            this.grid.Children.Add(canvas);
            this.trackedSkeletons.Add(skeleton.TrackingId, new SkeletonFaceTracker(canvas));
        }
        ...
    }
}

SkeletonFaceTracker class changes:

New properties and fields: DrawFaceMesh, DrawShapePoints, DrawFeaturePoints, featurePoints, lastDrawFeaturePoints, shapePoints, labelControls, Canvas
New functions: FindTextControl, UpdateTextControls, RemoveAllFromCanvas, SetShapePointsLocations, SetFeaturePointsLocations
Added a constructor that keeps track of the parent control
Changed the DrawFaceModel function to draw based on what data was selected
Updated the OnFrameReady event handler to recalculate the positions of the drawn elements
- If DrawShapePoints is selected, then we call our new function

private class SkeletonFaceTracker : IDisposable
{
    ...
    // properties to toggle rendering 3D mesh, shape points and feature points
    public bool DrawFaceMesh { get; set; }
    public bool DrawShapePoints { get; set; }
    public DrawFeaturePoint DrawFeaturePoints { get; set; }

    // defined array for the feature points
    private Array featurePoints;
    private DrawFeaturePoint lastDrawFeaturePoints;

    // array for Points to be used in shape points rendering
    private PointF[] shapePoints;

    // map to hold the label controls for the overlay
    private Dictionary<string, Label> labelControls;

    // canvas control for new text rendering
    private Canvas Canvas;

    // canvas is passed in for every instance
    public SkeletonFaceTracker(Canvas canvas)
    {
        this.Canvas = canvas;
    }

    public void DrawFaceModel(DrawingContext drawingContext)
    {
        ...
        // only draw if selected
        if (this.DrawFaceMesh && this.facePoints != null)
        {
            ...
        }
    }

    internal void OnFrameReady(KinectSensor kinectSensor, ColorImageFormat colorImageFormat, byte[] colorImage, DepthImageFormat depthImageFormat, short[] depthImage, Skeleton skeletonOfInterest)
    {
        ...
        if (this.lastFaceTrackSucceeded)
        {
            ...
            if (this.DrawFaceMesh || this.DrawFeaturePoints != DrawFeaturePoint.None)
            {
                this.facePoints = frame.GetProjected3DShape();
            }

            // get the shape points array
            if (this.DrawShapePoints)
            {
                this.shapePoints = frame.GetShapePoints();
            }
        }

        // draw/remove the components
        SetFeaturePointsLocations();
        SetShapePointsLocations();
    }
    ...
}

Pulling it all together...

As we have seen, the Face Tracking SDK provides two sets of data points, along with names for a subset of the mesh vertices:

Shape Points: data used to track the face
Mesh Data: vertices of the 3D model from the GetProjected3DShape() function
FeaturePoints: named vertices on the 3D model that play a significant role in face tracking

To get the shape point data, we have to extend the current managed wrapper with a new function that will handle the interop with the native API.

Carmine Sirignano
Developer Support Escalation Engineer
Kinect for Windows
