The April issue of MSDN Magazine includes an intriguing feature (Context-Aware Dialogue with Kinect) that explores the unique capabilities of the Kinect for Windows SDK. Leland Holmquest, now an Enterprise Strategy Consultant with Microsoft, created a Kinect-enabled Windows Presentation Foundation application as part of his PhD coursework at George Mason University. The result, called Project Lily, is a virtual office assistant driven by context-aware dialogue and multimodal communication.
I caught up with Leland earlier this month and asked him a few questions about Project Lily and his experience writing his feature for MSDN Magazine.
Michael Desmond: What motivated you to write about developing a Kinect-enabled application, and what sort of challenges did you encounter?
Leland Holmquest: I was taking a class (CS895: Software for Context-Aware Multiuser Systems) at George Mason University, taught by Dr. João Pedro Sousa, as part of my PhD in IT coursework. The basic idea behind the course was to understand the significance of context and how it can be incorporated into a multi-user setting. As with most graduate-level courses, the class concluded with a demonstrable project to showcase the concepts learned throughout the course.
The normal project for this course was to develop an Android phone application that made use of context (sensing external data such as location, heading, etc.). One of my biggest handicaps at GMU is that everything is bent towards Java, which is not the best environment for an MCPD! I approached Dr. Sousa, asked if he had heard of Kinect and pitched the idea of Lily. Fortunately for me, Dr. Sousa was very excited about the concept and encouraged me to run with it.
I am not an avid gamer, but the Kinect games have provided me and my family hours of entertainment. I was very excited when the Kinect for Windows SDK came out and wanted desperately to “play” with it. This course gave me the idea and the additional motivation of wanting a good grade (I got an A+ for the course!).
MD: Tell us a bit about your background. How did your experience as a working programmer factor into your work on this feature?
LH: At the time I developed Lily, I was working at the Naval Surface Warfare Center Dahlgren Division as a rare commodity: a government employee who develops software. Specifically, I worked on software solutions in SharePoint, mostly in the vein of knowledge management. I also wore the hat of KM Lead for the base in an effort sponsored by our CIO, Steve Eckel. I have been programming in .NET since 2004 or so and recently earned the Microsoft Certified Professional Developer (MCPD) certification.
Because of my background, and because the guys in Microsoft Research had done such an excellent job creating the API, learning to program the Kinect was amazingly straightforward, in my opinion. I was really nervous getting started, but I used the content on the Kinect for Windows Web site, found Channel 9 especially useful (the video walk-throughs were fabulous), and quickly got the hang of the basics.
Since then (and after submitting my article to MSDN Magazine) I have taken on a new position as an Enterprise Strategy Consultant with Microsoft. I am currently assisting the US Army in creating an enterprise SharePoint called the Enterprise Collaboration Service.
MD: What were some key challenges or issues you had to overcome to complete your project?
LH: One of the key concepts in my application is the switching in and out of grammars in the speech recognition engine (SRE) to match the context of the ongoing dialogue. I was worried that this process would take too long to be practical. I was wrong. As long as you keep the individual grammars below 300 phrases (which is a requirement anyway), the system is quite responsive.
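[Editor's note: The grammar-switching idea can be illustrated with a minimal, language-neutral sketch. This is not Project Lily's code and does not use the Kinect or Microsoft speech APIs. The `Grammar` and `ContextualRecognizer` classes, the context names and the sample phrases are all hypothetical, and plain string matching stands in for real speech recognition. The only detail taken from the interview is the rule of keeping each grammar below 300 phrases.]

```python
# Illustrative sketch only: hypothetical classes, not the Kinect/SRE API.

# Per-grammar ceiling mentioned in the interview ("below 300 phrases").
MAX_PHRASES = 300


class Grammar:
    """A named set of phrases to listen for in one dialogue context."""

    def __init__(self, name, phrases):
        normalized = frozenset(p.strip().lower() for p in phrases)
        if len(normalized) >= MAX_PHRASES:
            raise ValueError(
                f"grammar {name!r} has {len(normalized)} phrases; "
                f"keep it below {MAX_PHRASES}"
            )
        self.name = name
        self.phrases = normalized


class ContextualRecognizer:
    """Stand-in for a recognizer that matches only the active grammar."""

    def __init__(self):
        self._grammars = {}
        self._active = None

    def register(self, grammar):
        self._grammars[grammar.name] = grammar

    def switch_context(self, name):
        # Swap the active grammar; raises KeyError for an unknown context.
        self._active = self._grammars[name]

    def recognize(self, utterance):
        """Return the normalized phrase if the active grammar contains it."""
        if self._active is None:
            return None
        phrase = utterance.strip().lower()
        return phrase if phrase in self._active.phrases else None


# Hypothetical contexts and phrases, for illustration only.
sre = ContextualRecognizer()
sre.register(Grammar("greeting", ["hello lily", "good morning"]))
sre.register(Grammar("scheduling", ["cancel my meeting", "what is on my calendar"]))

sre.switch_context("greeting")
heard = sre.recognize("Hello Lily")  # in the greeting grammar, so it matches
if heard is not None:
    # The dialogue has moved on, so listen for a different set of phrases.
    sre.switch_context("scheduling")
```

[Because only one small grammar is active at a time, a phrase from another context is simply not matched until the dialogue switches to that context.]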
Because I used the Kinect for Windows SDK Beta 2, I had one issue that I had to overcome: closeness to the Kinect. You needed to be at least 4 feet away from the Kinect unit. I didn’t always have the room necessary to allow the Kinect to “see” my whole skeleton. If Kinect can’t see the whole skeleton, or at least interpolate it, then it goes inactive and you can’t get a handle on the joints.
To overcome this, I purchased a set of lenses from a third party (Nyko Zoom) that allowed for a closer range. But this introduced a really fun problem. The lens creates a fisheye effect. So when the person’s body is in just the right configuration, the arms would really elongate, which made interacting with elements on the WPF application a little flaky. With the release of the Kinect for Windows SDK v1.0, they have incorporated a “near mode” that is a better solution.
A personal issue that I had (and continue to have) is using the depth stream to model a 3D environment. I do not have the background for this. Every time I got my head around the concept, it would break apart like a wisp of smoke when I went to implement it. The capability is there. Just check out the videos on Channel 9 or YouTube, where people are doing amazing things modeling in 3D using Kinect.
So far I just haven’t been able to get my head in the right place. But I am going to keep trying because there are some cool things that I want to do!