Encounters with Kinect

Imagine an interactive digital art performance where audience members become the performers, interacting not just with one another but also with lights, sounds, and professional dancers. That’s what visitors to SummerSalt, an outdoor arts festival in Melbourne, Australia, experienced. The self-choreographed event came courtesy of Encounters, an installation created by the Microsoft Research Centre for Social Natural User Interfaces (SocialNUI, for short), a joint research facility sponsored by Microsoft, the University of Melbourne, and the State Government of Victoria. Held in a special exhibition area on the grounds of the university’s Victorian College of the Arts, Encounters featured three Kinect for Windows v2 sensors installed overhead.

The installation ran over four Saturday evenings and was attended by more than 1,200 people. Participants experienced dance performances from Victorian College of the Arts students who interacted with the crowd to create social interactions captured by the Kinect sensors. The results included spectacular visual and audio effects, as the participants came to recognize that their movements and gestures controlled the music and sound effects as well as the light displays on an enormous outdoor screen.

Kinect sensors captured the crowd’s gestures and movements, using them to control audiovisual effects at Encounters, an investigation into social interactions facilitated by natural user interfaces.

The arresting public art installation was powered by the three overhead Kinect v2 sensors, whose depth cameras provided the data critical to the special effects. Each sensor tracked a space approximately 4.5 meters by 5 meters (about 14.75 feet by 16.5 feet), from a height of 5 meters (about 16.5 feet). This enabled the sensors to track the movement of up to 15 people at a time along three axes. The sensors' depth cameras detected when people jumped, which was an important interaction mechanism for the installation. Using the depth camera also overcame the problem of varying lighting conditions.
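To make the jump mechanism concrete, here is a minimal sketch of how an overhead depth reading could be turned into a jump signal. It is not the Encounters code: the function names, the 0.15-meter margin, and the use of a median as the standing baseline are all assumptions made for illustration. Only the 5-meter mounting height comes from the article.

```python
from statistics import median

SENSOR_HEIGHT_M = 5.0   # mounting height quoted in the article
JUMP_MARGIN_M = 0.15    # assumed threshold, not from the source


def head_height(depth_m):
    """Convert an overhead depth reading (sensor to top of head) into height above the floor."""
    return SENSOR_HEIGHT_M - depth_m


def detect_jumps(depths_m, margin=JUMP_MARGIN_M):
    """Flag each frame in which the head rises more than `margin` above the median height."""
    heights = [head_height(d) for d in depths_m]
    baseline = median(heights)  # stands in for the person's standing height
    return [h - baseline > margin for h in heights]
```

Because the test relies only on distance from the sensor, it would behave the same in daylight or darkness, which is consistent with the point about lighting conditions above.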

Feeding the depth-camera data into custom software revealed a surprising amount of information about people moving through the space. Beyond tracking the location of up to 15 people along three axes (X, Y, Z), it reported each participant’s area, their velocity (speed and direction), the length of time they had been present, whether they were jumping, and whether they were part of a group (and if so, how many people were in that group). It also reported the dispersion, crowdedness, and centroid of the overall space. The technical team achieved this across three separate spaces while maintaining frame rates of approximately 30 frames per second.
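The article does not say how these quantities were defined, so the following is only a sketch of one plausible set of definitions, working from floor positions in meters. Every function name here is hypothetical. The 1-meter grouping radius, the choice of mean distance from the centroid as "dispersion", and people per square meter as "crowdedness" are assumptions rather than the team's method.

```python
from math import hypot

AREA_M2 = 4.5 * 5.0     # floor area covered by one sensor, per the article
GROUP_RADIUS_M = 1.0    # assumed: people this close count as one group


def centroid(points):
    """Mean (x, y) position of everyone in the space."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def dispersion(points):
    """Assumed definition: mean distance of each person from the centroid."""
    cx, cy = centroid(points)
    return sum(hypot(x - cx, y - cy) for x, y in points) / len(points)


def crowdedness(points, area_m2=AREA_M2):
    """Assumed definition: people per square meter."""
    return len(points) / area_m2


def velocity(prev, curr, dt):
    """Speed (m/s) and direction vector between two positions dt seconds apart."""
    vx, vy = (curr[0] - prev[0]) / dt, (curr[1] - prev[1]) / dt
    return hypot(vx, vy), (vx, vy)


def groups(points, radius=GROUP_RADIUS_M):
    """Single-linkage grouping: chain together people within `radius` of each other."""
    unvisited = set(range(len(points)))
    result = []
    while unvisited:
        seed = unvisited.pop()
        stack, members = [seed], [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited
                    if hypot(points[i][0] - points[j][0],
                             points[i][1] - points[j][1]) <= radius]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
                members.append(j)
        result.append(sorted(members))
    return sorted(result)
```

At roughly 30 frames per second, `dt` would be about 1/30 of a second, and the length of each list returned by `groups` gives the group size mentioned above.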

From a high-level perspective, the end-to-end image-processing pipeline involved four steps:

  • Receipt of the raw depth pixels from the Kinect sensor
  • Preliminary filtering and then construction of an image from the depth data
  • Application of OpenCV to recognize contours (blobs) that represented a first guess at where people were located
  • Calculation via a series of heuristics to derive all the information mentioned in the preceding paragraph
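The four steps above can be sketched in a few dozen lines. This is an illustrative stand-in, not the team's implementation. To stay dependency-free it replaces OpenCV's contour detection with a plain 4-connected flood fill. It also assumes depth values in millimeters with 0 marking an invalid reading. The height cut-off and minimum blob size are made-up thresholds.

```python
from collections import deque

SENSOR_HEIGHT_MM = 5000       # mounting height quoted in the article
MIN_PERSON_HEIGHT_MM = 900    # assumed cut-off above the floor
MIN_BLOB_PIXELS = 3           # assumed noise threshold


def to_mask(depth_mm):
    """Steps 1-2: drop invalid (0) readings and keep pixels well above the floor."""
    limit = SENSOR_HEIGHT_MM - MIN_PERSON_HEIGHT_MM
    return [[1 if 0 < d <= limit else 0 for d in row] for row in depth_mm]


def find_blobs(mask):
    """Step 3 stand-in: label 4-connected regions (OpenCV contours in the article)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, blob = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def describe(blobs):
    """Step 4: a simple heuristic pass giving area and (x, y) centroid per blob."""
    people = []
    for blob in blobs:
        if len(blob) < MIN_BLOB_PIXELS:
            continue  # too small to be a person; treat as noise
        area = len(blob)
        cx = sum(x for _, x in blob) / area
        cy = sum(y for y, _ in blob) / area
        people.append({"area": area, "centroid": (cx, cy)})
    return people
```

Calling `describe(find_blobs(to_mask(frame)))` on each depth frame yields one entry per candidate person, in pixel units. A real pipeline would still need to map pixels to floor coordinates and match blobs between frames before deriving velocity or time present.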

The technical team experimented with the sensor in different configurations, at different heights, in different lighting conditions, with different flooring, with different sizes of people, and using different cameras in order to work this all out.

“We really enjoyed working with the Kinect sensor,” says John Downs, a research fellow at the Microsoft Research Centre for SocialNUI and the leader of the technical team on Encounters. “The different types of cameras—RGB, infrared, and depth—gave us a lot of flexibility when we designed Encounters. And as we moved through to the development phase, we appreciated the level of control that the SDK provided, especially the flexibility to process the raw camera images in all sorts of interesting ways. We took advantage of this, and to great effect. Additionally, the entire development process was completed in only six weeks, which is a testament to how simple the SDK is to use.”

The result of all this development creativity was more than just an amazing public art installation—it was also an intriguing social science investigation. Researchers from the SocialNUI Centre conducted qualitative interviews while members of the public interacted with their Kinect-generated effects, probing for insights into the social implications of the experience. As Frank Vetere, director of SocialNUI, explains, “The Centre explores the social aspects of natural user interfaces, so we are interested in the way people form, come together, and explore the public space. And we are interested in the way people might claim and re-orient the public space. This is an important part of starting to take technological developments outside of our lab and reaching out to the public and other groups within the University.”

This unique, cross-disciplinary collaboration was a wonderful success, delighting not only the NUI developers and researchers, but the public as well.

The Kinect for Windows Team
