Kinect for Robotics

The Kinect sensor is one of the best devices, for its price, to become available for robotics in the last decade. For $150 you get 3D range (distance) data and RGB color (webcam video) data. There is a microphone array thrown in as well. But wait! There’s more. The Kinect can detect people and generate skeleton data and you don’t have to write a single line of code to do the processing – the Kinect for Windows SDK does all the hard work for you.

How it Works

The Kinect uses an infrared (IR) laser to spray out a pseudo-random beam pattern. An IR camera captures an image of the dots that are reflected off objects (as in the picture below) and the electronics inside the Kinect figures out how much the dot pattern has been distorted. The distortion is a measure of distance from the camera. This approach is commonly called Structured Light. All this happens at 30 frames per second – not bad for a cheap consumer device.

Kinect-IR-Pattern

As a programmer, the Kinect gives you an array of depth values that correspond to the pixels of the RGB image. This unusual coordinate space, consisting of x and y as pixel coordinates and z as a distance in millimeters, is called the Depth Image Space. All of the distances are measured from a virtual plane passing through the Kinect camera. You can convert the data into conventional (x, y z) in meters but this requires additional processing overhead.

Depth-Image-Space

Skeleton Space on the other hand uses conventional (x, y, z) coordinates. A skeleton consists of a set of 20 joints, each with its own 3D coordinates. Detecting gestures is relatively easy. For example, to detect a person waving their arm over their head you just need to compare the height of their wrist to see if it is above their head, and then see if it is moving from side to side.

Skeleton-Joints

Comparison to Laser Range Finders

Laser Range Finders, or LIDAR (Light Detection and Ranging) devices, have been on the market for a long time. A LRF works by sending out pulses of infrared laser light and timing the return signal. Therefore they are known as Time of Flight devices.

The German SICK brand of LRFs have long been the workhorses of the research community, but these cost thousands of dollars. More recently, Hokuyo in Japan has been selling a cheaper range of LRFs that are approaching the $1,000 barrier. However, for this amount of money you can buy six Kinect sensors, although you would need enough USB ports to plug them all in and the necessary processing power to make use of all the 3D data.

The primary differences between a LRF and a Kinect are the Field of View (FOV), Maximum Range and Resolution. A conventional LRF has a FOV of 180 degrees, and some are up to 270 degrees. In contrast, the Kinect only has a 57 degree FOV.

LRFs are available in a variety of ranges from as little as 2 meters out to well over 100 meters. The Kinect, because it was designed for use in a living room, has a maximum range of 4 meters.

Even with a very large maximum range, the resolution of a LRF is from millimeters to centimeters (depending on how expensive it is) and the accuracy is constant across its entire range. Distance data from the Kinect varies in accuracy from sub-centimeter up close to as much as a 5cm error at its maximum range. This is a consequence of how it measures distance.

One downside of a LRF is that it operates in 2D, effectively taking a horizontal slice through the environment around the robot. In order to capture 3D data, the LRF must be mounted on a tilt mechanism and tilted up and down or mounted sideways and panned from side to side. This makes capturing 3D data relatively slow.

The Future for Kinect

Currently the Kinect has a limited range of 80cm to 4 meters. This means that the Kinect cannot see objects that are right in front of the robot, so you still need traditional obstacle sensors such as sonar (which is why they are included on the RDS Reference Platform). Next year, when the Kinect for Windows Hardware is released, there will be a new “near mode” that will range from 50cm to 3 meters. Although this helps with detecting nearby obstacles, a robot should still have other sensors for redundancy.

Going beyond next year, Microsoft will continue researching even better Kinect hardware. This means that 3D depth data is now here to stay, so sharpen up your 3D geometry skills and get cracking on applications that take full advantage of these new devices.