In this first post of my Kinect programming series, I'll cover the fundamentals of the Kinect sensor. Later posts will look at Kinect from a developer's perspective.
In my spare time I like to develop for and play with the Kinect. Kinect for Windows has not been widely adopted in my country, especially in academia.
Most people see it as a toy, but I see it as an opportunity to make people's lives easier.
For example, doctors use it to help people with disabilities and to make life more enjoyable for them.
Building computer-vision-based applications has always been difficult for the majority of application developers, since it requires a lot of mathematics and knowledge of the algorithms that researchers use in computer vision, signal processing and related fields. Microsoft Kinect removes many of the development and hardware restrictions that developers faced in the past, but “what to do” and “how to do it” still depend entirely on the developer.
You will find a lot of material on the internet about integrating Kinect with other systems (e.g. the Arduino platform) and about using Kinect with other computer vision frameworks and libraries. I'll try to cover as much of that as possible in future posts, but for now let's get a basic understanding of the Microsoft Kinect sensor.
Kinect is a motion sensing input device by Microsoft for the Xbox 360 video game console and Windows PCs. Based around a webcam-style add-on peripheral for the Xbox 360 console, it enables users to control and interact with the Xbox 360 without the need to touch a game controller, through a natural user interface using gestures and spoken commands.
The project is aimed at broadening the Xbox 360’s audience beyond its typical gamer base. A version for Windows was released on February 1, 2012.
After selling a total of 8 million units in its first 60 days, the Kinect holds the Guinness World Record of being the “fastest selling consumer electronics device”.
Microsoft has released a Kinect software development kit (SDK) for Windows. The SDK allows developers to write Kinect-enabled apps in C++/CLI, C#, or Visual Basic .NET.
The Kinect sensor is a horizontal bar connected to a small base with a motorized pivot, and is designed to be positioned lengthwise above or below the video display. The device comes in two versions: Kinect for Xbox 360 and Kinect for Windows (for commercial use).
The device features:
- RGB camera.
- Depth sensor (IR).
- Multi-array microphone.
- Motor to adjust camera angle.
In addition to the above features, Kinect for Windows offers a few extras:
- Facial recognition: tracks multiple points on your face, much as Skeleton Tracking does for the body.
- Near Mode: enables the camera to see objects as close as 40 centimeters in front of the device without losing accuracy or precision, with graceful degradation out to 3 meters.
- Seated or 10 Joints Mode: skeletal tracking that follows the head, neck and arms of either a seated or standing user.
RGB Camera
The default RGB video stream uses 8-bit VGA resolution (640 × 480 pixels) with a Bayer color filter, but the hardware is capable of resolutions up to 1280 × 960 (at a lower frame rate) and other formats such as UYVY.
Depth Sensor (IR)
The depth sensor consists of an infrared laser projector combined with a monochrome CMOS sensor, which captures video data in 3D under any ambient light conditions. The sensing range of the depth sensor is adjustable, and the Kinect software is capable of automatically calibrating the sensor based on gameplay and the player’s physical environment, accommodating for the presence of furniture or other obstacles.
The monochrome depth-sensing video stream is in VGA resolution (640 × 480 pixels) with 11-bit depth, which provides 2,048 levels of sensitivity. When used with the Xbox software, the Kinect sensor has a practical ranging limit of about 1.2 – 3.5 m (3.9 – 11 ft.).
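To make the 11-bit figure concrete, here is a small Python sketch of the arithmetic. The function names are my own, and the two storage layouts (tightly packed bits versus one 16-bit word per pixel) are assumptions for illustration, not a description of how the Kinect SDK actually delivers frames.

```python
import math

# Depth stream figures quoted in this post.
DEPTH_WIDTH_PX = 640
DEPTH_HEIGHT_PX = 480
DEPTH_BITS = 11


def depth_levels(bits: int) -> int:
    """Number of distinct values an unsigned sample of `bits` bits can take."""
    return 2 ** bits


def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed for one frame at the given bits per pixel, rounded up."""
    return math.ceil(width * height * bits_per_pixel / 8)


if __name__ == "__main__":
    print(depth_levels(DEPTH_BITS))  # 2048
    # Assumed layout 1: samples packed back to back, 11 bits each.
    print(frame_bytes(DEPTH_WIDTH_PX, DEPTH_HEIGHT_PX, DEPTH_BITS))
    # Assumed layout 2: each sample padded to a 16-bit word.
    print(frame_bytes(DEPTH_WIDTH_PX, DEPTH_HEIGHT_PX, 16))
```

Padding every sample to 16 bits costs more memory per frame, but it is usually the easier layout to index into, which is why I show both.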
The area required to play Kinect is roughly 6 m², although the sensor can maintain tracking through an extended range of approximately 0.7 – 6 m (2.3 – 20 ft.).
The sensor has an angular field of view of 57° horizontally and 43° vertically. At the minimum viewing distance of ~0.8 m (2.6 ft.), the horizontal field of the Kinect sensor is therefore ~87 cm (34 in) and the vertical field is ~63 cm (25 in), resulting in a resolution of just over 1.3 mm (0.051 in) per pixel.
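The field sizes above follow from simple trigonometry: the visible extent at a distance d for an angular field of view θ is 2 · d · tan(θ / 2). Below is a minimal Python sketch of that calculation. It assumes the 57° horizontal and 43° vertical field of view commonly quoted for the original Kinect and the 640 × 480 depth resolution mentioned earlier; the function and constant names are mine.

```python
import math

# Assumed angular field of view of the original Kinect, in degrees.
H_FOV_DEG = 57.0
V_FOV_DEG = 43.0

# Depth stream resolution quoted in this post.
DEPTH_WIDTH_PX = 640
DEPTH_HEIGHT_PX = 480


def field_extent_m(distance_m: float, fov_deg: float) -> float:
    """Width (or height) of the visible area at `distance_m` for a given field of view."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)


def footprint(distance_m: float):
    """Return (width_m, height_m, mm_per_px_horizontal, mm_per_px_vertical)."""
    width_m = field_extent_m(distance_m, H_FOV_DEG)
    height_m = field_extent_m(distance_m, V_FOV_DEG)
    mm_per_px_h = width_m * 1000.0 / DEPTH_WIDTH_PX
    mm_per_px_v = height_m * 1000.0 / DEPTH_HEIGHT_PX
    return width_m, height_m, mm_per_px_h, mm_per_px_v


if __name__ == "__main__":
    w, h, px_h, px_v = footprint(0.8)
    print(f"{w * 100:.0f} cm x {h * 100:.0f} cm, "
          f"{px_h:.2f} / {px_v:.2f} mm per pixel")
```

Calling `footprint` with a larger distance shows how quickly the per-pixel resolution coarsens as the user steps back from the sensor.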
Multi-array Microphone
The microphone array features four microphone capsules, with each channel processing 16-bit audio at a sampling rate of 16 kHz.
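As a rough idea of what those audio numbers mean in practice, this Python sketch works out the raw data rate and the highest frequency the array can represent. It assumes uncompressed PCM with all four channels delivered separately, which may not match what a given SDK actually hands to your application; the names are hypothetical.

```python
# Microphone array figures quoted in this post.
CHANNELS = 4
BITS_PER_SAMPLE = 16
SAMPLE_RATE_HZ = 16_000


def raw_audio_bytes_per_second(channels: int, bits_per_sample: int, sample_rate_hz: int) -> int:
    """Data rate of uncompressed PCM audio in bytes per second."""
    return channels * (bits_per_sample // 8) * sample_rate_hz


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    rate = raw_audio_bytes_per_second(CHANNELS, BITS_PER_SAMPLE, SAMPLE_RATE_HZ)
    print(rate)                        # 128000 bytes per second
    print(nyquist_hz(SAMPLE_RATE_HZ))  # 8000.0 Hz
```

An 8 kHz ceiling covers the frequency range that matters most for speech, which fits the sensor's focus on spoken commands.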
In my next post, I'll be discussing:
- Installation of SDK
- Beginning with Kinect programming
- RGB camera stream
- Skeleton Tracking and more.
If you have suggestions for topics, questions or feedback, or want to help me out, feel free to post a comment below and I'll try to help you out!