Kinect is a motion sensing input device released by Microsoft in 2010 for the Xbox 360 game console, and in 2012 for Windows PC. Based around a webcam-style add-on peripheral, it enables users to control and interact with software through a natural user interface of gestures and spoken commands, without the need to touch a game controller.
The project is aimed at broadening the Xbox 360’s audience beyond its typical gamer base. Kinect competes with the Wii Remote Plus and PlayStation Move with PlayStation Eye motion controllers for the Wii and PlayStation 3 home consoles, respectively. After selling a total of 8 million units in its first 60 days, the Kinect holds the Guinness World Record of being the ‘fastest selling consumer electronics device.’
Kinect builds on software technology developed internally by Rare, a subsidiary of Microsoft Game Studios, and on range camera technology by Israeli developer PrimeSense. PrimeSense developed a system that can interpret specific gestures, making completely hands-free control of electronic devices possible by using an infrared projector and camera and a special microchip to track the movement of objects and individuals in three dimensions. This 3D scanner system, called ‘Light Coding,’ employs a variant of image-based 3D reconstruction. The Kinect sensor is a horizontal bar connected to a small base with a motorized pivot and is designed to be positioned lengthwise above or below the video display. The device features an ‘RGB camera, depth sensor, and multi-array microphone running proprietary software,’ which provide full-body 3D motion capture, facial recognition, and voice recognition capabilities. The Kinect sensor’s microphone array enables the Xbox 360 to conduct acoustic source localization and ambient noise suppression, which allows features such as headset-free party chat over Xbox Live.
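The idea behind acoustic source localization can be illustrated with a minimal sketch. It assumes a simplified two-microphone setup and estimates the bearing of a sound from the time difference of arrival between the two signals. The microphone spacing, sample rate, and function names are hypothetical values chosen for illustration; this is not Kinect's proprietary algorithm or its actual array geometry.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value at room temperature


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag (in samples) of sig_b relative to sig_a that
    maximizes their cross-correlation, searched over [-max_lag, max_lag]."""
    best_lag, best_score = 0, float("-inf")
    n = len(sig_a)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_from_delay(lag_samples, sample_rate, mic_spacing):
    """Convert a delay in samples into a bearing in degrees, assuming a
    far-field source and two microphones mic_spacing metres apart."""
    dt = lag_samples / sample_rate
    s = SPEED_OF_SOUND * dt / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp so measurement noise cannot break asin
    return math.degrees(math.asin(s))


# Hypothetical example: an impulse reaches the second microphone 3 samples late.
mic_a = [0.0] * 32
mic_b = [0.0] * 32
mic_a[10] = 1.0
mic_b[13] = 1.0
lag = estimate_delay(mic_a, mic_b, max_lag=8)
angle = bearing_from_delay(lag, sample_rate=16000, mic_spacing=0.2)
```

A real array with more than two microphones would combine several such pairwise estimates and would typically use a more robust correlation method than this brute-force search.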
The depth sensor consists of an infrared laser projector combined with a monochrome CMOS sensor, which captures video data in 3D under any ambient light conditions. The sensing range of the depth sensor is adjustable, and the Kinect software is capable of automatically calibrating the sensor based on gameplay and the player’s physical environment, accommodating the presence of furniture or other obstacles. Described by Microsoft personnel as the primary innovation of Kinect, the technology enables advanced gesture recognition, facial recognition, and voice recognition. According to information supplied to retailers, Kinect is capable of simultaneously tracking up to six people, including two active players for motion analysis, extracting 20 joints per player. However, PrimeSense has stated that the number of people the device can ‘see’ (but not process as players) is limited only by how many will fit in the field of view of the camera.
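To show how a depth image relates to positions in three dimensions, the following sketch back-projects a single depth pixel into a 3D point using a generic pinhole camera model. The intrinsic parameters (`fx`, `fy`, `cx`, `cy`) are hypothetical placeholder values, not Kinect's calibration, and the sketch does not describe the device's internal processing.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera parameters (hypothetical values, in pixels)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def backproject(u, v, depth_m, k):
    """Map pixel (u, v) with depth depth_m (metres) to a 3D point
    (x, y, z) in the camera's coordinate frame."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


# Assumed intrinsics for a 640x480 depth image; real values come from calibration.
k = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
center = backproject(320, 240, 2.0, k)   # pixel on the optical axis
offset = backproject(420, 240, 2.0, k)   # 100 pixels to the right of it
```

Applying the same function to every pixel of a depth frame yields a point cloud, which is a common starting point for the kind of body tracking described above.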
The depth sensing technology behind Kinect was invented in 2005 by Zeev Zalevsky, Alexander Shpunt, Aviad Maizels and Javier Garcia. Kinect itself was first announced at E3 2009 under the code name ‘Project Natal.’ Following in Microsoft’s tradition of using cities as code names, it was named after the Brazilian city of Natal as a tribute to the country by Brazilian-born Microsoft director Alex Kipman, who incubated the project. The name Natal was also chosen because the word natal means ‘of or relating to birth,’ reflecting Microsoft’s view of the project as ‘the birth of the next generation of home entertainment.’
Although the sensor unit was originally planned to contain a microprocessor that would perform operations such as the system’s skeletal mapping, it was revealed in 2010 that the sensor would no longer feature a dedicated processor. Instead, processing would be handled by one of the processor cores of the Xbox 360’s CPU. Originally, the Kinect system consumed about 10-15% of the Xbox 360’s computing resources. However, that load has since decreased to single digits. A number of observers commented that the computational load required for Kinect makes the addition of Kinect functionality to pre-existing games through software updates unlikely, with concepts specific to Kinect more likely to be the focus for developers using the platform.
In 2011, Microsoft announced that it would release a non-commercial Kinect software development kit (SDK) for Windows. The SDK includes Windows 7 compatible PC drivers for the Kinect device. It lets developers build Kinect applications in C++, C#, or Visual Basic using Microsoft Visual Studio 2010, and includes: raw sensor streams (access to low-level streams from the depth sensor, color camera sensor, and four-element microphone array); skeletal tracking (the ability to track the skeleton image of one or two people moving within the Kinect field of view for gesture-driven applications); and advanced audio capabilities (acoustic noise suppression and echo cancellation, beam formation to identify the current sound source, and integration with the Windows speech recognition API).
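As an illustration of what a gesture-driven application might do with skeletal tracking data, the sketch below detects a "raised hand" gesture from a sequence of skeleton frames. It is written in Python rather than one of the SDK's languages and does not use the SDK's API; the joint names, the coordinate convention (y pointing up, in metres), and the thresholds are all assumptions made for this example.

```python
from typing import Dict, Iterable, Tuple

# (x, y, z) in metres; y is assumed to point upward.
Point3D = Tuple[float, float, float]
Skeleton = Dict[str, Point3D]  # hypothetical joint-name -> position mapping


def is_hand_raised(joints: Skeleton, margin: float = 0.10) -> bool:
    """True when the right hand is more than `margin` metres above the head."""
    return joints["hand_right"][1] > joints["head"][1] + margin


def detect_raise(frames: Iterable[Skeleton], min_frames: int = 3,
                 margin: float = 0.10) -> bool:
    """Report the gesture only after it holds for min_frames consecutive
    frames, so a single noisy frame does not trigger it."""
    run = 0
    for joints in frames:
        run = run + 1 if is_hand_raised(joints, margin) else 0
        if run >= min_frames:
            return True
    return False


# Hypothetical frames: hand above the head versus hand at the side.
up = {"head": (0.0, 1.6, 2.0), "hand_right": (0.2, 1.8, 2.0)}
down = {"head": (0.0, 1.6, 2.0), "hand_right": (0.2, 1.0, 2.0)}
held = detect_raise([down, up, up, up])
flicker = detect_raise([up, down, up, down, up])
```

Requiring several consecutive frames is one simple way to trade a little latency for fewer false detections; more elaborate recognizers would track joint trajectories over time.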
The Kinect system software allows users to operate the Xbox 360 Dashboard console user interface through voice commands and hand gestures. Techniques such as voice recognition and facial recognition are employed to automatically identify users. Among the applications for Kinect is Video Kinect, which enables voice chat or video chat with other Xbox 360 users or users of Windows Live Messenger. The application can use Kinect’s tracking functionality and the Kinect sensor’s motorized pivot to keep users in frame even as they move around. Other applications with Kinect support include ESPN, Zune Marketplace, Netflix, Hulu Plus, and Last.fm; Microsoft later confirmed that all forthcoming applications would be required to have Kinect functionality for certification.
In 2010, Adafruit Industries, an electronics hobbyist company, offered a bounty for an open-source driver for Kinect. Microsoft initially voiced its disapproval of the bounty, stating that it ‘does not condone the modification of its products’ and that it had ‘built in numerous hardware and software safeguards designed to reduce the chances of product tampering.’ This reaction, however, was caused by a misunderstanding within Microsoft, and the company later clarified its position, claiming that while it does not condone hacking of either the physical device or the console, the USB connection was left open by design.
Alex Kipman spoke on NPR’s ‘Science Friday’ to address the issue: ‘The first thing to talk about is, Kinect was not actually hacked. Hacking would mean that someone got to our algorithms that sit inside of the Xbox and was able to actually use them, which hasn’t happened. Or, it means that you put a device between the sensor and the Xbox for means of cheating, which also has not happened. That’s what we call hacking, and that’s what we have put a ton of work and effort to make sure doesn’t actually occur. What has happened is someone wrote an open-source driver for PCs that essentially opens the USB connection, which we didn’t protect, by design, and reads the inputs from the sensor. The sensor, again, as I talked earlier, has eyes and ears, and that’s a whole bunch of noise that someone needs to take and turn into signal.’
Before the month was out Adafruit announced Héctor Martín as the winner of its bounty; he produced a Linux driver that allows the use of both the RGB camera and depth sensitivity functions of the device. It was later revealed that Johnny Lee, a core member of Microsoft’s Kinect development team, had secretly approached Adafruit with the idea of a driver development contest and had personally financed it. Not long after the bounty was won, PrimeSense released their own open source drivers along with motion tracking middleware called ‘NITE.’ PrimeSense later announced that it had teamed up with Asus to develop a PC-compatible device similar to Kinect, which will be called WAVI Xtion.
Alexandre Alahi from EPFL (a Swiss technology institute) presented a video surveillance system that combines multiple Kinect devices to track groups of people even in complete darkness. Companies So touch and Evoluce have developed presentation software for Kinect that can be controlled by hand gestures; among its features is a multi-touch zoom mode. In late 2010, the free public beta of HTPC software ‘KinEmote’ was launched; it allows navigation of Boxee and XBMC menus using a Kinect sensor. Soroush Falahati wrote an application that can be used to create stereoscopic 3D images with a Kinect sensor. For a limited time in 2011, a Topshop store in Moscow set up a Kinect kiosk that could overlay a collection of dresses onto the live video feed of customers. Through automatic tracking, position and rotation of the virtual dress were updated even as customers turned around to see the back of the outfit.
Kinect also shows compelling potential for use in medicine. Researchers at the University of Minnesota have used Kinect to measure a range of disorder symptoms in children, creating new ways of objective evaluation to detect such conditions as autism, attention-deficit disorder, and obsessive-compulsive disorder. Several groups have reported using the Kinect for intraoperative review of medical imaging, allowing the surgeon to access the information without contamination. This technique is already in use at Sunnybrook Health Sciences Centre in Toronto, where doctors use it to guide imaging during cancer surgery. At least one company, GestSure Technologies, is pursuing the commercialization of such a system.