A Very Brief Introduction to Computer Vision

Article
05/27/2014

We are trying to raise awareness of Computer Vision among the enthusiastic Bangladeshi Windows/Windows Phone app developer community. We will explain the subject in the simplest English possible.

Computer Vision and a few Examples

Computer Vision (CV) is a field that includes methods for acquiring, processing, analyzing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions.

We humans see the world through our eyes, and our brains let us make decisions based on the images we perceive. But how can a machine make decisions based on images acquired through its eyes, i.e. cameras?

Consider the following image, where every face is hidden behind a colored mask. Finding those faces is face detection, a typical CV task:

Object detection from a scene:

How about detecting actions in a scene:

Recognizing car numbers from number plates to help police identify each car:

Controlling Windows using motion gestures:

Even more advanced – augmented/mixed reality at its best:

Do you dream of building such futuristic Windows/Windows Phone projects and submitting them to Imagine Cup? Stay tuned to this blog.

How can we equip our apps with such capability?

Obviously, you are thinking of open libraries that can enable such ventures, and you are absolutely correct. They are variously called open source image processing libraries, image analysis libraries, or Computer Vision libraries (CV libraries). Among other things, such libraries can take the pixel data of captured images or frames (more on pixel data at the bottom of this post), check RGB values, and detect shapes. Computer Vision is a field within Computer Science, and these libraries give computer scientists and non-scientists alike ample opportunity to build apps using its techniques. OpenCV is arguably the most used CV library across industry and academia, and it has been used extensively on desktop as well as mobile platforms. Unfortunately, it is not yet available for Windows/Windows Phone. Some OpenCV methods have been ported for particular functionality, but there is no single fully ported library that works for both Windows and Windows Phone.
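
To make "take pixel data, check RGB values and detect shapes" a little more concrete, here is a minimal sketch in plain Python. It is only an illustration and does not use OpenCV, FastCV, or any other real library: the image is a hand-written grid of RGB triplets, and the names `is_red` and `bounding_box` as well as the threshold values are assumptions made up for this example. It scans every pixel, checks its RGB value, and reports the rectangle that encloses all the red pixels.

```python
# Illustrative only: a tiny "image" stored as rows of (R, G, B) triplets.
W = (255, 255, 255)  # white background pixel
R = (255, 0, 0)      # red pixel belonging to the shape we want to find

image = [
    [W, W, W, W, W],
    [W, R, R, W, W],
    [W, R, R, W, W],
    [W, W, W, W, W],
]


def is_red(pixel, threshold=200):
    """Return True when the pixel is strongly red (assumed thresholds)."""
    r, g, b = pixel
    return r >= threshold and g < 100 and b < 100


def bounding_box(img, predicate):
    """Return (left, top, right, bottom) enclosing all matching pixels, or None."""
    xs, ys = [], []
    for y, row in enumerate(img):
        for x, pixel in enumerate(row):
            if predicate(pixel):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no pixel matched, so there is no shape to report
    return (min(xs), min(ys), max(xs), max(ys))


print(bounding_box(image, is_red))  # (1, 1, 2, 2)
```

A real CV library does far more than this (noise handling, different color spaces, contour tracing), but the starting point is the same: a grid of pixel values that the code inspects one by one.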

Therefore I asked Microsoft Bangladesh intern Asek-e-Alam Shanto to port Qualcomm’s FastCV to C#. The result is FastCV.NET, a C# port of their C++ library. FastCV might not be as feature-complete as OpenCV, yet it is still quite big and will perhaps cater to all sorts of your CV needs. Qualcomm has done an excellent job writing a C++ version for Windows Phone 8, but to make it more accessible to C# users who have little exposure to C++ or have almost forgotten it, we have ported it to C#. The intention is to let you use Computer Vision to implement your wildest ideas in C#, without the trouble of C++, on Windows Phone 8 and beyond. We are going to launch the open source version very soon, so that you can even contribute.

FastCV.NET possible ideas:

Every year we get many amazing ideas in Imagine Cup Bangladesh, such as the following:

1. An app that takes a photo of a leaf and detects disease.

2. An app that can tell you the power of your eyeglass lenses.

Perhaps your app could take a photo of a plant's leaves and describe its herbal uses. Or it could solve Sudoku puzzles from the newspaper, or control traffic more efficiently. The world is a big enough canvas for you to come up with your own idea.

What is Pixel Data?

A pixel is the smallest addressable element of a digital image. If you zoom in to the highest level on a digital image, or take a very close look at a computer screen, you will find that every image is composed of such small elements. Each pixel can be represented by an RGB value, which is a set of red, green, and blue color values. RGB is a color model in which red, green, and blue light are added together in various ways to reproduce a broad array of colors. Here are a few ways an RGB triplet (pure red in this case) can be represented:

Notation	RGB triplet
Arithmetic	(1.0, 0.0, 0.0)
Percentage	(100%, 0%, 0%)
Digital 8-bit per channel	(255, 0, 0) or sometimes #FF0000 (hexadecimal)
Digital 16-bit per channel	(65535, 0, 0)
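
The notations in the table are different ways of writing the same color, and converting between them is simple arithmetic. The following is a minimal sketch in plain Python, assuming the color starts out as an 8-bit-per-channel triplet. The function names are made up for this illustration and do not belong to any library.

```python
def to_arithmetic(rgb8):
    """8-bit triplet -> floats between 0.0 and 1.0."""
    return tuple(channel / 255 for channel in rgb8)


def to_percentage(rgb8):
    """8-bit triplet -> percentage strings, rounded to whole percent."""
    return tuple(f"{round(channel * 100 / 255)}%" for channel in rgb8)


def to_hex(rgb8):
    """8-bit triplet -> hexadecimal string such as #FF0000."""
    return "#{:02X}{:02X}{:02X}".format(*rgb8)


def to_16bit(rgb8):
    """8-bit triplet -> 16-bit per channel (65535 / 255 is exactly 257)."""
    return tuple(channel * 257 for channel in rgb8)


red = (255, 0, 0)
print(to_arithmetic(red))  # (1.0, 0.0, 0.0)
print(to_percentage(red))  # ('100%', '0%', '0%')
print(to_hex(red))         # #FF0000
print(to_16bit(red))       # (65535, 0, 0)
```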

Article Courtesy: Asek-e-Alam Shanto and Tanzim Saqib.
