What is image recognition technology?
Image recognition (sometimes called computer vision) is a technology that strives to acquire, process, analyse, and understand images and high-dimensional data from the real world in order to produce numerical or symbolic information.
What did you just say?
Don’t worry, italic voice. I know it is complicated. Let me rephrase: when you upload a picture of you and your friends to Facebook, everyone’s faces are recognised and automatically tagged: that’s image recognition.
Ah, OK, that definitely sounds more understandable.
OK! Because now it gets more complicated… Computer vision is a very complex area within computer science as there are a lot of aspects involved, such as machine learning, data mining, database knowledge discovery, pattern recognition, and others. Research into this area led to technology that mimics human vision. And to create software that is able to see, you first need a good pair of goggles.
What do you mean?
Well, I mean that to process an image, you first need to capture the moment using a camera. The software then extracts the information required from it and takes an action based on the data. Until recently, digital cameras were ridiculously expensive and had a very low resolution, and image recognition could not be done in real time. But with the arrival of the mobile phone and high-speed cameras, the possibilities are endless. As an example of this, did you know that some years ago, a Japanese company created a robot that was able to play ‘rock, paper, scissors’ and win 10 out of 10 games?
I didn’t think so. Here is the link to that. The robot basically uses a high-speed camera to detect the movement of the human’s hand. By checking the movement patterns of the hand at 500 frames per second, the robot is able to react instantly with the counterplay that beats the human. To achieve this, the camera captures an image of the shape of the hand as it is forming the object and sends the information to the software, which recognises the pattern and triggers the robot’s response. The human hand takes 60 ms to form the shape, whereas the robot needs only 1 ms to do all of the above.
Ok, but I thought this was a blog about mobile…
Don’t worry, we’ll get to that. One of the biggest challenges, and the most extensive case study, in image recognition is to imitate human vision by electronically perceiving an image, understanding it, and giving the appropriate reaction. That’s exactly what the robot was doing in our previous example: it perceives the image by taking a picture, understands what the human is doing, and reacts by counter-playing the human. Of course, we software engineers tend to be more interested in the piece of software that recognises patterns than in robotics. So, how does the perceiving part actually work? The short answer is “mathematics”.
The most common thread in pattern recognition algorithms is probabilistic classification. An image is processed against a set of other stored images, and is given a value (a probability) per other image that it matches with. Combining multiple probabilistic classification algorithms that run over the same set of images, which is called an ‘ensemble’, gives a final confidence level per image that can be used by the software to make an educated guess on which image is a match.
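To make the ‘ensemble’ idea a bit more concrete, here is a minimal sketch in Python. It assumes that each classifier has already produced a probability per stored image, and it combines them by simple averaging, which is only one of several ways to build an ensemble. The function names and the scores are made up for illustration.

```python
from statistics import mean


def ensemble_confidence(per_classifier_scores):
    """Average the probability each classifier gave to each stored image."""
    labels = per_classifier_scores[0].keys()
    return {
        label: mean(scores[label] for scores in per_classifier_scores)
        for label in labels
    }


def best_match(per_classifier_scores):
    """Return the stored image with the highest combined confidence."""
    confidence = ensemble_confidence(per_classifier_scores)
    label = max(confidence, key=confidence.get)
    return label, confidence[label]


# Hypothetical output of three classifiers for one camera frame.
scores = [
    {"rock": 0.7, "paper": 0.2, "scissors": 0.1},
    {"rock": 0.6, "paper": 0.1, "scissors": 0.3},
    {"rock": 0.5, "paper": 0.3, "scissors": 0.2},
]

label, confidence = best_match(scores)
print(label, round(confidence, 2))
```

With these made-up numbers, ‘rock’ gets the highest combined confidence, so that is the educated guess the software would make.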
As you can imagine, that is quite challenging for a mobile device. You’d think that processing power is a problem there, and it definitely is! But the most serious bottleneck is the database of images against which the original one is matched. In the example of the robot above, you have only a limited set of images (rock, paper, scissors) to work with, but in the earlier example of image recognition on Facebook, it is not possible to store on your device all the possible faces of every person who is registered on that social media platform. (This is actually not how it works: Facebook stores a unique hash per friend, using certain characteristic values of the face as a seed… but the example was just to convey the idea.)
To overcome this and other problems, image recognition is normally done on the server, where processing power and storage space are not an issue. The mobile device only sends the image there, and a neural network or machine processes the request.
But wait! I’ve seen it running in mobile devices without internet connection.
Ok, yes, that’s partially true. The mobile device still needs to send the images to the server first, since the server has to hold them. Once there, the server will process each image, generate a much smaller hash, and return it to the application. After that you could, for example, go into airplane mode and view the image through your phone’s camera. The comparison between the two will be done offline.
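For the curious, the offline comparison could look roughly like the sketch below. It rests on a few assumptions that are ours, not a description of any particular product: the hashes are plain integers, the app can compute the same kind of hash for the camera image, and two images count as a match when only a few bits differ. All names and values are hypothetical.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")


def find_offline_match(camera_hash, stored_hashes, max_distance=3):
    """Return the label of the closest stored hash, or None if none is close enough."""
    best_label = None
    best_distance = max_distance + 1
    for label, stored_hash in stored_hashes.items():
        distance = hamming_distance(camera_hash, stored_hash)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# Hypothetical 16-bit hashes that the server returned while the device was online.
stored_hashes = {
    "poster": 0b1111000011110000,
    "logo": 0b0000111100001111,
}

# Hypothetical hash of the current camera image, computed on the device.
match = find_offline_match(0b1111000011110001, stored_hashes)
print(match)
```

Because the device only compares small numbers here, no connection and very little processing power are needed at this stage.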
Thanks for all the nerdy tech-talk. Now let’s talk business.
Ah, so you want to know how businesses apply image recognition technology? It surely won’t come as a surprise that image recognition has the potential to revolutionise entire industries. In healthcare, for example, IBM is starting to use image recognition technology to process massive quantities of medical images. This can help doctors diagnose diseases faster and with higher accuracy. Baidu has developed a prototype of DuLight: an image recognition product that will help the visually impaired to ‘see’ by capturing their surroundings and narrating the interpretations through an earpiece. However, there are often legal and ethical implications involved with Artificial Intelligence products. Take, for example, the automotive industry and Google’s self-driving cars. The technology is there, but a complex and lengthy process will need to be endured to actually bring these cars to market.
Okay, but I’m not planning to build a self-driving car – what can image recognition technology do for my business?
Frankly, a lot! There are many small-scale ways to apply image recognition technology and benefit from it. Since this is a blog about mobile, let’s take a look at some use cases of image recognition technology in the mobile channel. One of the big players in the field is Blippar: a visual discovery platform that allows users to scan objects and unlock content about these objects, making the physical world an interactive playground. For plant enthusiasts there is LeafSnap, and for wine lovers there is Delectable. But there are also some image recognition marketing campaigns worth taking a look at, such as Makeup Genius, TrackMyMaccas, and SnapFindShop. These brands applied image recognition technologies in a way that drove social sharing and user engagement.
So you’re saying that image recognition technology can help me engage my users?
Well, since this is a blog about mobile, the word ‘engagement’ was going to enter the conversation at some point. The world of apps revolves around engagement: if you don’t succeed in engaging the user, chances are your user will simply not come back to your app. Image recognition gives your app huge opportunities to engage, since the technology allows you to extend beyond the boundaries of the mobile device and into the user’s physical world. Your app can provide something more tangible, which allows you to make a stronger, emotional connection with your users. And since emotion is strongly connected to memory, the odds are in your favor to make an impact that lasts.
Would you like to receive more information regarding Image Recognition Technology? Plan a workshop with us, or if there’s anything that we can help you with, just let us know by sending an email to firstname.lastname@example.org