How Deep-Learning Makes Video Surveillance Smarter
By Oliver Philippou, Senior Analyst, Video Surveillance, IHS Markit
Why is the security industry talking about artificial intelligence?
- IHS Markit forecasts that the total market for paid-for video analytics will reach $700 million in 2021.
- This growth is attributed to the widespread adoption of emerging analytics technologies, specifically deep-learning architectures, which promise to alter not just the video analytics landscape but the video surveillance industry as a whole.
- The 2017 edition of the IHS Markit Report on Video Analytics in Security and Business Intelligence is now available.
For a video camera, the world is not a straightforward place. It is a mirage of complex detail, subtle nuances, and ever-changing scenes. As a result, the reliability of video analytics has for years been extremely variable, with vendors struggling to develop algorithms that could function in complex real-world scenes. The industry has come a long way in recent years, and traditional “rules-based” video analytics have steadily improved. Even so, these analytics are still not quite able to cope with the detailed and complex world in which we live, particularly in scenarios outside a highly controlled environment. Enter ‘deep-learning analytics’. Deep-learning analytics are poised to revolutionise the security industry and facilitate a significant leap in the capabilities of video analytics. The last couple of years have seen a large increase in research and development in deep-learning neural networks, proving their capabilities, generating considerable excitement, and putting them within reach of a much wider user group.
Deep learning appears to offer a level of accuracy and reliability in object and behaviour classification that not only enables video analytics to finally deliver on some of the lofty but as yet unrealised claims made in the past, but pushes capabilities far beyond them. Broadly speaking, there are two main areas in which deep-learning analytics offer significant benefits over the technology that preceded them: classification accuracy and processing speed.
A long-held complaint levelled against traditional analytics products was that their algorithms were unable to distinguish between objects and behaviours that a human being would have no problem classifying. This lack of intuition on the part of computer vision algorithms results either in missed security breaches or in false alarms. The ability of deep-learning algorithms to interpret a scene in a way much closer to that of a human viewer means that detection accuracy increases, and false alarm rates fall, dramatically. Neural networks allow a computer to apply a series of assessments to a given situation. This is a significant development for the video analytics industry. Although some end-users may not need an analytics solution that is 100 percent accurate 100 percent of the time, many users require a security system to be as close to infallible as possible. Users in the critical infrastructure sector, for instance, cannot afford to miss a breach in their security, and can spend a large amount of money investigating false alarms. Deep-learning algorithms have proven able to achieve 99.9 percent accuracy in certain tasks in controlled environments, such as face recognition at airport immigration, where conventional systems would struggle to achieve 95 percent (according to Paul Sun, CEO, IronYun – Deep learning technology applications for video surveillance). In many security use cases, these few percentage points make all the difference.
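To see why a few percentage points of accuracy matter, it helps to turn the two figures quoted above into expected error counts. The short sketch below does that arithmetic. The daily event volume of 10,000 is a purely hypothetical number chosen for illustration, as is the simplification that every inaccurate result costs an operator the same amount of attention.

```python
def expected_errors(events_per_day: int, accuracy: float) -> float:
    """Expected number of misclassified events per day at a given accuracy."""
    return events_per_day * (1.0 - accuracy)


# Hypothetical workload: 10,000 classified events per day (an assumption,
# not a figure from the article).
EVENTS_PER_DAY = 10_000

# Accuracy figures quoted in the text above.
conventional_errors = expected_errors(EVENTS_PER_DAY, 0.95)
deep_learning_errors = expected_errors(EVENTS_PER_DAY, 0.999)

# How many times fewer errors the higher accuracy implies.
reduction_factor = conventional_errors / deep_learning_errors

print(f"Conventional:  about {conventional_errors:.0f} errors per day")
print(f"Deep learning: about {deep_learning_errors:.0f} errors per day")
print(f"Reduction:     about {reduction_factor:.0f}x")
```

Under these assumptions, moving from 95 percent to 99.9 percent accuracy cuts the error rate from 5 percent to 0.1 percent, which is roughly a fifty-fold reduction in missed events or false alarms rather than the modest improvement that "4.9 percentage points" might suggest.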
Not only has deep learning demonstrated its capacity to radically improve a computer's ability to classify objects and behaviour reliably; it is also making it possible to process and analyse increasing volumes of video footage far faster than previous analytics could. Companies such as Avigilon, Qognify, and IronYun are now marketing analytics that leverage deep learning to turn vast amounts of video footage into usable information in a fraction of the time it would have taken in the past. Video processing software that lets users query their surveillance footage through a Google-like interface and natural-language search terms drastically reduces the time it takes to find relevant footage in an archive that might store video from thousands of feeds.
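One way to picture why search over classified footage can be fast is an inverted index: once each clip has been labelled by a classifier, a query only has to look up labels rather than scan video. The sketch below is a minimal illustration of that general idea and does not describe any vendor's product. The camera identifiers, timestamps, labels, and the function names `build_index` and `search` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical classifier output: (camera_id, timestamp_seconds, label).
detections = [
    ("cam-01", 12.0, "person"),
    ("cam-01", 12.0, "red car"),
    ("cam-02", 40.5, "person"),
    ("cam-03", 7.25, "bicycle"),
]


def build_index(detections):
    """Map each label word to the list of (camera_id, timestamp) clips containing it."""
    index = defaultdict(list)
    for camera_id, timestamp, label in detections:
        for word in label.lower().split():
            index[word].append((camera_id, timestamp))
    return index


def search(index, query):
    """Return the clips whose labels contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # Start from the clips matching the first word, then intersect with the rest.
    matches = set(index.get(words[0], []))
    for word in words[1:]:
        matches &= set(index.get(word, []))
    return sorted(matches)


index = build_index(detections)
hits = search(index, "red car")
print(hits)
```

The expensive step, classifying every frame, happens once at indexing time. Each later query is a handful of dictionary lookups, which is why review time can drop so sharply even as the archive grows.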
Facial recognition is one of the areas that has benefitted most from deep-learning architectures. Indeed, most facial recognition analytics on the market today feature some kind of deep learning. Not only does deep learning increase the accuracy of facial recognition, it also enables faces to be identified in larger and more crowded scenes. In the wake of recent terrorist attacks on crowded venues, this capability could radically change the whole approach to security monitoring, allowing law enforcement to track suspects with far greater speed and efficiency. Herta is one company that specialises in facial recognition in large crowds. Because of this capability, IHS Markit expects verticals such as large shopping malls, airports, and other transportation hubs, along with city surveillance projects, to be early adopters of this type of analytics.
So what is driving adoption of video analytics based on deep learning?
There are numerous factors that appear likely to drive the widespread adoption of deep learning analytics across a variety of video surveillance application types. One major factor is the astonishing volume of data generated by video surveillance systems each day. Hundreds of petabytes of video footage are recorded daily, and this figure is rising all the time. This is not just thanks to an increase in resolution, but also a general increase in the number of video feeds. Indeed, video surveillance data makes up over half of the volume of what could be referred to as “big data” and the proportion is rising significantly. While the volume of video data is huge and increasing, the ability of security companies and end-users to monitor and review it without assistance is nowhere near up to the task. Deep learning offers a number of advantages in this context, allowing faster, more intuitive review and indexing of recorded footage, and reducing the time it takes to find relevant images from days, weeks or even months, down to minutes.
Computing power will also play a significant facilitating role. Deep-learning video analytics promise unprecedented performance, but they also require significantly more computational power than many traditional video analytics products. Deep-learning models need very large numbers of training samples and carry out a very large number of calculations. In the past, hardware was incapable of processing complex deep-learning models with over a hundred layers. For example, the Google Brain project, begun in 2011, used 1,000 machines with 16,000 CPU cores to train a neural network with approximately one billion connections. Today, only a few GPUs are required to achieve the same sort of computational power, with even faster iteration. The rapid development of GPUs, supercomputers, cloud computing, and other high-performance hardware platforms has facilitated the rise of deep-learning analytics.
Safe and smart city projects are becoming more and more common. China’s safe city initiatives have, for example, been expanding in complexity and scope since their inception in 2003: this trend is reviewed in depth in the Safe City 2017 report from IHS Markit. Deep-learning video analytics will be an essential facilitator to realise the safe city concept. Indeed, we are increasingly seeing facial recognition, which more often than not requires deep-learning architecture to function reliably, become a required feature in Chinese tender documents.
Huge amounts of data will be generated by the thousands of sensors deployed across safe cities. A key element of the safe city concept is an integrated system that can operate efficiently in real time, with much of the process automated. Highly reliable analytics that can make inferences at a more abstract level than traditional machine-vision algorithms, and in a wider variety of weather conditions, will be essential to reduce false alarms and the manpower required to monitor video feeds.
The IHS Markit Video Analytics in Security and Business Intelligence Report – 2017 includes dedicated in-depth research into the deep-learning video analytics market, and is available for purchase through: technology.ihs.com