In-Store Intelligent Video Analytics - Streamline Retail Operations with AI

Updated Nov 27, 2024 • 8 min read

More effective operations management may turn out to be a key driver of competitive advantage for retailers. A simple CCTV camera combined with an intelligent video analytics solution can help achieve that goal. How?

Intelligent video analytics is a new opportunity for streamlining retail operations. This AI-driven solution processes video footage to detect and classify various objects (such as people, retail products, or vehicles), which makes it easier to extract business value from customer and object tracking data.

Currently, video monitoring systems are used primarily to make recordings for later use, such as reviewing footage after an incident. This is mostly because monitoring all the incoming footage is expensive, and the work itself is repetitive and tedious. With intelligent video analytics, however, your CCTV system can also serve as an optimization tool: it can help you make better use of shelf space, plan inventory, or reduce your footprint. By leveraging intelligent video analytics in autonomous stores, retailers can gain valuable insights and make data-driven decisions that enhance operational efficiency and improve the overall shopping experience.

What is intelligent video analytics?

Smart video analytics systems can affect many areas of day-to-day retail operations. Potential uses range from remote monitoring of checkout lines and assigning extra employees to highly trafficked areas to locating misplaced items or lost children and notifying staff, and the list goes on.

Retail shops and shopping centers can use video analytics solutions both to react in real time to in-store situations and to improve future sales strategy based on customer behaviour.
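To make checkout line monitoring more concrete, here is a minimal Python sketch of the real-time part. It assumes an upstream person detector already returns a bounding box for each person in a frame; the `Box` and `Zone` classes, the zone coordinates, and the `max_queue` threshold are hypothetical illustrations, not part of any specific product.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical types: a real detector's output format will differ.
@dataclass(frozen=True)
class Box:
    """One person detection as an axis-aligned box in pixel coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float

    def center(self) -> tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


@dataclass(frozen=True)
class Zone:
    """A rectangular region of the camera view, e.g. a checkout line."""
    name: str
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, point: tuple[float, float]) -> bool:
        x, y = point
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def people_in_zone(detections: list[Box], zone: Zone) -> int:
    # A person counts as "in the zone" if the center of their box lies inside it.
    return sum(1 for box in detections if zone.contains(box.center()))


def needs_extra_cashier(detections: list[Box], zone: Zone, max_queue: int) -> bool:
    # Hypothetical rule: alert staff when the queue exceeds max_queue people.
    return people_in_zone(detections, zone) > max_queue


checkout = Zone("checkout_1", 0, 300, 400, 720)
frame = [Box(50, 400, 120, 600), Box(200, 350, 260, 560), Box(900, 100, 980, 300)]
print(people_in_zone(frame, checkout))  # 2
print(needs_extra_cashier(frame, checkout, max_queue=1))  # True
```

Note that nothing in this sketch needs to know who the detected people are: boxes and counts are enough to trigger the alert.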

Since computers can process large quantities of data faster than humans, machine learning-powered software is invaluable for such tasks: it can detect patterns, recognize issues, and suggest solutions in seconds. Let’s take a closer look at how to put such a system into action.

The privacy challenges of smart video analytics

At a time of growing concern over big data and privacy, one advantage of intelligent video analytics is that it relies on detecting people and their actions in live video feeds, without needing to identify anyone personally. Knowing whether it is looking at John Smith or Jane Brown does not improve the system's work.

In fact, since video data already takes up huge amounts of space and processing power, additional features such as facial recognition would likely make the system less efficient by increasing the time required to process the footage.

How can retailers benefit from smart video analytics?

So why exactly should retailers consider using intelligent video analytics? Introducing such a solution might seem challenging from a financial, technological, and operational standpoint. However, we have some tips on how to make it budget-friendly and efficient.

Achieving real-time information processing

The technical requirements of intelligent video analytics systems seem overwhelming at first, but you can make them affordable and easy to deploy.

To achieve robust performance, we need Deep Learning (DL), or more specifically, a deep neural architecture designed for Computer Vision: either Convolutional Neural Networks (CNNs) or the more recent Vision Transformers. These networks deliver high accuracy in person detection and activity recognition.

Convolutional Neural Networks (CNNs)

CNNs are networks specifically designed to process images effectively, which is the basis of any kind of video footage analysis. Although today they achieve very high prediction accuracy and outperform most other approaches in this field, their development was held back for a long time by insufficient computing power and immature training methods.
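The core operation that makes CNNs suited to images is the convolution: a small filter slides over the image and computes a weighted sum at every position, so the same pattern detector is reused across the whole frame. A minimal pure-Python sketch of one convolution followed by a ReLU activation is below; the edge filter is hand-picked for illustration, whereas a trained CNN learns thousands of such filter values from data.

```python
def conv2d(image: list[list[float]], kernel: list[list[float]]) -> list[list[float]]:
    """'Valid' 2D convolution as used in CNN layers (technically cross-correlation:
    the kernel is not flipped, which matches what deep learning libraries compute)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw window whose top-left corner is (i, j).
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out


def relu(feature_map: list[list[float]]) -> list[list[float]]:
    # Non-linearity applied after each convolution: negative responses become 0.
    return [[max(0.0, float(v)) for v in row] for row in feature_map]


# Dark left half, bright right half: a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]
# Hand-picked vertical-edge filter; a trained CNN learns its filter values instead.
edge_kernel = [[-1, 1], [-1, 1]]
print(relu(conv2d(image, edge_kernel)))  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The filter responds only where the brightness changes, which is how early CNN layers pick out edges that deeper layers combine into shapes such as people.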

CNNs were first applied in practice by Yann LeCun in the late 1980s and 1990s to recognize handwritten digits, including those on bank checks, and have evolved rapidly ever since. 2012 was a turning point for this technology. That year AlexNet, a deep CNN presented by Alex Krizhevsky and colleagues at the ImageNet competition, revolutionised the field by showing that deep networks trained on GPUs could far outperform earlier approaches, making Deep Learning the driver of innovation.

As a result, it triggered the rapid development of the image recognition and processing technology that we use today in various fields, such as intelligent video analytics.

Vision Transformers

Vision Transformers, on the other hand, are a more recent approach that applies the Transformer architecture, originally developed for Natural Language Processing, to Computer Vision. They achieve accuracy comparable to CNNs and, when trained on large datasets, can match or surpass them at a lower training cost, which makes them a significant development in image recognition. Research on Vision Transformers is still ongoing, so they may eventually replace CNNs in many Computer Vision tasks.
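The key idea that lets a language architecture work on images is to treat an image as a sequence of "words": it is cut into fixed-size patches, and each flattened patch becomes one input token. The sketch below shows only this patching step for a single-channel image; in an actual Vision Transformer each flattened patch is then linearly projected and combined with a position embedding before entering the Transformer layers, which is omitted here.

```python
def image_to_patches(image: list[list[float]], patch: int) -> list[list[float]]:
    """Split an H x W single-channel image into non-overlapping patch x patch tiles,
    flattened in row-major order: the 'tokens' a Vision Transformer starts from."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[top + r][left + c]
                           for r in range(patch) for c in range(patch)])
    return tokens


image = [[row * 4 + col for col in range(4)] for row in range(4)]
print(image_to_patches(image, 2))
# [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]

# A 224 x 224 image cut into 16 x 16 patches yields (224 // 16) ** 2 = 196 tokens.
print((224 // 16) ** 2)  # 196
```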

Deep neural networks are a perfect match for image recognition, but their capabilities come at a price. They require powerful graphics processing units (GPUs) to run in near real time, which can be very expensive. Renting equivalent GPU capacity in the cloud is costly as well.

Performing cloud computations for many stores simultaneously produces a huge bill at the end of the month - not to mention the high bandwidth and reliability requirements. However, there are ways to minimize the costs.

Reducing costs and increasing efficiency

One way to reduce the costs of an intelligent video analytics solution is to run it as a soft real-time system. Such a system aims to process every frame on time, but an occasional delay does not compromise its overall performance: if processing falls behind, we can simply discard a few frames.

We only need to make sure that the processing queue does not grow too long, because in 24/7 operation there is no idle time in which to catch up on a backlog later. We also have to keep the time resolution in mind.
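One simple way to implement this frame dropping is a bounded buffer between the camera and the detector. The sketch below assumes a drop-oldest policy and simulates capture and inference in a single thread for clarity; `FrameBuffer` and `simulate` are hypothetical names, and a real deployment would run capture and inference in separate threads or processes.

```python
from __future__ import annotations

from collections import deque


class FrameBuffer:
    """Bounded FIFO between camera capture and the detector (hypothetical helper).

    When the detector falls behind and the buffer is full, the oldest frame is
    discarded, so the backlog can never grow beyond `capacity` frames."""

    def __init__(self, capacity: int) -> None:
        self._frames = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, frame) -> None:
        if len(self._frames) == self._frames.maxlen:
            self.dropped += 1  # append() on a full deque evicts the oldest item
        self._frames.append(frame)

    def pop(self):
        """Return the oldest buffered frame, or None if the buffer is empty."""
        return self._frames.popleft() if self._frames else None


def simulate(n_frames: int, capacity: int, process_every: int) -> tuple[list[int], int]:
    """The camera delivers one frame per tick, while the (simulated) detector
    only finishes one frame every `process_every` ticks."""
    buffer = FrameBuffer(capacity)
    processed = []
    for tick in range(n_frames):
        buffer.push(tick)  # each frame is identified by its capture tick
        if tick % process_every == process_every - 1:
            frame = buffer.pop()
            if frame is not None:
                processed.append(frame)
    return processed, buffer.dropped


print(simulate(n_frames=9, capacity=2, process_every=3))  # ([1, 4, 7], 5)
```

Even though the detector here is three times slower than the camera, the queue never holds more than two frames; the price is that five of the nine frames are skipped. An alternative policy is to always process the newest frame and discard everything older.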

Most cameras record at 24-30 frames per second (FPS), which is roughly the rate needed for a sequence of individual images to be perceived as smooth motion. For monitoring purposes, however, 1 FPS may be sufficient, because the analysis does not need to reproduce smooth motion; it only needs to capture what is happening in the store.
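The saving is easy to quantify: analyzing 1 frame per second instead of 30 means running the detector 30 times less often. A minimal sketch of frame subsampling, assuming frames arrive at a constant rate (the function names are hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def frame_stride(source_fps: int, target_fps: int) -> int:
    """How many captured frames correspond to one analyzed frame."""
    if not 0 < target_fps <= source_fps:
        raise ValueError("target_fps must be positive and no higher than source_fps")
    return round(source_fps / target_fps)


def should_analyze(frame_index: int, stride: int) -> bool:
    # Keep every stride-th frame and discard the rest before inference.
    return frame_index % stride == 0


def frames_per_day(fps: int) -> int:
    return fps * SECONDS_PER_DAY


stride = frame_stride(30, 1)
print(stride)  # 30
print([i for i in range(90) if should_analyze(i, stride)])  # [0, 30, 60]
print(frames_per_day(30), frames_per_day(1))  # 2592000 86400
```

Per camera, this cuts the daily workload from about 2.6 million frames to 86,400. When the source rate is not an exact multiple of the target, the rounded stride gives only an approximate analysis rate.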

These measures greatly reduce the required processing power, but a standard CPU (central processing unit) will usually still not be sufficient. Instead, a dedicated edge computing device should be used.

Depending on your needs, you can choose from many options, but the most reliable ones are:

Intel Neural Compute Stick,
Google Coral,
Nvidia Jetson.

The last option seems to be the most universal one, combining powerful capabilities with entry-level developer kits that start at around $60. This one-time expense pays off quickly compared with the cost of running the computations in the cloud over a long period of time.
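Whether the one-time purchase pays off can be checked with simple break-even arithmetic: divide the hardware cost by the monthly saving (cloud bill minus the edge device's electricity cost). The sketch below is a minimal illustration; all prices and the power draw in the example are placeholder assumptions, not quotes for any real device or cloud provider.

```python
import math


def monthly_energy_cost(watts: float, price_per_kwh: float, hours: float = 24 * 30) -> float:
    """Electricity cost of a device running continuously for one 30-day month."""
    return watts / 1000 * hours * price_per_kwh


def break_even_months(hardware_cost: float, edge_monthly_cost: float,
                      cloud_monthly_cost: float) -> float:
    """Whole months after which buying edge hardware is cheaper than renting cloud compute."""
    monthly_saving = cloud_monthly_cost - edge_monthly_cost
    if monthly_saving <= 0:
        return math.inf  # the edge device never pays for itself
    return math.ceil(hardware_cost / monthly_saving)


# Placeholder figures for illustration only, not real price quotes:
# a $99 device drawing 10 W at $0.30/kWh vs. a hypothetical $50/month cloud bill.
edge_cost = monthly_energy_cost(watts=10, price_per_kwh=0.30)
print(round(edge_cost, 2))  # 2.16
print(break_even_months(99, edge_cost, 50))  # 3
```

With a chain of stores, the same calculation applies per store, so the gap between cloud and edge costs grows with the number of locations.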

Nvidia Jetson Nano solution - an example

Nvidia Jetson Nano can decode up to 8 full HD video streams simultaneously and has enough computing power to run modern deep neural networks on the fly. Its 4 GB developer kit is priced at $99, and the module runs in 5 W or 10 W power modes. It’s an entry-level piece of hardware, but it is enough for a small retail store to start using a smart video analytics system at a low cost.
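How many cameras one such device can serve depends on two limits: how many streams it can decode and how long one inference takes. The sketch below assumes frames are processed one at a time (no batching or pipelining); `decode_limit=8` mirrors the 8 full HD streams mentioned above, while the inference times in the example are hypothetical values, not benchmarks.

```python
def per_frame_budget_ms(streams: int, analysis_fps: int) -> float:
    """Time the accelerator may spend on each frame if frames are handled one by one."""
    return 1000 / (streams * analysis_fps)


def max_streams(inference_ms: int, analysis_fps: int, decode_limit: int = 8) -> int:
    """Streams one device can analyze: limited by compute and by the video decoder."""
    by_compute = 1000 // (inference_ms * analysis_fps)
    return min(by_compute, decode_limit)


print(per_frame_budget_ms(streams=8, analysis_fps=1))  # 125.0
print(max_streams(inference_ms=50, analysis_fps=1))    # 8 (decoder-bound)
print(max_streams(inference_ms=200, analysis_fps=1))   # 5 (compute-bound)
```

At 1 FPS per camera, eight cameras leave 125 ms per frame, which is why lowering the analysis frame rate matters so much on entry-level hardware.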

Nvidia Jetson Nano can also compute metadata and statistics locally and send only these results to a cloud application instead of streaming raw video. This simplifies the management of retail chains and, with the help of business intelligence tools, enables easy comparison between stores in different locations.
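As an illustration of what "sending only statistics" could look like, here is a minimal sketch that turns per-frame zone counts into one compact JSON record per hour. The record schema, `store_id`, and `hour_start` are hypothetical; this is not a Jetson or cloud-provider API, just plain Python that would run on the edge device before upload.

```python
from __future__ import annotations

import json


def summarize_hour(store_id: str, hour_start: str, samples: list[dict[str, int]]) -> str:
    """Aggregate per-frame occupancy counts into one compact JSON record.

    samples: one dict per analyzed frame, mapping zone name -> people count.
    Only this summary would be sent to the cloud; the video stays on the device."""
    zones = sorted({zone for sample in samples for zone in sample})
    stats = {}
    for zone in zones:
        # A zone missing from a sample means nobody was detected there.
        counts = [sample.get(zone, 0) for sample in samples]
        stats[zone] = {"avg": round(sum(counts) / len(counts), 2), "peak": max(counts)}
    record = {"store_id": store_id, "hour_start": hour_start,
              "frames": len(samples), "zones": stats}
    return json.dumps(record, sort_keys=True)


samples = [{"checkout": 2, "entrance": 1}, {"checkout": 4}, {"checkout": 3, "entrance": 0}]
print(summarize_hour("store-001", "2024-11-27T10:00:00Z", samples))
```

At 1 FPS, an hour of analysis is 3,600 samples per camera, yet the uploaded record stays a few hundred bytes, which keeps bandwidth needs low and makes store-to-store comparison straightforward.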

Improving retail operations with intelligent video analytics

You don’t need the most powerful machines to make intelligent video analytics work. Lowering the video quality, and with it the cost of hardware and energy, does not have to make the system any less effective.

Detection accuracy and processing speed can be improved further with a customized solution. That’s what we are here for. Using our expertise in machine learning, we’re able to build highly efficient and cost-effective smart video analytics systems, tailored to your specific business needs. Drop us a line to see how we can apply our know-how to take your business to the next level.