Detectors and classifiers

ComBox Technology develops its own detectors and classifiers, built on neural networks of various topologies, to solve business problems.

Smoking detector

New provisions of the Anti-Tobacco Law came into force in Russia on June 1, 2014. The law regulates relations arising in the field of protecting citizens' health from the effects of tobacco smoke and the consequences of tobacco use: smoking is prohibited on long-distance trains, on passenger platforms, in hostels and hotels, retail premises, markets, cafes, bars, and restaurants.

To combat smoking, many states have introduced laws banning smoking in public places. "Smoking rooms" were eliminated in all offices and theaters and were also removed from public catering establishments.

Introducing the prohibitions described above implies monitoring compliance with the established rules and regulations. Today, a variety of dust detectors and gas analyzers (e.g. CO2) are used for this purpose. The general principle of operation of these devices is as follows:

General principle of operation of various dust sensors and gas analyzers

The sensor detects changes in the environment, and the control microcontroller creates a reaction event according to a pre-set algorithm.

Object video analytics using neural networks can serve as an alternative to dust sensors and gas analyzers: the input is a photo or video stream from a video surveillance camera, and the output is the probability that tobacco or another substance is being smoked in the frame or set of frames.

There are several options for implementing the complex:

  1. Separate system in a compact design for installation on site
  2. Centralized system with data transmission and processing in the data center with the possibility of using existing video surveillance systems
  3. Hybrid option, when part of the data is processed in close proximity to the data source, and part is processed in a data center with centralized storage of the result of both systems

Let's consider them in more detail:

General block diagram of the hardware-software complex for smoking detection

Composition of the complex when used in close proximity to the data source:

  • IP camera / direct connection camera or a set of cameras (used as a data source).
  • Switch (when connecting more than one data source).
  • Compute device: an Intel NUC8i5BEK computer.

At a low cost, the hardware and software complex solves a number of important security tasks, such as:

  • Monitoring compliance with fire safety rules with high accuracy and photo-fixation of the fact of the offense (including time, date, place of the offense)
  • Identification of violations in hazardous industries and at companies whose activities involve flammable materials, fuels, and lubricants
  • Monitoring compliance with the internal regime at secure facilities

Another valid option is a server architecture, in which data from cameras is transmitted to the data center for further processing:

Server architecture where data from cameras is transmitted to the data center for further processing

When scaling with this scheme, the same Intel NUC8i5BEK is intended as the device for centralized inference, but in a different form factor (a 1U server):

Server for execution of neural networks based on 8 Intel NUC8i5BEK

To detect smoking in photos (frames of the incoming video stream), a neural network of the SSD MobileNet v2 topology from Open Model Zoo is used. The network is pre-trained on the COCO dataset and then trained further with TensorFlow. Next, the model is converted with Intel OpenVINO to run on the CPU/GPU and so optimize the cost per FPS. The performance of the model after conversion:

In total, on one Intel NUC8i5BEK with a frame rate divisor of 5 (25 FPS / 5 = 5 FPS at the input), you can process up to 40 streams without taking into account decoding costs. With VAAPI hardware decoding and the latest intel-media-driver, decoding costs are minimal.
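
The capacity figure above is simple arithmetic, sketched below. The aggregate budget of 200 frames per second is not stated in the text; it is only what 40 streams at 5 FPS imply. The function names are hypothetical and do not belong to any ComBox or Intel API.

```python
def effective_fps(camera_fps: float, divisor: int) -> float:
    """Frame rate reaching the network when only every `divisor`-th frame is kept."""
    if divisor < 1:
        raise ValueError("divisor must be >= 1")
    return camera_fps / divisor


def max_streams(inference_budget_fps: float, camera_fps: float, divisor: int) -> int:
    """Number of camera streams that fit into an aggregate inference budget."""
    return int(inference_budget_fps // effective_fps(camera_fps, divisor))


per_stream = effective_fps(25, 5)   # frames per second per camera after the divisor
budget = 40 * per_stream            # aggregate rate implied by "up to 40 streams"
print(per_stream, budget, max_streams(budget, 25, 5))
```

With a smaller divisor each camera sends more frames to the network, so the same budget covers proportionally fewer streams.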

One of the advantages of the Intel OpenVINO framework is the ability to move networks between devices. For example, the same model can be run on CPU, GPU, FPGA, VPU, and other devices with minimal modification.

As an experiment, the smoking detection model was run on an Intel Neural Compute Stick 2 based on Myriad X. Results:

Smoking detection running on Intel Neural Compute Stick 2 based on Myriad X

Industrial PCs with boards from AAEON or other manufacturers that embed MyriadX chips already make industrial-grade solutions possible.

To demonstrate the operation of the neural network, a Telegram bot was implemented: the input is an image, and the output is the probability of smoking. Try it, watch, and experiment.

Inference runs on the GPU of the Intel NUC8i5BEK

The following advantages of our solution can be noted:

  • The ability to process data from multiple sources in one place
  • The ability to detect smoking at a distance limited only by the focal length of the camera (the data source), for example 5, 50, or 100 m (classical sensors and devices cannot achieve this)
  • The ability to detect smoking not only of classic cigarettes but also of other devices (for example, vapes or smoking mixtures)
  • The ability to save evidence of the offense (a photo and event metadata such as date, time, and location) when smoking occurs in a prohibited place
  • The ability to retrofit existing cameras with smoking detection and with reactions to that event
  • The ability to integrate with existing monitoring and video surveillance systems, for example Zabbix, Telegraf, Hikvision NVR, etc.

Here are some sites where the described hardware and software complex for detecting smoking in a video stream can be used, and the reasons why:

  • Corridors of business centers and other buildings and structures, staircases
  • Schools and kindergartens (because smoke detectors and other existing solutions are ineffective in open and ventilated areas)
  • Gas stations (because smoke detectors and other existing solutions are ineffective in open and ventilated areas)
  • Metro (because of the large area, the ceiling height, and the ability to connect multiple cameras into a single system)
  • Railway stations and waiting areas (because of the large area, the ceiling height, and the ability to connect multiple cameras into a single system)
  • Airport terminals, runways (because of the large area, the ceiling height, and the ineffectiveness of smoke detectors outdoors)
  • Residential and office premises (existing sensors can be defeated simply by opening a window)
  • Cafes, restaurants (existing sensors can be defeated simply by opening a window)

In our opinion, one of the interesting areas of application is transport, in particular car sharing, where fines for smoking inside rented cars already exist. The fine varies from 5 to 15 thousand rubles depending on the company. Returning to the comparison of object video analytics and sensors: sensors do not pick up vapes and other devices for smoking mixtures, and they are practically insensitive when the car windows are open. That does not change the fact of the violation or, accordingly, the fine due under the contract.

In addition, several neural networks can be applied sequentially in transport, for example smoking detection followed by detection of the fact and duration of mobile phone use. Such systems can then be extended, for example by integrating telematics and connecting to the car's CAN bus so that phone use is tracked only while the vehicle is moving, but these are integration details.

An illustrative example of what exactly we detect and what we get as a result:

Demonstration with the Telegram bots (input: a picture from a smartphone camera or gallery; output: a probability):

Earlier we discussed Intel NUCs and NUC-based servers as inference computers. In vehicles, weather conditions (heat, cold, dew point, etc.) also come into play. AAEON has a good solution for this, the VPC-3350S:

Our version has an Intel Atom x5 E3940 processor. Inference runs on a MyriadX on an expansion board. FPS in inference:

Decoder tests:

Why this device is a good fit and why we chose it:

  • Built-in LTE module.
  • Availability of VPU expansion with Intel MyriadX accelerator.
  • Integrated Intel HD Graphics 500 that can use hardware decoders and encoders to process video streams.
  • Multiple LAN ports for direct connection of network cameras without the need to install a switch.
  • Wide operating temperature range (-20 to +70 °C).

How it works:

  1. The car is equipped with network cameras powered over Ethernet (PoE): one for the driver, or two for the driver and passenger.
  2. The data from the cameras goes directly to the computer, in this case the AAEON VPC-3350S.
  3. The computer decodes the video stream and splits it into frames.
  4. Frames are passed to the neural networks at the rate set by the frame rate divisor.
  5. Each neural network returns the probability of an event (smoking or holding a phone). Each image is passed through these networks sequentially. If either returns a probability above a threshold (nominally 50%), the photo and a record of it are written to a temporary in-memory database table.
  6. The duration of the action / violation is determined from the number of recurring events.
  7. If the duration exceeds the specified constant (10 seconds), the event is recorded in the database. The event includes the following information:
    • date, time
    • photograph of the violation
    • duration of the event in seconds
    • vehicle identifier (static GUID)
    • camera number
    • event type
  8. When 3G / LTE is available, event data is transmitted to the central data processing server, which is integrated with the existing car sharing information system for billing operations.
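
Steps 5 to 7 above can be sketched as a small aggregator that turns per-frame probabilities into duration-based events. This is a minimal illustration, not the production code: the class and field names are hypothetical, the input rate of 5 FPS assumes a 25 FPS camera with a divisor of 5, and the in-memory table and database writes are left out.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD = 0.5        # nominal 50% probability threshold from step 5
MIN_DURATION_S = 10.0  # duration constant from step 7
INPUT_FPS = 5.0        # assumption: 25 FPS camera, frame rate divisor of 5


@dataclass
class Event:
    event_type: str
    camera: int
    start_s: float
    duration_s: float


class EventAggregator:
    """Turns per-frame probabilities into duration-based events."""

    def __init__(self, event_type: str, camera: int) -> None:
        self.event_type = event_type
        self.camera = camera
        self.hits = 0                        # consecutive frames above the threshold
        self.start_s: Optional[float] = None

    def update(self, timestamp_s: float, probability: float) -> Optional[Event]:
        """Feed one frame; return an Event when a long enough run ends."""
        if probability > THRESHOLD:
            if self.hits == 0:
                self.start_s = timestamp_s
            self.hits += 1
            return None
        return self.flush()

    def flush(self) -> Optional[Event]:
        """Close the current run and emit it only if it exceeded the minimum duration."""
        duration = self.hits / INPUT_FPS     # duration derived from the number of hits
        start, self.hits, self.start_s = self.start_s, 0, None
        if start is not None and duration > MIN_DURATION_S:
            return Event(self.event_type, self.camera, start, duration)
        return None
```

One aggregator per camera and event type (smoking, phone) keeps the two networks independent. A real system might also tolerate a few missed frames inside a run, which this sketch does not.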

We have tried to share our experience of implementing and integrating AI solutions, using transport infrastructure as an example. Importantly, most sites to be automated are already equipped with cameras, so existing video streams can be processed without significant upgrades.

Cough detector

We implemented a detector of coughing people that relies not on pose estimation (which is resource-intensive) but on classifying incoming photos after face detection with an expanded detection zone.

Cough detector for Intel NUC

Stated formally, the business task is to detect people with symptoms of disease during security checks at airports and train stations and to inform officials of those signs so that additional checks can be made. The expected short-term result is to minimize the spread of the COVID-19 coronavirus infection in local and international rail, air, and other passenger transportation.

For the implementation, we considered using object video analytics to detect external signs of the disease (for example, cough, its duration, and the number of attacks over a period of time) from CCTV cameras. Neural networks detect, re-identify, and track people in the cameras' fields of view and record the signs of the disease and their frequency, so that when a particular person approaches the inspection zone, staff can be informed that additional checks are needed (for example, measuring body temperature).

First, let's clarify that we use an Intel NUC8i5BEK with an 8th generation Intel Core i5 processor and integrated Intel Iris Plus 655 graphics. The neural networks can therefore run on the GPU, freeing the CPU for trajectory analysis. If the number of cameras connected to the device grows, the complex can be extended with accelerators, for example Intel NCS2.

We use the Intel OpenVINO framework because it lets us execute neural networks efficiently on Intel processors and, more importantly, use Intel integrated graphics. The model is SSD MobileNet v2, pre-trained on the COCO dataset. TensorFlow was used to train the model.

Intel NUC8i5BEH

Why we chose NUC:

  1. The low cost of the 8th generation processor when bundled with the device, compared with the market price of the components bought separately.
  2. High inference performance thanks to the integrated Iris Plus 655 graphics, whose neural network performance is 25% higher than that of the Intel UHD Graphics 630 used in desktop processors (from the i5 8400 to the i9 9900K).
  3. Ability to increase the number of processed streams by connecting accelerators, for example, Intel NCS2 without changing the network topology and framework.
  4. Low power consumption at maximum load: 28 W versus 65 W for the desktop counterpart.
  5. Ability to use devices within server and cloud infrastructure.

As a part of this solution, we did the following:

  1. Collected and structured the initial data for training (prepared a dataset).
  2. Trained the classifier of the presence of external signs of the disease on SSD Mobilenet V2.
  3. Converted the model to Intel OpenVINO.
  4. Assembled a cascade of neural networks running under Intel OpenVINO that performs the following operations in sequence: face detection, then estimation of the probability of infection signs, with recording of events, their frequency, and their duration.

The classification result is the probability of the presence of a feature in a photo or frame from a video stream. Illustrative example:

You can check the operation of the detector and classifier in the Telegram bot. As input, the bot takes a photo from a camera or gallery, and it returns the probability that a coughing person is in the frame.

Next, we set up the detection zones using a camera we had at hand as an example. The result looks like this (ComBox Monster Vision interface):

The first stage is face detection, using a network from the publicly available Intel OpenVINO model zoo. Trajectory analysis implemented with OpenCV keeps each object (face) tracked in the frame. Each face, with its zone expanded, is then passed to the symptom classifier, which returns a probability.
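
The zone expansion mentioned above can be sketched as growing the detected face box around its centre and clamping it to the frame, so that the crop passed to the classifier also includes the hands and shoulders. The expansion factor of 1.5 is an assumption (the text does not give one), and `expand_box` is a hypothetical helper name.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def expand_box(box: Box, frame_w: int, frame_h: int, scale: float = 1.5) -> Box:
    """Grow a box around its centre by `scale`, clamped to the frame borders."""
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w = (x_max - x_min) * scale / 2
    half_h = (y_max - y_min) * scale / 2
    return (
        max(0, int(cx - half_w)),
        max(0, int(cy - half_h)),
        min(frame_w, int(cx + half_w)),
        min(frame_h, int(cy + half_h)),
    )


# A 100x100 face in a 640x480 frame, doubled in size around its centre.
print(expand_box((100, 100, 200, 200), 640, 480, scale=2.0))
```

Clamping matters for faces near the frame edge, where the expanded box would otherwise reach outside the image.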

We record the events (coughs) and their duration. The assumption is that at the checkpoint, thanks to re-identification of persons using 5 points (fast, but not very accurate), it will be possible to notify the personnel of transport hubs that additional checks are needed (for example, measuring body temperature).
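
Counting cough episodes per tracked person over a period of time can be sketched with a sliding window. The 60-second window and the alert level of 3 episodes are illustrative assumptions, not values from the deployed system, and `CoughCounter` is a hypothetical name.

```python
from collections import deque
from typing import Deque


class CoughCounter:
    """Counts cough episodes of one tracked person inside a sliding time window."""

    def __init__(self, window_s: float = 60.0, alert_count: int = 3) -> None:
        self.window_s = window_s        # assumption: length of the observation window
        self.alert_count = alert_count  # assumption: episodes needed to flag a person
        self.episodes: Deque[float] = deque()

    def add(self, timestamp_s: float) -> bool:
        """Register one episode; return True if the person should be flagged."""
        self.episodes.append(timestamp_s)
        # Drop episodes that have fallen out of the window.
        while timestamp_s - self.episodes[0] > self.window_s:
            self.episodes.popleft()
        return len(self.episodes) >= self.alert_count


counter = CoughCounter()
print([counter.add(t) for t in (0.0, 20.0, 50.0, 200.0)])
```

One counter would be kept per re-identified person, so the flag raised at the inspection zone reflects that person's recent history rather than the whole frame.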

Detector for medical masks and other PPE

One area of application of object video analytics is safety. Here it makes sense to monitor whether the staff of retail outlets and medical institutions wear medical masks and other personal protective equipment. To demonstrate the capabilities of neural networks, we implemented a simple mask detector based on MobileNet, which can be tried in the Telegram bot.

Passenger traffic detector and counter for the transport industry

To accurately analyze and manage the logistics of a transport system that includes buses, minibuses, trolleybuses, and trams, you need passenger traffic statistics.

Counting boarding passengers is necessary to create new routes, optimize existing ones, determine the number of vehicles on a route and their schedule, and minimize the risk of theft on commercial routes.

To solve this problem, passenger traffic monitoring systems are deployed on vehicles. Our solution consists of video cameras installed above the entrance areas and a microcomputer with a GPS / LTE module that collects the data and transmits it to the data center. The data center stores video archives from all cameras connected to the system, with a retention period of 1 month, and runs video analytics whose result is the number of incoming and outgoing passengers over a period of time.


  • Counting incoming and outgoing passengers on transport with an accuracy of at least 95%
  • Head count of incoming and outgoing passengers (high-angle cameras)

Features and functionality:

  • Availability of the solution for various types of transport (1/2/3 doors on passenger buses, railway transport, etc.)
  • Data processing "on the edge" or in the data center from previously recorded video archives
  • Neural network execution under Intel OpenVINO
  • Wide range power supply
  • Insensitivity to light changes
  • Video recording storage in internal memory for at least 3 days
  • Storage of video archives in the data center for at least 1 month
  • Connection to third-party monitoring units via REST API
  • Remote data transfer to the server from the vehicle via the GSM network

  1. The accuracy of detecting and counting the number of incoming and outgoing passengers is at least 95%
  2. Accounting not only for incoming traffic, but also for outgoing traffic
  3. A single universal solution for various types of passenger transport
  4. Video analytics and cloud storage can be enabled or disabled at the customer's request
  5. Video confirmation of each passenger entry and exit for the last calendar month
  6. Low one-time investment for a set of equipment with high detection accuracy
  7. Exclusion of children less than 1 m in height from the statistics
  8. No double counting of passengers who step out to let others exit and then re-board (tracking and holding in the frame)
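
The counting of entries and exits without double counting can be sketched by comparing where each track starts and ends relative to a virtual line at the door. This is an illustration under stated assumptions, not the deployed algorithm: track IDs and head positions are taken as given by a tracker, the coordinate grows towards the vehicle interior, and `count_passengers` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def count_passengers(tracks: Dict[int, List[float]], line_y: float) -> Tuple[int, int]:
    """Return (entered, exited) from per-track head positions.

    Assumption: positions above `line_y` are inside the vehicle, so a track
    that starts outside and ends inside is one entry.
    """
    entered = exited = 0
    for positions in tracks.values():
        if len(positions) < 2:
            continue  # a single observation cannot cross the line
        started_inside = positions[0] > line_y
        ended_inside = positions[-1] > line_y
        if not started_inside and ended_inside:
            entered += 1
        elif started_inside and not ended_inside:
            exited += 1
    return entered, exited


tracks = {
    1: [10.0, 40.0, 80.0],  # boards the vehicle
    2: [90.0, 60.0, 20.0],  # leaves the vehicle
    3: [80.0, 30.0, 70.0],  # steps out to let others pass, then returns
}
print(count_passengers(tracks, line_y=50.0))
```

Because only the first and last positions of a held track are compared, a passenger who steps out and back in contributes neither an entry nor an exit, provided the tracker keeps the same ID.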

User account page showing statistics on passenger entries and exits:


Traffic accident detector

This project introduces artificial intelligence technologies into the automated detection of emergencies and accidents. To demonstrate what the technology can do at its current stage of development, we implemented and trained a neural network that recognizes the presence of an accident in a photo or video. You can try it in the Telegram bot: send the bot a photo with or without an accident, and it recognizes whether an accident is present and returns the probability. When working with video, we split the stream into frames and apply the same step to each frame (this is not integrated in the bot).

In the next version, we plan to integrate the solution into some vendors' DVRs and mobile platforms to automate the collection of information about road accidents in real time. The recognition result, photos, and metadata (geolocation, location, time, and author data, given the appropriate permissions) can be sent and published in groups in semi-automatic mode (for example, as suggested news).
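
The frame-by-frame handling of video described above can be sketched as applying a per-image classifier to every N-th frame of a stream. The `classify` callable stands in for the accident network and is purely hypothetical, as is the default divisor of 5.

```python
from itertools import islice
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")


def classify_video(
    frames: Iterable[Frame],
    classify: Callable[[Frame], float],
    divisor: int = 5,
) -> List[float]:
    """Run a per-image classifier on every `divisor`-th frame of a stream."""
    return [classify(frame) for frame in islice(frames, 0, None, divisor)]


# Stand-in data: frame indices instead of images, and a dummy classifier.
print(classify_video(range(12), lambda frame: frame / 10, divisor=5))
```

How the per-frame probabilities are combined into one decision for the whole clip (maximum, average, or a run of consecutive hits) is a separate design choice that the text does not specify.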
