Meet Hailo-8™: An AI Processor That Uses Computer Vision For Multi-Camera Multi-Person Re-Identification
Multi-person re-identification is an important part of today's video surveillance systems. It lets the user identify individuals across multiple video streams, which helps in data analysis and security operations. In practice it is implemented with deep learning and amounts to recognizing a particular individual repeatedly, either in one location over time or along a trail between multiple locations, and it frequently requires high-performance computing. Occlusions and the differing viewpoints and lighting conditions of each camera make effective tracking difficult.
There are various benefits to running a multi-camera, multi-person, re-identification program on edge devices. The Hailo-8 AI processor provides the efficiency required for accurate, real-time, multi-person re-identification on edge devices. Some benefits of the Hailo-8 AI processor include:
- High compute power enables processing many people concurrently with high accuracy, which is crucial for high-quality re-identification.
- Improves video analytics cost-effectively without compromising user privacy.
- Reduces system costs, since a single AI accelerator is installed and maintained to analyze numerous cameras in real time.
- Maintains privacy and enhances data security by eliminating the need to send raw footage.
- Improves detection latency, which is essential for real-time warnings.
APPLICATION PIPELINE
Hailo's TAPPAS (Template APPlications And Solutions) is an infrastructure containing a suite of high-performance, pre-trained template AI tasks and applications with pipeline elements. It is built on top of state-of-the-art deep neural networks and demonstrates the best-in-class throughput and power efficiency of Hailo-8™. The Hailo TAPPAS multi-camera re-identification pipeline uses GStreamer on an embedded host, Hailo-8™ running in real time (without batching), and four RTSP IP cameras at FHD input resolution. The host acquires the encoded video over Ethernet, decodes it, and sends the decoded frames for processing on Hailo-8™ over PCIe. The final output is displayed on the screen over HDMI.
Decoded and De-warped
The first stages of the application pipeline decode and de-warp the encoded input to obtain aligned frames for processing. De-warping is a common computer vision component that removes distortion introduced by the camera, such as the fisheye distortion typical of security cameras. The host receives the encoded input over Ethernet, decodes it, and de-warps it to produce aligned frames. The frames are then sent over PCIe to the Hailo-8™ AI processor, which detects every person and face in each frame. The Hailo GStreamer Tracker performs the initial tracking of the objects in each stream. Each person is then cropped from the original frame and fed into a Re-ID network, which generates an embedding vector per person that can be compared across cameras.
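As a rough illustration of what de-warping involves, the sketch below inverts a simple polynomial radial distortion model for a single point in normalized image coordinates. The model, the coefficient names `k1` and `k2`, and the function names are assumptions chosen for illustration; they are not the fisheye model or the code used by the Hailo de-warping plugin.

```python
def distort(x, y, k1, k2):
    """Apply an assumed polynomial radial distortion to an ideal point."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor


def undistort(xd, yd, k1, k2, iterations=20):
    """Recover the ideal point from a distorted one by fixed-point iteration.

    Starts from the distorted point and repeatedly divides by the distortion
    factor evaluated at the current estimate. This converges for mild
    distortion; strong fisheye lenses need a different model.
    """
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y


# Hypothetical coefficients for a lens with mild barrel distortion.
K1, K2 = -0.2, 0.05
distorted_point = distort(0.5, 0.3, K1, K2)
recovered_point = undistort(*distorted_point, K1, K2)
print(distorted_point, recovered_point)
```

A full de-warping step applies this kind of mapping to every pixel and resamples the image, which is why libraries usually precompute a remapping table once per camera.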
Hailo Model Zoo: Deep Learning Models for CV Tasks
Pretrained weights and precompiled models are available in the Hailo Model Zoo, a collection of pre-trained deep learning models for various computer vision tasks. All neural network models were compiled using the Hailo Dataflow Compiler, which integrates with existing deep learning development frameworks to allow smooth integration into existing development ecosystems. To make it simpler to adapt to different settings, the Hailo Model Zoo also offers a retraining Docker environment for custom datasets. The team notes that all models were trained on fairly general use cases and can be fine-tuned for particular ones.
The YOLOv5s network, released in 2020, is the foundation of the person and face detection stage. This single-stage object detector is trained on two classes: person and face. To train the detection network, various datasets were collected and preprocessed into the same annotation format, and a modern face detection model trained on publicly accessible datasets was used to generate the face annotations. Using a strong network such as YOLOv5 let the team detect persons and faces with increased accuracy, even at greater distances, so the application can find and follow even small objects.
The person Re-ID network is based on RepVGG-A0, was trained in PyTorch, and produces a single embedding vector of length 2048 for each query. To increase the Rank-1 accuracy on the validation dataset (Market-1501), the team combined several Re-ID datasets into a single training procedure. Training on more varied data yielded a more robust network that generalizes better to real-world settings. The Hailo Model Zoo contains retraining instructions and a complete Docker environment for training the network from pre-trained weights.
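For readers unfamiliar with the metric, Rank-1 accuracy is the fraction of queries whose nearest gallery embedding belongs to the same person. The sketch below computes it on toy 3-dimensional vectors with Euclidean distance; the data, names, and distance choice are illustrative assumptions, and the full Market-1501 protocol adds rules (such as excluding same-camera matches) that are omitted here.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank1_accuracy(queries, gallery):
    """Fraction of queries whose closest gallery entry has the same identity.

    Both arguments are lists of (person_id, embedding) pairs.
    """
    hits = 0
    for query_id, query_vec in queries:
        nearest_id, _ = min(
            gallery, key=lambda entry: euclidean_distance(query_vec, entry[1])
        )
        if nearest_id == query_id:
            hits += 1
    return hits / len(queries)


# Toy data: real Re-ID embeddings in the article have length 2048.
gallery = [
    ("alice", [1.0, 0.0, 0.0]),
    ("bob", [0.0, 1.0, 0.0]),
    ("carol", [0.0, 0.0, 1.0]),
]
queries = [
    ("alice", [0.9, 0.1, 0.0]),
    ("bob", [0.2, 0.8, 0.1]),
    ("carol", [0.6, 0.1, 0.5]),  # lies closer to alice, so it counts as a miss
]
print(rank1_accuracy(queries, gallery))
```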
Deploying the Pipeline Using HAILO TAPPAS
The pipeline is built using GStreamer in C++ as part of Hailo TAPPAS. It can run from video files or RTSP cameras, and its arguments let the user select the settings for the detector, the tracker (keep/lost frame rate), and the quality estimation (minimum quality threshold). Researchers can also retrain the neural networks on their own data using the Hailo Model Zoo and then migrate those networks to the TAPPAS application for quick domain adaptation and customization. The multi-camera, multi-person re-identification application aims to provide quick prototyping and a reliable foundation for building a surveillance pipeline based on Hailo-8™ and the embedded host processor.
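To make the tracker and quality settings more concrete, here is a minimal sketch of how a "lost frames" limit and a minimum quality threshold might behave. The names `keep_lost_frames` and `min_quality`, and the logic itself, are hypothetical illustrations and do not reproduce the actual TAPPAS arguments or tracker implementation.

```python
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    missed_frames: int = 0  # consecutive frames without a matching detection


def update_tracks(tracks, matched_ids, keep_lost_frames):
    """Age unmatched tracks and drop those lost for too many frames.

    A track survives while it has been missing for at most
    keep_lost_frames consecutive frames (hypothetical parameter).
    """
    survivors = []
    for track in tracks:
        if track.track_id in matched_ids:
            track.missed_frames = 0
        else:
            track.missed_frames += 1
        if track.missed_frames <= keep_lost_frames:
            survivors.append(track)
    return survivors


def passes_quality_gate(quality, min_quality):
    """Only crops at or above the threshold would be sent to the Re-ID network."""
    return quality >= min_quality


tracks = [Track(1), Track(2)]
for _ in range(3):
    # Person 1 is detected in every frame; person 2 has left the scene.
    tracks = update_tracks(tracks, matched_ids={1}, keep_lost_frames=2)
print([t.track_id for t in tracks])
```

A larger lost-frame limit keeps identities stable through short occlusions at the cost of holding stale tracks longer, while a higher quality threshold trades fewer Re-ID queries for cleaner embeddings.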
As part of its runtime library, HailoRT, Hailo provides a GStreamer plugin (libgsthailo) for inference on the Hailo-8™ AI processor. The plugin handles the entire configuration and inference process on the chip, which makes it easy to integrate Hailo-8™ into a GStreamer pipeline. To support complex pipelines, it also allows inference of a multi-network pipeline on a single Hailo-8™ processor. The team also introduced a network scheduler that automates network switching and makes it easier to run several networks on a single Hailo device: the scheduler decides when each network is active instead of requiring manual selection. With the scheduler, Hailo-8™ pipeline creation is significantly cleaner, easier, and more effective.
In addition to the HailoRT components above, the team introduced further GStreamer plugins for de-warping, box anonymization, and gallery search. The de-warping plugin, implemented using OpenCV, corrects camera distortions, while the box anonymization plugin blurs the region of an image inside a predicted box. The gallery search plugin adds the database component to the pipeline and lets users look for matches in the database: it compares each new Re-ID vector against the vectors already stored, which associates predictions across cameras and timestamps.
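The gallery search idea can be sketched as a small in-memory database that matches each new embedding against stored ones by cosine similarity. The `Gallery` class, its threshold value, and the ID assignment policy are assumptions made for illustration and are not the TAPPAS plugin's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class Gallery:
    """Hypothetical in-memory gallery of (global_id, embedding) entries."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold
        self.entries = []
        self._next_id = 0

    def identify(self, embedding):
        """Return the global ID of the best match, or a new ID if none qualifies."""
        best_id, best_score = None, self.threshold
        for global_id, stored in self.entries:
            score = cosine_similarity(embedding, stored)
            if score >= best_score:
                best_id, best_score = global_id, score
        if best_id is None:
            best_id = self._next_id
            self._next_id += 1
        # Store the new vector so later queries can match it too.
        self.entries.append((best_id, embedding))
        return best_id


gallery_db = Gallery(threshold=0.9)
id_camera_1 = gallery_db.identify([1.0, 0.0, 0.0])    # first sighting
id_camera_2 = gallery_db.identify([0.99, 0.05, 0.0])  # similar vector, other camera
id_stranger = gallery_db.identify([0.0, 1.0, 0.0])    # dissimilar vector
print(id_camera_1, id_camera_2, id_stranger)
```

Storing every matched vector lets the gallery cover more viewpoints of the same person over time, though a production system would typically cap or prune the entries per identity.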
Performance
The following table summarizes the performance of the multi-camera, multi-person tracking application on Hailo-8™ and an x86 host processor with four RTSP cameras at FHD input resolution (1920×1080), as well as the breakdown of the standalone neural network performance.
To facilitate customization with the Hailo Model Zoo, the Hailo multi-camera, multi-person re-identification application offers a complete reference pipeline built in GStreamer with Hailo TAPPAS, along with retraining capabilities for each neural network. The application provides a foundation for creating a Hailo-8™-based VMS product. The TAPPAS documentation contains additional information.
Check out the Hailo Model Zoo GitHub repository and the source for the article.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about Machine Learning, Natural Language Processing, and Web Development, and enjoys learning more about the technical field by participating in challenges.