Paper Summary: Efficient Deep Learning Approach to Recognize Person Attributes by Using Hybrid Transformers for Surveillance Scenarios
Person Attribute Recognition (PAR), also referred to as person Re-ID attributes, identifies and classifies attributes of people in images or videos. PAR is important in applications such as video surveillance, autonomous driving, and robotic navigation. Researchers have applied deep neural networks, CNNs, RNNs, and transformer architectures to the task, but challenges such as over-fitting and limited datasets remain. To address them, researchers are exploring semi-supervised learning frameworks and lightweight improvement techniques.
Recently, many models and techniques have been proposed to improve PAR, including DeepSAR, DeepMAR, HP-Net, multitask deep models, JRL, attention models, weakly supervised attention localization, STN, and feature map visualization.
Continuing this line of work, a research team from India proposed a new methodology for pedestrian attribute recognition based on CoaT (Co-Scale Conv-Attentional Image Transformer), a model built around a co-scale mechanism. The proposed network combines convolutional neural networks (CNNs) with transformers. The researchers suggest using the CoaT model with encoder branches at different scales, applying attention across non-adjacent scales and implementing cross-scale, fine-to-coarse, and coarse-to-fine visual modeling. They also suggest using the ViT and DeiT models to inject absolute position embeddings in support of vision tasks.
In more detail, CNNs extract spatial features from the input images through a series of convolutional and pooling layers. The CNNs are trained on large datasets to learn the patterns and characteristics of different pedestrian attributes such as clothing, gender, and age.
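To make the convolution-and-pooling step concrete, here is a minimal pure-Python sketch. It is not the paper's backbone: the 5x5 toy image, the hand-written edge kernel, and the helper names `conv2d`, `relu`, and `max_pool2d` are all illustrative assumptions, whereas a real CNN learns its kernels from data.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise ReLU: negative responses are clipped to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    return [
        [
            max(fmap[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# Hypothetical toy image: dark on the left, bright on the right.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hand-written kernel that responds to dark-to-bright vertical edges.
kernel = [[-1.0, 1.0], [-1.0, 1.0]]

features = max_pool2d(relu(conv2d(image, kernel)))
print(features)  # [[2.0, 0.0], [2.0, 0.0]]
```

The pooled map keeps the strong edge response while halving the spatial resolution, which is the kind of compact spatial feature a later stage can consume.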
Transformers then refine the extracted features by modeling the dependencies among attributes. The proposed technique uses CoaT, whose co-scale mechanism keeps encoder branches at different scales while applying attention across scales that are not adjacent. This approach helps to model the complex relationships between attributes and to capture fine details in the images.
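As a rough illustration of attention across scales, the sketch below lets tokens from a fine-scale branch attend to tokens from a coarse-scale branch using plain scaled dot-product attention. This is a simplification and an assumption on our part: CoaT's actual conv-attentional module differs, there are no learned projections here, and the token values and function names are made up for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query becomes a weighted mix of values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


# Hypothetical token embeddings: 4 fine-scale tokens, 2 coarse-scale tokens.
fine_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
coarse_tokens = [[1.0, 0.0], [0.0, 1.0]]

# Fine tokens query the coarse branch (a coarse-to-fine information flow).
refined = cross_attention(fine_tokens, coarse_tokens, coarse_tokens)
for token in refined:
    print([round(x, 3) for x in token])
```

Swapping the roles of `fine_tokens` and `coarse_tokens` gives the fine-to-coarse direction; in both cases every output token is a convex combination of tokens from the other scale.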
To evaluate the proposed technique, the researchers tested it on several benchmark datasets, including PETA, PA-100K, and RAP, using standard evaluation metrics: accuracy, F1 score, recall, and precision. The CoaT model was implemented in PyTorch with models pre-trained on ImageNet. The experiments covered different CNN backbones and transformer designs. For the RAP, PETA, and PA-100K datasets, the model was trained for 30 epochs on a Tesla P100 GPU. Results are reported for the Small, Mini, and Tiny variants of the CoaT model to show the effectiveness of the proposed technique. They show that the baseline model outperformed specially crafted methods on the PETA and PA-100K datasets but performed similarly on the RAPv1 and RAPv2 datasets. Performance metrics were also calculated for different CNN models.
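For readers unfamiliar with how these metrics apply to multi-label attribute prediction, the following sketch computes them per image and then averages. It assumes the instance-based definitions commonly used on PAR benchmarks; this summary does not specify the paper's exact protocol, and the function name and the toy labels are hypothetical.

```python
def instance_metrics(y_true, y_pred):
    """Instance-based accuracy, precision, recall, and F1 for multi-label data.

    y_true and y_pred are lists of binary attribute vectors, one per image.
    """
    n = len(y_true)
    acc = prec = rec = 0.0
    for truth, pred in zip(y_true, y_pred):
        t = {i for i, v in enumerate(truth) if v}   # ground-truth attributes
        p = {i for i, v in enumerate(pred) if v}    # predicted attributes
        inter = len(t & p)
        union = len(t | p)
        acc += inter / union if union else 1.0      # both empty counts as correct
        prec += inter / len(p) if p else 0.0
        rec += inter / len(t) if t else 0.0
    acc, prec, rec = acc / n, prec / n, rec / n
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}


# Hypothetical labels for 2 images and 4 attributes.
y_true = [[1, 0, 1, 0], [0, 1, 1, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 1]]

metrics = instance_metrics(y_true, y_pred)
print({k: round(v, 4) for k, v in metrics.items()})
# {'accuracy': 0.5833, 'precision': 0.8333, 'recall': 0.75, 'f1': 0.7895}
```

The first image misses one attribute (hurting recall) and the second predicts one extra attribute (hurting precision), so the four numbers diverge even on this tiny example.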
In summary, this article discussed the importance of person attribute recognition in various applications, the challenges researchers face in developing accurate models, and the recent techniques proposed to address these challenges. We then presented a new methodology, proposed by an Indian research team, that uses a CoaT model with encoder branches at different scales and transformers to refine the extracted features. The technique was tested on several benchmark datasets, and the results showed that the baseline model outperformed specially crafted methods on some datasets but performed similarly on others. The study highlights the potential of the proposed approach for pedestrian attribute recognition.
Check out the Paper. All credit for this research goes to the researchers on this project.
Mahmoud is a PhD researcher in machine learning. He also holds a bachelor’s degree in physical science and a master’s degree in telecommunications and networking systems. His current areas of research concern computer vision, stock market prediction and deep learning. He produced several scientific articles about person re-identification and the study of the robustness and stability of deep networks.