This AI Paper Introduces RTMO: A Breakthrough in Real-Time Multi-Person Pose Estimation Using Dual 1-D Heatmaps

The field of pose estimation, which involves determining the position and orientation of an object in space (for human pose, the locations of body keypoints such as joints), is evolving rapidly, with researchers continuously developing new methods to improve accuracy and speed. Researchers from Tsinghua Shenzhen International Graduate School, Shanghai AI Laboratory, and Nanyang Technological University have recently contributed RTMO, a new framework for real-time multi-person pose estimation. The framework could improve both the accuracy and the efficiency of pose estimation, with potential impact on applications such as robotics, augmented reality, and virtual reality.

RTMO is a one-stage pose estimation framework designed to overcome the trade-off between accuracy and real-time performance in existing methods. By integrating coordinate classification with dense prediction models, it outperforms other one-stage pose estimators and reaches accuracy comparable to top-down approaches while maintaining high speed.

Real-time multi-person pose estimation remains a challenge in computer vision because existing methods struggle to balance speed and accuracy. Top-down approaches are limited by inference time, which grows with the number of people in the image, while one-stage approaches have tended to lag in accuracy. RTMO combines coordinate classification with the YOLO architecture. It addresses the resulting difficulties with a dynamic coordinate classifier and a tailored loss function, and it outperforms existing one-stage pose estimators in Average Precision on COCO while maintaining real-time performance.
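To make the coordinate-classification idea concrete, the sketch below decodes one person's keypoints from a pair of 1-D heatmaps, one over x-bins and one over y-bins, with the bins spread across that person's bounding box rather than the whole image. It is a minimal NumPy illustration under stated assumptions, not RTMO's implementation: the helper names, the `expand` margin, and the use of a probability-weighted average (expectation) for decoding are all hypothetical choices made for this example.

```python
import numpy as np


def dynamic_bins(box_min, box_max, num_bins, expand=1.25):
    """Place `num_bins` bin centers evenly across an expanded box extent.

    HYPOTHETICAL helper: `expand` is an illustrative margin, not a value
    taken from the RTMO paper.
    """
    center = (box_min + box_max) / 2.0
    half = (box_max - box_min) * expand / 2.0
    return np.linspace(center - half, center + half, num_bins)


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def decode_keypoints(x_logits, y_logits, box):
    """Decode K keypoints from dual 1-D heatmaps (one over x, one over y).

    x_logits: (K, Bx) scores over horizontal bins
    y_logits: (K, By) scores over vertical bins
    box: (x1, y1, x2, y2) of the detected person
    Returns a (K, 2) array of (x, y) image coordinates.
    """
    x1, y1, x2, y2 = box
    # Bins follow the person's box, so every bin covers a useful region.
    x_bins = dynamic_bins(x1, x2, x_logits.shape[-1])
    y_bins = dynamic_bins(y1, y2, y_logits.shape[-1])
    # ASSUMPTION: expectation over bin positions (soft-argmax style).
    xs = softmax(x_logits) @ x_bins
    ys = softmax(y_logits) @ y_bins
    return np.stack([xs, ys], axis=-1)
```

Tying the bins to each detected box keeps a fixed number of bins focused on where the person actually is, which is the intuition behind a dynamic coordinate classifier. Taking the argmax bin is a simpler decoding alternative that gives up sub-bin precision.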

The study presents RTMO's YOLO-like architecture, which uses CSPDarknet as the backbone together with a Hybrid Encoder. Dual convolution blocks generate scores and pose features at each spatial level. To resolve incompatibilities between coordinate classification and dense prediction models, the method employs a dynamic coordinate classifier and a tailored loss function for heatmap learning. Dynamic Bin Encoding creates bin-specific representations, and Gaussian label smoothing with a cross-entropy loss is used for the classification tasks.
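Gaussian label smoothing can be illustrated in a few lines: instead of a one-hot target on the bin containing the ground-truth coordinate, each bin receives a weight from a Gaussian centered on that coordinate, and the predicted bin distribution is scored with cross-entropy against this soft target. The following NumPy sketch assumes 1-D bins along a single axis and a `sigma` expressed in bin units; the function names and the default `sigma` are hypothetical, and Dynamic Bin Encoding is not modeled.

```python
import numpy as np


def gaussian_target(bins, gt, sigma):
    """Soft label over 1-D bins: Gaussian weights centered on the ground-truth
    coordinate `gt`, normalized so they sum to 1."""
    w = np.exp(-((bins - gt) ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of bin logits."""
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())


def smoothed_cross_entropy(logits, bins, gt, sigma=2.0):
    """HYPOTHETICAL helper: cross-entropy between the Gaussian soft target
    and the predicted bin distribution (softmax of `logits`)."""
    target = gaussian_target(bins, gt, sigma)
    return float(-np.sum(target * log_softmax(logits)))
```

Because the soft target assigns probability to neighboring bins, a prediction that is off by one bin is penalized less than one that lands far away, which is the usual motivation for smoothing classification targets in localization tasks.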

RTMO achieves both high accuracy and real-time performance in multi-person pose estimation. Compared with cutting-edge one-stage pose estimators, it attains 1.1% higher Average Precision (AP) on COCO while running about nine times faster with the same backbone. The largest model, RTMO-l, achieves 74.8% AP on COCO val2017 and runs at 141 frames per second on a single V100 GPU. Across different scenarios, the RTMO series outperforms comparable lightweight one-stage methods in both accuracy and speed. With additional training data, RTMO-l achieves a state-of-the-art 81.7 AP. The framework generates spatially accurate heatmaps, facilitating robust, context-aware predictions for each keypoint.
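The COCO Average Precision figures above are keypoint AP, which scores a predicted pose against the ground truth with Object Keypoint Similarity (OKS) and averages precision over OKS thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below computes OKS for a single pose pair as exp(-d^2 / (2 s^2 k^2)) averaged over labeled keypoints, where s^2 is the object area; the per-keypoint constants `k` are placeholders rather than COCO's official values, and the function name is hypothetical.

```python
import numpy as np


def oks(pred, gt, visible, area, k):
    """Object Keypoint Similarity between one predicted and one GT pose.

    pred, gt: (K, 2) keypoint coordinates in pixels
    visible: (K,) bool array, True where the GT keypoint is labeled
    area: object area in pixels^2 (the scale term s^2)
    k: (K,) per-keypoint falloff constants (PLACEHOLDERS in this sketch)
    """
    d2 = np.sum((pred - gt) ** 2, axis=-1)  # squared pixel distances
    sim = np.exp(-d2 / (2.0 * area * k ** 2))  # per-keypoint similarity
    if not np.any(visible):
        return 0.0
    # Only labeled keypoints contribute to the score.
    return float(np.mean(sim[visible]))


gt = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
k = np.full(3, 0.08)  # placeholder constants, not COCO's values
vis = np.array([True, True, False])
print(oks(gt, gt, vis, area=900.0, k=k))  # perfect match -> 1.0
```

A detection counts as a true positive at a given threshold when its OKS with a matched ground-truth pose meets that threshold; COCO AP then averages the resulting precision across the ten thresholds.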

https://arxiv.org/abs/2312.07526v1

In summary, the study's main points are:

  • RTMO is a pose estimation framework with high accuracy and real-time performance.
  • It seamlessly integrates coordinate classification within the YOLO architecture.
  • RTMO employs an innovative coordinate classification technique using coordinate bins for precise keypoint localization.
  • It outperforms cutting-edge one-stage pose estimators and achieves higher Average Precision on COCO while being significantly faster.
  • RTMO excels in challenging multi-person scenarios, generating spatially accurate heatmaps for robust, context-aware predictions.
  • Among existing top-down and one-stage multi-person pose estimation methods, RTMO strikes a strong balance between accuracy and speed.

Check out the Paper. All credit for this research goes to the researchers of this project.



Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.

