DeepMind Researchers Open-Source TAPIR: A New AI Model for Tracking Any Point (TAP) that Effectively Tracks a Query Point in a Video Sequence
Computer vision is one of the most popular fields of Artificial Intelligence. Computer-vision models derive meaningful information from digital images, videos, and other visual inputs, teaching machines to perceive and understand visual content and act on it. The field has taken a significant step forward with Tracking Any Point with per-frame Initialization and temporal Refinement (TAPIR), a new model designed to effectively track a specific point of interest through a video sequence.
Developed by a team of researchers from Google DeepMind and the Visual Geometry Group (VGG) in the Department of Engineering Science at the University of Oxford, the algorithm behind TAPIR consists of two stages: a matching stage and a refinement stage. In the matching stage, the model analyzes each frame of the video independently to find a candidate match for the query point, i.e., the location in that frame most likely to correspond to it. Because this step is carried out frame by frame, the model can follow the query point's movement across the entire video.
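To make the matching stage concrete, here is a minimal NumPy sketch of the per-frame matching idea: the query point's feature vector is compared against a dense feature map for every frame, and each frame's best-scoring location becomes that frame's candidate match. The function name, the array shapes, and the dot-product similarity are illustrative assumptions, not TAPIR's actual implementation.

```python
import numpy as np

def match_query_per_frame(query_feat, frame_feats):
    """Per-frame matching sketch (hypothetical).

    query_feat:  (C,) feature vector extracted at the query point.
    frame_feats: (T, H, W, C) dense feature maps, one per video frame.

    Returns a (T, 2) array of (row, col) candidate positions, chosen
    independently for each frame, as in TAPIR's matching stage.
    """
    T, H, W, C = frame_feats.shape
    # Cost volume: dot-product similarity between the query feature
    # and every spatial location of every frame.
    cost = np.einsum('thwc,c->thw', frame_feats, query_feat)
    # Independent per-frame argmax gives the initial candidate track.
    flat_idx = cost.reshape(T, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (H, W))
    return np.stack([rows, cols], axis=1)
```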
The matching stage is followed by the refinement stage. Here, TAPIR updates both the trajectory (the path followed by the query point) and the query features based on local correlations, using the information surrounding each candidate location in every frame to improve the accuracy and precision of the track. By integrating these local correlations, the refinement stage makes the model better able to track the query point precisely and adapt to changes across the video sequence.
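The sketch below illustrates the local-refinement idea in the simplest possible terms: each per-frame estimate is nudged toward the position with the strongest correlation to the query feature inside a small surrounding window. TAPIR's real refinement is a learned, iterative update that also adjusts the query features and uses temporal context; the plain window search here, and all names and shapes, are simplifying assumptions.

```python
import numpy as np

def refine_track(track, query_feat, frame_feats, radius=3, iters=4):
    """Toy local-refinement sketch (not TAPIR's learned update).

    track:       (T, 2) initial (row, col) estimates from matching.
    query_feat:  (C,) query-point feature vector.
    frame_feats: (T, H, W, C) dense per-frame feature maps.
    """
    T, H, W, C = frame_feats.shape
    track = [tuple(p) for p in track]
    for _ in range(iters):
        for t in range(T):
            r, c = track[t]
            # Clip a (2*radius+1)-sized window to the frame bounds.
            r0, r1 = max(0, r - radius), min(H, r + radius + 1)
            c0, c1 = max(0, c - radius), min(W, c + radius + 1)
            local = frame_feats[t, r0:r1, c0:c1]
            # Local correlation between the query feature and the window.
            corr = np.einsum('hwc,c->hw', local, query_feat)
            dr, dc = np.unravel_index(int(corr.argmax()), corr.shape)
            track[t] = (r0 + dr, c0 + dc)
    return np.array(track)
```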
The team evaluated TAPIR on TAP-Vid, a standardized benchmark for point-tracking tasks in video. The results show that TAPIR performs significantly better than the baseline techniques, achieving an absolute improvement of roughly 20% in Average Jaccard (AJ) over other methods on the DAVIS (Densely Annotated VIdeo Segmentation) benchmark.
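For readers unfamiliar with the metric, the following is a hedged sketch of how Average Jaccard is commonly computed for TAP-Vid: at each pixel threshold, a prediction counts as a true positive if the point is truly visible, predicted visible, and within the threshold of the ground truth, and the Jaccard scores are averaged over several thresholds. Array shapes and the exact handling of visibility are assumptions and may differ in detail from the official evaluation code.

```python
import numpy as np

def average_jaccard(pred_xy, pred_vis, gt_xy, gt_vis,
                    thresholds=(1, 2, 4, 8, 16)):
    """Approximate Average Jaccard (AJ) over pixel thresholds.

    pred_xy, gt_xy:   (N, T, 2) predicted / ground-truth positions.
    pred_vis, gt_vis: (N, T) boolean visibility flags.
    """
    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1)
    scores = []
    for thr in thresholds:
        within = dist <= thr
        tp = np.sum(gt_vis & pred_vis & within)          # correct & visible
        fp = np.sum(pred_vis & ~(gt_vis & within))       # spurious prediction
        fn = np.sum(gt_vis & ~(pred_vis & within))       # missed visible point
        scores.append(tp / max(tp + fp + fn, 1))
    return float(np.mean(scores))
```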
The model has been designed for fast parallel inference on long video sequences, i.e., it can process multiple frames simultaneously, improving the efficiency of tracking tasks. The team also notes that the model can run online, processing and tracking points as new video frames arrive. It tracks 256 points on a 256×256 video at roughly 40 frames per second (fps) and can be extended to higher-resolution videos, giving it flexibility across videos of various sizes and quality.
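As a rough illustration of what online tracking means here, the toy wrapper below extends a track one frame at a time as new per-frame features arrive, rather than waiting for the whole video. The class, its methods, and the simple correlation update are hypothetical and are not the real TAPIR API.

```python
import numpy as np

class OnlineTracker:
    """Hypothetical wrapper illustrating causal, frame-by-frame tracking."""

    def __init__(self, query_feat):
        self.query_feat = query_feat   # (C,) feature at the query point
        self.track = []                # growing list of (row, col) positions

    def update(self, frame_feats):
        """frame_feats: (H, W, C) features of the newest frame."""
        corr = np.einsum('hwc,c->hw', frame_feats, self.query_feat)
        self.track.append(np.unravel_index(int(corr.argmax()), corr.shape))
        return self.track[-1]
```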
The team has provided two online Google Colab demos so users can try TAPIR without installing anything. The first demo lets users run the model on their own videos, offering an interactive way to test and observe its performance. The second focuses on running TAPIR in an online fashion. Users with a modern GPU can also run TAPIR live, tracking points from their own webcams, by cloning the provided codebase.
Check out the Paper and Project for more details.
Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical-thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.