Meet SAM-PT: A New AI Method Extending Segment Anything Model’s (SAM) Capability to Tracking and Segmenting Anything in Dynamic Videos

Video segmentation underpins numerous applications, including robotics, autonomous driving, and video editing. Deep neural networks have driven great progress in this area over the past several years, but existing approaches struggle with unseen data, especially in zero-shot scenarios. These models require task-specific video segmentation data for fine-tuning to maintain consistent performance across diverse scenarios. When transferred to video domains they were not trained on, or to object categories outside the training distribution, current methods for semi-supervised Video Object Segmentation (VOS) and Video Instance Segmentation (VIS) show clear performance gaps.

One potential solution is to carry successful models from image segmentation over to video segmentation tasks. The Segment Anything Model (SAM) is one such promising candidate. SAM is a strong foundation model for image segmentation, trained on the SA-1B dataset of 11 million images and more than 1 billion masks. This enormous training set underlies SAM’s outstanding zero-shot generalization: the model transfers reliably to various downstream tasks under zero-shot protocols, is highly promptable, and can produce high-quality masks from a single foreground point.
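To give a flavor of the single-point prompting that makes SAM so flexible, here is a minimal sketch using the official segment-anything package; the image is a placeholder array and the checkpoint path is assumed to point at a downloaded ViT-B checkpoint.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM checkpoint (the path is assumed; download it from the SAM repository).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# A placeholder frame; in practice this is an HxWx3 uint8 RGB image.
image = np.zeros((480, 640, 3), dtype=np.uint8)
predictor.set_image(image)

# A single foreground point prompt: an (x, y) coordinate with label 1 (positive).
point_coords = np.array([[320, 240]])
point_labels = np.array([1])

masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,  # SAM proposes several candidate masks
)
best_mask = masks[np.argmax(scores)]  # keep the highest-scoring candidate
```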

Although SAM exhibits strong zero-shot image segmentation, it is not naturally suited to video segmentation problems. Several recent works adapt SAM to video: TAM combines SAM with the state-of-the-art memory-based mask tracker XMem, and SAM-Track similarly pairs SAM with DeAOT. While these techniques largely recover SAM’s performance on in-distribution data, they fall short under more difficult zero-shot conditions. Other methods that do not rely on SAM, such as SegGPT, can solve many segmentation problems through visual prompting, but they still require a mask annotation for the first video frame.


This remains a substantial obstacle for zero-shot video segmentation, especially as researchers seek simple methods that generalize to new situations and reliably produce high-quality segmentation across diverse video domains. Researchers from ETH Zurich, HKUST, and EPFL introduce SAM-PT (Segment Anything Meets Point Tracking), the first approach to segment videos by combining sparse point tracking with SAM. Rather than relying on mask propagation or object-centric dense feature matching, they propose a point-driven method that exploits the detailed local structure encoded in videos to track points.
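At a high level, SAM-PT tracks a few query points through the video and then prompts SAM with their per-frame locations. The sketch below illustrates that loop under stated assumptions: `track_points` is a hypothetical stand-in for a long-term point tracker such as PIPS, and `predictor` is the SamPredictor from the sketch above; this is an illustration, not the authors’ released code.

```python
import numpy as np

def sam_pt_loop(frames, predictor, pos_queries, neg_queries, track_points):
    """Illustrative SAM-PT-style loop: propagate sparse points through the video
    with a point tracker, then prompt SAM with them frame by frame."""
    pos_traj = track_points(frames, pos_queries)  # (T, P, 2) positive trajectories
    neg_traj = track_points(frames, neg_queries)  # (T, N, 2) negative trajectories
    masks = []
    for t, frame in enumerate(frames):            # frame: HxWx3 uint8 RGB array
        predictor.set_image(frame)
        coords = np.concatenate([pos_traj[t], neg_traj[t]], axis=0)
        labels = np.concatenate([
            np.ones(len(pos_traj[t]), dtype=int),   # 1 = foreground point
            np.zeros(len(neg_traj[t]), dtype=int),  # 0 = background point
        ])
        mask, _, _ = predictor.predict(point_coords=coords,
                                       point_labels=labels,
                                       multimask_output=False)
        masks.append(mask[0])                     # one mask per frame
    return masks
```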

Because SAM-PT only needs sparse points annotated in the first frame to indicate the target object, it generalizes notably well to unseen objects, a strength demonstrated on the open-world UVO benchmark. The strategy extends SAM’s capabilities to video segmentation while preserving its intrinsic flexibility. Leveraging modern point trackers such as PIPS, SAM-PT prompts SAM with the sparse point trajectories these trackers predict. The authors found that initializing the tracked points with K-Medoids cluster centers computed from the first-frame mask label was the most effective way to prompt SAM.
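As a minimal sketch of that initialization step, the snippet below picks query points from a first-frame mask with K-Medoids; the helper name is invented here and scikit-learn-extra’s KMedoids is assumed as the clustering backend, which may differ from the authors’ implementation.

```python
import numpy as np
from sklearn_extra.cluster import KMedoids  # provided by the scikit-learn-extra package

def init_query_points(mask: np.ndarray, n_points: int = 8) -> np.ndarray:
    """Select n_points positive query points inside a binary first-frame mask
    using K-Medoids cluster centers."""
    ys, xs = np.nonzero(mask)                                # pixels belonging to the object
    coords = np.stack([xs, ys], axis=1).astype(np.float32)   # (N, 2) in (x, y) order
    medoids = KMedoids(n_clusters=n_points, random_state=0).fit(coords)
    # Medoids are actual mask pixels, so every query point lies on the object.
    return medoids.cluster_centers_

# Toy example: a 64x64 mask with a filled rectangle as the "object".
toy_mask = np.zeros((64, 64), dtype=bool)
toy_mask[20:50, 10:40] = True
print(init_query_points(toy_mask, n_points=4))
```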

Tracking both positive and negative points makes it possible to clearly distinguish the target objects from the background. The authors also propose multiple mask decoding passes that use both kinds of points to further refine the output masks, along with a point re-initialization technique that improves tracking accuracy over time. In this scheme, points that become unreliable or occluded are discarded, while points are added from parts of the object that become visible in later frames, for example when the object rotates.
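The following is a rough sketch of that re-initialization idea; the function name and signature are invented for illustration, and random sampling stands in for however the authors actually select replacement points.

```python
import numpy as np

def reinit_points(points, visible, current_mask, n_points, rng=None):
    """Illustrative re-initialization: drop points flagged as occluded or unreliable
    and refill from the object mask predicted for the current frame."""
    rng = np.random.default_rng() if rng is None else rng
    kept = points[visible]                       # keep only reliably tracked, visible points
    n_new = n_points - len(kept)
    ys, xs = np.nonzero(current_mask)            # candidate pixels on the currently visible object
    if n_new > 0 and len(xs) > 0:
        idx = rng.choice(len(xs), size=min(n_new, len(xs)), replace=False)
        new_pts = np.stack([xs[idx], ys[idx]], axis=1).astype(points.dtype)
        kept = np.concatenate([kept, new_pts], axis=0)
    return kept

# Example: two of four tracked points lost, refilled from the latest mask.
pts = np.array([[12, 25], [30, 40], [5, 5], [60, 60]], dtype=np.float32)
vis = np.array([True, True, False, False])
mask = np.zeros((64, 64), dtype=bool)
mask[20:50, 10:40] = True
print(reinit_points(pts, vis, mask, n_points=4))
```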

Notably, their experiments show that SAM-PT matches or outperforms existing zero-shot approaches on several video segmentation benchmarks, despite requiring no video segmentation data during training, which underscores the method’s adaptability and robustness. SAM-PT can accelerate progress on video segmentation tasks in zero-shot settings, and the project website offers several interactive video demos.


Check out the Paper, Github Link, and Project Page.




Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest lies in image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.



