Researchers from Microsoft and ETH Zurich Introduce HoloAssist: A Multimodal Dataset for Next-Gen AI Copilots for the Physical World

In the field of artificial intelligence, a persistent challenge has been developing interactive AI assistants that can effectively navigate and assist in real-world tasks. While significant progress has been made in the digital domain, for example with large language models, the physical world presents unique hurdles for AI systems.

The main obstacle researchers face is that AI assistants lack firsthand experience of the physical world, which prevents them from perceiving, reasoning, and actively assisting in real-world scenarios. This limitation stems from the scarcity of the task-specific data needed to train AI models on physical tasks.

To address this issue, a team of researchers from Microsoft and ETH Zurich has introduced a groundbreaking dataset called “HoloAssist.” The dataset captures egocentric (first-person) human interactions in the real world. It involves two participants collaborating on physical manipulation tasks: a task performer wearing a mixed-reality headset and a task instructor who observes and provides verbal instructions in real time.

HoloAssist boasts an extensive collection of data, including 166 hours of recordings with 222 diverse participants, forming 350 unique instructor-performer pairs completing 20 object-centric manipulation tasks. These tasks encompass a wide range of objects, from everyday electronic devices to specialized industrial items. The dataset captures seven synchronized sensor modalities: RGB, depth, head pose, 3D hand pose, eye gaze, audio, and IMU, providing a comprehensive understanding of human actions and intentions. Additionally, it offers third-person manual annotations, including text summaries, intervention types, mistake annotations, and action segments.
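To make this structure concrete, below is a minimal sketch in Python of how a single synchronized recording could be represented. The class names, field names, and array shapes are hypothetical illustrations of the seven modalities and annotation types listed above, not HoloAssist’s actual file format or API.

```python
from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class SensorFrame:
    """One synchronized timestep across the sensor modalities (hypothetical schema)."""
    timestamp: float        # seconds from session start
    rgb: np.ndarray         # (H, W, 3) color image
    depth: np.ndarray       # (H, W) depth map
    head_pose: np.ndarray   # (4, 4) headset pose matrix
    hand_pose: np.ndarray   # (2, J, 3) 3D joints for both hands; J is illustrative
    eye_gaze: np.ndarray    # (3,) gaze direction vector
    imu: np.ndarray         # (6,) accelerometer + gyroscope readings


@dataclass
class ActionSegment:
    """A manually annotated action span with mistake/intervention labels (hypothetical)."""
    start: float
    end: float
    verb: str               # e.g., "attach"
    noun: str               # e.g., "battery"
    is_mistake: bool        # mistake annotation
    intervention: str       # intervention type, e.g., "verbal correction" or "none"


@dataclass
class Session:
    task: str               # one of the 20 object-centric manipulation tasks
    summary: str            # third-person text summary
    frames: List[SensorFrame] = field(default_factory=list)
    segments: List[ActionSegment] = field(default_factory=list)

# Audio, the seventh modality, would most naturally be stored as a separate
# waveform track aligned to the frame timestamps rather than per-frame arrays.
```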

Unlike previous datasets, HoloAssist’s distinctive feature lies in its multi-person, interactive task-execution setting, which enables the development of anticipatory and proactive AI assistants. Such assistants can offer timely instructions grounded in the environment, going beyond the traditional “chat-based” AI assistant model.

The research team evaluated the dataset’s performance in action classification and anticipation tasks, providing empirical results that shed light on the significance of different modalities in various tasks. Additionally, they introduced new benchmarks focused on mistake detection, intervention type prediction, and 3D hand pose forecasting, essential elements for intelligent assistant development.
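As a simple illustration of how one of these benchmarks could be scored, the sketch below frames mistake detection as binary classification over annotated action segments and computes accuracy, precision, recall, and F1. The function and its inputs are hypothetical conventions for illustration; the paper’s actual evaluation protocol and models may differ.

```python
from typing import List, Tuple


def score_mistake_detection(results: List[Tuple[bool, bool]]) -> dict:
    """Score per-segment mistake detection.

    Each item in `results` pairs a model's prediction with the
    ground-truth mistake annotation for one action segment.
    """
    tp = sum(p and t for p, t in results)      # predicted mistake, was a mistake
    fp = sum(p and not t for p, t in results)  # false alarm
    fn = sum(t and not p for p, t in results)  # missed mistake
    accuracy = sum(p == t for p, t in results) / len(results)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Example: four segments, one annotated mistake that the model catches
# and one false alarm.
print(score_mistake_detection([(False, False), (True, True), (True, False), (False, False)]))
```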

In conclusion, this work represents an initial step toward exploring how intelligent agents can collaborate with humans in real-world tasks. The HoloAssist dataset, along with associated benchmarks and tools, is expected to advance research in building powerful AI assistants for everyday real-world tasks, opening doors to numerous future research directions.


Check out the Paper and Microsoft Article. All credit for this research goes to the researchers on this project.



Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a tech enthusiast and has a keen interest in the scope of software and data science applications. She is always reading about developments in different fields of AI and ML.

