This AI Paper Dives into Embodied Evaluations: Unveiling the Tong Test as a Novel Benchmark for Progress Toward Artificial General Intelligence
Unlike narrow or specialized AI systems designed for specific tasks, Artificial General Intelligence (AGI) aims to replicate the broad cognitive abilities and adaptability of human intelligence across a wide range of tasks. An AGI system would function autonomously, making decisions and taking actions independently, and would comprehend ambiguous or incomplete information.
Achieving AGI is a complex and challenging endeavor, as it requires solving numerous difficult problems in machine learning, natural language processing, robotics, and other AI-related fields.
Researchers at the National Key Laboratory of General Artificial Intelligence propose a new way of evaluating AGI: the Tong Test. “Tong” is the Chinese character for “general,” as in AGI.
They propose that AGI evaluation should be rooted in complex environments with dynamic embodied physical and social interactions (DEPSI). They say that only through evaluation within DEPSI can the human-like abilities of AGI, such as commonsense reasoning, intention inference in social interactions, trust, and self-awareness, be properly assessed. The Tong test offers a new perspective on AGI evaluation by emphasizing DEPSI and by being ability- and value-oriented rather than task-oriented.
The Tong test is a benchmark and evaluation system focusing on essential features such as infinite tasks, self-driven task generation, value alignment, and causal understanding. The researchers' proposed virtual platform could also support the training and testing of embodied AI. Embodied AI agents acquire information within this platform and continue to learn and fine-tune their values and abilities interactively.
To support infinite tasks, they adopt a compositional graphical model as the basic form of knowledge representation, which parses the spatial, temporal, and causal relations of any given scene. They also define a fluent space over the time-varying variables (fluents); it covers all scene configurations that can be represented within the continuous DEPSI environment space.
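As a rough illustration of why composing fluents yields a very large task space, here is a minimal Python sketch. It is not the paper's implementation: the `Fluent` class, the three example fluents, and the rule that a task is a (start, goal) pair of configurations are all assumptions made for this example, and the discrete values stand in for what would be continuous variables in a DEPSI environment.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Fluent:
    """Hypothetical time-varying variable of a scene (not the paper's data structure)."""
    name: str
    values: tuple  # discrete stand-in for a possibly continuous range


def fluent_space(fluents):
    """Yield every scene configuration: one value assigned to each fluent."""
    names = [f.name for f in fluents]
    for combo in product(*(f.values for f in fluents)):
        yield dict(zip(names, combo))


def generate_tasks(fluents):
    """Assumed task definition: move the scene from a start configuration to a different goal."""
    configs = list(fluent_space(fluents))
    return [(start, goal) for start in configs for goal in configs if start != goal]


# Illustrative fluents only; a real scene would have many more.
fluents = [
    Fluent("door", ("open", "closed")),
    Fluent("light", ("on", "off")),
    Fluent("cup", ("empty", "full")),
]

configs = list(fluent_space(fluents))
tasks = generate_tasks(fluents)
print(len(configs), "configurations,", len(tasks), "candidate tasks")
```

Even with three binary fluents the number of candidate tasks grows quadratically in the number of configurations, which itself grows exponentially in the number of fluents; with continuous fluents the space is unbounded.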
The Tong test spans two domains, together called the U–V dual system. The U-system describes the agent’s understanding of extrinsic physical or social rules. In contrast, the V-system comprises the agent’s intrinsic values, defined as a set of value functions upon which the agent’s self-driven behaviors are built. The Tong test platform also has modules for intermediate data visualization and a panel that displays how well the tested model performed.
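The division of labor in the U–V dual system can be sketched in a few lines of Python. This is a toy under stated assumptions, not the authors' model: the state variables (`food`, `clutter`), the two value functions, the `predict` transition rules, and the weighted-sum scoring are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict

State = Dict[str, float]

# V-system (assumed): intrinsic value functions, each scoring a state in [0, 1].
value_functions: Dict[str, Callable[[State], float]] = {
    "satiety": lambda s: s["food"],
    "tidiness": lambda s: 1.0 - s["clutter"],
}


def predict(state: State, action: str) -> State:
    """U-system (assumed): the agent's model of how an action changes the world."""
    nxt = dict(state)
    if action == "eat":
        nxt["food"] = min(1.0, nxt["food"] + 0.5)
        nxt["clutter"] = min(1.0, nxt["clutter"] + 0.1)
    elif action == "clean":
        nxt["clutter"] = max(0.0, nxt["clutter"] - 0.6)
    return nxt  # any other action (e.g. "wait") leaves the state unchanged


def total_value(state: State, weights: Dict[str, float]) -> float:
    """Combine the value functions with per-agent weights."""
    return sum(weights[name] * fn(state) for name, fn in value_functions.items())


def choose_action(state: State, actions, weights: Dict[str, float]) -> str:
    """Self-driven behavior: pick the action whose predicted outcome the agent values most."""
    return max(actions, key=lambda a: total_value(predict(state, a), weights))


state = {"food": 0.2, "clutter": 0.8}
actions = ["eat", "clean", "wait"]
print(choose_action(state, actions, {"satiety": 1.0, "tidiness": 1.0}))
print(choose_action(state, actions, {"satiety": 3.0, "tidiness": 1.0}))
```

The same world model produces different behavior when the value weights change, which is the point of separating what the agent understands (U) from what it cares about (V).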
Thus, the proposed DEPSI-based Tong test defines five levels of values and abilities across multiple dimensions, and it offers both a practical pathway and theoretical guidance for developing AI algorithms.
Check out the Paper. All credit for this research goes to the researchers on this project.
Arshad is an intern at MarktechPost. He is currently pursuing an Int. MSc in Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at a fundamental level leads to new discoveries, which in turn lead to advances in technology. He is passionate about understanding nature at a fundamental level with the help of tools such as mathematical models, ML models, and AI.