Researchers from University College London Introduce DSP-SLAM: An Object Oriented SLAM with Deep Shape Priors

As deep learning spreads across industries, Simultaneous Localization and Mapping (SLAM), an essential component of robots, driverless vehicles, and augmented reality systems, has been advancing rapidly.

SLAM involves reconstructing the surrounding environment while simultaneously estimating the trajectory of a moving camera. State-of-the-art SLAM algorithms can estimate camera trajectories precisely and produce high-quality geometric reconstructions. However, geometric representations alone do not provide the semantic information needed for more sophisticated tasks that require scene understanding.

Inferring specific details about objects in a scene, such as their number, size, shape, or relative pose, remains a challenge for current semantic SLAM systems. In recent research, a team from the Department of Computer Science at University College London has introduced an object-oriented SLAM system called DSP-SLAM.

DSP-SLAM is designed to build a rich and accurate joint map in which foreground objects are represented by dense 3D models and the background is represented by sparse landmark points. The system works with monocular, stereo, or stereo+LiDAR input.
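To make the idea of a joint map concrete, here is a minimal sketch of how such a structure could be organized. All class and field names (`Landmark`, `MapObject`, `JointMap`, `shape_code`) are hypothetical and chosen for illustration; they are not taken from the DSP-SLAM code base.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Landmark:
    """Sparse background point triangulated from image features."""
    position: Vec3


@dataclass
class MapObject:
    """Foreground object: a pose plus a latent code that a shape decoder
    (not shown here) would turn into a dense surface."""
    category: str
    translation: Vec3
    yaw: float               # rotation about the vertical axis, in radians
    scale: float
    shape_code: List[float]  # hypothetical latent shape vector


@dataclass
class JointMap:
    """Joint map: sparse landmarks for the background, dense objects for the foreground."""
    landmarks: List[Landmark] = field(default_factory=list)
    objects: List[MapObject] = field(default_factory=list)

    def summary(self) -> str:
        return f"landmarks={len(self.landmarks)}, objects={len(self.objects)}"


# Usage: one background point and one detected car with a zero-initialized code.
joint_map = JointMap()
joint_map.landmarks.append(Landmark((1.0, 0.2, 5.0)))
joint_map.objects.append(MapObject("car", (2.0, 0.0, 8.0), 0.1, 1.0, [0.0] * 64))
print(joint_map.summary())
```

Keeping the two kinds of entities in separate lists mirrors the split described above: the background stays cheap to store, while each object carries only a compact code rather than a full mesh.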

According to the team, DSP-SLAM takes as input the 3D point cloud reconstructed by a feature-based SLAM system and enhances that sparse map with dense reconstructions of the detected objects. Objects are detected using semantic instance segmentation, and category-specific deep shape embeddings serve as priors for estimating each object's shape and pose.
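The role of a shape prior can be illustrated with a toy example: a decoder maps a latent code to a signed distance function (SDF), and the code is optimized so that the observed 3D points lie on the decoded surface. The sketch below is an assumption-laden simplification, not the authors' method: the "decoder" is an analytic sphere whose radius depends on a single scalar code, whereas the real system uses a learned, category-specific embedding and also estimates pose. The function names and hyperparameters are hypothetical.

```python
import math
import random


def sdf(point, code, base_radius=1.0):
    """Toy 'decoder': signed distance from point to a sphere of radius base_radius + code."""
    return math.sqrt(sum(c * c for c in point)) - (base_radius + code)


def fit_shape_code(points, steps=200, lr=0.1, prior_weight=0.01):
    """Gradient descent on the mean squared SDF at the observed points,
    plus an L2 penalty that keeps the code close to the prior mean (zero)."""
    code = 0.0
    for _ in range(steps):
        # d/dcode of mean(sdf^2) is mean(-2 * sdf), since d(sdf)/d(code) = -1.
        grad = sum(-2.0 * sdf(p, code) for p in points) / len(points)
        grad += 2.0 * prior_weight * code
        code -= lr * grad
    return code


def sample_sphere_points(radius, n, seed=0):
    """Draw n points uniformly on a sphere of the given radius."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        points.append(tuple(radius * c / norm for c in v))
    return points


# Usage: 50 sparse surface points from a sphere of radius 1.5.
points = sample_sphere_points(radius=1.5, n=50)
code = fit_shape_code(points)
print(f"recovered radius ~ {1.0 + code:.3f}")
```

The recovered radius lands slightly below 1.5 because the prior term pulls the code toward zero; that trade-off between fitting the observations and staying close to the learned shape distribution is what lets a prior-based method cope with sparse or partial observations.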

The team highlights object-aware bundle adjustment as a central component of the system: it builds a pose graph to jointly optimize camera poses, object locations, and feature points. This joint optimization refines the scene representation using both background landmarks and foreground objects.
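The idea of jointly optimizing cameras, landmarks, and objects in one graph can be sketched with a deliberately simplified one-dimensional example. Everything here is an illustrative assumption: positions are scalars rather than full 3D poses, each edge measures a relative offset between two nodes, and plain gradient descent stands in for the nonlinear least-squares solver a real system would use. The node names and the `optimize_graph` helper are hypothetical.

```python
def optimize_graph(initial, edges, fixed=("c0",), steps=2000, lr=0.05):
    """Minimize the sum of squared residuals (value[b] - value[a] - measurement)^2
    over all edges, keeping the nodes listed in `fixed` constant."""
    values = dict(initial)
    for _ in range(steps):
        grads = {name: 0.0 for name in values}
        for a, b, measurement in edges:
            residual = values[b] - values[a] - measurement
            grads[a] -= 2.0 * residual
            grads[b] += 2.0 * residual
        for name in values:
            if name not in fixed:
                values[name] -= lr * grads[name]
    return values


# Noisy initial guesses for three cameras, one landmark and one object.
initial = {"c0": 0.0, "c1": 0.7, "c2": 2.4, "landmark": 4.5, "object": 3.3}

# Relative measurements consistent with c0=0, c1=1, c2=2, landmark=5, object=3.
edges = [
    ("c0", "c1", 1.0), ("c1", "c2", 1.0),            # camera-to-camera
    ("c0", "landmark", 5.0), ("c1", "landmark", 4.0), ("c2", "landmark", 3.0),
    ("c0", "object", 3.0), ("c1", "object", 2.0), ("c2", "object", 1.0),
]

result = optimize_graph(initial, edges)
print({name: round(value, 3) for name, value in result.items()})
```

Fixing the first camera removes the global offset ambiguity. Because camera, landmark, and object nodes share one objective, the object measurements also constrain the camera estimates, which is the intuition behind letting objects contribute to trajectory accuracy.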

DSP-SLAM operates at 10 frames per second across three input modalities: monocular, stereo, and stereo+LiDAR. It has been evaluated on stereo+LiDAR sequences from the KITTI odometry dataset and on monocular RGB sequences from the Freiburg and Redwood-OS datasets. The results show that the system produces high-quality full-object reconstructions while maintaining a consistent global map, even from partial observations.

The researchers summarize their primary contributions as follows.

  1. DSP-SLAM combines the rich semantic mapping of object-aware SLAM with the accuracy of feature-based camera tracking. In contrast to earlier object-aware methods that only reconstruct objects, it also reconstructs the background as sparse feature points.
  2. Unlike Node-SLAM, which relies on dense depth images, DSP-SLAM can work from RGB-only monocular streams, and it can accurately estimate an object's shape from as few as 50 3D points.
  3. DSP-SLAM outperforms auto-labeling, a prior-based reconstruction technique, both quantitatively and qualitatively on object shape and pose estimation.
  4. Experiments on the KITTI odometry dataset show that DSP-SLAM's joint bundle adjustment outperforms ORB-SLAM2 on trajectory estimation, especially with stereo+LiDAR input.



Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical thinking skills and a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.

