Study Shows AI Models Don’t Match Human Visual Processing

On Sep 23, 2022

A new study from York University shows that deep convolutional neural networks (DCNNs) do not process visual information the way humans do: they lack the configural shape perception that human vision relies on. According to Professor James Elder, co-author of the study, this could have dangerous real-world implications for AI applications.

The study, titled “Deep learning models fail to capture the configural nature of human shape perception,” was published in the Cell Press journal iScience.

The study was a collaboration between Elder, who holds the York Research Chair in Human and Computer Vision and is Co-Director of York’s Centre for AI & Society, and Nicholas Baker, an assistant professor of psychology and former VISTA postdoctoral fellow at York.

Novel Visual Stimuli “Frankensteins”

The team relied on novel visual stimuli referred to as “Frankensteins,” which helped them explore how both the human brain and DCNNs process holistic, configural object properties.

“Frankensteins are simply objects that have been taken apart and put back together the wrong way around,” Elder says. “As a result, they have all the right local features, but in the wrong places.”
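As a rough illustration of the idea Elder describes, the sketch below builds a Frankenstein-like image from a binary silhouette by mirroring its top half left to right, so every local feature survives but the overall configuration is wrong. This is only one possible reading of the description, not the study's actual stimulus-generation procedure; the function name `make_frankenstein`, the midline split, and the toy array are all assumptions made for illustration, and the sketch makes no attempt to keep the outline continuous where the two halves meet.

```python
import numpy as np


def make_frankenstein(silhouette: np.ndarray) -> np.ndarray:
    """Return a copy of a 2-D binary silhouette with its top half mirrored.

    Hypothetical helper: the split at the horizontal midline and the
    left-right flip are illustrative assumptions, not the paper's method.
    """
    if silhouette.ndim != 2:
        raise ValueError("expected a 2-D array")
    mid = silhouette.shape[0] // 2      # row index of the horizontal midline
    out = silhouette.copy()
    out[:mid] = silhouette[:mid, ::-1]  # flip the top half left to right
    return out


# Toy 4x4 "object": the top half leans left, the bottom half is centered.
shape = np.array(
    [
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
    ],
    dtype=np.uint8,
)

frank = make_frankenstein(shape)
print(frank)
```

The mirrored image contains exactly the same pixels in each half as the original, which is the sense in which the local features are preserved while their arrangement is not.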

The study found that, unlike the human visual system, DCNNs are not confused by Frankensteins. This reveals an insensitivity to configural object properties.

“Our results explain why deep AI models fail under certain conditions and point to the need to consider tasks beyond object recognition in order to understand visual processing in the brain,” Elder continues. “These deep models tend to take ‘shortcuts’ when solving complex recognition tasks. While these shortcuts may work in many cases, they can be dangerous in some of the real-world AI applications we are currently working on with our industry and government partners.”


Real-World Implications

Elder says that one of these applications is traffic video safety systems.

“The objects in a busy traffic scene — the vehicles, bicycles and pedestrians — obstruct each other and arrive at the eye of a driver as a jumble of disconnected fragments,” he says. “The brain needs to correctly group those fragments to identify the correct categories and locations of the objects. An AI system for traffic safety monitoring that is only able to perceive the fragments individually will fail at this task, potentially misunderstanding the risks to vulnerable road users.”

The researchers also say that modifications to training and architecture aimed at making networks more brain-like did not achieve configural processing. None of the networks could accurately predict trial-by-trial human object judgements.

“We speculate that to match human configural sensitivity, networks must be trained to solve a broader range of object tasks beyond category recognition,” Elder concludes.
