Leading AI Models Perform Well In Basic Tasks But Struggle With Scientific Reasoning: Study

Posted By: Ramesh Sharma Posted On: Oct 14, 2025

Researchers from the Indian Institute of Technology (IIT) Delhi and Friedrich Schiller University Jena (FSU Jena), Germany have found that while leading Artificial Intelligence (AI) models perform well in basic tasks, they struggle with scientific reasoning. Their findings, published in Nature Computational Science, show that these AI models have important limitations that could be risky if used in research without proper supervision.

The team, led by NM Anoop Krishnan, associate professor at IIT Delhi, and Kevin Maik Jablonka, professor at FSU Jena, developed “MaCBench”, the first benchmark designed to test how vision-language AI models handle real-world tasks in chemistry and materials science.

The results revealed a notable paradox. AI models achieved near-perfect results in basic perception tasks like identifying lab equipment but struggled with spatial reasoning, combining information from multiple sources, and multi-step logical thinking, skills necessary for real scientific discovery.

“Our findings represent a crucial reality check for the scientific community. While these AI systems show remarkable capabilities in routine data processing tasks, they are not yet ready for autonomous scientific reasoning. The strong correlation we observed between model performance and internet data availability suggests these systems may be relying more on pattern matching than genuine scientific understanding,” Krishnan explained.

One concerning finding was related to laboratory safety. “While models excelled at identifying laboratory equipment with 77 percent accuracy, they performed poorly when evaluating safety hazards in similar laboratory setups, achieving only 46 percent accuracy. This disparity between equipment recognition and safety reasoning is particularly alarming,” said Kevin Maik Jablonka.

“It suggests that current AI models cannot bridge the gaps in tacit knowledge that are crucial for safe laboratory operations. Scientists must understand these limitations before integrating AI into safety-critical research environments,” he added.

The researchers also conducted ablation studies to understand where AI models fail. They found that models performed much better when the same information was presented as text rather than images, showing that current AI struggles with multimodal integration, a key requirement for scientific work.

These findings have implications beyond chemistry and materials science, pointing to broader challenges for AI in scientific research. Developing reliable AI assistants will require improvements in training methods that focus on real understanding rather than just pattern recognition.

“Our work provides a roadmap for both the capabilities and limitations of current AI systems in science. While these models show promise as assistive tools for routine tasks, human oversight remains essential for complex reasoning and safety-critical decisions. The path forward requires better uncertainty quantification and frameworks for effective human-AI collaboration,” said Indrajeet Mandal, IIT Delhi PhD scholar.
