OpenAI introduces FrontierScience to test AI’s expert-level scientific reasoning across physics, chemistry, biology
OpenAI on December 16 announced FrontierScience, a new benchmark designed to evaluate artificial intelligence systems on expert-level scientific reasoning across physics, chemistry and biology, as AI models increasingly demonstrate their ability to support real scientific research.
The company said reasoning lies at the heart of scientific work, going beyond factual recall to include hypothesis generation, testing, refinement and cross-disciplinary synthesis. As AI systems grow more capable, OpenAI said the key question is how deeply they can reason to meaningfully contribute to scientific discovery.
AI models increasingly used in real research
Over the past year, OpenAI's models have reached major milestones, including gold-medal-level performance at the International Math Olympiad and the International Olympiad in Informatics. At the same time, advanced systems such as GPT-5 are already being used by researchers to accelerate scientific workflows.
According to OpenAI, scientists are deploying these models for tasks such as cross-disciplinary literature searches, multilingual research reviews and complex mathematical proofs. In many cases, work that once took days or weeks can now be completed in hours.
This progress was detailed in OpenAI's November 2025 paper, “Early science acceleration experiments with GPT-5”, which presented early evidence that GPT-5 can measurably speed up scientific workflows.
Why FrontierScience was created
OpenAI said that as models' reasoning and knowledge capabilities scale, existing scientific benchmarks are no longer sufficient. Many prior benchmarks focus on multiple-choice questions, have become saturated, or are not centered on real scientific reasoning.
For example, when the GPQA “Google-Proof” benchmark was released in November 2023, GPT-4 scored 39%, well below the expert baseline of 70%. Two years later, GPT-5.2 scored 92%, highlighting the need for more challenging evaluations.
FrontierScience was created to fill this gap by measuring expert-level scientific capabilities using difficult, original and meaningful questions written and verified by domain experts.
What FrontierScience measures
The full FrontierScience benchmark includes more than 700 textual questions, with 160 in a gold-standard set, spanning subfields across physics, chemistry and biology.
It is divided into two tracks:
FrontierScience-Olympiad:
-100 short-answer questions
-Designed by international science olympiad medalists
-Focused on constrained, theoretical scientific reasoning
-Difficulty at least comparable to international olympiad competitions
FrontierScience-Research:
-60 original research subtasks
-Written by PhD-level scientists
-Designed to reflect real-world, multi-step research challenges
-Graded using a detailed 10-point rubric
Each task was authored and verified by subject-matter experts. Olympiad contributors were medalists in at least one international competition, while Research contributors all held relevant PhD degrees.
How model performance is graded
Olympiad questions are graded using short answers, such as numerical values, expressions or fuzzy string matches, allowing for clear verification.
For Research tasks, OpenAI introduced a rubric-based grading system. Each question includes multiple objectively assessable criteria totaling 10 points, evaluating both final answers and intermediate reasoning steps. A score of 7 out of 10 or higher is considered correct.
Responses are evaluated using a model-based grader (GPT-5). While human expert grading would be ideal, OpenAI said it is not scalable at this level, so rubrics were designed to be reliably checked by a model-based system, supported by a verification pipeline.
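The two grading rules described above can be illustrated with a minimal Python sketch. It is not OpenAI's grading code: the names `Criterion`, `research_is_correct` and `olympiad_is_correct`, the 1% numeric tolerance and the 0.9 fuzzy-match cutoff are all assumptions made for illustration. Only the 10-point rubric and the 7-point pass mark come from the article. The points awarded per criterion are taken as given here, whereas in the benchmark they come from the model-based grader.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher

RUBRIC_TOTAL = 10.0    # from the article: rubric criteria total 10 points
PASS_THRESHOLD = 7.0   # from the article: 7 of 10 or higher counts as correct


@dataclass
class Criterion:
    """One rubric line (hypothetical structure, for illustration only)."""
    description: str
    points: float    # maximum points available for this criterion
    awarded: float   # points given by the grader


def research_is_correct(criteria):
    """Sum the awarded rubric points and apply the 7-of-10 pass mark."""
    total = sum(c.points for c in criteria)
    if not math.isclose(total, RUBRIC_TOTAL):
        raise ValueError(f"rubric totals {total}, expected {RUBRIC_TOTAL}")
    # Clamp each award to the range [0, points] so no criterion can overpay.
    score = sum(min(max(c.awarded, 0.0), c.points) for c in criteria)
    return score, score >= PASS_THRESHOLD


def olympiad_is_correct(answer, reference, rel_tol=1e-2, fuzzy_cutoff=0.9):
    """Short-answer check: numeric comparison first, fuzzy string match otherwise.

    rel_tol and fuzzy_cutoff are assumed values, not taken from the benchmark.
    """
    try:
        return math.isclose(float(answer), float(reference), rel_tol=rel_tol)
    except ValueError:
        pass  # at least one side is not a plain number; compare as text
    a = " ".join(answer.lower().split())
    b = " ".join(reference.lower().split())
    return SequenceMatcher(None, a, b).ratio() >= fuzzy_cutoff


if __name__ == "__main__":
    rubric = [
        Criterion("identifies the governing mechanism", 4.0, 4.0),
        Criterion("derives the intermediate result", 3.0, 3.0),
        Criterion("states the final answer with units", 3.0, 0.0),
    ]
    print(research_is_correct(rubric))
    print(olympiad_is_correct("Benzene ", "benzene"))
```

In this example the response earns 7 of 10 rubric points and so counts as correct even though the final answer criterion was missed, which shows how the rubric gives credit for intermediate reasoning steps.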
How leading AI models performed
OpenAI evaluated several frontier AI models on FrontierScience, including GPT-5.2, Claude Opus 4.5, Gemini 3 Pro, GPT-4o, OpenAI o4-mini and OpenAI o3.
In the initial results:
-GPT-5.2 scored 77% on FrontierScience-Olympiad
-GPT-5.2 scored 25% on FrontierScience-Research
-Gemini 3 Pro closely matched GPT-5.2 on the Olympiad track with a 76% score
OpenAI said the results show substantial progress in expert-level reasoning, while leaving significant headroom for improvement, particularly on open-ended research tasks.
Strengths, limits and next steps
While FrontierScience represents a step forward in evaluating scientific reasoning, OpenAI acknowledged key limitations. The benchmark focuses on constrained, expert-written problems and does not fully capture how science is conducted in practice.
In particular, it does not assess how models generate genuinely novel hypotheses, work with experimental systems, or interact with multimodal data such as video and physical-world experiments.
Looking ahead, OpenAI said progress in scientific reasoning will come from both stronger general-purpose reasoning systems and targeted improvements in scientific capabilities. FrontierScience is one tool among many, and the company plans to expand the benchmark to new domains and pair it with real-world evaluations.
Ultimately, OpenAI said, the most important measure of AI's scientific value will be the new discoveries it helps generate—and FrontierScience is designed to serve as an early indicator of that potential.
Key takeaways:
-OpenAI launched FrontierScience to test AI on expert-level scientific reasoning across physics, chemistry and biology.
-Focus is on reasoning, not recall, including hypothesis generation, testing and cross-disciplinary thinking.
-AI models like GPT-5 are already accelerating research, cutting tasks from weeks to hours.
-Existing science benchmarks are no longer sufficient, prompting the need for harder, expert-written evaluations.
-FrontierScience has two tracks: Olympiad (theoretical reasoning) and Research (real-world, multi-step tasks).
-GPT-5.2 leads performance, scoring 77% on Olympiad tasks and 25% on Research tasks.
Source: LiveMint