Important Progress Achieved by the AI Research Team of H3I at Beihang University in the Field of Multi-Source General Reasoning for Large Models-Zhongfa Aviation Institute of Beihang University

Date: 12/02/26

Recently, the research paper “ContextPRM: Leveraging Contextual Coherence for Multi-Domain Test-Time Scaling”, by the AI research team of the Hangzhou International Innovation Institute of Beihang University (H3I), was accepted at the International Conference on Learning Representations (ICLR), a top-tier international conference in artificial intelligence. The acceptance marks a significant step forward in the team's research on the complex reasoning and generalization capabilities of large models.

The first author of the paper is Zhang Haotian, a postgraduate student (Class of 2024) at the School of Artificial Intelligence of Beihang University. The corresponding author is Professor Liu Liu of the School of Artificial Intelligence of Beihang University and the Artificial Intelligence Innovation Center (AIIC) of H3I. The research was carried out jointly by the School of Artificial Intelligence, H3I, and the School of Computer Science and Engineering of Beihang University, in collaboration with research institutions and enterprises in China and abroad, including Kuaishou Technology, Nanyang Technological University in Singapore, and the University of Leicester in the UK. The result demonstrates the originality and strength of Beihang University's young researchers in basic artificial intelligence research, and reflects the university's openness to collaboration and its international influence in the field.

Original link: https://openreview.net/forum?id=9H0gBsNjCv

ICLR (International Conference on Learning Representations) is widely recognized as a top-tier academic conference in deep learning. Co-founded in 2013 by Turing Award winners Yoshua Bengio and Yann LeCun, it is regarded as one of the three leading machine learning conferences, alongside NeurIPS and ICML, and ranks among the world's top 10 venues in Google Scholar's conference and journal rankings. The conference spans artificial intelligence, statistics, data science, and related fields, and attracts leading researchers from around the world. ICLR 2026 received a total of 19,000 submissions, with an acceptance rate of 28%, and will be held in Rio de Janeiro, Brazil, in April 2026.

Process Reward Models (PRMs) act as the “examiner” of a large model's reasoning, and through Test-Time Scaling they have significantly improved models' mathematical reasoning. However, existing PRMs mostly focus on mathematics and rely on domain-specific training data and knowledge-based learning patterns, which limits their generalization to non-mathematical domains such as law, history, and philosophy. To address this challenge, the ContextPRM framework shifts the learning objective from verifying the correctness of single-domain knowledge to modeling the cross-domain “Logical Flow”. Rather than judging each reasoning step in isolation, the method centers on the “Contextual Coherence” between steps: by evaluating whether each deduction follows coherently from what came before, much as a human reader would, the model can adapt to the reasoning styles of different disciplines.

△ Flow Chart of ContextPRM

ContextPRM models reasoning logic in depth by introducing a new Context-Aware Training Method together with matching data annotation standards. Unlike traditional methods that focus only on factual errors, the framework guides the model to identify logical fallacies, misinterpretations, and irrelevant information, so it retains strong evaluation capability even without domain-specific training data. Experimental results show that, across the nine non-mathematical domains of MMLU-Pro (including law, history, and philosophy), ContextPRM achieves an average accuracy improvement of 6.5% through Weighted Majority Voting, significantly surpassing both the current state-of-the-art multi-domain model VersaPRM (2.2%) and other mathematics-focused reward models. The result shows that learning universal logical structures can effectively break down domain barriers in large-model reasoning.

△ Key Experimental Results of ContextPRM
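For readers unfamiliar with Weighted Majority Voting, the idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the team's implementation: the function names, the example scores, and the choice of taking the minimum step score as a solution's weight are all assumptions made here for illustration. Several candidate solutions are sampled for one question, a PRM scores each reasoning step, and each final answer accumulates the weights of the solutions that reach it.

```python
from collections import defaultdict


def solution_score(step_scores):
    """Collapse per-step PRM scores into one weight for the whole solution.

    Assumption: the weakest step decides the weight (min aggregation).
    Other aggregations, such as the product or the last step's score,
    are equally plausible; this choice is illustrative only.
    """
    if not step_scores:
        raise ValueError("a solution needs at least one scored step")
    return min(step_scores)


def weighted_majority_vote(candidates):
    """Return the answer with the largest total PRM weight.

    candidates: list of (final_answer, step_scores) pairs, one per
    sampled solution. The scores are hypothetical PRM outputs in [0, 1].
    """
    totals = defaultdict(float)
    for answer, step_scores in candidates:
        totals[answer] += solution_score(step_scores)
    return max(totals, key=totals.get)


def majority_vote(candidates):
    """Unweighted baseline: every sampled solution counts as one vote."""
    counts = defaultdict(int)
    for answer, _ in candidates:
        counts[answer] += 1
    return max(counts, key=counts.get)


if __name__ == "__main__":
    # Three sampled solutions to one multiple-choice question (made-up scores).
    samples = [
        ("B", [0.9, 0.4, 0.3]),    # reaches B, but with incoherent later steps
        ("B", [0.8, 0.2, 0.5]),    # reaches B, with a weak middle step
        ("C", [0.9, 0.95, 0.9]),   # reaches C through coherent steps
    ]
    print("majority vote:", majority_vote(samples))
    print("weighted majority vote:", weighted_majority_vote(samples))
```

In this made-up example the plain majority picks the answer that appears most often, while the weighted vote favors the answer backed by the more coherent reasoning chain. That is the mechanism by which a better step-level scorer can raise final accuracy at test time.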

ContextPRM is another successful effort by the AI research team of Beihang University to integrate basic theory with engineering innovation, and it reflects the university's sustained investment in cultivating internationally competitive young researchers and its ability to translate research into practice. Going forward, the team will continue to explore the logical reasoning mechanisms and adaptive learning capabilities of large models in open-domain tasks, helping artificial intelligence move from “domain-specific problem-solving” to “general-purpose thinking” and adapt to more complex and broader application scenarios.

Artificial Intelligence Innovation Center (AIIC)

The Artificial Intelligence Innovation Center focuses on three key areas: Refined Intelligence, Trustworthy Intelligence, and Embodied Intelligence, supporting national strategies and regional economic development. It aims to tackle critical AI challenges, build a full innovation chain from basic research to industrial application, develop world-class research teams, and cultivate top-tier AI talent.

The Refined Intelligence Division centers on cross-scale perception and adaptive intelligence, covering cross-scale intelligence and brain-inspired intelligence. It develops brain-inspired multimodal platforms, deciphers the mathematical logic and neural mechanisms of the human brain, and advances intelligent methods incorporating prior knowledge to form a high-precision, stable, and reliable theoretical system.

The Trustworthy Intelligence Division is committed to building secure and explainable AI systems. It concentrates on interpretability in neural networks, training and inference mechanisms of large models, and privacy protection, integrating endogenous reliability and external assessment to establish a robust AI framework.

The Embodied Intelligence Division focuses on embodied intelligence and swarm collaboration. It develops high-fidelity simulation and distributed training systems, and innovates control mechanisms based on cooperation between large and small models as well as mechanism-informed swarm decision-making. These efforts enable precise agent manipulation and efficient multi-agent coordination, yielding robust, highly generalizable solutions with sound decision-making, and driving applications in smart homes, the low-altitude economy, and disaster response in support of sustainable development.

Approved by Dong Zhuoning, Zhang Wei, Xu Ran

Edited by Yuan Xiaohui
