He Cao (曹赫)

I am a Researcher on the AI for Science Team at CTO-Lab, International Digital Economy Academy (IDEA), where I work closely with Dr. Zijing Liu, Ziqi Gao, and Yu Li. My research focuses on multimodal foundation models, reliable evaluation, and agentic systems for scientific discovery, particularly in chemistry, biology, and drug discovery.

My work aims to build scientifically grounded AI systems that can understand, reason over, and act on complex scientific data. I develop multimodal and agentic methods that connect natural language with molecular structures, chemical reactions, proteins, and drug-discovery workflows. More recently, I have focused on scientific foundation models, trustworthy evaluation and alignment, and autonomous agents for scientific research.

Earlier in my research career, I worked on generative vision and multimodal learning, including diffusion models, 3D generation, and open-world visual understanding. That work shaped my perspective on foundation models and multimodal intelligence, and ultimately motivated my move to AI for Science.

I received my Ph.D. (2020–2025) from the Individualized Interdisciplinary Program in Artificial Intelligence at The Hong Kong University of Science and Technology, advised by Prof. Yuan Yao and Prof. Yangqiu Song. Before that, I received my B.S. (2016–2020) in Computer Science and Technology from Harbin Institute of Technology (Shenzhen).

Research Interests

  • Reliable scientific foundation models: multimodal representation learning and reasoning over molecules, chemical reactions, proteins, protein-ligand systems, and scientific literature.
  • Evaluation and alignment for scientific LLMs: factuality, hallucination, chemical reasoning, reaction-diagram understanding, safety, preference alignment, and process-level verification.
  • Agentic AI for drug discovery: governed tool use, workflow orchestration, human-in-the-loop scientific agents, and auditable autonomous discovery systems.
  • Generative modeling for science and vision: diffusion and autoregressive models for molecular, protein, visual, and 3D generation.

Experience

2025–Present

Researcher, AI for Science Team, CTO-Lab, IDEA

2022–2025

Algorithm Research Intern, IDEA

Services

Reviewer

ICLR, AAAI, CVPR, ACL, NeurIPS, EMNLP, ICML, ECCV.

2021–2025

Graduate Teaching Assistant

AI for Fintech courses, Department of Mathematics, HKUST.