Welcome to Wizardlancet’s

About Me

Dr. Zilong Wang is currently a Senior Researcher at the Machine Learning Group, Microsoft Research Asia (MSRA) in Shanghai, China. His research lies at the intersection of AI4Health, Foundation Models, and Human-AI Interaction, with a focus on building reliable, generalizable, and clinically grounded AI systems for real-world healthcare. Trained as a physician, he received his M.D. in Clinical Medicine from Shanghai Medical College, Fudan University in 2018. This clinical background shapes his research philosophy: advancing computational models that are not only technically sophisticated, but also robust to real-world heterogeneity, aligned with clinical reasoning, and safely integrated into healthcare workflows.

His work spans three tightly connected directions. In AI4Health, he develops medical computer vision and multimodal systems for screening, diagnosis, and longitudinal disease management. In Foundation Models, he studies multimodal and large language model (LLM) architectures, evaluation frameworks, and reinforcement learning methods that improve generalization, interpretability, and clinical faithfulness. In Human-AI Interaction, he studies how advanced AI systems—including multimodal and agentic models—interact with people in real-world settings, spanning clinical workflows as well as accessibility and aging scenarios. He focuses on human-in-the-loop designs that let users query, verify, correct, and steer AI behavior, making systems more transparent, controllable, and trustworthy under uncertainty and high-stakes constraints.

Prior to joining MSRA in 2023, Dr. Wang served as CTO of medical technology startups, where he led the development of multiple AI Software as a Medical Device (SaMD) products and took them through clinical validation, regulatory approval, and market access. He was named to the Forbes China 30 Under 30 (2020) and the Hurun U30 China Entrepreneurship Leaders (2021) lists, and serves on the Executive Committee of the CCF Digital Medicine Technical Committee.

News

[Feb. 2026] We released OMGs (Ovarian tumour Multidisciplinary intelligent aGent System), an LLM-powered multi-agent framework designed to support multidisciplinary team (MDT) decision-making across the ovarian tumour care continuum. In multicenter evaluations, OMGs achieved performance comparable to expert MDT consensus, demonstrating the potential of collaborative agentic AI systems in high-stakes clinical decision support.

[Jan. 2026] Four papers accepted as ICLR 2026 posters, covering interpretability for visual prompt tuning, multi-modal Alzheimer’s disease diagnosis, a reasoning-driven multimodal LLM for domain generalization, and a token-level fix that keeps low-probability tokens from over-dominating RL for LLMs.

[Jan. 2026] Three papers accepted at top-tier venues: one at AAAI 2026 (medical foundation models for inner ear temporal CT analysis) and two at CHI 2026, on accessibility and human-AI interaction for screen reader users in the vibe coding era and in computer use scenarios.

[Jan. 2026] We introduced GI-Bench, a panoramic benchmark covering 20 fine-grained lesion categories to systematically evaluate Multimodal Large Language Models (MLLMs) across a five-stage gastrointestinal endoscopy clinical workflow, advancing clinically grounded evaluation of multimodal foundation models.

[Aug. 2025] We open-sourced Agent Lightning⚡, a flexible framework that enables developers to train ANY AI agent with Reinforcement Learning (RL). By decoupling agent execution from model training, it supports seamless integration with existing frameworks such as LangChain, AutoGen, and CrewAI with minimal code modification.

[Aug. 2025] We released preprints for two new medical foundation models: RenalCLIP, a disease-centric vision-language foundation model for precision oncology in kidney cancer, and DermINO, a dermatology foundation model based on a novel multi-view hybrid pretraining strategy for robust visual representation learning.

Projects

OpenOE-Lite — An open-source, lightweight reproduction of OpenEvidence-style evidence-based medical Q&A. With just one LLM API key and zero local vector DB, it generates grounded, citation-aligned answers in real time from OpenAlex’s 250M+ open academic works via a 6-stage pipeline (safety gate → 3-view query enhancement → multi-source retrieval → RRF dedup/rank → small-model evidence gating → grounded answer generation). The modular architecture supports smooth upgrade from the Lite core to a Full RAG with local vector DB and clinical guidelines.
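
The "RRF dedup/rank" stage named above can be illustrated with a minimal sketch of Reciprocal Rank Fusion. This is not taken from the OpenOE-Lite code base: the function name `rrf_fuse`, the constant `k=60` (a commonly used default for RRF), and the work IDs are assumptions made for illustration only.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Merge several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document receives 1 / (k + rank) from every list it appears in
    (rank starts at 1). Documents returned by several lists are merged into
    a single entry, so fusion also deduplicates across sources.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        seen = set()
        rank = 0
        for doc_id in ranking:
            if doc_id in seen:  # ignore repeats inside a single list
                continue
            seen.add(doc_id)
            rank += 1
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists, e.g. one per query view or retrieval source.
view_a = ["W1", "W2", "W3"]
view_b = ["W2", "W1", "W4"]
view_c = ["W2", "W5"]

fused = rrf_fuse([view_a, view_b, view_c])
print(fused)
```

Because RRF uses only rank positions, it can combine result lists whose raw relevance scores are not comparable, which is one reason it is a common choice for multi-source retrieval.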

Anji-Bridge 安济桥 — A PDF → AI-agent knowledge bridge that converts PDFs into structured, semantic Markdown/JSON ready for LLMs and agents. Built on PaddleOCR-VL for layout-aware OCR and Ovis2.5-9B VLM for image understanding, with AST-level enhancement (heading correction, decorative element filtering, image captioning) and multi-format export. Supports batch processing and base64-embedded portable outputs.

Mandarin Speech Prosody Benchmark (MSPB) — The companion repository for our Interspeech 2025 paper “Can AI Understand Mandarin Speech Prosody?”. MSPB is a linguistically grounded benchmark with 178 phonetically recorded and expert-validated stimuli covering 8 prosody-related tasks (tone/intonation, prosodic ambiguity, focus marking, focus operators, scalar meaning, irony, emotional prosody with/without context). It systematically evaluates how Speech LLMs interpret prosodic cues across phonology, syntax, semantics, and pragmatics.

Selected Publications

2026

  • GI-Bench: A Panoramic Benchmark Revealing the Knowledge-Experience Dissociation of Multimodal Large Language Models in Gastrointestinal Endoscopy Against Clinical Standards.
    Zhu, Yan, Luo, Te, Fu, Pei-Yao, Zhang, Zhen, Wang, Zi-Long, Qu, Yi-Fan, Geng, Zi-Han, Xu, Jia-Qi, Yao, Lu, Ma, Li-Yun, Su, Wei, Chen, Wei-Feng, et al.
    arXiv preprint arXiv:2601.08183, 2026
  • OMGs: A multi-agent system supporting MDT decision-making across the ovarian tumour care continuum.
    Zhang, Yangyang, Wang, Zilong, Xu, Jianbo, Chen, Yongqi, Han, Chu, Zhang, Zhihao, Liu, Shuai, Li, Hui, Zhang, Huiping, Liu, Ziqi, Chen, Jiaxin, Zhu, Jun, et al.
    arXiv preprint arXiv:2602.13793, 2026
  • Exploring interpretability for visual prompt tuning with cross-layer concepts.
    Wang, Yubin, Jiang, Xinyang, Cheng, De, Zhao, Xiangqian, Wang, Zilong, Li, Dongsheng, Zhao, Cairong
    ICLR 2026 (poster), 2026
  • Joint adaptation of uni-modal foundation models for multi-modal Alzheimer's disease diagnosis.
    Gu, Wentao, Li, Yuquan, Jiang, Xinyang, Wang, Zilong, Li, Dongsheng, Li, Zehui, Dong, Zijian, Zhao, Cairong
    ICLR 2026 (poster), 2026
  • Reasoning-driven multimodal LLM for domain generalization.
    Xu, Zhipeng, Wang, Zilong, Jiang, Xinyang, Li, Dongsheng, Cheng, De, Wang, Nannan
    ICLR 2026 (poster), 2026
  • Do not let low-probability tokens over-dominate in RL for LLMs.
    Yang, Zhihe, Luo, Xufang, Wang, Zilong, Han, Dongqi, He, Zhiyuan, Li, Dongsheng, Xu, Yunjian
    ICLR 2026 (poster), 2026
  • Programmers Who Use Screen Readers in the Vibe Coding Era: Adaptation, Empowerment, and New Accessibility Landscape.
    Chen, Nan, Qiu, Luna K., Wang, Arran Zeyu, Wang, Zilong, Yang, Yuqing
    CHI 2026, 2026
  • From Struggle to Success: Context-Aware Guidance for Screen Reader Users in Computer Use.
    Chen, Nan, Lu, Jing, Wang, Zilong, Qiu, Luna K., Chen, Siming, Yang, Yuqing
    CHI 2026, 2026
  • Tuning Medical Foundation Models for Inner Ear Temporal CT Analysis with Plug-and-play Domain Knowledge Aggregator.
    Wan, Weixun, Jiang, Xinyang, Wang, Zilong, Li, Bei, Zhao, Cairong
    AAAI 2026, 2026
  • ReMe: Scaffolding Personalized Cognitive Training via Controllable LLM-Mediated Conversations.
    Wang, Zilong, Chen, Nan, Qiu, Luna K., Yue, Ling, Guo, Geli, Ou, Yang, Jiang, Shiqi, Yang, Yuqing, Qiu, Lili
    CHI 2026 (LBW/poster), 2026

2025

  • EEGChaT: A Transformer-Based Modular Channel Selector for SEEG Analysis.
    Wang, Chen, Wang, Yansen, Han, Dongqi, Wang, Zilong, Li, Dongsheng
    arXiv preprint arXiv:2510.13592, 2025
  • Towards Ultra-low Framerate Ultrasound Localization Microscopy on Human Brain with Artificial Intelligence.
    Jiang, Xinyang, Zhong, Chuanyu, Wan, Weixun, Qu, Zefan, Wang, Zilong, Zhang, Xingxuan, Xu, Xiang, Wei, Linglin, Sun, Dailin, Wang, Yu
    Preprint, 2025
  • Can AI Understand Mandarin Speech Prosody? A Framework and Benchmark Showcase.
    Wang, Zilong, Zhang, Xiaoxue, Jiang, Xinyang, Song, Kaitao, Yu, Jue
    Interspeech 2025, 2025
  • Segmentation Helps Understanding: Mask-Infused Vision-Language Pre-training for 3D Medical Images.
    Hu, Yuqi, Luo, Xufang, Wang, Zilong
    Preprint, 2025
  • A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer.
    Tao, Yuhui, Zhao, Zhongwei, Wang, Zilong, Luo, Xufang, Chen, Feng, Wang, Kang, Wu, Chuanfu, Zhang, Xue, Zhang, Shaoting, Yao, Jiaxi, Jin, Xingwei, Jiang, Xinyang, et al.
    arXiv preprint arXiv:2508.16569, 2025
  • DermINO: Hybrid Pretraining for a Versatile Dermatology Foundation Model.
    Xu, Jingkai, Cheng, De, Zhao, Xiangqian, Yang, Jungang, Wang, Zilong, Jiang, Xinyang, Luo, Xufang, Chen, Lili, Ning, Xiaoli, Li, Chengxu, Zhou, Xinzhu, Song, Xuejiao, et al.
    arXiv preprint arXiv:2508.12190, 2025
  • Learning Robust Representations for Medical Images via Unifying (Self-)Supervisions.
    He, Xiaoxuan, Luo, Xufang, Yang, Yifan, Jiang, Xinyang, Wang, Zilong, Usuyama, Naoto, Zhang, Sheng, Poon, Hoifung, Yang, Yuqing, Li, Dongsheng, Qiu, Lili
    ICLR 2025 submission (OpenReview), 2025
  • Agent Lightning: Train ANY AI Agents with Reinforcement Learning.
    Luo, Xufang, Zhang, Yuge, He, Zhiyuan, Wang, Zilong, Zhao, Siyun, Li, Dongsheng, Qiu, Luna K., Yang, Yuqing
    arXiv preprint arXiv:2508.03680, 2025
  • AI-assisted facial analysis in healthcare: From disease detection to comprehensive management.
    Patterns, 2025

2024

  • Screening chronic kidney disease through deep learning utilizing ultra-wide-field fundus images.
    Zhao, Xinyu, Gu, Xingwang, Meng, Lihui, Chen, Yongwei, Zhao, Qing, Cheng, Shiyu, Zhang, Wenfei, Cheng, Tiantian, Wang, Chuting, Shi, Zhengming, Jiao, Shengyin, Jiang, Changlong, Jiao, Guofang, Teng, Da, Sun, Xiaolei, Zhang, Bilei, Li, Yakun, Lu, Huiqin, Chen, Changzheng, Zhang, Hao, Yuan, Ling, Su, Chang, Zhang, Han, Xia, Song, Liang, Anyi, Li, Mengda, Zhu, Dan, Xue, Meirong, Sun, Dawei, Li, Qiuming, Zhang, Ziwu, Zhang, Donglei, Lv, Hongbin, Ahmat, Rishet, Wang, Zilong, et al.
    npj Digital Medicine 7:275, 2024
  • DualStreamFoveaNet: A dual stream fusion architecture with anatomical awareness for robust fovea localization.
    Song, Sifan, Wang, Jinfeng, Wang, Zilong, Wang, Hongxing, Su, Jionglong, Ding, Xiaowei, Dang, Kang
    IEEE Journal of Biomedical and Health Informatics 28(12):7217–7229, 2024
  • LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation.
    Wang, Zilong, Luo, Xufang, Jiang, Xinyang, Li, Dongsheng, Qiu, Lili
    arXiv preprint arXiv:2404.00998, 2024

2023

  • Early detection of visual impairment in young children using a smartphone-based deep learning system.
    Chen, Wenben, Li, Ruiyang, Yu, Qinji, Xu, Andi, Feng, Yile, Wang, Ruixin, Zhao, Lanqin, Lin, Zhenzhe, Yang, Yahan, Lin, Duoru, Wu, Xiaohang, Chen, Jingjing, Liu, Zhenzhen, Wu, Yuxuan, Dang, Kang, Qiu, Kexin, Wang, Zilong, et al.
    Nature Medicine 29(2):493–503, 2023

2020

  • Artificial intelligence-enabled screening for diabetic retinopathy: A real-world, multicenter and prospective study.
    Zhang, Yifei, Shi, Juan, Peng, Ying, Zhao, Zhiyun, Zheng, Qidong, Wang, Zilong, Liu, Kun, Jiao, Shengyin, Qiu, Kexin, Zhou, Ziheng, Yan, Li, Zhao, Dong, et al.
    BMJ Open Diabetes Research & Care 8(1):e001596, 2020