Welcome to Wizardlancet’s
About Me
Dr. Zilong Wang is currently a Senior Researcher at the Machine Learning Group, Microsoft Research Asia (MSRA) in Shanghai, China. His research lies at the intersection of AI4Health, Foundation Models, and Human-AI Interaction, with a focus on building reliable, generalizable, and clinically grounded AI systems for real-world healthcare. Trained as a physician, he received his M.D. in Clinical Medicine from Shanghai Medical College, Fudan University in 2018. This clinical background shapes his research philosophy: advancing computational models that are not only technically sophisticated, but also robust to real-world heterogeneity, aligned with clinical reasoning, and safely integrated into healthcare workflows.
His work spans three tightly connected directions. In AI4Health, he develops medical computer vision and multimodal systems for screening, diagnosis, and longitudinal disease management. In Foundation Models, he studies multimodal and large language model (LLM) architectures, evaluation frameworks, and reinforcement learning methods that improve generalization, interpretability, and clinical faithfulness. In Human-AI Interaction, he studies how advanced AI systems—including multimodal and agentic models—interact with people in real-world settings, spanning clinical workflows as well as accessibility and aging scenarios. He focuses on human-in-the-loop designs that let users query, verify, correct, and steer AI behavior, making systems more transparent, controllable, and trustworthy under uncertainty and high-stakes constraints.
Prior to joining MSRA in 2023, Dr. Wang served as CTO of medical technology startups, where he led the development of multiple AI Software as a Medical Device (SaMD) products and drove them through clinical validation, regulatory approval, and market access. He was named to the Forbes China 30 Under 30 (2020) and the Hurun U30 China Entrepreneurship Leaders (2021) lists, and serves on the Executive Committee of the CCF Digital Medicine Technical Committee.
News
[Feb. 2026] We released OMGs (Ovarian tumour Multidisciplinary intelligent aGent System), an LLM-powered multi-agent framework designed to support multidisciplinary team (MDT) decision-making across the ovarian tumour care continuum. In multicenter evaluations, OMGs achieved performance comparable to expert MDT consensus, demonstrating the potential of collaborative agentic AI systems in high-stakes clinical decision support.
[Jan. 2026] Four papers accepted as ICLR 2026 posters, covering visual prompt tuning interpretability, multi-modal Alzheimer’s disease diagnosis, reasoning-driven multimodal LLM for domain generalization, and a token-level fix for low-probability over-domination in RL for LLMs.
[Jan. 2026] Three papers accepted to top-tier venues: one at AAAI 2026 (medical foundation models for inner ear temporal CT analysis) and two at CHI 2026 on accessibility and human-AI interaction for screen reader users, in the vibe coding era and in computer use scenarios.
[Jan. 2026] We introduced GI-Bench, a panoramic benchmark covering 20 fine-grained lesion categories to systematically evaluate Multimodal Large Language Models (MLLMs) across a five-stage gastrointestinal endoscopy clinical workflow, advancing clinically grounded evaluation of multimodal foundation models.
[Aug. 2025] We open-sourced Agent Lightning⚡, a flexible framework that enables developers to train ANY AI agent with Reinforcement Learning (RL). By decoupling agent execution from model training, it supports seamless integration with existing frameworks such as LangChain, AutoGen, and CrewAI with minimal code modification.
[Aug. 2025] We released preprints for two new medical foundation models: RenalCLIP, a disease-centric vision-language foundation model for precision oncology in kidney cancer, and DermINO, a dermatology foundation model based on a novel multi-view hybrid pretraining strategy for robust visual representation learning.
Featured Projects
OpenOE-Lite — An open-source, lightweight reproduction of OpenEvidence-style evidence-based medical Q&A. With just one LLM API key and zero local vector DB, it generates grounded, citation-aligned answers in real time from OpenAlex’s 250M+ open academic works via a 6-stage pipeline (safety gate → 3-view query enhancement → multi-source retrieval → RRF dedup/rank → small-model evidence gating → grounded answer generation). The modular architecture supports smooth upgrade from the Lite core to a Full RAG with local vector DB and clinical guidelines.
Anji-Bridge 安济桥 — A PDF → AI-agent knowledge bridge that converts PDFs into structured, semantic Markdown/JSON ready for LLMs and agents. Built on PaddleOCR-VL for layout-aware OCR and Ovis2.5-9B VLM for image understanding, with AST-level enhancement (heading correction, decorative element filtering, image captioning) and multi-format export. Supports batch processing and base64-embedded portable outputs.
Mandarin Speech Prosody Benchmark (MSPB) — The companion repository for our Interspeech 2025 paper “Can AI Understand Mandarin Speech Prosody?”. MSPB is a linguistically grounded benchmark with 178 phonetically recorded and expert-validated stimuli covering 8 prosody-related tasks (tone/intonation, prosodic ambiguity, focus marking, focus operators, scalar meaning, irony, emotional prosody with/without context). It systematically evaluates how Speech LLMs interpret prosodic cues across phonology, syntax, semantics, and pragmatics.
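The "RRF dedup/rank" stage in the OpenOE-Lite pipeline above can be illustrated with a minimal sketch, assuming RRF stands for Reciprocal Rank Fusion. This is not the project's code: the function name `rrf_fuse`, the placeholder work IDs, and the constant `k = 60` (a conventional default) are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Repeats inside a single list are skipped, so
    every document is counted at most once per list (the dedup step).
    Hypothetical helper for illustration, not part of OpenOE-Lite.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        seen = set()
        rank = 0
        for doc_id in ranking:
            if doc_id in seen:
                continue  # drop duplicates within one retrieval result
            seen.add(doc_id)
            rank += 1
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Three hypothetical retrieval results (e.g. one per query view or source).
results = [
    ["W1", "W2", "W3"],
    ["W2", "W1", "W4"],
    ["W2", "W5"],
]
fused = rrf_fuse(results)
print([doc_id for doc_id, _ in fused])  # ['W2', 'W1', 'W5', 'W3', 'W4']
```

Because the fusion uses only ranks, it can combine sources whose raw relevance scores are not comparable, which is one plausible reason to pick it for a multi-source retrieval step.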
Contact
- Email: wangzilong@microsoft.com
- GitHub: wizardlancet
Selected Publications
2026
- GI-Bench: A Panoramic Benchmark Revealing the Knowledge-Experience Dissociation of Multimodal Large Language Models in Gastrointestinal Endoscopy Against Clinical Standards.
- OMGs: A multi-agent system supporting MDT decision-making across the ovarian tumour care continuum.
- Exploring interpretability for visual prompt tuning with cross-layer concepts. ICLR 2026 (poster), 2026
- Joint adaptation of uni-modal foundation models for multi-modal Alzheimer's disease diagnosis. ICLR 2026 (poster), 2026
- Reasoning-driven multimodal LLM for domain generalization. ICLR 2026 (poster), 2026
- Do not let low-probability tokens over-dominate in RL for LLMs. ICLR 2026 (poster), 2026
- Programmers Who Use Screen Readers in the Vibe Coding Era: Adaptation, Empowerment, and New Accessibility Landscape.
- From Struggle to Success: Context-Aware Guidance for Screen Reader Users in Computer Use. CHI 2026, 2026
- Tuning Medical Foundation Models for Inner Ear Temporal CT Analysis with Plug-and-play Domain Knowledge Aggregator. AAAI 2026, 2026
- ReMe: Scaffolding Personalized Cognitive Training via Controllable LLM-Mediated Conversations.
2025
- EEGChaT: A Transformer-Based Modular Channel Selector for SEEG Analysis.
- Towards Ultra-low Framerate Ultrasound Localization Microscopy on Human Brain with Artificial Intelligence. Preprint, 2025
- Can AI Understand Mandarin Speech Prosody? A Framework and Benchmark Showcase. Interspeech 2025, 2025
- Segmentation Helps Understanding: Mask-Infused Vision-Language Pre-training for 3D Medical Images. Preprint, 2025
- A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer.
- DermINO: Hybrid Pretraining for a Versatile Dermatology Foundation Model.
- Learning Robust Representations for Medical Images via Unifying (Self-)Supervisions. ICLR 2025 submission (OpenReview), 2025
- Agent Lightning: Train ANY AI Agents with Reinforcement Learning.
- AI-assisted facial analysis in healthcare: From disease detection to comprehensive management. Patterns, 2025
2024
- Screening chronic kidney disease through deep learning utilizing ultra-wide-field fundus images.
- DualStreamFoveaNet: A dual stream fusion architecture with anatomical awareness for robust fovea localization.
- LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation.
2023
- Early detection of visual impairment in young children using a smartphone-based deep learning system.