Hello there, I am Zhipeng!

I am a seasoned scientist and engineer working in the field of Machine Learning Systems, with a primary focus on optimizing large-scale distributed training and inference systems for foundation models and large language models (LLMs). Unlike many other ML Systems researchers, my work also extends into core modeling research, centered on efficient AI: LLM compression, knowledge distillation, LLM post-training methodologies, agentic reinforcement learning, and vision-language models (VLMs). My research has been published in top venues including EMNLP, CVPR, MLSys, ECCV, ICCV, WACV, ICML, and NeurIPS.

Representative ML systems I helped develop include DeepSpeed, one of the most popular open-source libraries for distributed LLM training (code maintainer and TSC Committer); fmchisel, a state-of-the-art foundation model optimization research library covering knowledge distillation, compression, and quantization; and Liger Kernel (kudos to Byron Hsu, Yun Dai, et al. for leading the work).

In the corporate world, I am a Senior Manager/Senior Staff Research Scientist leading the Foundational AI algorithms organization at LinkedIn (a subsidiary of Microsoft). Previously, I worked in the AWS AI organization at Amazon, where I led the SageMaker Applied Science Team, contributing to LLM inference, distributed training, and evaluation services. I was also a Tech Lead Manager/Staff Software Engineer at Google[X]/Google Research, where I built up the machine learning team for the moonshot project Chorus and worked on the AIDA project, building a coding agent using LLMs (now part of Gemini); I was also involved in PaLM model development across Alphabet. Before that, I was a Staff Research Scientist at Apple, where I led the development of ML algorithms for sleep apnea detection on Apple Watch.

Selected Recent Publications

Scaling Down, Serving Fast
Kayhan Behdin, Ata Fatahi, Qingquan Song, Yun Dai, Aman Gupta, Zhipeng Wang#, et al.
EMNLP 2025 (# corresponding author)
Oral Presentation

EVTP-IVS
Wenhui Zhu*, Xiwen Chen*, Zhipeng Wang*#, Shao Tang, Sayan Ghosh, Xuanzhao Dong, Rajat Koner, Yalin Wang
WACV 2026 (* equal contribution, # corresponding author)

Liger Kernel
Pin-Lun Hsu, Yun Dai, Vignesh Kothapalli, Qingquan Song, Shao Tang, Siyu Zhu, Steven Shimizu, Shivam Sahni, Haowen Ning, Yanning Chen, Zhipeng Wang
ICML 2025 CODE Workshop

Local2Global
Rajat Koner, Zhipeng Wang, Srinivas Parthasarathy, Chinghang Chen
ICCV 2025

Reasoning-Aware Compression
Ryan Lucas, Kayhan Behdin, Zhipeng Wang#, Qingquan Song, Shao Tang, Rahul Mazumder
ICLR 2026 (# corresponding author)

LLaDA-MedV
Xuanzhao Dong*, Wenhui Zhu*, Xiwen Chen*, Zhipeng Wang*, Peijie Qiu, Shao Tang, Xin Li, Yalin Wang
CVPR 2026 (* equal contribution)

OTPrune
Xiwen Chen, Wenhui Zhu, Gen Li, Xuanzhao Dong, Yujian Xiong, Hao Wang, Peijie Qiu, Qingquan Song, Zhipeng Wang#, Shao Tang, Yalin Wang, Abolfazl Razi
CVPR 2026 (# corresponding author)

Planner-R1
Siyu Zhu, Yanbin Jiang, Hejian Sang, Shao Tang, Qingquan Song, Biao He, Rohit Jain, Zhipeng Wang, Alborz Geramifard
Preprint

Scaling Up SLM
Kayhan Behdin, Qingquan Song, Sriram Vasudevan, Jian Sheng, Xiaojing Ma, Z. Zhou, Chuanrui Zhu, Guoyao Li, Chanh Nguyen, Sayan Ghosh, Hejian Sang, Ata Fatahi Baarzi, Sundara Raman Ramachandran, Xiaoqing Wang, Qing Lan, Qi Guo, Caleb Johnson, Zhipeng Wang#, Fedor Borisyuk
MLSys 2026 (# corresponding author)

Open Source Contributions

I am a TSC Committer and code maintainer for the DeepSpeed project, one of the most popular open-source libraries for LLM training. I also help maintain the Liger Kernel project; feel free to raise issues and contribute PRs.