Hello there, I am Zhipeng!

I am a seasoned scientist and engineer working in the field of Machine Learning Systems, with a primary focus on optimizing large-scale distributed training and inference systems for foundation models and large language models (LLMs). Unlike many other ML Systems researchers, my work also extends into core modeling research, centered on efficient AI: LLM compression, knowledge distillation, LLM post-training methodologies, agentic reinforcement learning, and vision-language models (VLMs). My research has been published in top venues including EMNLP, CVPR, MLSys, ECCV, ICCV, WACV, ICML, and NeurIPS.

Representative ML systems I helped develop include DeepSpeed, one of the most popular open-source libraries for distributed LLM training (code maintainer and TSC Committer); fmchisel, a state-of-the-art foundation model optimization research library covering knowledge distillation, compression, and quantization; and Liger Kernel (kudos to Byron Hsu, Yun Dai, et al. for leading the work).

In the corporate world, I am a Senior Manager/Senior Staff Research Scientist leading the Foundational AI algorithms organization at LinkedIn (a subsidiary of Microsoft). Previously, I worked in the AWS AI organization at Amazon, where I led the SageMaker Applied Science Team, contributing to LLM inference, distributed training, and evaluation services. I was also a Tech Lead Manager/Staff Software Engineer at Google[X]/Google Research, where I built up the machine learning team for the moonshot project Chorus and worked on the AIDA project, building a coding agent using LLMs (now part of Gemini); I was also involved in PaLM model development across Alphabet. Before that, I was a Staff Research Scientist at Apple, where I led the development of ML algorithms for sleep apnea detection on Apple Watch.

Selected Recent Publications

Scaling Down, Serving Fast
Kayhan Behdin, Ata Fatahi, Qingquan Song, Yun Dai, Aman Gupta, Zhipeng Wang#, et al.
EMNLP 2025 (# corresponding author)
Oral Presentation

EVTP-IVS
Wenhui Zhu*, Xiwen Chen*, Zhipeng Wang*#, Shao Tang, Sayan Ghosh, Xuanzhao Dong, Rajat Koner, Yalin Wang
WACV 2026 (* equal contribution, # corresponding author)

Liger Kernel
Pin-Lun Hsu, Yun Dai, Vignesh Kothapalli, Qingquan Song, Shao Tang, Siyu Zhu, Steven Shimizu, Shivam Sahni, Haowen Ning, Yanning Chen, Zhipeng Wang
ICML 2025 CODE Workshop

Local2Global
Rajat Koner, Zhipeng Wang, Srinivas Parthasarathy, Chinghang Chen
ICCV 2025

Reasoning-Aware Compression
Ryan Lucas, Kayhan Behdin, Zhipeng Wang#, Qingquan Song, Shao Tang, Rahul Mazumder
ICLR 2026 (# corresponding author)

LLaDA-MedV
Xuanzhao Dong*, Wenhui Zhu*, Xiwen Chen*, Zhipeng Wang*, Peijie Qiu, Shao Tang, Xin Li, Yalin Wang
CVPR 2026 (* equal contribution)

OTPrune
Xiwen Chen, Wenhui Zhu, Gen Li, Xuanzhao Dong, Yujian Xiong, Hao Wang, Peijie Qiu, Qingquan Song, Zhipeng Wang#, Shao Tang, Yalin Wang, Abolfazl Razi
CVPR 2026 (# corresponding author)

Planner-R1
Siyu Zhu, Yanbin Jiang, Hejian Sang, Shao Tang, Qingquan Song, Biao He, Rohit Jain, Zhipeng Wang, Alborz Geramifard
Preprint

Scaling Up SLM
Kayhan Behdin, Qingquan Song, Sriram Vasudevan, Jian Sheng, Xiaojing Ma, Z. Zhou, Chuanrui Zhu, Guoyao Li, Chanh Nguyen, Sayan Ghosh, Hejian Sang, Ata Fatahi Baarzi, Sundara Raman Ramachandran, Xiaoqing Wang, Qing Lan, Qi Guo, Caleb Johnson, Zhipeng Wang#, Fedor Borisyuk
MLSys 2026 (# corresponding author)

Open Source Contributions

I am a TSC Committer and code maintainer for the DeepSpeed project, one of the most popular open-source libraries for LLM training. I also help maintain the Liger Kernel project; feel free to raise issues and contribute PRs.