Posts by Collection

portfolio

publications

Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems

Published in The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

Oral presentation on compressing and deploying efficient LLMs for recommendation systems.

Recommended citation: Kayhan Behdin, Ata Fatahi, Qingquan Song, Yun Dai, Aman Gupta, Zhipeng Wang et al. (2025). "Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems." EMNLP 2025.

Liger-Kernel: Efficient Triton Kernels for LLM Training

Published in Championing Open-source DEvelopment in ML Workshop @ ICML 2025, 2025

Efficient Triton kernels for LLM training.

Recommended citation: Pin-Lun Hsu, Yun Dai, Vignesh Kothapalli, Qingquan Song, Shao Tang, Siyu Zhu, Steven Shimizu, Shivam Sahni, Haowen Ning, Yanning Chen, Zhipeng Wang. (2025). "Liger-Kernel: Efficient Triton Kernels for LLM Training." ICML 2025 Workshop.

Local2Global query Alignment for Video Instance Segmentation

Published in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2025), 2025

Local to global query alignment for video instance segmentation.

Recommended citation: Rajat Koner, Zhipeng Wang, Srinivas Parthasarathy, Chinghang Chen. (2025). "Local2Global query Alignment for Video Instance Segmentation." ICCV 2025.

EVTP-IVS: Effective Visual Token Pruning For Unifying Instruction Visual Segmentation In Multi-Modal Large Language Models

Published in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026), 2026

Effective visual token pruning for unifying instruction visual segmentation in MLLMs.

Recommended citation: Wenhui Zhu*, Xiwen Chen*, Zhipeng Wang*#, Shao Tang, Sayan Ghosh, Xuanzhao Dong, Rajat Koner, Yalin Wang. (2026). "EVTP-IVS: Effective Visual Token Pruning For Unifying Instruction Visual Segmentation In Multi-Modal Large Language Models." WACV 2026.

Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction

Published in The Fourteenth International Conference on Learning Representations (ICLR 2026), 2026

Reasoning-aware compression that jointly reconstructs activations from input and on-policy chain-of-thought traces.

Recommended citation: Ryan Lucas, Kayhan Behdin, Zhipeng Wang, Qingquan Song, Shao Tang, Rahul Mazumder. (2026). "Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction." ICLR 2026.

OTPrune: Distribution-Aligned Visual Token Pruning via Optimal Transport

Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026), 2026

Training-free visual token pruning using optimal transport for distribution alignment.

Recommended citation: Xiwen Chen, Wenhui Zhu, Gen Li, Xuanzhao Dong, Yujian Xiong, Hao Wang, Peijie Qiu, Qingquan Song, Zhipeng Wang, Shao Tang, Yalin Wang, Abolfazl Razi. (2026). "OTPrune: Distribution-Aligned Visual Token Pruning via Optimal Transport." CVPR 2026.

talks

teaching
