Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems

Published in The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

Oral presentation at EMNLP 2025 on compressing and deploying efficient LLMs for recommendation systems.

Recommended citation: Kayhan Behdin, Ata Fatahi, Qingquan Song, Yun Dai, Aman Gupta, Zhipeng Wang et al. (2025). "Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems." EMNLP 2025.