Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems
Published in the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Oral presentation at EMNLP 2025 on compressing and deploying efficient LLMs for recommendation systems.
Recommended citation: Kayhan Behdin, Ata Fatahi, Qingquan Song, Yun Dai, Aman Gupta, Zhipeng Wang et al. (2025). "Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems." EMNLP 2025.
