publications

publications by category in reverse chronological order.

2025

IC-Cache: Efficient Large Language Model Serving via In-context Caching

Yifan Yu*, Yu Gan*, Lillian Tsai, and 7 more authors

Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

31st SOSP

Acceptance rate: 17 percent

2024

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

Yixiao Li*, Yifan Yu*, Chen Liang, and 4 more authors

The Twelfth International Conference on Learning Representations, 2024

Oral Presentation

About 1 percent of submissions are selected for oral presentation.

2023

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

Yixiao Li*, Yifan Yu*, Qingru Zhang, and 4 more authors

International Conference on Machine Learning, 2023