publications

Publications by category in reverse chronological order. Generated by jekyll-scholar.

2025

  1. IC-Cache: Efficient Large Language Model Serving via In-context Caching
    Yifan Yu*, Yu Gan*, Nikhil Sarda, and 7 more authors
    In Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, Lotte Hotel World, Seoul, Republic of Korea, 2025

2024

  1. LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
    Yixiao Li*, Yifan Yu*, Chen Liang, and 4 more authors
    In The Twelfth International Conference on Learning Representations, 2024

2023

  1. LoSparse: Structured Compression of Large Language Models Based on Low-Rank and Sparse Approximation
    Yixiao Li*, Yifan Yu*, Qingru Zhang, and 4 more authors
    In International Conference on Machine Learning, 2023