publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2025

  1. EchoLM: Accelerating LLM Serving with Real-time Knowledge Distillation
    Yifan Yu*, Yu Gan*, Lillian Tsai, and 7 more authors
    2025

2024

  1. LoftQ: LoRA-fine-tuning-aware quantization for large language models
    Yixiao Li*, Yifan Yu*, Chen Liang, and 4 more authors
    The Twelfth International Conference on Learning Representations, 2024

2023

  1. LoSparse: Structured compression of large language models based on low-rank and sparse approximation
    Yixiao Li*, Yifan Yu*, Qingru Zhang, and 4 more authors
    In International Conference on Machine Learning, 2023