I am an Assistant Professor at the Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering (affiliate) at UC San Diego. I lead the Hao AI Lab at UCSD. I cofounded LMNet.ai in 2023, and we joined forces with Snowflake in November 2023. From 2016 to 2021, I worked at the ML platform startup Petuum Inc. Here is a short bio.
Prospective students and postdocs: I am recruiting new PhD students and postdocs. We also have openings for MS/undergrad research interns. Please check out this page to see how to get involved.
Research
I work at the intersection of machine learning and systems. I am equally interested in designing strong, efficient, and secure machine learning models and algorithms, and in building scalable, practical distributed systems that can support real-world machine learning workloads.
Our lab (@haoailab) develops open models, algorithms, and systems to democratize access to large models. I also co-founded and run the non-profit LMSYS Org (@lmsysorg), which maintains the popular LLM evaluation platform Chatbot Arena and the widely adopted LLM serving framework vLLM.
Current Projects
- LLM inference and serving systems: LLM-LTR [NeurIPS'24], DistServe [OSDI'24], MuxServe [ICML'24], vLLM [SOSP'23]
- Efficient ML architectures and algorithms: Consistency LLM [ICML'24], OSD [ICML'24], Lookahead Decoding [ICML'24]
- Open data, models, and evals: Chatbot Arena [ICML'24], LMSYS-Chat-1M [ICLR'24], Vicuna, MT-bench [NeurIPS'23]
- Model-parallel ML Systems: LightSeq [COLM'24], Alpa [OSDI'22, MLSys'23]
Some of my research has been developed and maintained as open-source software:
- Lookahead Decoding: A parallel LLM decoding method that trades FLOPs for fewer decoding steps.
- FastChat: An open platform for training, serving, and evaluating Large Language Models.
- vLLM: A high-throughput and memory-efficient inference engine for LLMs.
- Vicuna: A series of popular open-source LLM chatbots available in 7B/13B/33B sizes.
- Alpa: Training large-scale neural networks with auto parallelization. Scales to 1000+ GPUs.
- Ray Collective: CPU/GPU collective communication primitives on Ray.
- AutoDist: Automatic data-parallel training on TensorFlow.
- DyNet: The Dynamic Neural Network Toolkit.
- Poseidon: Parameter server on distributed GPUs.
Students and Postdocs
Current Members
- Junda Chen, PhD (Rotation)
- Zhongdongming Dai, Undergrad Intern
- Hangliang Ding, Undergrad Intern
- Jiangfei Duan, Visiting PhD
- Yichao Fu, PhD
- Lanxiang Hu, PhD
- Atharva Kshirsagar, Master
- Will Lin, PhD (Rotation)
- Runlong Su, Master
- Anze Xie, Master
- Haoyang Yu, Undergrad Intern
- Longfei Yun, Master
- Peiyuan Zhang, PhD
- Yuxuan Zhang, Undergrad Intern
- Siqi Zhu, Undergrad Intern
Alumni
- Yonghao Zhuang, Undergrad Intern (2021) -> PhD @ CMU
- Hexu Zhao, Undergrad Intern (2022) -> PhD @ NYU
- Dacheng Li, Master (2020) -> PhD @ UC Berkeley
- Runyu Lu, Undergrad Intern (2023) -> PhD @ UMich
Recent Talks
- 01/2025: Talk at Faster LLM Inference Seminar @ Weizmann Institute of Science
- 10/2024: Talk at PyTorch Webinar
- 09/2024: Talk at Microsoft GenAI AIMS Talk
- 04/2024: Talk at UChicago AI+System Seminar
- 03/2024: Talk at NSF Open-Source Generative AI (OSGAI) Workshop
- 03/2024: Talk at Essence VC Q1 Virtual Conference: LLM Inference
- 02/2024: Talk at PKU Alumni Association of Northern California (PKUAANC)
- 12/2023: Panel at Instruction Workshop @ NeurIPS 2023
- 11/2023: Tutorial at ODSC West
- 10/2023: Talk at I-X Seminar Series at Imperial College London
- 08/2023: Talk at USC and FedML.ai
- 08/2023: Talk at SRG Seminar, Google
- 07/2023: Talk at Generative AI Summit, ODSC
- 06/2023: Talk at Chinese Googler Networks Talk Series, Google
- 06/2023: Talk at THU, PKU, SJTU, SYSU, FDU
- 05/2023: Talk at Apple
- 11/2022: Talk at ML Guild Seminar, Spotify
- 10/2022: Tutorial at Sky Camp, UC Berkeley
- 10/2022: Talk at 1st CASL Workshop, MBZUAI
- 08/2022: Talk at Ray Summit
- 07/2022: Tutorial at ICML 2022
- 07/2022: Tutorial at KDD 2021
- 01/2021: Tutorial at AAAI 2021
Experience
- Assistant Professor, UC San Diego, 2023 - Present
- Software Engineer, Snowflake, 2023 - Present
- Postdoc, UC Berkeley, 2021 - 2023
- Director of Scalable Machine Learning, Petuum Inc, 2016 - 2021
- Ph.D. Student, Carnegie Mellon University, 2014 - 2020 (on leave 2016 - 2020)