Hao Zhang

Assistant Professor

HDSI, CSE (affiliate)

Email: haozhang AT ucsd.edu

I am an Assistant Professor at the Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering (affiliate) at UC San Diego. I lead the Hao AI Lab at UCSD. I co-founded LMNet.ai (2023), and we joined forces with Snowflake in November 2023. From 2016 to 2021, I worked for the ML platform startup Petuum Inc. Here is a short Bio.

Prospective students and postdocs: I am recruiting new PhD students and postdocs. We also have openings for MS/undergrad research interns. Please check out this page to see how to get involved.

Research

I work at the intersection of machine learning and systems. I am equally interested in designing strong, efficient, and secure machine learning models and algorithms, and in building scalable, practical distributed systems that can support real-world machine learning workloads.

Our lab develops open models, algorithms, and systems to democratize access to large models. I also co-founded and run the non-profit LMSYS Org. We maintain the popular LLM evaluation platform Chatbot Arena and the widely adopted LLM serving framework vLLM. Some of our new research results are posted at lmsys.org (@lmsysorg).

Current Projects

LLM inference and serving systems: DistServe [Preprint'24], vLLM [SOSP'23], Lookahead Decoding [Preprint'23]
Efficient ML architectures and algorithms: Consistency LLM [Preprint'24], OSD [Preprint'23]
Open data, models, and evals: Chatbot Arena [Preprint'24], LMSYS-Chat-1M [ICLR'24], Vicuna, MT-bench [NeurIPS'23]
Model-parallel ML Systems: LightSeq [Preprint'23], Alpa [OSDI'22, MLSys'23]

Some of my research has been developed and maintained as open-source software:

Lookahead Decoding: A parallel LLM decoding method that trades FLOPs for fewer decoding steps.
FastChat: An open platform for training, serving, and evaluating Large Language Models.
vLLM: A high-throughput and memory-efficient inference engine for LLMs.
Vicuna: A series of popular open-source LLM chatbots available in 7B/13B/33B sizes.
Alpa: Training large-scale neural networks with auto parallelization. Scales to 1000+ GPUs.
Ray Collective: CPU/GPU collective communication primitives on Ray.
AutoDist: Automatic data-parallel training on TensorFlow.
DyNet: The Dynamic Neural Network Toolkit.
Poseidon: Parameter server on distributed GPUs.

Students and Postdocs

Current Members

Junda Chen, PhD (Rotation)
Jiangfei Duan, Visiting PhD
Yichao Fu, PhD
Lanxiang Hu, PhD
Will Lin, PhD (Rotation)
Anze Xie, Master
Longfei Yun, Master

Alumni

Yonghao Zhuang, Undergrad Intern (2021) -> PhD @ CMU
Hexu Zhao, Undergrad Intern (2022) -> PhD @ NYU
Dacheng Li, Master (2020) -> PhD @ UC Berkeley
Runyu Lu, Undergrad Intern (2023) -> PhD @ UMich

Recent Talks

04/2024 Talk at CMU LTI Colloquium
03/2024 Talk at NSF Open-Source Generative AI (OSGAI) Workshop
03/2024 Talk at Essence VC Q1 Virtual Conference: LLM Inference
02/2024 Talk at PKU Alumni Association of Northern California (PKUAANC)
12/2023 Panel at Instruction Workshop @ NeurIPS 2023
11/2023 Tutorial at ODSC West
10/2023 Talk at I-X Seminar Series at Imperial College London
08/2023 Talk at USC and FedML.ai
08/2023 Talk at SRG Seminar, Google
07/2023 Talk at Generative AI Summit, ODSC
06/2023 Talk at Chinese Googler Networks Talk Series, Google
06/2023 Talk at THU, PKU, SJTU, SYSU, FDU
05/2023 Talk at Apple
11/2022 Talk at ML Guild Seminar, Spotify
10/2022 Tutorial at Sky Camp, UC Berkeley
10/2022 Talk at 1st CASL Workshop, MBZUAI
08/2022 Talk at Ray Summit
07/2022 Tutorial at ICML 2022
07/2022 Tutorial at KDD 2021
01/2021 Tutorial at AAAI 2021

Experience

Assistant Professor, UC San Diego, 2023 - Present
Software Engineer, Snowflake, 2023 - Present
Postdoc, UC Berkeley, 2021 - 2023
Director of Scalable Machine Learning, Petuum Inc, 2016 - 2021
Ph.D. Student, Carnegie Mellon University, 2014 - 2020 (on leave 2016 - 2020)