About me
I am a 5th-year Ph.D. candidate in the Department of Electrical and Computer Engineering at Cornell University, working in the Computer Systems Lab and advised by Prof. Zhiru Zhang.
My research focuses on high-performance computing and performance optimization across heterogeneous devices such as GPUs and NPUs, targeting both AI and scientific applications. I am also actively exploring agentic workflows for automating performance engineering and system optimization.
Education
Cornell University · Sep. 2021 – Present
Ph.D. candidate in Electrical and Computer Engineering
Cornell University · Sep. 2021 – Dec. 2025
M.S. in Electrical and Computer Engineering
Tsinghua University · Sep. 2016 – Jun. 2020
B.E. in Electronic Engineering
Academic Research
Cornell University · Sep. 2021 – Present
Ph.D. candidate, Zhang Research Group, Computer Systems Lab
Advisor: Prof. Zhiru Zhang
Agentic kernel generation and optimization on heterogeneous devices — exploring how LLM agents can autonomously generate and tune high-performance kernels across modern accelerators (GPUs, NPUs).
Rapid GPU-Based Pangenome Graph Layout — proposed the first GPU-based solution for pangenome graph layout, achieving an average 57.3× speedup over the state-of-the-art CPU implementation and enabling minute-scale layout of the entire human chromosome dataset; integrated into the pangenome analysis pipeline ODGI.
Analysis and Optimization of GNN-Based Recommender Systems on Persistent Memory — characterized and optimized GNN-based recommender workloads on persistent-memory hardware.
UCLA · Jun. 2019 – Sep. 2019
Research Intern, VAST Lab
Advisor: Prof. Jason Cong
HeteroHalide: An End-to-End Compilation System from Image Processing DSL to Efficient FPGA Acceleration
- Proposed HeteroHalide, an end-to-end compilation system from Halide to FPGA accelerators, with a Halide-to-HeteroCL code generator and scheduling extensions that emit lower-level primitives at the spatial-architecture backend.
- Demonstrated that the generated FPGA accelerators outperformed both multi-core CPU baselines and the state-of-the-art Halide-to-FPGA compiler, while significantly reducing migration effort from Halide.
Tsinghua University · Nov. 2018 – Jun. 2019
Research Assistant, NICS-EFC Lab (Energy Efficient Computing Group)
Advisor: Prof. Yu Wang
Hardware-Friendly Neural Network Training Algorithm Optimization
- Quantified the impact of low-bit-width quantization and neural network pruning on models trained on GPUs.
- Applied model distillation for model compression and studied how varying network sizes affect accuracy.
Industry Experience
AMD Research and Advanced Development · Jan. 2026 – May 2026
Ph.D. Research Associate, AMD RAD
Mentors: Erwei Wang, Samuel Bayliss
Agent Skill System for End-to-End LLM Deployment on Spatial NPUs
- Mapped and optimized an LLM end-to-end on the AMD XDNA™ NPU, outperforming the existing open-source baseline.
- Built an agent skill system for automating end-to-end LLM deployment on the AMD XDNA™ NPU.
ByteDance · Aug. 2024 – Dec. 2024
Research Scientist Intern, Machine Learning System team, Seed Foundation
Mentors: Wenlei Bao, Li-Wen Chang
Benchmarking Optimized LLM Kernels
- Benchmarked LLM kernels across cuBLAS, Triton, and CUTLASS over different problem sizes, data types, and GPU architectures.
- Gained insights into low-level optimizations, particularly warp specialization with TMA on Hopper GPUs.
NVIDIA · May 2024 – Aug. 2024
Deep Learning Training Performance Intern, End-to-End Training Performance team (working on MLPerf-Training)
Mentors: Rachit Garg, Burc Eryilmaz
LLM Training Toolbox: Memory Footprint Analyzer & Config-Shmooer
- Built a memory footprint analyzer for peak-memory debugging and leak detection in LLM training, integrated into NeMo and the MLPerf training pipeline.
- Built an autotuner that searches the large training config space (TP/PP/CP sizes, TP overlapping configs, etc.) for the best setup given a model and hardware.
Alibaba DAMO Academy · Jul. 2020 – Jan. 2021
Research Intern, Computing Technology Lab
Mentor: Yuanwei Fang
Micro-architecture Aware Neural Program Embedding
- Proposed Neural Program Sampling (NPS), a novel framework that provides high-resolution execution embeddings for accurate program sampling.
- Built the NPS-gem5 evaluation testbed by enhancing gem5 to report detailed-simulation statistics at specific instruction intervals, enabling fast and flexible simulation.
Publications
HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization
Hongzheng Chen*, Yingheng Wang*, Yaohui Cai*, Hins Hu*, Jiajie Li*, Shirley Huang, Chenhui Deng, Rongjian Liang, Shufeng Kong, Haoxing Ren, Samitha Samaranayake, Carla P. Gomes, Zhiru Zhang. (*core contributors)
[ICLR’26]. The International Conference on Learning Representations, 2026.
Dato: A Task-Based Programming Model for Dataflow Accelerators
Shihan Fang, Hongzheng Chen, Niansong Zhang, Jiajie Li, Han Meng, Adrian Liu, Zhiru Zhang.
arXiv:2509.06794, 2025.
Rapid GPU-Based Pangenome Graph Layout
Jiajie Li, Jan-Niklas Schmelzle, Yixiao Du, Simon Heumos, Andrea Guarracino, Giulia Guidi, Pjotr Prins, Erik Garrison, Zhiru Zhang.
[SC’24]. The International Conference for High Performance Computing, Networking, Storage, and Analysis, 2024.
Pangenome graph layout by Path-Guided Stochastic Gradient Descent
Simon Heumos, Andrea Guarracino, Jan-Niklas M Schmelzle, Jiajie Li, Zhiru Zhang, Jörg Hagmann, Sven Nahnsen, Pjotr Prins, Erik Garrison.
[Bioinformatics]. Volume 40, Issue 7, July 2024.
NPS: A Framework for Accurate Program Sampling Using Graph Neural Network
Yuanwei Fang, Zihao Liu, Yanheng Lu, Jiawei Liu, Jiajie Li, Yi Jin, Jian Chen, Yenkuang Chen, Hongzhong Zheng, Yuan Xie.
arXiv:2304.08880, 2023.
Analysis and Optimization of GNN-Based Recommender Systems on Persistent Memory
Yuwei Hu, Jiajie Li, Zhongming Yu, Zhiru Zhang.
arXiv:2207.11918, 2022.
HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration
Jiajie Li, Yuze Chi, Jason Cong.
[FPGA’20]. 28th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020.
Teaching
Cornell University · Aug. 2023 – Dec. 2023
Teaching Assistant, ECE 2300 Digital Logic and Computer Organization
Services
Student Volunteer, FCCM’22
Last updated: May 2026
