> About
Hi there! I'm Xianwei, now an Associate Professor (2020 - ) in the School of Computer Science & Engineering
at Sun Yat-sen University,
where I conduct research on computer systems and architecture towards high-performance and intelligent computing.
During 2017-2020, I worked at AMD Inc. (Research & RTG)
on architectural and software designs of compute-optimized GPUs.
Previously, I completed my Ph.D. (2017) in the Computer Science Department
at the University of Pittsburgh, and
obtained my Bachelor's degree (2011) in Software Engineering from Northwestern Polytechnical University.
More info can be found on LinkedIn.
11/2025: [Paper] Two papers accepted to AAAI'2026 and DATE'2026, respectively. Congrats to Xuanteng!
10/2025: [Paper] One paper accepted to ACM TACO journal. Congrats to Wenxuan!
09/2025: [Paper] One paper accepted to NeurIPS'2025. Congrats to Hongxin and Tianyu!
09/2025: [People] Five new ARCSYSUers joined the lab. Welcome aboard!
08/2025: [Talk] Two invited talks delivered at HPC China'2025!
07/2025: [Teach] YatCC-AI won the Grand Prize of CCEC. Kudos!
06/2025: [Paper] Three papers accepted to SC'2025. Congrats to Yuhao, Tianyu and Han!
06/2025: [People] Nine ARCSYSUers graduated! Cheers and all the best!
04/2025: [Paper] Two papers accepted to Euro-Par'2025. Congrats to Mengyue and Tianyu!
04/2025: [Paper] Three LLM papers released on arXiv. See Bullet, gLLM and vecTrans!
03/2025: [Move] I’ll be on sabbatical-like leave in Beijing throughout 2025.
03/2025: [Talk] One invited talk delivered at APAN'59!
02/2025: [Teach] YatCC-AI with Web and DeepSeek released. Try now!
02/2025: [Paper] Two papers accepted to DAC'2025. Congrats to Xuanteng and Kan!
01/2025: [Paper] One paper accepted to WWW'2025. Congrats to Yuhao!
09/2024: [People] Six new ARCSYSUers joined the lab. Welcome aboard!
08/2024: [Fund] One NSFC research grant awarded!
07/2024: [Fund] Two CCF research funds awarded!
07/2024: [Teach] One CCEC teaching prize received. Check out Yat Compiler!
> Research
Topics: GPU, Compiling, High-performance Computing, Intelligent Computing, Computer System
Grants: National Key R&D Program, NSFC Program, Tencent®/Huawei®/Phytium® Funds
My research interests lie broadly in hardware-software co-design to enhance the performance, efficiency, scalability and usability of computing systems.
A particular emphasis is on GPU/heterogeneous computing system design,
covering architecture, compilers, runtime, and containers, with a focus on critical aspects such as latency, bandwidth, and portability.
I currently lead the arcSYSu (ARChitecture and SYStem Upscaling @ SYSU) research team,
which is proudly part of the Interdisciplinary Research Center (xRC), directed by
Prof. Yutong Lu
and Prof. Nong Xiao.
At arcSYSu, I am fortunate to work with a talented team of graduate and undergraduate researchers/interns on cutting-edge research in computing systems.
[⭐️ Hiring!] Welcome to join us! [visit the Lab Page; see the FAQs for details]
[people @ arcSYSu, refining computing system uses]
> Publications
[ see full publication list ]
§ [AAAI'26, CCF-A]. X. Huang, F. Li, R. Hu, J. Zhang, Y. Peng, Y. Zhou, F. Chen and X. Zhang,
FusedRec: Fused Embedding Communication for Distributed Recommendation Training on GPUs
§ [TACO, CCF-A]. W. Pan, Z. Lin, J. Du and X. Zhang,
HuntKTm: Hybrid Scheduling and Automatic Management for Efficient Kernel Execution on Modern GPUs
§ [NeurIPS'25, CCF-A]. H. Xu, T. Guo and X. Zhang,
DynaPipe: Dynamic Layer Redistribution for Efficient Serving of LLMs with Pipeline Parallelism
§ [SC'25, CCF-A]. Y. Gu, H. Chen, X. Chen, J. Du, Z. Chen, N. Xiao, X. Zhang and Y. Lu,
coMtainer: Compilation-assisted HPC Container Images with Enhanced Adaptability
§ [SC'25, CCF-A]. T. Guo, X. Zhang, J. Du, Z. Chen, N. Xiao and Y. Lu,
gLLM: Global Balanced Pipeline Parallelism Systems for Distributed LLMs Serving with Token Throttling
§ [SC'25, CCF-A]. H. Han, J. Xie, N. Feng, X. Zhang, D. Huang, Z. Chen and Y. Lu,
HStencil: Matrix-Vector Stencil Computation with Interleaved Outer Product and MLA
§ [DAC'25, CCF-A]. X. Huang, J. Du, N. Xiao and X. Zhang, PaSK: Cold Start Mitigation for Inference with Proactive and Selective Kernel Loading on GPUs
§ [DAC'25, CCF-A]. K. Wu, Z. Lin, M. Xi, Z. Zheng, W. Pan, X. Zhang and Y. Lu, GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving
§ [WWW'25, CCF-A]. Y. Gu, C. Chen, J. Du, X. Zhang and X. Zhang, ORFA: Exploring WebAssembly as a Turing Complete Query Language for Web APIs
§ [DAC'24, CCF-A]. T. Guo, X. Huang, K. Wu, X. Zhang and N. Xiao, SMILE: LLC-based Shared Memory Expansion to Improve GPU Thread Level Parallelism
§ [LCTES'24, CCF-B]. Z. Lin, A. Sun, X. Zhang and Y. Lu, MixPert: Optimizing Mixed-precision Floating-point Emulation on GPU Integer Tensor Cores
§ [ICCD'23, CCF-B]. Z. Lin, Z. Mo, X. Huang, X. Zhang and Y. Lu, KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications
§ [LCTES'22, CCF-B]. T. Ge, Z. Mo, K. Wu, X. Zhang and Y. Lu, RollBin: Reducing Code-size via Loop Rerolling at Binary Level
> Teaching
Yat Compilation Course
- Undergraduate
§ DCS290/292 - Compilation Principles & Construction, [25s, 24s, 23s, 22s, 21s] (Yat-Compiler).
§ DCS3013 - Computer Architecture, [22f].
- Graduate
§ DCS5637/6207 - Advanced Computer Architecture, [24f, 23f, 22f, 21f].
> Miscellaneous
- Honors/Awards
§ [2024] CCF-CCEC Teaching Prize
§ [2022] CAST Sci&Tech Young Talent Program
§ [2019] AMD® Spotlight Award
§ [2016] Andrew Mellon Fellowship
§ [2013] Best Paper Award of ISLPED
- Services
§ [TPC] CCGrid'2025, IJCNN'2025, NAS'2024, NPC'2025/2024, HiPC'2025/2024/2023/2022, ICPADS'2025/2022
§ [ERC] MICRO (IEEE/ACM Int'l Sym. on Microarchitecture) - 2020
§ [TPC] ICCD (IEEE Int’l Conf. on Computer Design) - 2020, 2019, 2018