> About
Hi there! I'm Xianwei, an Associate Professor (2020 - ) in the School of Computer Science & Engineering and
the National Supercomputer Center in Guangzhou (NSCC-GZ) at Sun Yat-sen University,
researching computer systems and architecture for high-performance and intelligent computing.
From 2017 to 2020, I worked at AMD Inc. (Research & RTG) on architectural and software designs of compute-optimized GPUs.
Before that, I completed my Ph.D. (2017) in the Computer Science Department at the University of Pittsburgh, and
obtained my Bachelor's degree (2011) in Software Engineering from Northwestern Polytechnical University.
More info can be found on LinkedIn.
06/2025: [Paper] Three papers were accepted to SC'2025. Congrats to all!
06/2025: [People] Nine ARCSYSUers graduated! Cheers and all the best for the journey ahead!
04/2025: [Paper] Two papers were accepted to Euro-Par'2025. Congrats to Mengyue and Tianyu!
04/2025: [Paper] Three LLM papers were made public on arXiv. See Bullet, gLLM and vecTrans!
03/2025: [Move] I’ll be on sabbatical-like leave in Beijing throughout 2025.
03/2025: [Talk] One invited talk was given at APAN'59!
02/2025: [Teach] YatCC-AI with Web and DeepSeek was released. Try now!
02/2025: [Paper] Two papers were accepted to DAC'2025. Congrats to Xuanteng and Kan!
01/2025: [Paper] One paper was accepted to WWW'2025. Congrats to Yuhao!
09/2024: [Talk] One invited talk was given at Huawei Connect 2024!
09/2024: [People] Six new ARCSYSUers joined. Welcome aboard!
08/2024: [Fund] One NSFC research grant was awarded!
07/2024: [Fund] Two CCF research funds were awarded!
07/2024: [Teach] One CCEC teaching prize was received. Check out Yat Compiler!
> Research
Topics: GPU, Compiling, HPC, Intelligent Computing, Memory System, Simulation/Modeling/Profiling
Grants: National Key R&D Program, NSFC Program, CCF-Tencent®/Huawei®/Phytium® Funds
My research interests lie broadly in hardware and software co-design to enhance the performance, efficiency, scalability and usability of computing systems.
A particular emphasis is on GPU/heterogeneous computing and memory system design through architecture, compiler, runtime and container techniques, focusing on critical aspects such as latency, bandwidth and portability.
I currently lead the arcSYSu (ARChitecture and SYStem Upscaling @ SYSU) research team,
which is proudly part of the NSCC-GZ Interdisciplinary Research Center, directed by Prof. Yutong Lu and Prof. Nong Xiao.
At arcSYSu, I am fortunate to work with a talented team of graduate and undergraduate researchers/interns on cutting-edge research in computing systems.
[⭐️ Hiring!] Welcome to join us! [See FAQs for details]
[people @ arcSYSu, refining computing system uses] (#: co-advise)
2025 | Xianjie Chen[phd] | Mingen Liang[phd] | Yunhao Han | Junru Chen | Xin Huang
2024 | Hongxin Xu | Tengyang Zheng# | Gaojin Sun | Lu Wu | Jingyi He | Bingjie Liu
2023 | Han Huang[phd]# | Zhongchun Zheng[phd]# | Mengyue Xi | Wenyuan Liang | Hengzhong Liang | Wenxuan Pan | Aoyuan Sun
2022/21 | Xuanteng Huang[phd]# | Zejia Lin[phd]# | Tianyu Guo[phd]# | Yuhao Gu[phd]# | Kan Wu[phd]#
Ug/RA | Zheng Zhou | Haoquan Chen | Yipeng Ouyang
Alum. | Tianao Ge (ms22, phd@Hkust-gz) | Zewei Mo (ms22, Intel->phd@upitt) | Yue Weng (ms23, Nvidia)# | Yinchuan Guo (ms24, Huawei) | Lianghong Huang (ms24, MetaX) | Tianyi Zhang (ms25, ByteDance) | Zhaowen Shan (ms25, Chengdu Gov) | Chun-yu Chen (ms25, JD.com) | Guanyi Chen (bs24, phd@Hkust-gz) | Yibin Luo (bs25, phd@Tsinghua)
> Publications
[ see full publication list ]
§ [SC'25, CCF-A]. Y. Gu, H. Chen, X. Chen, J. Du, Z. Chen, N. Xiao, X. Zhang and Y. Lu, coMtainer: Compilation-assisted HPC Container Images with Enhanced Adaptability
§ [SC'25, CCF-A]. H. Han, J. Xie, N. Feng, X. Zhang, D. Huang, Z. Chen and Y. Lu, HStencil: Matrix-Vector Stencil Computation with Interleaved Outer Product and MLA
§ [SC'25, CCF-A]. T. Guo, X. Zhang, J. Du, Z. Chen, N. Xiao and Y. Lu, gLLM: Global Balanced Pipeline Parallelism Systems for Distributed LLMs Serving with Token Throttling
§ [DAC'25, CCF-A]. X. Huang, J. Du, N. Xiao and X. Zhang, PaSK: Cold Start Mitigation for Inference with Proactive and Selective Kernel Loading on GPUs
§ [DAC'25, CCF-A]. K. Wu, Z. Lin, M. Xi, Z. Zheng, W. Pan, X. Zhang and Y. Lu, GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving
§ [WWW'25, CCF-A]. Y. Gu, C. Chen, J. Du, X. Zhang and X. Zhang, ORFA: Exploring WebAssembly as a Turing Complete Query Language for Web APIs
§ [DAC'24, CCF-A]. T. Guo, X. Huang, K. Wu, X. Zhang and N. Xiao, SMILE: LLC-based Shared Memory Expansion to Improve GPU Thread Level Parallelism
§ [LCTES'24]. Z. Lin, A. Sun, X. Zhang and Y. Lu, MixPert: Optimizing Mixed-precision Floating-point Emulation on GPU Integer Tensor Cores
§ [ICCD'23]. Z. Lin, Z. Mo, X. Huang, X. Zhang and Y. Lu, KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications
§ [LCTES'22]. T. Ge, Z. Mo, K. Wu, X. Zhang and Y. Lu, RollBin: Reducing Code-size via Loop Rerolling at Binary Level
> Teaching
Yat Compilation Course
- Undergraduate
§ DCS290/292 - Compilation Principle & Construction, [25s, 24s, 23s, 22s, 21s] (Yat-Compiler).
§ DCS3013 - Computer Architecture, [22f].
- Graduate
§ DCS5637/6207 - Advanced Computer Architecture, [24f, 23f, 22f, 21f].
> Miscellaneous
- Honors/Awards
§ [2024] CCF-CCEC Teaching Prize
§ [2022] CAST Sci&Tech Young Talent Program
§ [2019] AMD® Spotlight Award
§ [2016] Andrew Mellon Fellowship
§ [2013] Best Paper Award of ISLPED
- Services
§ [TPC] CCGrid'2025, IJCNN'2025, NAS'2024, NPC'2025/2024, HiPC'2025/2024/2023/2022, ICPADS'2022
§ [ERC] MICRO (IEEE/ACM Int'l Sym. on Microarchitecture) - 2020
§ [TPC] ICCD (IEEE Int'l Conf. on Computer Design) - 2020, 2019, 2018