About Me

I am a PhD candidate in Computer Science at the University of California, Riverside. Prior to my PhD studies, I obtained an MS degree from Columbia University and a BS degree from Peking University.

News

  • Mar. 2025: Gave a talk at PPoPP 2025 in Las Vegas, NV.
  • Jun. 2023: Gave a talk at the International Conference on Supercomputing (ICS) 2023.
  • Apr. 2023: A paper was accepted at the International Conference on Supercomputing (ICS) 2023.

Education

  • Ph.D. in Computer Science (Sep. 2022 – Present)
    University of California, Riverside (UCR)
    Advisor: Prof. Zizhong Chen

  • M.S. in Electrical Engineering (Sep. 2020 – May 2022)
    Columbia University

  • B.S. in Computer Science (Sep. 2016 – Jul. 2020)
    B.S. in Economics (Double Major)
    Peking University


Research Experience

  1. USC ISI / Argonne National Laboratory (Jan. 2024 – Present)
    Los Angeles, CA / Lemont, IL
    Scientific Workflow Applications on Resilient Metasystem
    Mentors: Dr. Franck Cappello, Dr. Sheng Di, Dr. Krishnan Raghavan (ANL); Dr. Ewa Deelman (USC ISI)
    • Designed a Q-learning + GNN-based topology protocol (DGRO) that reduces network diameter by optimizing virtual rings over heterogeneous, failure-prone systems.
    • Implemented a single-hop gossip-based failure detector, resilient to network jitter and churn, enabling decentralized membership monitoring across 20+ globally distributed sites.
    • Deployed DGRO on the FABRIC testbed spanning Japan, Europe, Hawaii, and 15+ U.S. locations, demonstrating fast convergence and robustness at international scale.
  2. UCR / Lawrence Berkeley National Laboratory (Sep. 2022 – Present)
    Riverside, CA
    Data-driven Exascale Control of Optically Driven Excitations in Chemical and Material Systems
    Mentor: Dr. Zizhong Chen
    • Designed and implemented in-kernel ABFT GEMM using tensor cores, achieving higher performance than cuBLAS while ensuring fault detection and correction under soft errors.
    • Developed a fully GPU-resident ABFT FFT pipeline, outperforming cuFFT, and enabling error-resilient spectral analysis in scientific simulations.
    • Proposed the first ABFT-enabled K-means clustering framework on GPUs, exceeding cuML performance with integrated resilience support.
    • Innovated lightweight, low-overhead in-kernel fault tolerance mechanisms across linear algebra and ML workloads, demonstrating resilience-performance co-design in exascale systems.
  3. NVIDIA (Jun. 2024 – Sep. 2024)
    Santa Clara, CA
    Compiler Optimization for OpenMP Target Offload on Heterogeneous GPU Architectures
    Mentor: Dr. David Appelhans
    • Investigated performance bottlenecks of OpenMP target offload in SPEChpc 2021 on GH200/H200 GPUs.
    • Developed compiler/runtime optimizations achieving up to 10× speedup without source code changes.
    • Analyzed OpenMP vs. OpenACC performance and contributed optimized versions to SPEChpc 1.1.9.
    • Work adopted by RWTH Aachen University, demonstrating both research impact and practical relevance.
  4. Columbia University / AI4Finance Foundation (Aug. 2021 – Jul. 2022)
    New York, NY
    ElegantRL: Massively Parallel Deep Reinforcement Learning Library
    Mentors: Dr. Xiaoyang Liu, Dr. Xiaodong Wang
    • Developed multi-agent RL algorithms in ElegantRL, a popular RL library with ~4k GitHub stars.
    • Co-led ElegantRL_Solver, a high-performance solver that outperforms Gurobi for dense MaxCut problems.

Selected Publications

Full list available on Google Scholar.

PPoPP '25
Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Sheng Di, Franck Cappello, Zizhong Chen.
TurboFFT: Co-Designed High-Performance and Fault-Tolerant Fast Fourier Transform on GPUs.
PPoPP 2025: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2025.
SC '24
Jinyang Liu*, Jiannan Tian*, Shixun Wu*, Sheng Di, Boyuan Zhang, Robert Underwood, Yafan Huang, Jiajun Huang, Kai Zhao, Guanpeng Li, Dingwen Tao, Zizhong Chen, Franck Cappello.
cuSZ-I: High-Fidelity Error-Bounded Lossy Compression for Scientific Data on GPUs.
SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, 2024.
Cluster '24
Shixun Wu*, Yitong Ding*, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Huangliang Dai, Sheng Di, Bryan Wong, Zizhong Chen, Franck Cappello.
FT K-means: A High-Performance K-means on GPU with Fault Tolerance.
IEEE International Conference on Cluster Computing (CLUSTER), 2024.
HPDC '23
Shixun Wu*, Yujia Zhai*, Jiajun Huang, Zizhe Jian, Zizhong Chen.
FT-GEMM: A Fault Tolerant High Performance GEMM Implementation on x86 CPUs.
The 32nd ACM International Symposium on High-Performance Parallel and Distributed Computing, Orlando, FL, USA, June 21–23, 2023. DOI: 10.1145/3588195.3595947.
ICS '23
Shixun Wu*, Yujia Zhai*, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan Wong, Zizhong Chen.
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs.
The 37th ACM International Conference on Supercomputing, Orlando, FL, USA, June 21–23, 2023. DOI: 10.1145/3577193.3593715.