Discovery II Building
8888 University Dr.
Simon Fraser University
Burnaby, BC V5A 1S6
Email: weihua_liu [at] sfu.ca
I am a Master's student in Computer Engineering at Simon Fraser University, supervised by Dr. Zhenman Fang. I received my bachelor's degree in computer engineering from Marquette University (2013-2017). I currently work as a Research Assistant at the HiAccel lab. My research interests focus on FPGA-based hardware accelerator design and heterogeneous computing.
Apart from research, I also have a strong interest in sports, especially basketball and snowboarding. I hope that in the future I can record some clips to show you the beauty of these sports!
C1
Demystifying the Memory System of Modern Datacenter FPGAs for Software Programmers through Microbenchmarking FPGA '21
With the public availability of FPGAs from major cloud service providers like AWS, Alibaba, and Nimbix, hardware and software developers
can now easily access FPGA platforms. However, it is nontrivial to develop efficient FPGA accelerators, especially for software programmers
who use high-level synthesis (HLS).
The major goal of this paper is to figure out how to efficiently access the memory system of modern datacenter FPGAs in HLS-based accelerator designs.
This is especially important for memory-bound applications; for example, a naive accelerator design only utilizes less than 5% of the available off-chip
memory bandwidth. To achieve our goal, we first identify a comprehensive set of factors that affect the memory bandwidth, including 1) the number of
concurrent memory access ports, 2) the data width of each port, 3) the maximum burst access length for each port, and 4) the size of
consecutive data accesses. Then we carefully design a set of HLS-based microbenchmarks to quantitatively evaluate the performance of the Xilinx Alveo U200
and U280 FPGA memory systems as these factors vary, and provide insights into efficient memory access in HLS-based accelerator designs.
To demonstrate the usefulness of our insights, we also conduct two case studies to accelerate the widely used K-nearest neighbors (KNN) and sparse
matrix-vector multiplication (SpMV) algorithms. Compared to the baseline designs, optimized designs leveraging our insights achieve about 3.5x and
8.5x speedups for the KNN and SpMV accelerators.
@inproceedings{luChipKNN,
  title     = {Demystifying the Memory System of Modern Datacenter FPGAs for Software Programmers through Microbenchmarking},
  author    = {Alec Lu and Zhenman Fang and Weihua Liu and Lesley Shannon},
  year      = {2021},
  booktitle = {2021 International Symposium on Field-Programmable Gate Arrays},
  series    = {FPGA '21},
  location  = {Virtual Conference},
  numpages  = {11}
}
Please contact me via email: weihua_liu [at] sfu.ca