FPGA Acceleration of Large 3-D Stencils
Computing large 3-D stencils over large grids is a formidable challenge for any computing platform. In this work we achieved the maximum throughput attainable under the off-chip DRAM bandwidth constraint.
Published in Computational Sciences
It was a grueling task, but we implemented a fully asynchronous data processing pipeline with five near-perfectly balanced stages: tiling the grid on the host, streaming grid tiles to the FPGA, performing multiple (fused) iterations on the FPGA with the maximum parallelism that on-chip BRAM capacity allows, streaming results back to the host, and untiling and applying boundary conditions (using convolutional perfectly matched layers) on the host. Execution proceeds with complete overlap of the different pipeline stages.
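To make the tiling and fused-iteration idea concrete, here is a minimal host-side sketch in Python with NumPy. It is not the FPGA implementation described above. It assumes a radius-1, 7-point stencil with made-up coefficients `c0` and `c1`, and it uses periodic boundaries instead of perfectly matched layers so that the result can be checked against an untiled reference. The function names (`step_periodic`, `step_valid`, `fused_tiled`) are hypothetical. Each tile is cut out with a halo as wide as the number of fused iterations, so all iterations can run on the tile without any further off-tile data.

```python
import numpy as np

def step_periodic(u, c0=0.4, c1=0.1):
    """Reference: one 7-point sweep over the whole grid with wrap-around."""
    neighbors = sum(np.roll(u, s, axis=a) for a in range(3) for s in (-1, 1))
    return c0 * u + c1 * neighbors

def step_valid(b, c0=0.4, c1=0.1):
    """One sweep on a block; the output loses one cell on every face."""
    center = b[1:-1, 1:-1, 1:-1]
    neighbors = (b[:-2, 1:-1, 1:-1] + b[2:, 1:-1, 1:-1]
                 + b[1:-1, :-2, 1:-1] + b[1:-1, 2:, 1:-1]
                 + b[1:-1, 1:-1, :-2] + b[1:-1, 1:-1, 2:])
    return c0 * center + c1 * neighbors

def fused_tiled(u, tile, fused):
    """Apply `fused` sweeps tile by tile, loading each tile only once."""
    halo = fused  # stencil radius is 1, so the halo grows by 1 per iteration
    padded = np.pad(u, halo, mode="wrap")  # periodic halo (assumption)
    out = np.empty_like(u)
    nx, ny, nz = u.shape
    for i in range(0, nx, tile):
        for j in range(0, ny, tile):
            for k in range(0, nz, tile):
                # Tile plus halo: the data that would be streamed to the device.
                block = padded[i:i + tile + 2 * halo,
                               j:j + tile + 2 * halo,
                               k:k + tile + 2 * halo]
                # Fused iterations: the halo is consumed one layer per sweep.
                for _ in range(fused):
                    block = step_valid(block)
                # Untiling: write the interior back to its place in the grid.
                out[i:i + tile, j:j + tile, k:k + tile] = block
    return out

# Compare the tiled, fused result with three plain global sweeps.
rng = np.random.default_rng(0)
grid = rng.random((8, 8, 8))
reference = grid
for _ in range(3):
    reference = step_periodic(reference)
tiled = fused_tiled(grid, tile=4, fused=3)
print(np.allclose(reference, tiled))
```

The sketch also shows the trade-off that the on-chip memory limit imposes: more fused iterations mean fewer passes over off-chip memory, but each tile must carry a wider halo, so a larger share of the on-chip buffer holds redundant data.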