Abstract—We present a domain-specific approach to generate
high-performance hardware-software partitioned implementations
of the discrete Fourier transform (DFT). The partitioning
strategy is a heuristic based on the DFT’s divide-and-conquer
algorithmic structure and is fine-tuned by feedback-driven
exploration of candidate designs. We have integrated this approach
in the Spiral linear-transform code-generation framework
to support push-button automatic implementation. We present
evaluations of hardware-software DFT implementations running
on the embedded PowerPC processor and the reconfigurable
fabric of the Xilinx Virtex-II Pro FPGA.
In our experiments, the FPGA-accelerated 1D and 2D DFT
libraries achieve between 2 and 7.5 times higher performance
(operations per second) and up to 2.5 times better energy
efficiency (operations per joule) than the software-only versions.
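
The divide-and-conquer structure referred to above can be illustrated with a minimal Python sketch of the radix-2 Cooley-Tukey recursion. This is an illustration only, not code produced by Spiral and not the partitioning heuristic itself. The function names and the `leaf_size` parameter are assumptions introduced here: `leaf_size` stands in for the point at which a sub-DFT would be computed as one unit, for example by handing it to a hardware core.

```python
import cmath

def dft_naive(x):
    """Direct O(n^2) evaluation of the DFT definition."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

def dft_recursive(x, leaf_size=2):
    """Radix-2 Cooley-Tukey: one size-n DFT becomes two size-n/2 DFTs.

    leaf_size is a hypothetical cut-off: sub-DFTs of at most this size are
    computed directly, as a single unit of work.
    """
    n = len(x)
    if n <= leaf_size or n % 2:
        return dft_naive(x)
    even = dft_recursive(x[0::2], leaf_size)  # DFT of even-indexed samples
    odd = dft_recursive(x[1::2], leaf_size)   # DFT of odd-indexed samples
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out
```

Changing `leaf_size` changes where the recursion stops but not the result, which is what makes the recursion tree a natural place to choose which sub-transforms run in software and which in hardware.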