Designing with DSPs and FPGAs

<b><i>FPGAs Rise to Match Switched Fabric Performance</i></b>

By Brian Tithecott

This article discusses the possibilities FPGAs open for designers who want to support a wide range of signal processing applications and best capitalize on serial switched fabrics’ imminent maturation. Brian contrasts DSP and FPGA capabilities and argues that today’s FPGAs offer developers the means to rapidly construct custom architectures for specific signal processing needs.

A constant push to widen or eliminate performance bottlenecks characterizes embedded computing’s history. Processors, memory architectures, bus structures, interconnects, real-world I/O, software, peripherals, and network interfaces have all been targets of innovative attempts to speed up whole systems by speeding up their various parts. When throughput in one segment of an embedded system wholly outpaces throughput in the rest of the system, there is little to do but wait for the other parts to catch up. Slower I/O or memory underfeeds faster processors. Faster memories cannot take advantage of buses that insert wait states.

Today’s fast horse, the serial switched fabric interconnect, is waiting for signal processing engines that can take full advantage of the order-of-magnitude throughput improvements it brings to VMEbus and CompactPCI based systems. The signal processing engine that can best leverage serial switched fabrics’ imminent maturation is the Field Programmable Gate Array (FPGA). Unlike merchant DSP devices, FPGAs are well suited to data throughputs of many gigabytes/sec, exactly the domain that switched fabric interconnects were designed for. This is especially true for the most compute-intensive and highly parallel signal processing applications found in many defense, aerospace, and communications related embedded systems.

Compared to DSPs, FPGAs are more flexible, more highly integrated, lower in power, and competitively priced, especially on a price/performance basis. Even measured purely in multiply operations, FPGAs can often deliver a full order of magnitude more performance than DSPs.
This performance level matches the improvement that serial switched fabric interconnects have brought to embedded systems.

Advantage FPGA

General-purpose DSPs by definition perform relatively well over a wide range of applications rather than addressing special-purpose needs. That breadth is a compromise: in order to serve a wide variety of applications, DSPs do not optimally implement high-performance or special-purpose signal processing algorithms. Simply put, the serial instruction stream inherent to microprocessor architectures limits DSP performance. DSPs do well in applications with well-defined requirements that do not need the flexibility and scalability of the FPGA computing platform.

The ability to manipulate FPGA logic at the gate level allows designers to create a custom processor that efficiently implements exactly the function the application requires, performing all of the application’s subfunctions in parallel. Designers can also reconfigure FPGA devices to support a wide range of signal processing applications simply by downloading the appropriate configuration bitstream.

Programming FPGAs was once a laborious, handcrafted process, but tools such as the Tsunami FPGA platform from SBS Technologies now signal the arrival of product and market maturity for FPGAs. Today, FPGAs resemble personal silicon foundries that allow developers to quickly design custom architectures for the needs of any particular signal processing application.

Signal processing engines built from FPGAs can achieve true parallel processing, executing DSP algorithms on the inherent parallelism of the hardware and completely avoiding the instruction fetch and load/store bottlenecks of the traditional von Neumann microprocessor architectures found in most DSP chips. When sample rates go beyond a few MHz, DSPs that use shared resources, such as memory buses, become less and less able to transfer data without incurring frame losses.
FPGAs, on the other hand, dedicate logic to data I/O and scale far better into the higher throughput realms.

FPGAs let designers assign silicon resources in a way that fully capitalizes on the concurrency inherent in many DSP algorithms. For example, if an algorithm performs N multiplications and then sums the products, an FPGA can be configured to perform all N multiplications in parallel. In contrast, a DSP with a single multiply-accumulate unit must step through the multiplications serially. As N becomes large, the advantage of FPGAs becomes even more significant.

Of course, developers have traditionally used clusters of multiple DSP chips to achieve multiprocessor parallelism and address the highest-performance signal processing applications. However, using FPGAs greatly reduces power consumption in high-end embedded applications. DSP chip-based systems must drive heavily loaded buses that connect the DSPs to memory devices and to each other. The clock cycles used to fetch DSP instructions and operands from off-chip memory compound the problem by adding to the total power needed to execute any particular DSP algorithm. An FPGA’s ability to process multiple parallel data streams allows lower clock rates, a key factor in amortizing power consumption efficiently over the execution of an algorithm.

Finally, but significantly, FPGA designs are well placed to migrate directly to ASIC implementations suited to higher-volume product deployment, improving performance and reducing power consumption and cost even further. DSP chips simply do not fit into the migration-to-ASIC discussion in any way.

Application Benefits

FPGA flexibility facilitates pipelined data flow architectures in which data flows from one processing element to the next inside the FPGA. This approach reduces or eliminates signal loading and carries none of the DSP’s overhead for fetching instructions or data.
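As a rough illustration of the sum-of-N-products example above, the following Python sketch computes the same result two ways and counts idealized processing steps for each. The function names and the step-count model are assumptions made for illustration only: the serial count assumes one multiply-accumulate per step, and the parallel count assumes N multipliers feeding a pairwise adder tree. Neither is cycle-accurate for any real DSP or FPGA.

```python
from math import ceil, log2

def sum_of_products(samples, coeffs):
    """Reference result: multiply pairwise, then sum."""
    return sum(x * c for x, c in zip(samples, coeffs))

def adder_tree(products):
    """Sum values pairwise, level by level, the way a hardware adder
    tree would. Returns (total, number_of_levels)."""
    level = list(products)
    depth = 0
    while len(level) > 1:
        # Each level adds neighbouring pairs; an odd leftover passes through.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return (level[0] if level else 0), depth

def serial_steps(n):
    """Idealized single-MAC processor: one multiply-accumulate per step."""
    return n

def parallel_steps(n):
    """Idealized parallel datapath: one step for all N multiplies,
    then ceil(log2(N)) adder-tree levels."""
    return 1 + (ceil(log2(n)) if n > 1 else 0)

if __name__ == "__main__":
    samples = [1, 2, 3, 4, 5]
    coeffs = [2, 0, 1, 3, 1]
    products = [x * c for x, c in zip(samples, coeffs)]
    total, levels = adder_tree(products)
    print(total, levels, serial_steps(256), parallel_steps(256))
```

Under these assumptions, N = 256 takes 256 steps serially but only 9 stages in the parallel arrangement. If each stage is registered as a pipeline stage, a new result can leave the datapath on every clock once the pipeline is full, which is the data flow style described above.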
FPGA Finite Impulse Response (FIR) filter implementations, for example, can process hundreds of taps at multi-MHz rates, improving performance by at least an order of magnitude in wireless base station, test equipment (radar, sonar, software-defined radio), and image processing applications.

Other applications that can capitalize on serial switched fabric throughput, and where FPGA parallelism outshines DSPs, include compute-intensive algorithms such as image processing using the 2D DCT, preprocessing for medical image enhancement, centroid calculations in adaptive optics or robotics, waveform synthesis (e.g., numerically controlled oscillators), fast implementation of CORDIC (Coordinate Rotation Digital Computer) functions, digital downconversion, and very high-performance digital filters. FPGAs can also serve embedded applications that require data encryption and reformatting.

FPGAs are replacing DSPs in Synthetic Aperture Radar (SAR), Space Time Adaptive Processing (STAP), and other multiphased embedded radar applications where previous multiprocessor DSP designs required complex interconnections and substantial RAM. The FPGA-based deep-pipeline approach, on the other hand, uses simple systolic connections among and inside the FPGAs.

Engineers need software and hardware tools to access the power of FPGA computing. Hardware now exists with multiple FPGAs per board, and development kits include software tools that help engineers combine processing blocks for rapid application prototyping and faster time to market. As new FPGA devices and high-speed fabrics become available, the processing potential will continue to grow faster than that of DSP architectures.
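Of the algorithms listed above, CORDIC shows especially clearly why some functions map well onto FPGA logic: each iteration needs only shifts, additions, and one entry from a small lookup table, so the iterations can be unrolled into pipeline stages. The Python sketch below is a minimal rotation-mode CORDIC that computes sine and cosine in fixed point. The iteration count, the 16-bit fractional format, and the function name are assumptions chosen for illustration, not details taken from any particular FPGA core.

```python
import math

ITERATIONS = 16            # assumed: one iteration per pipeline stage
FRAC_BITS = 16             # assumed fixed-point format: 16 fractional bits
ONE = 1 << FRAC_BITS

# Lookup table of atan(2^-i) in fixed-point radians.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); starting
# from the inverse of the accumulated gain cancels that stretch.
_gain = 1.0
for _i in range(ITERATIONS):
    _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K_FIXED = round(ONE / _gain)

def cordic_sin_cos(angle_rad):
    """Return (sin, cos) for an angle in [-pi/2, pi/2] using only
    shifts, adds, and table lookups inside the loop."""
    x, y = K_FIXED, 0
    z = round(angle_rad * ONE)       # remaining angle to rotate through
    for i in range(ITERATIONS):
        if z >= 0:
            # Rotate counter-clockwise by atan(2^-i).
            x, y = x - (y >> i), y + (x >> i)
            z -= ATAN_TABLE[i]
        else:
            # Rotate clockwise by atan(2^-i).
            x, y = x + (y >> i), y - (x >> i)
            z += ATAN_TABLE[i]
    return y / ONE, x / ONE

if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print(s, c)   # approximately sin and cos of 30 degrees
```

The loop body contains no multiplier, which is the property that makes the algorithm inexpensive in gate-level logic. The same rotation structure is one common way to build the numerically controlled oscillators mentioned above.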
Conclusion

As the accelerated proliferation of serial switched fabric interconnect schemes solves the data throughput bottleneck (probably for a long time), embedded signal processing needs processing engines that are suitably matched. Classic DSP chips, even with their advancing clock rates, are simply too serial in nature to achieve the very high throughput needed to feed switched fabric architectures. FPGA devices, with their inherently scalable parallelism, are genetically better fitted to take signal processing performance to the next level, occupying the same performance space as serial switched fabrics. Now that FPGA development tools and platforms have matured to the point of enabling short development times for FPGA signal processing designs, and vendors supply the software tools and COTS hardware platforms needed, little reason remains not to take advantage of FPGAs for high-performance signal processing applications.

Brian J. Tithecott is the director of sales and marketing, FPGA Computing Products, for SBS Technologies. Brian has more than 15 years of experience in marketing and business development for technology-based companies. Prior to joining SBS Technologies, he held various positions where he managed product development and marketing programs. Brian has been in his current position for five years. He has an Electronics Engineering Technology diploma, a Bachelor of Science in Computer Science from the University of Western Ontario, and a Master of Business Administration from Wilfrid Laurier University.

For further information, contact Brian at:

SBS Technologies Canada Inc.
101 Randall Drive
Waterloo, Ontario
Canada N2V 1C5
Tel: 519-880-8228 ext. 206

Reprinted from CompactPCI Systems / March 2004
Copyright 2004