Instructions for Authors of Papers Submitted for Publication
FPGAs in 2005 and Beyond

P. Alfke
Xilinx, Inc., 2100 Logic Drive, San Jose, CA 95124, USA
[email protected]

Abstract
This paper takes a broad look at field-programmable logic, first from a business perspective and then in technical detail. FPGA progress is spearheaded by two intensely competitive manufacturers, Xilinx and Altera. Their technical directions are similar, which leaves some room for the minor competitors to exploit protected niches. All FPGAs take advantage of the rapid technical evolution commonly described as Moore’s Law, and they offer an attractive alternative to the exploding cost of ASIC designs. Power consumption has emerged as a major issue, as chips migrate to smaller geometries, implement more logic, and run at ever-higher clock rates. As an example of cutting-edge FPGA technology, this paper describes several technical improvements in the Virtex-4 family from Xilinx. The conclusion offers general recommendations for designing with FPGAs and a glimpse at evolutionary developments in the near future.

I. THE MARKET FOR PROGRAMMABLE LOGIC
In 2005, Xilinx is the market leader with a 51% share, Altera has 33%, and the “others” (Lattice, Actel, QuickLogic, Atmel) divide the remaining 16%. Seven years ago, Xilinx and Altera each had 31%, while the “others” held 39%. During these seven years of market swings, Altera stayed at about 32%, but Xilinx grew to 51%, entirely at the expense of the “others”, who lost more than half of their market share. This is typical for a maturing market that is increasingly dominated by its two top players. All the former participants without a dedicated programmable-logic focus have given up (Intel, T.I., Motorola, NSC, AMD, Cypress, Philips), and most of the recent newcomers have been unsuccessful. Xilinx and Altera are defining the direction of the FPGA market.
The popularity of FPGAs is partly due to the rapidly rising cost of each completed IC design (including ASICs), which grew from less than $1M in older process generations to more than $10M at 130 nm, and now to more than $20M at 90 nm.

II. MARKET REQUIREMENTS
FPGAs must offer high performance and versatility, low cost, reasonable power consumption, capable and user-friendly tools, competent and helpful support, industry-standard I/O capabilities, and a variety of size, speed, and package options. Of less interest to the big suppliers are specialized niches: single-chip non-volatile designs, one-time-programmable antifuse chips with alleged (but not perfect) security, and ultra-low-power operation (which unfortunately is incompatible with the newest, fastest, and cheapest technology).

III. FPGA GROWTH OPPORTUNITIES
The PLD market, presently about $5B per year, has an opportunity to capture part of the much bigger ASIC and ASSP markets (more than $15B each), provided FPGAs can offer competitive performance and price. Few technologies have such an almost unlimited opportunity for growth. Moore’s Law, which takes us from 90 nm to 65 nm in 2006 and to 45 nm in a few years, is not the only driving force for innovation: innovative circuit design and chip architecture also contribute, as do better design software and the availability of cores (intellectual property). Innovative system design can exploit the massive parallelism and (dynamic) reconfigurability available only in FPGAs.

IV. IC TECHNOLOGY
Moore’s Law gives us 90 nm today, 65 nm in 2006, and 45 nm later. Low defect density on 300 mm wafers achieves high yield and low cost. However, 3.3 V compatibility will become problematic, and leakage current is increasing. The Xilinx triple-oxide (MidOx) process provides some relief.

V. POWER CONSUMPTION
Battery-operated systems care about operating life per battery charge. In such systems, static (leakage) current dominates, and it rises exponentially with temperature.
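The exponential temperature dependence can be illustrated with a simple model. The sketch below is a hypothetical illustration, not a Virtex-4 characterization. The function name `leakage_current`, the 10 °C `doubling_interval_c`, and the reference point of 100 mA at 25 °C are all assumed values, chosen only to show the shape of the curve. Real numbers must come from the device data sheet or the vendor's power-estimation tools.

```python
def leakage_current(i_ref_ma: float, t_ref_c: float, t_c: float,
                    doubling_interval_c: float = 10.0) -> float:
    """Estimate static (leakage) current at junction temperature t_c.

    Hypothetical model: leakage doubles for every `doubling_interval_c`
    degrees C above the reference temperature t_ref_c. The default of
    10 C is an illustrative assumption, not a device specification.
    """
    return i_ref_ma * 2.0 ** ((t_c - t_ref_c) / doubling_interval_c)


if __name__ == "__main__":
    # Assumed reference point: 100 mA of leakage at 25 C.
    for t in (25, 45, 65, 85):
        print(f"{t:3d} C: {leakage_current(100.0, 25.0, t):8.1f} mA")
```

Under these assumed numbers, a rise from 25 °C to 85 °C multiplies leakage by 2^6 = 64. A smaller doubling interval would make the growth even steeper.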
Between 1997 and 2002, leading-edge technology advanced from 250 nm to 130 nm, causing leakage current to increase by a factor of 100,000 (from microamps to amps). Virtex-4 uses innovative methods in processing and circuit design to counter this trend and reduce leakage current.

Plug-in-the-wall systems are primarily concerned with IC junction temperature. Here, dynamic current dominates, and it depends strongly on the supply voltage. A variety of methods are being used to reduce dynamic power consumption.

VI. VIRTEX-4, LEADING IN 90 NM TECHNOLOGY
There are three sub-families, with a total of 17 devices offered today:
• LX has logic, BlockRAM, multiply-accumulate blocks, versatile clock management, and enhanced I/O.
• SX has the same features, but relatively more arithmetic blocks and BlockRAMs and less logic, which makes this sub-family ideal for DSP applications.
• FX adds “hard” microprocessor and Ethernet controller cores, as well as ultra-fast multi-gigabit transceivers running at up to 10 Gbps.
With their overlapping capabilities, the three sub-families share the same basic structure and column-based architecture, optimized for flip-chip packaging. These packages improve pc-board signal integrity by always placing supply pins close to signal pins and by including high-quality decoupling capacitors inside the package. Compared to traditional packages and pin-outs, Virtex-4 reduces the signal-pin inductance from 16 nH to 5 nH and reduces simultaneously-switching output noise by a factor of six. This has been demonstrated in side-by-side comparisons against competing devices.

VII. MORE EFFICIENT DEDICATED CORES
Dedicated “hard” cores offer higher performance, smaller area, lower power consumption, and less design effort and risk.

A. 18 x 18 bit multiplier + 48-bit accumulator
This is a full-custom (but configurable) design, running at 500 MHz with low power consumption. Pipeline registers and cascade logic increase flexibility and performance.

B. Dual-ported BlockRAM with FIFO control
Virtex-4 adds built-in Hamming error correction and retains the “read-previous-data-during-write” option. Two 18-Kbit BlockRAMs can be combined without speed loss. The FIFO offers 500 MHz dual-clock (asynchronous) operation and has been tested for more than 10e14 error-free “going empty” cycles at full speed. The integration of this popular function saves area and power and doubles the achievable speed. Even more importantly, it saves the user from tricky design decisions in the unfamiliar and treacherous area of asynchronous design.

C. Advanced input/output
All pins operate at up to 600 Mbps single-ended, or up to 1 Gbps differentially using LVDS. The I/O supports many standards: PCI, PCI-X, SFI-4, HSTL, SSTL, and LVCMOS. Built-in serial termination is optional. Every pin can be used with a 3.3 V, 2.5 V, or 1.8 V supply voltage. Each pin offers ChipSync: a serializer/deserializer and clock divider, as well as a precision 64-tap input delay line with a time granularity of 70 ps, supporting system-synchronous interfaces.

D. Multi-gigabit transceivers
For the fastest bit-serial I/O, Virtex-4 FX devices offer full-duplex operation at up to 11 Gbps using serializer/deserializers. Each of the 4, 8, 12, or 20 channels per FX device has an internal parallel interface up to 40 bits wide, 8B10B or 64B66B encoding/decoding, FIFOs, CRC, and loop-back test support. The electrical interface has several levels of output pre-emphasis and input equalization to cope with various pc-board characteristics, trace lengths, and numbers of connectors.

E. Microprocessor and Ethernet controller
Each of the six FX devices has one or more PowerPC 405 microprocessors and Ethernet controllers, offering much higher performance than popular “soft” microprocessor implementations (MicroBlaze or NIOS) while saving chip area and power consumption. Similarly, the 10/100/1000 Ethernet Media Access Controller saves thousands of logic slices compared to a traditional soft implementation.

VIII. RECOMMENDATIONS FOR DESIGNING WITH FPGAS
Always design architecture-specifically, since most recent improvements come from family-specific features and hard cores. A generic, architecture-independent design approach would sacrifice performance and increase cost.
Use pipelining whenever possible. Flip-flops are abundant and offer a free performance boost whenever increased latency can be tolerated.
Use parallel structures for the highest performance, or serial structures for the lowest cost. There is usually more than one way to implement the desired function; pick the best one.
Design synchronously, and always use global low-skew clocks. They guarantee operation free of hold-time issues.

IX. 2006 AND BEYOND
Moore’s Law will lead us to 45 nm technology, with lower cost and slightly higher performance in the logic fabric. More “hard” cores will significantly increase performance and density and lower cost and power consumption for these functions, matching the best features that ASICs might offer while avoiding the well-known ASIC problems. Massive parallelism enhances DSP performance far beyond what is available in dedicated DSP circuits. Dynamic reconfiguration offers unique advantages and can lower cost and power. There will be innovative solutions to the leakage-current problem, and there is always room for some positive surprises. The future for FPGAs looks very bright.