Instructions for Authors of Papers Submitted for Publication


FPGAs in 2005 and Beyond
P. Alfke
Xilinx, Inc., 2100 Logic Drive, San Jose, CA 95124, USA
[email protected]
This paper takes a broad look at Field Programmable Logic,
first from a business perspective, and then in technical detail.
FPGA progress is spearheaded by two intensely competitive
manufacturers, Xilinx and Altera. Their technical directions
are similar, which leaves some room for the minor
competitors to exploit protected niches. All FPGAs take
advantage of the rapid technical evolution commonly
described as Moore’s Law, and they offer an attractive
alternative to the exploding cost of ASIC designs.
Power consumption has evolved as a major issue, as chips
migrate to smaller geometries and implement more logic and
run at ever higher clock rates. As an example of cutting-edge
FPGA technology, this paper describes several technical
improvements in the Virtex-4 family from Xilinx. The
conclusion offers general recommendations for designing with
FPGAs, and a glimpse at evolutionary developments in the
near future.
In 2005, Xilinx is the market leader with a share of 51%,
Altera has 33%, and the “others” (Lattice, Actel, Quicklogic,
Atmel) divide the remaining 16%. Seven years ago, Xilinx
and Altera each had 31%, while the “others” held 39%.
During these seven years of market swings, Altera maintained
about 32%, but Xilinx grew to 51%, all at the expense of the
“others”, who lost more than half their market share. This is
typical for a maturing market, increasingly dominated by the
two top players. All the non-dedicated former participants
have given up (Intel, T.I., Motorola, NSC, AMD, Cypress,
Philips), while most of the recent newcomers have been very
unsuccessful. Xilinx and Altera are defining the FPGA market.
The popularity of FPGAs is partially due to the rapidly
rising cost of IC design (including ASICs), which grew from
<$1M for 130 nm technology, to >$10M at 130 nm, and now
to >$20M at 90 nm, for each completed design.
FPGAs must offer high performance and versatility, low
cost, reasonable power consumption, capable and user-friendly
tools, competent and helpful support, industry-standard
I/O capabilities, as well as various size, speed and
package options. Less interesting for the big suppliers are
specialized niches: single-chip non-volatile designs,
one-time-programmable antifuse chips with alleged (but not perfect)
security, as well as ultra-low power operation (which
unfortunately is incompatible with the newest, fastest and
cheapest technology).
The PLD market, presently about $5B per year, has an
opportunity to capture part of the much bigger (>$15B each)
ASIC and ASSP markets, provided the FPGAs can offer
competitive performance and price. Few technologies in the
world have such an almost unlimited opportunity for growth.
Moore’s Law, which gets us from 90 nm to 65 nm in 2006, and
to 45 nm in a few years, is not the only driving force for
innovation. Innovative circuit design and chip
architecture also contribute, as do better design software
and the availability of cores (intellectual property). Innovative
system design can exploit the massive parallelism and
(dynamic) re-configurability available only in FPGAs.
Moore’s Law gives us 90 nm today, 65 nm in 2006, and
45 nm later. Low defect density on 300 mm wafers achieves
high yield and low cost. However, 3.3-V compatibility will
become problematic, and leakage current is increasing. The
Xilinx triple-oxide (MidOx) process provides some relief.
Battery-operated systems care about operating life per
battery charge. Static (leakage) current dominates and rises
exponentially with temperature. Between 1997 and 2002,
leading-edge technology advanced from 250 nm to 130 nm,
causing leakage current to increase by a factor of 100,000
(from microamps to amps). Virtex-4 uses innovative methods
in processing and circuit design to fight this trend, and reduce
leakage current.
Plug-in-the-wall systems are primarily concerned about
IC junction temperature. Dynamic current dominates, and it is
very voltage dependent. A variety of methods are being used
to reduce dynamic power consumption.
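The voltage dependence can be made concrete with the standard first-order CMOS switching-power model, P = a * C * V^2 * f. The sketch below is not taken from any Xilinx tool; the function names and the example capacitance, voltages and activity factor are illustrative assumptions only.

```python
def dynamic_power(c_eff_farads, vdd_volts, freq_hz, activity=1.0):
    """First-order CMOS switching power: P = activity * C * V^2 * f.

    c_eff_farads is the total switched capacitance (an assumed lump value),
    activity is the fraction of that capacitance toggling per clock cycle.
    """
    return activity * c_eff_farads * vdd_volts ** 2 * freq_hz


def power_ratio(v_new, v_old):
    """Relative dynamic power after a supply change, all else being equal."""
    return (v_new / v_old) ** 2


if __name__ == "__main__":
    # Hypothetical numbers: 1 nF switched, 100 MHz clock, 25% activity.
    print(dynamic_power(1e-9, 1.2, 100e6, activity=0.25), "W")
    # Lowering the supply from 1.5 V to 1.2 V, other factors unchanged.
    print(power_ratio(1.2, 1.5))
```

Because the supply voltage enters as a square, a 20% reduction in voltage cuts the modeled dynamic power by roughly a third, while clock rate and capacitance only contribute linearly.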
The Virtex-4 family has three sub-families, with a total of 17 devices being
offered today.
• LX has logic, BlockRAM, multiply-accumulate blocks,
versatile clock management and enhanced I/O.
• SX has the same features, but relatively more arithmetic
blocks and BlockRAMs, and less logic. That makes this
sub-family ideal for DSP applications.
• FX adds “hard” microprocessor and Ethernet controller
cores, as well as ultra-fast multi-gigabit transceivers, running
at up to 10 Gbps.
With their overlapping capabilities, these three sub-families
use the same basic structure and column-based architecture,
optimized for flip-chip packaging. These packages provide
improved pc-board signal integrity by placing supply pins
always close to signal pins, and by offering high-quality
decoupling capacitors inside the package. Compared to
traditional packages and pin-outs, Virtex-4 reduces the
signal-pin inductance from 16 nH to 5 nH, and reduces the
simultaneously-switching output noise by a factor of six. This
has been demonstrated in side-by-side comparisons against
competing devices.
Dedicated “hard” cores offer higher performance, smaller
area, lower power consumption, and less design effort and
A. 18 x 18 bit multiplier + 48-bit accumulator
This is a full custom (but configurable) design, running at
500 MHz with low power consumption. Pipeline registers and
cascade logic increase flexibility and performance.
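The arithmetic behavior of such a block can be sketched in a few lines of Python. This is only a behavioral model under stated assumptions (signed two's-complement operands, wrap-around on accumulator overflow, no pipeline registers modeled); the names `mac` and `dot` are illustrative and do not correspond to any Xilinx primitive. An 18 x 18 product needs at most 36 bits, so a 48-bit accumulator leaves 12 bits of headroom for summing many products.

```python
ACC_BITS = 48   # accumulator width from the text
OP_BITS = 18    # operand width from the text


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mac(acc, a, b):
    """One multiply-accumulate step: (acc + a * b) wrapped to 48 bits.

    Assumes signed operands; raises if they do not fit in 18 bits.
    """
    lo, hi = -(1 << (OP_BITS - 1)), (1 << (OP_BITS - 1))
    if not (lo <= a < hi and lo <= b < hi):
        raise ValueError("operands must fit in 18-bit signed range")
    return to_signed(acc + a * b, ACC_BITS)


def dot(xs, ys):
    """Dot product computed by repeated multiply-accumulate steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


if __name__ == "__main__":
    print(dot([1, 2, 3], [4, 5, 6]))
```

A dot product of this kind is the core of an FIR filter tap loop, which is one reason a multiplier followed by a wide accumulator is a natural building block for DSP.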
D. Multi-Gigabit transceivers.
For fastest bit-serial I/O, Virtex-4 FX devices offer full-duplex
operation at up to 11 Gbps using serializer/deserializers,
with an up to 40 bit wide internal parallel interface, with
8B10B or 64B66B encoding/decoding, FIFOs, CRC, and
loop-back test support for each of the 4, 8, 12 or 20 channels
per FX device. The electrical interface has several levels of
output pre-emphasis and input equalization to cope with
various pc-board characteristics, trace lengths, and number of
E. Microprocessor and Ethernet controller.
Each of the six FX devices has one or more PowerPC 405
microprocessors and Ethernet controllers, offering much
higher performance than popular “soft” microprocessor
implementations (MicroBlaze or NIOS), while saving chip
area and power consumption. Similarly, the 10-100-1000
Ethernet Media Access Controller saves thousands of logic
slices, compared to a traditional soft implementation.
Always design for the specific architecture, since most recent
improvements come from family-specific features and hard
cores. A generic, architecture-independent design approach would
sacrifice performance as well as cost.
Use pipelining whenever possible. Flip-flops are
abundant and offer a free performance boost whenever
increased latency can be tolerated.
Use parallel structures for highest performance, or use
serial structures for lowest cost. There is usually more than
one way to implement the desired function. Pick the best one.
Design synchronously, and always use global low-skew
clocks. They guarantee operation free of hold-time issues.
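The pipelining recommendation above can be illustrated with a simple timing model: the clock period is set by the slowest register-to-register stage, so splitting a long combinational path into shorter stages raises the clock rate at the price of extra cycles of latency. The sketch below is an idealized model; the stage delays are hypothetical, and flip-flop setup time, clock-to-out delay and routing are ignored.

```python
def max_clock_mhz(stage_delays_ns):
    """Highest clock rate (MHz) allowed by the slowest pipeline stage."""
    return 1000.0 / max(stage_delays_ns)


def latency_cycles(stage_delays_ns):
    """Clock cycles from input to output: one per pipeline stage."""
    return len(stage_delays_ns)


if __name__ == "__main__":
    unpipelined = [8.0]                 # one 8 ns combinational path
    pipelined = [2.0, 2.0, 2.0, 2.0]    # same logic split into four stages
    for name, stages in (("unpipelined", unpipelined), ("pipelined", pipelined)):
        print(name, max_clock_mhz(stages), "MHz,",
              latency_cycles(stages), "cycle(s) latency")
```

In this idealized case the throughput quadruples while the total delay through the logic stays at 8 ns. In a real device the gain is somewhat smaller because each added register contributes its own timing overhead.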
B. Dual-ported BlockRAM with FIFO control
Virtex-4 adds built-in Hamming error correction and
retains the “read-previous-data-during-write” option. Two
18K-bit BlockRAMs can be combined without speed loss.
The FIFO offers 500 MHz dual-clock (asynchronous)
operation, tested for >10e14 error-free “going empty” cycles
at full speed. The integration of this popular function saves
area and power, and doubles the achievable speed. Even more
importantly, it saves the user from tricky design decisions in
the unfamiliar and treacherous area of asynchronous design.
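The principle behind Hamming error correction can be shown with the small textbook Hamming(7,4) code. This is an illustration only: the code width and parity arrangement actually used in the Virtex-4 BlockRAM are not stated here, and the function names below are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Correct up to one flipped bit.

    Returns (data_bits, error_position); position 0 means no error was found.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3   # syndrome is the 1-based error position
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1                      # flip one stored bit
    print(hamming74_correct(word))
```

The syndrome bits directly spell out the position of a single flipped bit, which is why correction can be done with a handful of XOR gates. Wider memory words use the same idea with more parity bits.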
C. Advanced input / output
All pins operate at up to 600 Mbps single-ended, or up to
1 Gbps differentially, using LVDS. The I/O supports many
standards: PCI, PCI-X, SFI-4, HSTL, SSTL, LVCMOS.
Built-in serial termination is optional. 3.3 V, 2.5 V, or 1.8 V
supply voltage can be used for every pin. Each pin offers
ChipSync: a serializer/deserializer and clock divider, as well
as a precision 64-tap input delay line with a time-granularity
of 70 ps, supporting system-synchronous interfaces.
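The reach of such a delay line follows directly from the two numbers given: 64 taps at 70 ps per tap. The sketch below picks the nearest tap for a wanted delay. It assumes that taps are numbered 0 to 63, that tap 0 adds no delay, and that all steps are exactly 70 ps; these are simplifications, and the function names are illustrative rather than part of any Xilinx interface.

```python
TAP_PS = 70      # delay per tap, from the text
NUM_TAPS = 64    # number of taps, from the text


def delay_for_tap(tap):
    """Added delay in picoseconds for a tap setting (assumed range 0..63)."""
    if not 0 <= tap < NUM_TAPS:
        raise ValueError("tap out of range")
    return tap * TAP_PS


def tap_for_delay(target_ps):
    """Nearest tap setting for a target delay, clamped to the valid range."""
    tap = round(target_ps / TAP_PS)
    return max(0, min(NUM_TAPS - 1, tap))


if __name__ == "__main__":
    tap = tap_for_delay(1000)          # ask for roughly 1 ns
    print(tap, delay_for_tap(tap), "ps")
    print("maximum:", delay_for_tap(NUM_TAPS - 1), "ps")
```

Under these assumptions the line spans about 4.4 ns in 70 ps steps, which is enough to shift a data pin across several bit times at the per-pin data rates quoted above.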
Moore’s Law will lead us to 45 nm technology with
lower cost and slightly higher performance in the logic fabric.
More “hard” cores will significantly increase performance and
density, lowering cost and power consumption for these
functions, equivalent to the best features that ASICs might
offer, but avoiding the well-known ASIC problems.
Massive parallelism enhances DSP performance far
beyond what is available in dedicated DSP circuits.
Dynamic reconfiguration offers unique advantages and can
lower cost and power.
There will be innovative solutions to the leakage
current problem, and there is always room for some positive
The future for FPGAs looks very bright.