Teraflops Research Chip


The Intel Teraflops Research Chip (codenamed Polaris) is a research manycore processor with 80 cores and a network-on-chip architecture, developed by Intel's Tera-Scale Computing Research Program.[1] It was manufactured using a 65 nm CMOS process with eight layers of copper interconnect and contains 100 million transistors on a 275 mm² die.[2][3][4] Its design goal was to demonstrate a modular architecture capable of a sustained performance of 1.0 TFLOPS while dissipating less than 100 W.[3] Research from the project was later incorporated into Xeon Phi. The technical lead of the project was Sriram R. Vangal.[4]

The processor was initially presented at the Intel Developer Forum on September 26, 2006[5] and officially announced on February 11, 2007.[6] A working chip was presented at the 2007 IEEE International Solid-State Circuits Conference, alongside technical specifications.[2]

Architecture

The chip consists of a 10×8 2D mesh network of cores and nominally operates at 4 GHz.[nb 1] Each core, called a tile (3 mm²), contains a processing engine and a 5-port wormhole-switched router (0.34 mm²) with mesochronous interfaces; the router has a bandwidth of 80 GB/s and a latency of 1.25 ns at 4 GHz.[2] The processing engine in each tile contains two independent, 9-stage pipelined, single-precision floating-point multiply-accumulator (FPMAC) units, 3 KB of single-cycle instruction memory and 2 KB of data memory.[3] Each FPMAC unit can perform 2 single-precision floating-point operations per cycle, so each tile has an estimated peak performance of 16 GFLOPS at the standard configuration of 4 GHz. A 96-bit very long instruction word (VLIW) encodes up to eight operations per cycle.[3] The custom instruction set includes instructions to send and receive packets to and from the chip's network, as well as instructions for sleeping and waking a particular tile.[4]

Underneath each tile, a 256 KB SRAM module (codenamed Freya) was 3D stacked, bringing memory nearer to the processor and increasing overall memory bandwidth to 1 TB/s, at the expense of higher cost, thermal stress and latency, and a small total capacity of 20 MB.[7] The network of Polaris was shown to have a bisection bandwidth of 1.6 Tbit/s at 3.16 GHz and 2.92 Tbit/s at 5.67 GHz.[8]
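The arithmetic behind these figures can be sketched in a few lines of Python. The per-tile and whole-chip numbers follow directly from the values quoted above. The bisection estimate is only an illustrative model: it assumes that each of the five router ports carries an equal share of the 80 GB/s router bandwidth and that cutting the mesh in half severs 8 links, each counted in both directions. Neither assumption is stated in the text, and all function names are hypothetical.

```python
# Back-of-the-envelope model of the figures quoted in the Architecture section.
FPMACS_PER_TILE = 2            # two FPMAC units per tile
FLOPS_PER_FPMAC_PER_CYCLE = 2  # one multiply plus one add per cycle
TILES = 10 * 8                 # 10x8 mesh
SRAM_PER_TILE_KB = 256         # stacked SRAM module under each tile
ROUTER_GBYTES_AT_4GHZ = 80.0   # router bandwidth at 4 GHz
ROUTER_PORTS = 5


def tile_peak_gflops(freq_ghz: float) -> float:
    """Peak single-precision GFLOPS of one tile at the given clock."""
    return FPMACS_PER_TILE * FLOPS_PER_FPMAC_PER_CYCLE * freq_ghz


def chip_peak_tflops(freq_ghz: float) -> float:
    """Peak TFLOPS of all 80 tiles at the given clock."""
    return TILES * tile_peak_gflops(freq_ghz) / 1000.0


def stacked_sram_mb() -> float:
    """Total capacity of the stacked SRAM in MB."""
    return TILES * SRAM_PER_TILE_KB / 1024.0


def bisection_tbits(freq_ghz: float, cut_links: int = 8) -> float:
    """Estimated bisection bandwidth in Tbit/s.

    ASSUMPTIONS (not from the article): equal bandwidth per router port,
    bandwidth scaling linearly with clock, and `cut_links` bidirectional
    links crossing the cut.
    """
    port_gbytes = ROUTER_GBYTES_AT_4GHZ / ROUTER_PORTS * (freq_ghz / 4.0)
    return cut_links * 2 * port_gbytes * 8 / 1000.0


if __name__ == "__main__":
    print(f"tile peak at 4 GHz: {tile_peak_gflops(4.0):.0f} GFLOPS")
    print(f"chip peak at 4 GHz: {chip_peak_tflops(4.0):.2f} TFLOPS")
    print(f"stacked SRAM:       {stacked_sram_mb():.0f} MB")
    print(f"bisection at 3.16 GHz: {bisection_tbits(3.16):.2f} Tbit/s")
```

Under these assumptions the model lands close to the quoted bisection figures at both 3.16 GHz and 5.67 GHz, though not exactly on the higher one, so it should be read as a plausibility check and not as the method used in the cited measurements.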

Teraflops Research Chip's tile diagram.

Other prominent features of the Teraflops Research Chip include fine-grained power management, with 21 independent sleep regions per tile and dynamic tile sleep, and very high energy efficiency: a theoretical peak of 27 GFLOPS/W at 0.6 V and a measured 19.4 GFLOPS/W for the stencil application at 0.75 V.[4][9]

Instruction types and their latency[4]

| Instruction type | Latency (cycles) |
|---|---|
| FPMAC | 9 |
| LOAD/STORE | 2 |
| SEND/RECEIVE | 2 |
| JUMP/BRANCH | 1 |
| STALL/WFD | ? |
| SLEEP/WAKE | 6 |

Application performance of Teraflops Research Chip[nb 2][4]

| Application | FLOP count | Average TFLOPS | % of peak TFLOPS | Active tiles |
|---|---|---|---|---|
| Stencil | 358K | 1.00 | 73.3% | 80 |
| SGEMM (matrix multiplication) | 2.63M | 0.51 | 37.5% | 80 |
| Spreadsheet | 64.2K | 0.45 | 33.2% | 80 |
| 2D FFT | 196K | 0.02 | 2.73% | 64 |

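The average and percent-of-peak columns together imply the peak throughput, and therefore the clock, at which the applications were run. The sketch below makes that inference explicit, assuming 4 floating-point operations per tile per cycle (two FPMAC units at 2 operations each, as described in the Architecture section). The function names are hypothetical, and the implied clock is an inference from rounded table values, not a figure stated in the article.

```python
FLOPS_PER_TILE_PER_CYCLE = 4  # 2 FPMAC units x 2 FLOP per cycle
TOTAL_TILES = 80


def implied_peak_tflops(avg_tflops: float, percent_of_peak: float) -> float:
    """Peak TFLOPS implied by an average and its percentage of peak."""
    return avg_tflops / (percent_of_peak / 100.0)


def implied_clock_ghz(peak_tflops: float, active_tiles: int = TOTAL_TILES) -> float:
    """Clock in GHz implied by a peak TFLOPS figure for the active tiles."""
    return peak_tflops * 1000.0 / (active_tiles * FLOPS_PER_TILE_PER_CYCLE)


if __name__ == "__main__":
    # (application, average TFLOPS, % of peak) for the three 80-tile rows.
    rows = [("Stencil", 1.00, 73.3), ("SGEMM", 0.51, 37.5), ("Spreadsheet", 0.45, 33.2)]
    for name, avg, pct in rows:
        peak = implied_peak_tflops(avg, pct)
        print(f"{name}: peak ~{peak:.2f} TFLOPS, clock ~{implied_clock_ghz(peak):.2f} GHz")
```

The three 80-tile rows agree with one another to within rounding. The 2D FFT row is left out because its average (0.02 TFLOPS) is rounded too coarsely to support the same calculation.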
Experimental results of the Teraflops Research Chip[nb 3]

| VCC | fmax[nb 4] | Peak TFLOPS[nb 5] | Power[nb 6] | T | Source |
|---|---|---|---|---|---|
| 0.60 V | 1.0 GHz | 0.32 TFLOPS | 11 W | 110 °C | [2] |
| 0.675 V | 1.0 GHz | 0.32 TFLOPS | 15.6 W | 80 °C | [4] |
| 0.70 V | 1.5 GHz | 0.48 TFLOPS | 25 W | 110 °C | [2] |
| 0.70 V | 1.35 GHz | 0.43 TFLOPS | 18 W | 80 °C | [4] |
| 0.75 V | 1.6 GHz | 0.51 TFLOPS | 21 W | 80 °C | [4] |
| 0.80 V | 2.1 GHz | 0.67 TFLOPS | 42 W | 110 °C | [2] |
| 0.80 V | 2.0 GHz | 0.64 TFLOPS | 26 W | 80 °C | [4] |
| 0.85 V | 2.4 GHz | 0.77 TFLOPS | 32 W | 80 °C | [4] |
| 0.90 V | 2.6 GHz | 0.83 TFLOPS | 70 W | 110 °C | [2] |
| 0.90 V | 2.85 GHz | 0.91 TFLOPS | 45 W | 80 °C | [4] |
| 0.95 V | 3.16 GHz | 1.0 TFLOPS | 62 W | 80 °C | [4] |
| 1.00 V | 3.13 GHz | 1.0 TFLOPS | 98 W | 110 °C | [2] |
| 1.00 V | 3.8 GHz | 1.22 TFLOPS | 78 W | 80 °C | [4] |
| 1.05 V | 4.2 GHz | 1.34 TFLOPS | 82 W | 80 °C | [4] |
| 1.10 V | 3.5 GHz | 1.12 TFLOPS | 135 W | 110 °C | [2] |
| 1.10 V | 4.5 GHz | 1.44 TFLOPS | 105 W | 80 °C | [4] |
| 1.15 V | 4.8 GHz | 1.54 TFLOPS | 128 W | 80 °C | [4] |
| 1.20 V | 4.0 GHz | 1.28 TFLOPS | 181 W | 110 °C | [2] |
| 1.20 V | 5.1 GHz | 1.63 TFLOPS | 152 W | 80 °C | [4] |
| 1.25 V | 5.3 GHz | 1.70 TFLOPS | 165 W | 80 °C | [4] |
| 1.30 V | 4.4 GHz | 1.39 TFLOPS | ? | 110 °C | [2] |
| 1.30 V | 5.5 GHz | 1.76 TFLOPS | 210 W | 80 °C | [4] |
| 1.35 V | 5.67 GHz | 1.81 TFLOPS | 230 W | 80 °C | [4] |
| 1.40 V | 4.8 GHz | 1.52 TFLOPS | ? | 110 °C | [2] |
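To illustrate how peak energy efficiency changes with supply voltage, the sketch below divides the peak TFLOPS column by the power column for a handful of the 80 °C rows. The selection of rows and the function name are illustrative choices. Because these are peak figures taken from the table, the results need not match efficiency numbers quoted elsewhere for specific workloads or conditions.

```python
# (VCC in V, peak TFLOPS, power in W) copied from four 80 °C rows of the table.
ROWS_80C = [
    (0.75, 0.51, 21.0),
    (0.95, 1.00, 62.0),
    (1.20, 1.63, 152.0),
    (1.35, 1.81, 230.0),
]


def peak_gflops_per_watt(peak_tflops: float, power_w: float) -> float:
    """Peak energy efficiency in GFLOPS/W for one table row."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return peak_tflops * 1000.0 / power_w


if __name__ == "__main__":
    for vcc, tflops, watts in ROWS_80C:
        eff = peak_gflops_per_watt(tflops, watts)
        print(f"{vcc:.2f} V: {eff:.1f} GFLOPS/W")
```

For the rows chosen here, efficiency falls steadily as voltage and frequency rise: power grows faster than peak throughput.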

Issues

Intel aimed to ease software development for the exotic new architecture by creating a programming model, called Ct, specifically for the chip. The model never gained the following Intel had hoped for and was eventually incorporated into Intel Array Building Blocks, a now-defunct C++ library.


