International Workshop on Highly Efficient Accelerators and Reconfigurable Technologies (HEART2010)

Epochal Tsukuba, Tsukuba, Japan, June 1, 2010

Invited Keynote Lecture 1 [09:00-09:40, June 1]

Reconfigurable Computing in the Multi-Core Era

Khaled Benkrid,
Senior Lecturer,
School of Engineering, The University of Edinburgh, Scotland,
United Kingdom

Since it was first announced in 1965, Moore's law has stood the test of time, providing exponential increases in computing power for science and engineering problems. However, while this law was largely sustained through increases in transistor integration levels and processor clock frequencies, this is no longer possible, as heat dissipation is becoming a major hurdle to further clock frequency increases: the so-called frequency wall problem.

In order to keep Moore's law going, general-purpose processor chip manufacturers such as Intel and AMD are now relying on multi-core chip technology, in which multiple processor cores run simultaneously on the same chip. While this is providing considerable speed-up opportunities for science and engineering problems, it is also creating a semantic gap between applications, traditionally written in sequential code, and hardware, as multi-core technologies need to be programmed in parallel in order to take advantage of their performance potential. Ironically perhaps, this problem is also opening a window of opportunity for niche computer technologies such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs), since the problem of parallel programming has to be tackled for general-purpose processors anyway.

This talk will explore the pros and cons of a number of current computer technologies including multi-core microprocessors, FPGAs and GPUs. Comparison criteria include quantitative concerns such as speed, power consumption, and cost of purchase and development, as well as qualitative concerns such as technology maturity and forward/backward compatibility. In light of this, the talk will position reconfigurable hardware technology in the current multi-core era and speculate on the future of this technology.

Invited Keynote Lecture 2 [17:15-17:55, June 1]

Custom Computing for Efficient Acceleration of HPC Kernels

Kentaro Sano,
Associate Professor,
Graduate School of Information Sciences, Tohoku University,
Japan

The supercomputers commonly used today for scientific high-performance computation (HPC) are composed of many general-purpose microprocessors connected by an interconnection network. Each microprocessor is designed for peak arithmetic performance that is out of balance with the available memory performance. Although sophisticated memory subsystems such as cache memories are incorporated, external-memory access can often be a bottleneck in memory-intensive scientific computations. Computational fluid dynamics (CFD) kernels based on difference methods are one example: they require memory bandwidth balanced with arithmetic performance in order to compute efficiently. Thus, recent multicore architectures integrating more and more cores cannot allow such memory-intensive computations to fully exploit their arithmetic performance, due to the limited off-chip memory bandwidth. In addition, parallel processing with a large number of processors also suffers from low utilization of their cores due to the limited network bandwidth, so that a large-scale system achieves only a fraction of its peak performance.

To address this problem, we have been focusing on custom-computing approaches for efficient acceleration of memory-intensive HPC kernels. Reconfigurable technology allows us to repeatedly implement various circuits, including arithmetic units and data paths, on the same programmable-logic devices (PLDs), such as FPGAs. By using PLDs as a platform, we can design and implement custom computing machines tailored to each kernel for efficient acceleration. For example, we can provide customized memory systems and/or network structures for kernels that demand substantial data access rather than arithmetic operations. We show our custom computing architectures and their FPGA-based prototype implementations for efficient and scalable acceleration of CFD kernels, followed by future directions for reconfigurable HPC with floating-point arithmetic.
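
The memory-bandwidth argument in this abstract can be illustrated with a roofline-style estimate. The sketch below is not taken from the talk: the function name, the per-point flop and byte counts for a 2D five-point difference stencil, and the machine figures (100 GFLOP/s peak, 20 GB/s off-chip bandwidth) are all hypothetical values chosen only to show how a memory-intensive kernel can end up at a small fraction of peak arithmetic performance.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_point, bytes_per_point):
    """Roofline-style bound: the lower of the arithmetic peak and
    (memory bandwidth x arithmetic intensity)."""
    intensity = flops_per_point / bytes_per_point  # flop per byte moved
    return min(peak_gflops, bandwidth_gb_s * intensity)


# Hypothetical machine (assumed figures, not from the abstract).
PEAK_GFLOPS = 100.0
BANDWIDTH_GB_S = 20.0

# Hypothetical 2D five-point stencil in single precision with no cache reuse:
# 5 reads + 1 write of 4 bytes each = 24 bytes, and an assumed 6 flops per point.
FLOPS_PER_POINT = 6
BYTES_PER_POINT = 24

bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, FLOPS_PER_POINT, BYTES_PER_POINT)
utilization = bound / PEAK_GFLOPS

print(f"{bound:.1f} GFLOP/s attainable, {utilization:.0%} of peak")
# prints "5.0 GFLOP/s attainable, 5% of peak"
```

Under these assumed numbers the kernel is limited by off-chip bandwidth rather than by arithmetic units, which is the kind of imbalance that a customized memory system is meant to address.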