International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART2014)

Invited Keynote Lecture 1

Towards Reconfigurable High Performance Computing based on Co-Design Concept

Dr. Taisuke Boku,
2011 Gordon Bell Prize (Sustained Performance Prize)
Deputy Director, Professor
Center for Computational Sciences, University of Tsukuba,
Japan

Biography

Prof. Taisuke Boku received his Master's and PhD degrees from the Department of Electrical Engineering at Keio University. After his career as an assistant professor in the Department of Physics at Keio University, he joined the Center for Computational Sciences (formerly the Center for Computational Physics) at the University of Tsukuba, where he is currently the deputy director, the HPC division leader and the system manager of supercomputing resources. He has worked there for more than 20 years on HPC system architecture, system software, and performance evaluation of various scientific applications. During these years, he has played the central role in the system development of CP-PACS (ranked number one in the TOP500 in 1996), FIRST (hybrid cluster with gravity accelerator), PACS-CS (bandwidth-aware cluster) and HA-PACS (high-density GPU cluster), which are representative supercomputers in Japan. He also contributed to the system design of the K Computer as a member of the architecture design working group at RIKEN and is currently a member of the operation advisory board of AICS, RIKEN. He received the ACM Gordon Bell Prize in 2011. His recent research interests include accelerated HPC systems and direct communication hardware/software for accelerators in HPC systems based on FPGA technology.

Lecture Summary

FPGAs and reconfigurable hardware systems have been researched as an effective solution for HPC (High Performance Computing) systems, mainly for their computation capability and the ease of designing circuits that fit the computation to application characteristics. However, recent advances in commodity CPU technology have strongly pushed up the clock frequency and the absolute floating-point throughput, through SIMD (single instruction, multiple data) instructions and multi-core implementation on a chip, while the frequency of FPGAs has grown relatively slowly. Using FPGAs just for computation no longer offers the great advantage it did in traditional work. One of the keywords in the HPC field today is "codesign", where application requirements and hardware limitations must find a middle ground that gives the best coupling and balance between them under power consumption constraints. I strongly believe that FPGAs should play an important role in this new concept of the HPC world, with their strong reconfigurability and flexibility of circuit utilization.

On the other hand, to achieve absolute computing performance in HPC, we need to parallelize everything in the system. To avoid performance bottlenecks on the data paths between components, we need strong connectivity among them. Most current HPC components, such as CPUs, GPUs, accelerators, network interfaces and storage drives, are connected by PCI Express today. In other words, "PCI Express rules everything." The University of Tsukuba and its collaborators have been focusing on the importance of PCI Express, which can be used not just as a commodity data path between components within a computation node but also as a data path that crosses node boundaries. To solve this data path problem, we are applying FPGA technology, with its flexible circuit programmability and its strong IP library for various processing components, including data interfaces.

In this talk, I present effective FPGA utilization in the field of HPC, which is facing serious performance issues, based on the concept of hardware/software codesign for the next generation of HPC systems. FPGAs can be used both as an effective computing component and as a communication facility, providing an answer to today's HPC problems.

Invited Keynote Lecture 2

Micron’s Automata Processor Architecture:
Reconfigurable and Massively Parallel Automata Processing

[Lecture Slide]

Mr. Harold Noyes,
Senior Architect
DRAM Solutions Group, Micron Technology,
USA

Biography

Harold Noyes joined Micron Technology's DRAM Solutions Group in 2007 as the senior architect (hardware), working on the Automata Processor investigation and development. Prior to joining Micron, Mr. Noyes held a variety of research and development positions with Hewlett-Packard Company (25 years) and Freescale Semiconductor (2 years). His experience spans both engineering and project management roles, including Automata Processor architecture development, printed circuit assembly design, electromagnetic and regulatory compliance, modem design, full custom silicon design, ASIC design, and technical writing. Mr. Noyes earned a B.S. in electrical engineering from Utah State University.

Lecture Summary

Frequency scaling and architectural enhancements traditionally provided by Moore's Law are no longer adequately addressing computationally intensive problems. Multicore processing architectures, capable of increasing performance in certain applications, fall short when it comes to unstructured data set processing and algorithms that are not easily modified for parallel execution. Reconfigurable silicon architectures, with purpose-built machines controlled by traditional CPUs, take a somewhat similar approach but are limited by the practical considerations of power, cost, size, and speed.

Micron's Automata Processor architecture overcomes many of the obstacles facing modern von Neumann architectures and is poised to play an important role in solving some of the most challenging computational problems. It uses memory-based reconfigurable technology to create purpose-built, data-driven machines called automata. Inherent to the architecture is the ability to operate all of these automata, thousands or even millions, completely in parallel.

This presentation will cover the Automata Processor architecture and programming model, as well as the software development kit. Examples of possible applications will also be presented, along with the associated cost, power, and performance improvements that are anticipated.

Invited Keynote Lecture 3

Towards a Scalable and Configurable Accelerator

[Lecture Slide]

Dr. Simon See,
Director and Chief Solution Architect
Nvidia Inc. Asia Pacific,
Singapore

Biography

Dr. Simon See is currently the High Performance Computing Technology Director and Chief Solution Architect for Nvidia Inc., Asia, and also a Professor and Chief Scientific Computing Officer at Shanghai Jiao Tong University. Concurrently, A/Prof. See will also be the chief scientific computing advisor for BGI (China). His research interests are in the areas of high performance computing, computational science, applied mathematics and simulation methodology. He has published over 100 papers in these areas and has won various awards. Dr. See graduated from the University of Salford (UK) with a Ph.D. in electrical engineering and numerical analysis in 1993. Previously, Dr. See worked for SGI, DSO National Lab. of Singapore, IBM, International Simulation Ltd (UK), Sun Microsystems and Oracle. He also provides consultancy to a number of national research and supercomputing centers.

Lecture Summary

In the last few years, the high performance computing community has been increasingly adopting accelerators such as GPUs. Accelerators allow one to scale up and increase performance within a certain power envelope. However, in order to scale to large systems and address different types of applications, one has to design systems that are configurable and scalable. In this talk, the author will discuss some of the ideas behind the next generation of GPUs.