HEART2011

Invited Keynote Lecture 1 [09:00-10:00, June 2]

Miriam Leeser

The Challenges of Writing Portable, Correct and High Performance Libraries for GPUs or How to Avoid the Heroics of GPU Programming

Miriam Leeser,
Professor,
Northeastern University,
United States of America

Lecture Summary

We live in the age of heroic programming for scientific applications on Graphics Processing Units (GPUs). Typically, a scientist chooses an application to accelerate and a target platform, and through great effort maps their application to that platform. If they are a true hero, they achieve two or three orders of magnitude speedup for that application and target hardware pair. The effort required includes a deep understanding of the application, its implementation, and the target architecture. When a new, higher-performance architecture becomes available, additional heroic acts are required.

There is another group of scientists who prefer to spend their time focused on the application level rather than lower levels. These scientists would like to use GPUs for their applications, but would prefer to have parameterized library components available that deliver high performance without requiring heroic efforts on their part. The library components should be easy to use and should support a wide range of user input parameters. They should exhibit good performance on a range of different GPU platforms, including future architectures. Our research focuses on creating such libraries.

We have been investigating parameterized library components for use with MATLAB/Simulink and with the SCIRun Biomedical Problem Solving Environment from the University of Utah. In this talk I will discuss our library development efforts and the challenges of achieving high performance across a range of both application and architectural parameters.

I will also focus on issues that arise in achieving correct behavior of GPU kernels. One issue is correctness with respect to thread synchronization. Another is determining whether a scientific application that uses floating point is correct when its results differ depending on the target architecture and the order of computation.
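To make both issues concrete, the following is a minimal, purely illustrative CUDA sketch (a hypothetical block-sum kernel, not code from the libraries discussed in the talk). The two __syncthreads() barriers are required for correct behavior, and even the correct kernel adds the values in a pairwise tree order, so its floating-point result can differ in the last bits from a sequential CPU sum of the same data.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical block-wide sum, used only to illustrate the two issues above.
    __global__ void block_sum(const float *in, float *out, int n)
    {
        extern __shared__ float buf[];
        int tid = threadIdx.x;
        buf[tid] = (tid < n) ? in[tid] : 0.0f;
        __syncthreads();                      // omit this and later reads may see stale data

        // Pairwise tree reduction: a different summation order than a serial loop.
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                buf[tid] += buf[tid + stride];
            __syncthreads();                  // a barrier is needed after every step
        }
        if (tid == 0)
            out[blockIdx.x] = buf[0];
    }

    int main()
    {
        const int n = 256;
        float h[n], cpu = 0.0f;
        for (int i = 0; i < n; ++i) { h[i] = 1.0f / (i + 1); cpu += h[i]; }

        float *d_in, *d_out, gpu;
        cudaMalloc(&d_in, n * sizeof(float));
        cudaMalloc(&d_out, sizeof(float));
        cudaMemcpy(d_in, h, n * sizeof(float), cudaMemcpyHostToDevice);
        block_sum<<<1, n, n * sizeof(float)>>>(d_in, d_out, n);
        cudaMemcpy(&gpu, d_out, sizeof(float), cudaMemcpyDeviceToHost);

        // Both sums are "correct"; the low-order bits may still disagree.
        printf("cpu = %.8f  gpu = %.8f\n", cpu, gpu);
        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }

Both results are valid floating-point sums of the same data; which one is "right" depends on what the application can tolerate, which is exactly the question raised above.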

Biography

Professor Miriam Leeser received the BS degree in Electrical Engineering from Cornell University and the Diploma and PhD in Computer Science from Cambridge University, England. In 1992, she received a National Science Foundation CAREER award to conduct research into floating-point arithmetic. She has been on the faculty of Northeastern University since 1996, where she is head of the Reconfigurable and GPU Computing Laboratory and a member of the computer engineering research group and the Center for Communications and Digital Signal Processing. She conducts research into accelerating image and signal processing applications with nontraditional computer architectures, including FPGAs, GPUs, and the Cell Broadband Engine. Her research includes building parameterized libraries and tools that enable application programmers to make use of highly optimized implementations developed for these platforms.

Invited Keynote Lecture 2 [09:00-10:00, June 3]

Oliver Pell

Surviving the end of frequency scaling with reconfigurable dataflow computing

Oliver Pell,
Vice-president,
Maxeler Technologies,
United Kingdom

Lecture Summary

Microprocessors have been hitting the limits of attainable clock frequencies for the past few years, resulting in the current multi-core processor solutions provided by the major microprocessor vendors. Multiple cores on a chip must share the same pins to reach the memory system and the communication channels to other machines. This leads to a memory wall, since the number of pins per chip does not scale with the number of cores, and a power wall, since chips must still be cooled within the same physical space.

Many important applications in fields as diverse as earth science and finance exhibit significantly worse-than-linear scaling on multiple cores, a problem that will only worsen as the major microprocessor vendors move beyond quad- and six-core chips to many-core architectures. At the same time, programmers must now grapple with an even more complex programming model of parallelism at the core, chip, node, and cluster levels.

Dataflow computing offers a way to avoid the memory and power walls. Although the concept has been around for many decades, until recently dataflow solutions lagged behind the performance of state-of-the-art supercomputers. At Maxeler we are bridging the gap between research and production-quality systems, delivering complete solutions for high-performance computing applications built on reconfigurable dataflow engines. This talk will discuss details of our reconfigurable computing solutions, the dataflow programming model, and some example applications.

Biography

Oliver Pell is VP of Engineering at Maxeler Technologies. His experience ranges from accelerating reverse time migration and Lattice-Boltzmann simulations to credit derivatives pricing. He is currently responsible for the technical architecture and project management of acceleration efforts for clients including Tier 1 oil companies and investment banks. Oliver holds degrees in electronics and computing from Imperial College London and has co-authored several patents and scientific publications in top conferences in Computer Science, Electrical Engineering, and Geophysics.
