|
GPU Computing

The History

Graphics processing units (GPUs) are fundamentally accelerators for displaying interactive three-dimensional graphics, in a process known as rendering. Given a geometrical description of the surfaces comprising a scene, a model of their physical properties, and a model of a camera, a computer can essentially perform a light transport and scattering simulation to produce an approximate image. There are numerous ways to represent the geometry, to approximate the surface interaction physics, and to handle the light transport. Light can be approximated by a six-dimensional phase space density function, which in the absence of interacting volumetric media has an invariant (the intensity or radiance) along vacuum rays, reducing the transport problem to five dimensions. Still, this is a hard problem to solve even for static images, let alone for interactive or real-time graphics. The traditional graphics pipeline simplifies the problem to a scattering process: finitely sampled (rasterized) triangle projections are rendered to an image buffer, and local lighting conditions are then determined at each sample fragment with the help of simple shading models and specific light sources. The computational operations involved are transforms on four-dimensional vectors to project the triangle vertices, and local calculations per fragment to determine lighting. Since many of these calculations are very similar, parallelism can be employed to perform them in less time. This is important for any rendering problem, but it is essential for real-time graphics, where each frame needs to be calculated in 30 milliseconds or less. GPUs, therefore, are throughput-oriented processing architectures with many (hundreds or thousands) computational cores to perform transformation and shading in parallel.
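The two kinds of operation named above, a four-dimensional transform per vertex and a local lighting calculation per fragment, can be sketched in a few lines. This is a minimal serial illustration in Python, not GPU code. The function names, the OpenGL-style projection matrix, and the choice of the Blinn-Phong model as the "simple shading model" are assumptions made for the example. On a GPU, each vertex and each fragment would be processed by its own thread.

```python
import math

def mat_vec4(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective matrix (assumed convention: camera looks down -z)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def project_vertex(proj, position):
    """Project a camera-space point to normalized device coordinates."""
    # Promote the 3D point to homogeneous coordinates (w = 1).
    x, y, z, w = mat_vec4(proj, [position[0], position[1], position[2], 1.0])
    return (x / w, y / w, z / w)  # perspective divide

def normalize(v):
    """Return v scaled to unit length; a zero vector is returned unchanged."""
    n = math.sqrt(sum(c * c for c in v))
    return list(v) if n == 0.0 else [c / n for c in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def blinn_phong(normal, to_light, to_eye, shininess=32.0):
    """Return (diffuse, specular) factors for a single fragment."""
    n, l, e = normalize(normal), normalize(to_light), normalize(to_eye)
    h = normalize([a + b for a, b in zip(l, e)])  # half vector between light and eye
    diffuse = max(dot(n, l), 0.0)
    specular = max(dot(n, h), 0.0) ** shininess if diffuse > 0.0 else 0.0
    return diffuse, specular

# Hypothetical triangle in camera space, in front of the camera (negative z).
triangle = [(-1.0, 0.0, -2.0), (1.0, 0.0, -2.0), (0.0, 1.0, -4.0)]
proj = perspective(90.0, 1.0, 1.0, 10.0)

# Every vertex is transformed independently of the others.
ndc = [project_vertex(proj, v) for v in triangle]

# One fragment lit head-on: normal, light and eye directions all coincide.
diffuse, specular = blinn_phong([0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0])
```

Neither `project_vertex` nor `blinn_phong` reads the result of any other vertex or fragment, and that independence is what lets the work be spread across many cores.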
First-generation GPUs were mostly non-programmable signal processors which implemented parts of the graphics pipeline with fixed hardware operations. This was limiting in many situations, in particular because a very simple surface model (the Blinn-Phong model) was employed. Increasing demands for photorealism led to the development of programmable graphics processing units in subsequent years, e.g. the GeForce 3 by NVIDIA in 2001.

GPU Computing

Present-day GPUs are nothing like the fixed-function graphics coprocessors from before 2000. Instead, they are throughput-optimized, massively parallel SIMD machines which can handle millions of threads simultaneously. Because GPUs are optimized for different goals than CPUs, they offer much higher peak performance and memory bandwidth, and are therefore an interesting target for high-performance computing. While programmable shaders (short programs to perform transformations and lighting calculations) have been available for a while, a number of restrictions in their performance profile limited GPU-based computing to experiments for a number of years. However, as more general programming models like NVIDIA CUDA and the Khronos Group's OpenCL became available, GPUs suddenly presented an attractive option for production-level scientific computing. It is interesting to note that advanced treatment of three-dimensional graphics, the purpose for which GPUs were originally designed, has proven to be a challenging problem to port to GPU computing, even though many scientific simulation applications have enjoyed substantial runtime speedups (sometimes by a factor of 50 or more). Part of the reason is that high-dimensional problems like global light simulation lend themselves to a Monte Carlo approach, which tends to produce thread decoherence when following different ray paths.
Cache use and peak memory bandwidth on a GPU depend on coalesced data access, so sorting methods need to be considered to obtain higher speedups.

My GPU Computing Activities

My own research with GPUs began in 2008 in the context of an experiment: I wanted to see how much I could accelerate a solution of Einstein's field equations of general relativity using the G80 architecture available at that time. This experiment was surprisingly successful, and the simulation code on CUDA performed up to 26 times faster than a reference implementation in C. In 2011, I finished porting an evolution code for general relativistic magnetohydrodynamics, which describes magnetized fluid flow in general spacetimes, to GPUs using CUDA. The speedups are indeed impressive: compared to a quad-core CPU running an MPI-parallel simulation, the GPU code is over 50 times faster. This has opened entirely new possibilities for simulations of neutron stars and magnetars, another activity I am involved in. The Horizon code is also able to run on clusters of GPUs, so as more and more GPU installations come online, it can take advantage of these increased computational resources, and very high-quality fluid dynamics applications become possible. |
|