Parallel Computing Workshop, Day 4
Today we dove into the world of high-performance parallel computing using GPGPUs (general-purpose computing on graphics processing units). It turns out that graphics processing units are tremendously fast "accelerators" for certain types of problems - namely fine-grained, massively parallel tasks using hundreds or even thousands of threads. Computer scientists have been playing with this idea for a while using the then-existing graphics APIs like OpenGL (which meant that one had to do quite a bit of work to recast a scientific-computing problem as a graphics-processing problem), but now Nvidia has embraced it with CUDA, a parallel-computing architecture and programming model that lets their GPUs be used much more easily for scientific computing, with extensions to popular programming languages (like C!) for driving those GPUs.
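For the uninitiated, here's roughly what CUDA C looks like - a minimal sketch of my own (not the workshop's code) that adds two big vectors on the GPU, one thread per element:

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Kernel: runs on the GPU; each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;              // one million elements
    size_t bytes = n * sizeof(float);

    // Host (CPU) arrays
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device (GPU) arrays, plus copies of the input data
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);      // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}

One compile with nvcc, and a million additions run across thousands of simultaneous GPU threads.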
Your computer's CPU is a relatively small chip, draws a lot of power, and spends most of its real estate on control hardware - caches and logic that figure out the best order in which to execute instructions; the actual processing units occupy only ten percent or so of the die. GPUs, on the other hand, are physically larger, consume far less power per unit of computation (Watts per GFLOP is a big deal in supercomputing), devote most of their silicon to processing, and can run thousands of threads with ease. The CPU has a relatively modest number of computational cores and runs only a few threads at GFLOP performance, but it can handle pretty much any task you throw at it; the GPU can handle only certain types of problems, but completes them at TFLOP speed. The combination is a computational powerhouse.
Here are the results of a sample code that we ran this morning: multiplication of two large matrices. Processing time was ten minutes on a single CPU core, a third of a second with a "naive" CUDA program, and 50 milliseconds with optimized CUDA. That's a speedup of more than four orders of magnitude (ten minutes is 600 seconds; 600 / 0.05 = 12,000x). Using a $200 video card!
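The workshop's source isn't reproduced here, but a "naive" CUDA matrix multiply is generally a kernel along these lines - one thread per output element, every operand fetched from slow global memory (the optimized version tiles the matrices through fast on-chip shared memory; the names, matrix size N, and launch configuration below are my own placeholders):

// Naive N x N matrix multiply: one thread computes one element of C.
__global__ void matMulNaive(const float *A, const float *B, float *C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; k++)
            sum += A[row * N + k] * B[k * N + col];  // every read hits global memory
        C[row * N + col] = sum;
    }
}

// Launched with a 2-D grid of 2-D blocks, e.g. for N = 2048:
//   dim3 threads(16, 16);
//   dim3 blocks((N + 15) / 16, (N + 15) / 16);
//   matMulNaive<<<blocks, threads>>>(d_A, d_B, d_C, N);

Each output element re-reads an entire row and column from global memory, which is why the shared-memory-tiled version buys another factor of six or so.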
Let it be known that Southern Man's next new PC (to be built sometime this fall, budget permitting) will include the best Nvidia CUDA card he can afford.