Parallel Processing
In the past, when someone needed to solve bigger problems in less time, they often had to wait until a faster computer became available. If the problem was simple and required only the resources of a single processor, they could wait for manufacturers, keeping pace with Moore's law, to release a faster personal computer. If the problem was larger and required far more compute resources, they had to procure the services of proprietary supercomputers. Recently, multi-core processors, Commodity-Off-The-Shelf (COTS) hardware, open source operating systems, and inexpensive, high-bandwidth local area networks have allowed system designers to build various forms of parallel processing machines that are now available to almost anyone.
Some Details
Ultimately, parallel computing is an attempt to make the most of an infinite but seemingly scarce commodity: time. Until recently, virtually all computers followed a common machine model known as the von Neumann computer, which uses the stored-program concept [1]. A central processing unit (CPU) fetches instructions and data from memory, decodes the instructions, and then executes them sequentially. By contrast, parallel computing is the simultaneous use of multiple computing resources to solve a computational problem: the problem is broken into discrete parts that can be solved concurrently on different CPUs.
One of the latest developments in parallel computing has been the advent of the multi-core processor. Modern multi-core designs were born out of necessity because monolithic processors were reaching their practical limits [2]. Issues such as heat and power consumption forced chip engineers to consider new designs. Multi-core processors combine two or more independent processing cores into a single package composed of a single integrated circuit (IC) [3]. These processors offer significant advantages over monolithic processors: their design can draw on existing knowledge of symmetric multiprocessing (SMP) systems; their cache-coherency circuitry can run at higher clock rates because signals do not have to travel off-chip; the proximity of the cores reduces degradation of inter-processor signals; and they deliver more processing power while using less power and generating less heat.
At the opposite end of the parallel computing spectrum [4] are clusters of computers organized to perform parallel computing. Open source software and commodity hardware have fostered the development of parallel processing in the form of multi-computer architectures, epitomized by the Beowulf cluster design [5]. Clusters are typically built either for High Availability (HA), to keep services running when individual machines fail, or for High Performance Computing (HPC), to provide more computational power than a single computer.
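The von Neumann model described at the start of this section can be made concrete with a toy interpreter. This is a minimal sketch under assumed conventions: the instruction set (`LOAD`, `ADD`, `STORE`, `HALT`), the single accumulator register, and the function name `run` are all hypothetical. Instructions and data share one memory list, reflecting the stored-program concept, and the loop fetches, decodes, and executes exactly one instruction at a time.

```python
def run(memory):
    """Execute the program stored in `memory` until HALT; return the final memory.

    Hypothetical instruction format: (opcode, address). Instructions and data
    live in the same list, as in the stored-program concept.
    """
    pc = 0   # program counter: address of the next instruction
    acc = 0  # single accumulator register
    while True:
        opcode, addr = memory[pc]  # fetch the instruction at pc
        pc += 1                    # advance to the next instruction
        # Decode and execute, strictly one instruction at a time.
        if opcode == "LOAD":
            acc = memory[addr]
        elif opcode == "ADD":
            acc += memory[addr]
        elif opcode == "STORE":
            memory[addr] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r} at address {pc - 1}")


if __name__ == "__main__":
    program = [
        ("LOAD", 4),     # 0: acc = memory[4]
        ("ADD", 5),      # 1: acc += memory[5]
        ("STORE", 6),    # 2: memory[6] = acc
        ("HALT", None),  # 3: stop
        20,              # 4: data
        22,              # 5: data
        0,               # 6: the result is stored here
    ]
    print(run(program)[6])  # 42
```

Each iteration finishes before the next one begins; that strictly sequential execution is what parallel designs, whether multi-core or clustered, set out to move beyond.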
Beowulf clusters, for instance, are scalable performance clusters built from commodity hardware on a private system network, with an open source software infrastructure [6]. Ideally, the designer can improve performance in proportion to each machine added. The commodity hardware can be any mass-market, stand-alone compute nodes: a cluster can be as simple as two networked computers, each running Linux and sharing a file system, or as complex as 1024 nodes connected by a high-speed, low-latency network [7]. Beyond the inexpensive hardware, an open source operating system allows designers to make sophisticated kernel enhancements. These range from simple tweaks to the scheduler to major extensions that provide comprehensive clustering solutions, such as a full, highly available Single System Image (SSI) environment [8]. As with commodity microprocessors and open source operating systems, Ethernet, acting as an inter-processor communication mechanism, is playing an ever-increasing role in cluster computing. Gigabit Ethernet is readily available, offers high bandwidth, and performs well as the interconnect for clusters of commodity workstations [9]. These developments have had a profound impact on the HPC community: as of 2007, the vast majority of supercomputers were clusters running the Linux operating system [10].
As such systems have come to predominate, the models used to program them have evolved as well. Although there are a number of schemes for classifying parallel processing systems, two programming models that mirror the system architectures described above have emerged: shared-memory and distributed-memory [11]. With a shared-memory multiprocessor, different processors can access the same variables. This makes referencing data in memory similar to doing so in a traditional single-processor program, but adds the complexity of keeping shared data consistent when several processors update it.
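The shared-data integrity problem can be illustrated with Python's standard `threading` module, with threads standing in for the processors of a shared-memory machine. This is a sketch under stated assumptions: the function name `parallel_count` is hypothetical, and CPython's global interpreter lock means these threads do not execute Python bytecode truly in parallel, so the example shows the correctness issue rather than a speedup. Every thread updates one shared counter, and a lock makes each read-modify-write step happen as a unit so that no increment is lost.

```python
import threading


def parallel_count(num_threads: int, increments_per_thread: int) -> int:
    """Have several threads increment one shared counter; return its final value."""
    shared = {"count": 0}    # a single object visible to every thread (shared memory)
    lock = threading.Lock()  # protects the read-modify-write below

    def worker() -> None:
        for _ in range(increments_per_thread):
            with lock:  # only one thread updates the counter at a time
                shared["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before reading the result
    return shared["count"]


if __name__ == "__main__":
    print(parallel_count(4, 10_000))  # 40000
```

Removing the `with lock:` line leaves `shared["count"] += 1` as an unprotected read-modify-write, so two threads can read the same old value and the final count may fall short of the expected total.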
A distributed-memory system introduces a different problem: how to distribute a computational task to multiple processors with distinct memory spaces and reassemble the results from each processor into one solution. To aid in parallel software development, two standards have emerged. For shared-memory architectures, OpenMP provides parallelization mechanisms on shared-memory multiprocessors [12]. The standard specifies compiler directives, library routines, and environment variables that control the parallelization and runtime characteristics of a program. Because OpenMP is a threaded paradigm, programmers must contend with issues such as balancing workloads across processors and synchronizing multiple threads.
For distributed-memory multiprocessors, the Message Passing Interface (MPI) standard is the most prominent [13]. When an MPI program is launched, it runs as the number of processes the user specifies. Each process runs its own instance of the program and communicates with the other instances, which may run on the same processor or on different processors. The greatest speedup occurs when the processes are spread across separate processors. Basic communication consists of one process sending data to another, typically over a high-speed network. As with OpenMP, distributed-memory programming presents its own challenges, such as designing efficient and effective inter-process communication and synchronization mechanisms.
These programming paradigms have become significant enough that they now influence how computer programming is taught [14]. Fueled by the need for more computational power, parallel architectures are replacing monolithic systems, and numerous shifts in hardware and software are contributing to this change.
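The distributed-memory pattern described above (scatter a task, compute on private data, gather and combine the results) can be sketched without an MPI installation by simulating each process with a thread that works only on data it receives as a message. This is an illustration, not MPI: the names `worker` and `distributed_sum_of_squares` are hypothetical, and Python `queue.Queue` objects stand in for the network. A real MPI program would run as separate processes, typically launched with `mpiexec`, and exchange data with calls such as `MPI_Send` and `MPI_Recv` or collectives such as `MPI_Scatter` and `MPI_Reduce`.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Simulated process: it sees input data only through messages."""
    chunk = inbox.get()                  # receive this rank's share of the data
    partial = sum(x * x for x in chunk)  # compute using local data only
    outbox.put((rank, partial))          # send (rank, result) back to the root


def distributed_sum_of_squares(data, num_workers):
    """Scatter `data` to workers, then gather and combine their partial sums."""
    inboxes = [queue.Queue() for _ in range(num_workers)]  # one mailbox per rank
    replies = queue.Queue()                                 # messages to the root
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], replies))
        for rank in range(num_workers)
    ]
    for t in threads:
        t.start()

    # Scatter: rank r receives a copy of every num_workers-th element, starting at r.
    for rank in range(num_workers):
        inboxes[rank].put(data[rank::num_workers])

    # Gather: one reply per rank, in whatever order the workers finish.
    partials = dict(replies.get() for _ in range(num_workers))
    for t in threads:
        t.join()

    # Reduce: combine the partial results into one solution.
    return sum(partials.values())


if __name__ == "__main__":
    print(distributed_sum_of_squares(list(range(10)), 3))  # 285
```

Round-robin slicing is one simple way to balance the load; contiguous blocks would work equally well here because every element costs the same to process.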
The move to parallel architectures affects more than conventional, low-cost home computing devices: it is now possible to achieve supercomputer-class processing power by interconnecting large numbers of commodity microprocessors through an inexpensive network. These parallel systems are reshaping software development as well, both forcing the creation of new programming paradigms and changing the way problem solving and computer programming are taught in school.
[1] Livermore Computing Center, Lawrence Livermore National Laboratory, "Introduction to Parallel Computing," https://computing.llnl.gov/tutorials/parallel_comp/, 2008.
[2] G. M. Amdahl, "Validity of the single-processor approach to achieving large scale computing capabilities," in AFIPS Conference Proceedings, vol. 30, pp. 483-485, 1967.
[3] D. Geer, "Chip Makers Turn to Multicore Processors," Computer, IEEE Computer Society, May 2005.
[4] M. Flynn, "Some Computer Organizations and Their Effectiveness," IEEE Transactions on Computers, vol. C-21, 1972.
[5] T. L. Sterling, J. Salmon, D. J. Becker, and D. Savarese, "How to Build a Beowulf," MIT Press, 1999.
[6] C. T. Yang, C. S. Liao, and K. C. Li, "On construction of a large computing farm using multiple Linux PC clusters," in Parallel and Distributed Computing: Applications and Technologies, Proceedings. vol. 3320, 2004, pp. 856-859.
[7] Beowulf.org, "Beowulf Clusters," http://www.beowulf.org/, 2008.
[8] R. Lottiaux, P. Gallard, G. Vallee, C. Morin, and B. A. Boissinot, "OpenMosix, OpenSSI and Kerrighed: a comparative study," in Cluster Computing and the Grid, 2005. CCGrid 2005. IEEE International Symposium on, 2005, pp. 1016-1023 Vol. 2.
[9] J. Mache, "An assessment of Gigabit Ethernet as cluster interconnect," in Cluster Computing, 1999. Proceedings. 1st IEEE Computer Society International Workshop on, 1999, pp. 36-42.
[10] Top500, "500 Fastest Computer Systems," http://top500.org/, 2007.
[11] C. Quammen, "Introduction to Programming Shared-Memory and Distributed-Memory Parallel Computers," ACM Crossroads, vol. 8, 2002.
[12] OpenMP, "The OpenMP Specification for Parallel Programming," http://www.openmp.org/blog/, 2007.
[13] Argonne National Laboratory, "The Message Passing Interface (MPI)," http://www-unix.mcs.anl.gov/mpi/, 1997.
[14] Intel, "Intel Multi-Core University Initiative charges ahead," http://softwarecommunity.intel.com/articles/eng/1606.htm, 2008.