On successful completion of the curricular unit, students will be able to demonstrate the following knowledge, capacities and skills: - to characterize and evaluate, in qualitative and quantitative terms, the architectures of parallel/distributed computer systems - to analyze, measure and evaluate the performance of computer systems when executing applications - to develop and/or modify computer applications with the aim of improving performance and scalability.
1. Recent evolution of processor architecture: pipelining, superscalar execution, multi-threading, multi-core and many-core designs; memory hierarchy in a shared environment 2. Evolution of parallelism in computer architecture: vector computing, shared and distributed memory computers, networked architectures; performance analysis in MPP systems; heterogeneous platforms, with a focus on GPU-based systems 3. Performance evaluation of application execution: metrics, profiling, analysis and code tuning (with a focus on loops, matrix operations, function calls and data locality), benchmarking
Teaching methodologies: - lectures presenting key concepts with examples - class discussion of scientific papers - lectures/talks by computational science/engineering researchers - lab classes to analyse and solve case studies, followed by open presentation and discussion - participation in an international research internship at Univ. Texas at Austin Evaluation elements: - a written test with several development or problem-solving questions (weight: ~40%) - written essays on complementary subjects (weight: ~20%) - practical lab work, including report writing and oral discussion of results (weight: ~40%).
• David Patterson, John Hennessy, Computer Architecture: A Quantitative Approach, 5th Ed., Morgan Kaufmann, 2011 • David Kirk and Wen-mei Hwu, Programming Massively Parallel Processors: A Hands-on Approach, Morgan Kaufmann, 2010
A student who successfully completes this UC must be able to demonstrate the acquisition of the following competences: • Understand the theoretical foundations supporting the algorithms analyzed • Identify the strengths and weaknesses of the algorithms analyzed at different levels (complexity, robustness, efficiency, scalability, etc.) • Develop sequential and parallel implementations and discuss the performance and efficiency results obtained with the algorithms analyzed • Identify, in parallel algorithms, possible problems of load balancing and/or high communication cost between the computing elements
• Search and sorting algorithms: mergesort, quicksort and bitonic sort; parallel search techniques: depth-first and breadth-first search • Algorithms for heterogeneous parallel environments: matrix multiplication, linear systems, eigenvalues, dense and sparse matrices, finite element methods and conjugate gradient • Optimization: graphs and spanning trees, dynamic programming, knapsack problems • Genetic algorithms in parallel optimization/modelling problems.
The UC is taught in a weekly 2-hour theoretical session and a 1-hour laboratory session. The theoretical sessions, mainly expository, present the fundamentals and concepts. The laboratory sessions consolidate the knowledge acquired in the theoretical sessions through exercises in which students design and implement parallel algorithms; they also serve to clarify doubts. Assessment is based on a practical assignment.
• Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill Education, 2003 • Enrique Alba, Parallel Metaheuristics: A New Class of Algorithms, Wiley, 2005
• Identify the different types of computer systems, including SMP machines, clusters and grids, and discuss their advantages and limitations • Analyze and evaluate hardware and software with a view to planning the installation of equipment • Identify the user requirements needed to select the software to install and maintain • Identify and characterize job management and scheduling policies and evaluate the results of their application • Discuss and evaluate the actual performance of sequential and parallel programs on the platforms studied • Identify the bottlenecks of the runtime environment for a specific computational load • Use existing tools, at the execution-environment level, to study strategies for optimizing resources shared across different platforms, in order to minimize a given cost function (e.g. runtime).
• Introduction to clusters: architecture (terminology, technologies, limitations), equipment (individual components and networking, coordination of decentralized resources) • Linux cluster computing: characteristics, installation and configuration of services, security • Cluster planning and construction: mission, hardware and software architecture, parallel file systems, interconnection technologies, cloning and installation • Cluster management: job submission models, monitoring and administration of users and resources, scheduling and accounting policies, data security, performance analysis and tuning • Case studies: development of management strategies.
Presentation of concepts and analysis/discussion of case studies. Group and individual exercise work on the practical application of concepts and on evaluating the performance of the different components of the hardware/software pairing. Presentation of the work produced, with emphasis on identifying constraints, discussing results and proposing extensions/alterations to the solutions. General evaluation method: • individual/group work with reports: 40% to 60% • individual written test assessing the knowledge acquired: 40% to 60%
• William Gropp, Ewing Lusk, Thomas Sterling, Beowulf Cluster Computing with Linux, 2nd Ed., The MIT Press, 2003 • Joseph Sloan, High Performance Linux Clusters with OSCAR, Rocks, openMosix, and MPI, O'Reilly Media, 2004 • R. E. Bryant, D. R. O'Hallaron, Computer Systems: A Programmer's Perspective, Prentice Hall, 2011
A student who successfully completes this UC must be able to demonstrate the acquisition of the following competences: • Develop parallel applications capable of executing efficiently on a wide range of architectures • Implement applications using the most common parallelism exploitation patterns • Measure and optimize the performance of applications on distributed memory systems • Implement application-level scheduling techniques.
• Programming models: threads, message passing, distributed objects, workflows • Methodologies for developing parallel applications: partitioning, communication, agglomeration and mapping of tasks and data • Analysis of typical parallelism patterns: pipelining, farming, divide & conquer and heartbeat • Measuring and optimizing application performance on shared and distributed memory systems, both homogeneous and heterogeneous • Languages and tools to support application development.
The UC is taught in a weekly 2-hour theoretical session and a 1-hour laboratory session. The theoretical sessions, mainly expository, present the fundamentals and concepts. The laboratory sessions consolidate the knowledge acquired in the theoretical sessions through exercises that apply it to solving a given problem and/or interpreting results; they also serve to clarify doubts. Assessment is based on a practical assignment.
• Calvin Lin and Lawrence Snyder, Principles of Parallel Programming, Addison-Wesley, 2009 • Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill Education, 2003 • Ian Foster, Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering, Addison-Wesley, 1995