ECE 538 Term Paper Proposal
Thursday, September 29th, 02005
We need to write a term paper for my class in advanced computer architecture. Here's a short description of the requirements:
Due in December, the paper should be 15-20 pages long (double spaced) and include an in-depth analysis of some topic related to the things we discuss during the semester. That is, select a concept, machine, technique, or mechanism; then state whether it is good or not, defending your position with judicious use of appropriate metrics.
Further guidance:
- A topic we're interested in
- Understand what people have done (search/research)
- After search, you have an idea what's good and what's not good
- Propose something: make a coherent analysis of what we've decided to work on with the chosen metric
- A conclusion that defends the use and choice of metric and is based on that metric
Here's my proposal:
An analysis of the SIMD capabilities found in modern commodity CPUs, with an emphasis on their application in HPC and the HPL benchmark
The proposed paper will examine the Single Instruction Multiple Data (SIMD) extensions found in modern commodity CPUs. An emphasis will be placed on the use of these features in floating point-intensive High Performance Computing (HPC) applications. Of specific interest is the application of SIMD features in the High Performance Linpack (HPL) benchmark and the BLAS DGEMM matrix-matrix multiply routine, which is the main math kernel used by HPL.
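For readers who have not met DGEMM before, here is a minimal, unoptimized sketch of the operation it performs, C = alpha*A*B + beta*C, together with the usual leading-order operation count of 2*m*n*k. The function names `dgemm_naive` and `dgemm_flops` are my own illustrative choices, not part of any BLAS library, and a tuned kernel would look nothing like this triple loop.

```python
def dgemm_naive(alpha, a, b, beta, c):
    """Return alpha*A*B + beta*C for matrices stored as lists of rows.

    Illustrative reference only: a is m x k, b is k x n, c is m x n.
    """
    m, k, n = len(a), len(b), len(b[0])
    # Start from the scaled copy of C so the input is left untouched.
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # One multiply and one add per inner iteration.
                acc += a[i][p] * b[p][j]
            out[i][j] += alpha * acc
    return out


def dgemm_flops(m, n, k):
    """Leading-order flop count: one multiply and one add per (i, j, p)."""
    return 2 * m * n * k


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    zero = [[0.0, 0.0], [0.0, 0.0]]
    print(dgemm_naive(1.0, a, b, 0.0, zero))
    print(dgemm_flops(2, 2, 2))
```

The inner multiply-add is the part a SIMD extension can apply to several elements at once, which is why this kernel is the focus of the comparison.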
SIMD techniques compared will include the Pentium III's SSE extensions, Pentium 4's SSE2 and SSE3 extensions, and the PPC 970's AltiVec extensions.
The theoretical peak floating point operations per second (Flop/s) achievable with HPL will be derived for each CPU architecture and SIMD extension examined. The derivation will consider the DGEMM algorithm, the details of each SIMD feature, and the overall CPU architecture, including pipelining and functional unit organization.
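A rough sketch of the kind of peak calculation I have in mind follows. It assumes a simple model in which peak Flop/s is the clock rate times the number of SIMD lanes times the flops each lane can retire per cycle times the number of SIMD units. The function names and the example numbers (a 2 GHz clock, one flop per lane per cycle) are hypothetical placeholders, not figures for any of the CPUs named above; the real per-CPU values are exactly what the paper has to work out.

```python
def simd_lanes(register_bits, element_bits):
    """Number of elements of the given width that fit in one SIMD register."""
    return register_bits // element_bits


def peak_flops(clock_hz, lanes, flops_per_lane_per_cycle, units=1):
    """Simple peak model: every lane of every unit is busy on every cycle."""
    return clock_hz * lanes * flops_per_lane_per_cycle * units


if __name__ == "__main__":
    # Hypothetical CPU: 2 GHz, one 128-bit SIMD unit, double precision,
    # one flop per lane per cycle. These are illustrative assumptions only.
    lanes = simd_lanes(128, 64)
    print(lanes)
    print(peak_flops(2.0e9, lanes, 1))
```

The model ignores issue restrictions, latency, and memory bandwidth, so it gives an upper bound rather than a prediction.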
An experimental validation of theoretical values will be conducted by running the HPL benchmark on all the investigated platforms. Runs will be conducted with and without DGEMM routines that are optimized for each CPU's specific SIMD features. That is, the HPL performance of each CPU will be examined with and without the use of its SIMD extension.
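To compare a measured run against the theoretical peak, the run time has to be turned into a Flop/s rate. The sketch below assumes the operation count that HPL conventionally uses when reporting results for a problem of order N, (2/3)N^3 + (3/2)N^2, and defines efficiency as measured rate divided by peak. The function names and the sample inputs are hypothetical; no real measurements are shown here.

```python
def hpl_flops(n):
    """Operation count conventionally credited to an HPL solve of order n."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_rate(n, seconds):
    """Achieved Flop/s for a solve of order n that took the given wall time."""
    return hpl_flops(n) / seconds


def efficiency(measured_flops, peak_flops):
    """Fraction of the theoretical peak that was actually achieved."""
    return measured_flops / peak_flops


if __name__ == "__main__":
    # Placeholder inputs, not results from any real machine.
    n, seconds, peak = 1000, 1.0, 4.0e9
    rate = hpl_rate(n, seconds)
    print(rate, efficiency(rate, peak))
```

Running the same calculation for the SIMD and non-SIMD DGEMM builds gives two efficiencies per CPU, which is the comparison the experiments are meant to produce.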
Finally, a change to one of the CPUs analyzed that could double its performance in the HPL benchmark will be proposed. The architectural changes required will be explained and the theoretical speedup will be calculated.
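One simple way to estimate such a speedup is Amdahl's law: if a fraction f of the run time is spent in the part being accelerated and that part becomes s times faster, the overall speedup is 1 / ((1 - f) + f / s). This is my own choice of model for illustration and may not be the one the final paper uses. The function name and the example fractions are hypothetical.

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: speedup when `fraction` of the time gets `factor` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


if __name__ == "__main__":
    # Hypothetical fractions of run time spent in the accelerated kernel.
    for f in (0.5, 0.9, 1.0):
        print(f, overall_speedup(f, 2.0))
```

Under this model, doubling the speed of the kernel doubles overall performance only when essentially all of the run time is spent in that kernel, so the proposed change has to target the dominant part of the benchmark.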