KWrite

Rational Evolution

March 13th, 2010

“Bibles” in High Performance Computing

This is the continuation of my series “Bibles” in Applied Math. I write a new page for this section because it is a bit off from what most people call “Applied Math”. Further, this topic deserves a blog entry, or even a forum, on its own because being able to make your code run 10-15% faster (or even 90% faster if you haven’t mastered certain level of the art of programming) is a big deal nowadays. It can even land you a lucrative job (here’s an example).

Suppose that you’re a master of Numerical Algorithms. You’ve spent 2 years working on a complex engineering problem and recently have developed a new algorithm that can solve the problem in linear time instead of the state-of-the-art O\left(N\log N)\right). Your new solver (and the 2 years) would be wasted if it is poorly written and/or poorly compiled. Here are a couple out of thousands of common poor programming practices.

1. Computing the same thing over and over again

for (int k=0;k<100;k++) {
 out[k] = 0;
 for (int i=0;i<M;i++)
  for (int j=0;j<N;j++)
   out[k] += kernel(i,j) * in[k];
}

In this example, kernel(i,j) is re-computed 99 times. Because this matrix Kernel is independent from the outer loop, it can and most of the times should be pre-computed. If the computation of kernel is computationally intensive, which is common, pre-computation of the matrix can make your program run 99.9 times faster.

2. Using the right flags when compiling

[zer0ne@ion]$ g++ myProgram.cpp -o myProgram
[zer0ne@ion]$ ./myProgram
Total time: 68.74 seconds
[zer0ne@ion]$ g++ -O3 myProgram.cpp -o myProgram
[zer0ne@ion]$ ./myProgram
Total time: 13.9 seconds
[zer0ne@ion]$ g++ -O3 -DNDEBUG myProgram.cpp -o myProgram
[zer0ne@ion]$ ./myProgram
Total time: 11.54 seconds

So, if you’re already good at designing algorithms, it’s worth to learn a bit of performance optimization tricks. I am no expert in HPC, let alone related fields such as Computer Architectures or Compilers, so my best bet is to point you to some great books out there. Also for the same reason, I’ll appreciate if you share your own tips.

  1. Performance Optimization of Numerically Intensive Codes by Hoisie. $73 seems high for a 173-page book but this thin bible is worth every penny. It covers all basic stuffs such as CPU architecture, compiler optimization, memory localit, profiling, etc.
  2. High Performance Computing by Kevin Dowd & Charles Severance. Similar contents but this book explains things in greater depth.

March 2nd, 2008

Programming Tips (C++)

Since I’ve been back coding for the last few days, I’d like to save some useful programming tips that I’ve just figured out.

1. Linear Algebra package: use uBlas. uBlas, developped by Boost, is very object-oriented and easy to use. Not only does it support linear algebra, it also nicely supports sparse matrix storage and processing, which is important for iterative methods (when solving PDEs by Finite Difference or Finite Element). Take a look at my little program below, which constructs a sparse matrix when solving a boundary value problem.

Code:

#include <boost/numeric/ublas/vector.hpp>

#include <boost/numeric/ublas/matrix_sparse.hpp>

using namespace std;

namespace ublas = boost::numeric::ublas;const int N = 5000; // number of discretizations on each direction

typedef ublas::vector<double> Vector;

typedef ublas::compressed_matrix<double> Matrix; // Sparse Matrix class

void initializeMatrix(Matrix& A) {

    long k=0;

    for (int i=0; i<N; i++)

        for (int j=0; j<N; j++) {

            A(k,k) = -4;

            // adjacent elements in x-direction

            if (i>0) A(k,k-N)=1;

            if (i<N-1)

                if (i==0) A(k,k+N)=2; // Neumann condition on this side

                else A(k,k+N)=1; //Dirichlet condition on this side

            // adjacent elements in y-direction

            if (j>0) A(k,k-1)=1;

            if (j<N-1)

                if (j==0) A(k,k+1)=2; // Neumann condition

                else A(k,k+1)=1; // Dirichlet condition

k++;

        }

}

Drawbacks of uBlas

  • It has poor performance compared to (atlas-)Blas. One reason is that it is the most abstract linear algebra library (easier to use). Another critical reason is that it performs many *useful* (debugging) checks. The performance issue can be overcome through some optimization tricks, such as one that I described in the compiler section. Check out the effective uBlas wiki page for more performance tips. If performance is critical, consider binding with Atlas.
  • Poor documentation: if you’re used to Java’s API (which is terrific) as I was, using uBlas as first may be difficult because uBlas’s documentation is, now, quite short and lacks many details. The developers are working hard on it though. Moreover, they also maintain a main mailing list (and some other satellite sites) where you can ask questions.

2. Quant Finance Library: use QuantLib, the major open-source library

3. Compiler: when finished with your programming/debugging, consider using -O3 and -DNDEBUG for a much better performance (thanks to Tuyen). Most of the time, it optimizes the program significantly. Below is an illustration.

Code:

[khoa@trinidad]$ g++ myProg.cpp -o myProg

[khoa@trinidad]$ ./myProg

Total time: 68.74 seconds

[khoa@trinidad]$ g++ -DNDEBUG myProg.cpp -o myProg

[khoa@trinidad]$ ./myProg

Total time: 49.43 seconds

[khoa@trinidad courses]$ g++ -O3 myProg.cpp -o myProg

[khoa@trinidad courses]$ ./myProg

Total time: 13.9 seconds

[khoa@trinidad]$ g++ -O3 -DNDEBUG myProg.cpp -o myProg

[khoa@trinidad]$ ./myProg

Total time: 11.54 seconds

|