I have over 6 years of experience developing C/C++ software applications for high-performance computing. I am well versed in OpenMP for shared-memory programming and MPI for distributed computing. I also have experience developing CUDA kernels for GPU computing.
I can also benchmark and profile applications to identify bottlenecks and suggest optimizations.