This summer I am working with three talented undergraduates as part of a National Science Foundation (NSF) program called Research Experience for Undergraduates (REU): Alex Fuerst (Xavier University), Charlie Kazer (Swarthmore College), and Billy Hoffman (Salisbury University). This post continues to present their work in performing parallel processing with QGIS.
Today I want to continue discussing our project to create a QGIS plug-in for terrain analysis. In my previous post, I discussed our goals and some early results. Our first cut showed that PyCUDA was running about as fast as a serial C++ version of our code. At that time, QGIS was also generating slope about as fast as both our serial C++ code and our PyCUDA code.
My students decided to rewrite the code to create a scheduler to manage the I/O, CPU, and GPU activities. The scheduler is divided into three modules:
dataLoader – The dataLoader takes in a raster input file (ESRI ASCII at first, later GeoTIFF) and sends data one line at a time to a buffer it shares with gpuCalc.
gpuCalc – The gpuCalc grabs data from the buffer it shares with dataLoader and moves it into CUDA page-locked memory. When the memory is full, the program transfers the data to the GPU and executes the CUDA kernel for the calculation. When the calculation completes, the program writes the results to a second buffer it shares with dataSaver.
dataSaver – The dataSaver gets the data from the buffer it shares with gpuCalc and writes the results to an output file (ASCII at first, later GeoTIFF).
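To make the three-module layout a little more concrete, here is a minimal sketch of the same loader / calculator / saver pattern using only Python's standard `threading` and `queue` modules. It is not our plug-in code: the function names, the `fake_kernel` stand-in for the CUDA kernel, the batch size, and the queue sizes are all assumptions made for illustration, and an ordinary Python list plays the role of the page-locked memory.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker passed between stages


def data_loader(rows, out_q):
    """Send input rows one at a time to the buffer shared with the calculator."""
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def fake_kernel(batch):
    """Stand-in for the GPU kernel (assumption): doubles every value."""
    return [[2 * value for value in row] for row in batch]


def gpu_calc(in_q, out_q, batch_size, kernel):
    """Collect rows until the batch is full, run the kernel, pass results on."""
    batch = []
    while True:
        row = in_q.get()
        if row is SENTINEL:
            break
        batch.append(row)
        if len(batch) == batch_size:
            for result in kernel(batch):
                out_q.put(result)
            batch = []
    # Flush whatever is left when the input ends mid-batch.
    for result in kernel(batch):
        out_q.put(result)
    out_q.put(SENTINEL)


def data_saver(in_q, sink):
    """Drain the second shared buffer; a list stands in for the output file."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            break
        sink.append(row)


def run_pipeline(rows, batch_size=2, kernel=fake_kernel):
    load_q = queue.Queue(maxsize=8)  # buffer shared by loader and calculator
    save_q = queue.Queue(maxsize=8)  # buffer shared by calculator and saver
    results = []
    threads = [
        threading.Thread(target=data_loader, args=(rows, load_q)),
        threading.Thread(target=gpu_calc, args=(load_q, save_q, batch_size, kernel)),
        threading.Thread(target=data_saver, args=(save_q, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling `run_pipeline([[1, 2], [3, 4], [5, 6]])` pushes three rows through all three stages. Because each stage runs in its own thread and the queues are bounded, reading, computing, and writing can overlap instead of running back to back.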
An example of the architecture we are using is shown below:
The results indicated that the GPU was running faster for slope, but still not enough to make me too happy. It was about 25% faster, and I wasn’t going to be satisfied until we were an order of magnitude faster.
We tried running our algorithms with ESRI ASCII files because they were easier to read than GeoTIFFs. Unfortunately, the time to read the input file was horrendous (like 15 minutes!). So, we spent a little time rewriting the code to work with GeoTIFFs (many thanks to some generous souls on gis.stackexchange.com who helped Alex figure out the GeoTIFF read/write), and found them to run substantially faster. We also decided to run the hillshade algorithm, which involves many more computations than a simple slope or aspect. In this case, the results are shown below.
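To give a feel for why hillshade is heavier than slope or aspect, here is a rough per-cell sketch in plain Python. It assumes the commonly documented Horn-style formulation over a 3x3 window (slope and aspect first, then an illumination term), which may differ in detail from what our kernel or QGIS does. The function name, the default sun azimuth and altitude, and the `cellsize` and `z_factor` parameters are illustrative assumptions.

```python
import math


def hillshade_cell(window, cellsize=1.0, azimuth=315.0, altitude=45.0, z_factor=1.0):
    """Hillshade (0-255) for the centre cell of a 3x3 elevation window.

    window is [[a, b, c], [d, e, f], [g, h, i]] with the top row to the north.
    """
    (a, b, c), (d, _e, f), (g, h, i) = window

    # Rate of change in x and y (Horn's weighted differences).
    dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cellsize)
    dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cellsize)

    # Slope and aspect: hillshade needs both before it can start.
    slope = math.atan(z_factor * math.hypot(dzdx, dzdy))
    aspect = math.atan2(dzdy, -dzdx)
    if aspect < 0:
        aspect += 2 * math.pi

    # Convert the sun position to a zenith angle and a mathematical azimuth.
    zenith = math.radians(90.0 - altitude)
    azimuth_math = math.radians((360.0 - azimuth + 90.0) % 360.0)

    shade = 255.0 * (
        math.cos(zenith) * math.cos(slope)
        + math.sin(zenith) * math.sin(slope) * math.cos(azimuth_math - aspect)
    )
    return max(0.0, shade)  # cells facing away from the light clamp to 0
```

Under these assumptions, a flat window depends only on the sun's altitude, and with the light coming from the northwest a west-facing slope comes out brighter than an east-facing one. Every cell needs several trigonometric calls on top of the slope and aspect arithmetic, which is where the extra computation comes from.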
We had a breakthrough with PyCUDA, so I want to wait another day or so to rewrite the serial version in C++, but for now, we’ll use QGIS and the terrain plugin to calculate the hillshade on an 8.3 GB raster file (1.5 GB as a GeoTIFF):
| | PyCUDA | QGIS |
|---|---|---|
| Input | 8:30 | 2:30 |
| Output | 3:48 | 5:00 |
| Computation | 5:09 | 9:00 |
| TOTAL TIME* | 7:03 | 15:30 |
We’ve placed a version of the code up on GitHub here. I hope you get a chance to try it out, and maybe even collaboratively help us make this a legitimate plug-in for QGIS.
* The PyCUDA total time is not the sum of its parts because our code is multi-threaded: while one thread is reading data in, another is writing data out, and a third is performing the computations.