For this lab, you will parallelize a simple n-body simulation using the GPU. An n-body simulation models the movement of n massive objects acting on each other gravitationally. Every pair of stars has some gravitational effect, and to compute the total effect on each star we have to consider all pairs of stars. This results in a lot of repeated computation, which is a good target for parallelization, especially for GPUs where you perform the same task for a large number of input values.
The provided code runs in a single thread on the CPU. Your task is to modify the system to perform the physics calculations on the GPU instead of the CPU, using a single thread for each “star” in the simulation.
Before you start changing things make sure you can run the program. Once the window appears, click a few times to see what this program does. You should also review the code in main.c. You are welcome to look in gui.c and gui.h as well, but these files should not need to change in your CUDA implementation.
The first step in porting this code to run the physics calculations on the GPU is to convert it to a CUDA program. This should be fairly easy: rename the two .c files to .cu files and update the Makefile to build with nvcc instead of the C compiler. Move your .c files to .cu files with git mv instead of plain mv so they are still tracked by git. It's okay if you forget to do this; in that case, use git add to add the .cu files to your repository.
You may want to refer back to the GPU mini-labs to see how to invoke nvcc with all the relevant options. Make sure you leave the $(shell sdl2-config ...) lines in the Makefile settings; these set up the required options for SDL2, the library this program uses to display a window and draw graphics.
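As a rough sketch of the Makefile change (the variable and target names here are placeholders; match whatever your provided Makefile actually uses):

```make
# Build with nvcc instead of the C compiler; keep the sdl2-config lines intact.
CC      := nvcc
CFLAGS  := -g $(shell sdl2-config --cflags)
LDFLAGS := $(shell sdl2-config --libs) -lm

main: main.cu gui.cu
	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)
```

The key points are that the compiler variable now points at nvcc and that the sdl2-config shell substitutions survive the edit.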
Once you can build the program with nvcc, make sure it runs correctly. You're still not doing anything on the GPU, but you at least have a solid starting point to begin moving calculations over. Commit your changes, push them to GitHub, and have the instructor or a mentor sign off on this part before you move on. We will check that your .cu files are checked into the GitHub repository and that you've used all the required nvcc options in your updated Makefile.
The important physics calculations in this program begin on line 90 of main.c. This code loops over every star in the system and updates its position based on its velocity (lines 91 and 92). Then, there is an inner loop over each star that performs the following tasks:

- Check whether the outer star i and inner star j are the same. If so, skip this iteration (line 97).
- Compute the gravitational effect of star j on star i and apply it in both dimensions (lines 117 and 118).
- Advance the star's velocity by the resulting acceleration scaled by the time step DT (lines 121 and 122).
All of these steps rely on the data in the stars array. Add code to create a GPU version of this array and use cudaMemcpy to transfer the array to the GPU. Pay close attention to the size of the array: it grows whenever the user clicks to create a new star, so make sure you update the GPU array whenever the size changes. Unfortunately, there does not seem to be a cudaRealloc function, so you will have to duplicate that functionality with cudaMalloc, cudaMemcpy, and cudaFree.
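One way to mimic realloc on the device is to free the old buffer, allocate a larger one, and refill it from the CPU copy. A minimal sketch, assuming hypothetical names (star_t, gpu_stars, sync_stars_to_gpu are not in the provided code; substitute your own):

```c
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical star type; match the fields your main.cu actually uses.
typedef struct { double x, y, vx, vy, mass; } star_t;

static star_t* gpu_stars = NULL;  // device copy of the stars array
static int gpu_capacity = 0;      // how many stars the device buffer holds

// Grow the device buffer if needed, then copy the current CPU array over.
// Because we overwrite the whole buffer from the host, there is no need to
// preserve the old device contents the way realloc would.
void sync_stars_to_gpu(const star_t* cpu_stars, int num_stars) {
  if (num_stars > gpu_capacity) {
    if (gpu_stars != NULL) cudaFree(gpu_stars);
    if (cudaMalloc(&gpu_stars, sizeof(star_t) * num_stars) != cudaSuccess) {
      fprintf(stderr, "cudaMalloc failed\n");
      exit(EXIT_FAILURE);
    }
    gpu_capacity = num_stars;
  }
  cudaMemcpy(gpu_stars, cpu_stars, sizeof(star_t) * num_stars,
             cudaMemcpyHostToDevice);
}
```

Calling this once after every click (and once at startup) keeps the device copy in sync with the growing CPU array.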
Your code should still work after these changes, although you won’t actually be doing any computation on the GPU.
The easiest code to move to the GPU is the update to each star’s position, originally on lines 91 and 92 of main.c.
Remove these two lines from the loop and instead perform this calculation with a GPU kernel.
Instead of looping over each star in the stars array, you should have one CUDA thread update each star's position.
Don’t forget to copy the updated star positions back from the GPU after your kernel completes.
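One possible shape for this step, as a sketch only (the kernel name, star_t fields, DT value, and thread count are assumptions, not the provided code):

```c
#include <cuda_runtime.h>

#define DT 0.1        // placeholder; use the DT your main.cu defines
#define THREADS 256   // threads per block; a common but arbitrary choice

// Hypothetical star type; match the fields in your own code.
typedef struct { double x, y, vx, vy, mass; } star_t;

// Each thread advances exactly one star's position by its velocity.
__global__ void update_positions(star_t* stars, int num_stars) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < num_stars) {  // guard: the last block may be partly empty
    stars[i].x += stars[i].vx * DT;
    stars[i].y += stars[i].vy * DT;
  }
}

// Host side: round the block count up so every star gets a thread,
// then copy positions back so the CPU can draw them.
void run_position_update(star_t* gpu_stars, star_t* cpu_stars, int num_stars) {
  int blocks = (num_stars + THREADS - 1) / THREADS;
  update_positions<<<blocks, THREADS>>>(gpu_stars, num_stars);
  cudaDeviceSynchronize();
  cudaMemcpy(cpu_stars, gpu_stars, sizeof(star_t) * num_stars,
             cudaMemcpyDeviceToHost);
}
```

The rounded-up block count is the standard way to cover num_stars threads when num_stars is not a multiple of the block size.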
The cudaDeviceSynchronize() function returns cudaSuccess if the kernel worked, and an error code if it failed. You may want to look at the CUDA error checking on this Stack Overflow page for some good approaches to error handling.
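The pattern commonly suggested there wraps every CUDA call in a macro that reports where a failure happened (the macro name here is arbitrary):

```c
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Wrap CUDA API calls to print the file and line of any failure, then exit.
#define CUDA_CHECK(call) do {                                  \
    cudaError_t err_ = (call);                                 \
    if (err_ != cudaSuccess) {                                 \
      fprintf(stderr, "CUDA error at %s:%d: %s\n",             \
              __FILE__, __LINE__, cudaGetErrorString(err_));   \
      exit(EXIT_FAILURE);                                      \
    }                                                          \
  } while (0)

// Usage: checking the synchronize after a launch catches kernel failures.
//   my_kernel<<<blocks, threads>>>(...);
//   CUDA_CHECK(cudaDeviceSynchronize());
```

Checking cudaDeviceSynchronize() this way surfaces errors from the preceding kernel launch, which otherwise fail silently.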
At this point your program should still work correctly. Do not move on until you have resolved any bugs in this stage.
Now that you have successfully moved some of the updates to the GPU, move the rest of the physics calculation over to the GPU. Again, don’t forget to copy updated values back from the GPU once your kernel completes.
Now that you have a working CUDA port of this simulation, it’s time to think about performance.
With GPUs, that usually means thinking about data movement.
Odds are you are copying your stars array back and forth more than necessary.
Think about when arrays need to be moved from the CPU to the GPU and vice versa;
remember that memory on the GPU remains accessible after a kernel finishes, so there’s no need to copy the data to the GPU again if it hasn’t changed on the CPU.
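A sketch of what the steady state might look like, with placeholder function names (none of these appear in the provided code), expressed as a commented outline:

```c
// On a click: the CPU array changed, so refresh the GPU copy once.
//   add_star(...);
//   copy_stars_to_gpu();       // host -> device, only when the data changes
//
// Every frame: the GPU owns the data between kernels.
//   run_physics_kernels();     // no host<->device copies in between
//   copy_positions_to_cpu();   // device -> host, once, so the GUI can draw
//   draw();
```

The point is that each frame needs exactly one device-to-host copy for drawing, and host-to-device copies happen only when the user adds a star.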
To earn full credit on this lab you must eliminate all unnecessary transfers of the stars array. You do not need to eliminate fields from the star struct in your copying; I will count the number of cudaMemcpy calls, not the number of bytes transferred.