HANDS ON | thepowerofthegpu

Coding With the GPU

CUDA

GPUs today offer enormous processing power, so it is natural to wonder whether they can be used for more than rendering video and running games. Programmers, for example, may want to put the GPU to work on their own programs. Today's programs are getting larger and more detailed, and one way to deal with that complexity is to offload work to the GPU. Here we have two examples of code that perform the same task: adding two arrays of about one million floats (1 << 20 elements) and checking the result. The first uses Nvidia's CUDA to run the addition on the GPU; the second is plain C++ without CUDA, so the CPU does all of the work. Each program prints its own elapsed time, so running the two executables lets us compare how long each approach takes.

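#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <iostream>
#include <cstdio>   // printf
#include <cstdlib>  // system
#include <math.h>
#include <cuda.h>
#include <cuda_device_runtime_api.h>
using namespace std;

// Kernel function to add the elements of two arrays.
// Written for a single-block launch: each thread starts at its own
// index within the block and steps forward by the block size.
__global__
void add(int n, float *x, float *y)
{
    int index = threadIdx.x;  // this thread's position in the block
    int stride = blockDim.x;  // number of threads in the block
    for (int i = index; i < n; i += stride)
    {
        y[i] = x[i] + y[i];
    }
}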

int main(void)
{
    // CUDA events time the whole run: allocation, initialization,
    // kernel execution, and the result check
    float elapsed = 0;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);

    int N = 1 << 20;  // about one million elements
    float *x, *y;

    // Allocate Unified Memory – accessible from CPU or GPU
    cudaMallocManaged(&x, N * sizeof(float));
    cudaMallocManaged(&y, N * sizeof(float));

    // Initialize both arrays on the host
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Run kernel on 1M elements on the GPU (1 block of 256 threads)
    add<<<1, 256>>>(N, x, y);

    // Wait for GPU to finish before accessing on host
    cudaDeviceSynchronize();

    // Check for errors (all values should be 3.0f)
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmax(maxError, fabs(y[i] - 3.0f));
    cout << "Max errors: " << maxError << endl;

    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    cudaEventElapsedTime(&elapsed, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    printf("The elapsed time in gpu was %.2f ms\n", elapsed);
    cout << endl;

    // Free memory
    cudaFree(x);
    cudaFree(y);

    system("pause");  // Windows only: keeps the console window open
    return 0;
}
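
The kernel's loop divides the array among the 256 threads of the single block: thread 0 handles elements 0, 256, 512, and so on, thread 1 handles 1, 257, 513, and so on. A minimal host-side sketch in plain C++ (not CUDA) can model that indexing and confirm that every element is visited exactly once. The function names strideCoverage and elementsPerThread are hypothetical helpers introduced for this illustration only, and the loops run sequentially on the CPU rather than in parallel.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: models the kernel's loop on the host.
// "Thread" t of `threads` visits t, t + threads, t + 2 * threads, ... below n.
// Returns how many times each element index was visited.
std::vector<int> strideCoverage(int n, int threads)
{
    assert(n >= 0 && threads > 0);
    std::vector<int> visits(n, 0);
    for (int t = 0; t < threads; ++t)            // plays the role of threadIdx.x
        for (int i = t; i < n; i += threads)     // stride plays the role of blockDim.x
            ++visits[i];
    return visits;
}

// Hypothetical helper: number of elements that thread t handles.
int elementsPerThread(int n, int threads, int t)
{
    assert(threads > 0 && t >= 0 && t < threads);
    if (t >= n)
        return 0;                                // this thread has nothing to do
    return (n - t + threads - 1) / threads;      // ceil((n - t) / threads)
}
```

With n = 1 << 20 and 256 threads, every entry returned by strideCoverage is 1 and elementsPerThread gives 4096 for each thread, so in the listing above each GPU thread adds 4096 pairs of floats.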

C++

#include <iostream>
#include <math.h>
#include <ctime>
#include <cstdlib>  // system
using namespace std;

// Add the elements of two arrays on the CPU, one element at a time
void add(int n, float *x, float *y)
{
    for (int i = 0; i < n; i++)
    {
        y[i] = x[i] + y[i];
    }
}

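int main()
{
    // clock() times the whole run: allocation, initialization,
    // the add itself, and the result check
    double start_s = clock();
    int N = 1 << 20;  // about one million elements
    float *x = new float[N];
    float *y = new float[N];

    // Initialize both arrays
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Run the add on 1M elements on the CPU
    add(N, x, y);

    // Check for errors (all values should be 3.0f)
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmax(maxError, fabs(y[i] - 3.0f));
    cout << "Max Errors: " << maxError << endl;

    double stop_s = clock();
    cout << "time (ms): " << (stop_s - start_s) / double(CLOCKS_PER_SEC) * 1000 << endl;

    // Free memory
    delete[] x;
    delete[] y;

    system("pause");  // Windows only: keeps the console window open

    return 0;
}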
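
Both listings time the entire run, including allocation, initialization, and the result check. The CPU version also uses clock(), which the C standard defines as processor time rather than wall-clock time. As a rough sketch, assuming one wants the wall-clock time of the add call alone, std::chrono::steady_clock could be used as below. The AddTiming struct and the timeAdd function are hypothetical names introduced for this example and are not part of the original programs.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <vector>

// Same element-wise add as in the CPU listing.
void add(int n, float *x, float *y)
{
    for (int i = 0; i < n; i++)
        y[i] = x[i] + y[i];
}

// Hypothetical result type: elapsed wall-clock time and the largest error found.
struct AddTiming
{
    double ms;
    float maxError;
};

// Hypothetical helper: times only the add() call, not setup or checking.
AddTiming timeAdd(int n)
{
    assert(n > 0);
    std::vector<float> x(n, 1.0f), y(n, 2.0f);

    auto start = std::chrono::steady_clock::now();
    add(n, x.data(), y.data());
    auto stop = std::chrono::steady_clock::now();

    // Every element should now be 3.0f
    float maxError = 0.0f;
    for (float v : y)
        maxError = std::max(maxError, std::fabs(v - 3.0f));

    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    return { ms, maxError };
}
```

Calling timeAdd(1 << 20) returns the time for the addition by itself. A single measurement like this is noisy, so repeating the call several times and comparing the results gives a steadier picture.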
