Introduction
Launching a kernel: start from the myFirstKernel template.
Part1: allocate device memory for the kernel's results, using the pointer d_a.
Part2: configure and launch the kernel using a 1-D grid of 1-D thread blocks.
Part3: have each thread set one element of d_a, as follows:
idx = blockIdx.x * blockDim.x + threadIdx.x
d_a[idx] = 1000 * blockIdx.x + threadIdx.x
Part4: copy the results from d_a back to the host pointer h_a.
Part5: verify that the results are correct.
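
The five parts can be sketched as one small program. This is a minimal illustration, not the original template: the helper names check and runAndVerify and the launch sizes (8 blocks of 8 threads) are assumptions chosen for the example. The kernel name myFirstKernel, the pointers d_a and h_a, and the index formula come from the description above. With 8 threads per block, thread 3 of block 2 computes idx = 2 * 8 + 3 = 19 and writes 1000 * 2 + 3 = 2003.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Part3: each thread writes exactly one element of d_a.
// No bounds check is needed here because the launch below creates
// exactly one thread per array element.
__global__ void myFirstKernel(int *d_a)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    d_a[idx] = 1000 * blockIdx.x + threadIdx.x;
}

// Hypothetical helper (not part of the template): stop on any CUDA error.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: runs Part1 to Part5 and reports whether the
// values copied back to the host match the expected pattern.
static bool runAndVerify(int numBlocks, int numThreadsPerBlock)
{
    const int n = numBlocks * numThreadsPerBlock;
    const size_t memSize = n * sizeof(int);

    int *h_a = (int *)malloc(memSize);   // host buffer for the results
    if (h_a == NULL) {
        fprintf(stderr, "host allocation failed\n");
        exit(EXIT_FAILURE);
    }

    // Part1: allocate device memory for the kernel's results.
    int *d_a = NULL;
    check(cudaMalloc((void **)&d_a, memSize), "cudaMalloc");

    // Part2: launch with a 1-D grid of 1-D thread blocks.
    dim3 dimGrid(numBlocks);
    dim3 dimBlock(numThreadsPerBlock);
    myFirstKernel<<<dimGrid, dimBlock>>>(d_a);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "kernel execution");

    // Part4: copy the results from d_a back to the host pointer h_a.
    check(cudaMemcpy(h_a, d_a, memSize, cudaMemcpyDeviceToHost), "cudaMemcpy");

    // Part5: verify every element against the formula used in the kernel.
    bool ok = true;
    for (int i = 0; i < numBlocks; i++) {
        for (int j = 0; j < numThreadsPerBlock; j++) {
            if (h_a[i * numThreadsPerBlock + j] != 1000 * i + j) {
                ok = false;
            }
        }
    }

    check(cudaFree(d_a), "cudaFree");
    free(h_a);
    return ok;
}

int main()
{
    // Launch sizes are assumptions for this sketch.
    bool ok = runAndVerify(8, 8);
    printf("%s\n", ok ? "Correct!" : "Mismatch found.");
    return ok ? 0 : 1;
}
```

The sketch should build with nvcc (for example, nvcc example.cu, where the file name is arbitrary) and needs a CUDA-capable device to run. The calls to cudaGetLastError and cudaDeviceSynchronize after the launch are there because kernel launches are asynchronous and do not report errors on their own.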