Introduction
Reverse array (single block) - given the input array {a0, a1, ..., an-1} at the pointer d_a, store the reversed array {an-1, an-2, ..., a0} at the pointer d_b
A: start from the "reverseArray_singleblock" template
B: launch only one thread block to reverse an array of size N = numThreads = 256 elements
C: part 1 (of 1): all you need to do is implement the body of the kernel reverseArrayBlock()
D: each thread moves a single element to its reversed position: it reads its input from d_a at its own index and writes it to d_b at the mirrored index
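
Steps A to D can be sketched as a small self-contained CUDA program. This is a minimal sketch and not the template's actual code: the kernel name reverseArrayBlock and the pointers d_a and d_b come from the description above, but the kernel's parameter order, the CUDA_CHECK macro, and the host helper reverseSingleBlock are assumptions made for illustration. It also assumes the array length equals the number of threads in the single block.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Kernel (parameter order is an assumption): each thread copies exactly one
// element. Valid only for a single block whose size equals the array length.
__global__ void reverseArrayBlock(int *d_b, const int *d_a)
{
    int in  = threadIdx.x;                   // index this thread reads in d_a
    int out = blockDim.x - 1 - threadIdx.x;  // mirrored index it writes in d_b
    d_b[out] = d_a[in];
}

// Hypothetical host helper: copies the input to the device, launches one
// block with one thread per element, and returns the reversed copy.
std::vector<int> reverseSingleBlock(const std::vector<int> &h_a)
{
    const int numThreads = static_cast<int>(h_a.size());
    // One block only, so the length must fit in a block (1024 on current GPUs).
    assert(numThreads > 0 && numThreads <= 1024);
    const size_t bytes = numThreads * sizeof(int);

    int *d_a = nullptr;
    int *d_b = nullptr;
    CUDA_CHECK(cudaMalloc(&d_a, bytes));
    CUDA_CHECK(cudaMalloc(&d_b, bytes));
    CUDA_CHECK(cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice));

    reverseArrayBlock<<<1, numThreads>>>(d_b, d_a);  // 1 block, N threads
    CUDA_CHECK(cudaGetLastError());

    std::vector<int> h_b(numThreads);
    // This blocking copy also waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    return h_b;
}

int main()
{
    const int N = 256;  // N = numThreads = 256, as in step B
    std::vector<int> h_a(N);
    for (int i = 0; i < N; ++i) h_a[i] = i;

    std::vector<int> h_b = reverseSingleBlock(h_a);

    // Verify on the host: element i of the output is element N-1-i of the input.
    for (int i = 0; i < N; ++i) {
        if (h_b[i] != h_a[N - 1 - i]) {
            std::fprintf(stderr, "Mismatch at index %d\n", i);
            return EXIT_FAILURE;
        }
    }
    std::printf("Array reversed correctly.\n");
    return EXIT_SUCCESS;
}
```

Because there is only one block, threadIdx.x alone identifies the element, and blockDim.x doubles as the array length. A version with several blocks would need blockIdx.x in both index computations, which this sketch deliberately leaves out.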