Introduction: if you run into usage problems, please look them up on Google yourself.
Reversing an array (multiple blocks):
Given an input array {a0, a1, ..., a(n-1)} pointed to by d_a, store the reversed array {a(n-1), a(n-2), ..., a0} at the pointer d_b.
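For reference, a minimal serial version of this operation on the host might look like the following. The function name reverseArrayHost is hypothetical and not part of the tutorial's template; it assumes int elements and is mainly useful for checking the GPU result.

```cuda
// Hypothetical host-side reference (not from the tutorial template).
// Writes b[n - 1 - i] = a[i] for every i, so b = {a(n-1), ..., a0}.
void reverseArrayHost(const int *a, int *b, int n)
{
    for (int i = 0; i < n; ++i) {
        b[n - 1 - i] = a[i];
    }
}
```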
A: Start from the "reverseArray_multiblock" template.
B: Launch multiple thread blocks of 256 threads each; to reverse an array of size N, use N / 256 blocks.
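The launch configuration from step B can be sketched as a small host helper. The name launchConfigFor and the constant kThreadsPerBlock are hypothetical; the sketch assumes, as step B implies, that N is a positive multiple of 256.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Step B: every block holds 256 threads.
const int kThreadsPerBlock = 256;

// Hypothetical helper: fills in the grid and block sizes for reversing n
// elements. Following step B there are n / 256 blocks, so n is assumed to
// be a positive multiple of 256 (checked with assert).
void launchConfigFor(int n, dim3 *grid, dim3 *block)
{
    assert(n > 0 && n % kThreadsPerBlock == 0);
    *grid  = dim3(n / kThreadsPerBlock); // number of blocks (gridDim.x)
    *block = dim3(kThreadsPerBlock);     // threads per block (blockDim.x)
}
```

With these values a launch would look like `reverseArrayBlock<<<grid, block>>>(d_b, d_a);`, where passing the output pointer first is an assumption about the template. Integer division silently drops any remainder, which is why the helper asserts that n divides evenly by 256.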
Part 1: compute the number of blocks to launch.
Part 2: implement the kernel reverseArrayBlock.
Note: each thread must now compute both
the reversed position within its block, and
the reversed offset to the start of its block.
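A minimal sketch of part 2, assuming int elements, the output pointer passed first (d_b, then d_a), and exactly N / 256 blocks of 256 threads. With B = blockDim.x, G = gridDim.x and N = G * B, the thread whose input index is in = blockIdx.x * B + threadIdx.x writes to N - 1 - in, which splits into the reversed block offset (G - 1 - blockIdx.x) * B plus the reversed position within the block (B - 1 - threadIdx.x). The host wrapper reverseOnDevice is hypothetical and only there to make the sketch runnable.

```cuda
#include <cuda_runtime.h>

// Each thread moves exactly one element from d_in to its mirrored slot in d_out.
__global__ void reverseArrayBlock(int *d_out, const int *d_in)
{
    // Global index of the element this thread reads.
    int in = blockIdx.x * blockDim.x + threadIdx.x;

    // Reversed offset to the start of the mirrored block.
    int outOffset = (gridDim.x - 1 - blockIdx.x) * blockDim.x;
    // Reversed position within that block.
    int outPos = blockDim.x - 1 - threadIdx.x;

    d_out[outOffset + outPos] = d_in[in];
}

// Hypothetical host wrapper: copies h_in to the device (d_a), launches
// n / 256 blocks of 256 threads, and copies the reversed result (d_b) back
// to h_out. Assumes n is a positive multiple of 256; returns false on error.
bool reverseOnDevice(const int *h_in, int *h_out, int n)
{
    const int threadsPerBlock = 256;
    if (n <= 0 || n % threadsPerBlock != 0) return false;

    size_t bytes = n * sizeof(int);
    int *d_a = nullptr;
    int *d_b = nullptr;
    if (cudaMalloc(&d_a, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_b, bytes) != cudaSuccess) {
        cudaFree(d_a);
        return false;
    }

    bool ok = cudaMemcpy(d_a, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        reverseArrayBlock<<<n / threadsPerBlock, threadsPerBlock>>>(d_b, d_a);
        ok = cudaGetLastError() == cudaSuccess; // catches launch errors
    }
    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    if (ok) {
        ok = cudaMemcpy(h_out, d_b, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_a);
    cudaFree(d_b);
    return ok;
}
```

Splitting the output index into a block term and a thread term keeps each thread's work to one read and one write, so no synchronization between threads is needed.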