You said that each thread has its own seed, which means each thread produces its own independent sequence of random numbers. With your kernel, uniformity is therefore achieved only *on a per-thread basis*: to see uniformly distributed numbers you would have to call the kernel many times. If instead you intend to call the kernel only once and expect `n` uniformly distributed values, each of the `n` threads should use **the same seed value** with **different sequence or offset values**. For detailed information with example code, you can look at **this article**.
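As a rough illustration of the "same seed, different sequence" idea, here is a minimal sketch. The kernel name `draw_uniform` and the host helper `generate_once` are hypothetical names chosen for this example, not taken from your code, and error checking is omitted for brevity. It assumes the device API from `curand_kernel.h`, where `curand_init` takes a seed, a sequence number, an offset, and a state pointer.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Every thread uses the SAME seed but a DIFFERENT sequence number
// (its global thread index), then draws a single value.
__global__ void draw_uniform(float *out, int n, unsigned long long seed)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;

    curandState_t localState;
    curand_init(seed, /*sequence=*/id, /*offset=*/0, &localState);
    out[id] = curand_uniform(&localState);  // value in (0.0, 1.0]
}

// Hypothetical host helper: launches the kernel once and returns n values.
// CUDA error checking is left out to keep the sketch short.
std::vector<float> generate_once(int n, unsigned long long seed)
{
    float *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    draw_uniform<<<blocks, threads>>>(d_out, n, seed);

    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

With this layout a single launch yields `n` values, one per thread, all taken from distinct subsequences of the same generator.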

Also, there's some subtlety with the return value of `curand_uniform`. The cuRAND documentation, §3.1.4, says:

```
__device__ float
curand_uniform (curandState_t *state)
```

This function returns a sequence of pseudorandom floats uniformly distributed between 0.0 and 1.0. It may return from 0.0 to 1.0, where **1.0 is included and 0.0 is excluded**.

And your code is:

`unsigned int x = curand_uniform(&localState) * n;`

Casting to an integer type truncates towards zero (**link**). So theoretically, you get `n` only if the value returned by `curand_uniform` is exactly `1.0`, which is a rare case. You get `k` (`0 < k < n`) when the return value, which I'll denote `y`, satisfies `k/n <= y < (k+1)/n`, and you get `0` when `0 < y < 1/n`. So the generated numbers are not uniformly distributed over the integers `0` to `n`: each of `0` through `n-1` comes up with probability of about `1/n`, while `n` almost never appears.
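The boundary behaviour of the truncating cast can be checked in isolation with a small helper. This is only a sketch: `map_by_cast` is a hypothetical name, and it simply wraps the same expression as your line of code so that it can be called with hand-picked values of `y`.

```cuda
// Same expression as `unsigned int x = curand_uniform(&localState) * n;`,
// with the float-to-integer conversion written out explicitly.
// For y in (0, 1] the result lies in 0..n, and n occurs only for y == 1.0.
__host__ __device__ inline unsigned int map_by_cast(float y, unsigned int n)
{
    return static_cast<unsigned int>(y * n);
}
```

For `n = 10`, a value such as `y = 0.55f` lands in bucket `5`, `y = 0.05f` lands in bucket `0`, and only `y = 1.0f` produces `10`.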

But note that this is all a theoretical explanation. I've **posted some sample code**. It does the same integer cast as your code and builds a histogram to show how the numbers are distributed. The output of the program is at the bottom of this answer, and you'll see that the histogram *looks* uniformly distributed over the integers `0` to `n-1`. The case where `curand_uniform` returns `1.0` seems to be really rare (it did not appear in `100000` trials!).

Theoretically, you'll get uniformly distributed integers **from** `1` to `n` if you use the ceiling function: if `(k-1)/n < y <= k/n`, then `ceilf(y * n)` is `k`, for `1 <= k <= n`.

`unsigned int x = ceilf(curand_uniform(&localState) * n);`
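The ceiling mapping can be wrapped the same way for experimentation. In this sketch both function names are hypothetical, and the second helper, which shifts the result down by one to obtain `0` to `n-1`, is my own assumption about what you may want rather than something from your code. The shift relies on `curand_uniform` never returning `0.0`, so the ceiling is always at least `1`.

```cuda
#include <cmath>

// Ceiling maps y in (0, 1] to 1..n: (k-1)/n < y <= k/n gives k.
__host__ __device__ inline unsigned int uniform_1_to_n(float y, unsigned int n)
{
    return static_cast<unsigned int>(ceilf(y * n));
}

// Assumption for illustration: subtracting one yields 0..n-1.
// Safe only because y > 0 guarantees the ceiling is >= 1.
__host__ __device__ inline unsigned int uniform_0_to_n_minus_1(float y, unsigned int n)
{
    return uniform_1_to_n(y, n) - 1u;
}
```

Here `y = 1.0f` with `n = 10` gives `10` from the first helper and `9` from the second, so the rare `1.0` return value no longer falls outside the intended range.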

The **code I posted** also covers the ceiling case. You can run it and see a result similar to this (`n = 10` in this case):

```
Histogram for the generated random numbers (with casting)
0 : 9993
1 : 9926
2 : 10138
3 : 10131
4 : 9980
5 : 9967
6 : 9979
7 : 10054
8 : 9921
9 : 9911
10 : 0

Histogram for the generated random numbers (with ceiling)
0 : 0
1 : 10036
2 : 10033
3 : 9899
4 : 9952
5 : 9960
6 : 9892
7 : 10167
8 : 9839
9 : 10198
10 : 10024
```

Note also that `state[id] = localState;` is redundant in your code.