I'm looking for a statistical technique to determine whether two sampled probability density functions (pdfs) are the same. I know I won't get an absolute answer, but I'd like some kind of confidence measure.

This is for a software regression test. The software runs a Monte Carlo simulation, so a random number generator is involved and the results differ from run to run. The software records 9 double-precision output variables after each iteration. The output variables are not independent, but for this purpose we can probably treat them as if they were. The values are almost always between 5 and 6. Each iteration takes a little over a minute, so 500 iterations is a solid overnight job.

The boss's suggestion was to compare the mean and standard deviation of each of the 9 output variables; if all of them match closely enough, the run passes the test. The problem is that if I set the bounds tight enough to be meaningful, I get a failure on the standard deviation of this variable:

Each of the red and green lines represents a group of 500 runs, put through gnuplot's "smooth frequency" algorithm. The only difference between the two groups is the initial random seed. Both came from the same version of the software, so I know for a fact that it hasn't regressed, and whatever test I put in place must pass. You can see that they look like the same pdf, yet the standard deviations differ by more than 10% (0.0972 vs. 0.0864).
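For concreteness, here's roughly what I understand the boss's check to be, as a numpy sketch. The data is synthetic (normal draws standing in for the real output variables), and the 5% tolerance is an arbitrary placeholder, not a number anyone has proposed:

```python
import numpy as np

def means_and_stds_match(baseline, candidate, rel_tol=0.05):
    """Compare per-variable means and standard deviations of two
    (n_iterations x n_variables) result arrays within a relative
    tolerance. The default 5% tolerance is a made-up placeholder."""
    baseline = np.asarray(baseline)
    candidate = np.asarray(candidate)
    means_ok = np.allclose(baseline.mean(axis=0),
                           candidate.mean(axis=0), rtol=rel_tol)
    stds_ok = np.allclose(baseline.std(axis=0, ddof=1),
                          candidate.std(axis=0, ddof=1), rtol=rel_tol)
    return means_ok and stds_ok

# Two synthetic groups of 500 runs x 9 output variables
rng = np.random.default_rng(42)
a = rng.normal(5.5, 0.09, size=(500, 9))
b = rng.normal(5.5, 0.09, size=(500, 9))
print(means_and_stds_match(a, b))
```

Even with both groups drawn from the same distribution, the sample standard deviations fluctuate enough at n=500 that a tight tolerance will occasionally fail on one of the 9 variables, which is exactly the problem above.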

Suppose I sort all 1000 numbers together, then walk through the sorted list keeping a running count of reds and greens. If the two samples come from the same pdf, the running difference between red_count and green_count should stay relatively small, right? How small?
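In case it helps, here's that idea as a sketch with synthetic data (normal draws standing in for the two groups of runs). The largest excursion of the running difference, divided by the per-group sample size, is the two-sample Kolmogorov-Smirnov statistic for equal-sized samples, so there may be standard theory to lean on for the "how small?" part:

```python
import numpy as np

rng = np.random.default_rng(0)
red = rng.normal(5.5, 0.09, 500)    # synthetic stand-ins for the two
green = rng.normal(5.5, 0.09, 500)  # groups of 500 runs

# Pool and sort all 1000 values, then walk through them keeping a
# running red_count - green_count (+1 for a red, -1 for a green).
values = np.concatenate([red, green])
labels = np.concatenate([np.ones(red.size), -np.ones(green.size)])
running = np.cumsum(labels[np.argsort(values)])

# Largest excursion of the running difference; dividing by 500 gives
# the two-sample Kolmogorov-Smirnov statistic D for equal group sizes.
max_gap = np.abs(running).max()
print(max_gap, max_gap / red.size)
```

For the asymptotic KS test at the 5% level, D would need to exceed about 1.36 * sqrt((n+m)/(n*m)), which for n = m = 500 is roughly 0.086, i.e. a running gap of about 43.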

Are higher statistical moments worth considering (skewness, kurtosis, etc.)?
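If it's worth looking at them, the higher sample moments are one-liners with scipy (again on a synthetic stand-in for one output variable); both are close to 0 in expectation for normally distributed data:

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(7)
sample = rng.normal(5.5, 0.09, 500)  # synthetic stand-in for one variable

print(skew(sample))      # near 0 for a symmetric distribution
print(kurtosis(sample))  # excess kurtosis; near 0 for a normal
```

A caveat I'd want checked: the sampling variance of these estimators grows with the moment order, so at 500 iterations they may be even noisier than the standard deviation already is.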

Any other ideas?