## Problem 352

**Forum rules**

As your posts will be visible to the general public you

are requested to be thoughtful in not posting anything

that might explicitly give away how to solve a particular problem.

This forum is NOT meant to discuss solution methods for a problem.

In particular don't post any code fragments or results.

Don't start begging others to give partial answers to problems

Don't ask for hints how to solve a problem

Don't start a new topic for a problem if there already exists one

Don't start begging others to give partial answers to problems

Don't ask for hints how to solve a problem

Don't start a new topic for a problem if there already exists one

See also the topics:

Don't post any spoilers

Comments, questions and clarifications about PE problems.

### Problem 352

Hi,

I have a question about the example given in the problem 352. The problem say that, with optimal strategy, the minimum average number of tests needed for 25 sheep is 4.155452. Also, the problem suggests that rules like "Test the mix of all 25 and that will give you about 60.35% of negative (so only one test is needed) and 39.65% positive (26 tests needed, 25 plus 1 already done)" could be used. Now... The "inefficient example" dividing in 5 groups of 5 gave us an average of 7.40198008 tests to chack 25 sheep. Now using this "inefficient" strategy PLUS the strategy of first test all 25 sheep, gives us 1*0.6035 + 7.40198008*0.3965 which is about 3.5384 (less than 4.155452) tests in average. In fact dividing in some other way (which I wont tell here in respect to the disclaim of the forum) one could get even a little less than that. Would´t that be the optimal strategy for this example?

Thanks and sorry for the english...

[]'s

I have a question about the example given in the problem 352. The problem say that, with optimal strategy, the minimum average number of tests needed for 25 sheep is 4.155452. Also, the problem suggests that rules like "Test the mix of all 25 and that will give you about 60.35% of negative (so only one test is needed) and 39.65% positive (26 tests needed, 25 plus 1 already done)" could be used. Now... The "inefficient example" dividing in 5 groups of 5 gave us an average of 7.40198008 tests to chack 25 sheep. Now using this "inefficient" strategy PLUS the strategy of first test all 25 sheep, gives us 1*0.6035 + 7.40198008*0.3965 which is about 3.5384 (less than 4.155452) tests in average. In fact dividing in some other way (which I wont tell here in respect to the disclaim of the forum) one could get even a little less than that. Would´t that be the optimal strategy for this example?

Thanks and sorry for the english...

[]'s

### Re: Problem 352

Isn't that all in the following sentences:

Although the scheme we have just described seems to be very efficient, it can still be improved considerably (always assuming that the test is sufficiently sensitive and that there are no adverse effects caused by mixing different samples). E.g.:

•We may start by running a test on a mixture of all the 25 samples. It can be verified that in about 60.35% of the cases this test will be negative, thus no more tests will be needed. Further testing will only be required for the remaining 39.65% of the cases.

•If we know that at least one animal in a group of 5 is infected and the first 4 individual tests come out negative, there is no need to run a test on the fifth animal (we know that it must be infected).

•We can try a different number of groups / different number of animals in each group, adjusting those numbers at each level so that the total expected number of tests will be minimised.

Your task is to find the best strategy.•We may start by running a test on a mixture of all the 25 samples. It can be verified that in about 60.35% of the cases this test will be negative, thus no more tests will be needed. Further testing will only be required for the remaining 39.65% of the cases.

•If we know that at least one animal in a group of 5 is infected and the first 4 individual tests come out negative, there is no need to run a test on the fifth animal (we know that it must be infected).

•We can try a different number of groups / different number of animals in each group, adjusting those numbers at each level so that the total expected number of tests will be minimised.

### Re: Problem 352

hk - I think you misunderstood; ninguem was saying the stated optimal value of 4.155452 is not correct.

ninguem - your calculations are incorrect. If the first test of all 25 comes back positive, running the first strategy no longer results in the same average value due to the fact you no longer have the chance all sheep are negative.

ninguem - your calculations are incorrect. If the first test of all 25 comes back positive, running the first strategy no longer results in the same average value due to the fact you no longer have the chance all sheep are negative.

### Re: Problem 352

Also, the expression 1*0.6035 + 7.40198008*0.3965 as the expected number of tests was wrong for another reason. It should have been either 1*0.6035 + 8.40198008*0.3965 or 1 + 7.40198008*0.3965. The cost of the first test must be borne whether it is positive or not.TripleM wrote:hk - I think you misunderstood; ninguem was saying the stated optimal value of 4.155452 is not correct.

ninguem - your calculations are incorrect. If the first test of all 25 comes back positive, running the first strategy no longer results in the same average value due to the fact you no longer have the chance all sheep are negative.

_{Jaap's Puzzle Page}

### Re: Problem 352

Thanks for the reply TripleM,

Yes, I see that there would be a difference in the average AFTER the result come out positive, but I guess that would help EVEN more to reduce the number of tests needed (in the case we need to test all 25), since that is a PLUS information, the average would be even LESS than 7.4...

Thank you!

PS.: I´m not very good at problems (as you can see hehehehe) but I like to try a lot!! And, for me, to understand the problems is as fun as to solve then %-)

[]´s

Yes, I see that there would be a difference in the average AFTER the result come out positive, but I guess that would help EVEN more to reduce the number of tests needed (in the case we need to test all 25), since that is a PLUS information, the average would be even LESS than 7.4...

Thank you!

PS.: I´m not very good at problems (as you can see hehehehe) but I like to try a lot!! And, for me, to understand the problems is as fun as to solve then %-)

[]´s

### Re: Problem 352

Thanks also to you jaap,

Sorry, my bad... You are right about 1*0.6035 + 8.40198008*0.3965 being the correct one since the possibilities are 1 or 1+average. But... still that gives us 3.9349 (still less than 4.15). And, as I said to TripleM, it seems to me that, even the average being incorrect AFTER the first test is true, that would reduce even more the overall average ...

Anyway I´m learning a lot here! Thank you all!!

[]'s

Sorry, my bad... You are right about 1*0.6035 + 8.40198008*0.3965 being the correct one since the possibilities are 1 or 1+average. But... still that gives us 3.9349 (still less than 4.15). And, as I said to TripleM, it seems to me that, even the average being incorrect AFTER the first test is true, that would reduce even more the overall average ...

Anyway I´m learning a lot here! Thank you all!!

[]'s

### Re: Problem 352

If the first test (with all 25 samples) came out positive, then that will make the strategy for testing the groups of 5 less efficient. This is because the probability of having all 5 in a batch negative is smaller (you know there are positive ones amongst the 25), so you are more likely to need to test all 5 individually. Therefore the strategy that previously had an average of 7.4 tests in the general case will now (applied only to cases with at least one positive in the 25) have a higher average.ninguem wrote:Yes, I see that there would be a difference in the average AFTER the result come out positive, but I guess that would help EVEN more to reduce the number of tests needed (in the case we need to test all 25), since that is a PLUS information, the average would be even LESS than 7.4...

_{Jaap's Puzzle Page}

### Re: Problem 352

There is is!!

Right jaar, you are right!!! Thank you so much! %-)

Combining that answer with the one by TripleM and hk, I understood the problem!!! Thank you all for the replies!

As I said, I wont try this problem anymore because of those informations I got here (to be fair) my rule is to solve the problems without any help whatsoever (no forums, no books, no consulting, noting...) All the problems I solve up to now (starting from February 2011) I did using formulas derived by myself and the 1 min rule.

Thank you again for the information guys. It is aways awesome to learn from the more experienced !! Thanks!

Right jaar, you are right!!! Thank you so much! %-)

Combining that answer with the one by TripleM and hk, I understood the problem!!! Thank you all for the replies!

As I said, I wont try this problem anymore because of those informations I got here (to be fair) my rule is to solve the problems without any help whatsoever (no forums, no books, no consulting, noting...) All the problems I solve up to now (starting from February 2011) I did using formulas derived by myself and the 1 min rule.

Thank you again for the information guys. It is aways awesome to learn from the more experienced !! Thanks!

### Re: Problem 352

My code makes sense to me and agrees with as far as I can calculate, but does not agree with the 25-sheep value given. My values are WAY too low. ?? Would it be too much to ask for a smaller check? Like say the optimum expected value for 10 sheep or so??

### Re: Problem 352

I'm having a lot of problems with this one. I get the problem, I'm just not sure how 'optimal' the solution is that we're trying to get.

The example says divide into group once. The optimization says do a test on all sheep first. So does the strategy mean we're always testing the whole sample first, and then only once split them up in groups; OR the mention of levels includes a recursive solution, like take 25 sheep, test all, split into 13 and 12, test the 13, split into 7 and 6, test 7, test 6, split 12 into 6 and 6, test 6 and test the last 6?

My problem is that when there's one division into groups I get answers that are much higher than 4.15 (but obviously lower than 7.4), but when I do a full recursive solution, I get an answer that's much lower than 4.15. I can't figure out which one I'm trying to fix. Some clarification would be nice.

The example says divide into group once. The optimization says do a test on all sheep first. So does the strategy mean we're always testing the whole sample first, and then only once split them up in groups; OR the mention of levels includes a recursive solution, like take 25 sheep, test all, split into 13 and 12, test the 13, split into 7 and 6, test 7, test 6, split 12 into 6 and 6, test 6 and test the last 6?

My problem is that when there's one division into groups I get answers that are much higher than 4.15 (but obviously lower than 7.4), but when I do a full recursive solution, I get an answer that's much lower than 4.15. I can't figure out which one I'm trying to fix. Some clarification would be nice.

"Nothing in this world that's worth having comes easy"

### Re: Problem 352

You can perform whatever strategy you like, as long as it satisfies the one restriction mentioned in the problem. The methods described in the problem are solely examples.

### Re: Problem 352

I still keep getting results that are much better than T(25, .02) = 4.15

Maybe I'm thinking about this wrong.

To me when testing 25 sheep at 2% popularity, it's clearly recursive like this:

25 into 16 and 9

16 into 4 x 4

4 into 2 x 2

9 into 3 x 3

3 into 2 and 1

I can't post results, but what I get is much less than 4.15.

Without the cool optimizations like what to do when the sample size doesn't evenly divide into the group size and assuming last positive if all previous negative in a definitely positive sample,

the math goes like this:

1 + (1 - (.98)^25) * ( // 1 group of 25

1 + (1 - (.98)^16) * ( // 1 group of 16

4 + 4 * (1 - (.98)^4) * ( // 4 groups of 4

2 + 2 * (1 - (.98)^2) * (1.02) // 2 groups of 2

)

) +

1 + (1 - (.98)^9) * ( // 1 group of 9

3 + 3 * (1 - (.98)^3) * ( // 3 groups of 3

1 + (1 - .98^2) * (1.02) + // 1 group of 2

1 // and 1 group of 1

)

)

)

which is ~2.5

Note: 1.02 is one test for the first sheep and the 2% chance that we still need to test the second one.

I don't get it. Further optimizing brings the expected number down even more. Where's the error in my logic?

Maybe I'm thinking about this wrong.

To me when testing 25 sheep at 2% popularity, it's clearly recursive like this:

25 into 16 and 9

16 into 4 x 4

4 into 2 x 2

9 into 3 x 3

3 into 2 and 1

I can't post results, but what I get is much less than 4.15.

Without the cool optimizations like what to do when the sample size doesn't evenly divide into the group size and assuming last positive if all previous negative in a definitely positive sample,

the math goes like this:

1 + (1 - (.98)^25) * ( // 1 group of 25

1 + (1 - (.98)^16) * ( // 1 group of 16

4 + 4 * (1 - (.98)^4) * ( // 4 groups of 4

2 + 2 * (1 - (.98)^2) * (1.02) // 2 groups of 2

)

) +

1 + (1 - (.98)^9) * ( // 1 group of 9

3 + 3 * (1 - (.98)^3) * ( // 3 groups of 3

1 + (1 - .98^2) * (1.02) + // 1 group of 2

1 // and 1 group of 1

)

)

)

which is ~2.5

Note: 1.02 is one test for the first sheep and the 2% chance that we still need to test the second one.

I don't get it. Further optimizing brings the expected number down even more. Where's the error in my logic?

"Nothing in this world that's worth having comes easy"

### Re: Problem 352

Your calculations are incorrect for the same reason as discussed earlier in this thread. I'm not really going to say more as this is very bordering on discussing solutions/approaches to the problem.

### Re: Problem 352

Ah... Thanks TripleM!

So you're saying before the first test it's just a Binomial(p = .02) and with the new information the distribution gets adjusted because PDF(0) = 0

To all:

I just have no clue how to adjust it. I mean I can plot it in R, but lack the knowledge to express it in a computable way.

Any hint is appreciated.

P.S. I want the posterior so I can compute the equivalent p and do a recursive call.

Edit: The posterior p has a formula. It's not very intuitive, but it is simple.

So you're saying before the first test it's just a Binomial(p = .02) and with the new information the distribution gets adjusted because PDF(0) = 0

To all:

I just have no clue how to adjust it. I mean I can plot it in R, but lack the knowledge to express it in a computable way.

Any hint is appreciated.

P.S. I want the posterior so I can compute the equivalent p and do a recursive call.

Edit: The posterior p has a formula. It's not very intuitive, but it is simple.

Last edited by mctrafik on Tue Oct 11, 2011 8:24 pm, edited 1 time in total.

"Nothing in this world that's worth having comes easy"

### Re: Problem 352

Well this is fun. First problem I've sat down to tackle in a while and I've got a great algorithm going that I know is correct (both T(25,0.02) and T(25,0.10) are spot on the nose, and tactically it makes perfect sense). The only problem is that when I do the full problem, I get an invalid answer. When I run the same exact code on a different machine, I get different values (margin of error around +-12)! Turns out massive rounding errors in doubles are a pain in the butt to fix. Time to figure a way to simplify things to make it numerically stable.

Edit: Oops. Looks like my concurrency wasn't correctly locking the critical section. Works now!

Edit: Oops. Looks like my concurrency wasn't correctly locking the critical section. Works now!

ex ~100%'er... until the gf came along.

### Re: Problem 352

Finally managed to get the correct values for the examples given in the problem. Now I have to figure out how to scale it for s = 10000.

puzzle is a euphemism for lack of clarity

### Re: Problem 352

Still can't get the initial values.

I must be doing something wrong.

I keep getting T(25, .02) = 4.12... and T(25, .10) = 12.39...

My thinking goes I only need to take care of two possibilities:

1 - Where we don't know if any are infected *uniform*

2 - Where we know at least one is infected

The first case is straight forward, but for the second case I literally go through all the possibilities: WhatIfOneIsInfected * ChanceOnlyOneInfected ... and sum them up.

But.. not only I highly doubt this method will scale to 10k I get wrong answers. Is there something wrong with my thinking, or should I look for subtle bugs in code?

P.S. This problem is totaling more than 25 hours already... sigh.

Edit: 50 hours into this problem. Turns out Sum WhatIfKInfected * ChanceOnlyEInfected has a closed solution, and I wasn't getting the values I expected because of a bug in the code. Still haven't solved it though.

I must be doing something wrong.

I keep getting T(25, .02) = 4.12... and T(25, .10) = 12.39...

My thinking goes I only need to take care of two possibilities:

1 - Where we don't know if any are infected *uniform*

2 - Where we know at least one is infected

The first case is straight forward, but for the second case I literally go through all the possibilities: WhatIfOneIsInfected * ChanceOnlyOneInfected ... and sum them up.

But.. not only I highly doubt this method will scale to 10k I get wrong answers. Is there something wrong with my thinking, or should I look for subtle bugs in code?

P.S. This problem is totaling more than 25 hours already... sigh.

Edit: 50 hours into this problem. Turns out Sum WhatIfKInfected * ChanceOnlyEInfected has a closed solution, and I wasn't getting the values I expected because of a bug in the code. Still haven't solved it though.

Last edited by mctrafik on Tue Oct 11, 2011 8:26 pm, edited 1 time in total.

"Nothing in this world that's worth having comes easy"

### Re: Problem 352

I don't think it's a buster of a clue, but the second case is what threw me off for a long time. Turns out I was calculating it wrong -- there's actually a direct formula for calculating it. Can't give any more hints than that. Keep trying at it & good luck!mctrafik wrote:My thinking goes I only need to take care of two possibilities:

1 - Where we don't know if any are infected *uniform*

2 - Where we know at least one is infected

The first case is straight forward, but for the second case I literally go through all the possibilities: WhatIfOneIsInfected * ChanceOnlyOneInfected ... and sum them up.

But.. not only I highly doubt this method will scale to 10k I get wrong answers. Is there something wrong with my thinking, or should I look for subtle bugs in code?

ex ~100%'er... until the gf came along.

### Re: Problem 352

Thank you

<-- doubting everything at this point.

Sorry for the stupid question though, but what do you mean by "it"? The expected value? Posterior probability of any single one being infected? I have a formula, too. It's just got capital sigmas and pis in it. Also, isn't a direct formula impossible when every call is recursive?quilan wrote: the second case is what threw me off ... there's actually a direct formula for calculating it.

<-- doubting everything at this point.

"Nothing in this world that's worth having comes easy"

### Re: Problem 352

The second case you listed above -- the probability of a subgroup being/not-being infected, knowing that the group itself tested positive. It can be done by a very direct calculation which wasn't the one I originally thought. That threw off all of my values for the longest time even though I had the right idea.mctrafik wrote:Thank you

Sorry for the stupid question though, but what do you mean by "it"? The expected value? Posterior probability of any single one being infected? I have a formula, too. It's just got capital sigmas and pis in it. Also, isn't a direct formula impossible when every call is recursive?quilan wrote: the second case is what threw me off ... there's actually a direct formula for calculating it.

<-- doubting everything at this point.

ex ~100%'er... until the gf came along.