Set containing max number of subsets

 Posts: 18
 Joined: Wed Apr 22, 2009 10:46 am
Set containing max number of subsets
Let's say we have the set of numbers [1..100] and some subsets with size<=5.
{1} , {25}, {2,30}, {1,2,3}, {3, 20, 50}, {40,50,60,70}, {1,2,3,4,5}....
Is it some efficient method to find a set of size=20 το contain the most sets?
{1} , {25}, {2,30}, {1,2,3}, {3, 20, 50}, {40,50,60,70}, {1,2,3,4,5}....
Is it some efficient method to find a set of size=20 το contain the most sets?
Re: Set containing max number of subsets
Doesn't have every set of size n have the same number of subsets? (2^n)

 Posts: 18
 Joined: Wed Apr 22, 2009 10:46 am
Re: Set containing max number of subsets
Yes.hk wrote:Doesn't have every set of size n have the same number of subsets? (2^n)
I mean we have some (not all) singles, doubles, triples, 4uples and 5uples
{1} , {25}, {2,30}, {1,2,3}, {3, 20, 50}, {40,50,60,70}, {1,2,3,4,5} ...(they are less than Cnk(100, 1)+Cnk(100, 2)+Cnk(100, 3)+Cnk(100, 4)+Cnk(100, 5)
And I want a set or size=20 that contains the most the above subsets
Re: Set containing max number of subsets
Hard to tell if you don't specify the rules that make a subset a valid subset.
Re: Set containing max number of subsets
I gather that what albert_nik is looking for is the following:
You are given the set {1,2,...,100}, and some subsets of it, of size ≤ 5.
The task is to find a subset of size 20 that contain the most of the given subsets.
It looks similar to a knapsack/bin packing problem.The task is to find a subset of size 20 that contain the most of the given subsets.
Re: Set containing max number of subsets
This could be done with dynamic programming, for some cases.
The state would be the set built so far and the score so far (number of target subsets included).
Start with the empty set, which has a score of 0.
Iterate over the list of target subsets. For each state, calculate the union of the set and the current target subset, and add one to the score. If the size of the union is greater than 20, discard it. Otherwise add it to the collection.
Optionally, you can prune out any state with an equal solution set but a lower score than another.
At the end, the state(s) with the highest score are the answer(s).
The state would be the set built so far and the score so far (number of target subsets included).
Start with the empty set, which has a score of 0.
Iterate over the list of target subsets. For each state, calculate the union of the set and the current target subset, and add one to the score. If the size of the union is greater than 20, discard it. Otherwise add it to the collection.
Optionally, you can prune out any state with an equal solution set but a lower score than another.
At the end, the state(s) with the highest score are the answer(s).

 Posts: 18
 Joined: Wed Apr 22, 2009 10:46 am
Re: Set containing max number of subsets
I tried starting from empty set, adding subset A, then B, A∪B, C , A∪C, B∪C , A∪B∪C, D ... removing duplicates(same unions with same number of subsets) and those with size >20.
This can work for a few subsets, but the list grows rapidly.
Checking each of 100!/(80!*20!) subsets is impossible also.
This can work for a few subsets, but the list grows rapidly.
Checking each of 100!/(80!*20!) subsets is impossible also.
Re: Set containing max number of subsets
Doing it this way doubles the number of combinations at each stage, apart from those you can filter out for being too large.albert_nik wrote:I tried starting from empty set, adding subset A, then B, A∪B, C , A∪C, B∪C , A∪B∪C, D ... removing duplicates(same unions with same number of subsets) and those with size >20.
This can work for a few subsets, but the list grows rapidly.
I would do it in a different order  first the empty set, then every single subset (A, B, C, ...), then every combination of two subsets (A∪B, A∪C, B∪C, A∪D, ...), then every combination of three subsets, and so on.
After each stage you can discard the previous one, since they are no longer needed.
If you have n subsets, the stage number r will have at most nCr combinations that need to be considered. This is worst at around r=n/2, but you may find that by then many of them are pruned away already for being too large.
I have tried this out, coding it up in C#. Here is the main routine (for brevity I have left out some utility functions and the implementation of the classes MySet and PotentialAnswer).
Code: Select all
static void Main()
{
// Generate some random subsets, but filter out any duplicates
MySet[] subSets = generateSubSets();
foreach (MySet subset in subSets) Console.WriteLine(subset);
HashSet<PotentialAnswer> answers1 = new HashSet<PotentialAnswer>();
HashSet<PotentialAnswer> answers2 = new HashSet<PotentialAnswer>();
// start with the empty set
answers1.Add(new PotentialAnswer());
int size = 0;
while(true)
{
// try to add one more subset to any of the current potential answers
int index = 0;
foreach (MySet subSet in subSets)
{
foreach (PotentialAnswer potentialAnswer in answers1)
{
PotentialAnswer newAnswer = PotentialAnswer.CreatePotentialAnswer(potentialAnswer, subSet, index);
// store it if it is valid
if (newAnswer != null && newAnswer.Count <= 20)
{
answers2.Add(newAnswer);
}
}
index++;
}
// if have not found better answers, we are done.
if (answers2.Count == 0) break;
size++;
Console.WriteLine(size+": "+answers2.Count);
// replace previous answers by improved ones
answers1.Clear();
var t = answers2;
answers2 = answers1;
answers1 = t;
}
foreach (PotentialAnswer answer in answers1)
{
Console.WriteLine(answer);
}
}
This works very well for, say, 20 subsets:
Code: Select all
11 55 89
21
38 57 86
59 95
42
4
79
74
10 57 72 82
61
38 57 58 93
16 20 67 73
11 16 39 70
24 95
44 86
12 62
77
1 6 46 59
25 41
35 98
1: 20
2: 190
3: 1140
4: 4845
5: 15504
6: 38760
7: 77294
8: 120594
9: 134237
10: 91958
11: 33734
12: 5717
13: 351
14: 4
01111111011001111010 4 12 21 24 25 38 41 42 44 57 58 59 61 62 74 77 79 86 93 95
01111111011001111001 4 12 21 24 35 38 42 44 57 58 59 61 62 74 77 79 86 93 95 98
01111111011001101011 4 21 24 25 35 38 41 42 44 57 58 59 61 74 77 79 86 93 95 98
01111111010001111011 4 12 21 24 25 35 38 41 42 44 57 59 61 62 74 77 79 86 95 98
Code: Select all
65
35 63 72
8 75 84 96
29
56 77 89
7 21
29 38 67 98
12 30 56
14 40 76
31
44 64
15 59
64 72
40 68 90
64 82 90 98
1 15 36 80
88
22
12 19 94
12 27 31 44
5
47
33 56 76
61 83
15 21 80
55
48
10 47
51 53
1: 29
2: 406
3: 3654
4: 23751
5: 118755
6: 474874
7: 1545720
8: 3989207
9: 7559421
10: 9590741
11: 7474127
12: 3249174
13: 696534
14: 61391
15: 1670
16: 5
11010100011110001100110011111 5 7 10 15 21 22 29 31 35 44 47 48 55 59 63 64 65 72 80 88
10010100011110011100110011111 1 5 7 10 15 21 22 29 31 36 44 47 48 55 59 64 65 72 80 88
10010100011110001101110011111 5 7 10 12 15 21 22 27 29 31 44 47 48 55 59 64 65 72 80 88
10010100011110001100110111111 5 7 10 15 21 22 29 31 44 47 48 55 59 61 64 65 72 80 83 88
10010100011110001100110011111 5 7 10 15 21 22 29 31 44 47 48 51 53 55 59 64 65 72 80 88
_{Jaap's Puzzle Page}

 Posts: 18
 Joined: Wed Apr 22, 2009 10:46 am
Re: Set containing max number of subsets
Thank You jaap.
The idea was from a lottery game (keno).
There are 80 numbers [1..80] and are selected randomly 20 numbers.
A player can select 1 to 8 numbers.
If the player's numbers are all included in the 20 he wins.
There are N players that have selected some sets with 1<=size<=8. The task is to find 20 numbers that maximize the number of winners. I dont know if there is a different approach to this problem.
The idea was from a lottery game (keno).
There are 80 numbers [1..80] and are selected randomly 20 numbers.
A player can select 1 to 8 numbers.
If the player's numbers are all included in the 20 he wins.
There are N players that have selected some sets with 1<=size<=8. The task is to find 20 numbers that maximize the number of winners. I dont know if there is a different approach to this problem.
Re: Set containing max number of subsets
I did not quite understand the question. In particular, is the output set a set of integers or a set of set of integers ? Since you are talking about keno I'm assuming it is a set of integers. In that case just put all the numbers in a set, if it is less than 20 then add random numbers (that aren't already in it) otherwise remove the integer that appears in the less number of subsets and repeat until you reach 20.
Please tell me if I have misunderstood the question.
Please tell me if I have misunderstood the question.
Re: Set containing max number of subsets
The problem is that the algorithm you propose (essentially a kind of Greedy Algorithm) will not always produce the optimal solution. At best it always gives a kind of local optimum rather than a global optimum.ggoyo wrote:I did not quite understand the question. In particular, is the output set a set of integers or a set of set of integers ? Since you are talking about keno I'm assuming it is a set of integers. In that case just put all the numbers in a set, if it is less than 20 then add random numbers (that aren't already in it) otherwise remove the integer that appears in the less number of subsets and repeat until you reach 20.
Please tell me if I have misunderstood the question.
For example, consider the first randomly generated example I gave. The sets are (one set on each line):
Code: Select all
11 55 89
21
38 57 86
59 95
42
4
79
74
10 57 72 82
61
38 57 58 93
16 20 67 73
11 16 39 70
24 95
44 86
12 62
77
1 6 46 59
25 41
35 98
There are only 4 optimal solutions:
Code: Select all
4 12 21 24 25 38 41 42 44 57 58 59 61 62 74 77 79 86 93 95
4 12 21 24 35 38 42 44 57 58 59 61 62 74 77 79 86 93 95 98
4 21 24 25 35 38 41 42 44 57 58 59 61 74 77 79 86 93 95 98
4 12 21 24 25 35 38 41 42 44 57 59 61 62 74 77 79 86 95 98
_{Jaap's Puzzle Page}
Re: Set containing max number of subsets
Yes, you are right, I was too fast answering.jaap wrote: The problem is that the algorithm you propose (essentially a kind of Greedy Algorithm) will not always produce the optimal solution. At best it always gives a kind of local optimum rather than a global optimum.
jaap wrote: All of these solutions use the numbers 4 21 24 42 44 61 74 77 79, which are used only once in the sets. How would you know to keep those? None of these optimal solutions use 11 or 16, and yet these are used twice in the sets, so how would you know to throw those out?
All these numbers have one thing in common : throwing them out cost one set but that set only contains 1 number which can be thrown out. However if I throw 1 I'd be able to throw 6 and 46 for free.
I could sort it by (1/S) where S is the sum of (1/number of appearances) of each number in each set the number appears in. For instance, in your exemple, 1 would be weighted 2/7 but 4 would be weighted 1.
I coded it in haskell (real nice for list manipulation) and it outputs your last solution (not a proof).