Set containing max number of subsets

Arrangements, combinations and permutations, probability, ...
Post Reply
albert_nik
Posts: 18
Joined: Wed Apr 22, 2009 10:46 am

Set containing max number of subsets

Post by albert_nik » Sun May 25, 2014 11:45 am

Let's say we have the set of numbers [1..100] and some subsets with size<=5.
{1} , {25}, {2,30}, {1,2,3}, {3, 20, 50}, {40,50,60,70}, {1,2,3,4,5}....
Is it some efficient method to find a set of size=20 το contain the most sets?

User avatar
hk
Administrator
Posts: 10403
Joined: Sun Mar 26, 2006 9:34 am
Location: Haren, Netherlands

Re: Set containing max number of subsets

Post by hk » Sun May 25, 2014 2:10 pm

Doesn't have every set of size n have the same number of subsets? (2^n)
Image

albert_nik
Posts: 18
Joined: Wed Apr 22, 2009 10:46 am

Re: Set containing max number of subsets

Post by albert_nik » Sun May 25, 2014 2:31 pm

hk wrote:Doesn't have every set of size n have the same number of subsets? (2^n)
Yes.
I mean we have some (not all) singles, doubles, triples, 4-uples and 5-uples
{1} , {25}, {2,30}, {1,2,3}, {3, 20, 50}, {40,50,60,70}, {1,2,3,4,5} ...(they are less than Cnk(100, 1)+Cnk(100, 2)+Cnk(100, 3)+Cnk(100, 4)+Cnk(100, 5)

And I want a set or size=20 that contains the most the above subsets

User avatar
hk
Administrator
Posts: 10403
Joined: Sun Mar 26, 2006 9:34 am
Location: Haren, Netherlands

Re: Set containing max number of subsets

Post by hk » Sun May 25, 2014 7:23 pm

Hard to tell if you don't specify the rules that make a subset a valid subset.
Image

User avatar
kevinsogo
Administrator
Posts: 1204
Joined: Thu Sep 16, 2010 3:39 am
Location: Manila, Philippines

Re: Set containing max number of subsets

Post by kevinsogo » Mon May 26, 2014 3:34 am

I gather that what albert_nik is looking for is the following:
You are given the set {1,2,...,100}, and some subsets of it, of size &le; 5.

The task is to find a subset of size 20 that contain the most of the given subsets.
It looks similar to a knapsack/bin packing problem.

thundre
Posts: 356
Joined: Sun Mar 27, 2011 9:01 am

Re: Set containing max number of subsets

Post by thundre » Mon May 26, 2014 5:26 pm

This could be done with dynamic programming, for some cases.

The state would be the set built so far and the score so far (number of target subsets included).

Start with the empty set, which has a score of 0.

Iterate over the list of target subsets. For each state, calculate the union of the set and the current target subset, and add one to the score. If the size of the union is greater than 20, discard it. Otherwise add it to the collection.

Optionally, you can prune out any state with an equal solution set but a lower score than another.

At the end, the state(s) with the highest score are the answer(s).
Image

albert_nik
Posts: 18
Joined: Wed Apr 22, 2009 10:46 am

Re: Set containing max number of subsets

Post by albert_nik » Tue May 27, 2014 6:25 am

I tried starting from empty set, adding subset A, then B, A∪B, C , A∪C, B∪C , A∪B∪C, D ... removing duplicates(same unions with same number of subsets) and those with size >20.
This can work for a few subsets, but the list grows rapidly.
Checking each of 100!/(80!*20!) subsets is impossible also.

User avatar
jaap
Posts: 538
Joined: Tue Mar 25, 2008 3:57 pm
Contact:

Re: Set containing max number of subsets

Post by jaap » Tue May 27, 2014 9:55 am

albert_nik wrote:I tried starting from empty set, adding subset A, then B, A∪B, C , A∪C, B∪C , A∪B∪C, D ... removing duplicates(same unions with same number of subsets) and those with size >20.
This can work for a few subsets, but the list grows rapidly.
Doing it this way doubles the number of combinations at each stage, apart from those you can filter out for being too large.
I would do it in a different order - first the empty set, then every single subset (A, B, C, ...), then every combination of two subsets (A∪B, A∪C, B∪C, A∪D, ...), then every combination of three subsets, and so on.
After each stage you can discard the previous one, since they are no longer needed.
If you have n subsets, the stage number r will have at most nCr combinations that need to be considered. This is worst at around r=n/2, but you may find that by then many of them are pruned away already for being too large.

I have tried this out, coding it up in C#. Here is the main routine (for brevity I have left out some utility functions and the implementation of the classes MySet and PotentialAnswer).

Code: Select all

      static void Main()
      {
         // Generate some random subsets, but filter out any duplicates
         MySet[] subSets = generateSubSets();
         foreach (MySet subset in subSets) Console.WriteLine(subset);

         HashSet<PotentialAnswer> answers1 = new HashSet<PotentialAnswer>();
         HashSet<PotentialAnswer> answers2 = new HashSet<PotentialAnswer>();
         // start with the empty set
         answers1.Add(new PotentialAnswer());

         int size = 0;
         while(true)
         {
            // try to add one more subset to any of the current potential answers
            int index = 0;
            foreach (MySet subSet in subSets)
            {
               foreach (PotentialAnswer potentialAnswer in answers1)
               {
                  PotentialAnswer newAnswer = PotentialAnswer.CreatePotentialAnswer(potentialAnswer, subSet, index);
                  // store it if it is valid
                  if (newAnswer != null && newAnswer.Count <= 20)
                  {
                     answers2.Add(newAnswer);
                  }
               }
               index++;
            }
            // if have not found better answers, we are done.
            if (answers2.Count == 0) break;
            size++;

            Console.WriteLine(size+": "+answers2.Count);
            
            // replace previous answers by improved ones
            answers1.Clear();
            var t = answers2;
            answers2 = answers1;
            answers1 = t;
         }

         foreach (PotentialAnswer answer in answers1)
         {
            Console.WriteLine(answer);
         }
      }
Note that different combinations of subsets may have the same union, so the PotentialAnswer class also keeps track of which subsets were used in its formation.

This works very well for, say, 20 subsets:

Code: Select all

11 55 89
21
38 57 86
59 95
42
4
79
74
10 57 72 82
61
38 57 58 93
16 20 67 73
11 16 39 70
24 95
44 86
12 62
77
1 6 46 59
25 41
35 98

1: 20
2: 190
3: 1140
4: 4845
5: 15504
6: 38760
7: 77294
8: 120594
9: 134237
10: 91958
11: 33734
12: 5717
13: 351
14: 4

01111111011001111010 4 12 21 24 25 38 41 42 44 57 58 59 61 62 74 77 79 86 93 95
01111111011001111001 4 12 21 24 35 38 42 44 57 58 59 61 62 74 77 79 86 93 95 98
01111111011001101011 4 21 24 25 35 38 41 42 44 57 58 59 61 74 77 79 86 93 95 98
01111111010001111011 4 12 21 24 25 35 38 41 42 44 57 59 61 62 74 77 79 86 95 98
It gets much harder with say 30 subsets (more so especially if there are several small ones), but still doable:

Code: Select all

65
35 63 72
8 75 84 96
29
56 77 89
7 21
29 38 67 98
12 30 56
14 40 76
31
44 64
15 59
64 72
40 68 90
64 82 90 98
1 15 36 80
88
22
12 19 94
12 27 31 44
5
47
33 56 76
61 83
15 21 80
55
48
10 47
51 53

1: 29
2: 406
3: 3654
4: 23751
5: 118755
6: 474874
7: 1545720
8: 3989207
9: 7559421
10: 9590741
11: 7474127
12: 3249174
13: 696534
14: 61391
15: 1670
16: 5

11010100011110001100110011111 5 7 10 15 21 22 29 31 35 44 47 48 55 59 63 64 65 72 80 88
10010100011110011100110011111 1 5 7 10 15 21 22 29 31 36 44 47 48 55 59 64 65 72 80 88
10010100011110001101110011111 5 7 10 12 15 21 22 27 29 31 44 47 48 55 59 64 65 72 80 88
10010100011110001100110111111 5 7 10 15 21 22 29 31 44 47 48 55 59 61 64 65 72 80 83 88
10010100011110001100110011111 5 7 10 15 21 22 29 31 44 47 48 51 53 55 59 64 65 72 80 88
With much more than that, my machine would probably not be able to cope.

albert_nik
Posts: 18
Joined: Wed Apr 22, 2009 10:46 am

Re: Set containing max number of subsets

Post by albert_nik » Wed May 28, 2014 10:47 am

Thank You jaap.
The idea was from a lottery game (keno).
There are 80 numbers [1..80] and are selected randomly 20 numbers.
A player can select 1 to 8 numbers.
If the player's numbers are all included in the 20 he wins.
There are N players that have selected some sets with 1<=size<=8. The task is to find 20 numbers that maximize the number of winners. I dont know if there is a different approach to this problem.

User avatar
ggoyo
Posts: 25
Joined: Sun Jun 22, 2014 9:45 am
Location: Paris
Contact:

Re: Set containing max number of subsets

Post by ggoyo » Mon Jun 23, 2014 2:52 pm

I did not quite understand the question. In particular, is the output set a set of integers or a set of set of integers ? Since you are talking about keno I'm assuming it is a set of integers. In that case just put all the numbers in a set, if it is less than 20 then add random numbers (that aren't already in it) otherwise remove the integer that appears in the less number of subsets and repeat until you reach 20.

Please tell me if I have misunderstood the question.
Image

User avatar
jaap
Posts: 538
Joined: Tue Mar 25, 2008 3:57 pm
Contact:

Re: Set containing max number of subsets

Post by jaap » Tue Jun 24, 2014 8:17 am

ggoyo wrote:I did not quite understand the question. In particular, is the output set a set of integers or a set of set of integers ? Since you are talking about keno I'm assuming it is a set of integers. In that case just put all the numbers in a set, if it is less than 20 then add random numbers (that aren't already in it) otherwise remove the integer that appears in the less number of subsets and repeat until you reach 20.

Please tell me if I have misunderstood the question.
The problem is that the algorithm you propose (essentially a kind of Greedy Algorithm) will not always produce the optimal solution. At best it always gives a kind of local optimum rather than a global optimum.

For example, consider the first randomly generated example I gave. The sets are (one set on each line):

Code: Select all

11 55 89
21
38 57 86
59 95
42
4
79
74
10 57 72 82
61
38 57 58 93
16 20 67 73
11 16 39 70
24 95
44 86
12 62
77
1 6 46 59
25 41
35 98
There are 37 numbers used - seven multiple times, thirty used only once. Which of the ones that are used only once would you throw out?

There are only 4 optimal solutions:

Code: Select all

4 12 21 24 25 38 41 42 44 57 58 59 61 62 74 77 79 86 93 95
4 12 21 24 35 38 42 44 57 58 59 61 62 74 77 79 86 93 95 98
4 21 24 25 35 38 41 42 44 57 58 59 61 74 77 79 86 93 95 98
4 12 21 24 25 35 38 41 42 44 57 59 61 62 74 77 79 86 95 98
All of these solutions use the numbers 4 21 24 42 44 61 74 77 79, which are used only once in the sets. How would you know to keep those? None of these optimal solutions use 11 or 16, and yet these are used twice in the sets, so how would you know to throw those out?

User avatar
ggoyo
Posts: 25
Joined: Sun Jun 22, 2014 9:45 am
Location: Paris
Contact:

Re: Set containing max number of subsets

Post by ggoyo » Tue Jun 24, 2014 9:57 am

jaap wrote: The problem is that the algorithm you propose (essentially a kind of Greedy Algorithm) will not always produce the optimal solution. At best it always gives a kind of local optimum rather than a global optimum.
Yes, you are right, I was too fast answering.
jaap wrote: All of these solutions use the numbers 4 21 24 42 44 61 74 77 79, which are used only once in the sets. How would you know to keep those? None of these optimal solutions use 11 or 16, and yet these are used twice in the sets, so how would you know to throw those out?


All these numbers have one thing in common : throwing them out cost one set but that set only contains 1 number which can be thrown out. However if I throw 1 I'd be able to throw 6 and 46 for free.

I could sort it by (1/S) where S is the sum of (1/number of appearances) of each number in each set the number appears in. For instance, in your exemple, 1 would be weighted 2/7 but 4 would be weighted 1.

I coded it in haskell (real nice for list manipulation) and it outputs your last solution (not a proof).
Image

Post Reply