## Graph of the Statistics

### Graph of the Statistics

Here are some graphs I quickly made just for fun, plotting the data from the Statistics page. Nothing unexpected or revealing can be divined from them, but here they are if anyone's interested.

**4x Intel(R) Core(TM) i3-2330M CPU @ 2.20GHz**

*fabas indulcet fames*

- euler
- Administrator
**Posts:** 3226 **Joined:** Sun Mar 05, 2006 4:49 pm **Location:** Cheshire, England

### Re: Graph of the Statistics

That's a great idea! When I get some time I'll modify the script to display that information on the Statistics page dynamically (obviously cached).

*impudens simia et macrologus profundus fabulae*

### Re: Graph of the Statistics

euler, much of the data is flat. I think using a logarithm can show the variation better visually. When you get to implement it, please check how the graph looks with and without the logarithm and decide. Also, if your hosting provider supports gnuplot, it's easy to implement this. With gnuplot, plot the data points using "with lines"; that would be smoother than the default option of markers.

Last edited by sivakd on Fri Sep 28, 2012 9:58 pm, edited 1 time in total.

puzzle is a euphemism for lack of clarity

### Re: Graph of the Statistics

> euler wrote: That's a great idea! When I get some time I'll modify the script to display that information on the Statistics page dynamically (obviously cached).

Great to hear! I'll be looking forward to it.

> sivakd wrote: euler, much of the data is flat. I think using a logarithm can show the variation better visually. When you get to implement it, please check how the graph looks with and without the logarithm and decide. Also, if your hosting provider supports gnuplot, it's easy to implement this. With gnuplot, plot the data points using "with lines"; that would be smoother than the default option of markers.

That's a good point: the data points for high n are visually unrevealing and indiscernible from the rest. Using a logarithm is a good idea! I'm sure euler could display the graphs with and without logarithms, as it doesn't seem so complex, although I have no idea about the implementation.

I only used markers because I thought I could only display the points plus a 'line of best fit'. I did that for the first graph but couldn't be bothered to do it for the second. I guessed it might be a hyperbolic curve, so I just tried some values for the first graph and, when they mostly fit, plotted the curve. I remember a Project Euler problem about polynomial interpolation, but the lines of best fit for these curves seem to me to be asymptotic, so I thought they were hyperbolic. I'm not sure whether there's such a thing as hyperbolic interpolation, though.

**4x Intel(R) Core(TM) i3-2330M CPU @ 2.20GHz**

*fabas indulcet fames*

### Re: Graph of the Statistics

> thedoctar wrote: I only used markers because I thought I could only display the points plus a 'line of best fit'. I did that for the first graph but couldn't be bothered to do it for the second. I guessed it might be a hyperbolic curve, so I just tried some values for the first graph and, when they mostly fit, plotted the curve. I remember a Project Euler problem about polynomial interpolation, but the lines of best fit for these curves seem to me to be asymptotic, so I thought they were hyperbolic. I'm not sure whether there's such a thing as hyperbolic interpolation, though.

I think it is more likely to be exponential, so y = a*exp(-c*x). If so, then on the logarithmic scale it would be a straight line, and therefore easy to find a fit for.

In other words, for the graph of ln(y) versus x we have

ln(y) = ln(a*exp(-c*x)) = -c*x + ln(a)

and just need to set c and a to the best fit straight line.

Jaap's Puzzle Page

### Re: Graph of the Statistics

> jaap wrote: I think it is more likely to be exponential, so y = a*exp(-c*x). If so, then on the logarithmic scale it would be a straight line, and therefore easy to find a fit for.
>
> In other words, for the graph of ln(y) versus x we have
>
> ln(y) = ln(a*exp(-c*x)) = -c*x + ln(a)
>
> and just need to set c and a to the best fit straight line.

You're probably right, as I have no experience in these sorts of things. Is there any way of mathematically determining a and c?

**4x Intel(R) Core(TM) i3-2330M CPU @ 2.20GHz**

*fabas indulcet fames*

### Re: Graph of the Statistics

> thedoctar wrote:
>
> > jaap wrote: I think it is more likely to be exponential, so y = a*exp(-c*x). If so, then on the logarithmic scale it would be a straight line, and therefore easy to find a fit for.
> >
> > In other words, for the graph of ln(y) versus x we have
> >
> > ln(y) = ln(a*exp(-c*x)) = -c*x + ln(a)
> >
> > and just need to set c and a to the best fit straight line.
>
> You're probably right, as I have no experience in these sorts of things. Is there any way of mathematically determining a and c?

Least squares line fitting.
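For anyone who wants to try it, least squares line fitting on ln(y) versus x might look roughly like the sketch below. It uses only the Python standard library; the function name `fit_exponential` and the sample numbers are made up for illustration and are not the real Statistics page data. Note that fitting in log space weights relative errors, so it is not the same as a least squares fit of y itself.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a*exp(-c*x) via an ordinary least-squares line through (x, ln y).

    Hypothetical helper for illustration; requires every y to be positive.
    """
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    # Standard closed-form slope and intercept of the best-fit straight line.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_x
    # ln(y) = -c*x + ln(a), so c = -slope and a = exp(intercept).
    return math.exp(intercept), -slope

# Made-up data that follows y = 1000*exp(-0.3*x) exactly.
xs = list(range(1, 11))
ys = [1000.0 * math.exp(-0.3 * x) for x in xs]

a, c = fit_exponential(xs, ys)
print(f"a = {a:.3f}, c = {c:.3f}")
```

With data that really is exponential, the fit recovers a and c up to floating-point error; with real data the two numbers simply describe the best straight line on the log plot.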

Jaap's Puzzle Page

### Re: Graph of the Statistics

Apparently gnuplot can apply a log scale to the graphs. Here they are for anyone who's interested.

They don't seem to be straight lines; the second graph curves off at the end. Does this mean they're not exponential?

**4x Intel(R) Core(TM) i3-2330M CPU @ 2.20GHz**

*fabas indulcet fames*

### Re: Graph of the Statistics

Semi-related to this, here is some stuff I did a while ago with stats on participants by country: http://www.bottlenose.demon.co.uk/artic ... rticipants (bear in mind this was back in the day of the old 50-problem "levels"). Also, at http://www.bottlenose.demon.co.uk/artic ... difficulty there are some attempts to correlate the "solved by" number with how long (or how many lines of code or svn commits) it was taking me to solve problems.

It would certainly be nice to see some more derived plots and statistics on the PE site's "Statistics" tab.

### Re: Graph of the Statistics

@thedoctar, thanks for putting together the graphs with log.

> They don't seem to be straight lines, the second graph curves off at the end. Does this mean they're not exponential?

No, they are exponential. It's not necessary for all the data to fit a single function. If you sort the problems in ascending order of solvers, you will notice a few problems (like 331, 344, 361) that have very few solvers. Of course, the top 20 of that list also includes some of the recent problems, but with time they may move down the list. So it's a matter of time before the steep downward line slowly rises a bit. Still, its slope is not likely to be the same as that of the rest of the line. This just suggests that a small set of problems is a lot more difficult than the rest. At least, that's my interpretation of the data.

puzzle is a euphemism for lack of clarity

### Re: Graph of the Statistics

> timday wrote: Semi-related to this, here is some stuff I did a while ago with stats on participants by country: http://www.bottlenose.demon.co.uk/artic ... rticipants (bear in mind this was back in the day of the old 50-problem "levels"). Also, at http://www.bottlenose.demon.co.uk/artic ... difficulty there are some attempts to correlate the "solved by" number with how long (or how many lines of code or svn commits) it was taking me to solve problems.
>
> It would certainly be nice to see some more derived plots and statistics on the PE site's "Statistics" tab.

I looked at some of your results. Pretty interesting, although I'm pretty sure Christmas Island isn't a country!

> sivakd wrote: @thedoctar, thanks for putting together the graphs with log.

No worries, it wasn't very difficult.

> sivakd wrote: No, they are exponential. It's not necessary for all the data to fit a single function. If you sort the problems in ascending order of solvers, you will notice a few problems (like 331, 344, 361) that have very few solvers. Of course, the top 20 of that list also includes some of the recent problems, but with time they may move down the list. So it's a matter of time before the steep downward line slowly rises a bit. Still, its slope is not likely to be the same as that of the rest of the line. This just suggests that a small set of problems is a lot more difficult than the rest. At least, that's my interpretation of the data.

Good point. I also saw this point being made in timday's article about the relative difficulty of a problem compared to the number of people who solved it.

I also notice that below 50 problems there seems to be a definite curve. Maybe this suggests that people who solve fewer than 50 tend to quit earlier?

**4x Intel(R) Core(TM) i3-2330M CPU @ 2.20GHz**

*fabas indulcet fames*

### Re: Graph of the Statistics

It seems to me that one could better focus on the middle part of the graph.

From about n=100 to n=300 the graph seems pretty much a straight line to me.

Undoubtedly there are reasons why the extreme parts of the graph do not fit onto that line:

- Left part: lots of people try out a few problems but don't get hooked.
- Right part: the number of people involved seems to me too small to do any statistics on.
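Restricting the fit to the middle of the graph could be sketched like this. Everything here is an assumption for illustration: the helper names `fit_log_line` and `fit_on_range`, the 100 to 300 window taken from the post above, and the synthetic data, which is built to be exactly exponential in the middle with an inflated left end and a depressed right end. It is not the real site data.

```python
import math

def fit_log_line(xs, ys):
    """Least-squares line through (x, ln y); returns (slope, intercept)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    slope = sxy / sxx
    return slope, mean_l - slope * mean_x

def fit_on_range(xs, ys, lo, hi):
    """Same fit, but using only the points with lo <= x <= hi."""
    kept = [(x, y) for x, y in zip(xs, ys) if lo <= x <= hi]
    return fit_log_line([x for x, _ in kept], [y for _, y in kept])

def synthetic_members(n):
    """Made-up count of members who solved n problems."""
    y = 50000.0 * math.exp(-0.02 * n)
    if n < 50:
        y *= 1 + (50 - n) / 10            # extra people who only try a few
    if n > 300:
        y *= math.exp(-0.05 * (n - 300))  # sparse, fast-dropping tail
    return y

xs = list(range(1, 401))
ys = [synthetic_members(n) for n in xs]

slope_all, _ = fit_log_line(xs, ys)
slope_mid, _ = fit_on_range(xs, ys, 100, 300)
print(f"slope over all n: {slope_all:.4f}")
print(f"slope for n in 100..300: {slope_mid:.4f}")
```

On this synthetic data the middle-only fit recovers the underlying slope, while the fit over all n is pulled steeper by both ends, which is the effect described above.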

- euler
- Administrator

### Re: Graph of the Statistics

I've modified the scripts to display linearly scaled graphs on the "Problems Solved" page; one shows 0-100% of members and the other zooms in on the data for 0-2% of members. It's nice to see that the data is approximately exponential, and that on a logarithmic scale it approximates a straight line, which confirms this. However, I wasn't convinced about using a logarithmic scale, because for most people it is far from intuitive and the elegance of the graphical representation of the data could be lost. I think the current graphs add a great deal to the page and certainly communicate visually the massive tail-off in the number of members who solve an increasing number of problems.

*impudens simia et macrologus profundus fabulae*

### Re: Graph of the Statistics

The graph looks great! If there's one thing I might suggest, it's that a line could also be plotted at 2% on the 0-100% graph, to give a better sense of the scale of the second graph.

**4x Intel(R) Core(TM) i3-2330M CPU @ 2.20GHz**

*fabas indulcet fames*

### Re: Graph of the Statistics

For me, a histogram looks better than a curve.

### Re: Graph of the Statistics

So very roughly, if the distances from a 'curve of best fit' are compared, you should theoretically be able to sort problems by true 'difficulty'. Would be interesting to see that done.
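A very rough sketch of that idea, assuming the 'curve of best fit' is a straight line on the log scale and that 'distance' means the vertical gap in ln(solvers). The function name `log_residuals` and the solver counts are invented for illustration and are not real Project Euler figures.

```python
import math

def log_residuals(solved_by):
    """Map each problem number to ln(solvers) minus the fitted ln(solvers).

    solved_by is a dict {problem_number: solver_count}. A strongly negative
    residual means far fewer solvers than the trend predicts.
    """
    xs = sorted(solved_by)
    logs = [math.log(solved_by[x]) for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_x
    return {x: l - (slope * x + intercept) for x, l in zip(xs, logs)}

# Made-up counts: problem 4 sits well below the trend of its neighbours.
solved_by = {1: 9000, 2: 7400, 3: 6000, 4: 1500,
             5: 4100, 6: 3300, 7: 2700, 8: 2200}

residuals = log_residuals(solved_by)
# Most negative residual first, i.e. "harder than its position suggests".
ranking = sorted(residuals, key=residuals.get)
for problem in ranking:
    print(problem, round(residuals[problem], 3))
```

Recently published problems would also show up as 'hard' under this measure simply because fewer people have had time to attempt them, so the ranking would need some correction for age before it says much about true difficulty.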