Graph of the Statistics

thedoctar
Posts: 74
Joined: Fri Apr 15, 2011 11:57 am
Location: Sydney, Australia

Graph of the Statistics

Post by thedoctar »

Here are some graphs I quickly made just for fun, plotting the data on the Statistics page. Nothing unexpected or revealing can be divined from them, but here they are if anyone's interested.
[Attachment: problem-solved-equals.png]
[Attachment: problem-solved-more-percentage.png]
4x Intel(R) Core(TM) i3-2330M CPU @ 2.20GHz
fabas indulcet fames
euler
Administrator
Posts: 4138
Joined: Sun Mar 05, 2006 4:49 pm
Location: Cheshire, England

Re: Graph of the Statistics

Post by euler »

That's a great idea! When I get some time I'll modify the script to display that information on the Statistics page dynamically (obviously cached).
impudens simia et macrologus profundus fabulae
sivakd
Posts: 217
Joined: Fri Jul 17, 2009 9:37 am
Location: California, USA

Re: Graph of the Statistics

Post by sivakd »

euler, much of the data looks flat. I think a logarithmic scale would show the variation better. When you get to implement it, please check how the graph looks with and without the logarithm and decide. Also, if your hosting provider supports gnuplot, it's easy to implement this. With gnuplot, plot the data points using "with lines"; that would be smoother than the default option of markers.
Last edited by sivakd on Fri Sep 28, 2012 10:58 pm, edited 1 time in total.
puzzle is a euphemism for lack of clarity
thedoctar
Posts: 74
Joined: Fri Apr 15, 2011 11:57 am
Location: Sydney, Australia

Re: Graph of the Statistics

Post by thedoctar »

euler wrote:That's a great idea! When I get some time I'll modify the script to display that information on the Statistics page dynamically (obviously cached).
Great to hear! I'll be looking forward to it.
sivakd wrote:euler, much of the data looks flat. I think a logarithmic scale would show the variation better. When you get to implement it, please check how the graph looks with and without the logarithm and decide. Also, if your hosting provider supports gnuplot, it's easy to implement this. With gnuplot, plot the data points using "with lines"; that would be smoother than the default option of markers.
That's a good point: the data points for high n are visually indistinguishable from one another. Using a logarithm is a good idea! I'm sure euler could display the graphs with and without logarithms, as it doesn't seem so complex, although I have no idea about the implementation.

I only used markers because I thought I could only display the points plus a 'line of best fit', as in the first graph; I couldn't be bothered to do it for the second. I guessed it might be a hyperbolic curve, so I just guessed some values for the first graph and, when it mostly fit, plotted the curve. I remember a Project Euler problem about polynomial interpolation, but the lines of best fit for these curves seem to me to be asymptotic, so I thought they were hyperbolic. I'm not sure whether there's such a thing as hyperbolic interpolation.
jaap
Posts: 587
Joined: Tue Mar 25, 2008 3:57 pm

Re: Graph of the Statistics

Post by jaap »

thedoctar wrote:I only used markers because I thought I could only display the points plus a 'line of best fit', as in the first graph; I couldn't be bothered to do it for the second. I guessed it might be a hyperbolic curve, so I just guessed some values for the first graph and, when it mostly fit, plotted the curve. I remember a Project Euler problem about polynomial interpolation, but the lines of best fit for these curves seem to me to be asymptotic, so I thought they were hyperbolic. I'm not sure whether there's such a thing as hyperbolic interpolation.
I think it is more likely to be exponential, so y = a*exp(-c*x). If so, then on the logarithmic scale it would be a straight line, and therefore easy to find a fit for.
In other words, for the graph of ln(y) versus x we have
ln(y) = ln(a*exp(-c*x)) = -c*x + ln(a)
and just need to set c and a to the best fit straight line.
thedoctar
Posts: 74
Joined: Fri Apr 15, 2011 11:57 am
Location: Sydney, Australia

Re: Graph of the Statistics

Post by thedoctar »

jaap wrote:I think it is more likely to be exponential, so y = a*exp(-c*x). If so, then on the logarithmic scale it would be a straight line, and therefore easy to find a fit for.
In other words, for the graph of ln(y) versus x we have
ln(y) = ln(a*exp(-c*x)) = -c*x + ln(a)
and just need to set c and a to the best fit straight line.
You're probably right, as I have no experience in these sorts of things. Is there any way of mathematically determining a and c?
jaap
Posts: 587
Joined: Tue Mar 25, 2008 3:57 pm

Re: Graph of the Statistics

Post by jaap »

thedoctar wrote:
jaap wrote:I think it is more likely to be exponential, so y = a*exp(-c*x). If so, then on the logarithmic scale it would be a straight line, and therefore easy to find a fit for.
In other words, for the graph of ln(y) versus x we have
ln(y) = ln(a*exp(-c*x)) = -c*x + ln(a)
and just need to set c and a to the best fit straight line.
You're probably right, as I have no experience in these sorts of things. Is there any way of mathematically determining a and c?
Least squares line fitting.
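As a minimal sketch of that (not anything the site actually runs): fit a straight line to ln(y) against x by ordinary least squares, then read c off the slope and a off the intercept. The function name fit_exponential and the data below are made up purely for illustration.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a*exp(-c*x) by ordinary least squares on the points (x, ln y)."""
    logs = [math.log(y) for y in ys]          # requires every y > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    slope = sxl / sxx                         # slope of the fitted line equals -c
    intercept = mean_l - slope * mean_x       # intercept equals ln(a)
    return math.exp(intercept), -slope        # (a, c)

# Synthetic data generated from a known curve, just to exercise the function.
xs = list(range(0, 50, 5))
ys = [1000.0 * math.exp(-0.05 * x) for x in xs]
a, c = fit_exponential(xs, ys)
print(a, c)
```

Note that fitting in log space weights relative errors rather than absolute ones, so the small values in the tail count as much as the large ones at the start.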
thedoctar
Posts: 74
Joined: Fri Apr 15, 2011 11:57 am
Location: Sydney, Australia

Re: Graph of the Statistics

Post by thedoctar »

Apparently gnuplot can apply logscale to the graphs. Here they are for anyone who's interested.
[Attachment: problem-solved-equals-logscale.png]
[Attachment: problem-solved-more-percentage-logscale.png]
They don't seem to be straight lines, the second graph curves off at the end. Does this mean they're not exponential?
timday
Posts: 40
Joined: Tue Oct 12, 2010 11:25 pm

Re: Graph of the Statistics

Post by timday »

Semi-related to this, some stuff I did a while ago from stats on participants by country at http://www.bottlenose.demon.co.uk/artic ... rticipants (bear in mind this was back in the day of the old 50-problem "levels"), and also at http://www.bottlenose.demon.co.uk/artic ... difficulty are some attempts to correlate the "solved by" number to how long (or how many lines of code or number of svn commits) it was taking me to solve problems.

Would certainly be nice to see some more derived plots and statistics on the PE site's "Statistics" tab.
sivakd
Posts: 217
Joined: Fri Jul 17, 2009 9:37 am
Location: California, USA

Re: Graph of the Statistics

Post by sivakd »

@thedoctar, thanks for putting together the graphs with log.

> They don't seem to be straight lines, the second graph curves off at the end. Does this mean they're not exponential?

No, they are exponential. It's not necessary that all the data should fit a single function. If you sort the problems in ascending order of solvers, you will notice that a few problems (like 331, 344, 361) have very few solvers. Of course, the top 20 list also includes some of the recent problems, but with time they may go down that list. So it's a matter of time for the steep downward line to slowly rise a bit. But still, the slope is not likely to be the same as for the rest of the bigger line. This just suggests that a small set of problems is a lot more difficult than the rest. At least, that's my interpretation of the data.
thedoctar
Posts: 74
Joined: Fri Apr 15, 2011 11:57 am
Location: Sydney, Australia

Re: Graph of the Statistics

Post by thedoctar »

timday wrote:Semi-related to this, some stuff I did a while ago from stats on participants by country at http://www.bottlenose.demon.co.uk/artic ... rticipants (bear in mind this was back in the day of the old 50-problem "levels"), and also at http://www.bottlenose.demon.co.uk/artic ... difficulty are some attempts to correlate the "solved by" number to how long (or how many lines of code or number of svn commits) it was taking me to solve problems.

Would certainly be nice to see some more derived plots and statistics on the PE site's "Statistics" tab.
I looked at some of your results; pretty interesting, although I'm pretty sure Christmas Island isn't a country!
sivakd wrote:@thedoctar, thanks for putting together the graphs with log.
No worries, it wasn't very difficult.
sivakd wrote:No, they are exponential. It's not necessary that all the data should fit a single function. If you sort the problems in ascending order of solvers, you will notice that a few problems (like 331, 344, 361) have very few solvers. Of course, the top 20 list also includes some of the recent problems, but with time they may go down that list. So it's a matter of time for the steep downward line to slowly rise a bit. But still, the slope is not likely to be the same as for the rest of the bigger line. This just suggests that a small set of problems is a lot more difficult than the rest. At least, that's my interpretation of the data.
Good point. I also saw this point being made in timday's article about the relative difficulty of a problem compared to the number of people who solved it.

I also notice that below 50 problems solved there seems to be a definite curve. Maybe this suggests that people who solve fewer than 50 tend to quit earlier?
hk
Administrator
Posts: 12164
Joined: Sun Mar 26, 2006 10:34 am
Location: Haren, Netherlands

Re: Graph of the Statistics

Post by hk »

It seems to me that one could better focus on the middle part of the graph.
From about n=100 to n=300 the graph seems pretty much a straight line to me.
Undoubtedly there will be reasons why the extreme parts of the graph do not fit that line.
-left part: lots of people try out a few problems but don't get hooked.
-right part: number of people involved seems to me too small to do any statistics on.
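
One rough way to check this would be to fit a straight line to ln(count) twice, once over the whole range and once over n=100 to n=300 only, and compare how well each fits (here via R squared). The sketch below uses an invented curve, made_up_count, that is exactly exponential in the middle and bends away at both ends; it is not the real statistics data, and all names are hypothetical.

```python
import math

def line_fit(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, my - slope * mx, (sxy * sxy) / (sxx * syy)

def made_up_count(n):
    """Invented member count for n problems solved (illustration only)."""
    base = 50000.0 * math.exp(-0.02 * n)
    if n < 100:
        base *= 1.0 + 5.0 * math.exp(-n / 10.0)       # extra people who try a few problems
    if n > 300:
        base *= math.exp(-0.0005 * (n - 300) ** 2)    # sparse tail falling away
    return base

ns = list(range(1, 401))
log_counts = [math.log(made_up_count(n)) for n in ns]

whole = line_fit(ns, log_counts)
window = [(n, l) for n, l in zip(ns, log_counts) if 100 <= n <= 300]
middle = line_fit([n for n, _ in window], [l for _, l in window])

print("whole range: slope %.4f, R^2 %.4f" % (whole[0], whole[2]))
print("n=100..300:  slope %.4f, R^2 %.4f" % (middle[0], middle[2]))
```

With real data the window boundaries would be a judgment call; 100 and 300 are just the values read off the graph by eye.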
War ruins the life and health of untold numbers of innocent children.
euler
Administrator
Posts: 4138
Joined: Sun Mar 05, 2006 4:49 pm
Location: Cheshire, England

Re: Graph of the Statistics

Post by euler »

I've modified the scripts to display linearly scaled graphs on the "Problems Solved" page; one shows 0-100% of members and the other zooms in on the data for 0-2% of members. It's nice to see that the data is approximately exponential, and the fact that it approximates a straight line on a logarithmic scale confirms this. However, I wasn't convinced about using a logarithmic scale, because for most people it is far from intuitive and the elegance of the graphical representation of the data could be lost. I think the current graphs add a great deal to the page and certainly communicate visually the massive tail-off in the number of members who solve an increasing number of problems.
thedoctar
Posts: 74
Joined: Fri Apr 15, 2011 11:57 am
Location: Sydney, Australia

Re: Graph of the Statistics

Post by thedoctar »

The graph looks great! If there's one thing I might suggest, it's that a line could also be plotted at 2% on the 0-100% graph, to give a better sense of the scale of the second graph.
usrbin
Posts: 16
Joined: Sat Apr 21, 2012 3:43 pm
Location: Beijing, China

Re: Graph of the Statistics

Post by usrbin »

For me, a histogram looks better than a curve :D
TripleM
Posts: 382
Joined: Fri Sep 12, 2008 3:31 am

Re: Graph of the Statistics

Post by TripleM »

So very roughly, if the distances from a 'curve of best fit' are compared, you should theoretically be able to sort problems by true 'difficulty'. Would be interesting to see that done.
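
A rough sketch of how that could be done, assuming the curve is fitted to each problem's "solved by" count against its problem number, on a log scale: fit a line to ln(solved_by), then sort problems by how far below the line they fall. The counts and the name rank_by_residual are invented for illustration and are not real Project Euler figures.

```python
import math

# Made-up (problem number -> solved-by count) data, for illustration only.
solved_by = {1: 9000, 2: 7600, 3: 6900, 4: 2500, 5: 5200, 6: 4400, 7: 4100, 8: 3300}

def rank_by_residual(counts):
    """Return problem numbers sorted from furthest below the fitted line to furthest above."""
    xs = sorted(counts)
    logs = [math.log(counts[x]) for x in xs]
    n = len(xs)
    mx, ml = sum(xs) / n, sum(logs) / n
    slope = (sum((x - mx) * (l - ml) for x, l in zip(xs, logs))
             / sum((x - mx) ** 2 for x in xs))
    intercept = ml - slope * mx
    # Residual = actual ln(count) minus the line's prediction; a negative value
    # means fewer solvers than the problem's position alone would suggest.
    residuals = {x: l - (slope * x + intercept) for x, l in zip(xs, logs)}
    return sorted(residuals, key=residuals.get)

ranking = rank_by_residual(solved_by)
print(ranking)
```

Recently published problems would also sit below the line simply because fewer people have reached them yet, so age would need to be accounted for before reading the ranking as difficulty.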