This is intended to be fun. Seriously.
If you think these numbers are precise, you're right. Some of these stats go to two decimal places. Very precise. However, don't confuse precision with accuracy. The error bars are larger than many of the numbers.
First off, the data gathering is flaky. While we now scrape directly off the Perlmonks website, it still relies on a lot of stuff running. And we aren't running a 24-by-7 operation here with people making sure everything stays up. Heck, all the real work is done on my desktop - it could be rebooted with no care for redundancy. That it works at all is a marvel. Don't assume it worked for all 168 hours in a row, because it probably didn't. Heck, we know we drop data after 168 hours. Data goes missing all the time.
Second, there are missing users. And this is on purpose. Some people don't want their names here. They participate, but their names don't show up, meaning others get artificially moved up the ranks. So the noisiest people may not even show up, making it look like someone else is "hogging" the chatterbox. You can't tell whether that person is there legitimately or just because everyone who hogs it more has elected to be hidden.
For those who want to be missing, you need to remember that all of this information is public. Even if it's transient on the server, there is absolutely nothing preventing anyone from logging it. As they say, once it gets onto the web, there's no taking it back. Even without anyone logging anything, it lives on in the CB history sites for an hour (or more: see "last hour of cb", which can go back up to two hours at times). And maybe even longer if the Wayback Machine happens to grab a piece of the site at the time, or Google happens to cache a page while your remark is still there. So, even though I do not provide a feed, there is nothing preventing others from doing so. It's best to treat your CB chatter as permanent, because it just might be.
Because it's public information, there is nothing in my eyes that makes it improper to capture it myself and torture the information into these stats. This makes "opting out" legitimate in the same way that sites must "opt out" of being scanned by Google or other search engines via a robots.txt file. In my case, it's just a private message to me, and I will remove you from the output as soon as I get the message.
Further, because it's public information, it's also free to be quoted. Just like someone can quote what you say in the CB, someone can aggregate the information into a statistic and post that.
Even if you do see someone mentioned, especially in the most-referenced monk section or the karma section, that doesn't mean they actually participated. Nothing requires users to reference, or give karma to, people who are actually present. For example, referencing paco or giving him (her?) karma doesn't mean s/he was in the CB. These sections can easily be misleading.
Basically, if you don't have a degree in statistics, don't try to extrapolate anything from these stats. And if you do have such a degree, you'll understand why there is no conclusion supported by any of this.
It's purely fun. Don't take it seriously, because it isn't serious. Don't conclude anything from the stats because, well, they're inaccurate. And I'm not giving out the raw data for you to run your own analysis on, because then you'd be missing the point.
Briefly (I hope to make this more readable later... but, let's be realistic here, I probably won't):
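Questions: qr/\?(?:\s|$)/ (a "?" followed by whitespace or the end of the message)
Exclamations: qr/\!(?:\s|$)/ (the same rule for "!")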
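As a rough illustration, here is one way to apply the two patterns above to a batch of messages. It assumes a message counts once no matter how many times it matches, which the description doesn't actually say, and count_matching is a hypothetical helper, not part of the real stats code.

```perl
use strict;
use warnings;

# The two patterns quoted above: a "?" or "!" followed by
# whitespace or the end of the message.
my $question    = qr/\?(?:\s|$)/;
my $exclamation = qr/\!(?:\s|$)/;

# Hypothetical helper: how many messages match a pattern.
# Assumption: a message counts once, however many times it matches.
sub count_matching {
    my ($re, @messages) = @_;
    return scalar grep { /$re/ } @messages;
}

my @messages = (
    'anyone around?',
    'what? no way!',
    'see http://example.com/?q=1',   # "?" inside a URL: not followed by space
    'wow!!! ok',
);

printf "questions: %d, exclamations: %d\n",
    count_matching($question, @messages),
    count_matching($exclamation, @messages);
```

The URL message shows why the trailing (?:\s|$) is there: a "?" in the middle of a query string doesn't count.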
Aggression:

    my $aggress_user = qr{(?: ([^[]\S+) | )}x;
    my $aggress = qr{
        /me\s+
        (?: swats? | smacks? | beats? | slaps? | hits? | strikes? | kicks?
          | clobbers? | throws?\b.*?\bat | bites? | thwaps? | sits?\s*on
          | breathes\s+.*fire.*\s+(?:at|on|in) | spits?.*fire.*at )
        \s+ $aggress_user
    }x;

    sub {
        if (/$aggress/) {
            require URI::Escape;
            my $user = URI::Escape::uri_unescape($+);
            # make sure the user exists...
            $user = CBStats::User::fetch($user);
            $user && $user->nodeid() > 0;
        } else {
            return 0;
        }
    }
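To see which messages the aggression check accepts, here is a standalone sketch. The two patterns are copied verbatim. CBStats::User::fetch belongs to the stats code itself, so a hypothetical %known_user hash (with made-up names) stands in for the "does this user exist?" lookup, and the URI::Escape::uri_unescape step is left out to keep the sketch dependency-free.

```perl
use strict;
use warnings;

# Patterns copied from the description above.
my $aggress_user = qr{(?: ([^[]\S+) | )}x;
my $aggress = qr{
    /me\s+
    (?: swats? | smacks? | beats? | slaps? | hits? | strikes? | kicks?
      | clobbers? | throws?\b.*?\bat | bites? | thwaps? | sits?\s*on
      | breathes\s+.*fire.*\s+(?:at|on|in) | spits?.*fire.*at )
    \s+ $aggress_user
}x;

# Hypothetical stand-in for CBStats::User::fetch(): a fixed set of monks.
my %known_user = (alice => 1, bob => 1);

# Returns 1 if the message is a /me attack on a known user, else 0.
sub is_aggression {
    my ($msg) = @_;
    return 0 unless $msg =~ $aggress;
    my $target = $+;    # highest capture group that matched: the target name
    return 0 unless defined $target;    # the empty alternative matched
    return $known_user{$target} ? 1 : 0;
}

for my $msg ('/me slaps alice with a large trout',
             '/me waves at bob',
             '/me kicks nobody_here') {
    printf "%d  %s\n", is_aggression($msg), $msg;
}
```

Note that $+ is undefined when the empty alternative of $aggress_user is what matched, so the sketch checks for that before the lookup.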
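Smileys: qr/(?:^|\s|\b)[:;B8]-?[)D}P>]+|[(q]-?[:;](?:$|\s|\b)/ catches faces like :-) ;D B) 8P, and mirrored ones like (-: or q:
Frowns: qr/(?:^|\s|\b):['`]?-?\(+|[)]-?['`]?[:](?:$|\s|\b)/ catches :( :-( :'( and mirrored ones like ):
Thought bubbles: qr/\.\s*o\s*O\s*\(.*\)/ catches things like .oO(lunch?)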
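Here are the three mood patterns side by side in a small sketch; moods is a hypothetical helper, not part of the stats code. The last sample message shows the kind of false positive that widens those error bars: the "8)" in a function call counts as a smile.

```perl
use strict;
use warnings;

# The three patterns quoted above, unchanged.
my %mood = (
    smile   => qr/(?:^|\s|\b)[:;B8]-?[)D}P>]+|[(q]-?[:;](?:$|\s|\b)/,
    frown   => qr/(?:^|\s|\b):['`]?-?\(+|[)]-?['`]?[:](?:$|\s|\b)/,
    thought => qr/\.\s*o\s*O\s*\(.*\)/,
);

# Hypothetical helper: names of the patterns a message matches, sorted.
sub moods {
    my ($msg) = @_;
    return grep { $msg =~ $mood{$_} } sort keys %mood;
}

for my $msg ('nice :-)', 'oh no :(', '.oO( lunch? )', 'call foo(8)') {
    my @hits = moods($msg);
    printf "%-16s %s\n", $msg, @hits ? join(',', @hits) : '-';
}
```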
Message length: SUM(LENGTH(MSG)) (in SQL), where MSG is the raw, unparsed text that the user typed in.
Words:

    sub { my @x = split ' ', $_; scalar @x; }

and then I sum, multiply by 100, divide by count, and display using Template's ability to divide for two decimal places.
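The arithmetic above can be sketched like this. The word counter is the sub quoted above; average_hundredths is a hypothetical helper, and truncating with int() is an assumption (the description only says sum, multiply by 100, divide by count). Passing sub { length $_ } instead gives a message-length average in the same units, though the real length stat comes from SQL.

```perl
use strict;
use warnings;

# Word count per message, as quoted above. split ' ' is Perl's special
# whitespace split, which also skips leading whitespace.
my $words = sub { my @x = split ' ', $_; scalar @x; };

# Hypothetical helper: sum the per-message counts, multiply by 100,
# divide by the number of messages, and keep an integer number of
# hundredths (truncation via int() is an assumption).
sub average_hundredths {
    my ($counter, @messages) = @_;
    return 0 unless @messages;
    my $sum = 0;
    for (@messages) {            # $_ is aliased to each message in turn
        $sum += $counter->();
    }
    return int($sum * 100 / @messages);
}

my @messages = ('hello there', '  leading spaces are fine  ', 'ok');

# Display step: divide by 100 and show two decimal places.
printf "words per message: %.2f\n",
    average_hundredths($words, @messages) / 100;
printf "characters per message: %.2f\n",
    average_hundredths(sub { length $_ }, @messages) / 100;
```

Keeping the value as an integer count of hundredths means the only place decimals appear is the final display step.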
Actions: qr/\/me/ (note the lack of an anchor, so a /me anywhere in the message counts). I probably should also detect the switch into italics that starts with the user's own nick.

URLs: m[(https?://\w[\w\@:.-]*)]g, but then we eliminate anything with "perlmonks" in it, or anything missing a dot.

Karma:

    our $user = qr{
        \[([^\]\s][^\]]+)\]
      | \[\s http://(?:www\.)?perlmonks\.(?:org|com)/\?node(?:_id)?=([^\s;&=]+)\s\|
    }x;

    $msg =~ /
        ^$user(\+\+) (?:\s*;)? \s* [-\#:.]?\s*(.*\S)
      | $user(\+\+|\-\-)
    /x

where $user is a qr that looks for users in []'s (or in PerlMonks node links). And, no, you can't affect your own karma. Also, --'s are no longer counted.
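A rough sketch of the URL filter and the karma rules. external_urls follows the URL description directly. karma_given is a simplified, hypothetical version: it only understands the [name] form of $user and scans with /g, so it is not the regex above, just an illustration of dropping --'s and self-karma. Comparing names with a plain eq is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: pull URLs out of a message with the pattern quoted
# above, then drop anything mentioning perlmonks or lacking a dot.
sub external_urls {
    my ($msg) = @_;
    my @urls = $msg =~ m[(https?://\w[\w\@:.-]*)]g;
    return grep { !/perlmonks/ && /\./ } @urls;
}

# Hypothetical, simplified karma scan: only the [name] form of $user.
my $bracket_user = qr{\[([^\]\s][^\]]+)\]};

sub karma_given {
    my ($speaker, $msg) = @_;
    my @targets;
    while ($msg =~ /$bracket_user(\+\+|--)/g) {
        my ($who, $op) = ($1, $2);
        next if $op eq '--';          # --'s are no longer counted
        next if $who eq $speaker;     # no karma for yourself (exact match assumed)
        push @targets, $who;
    }
    return @targets;
}

print join(' ', external_urls(
    'see https://example.com/foo, http://www.perlmonks.org/?node_id=1 and http://localhost:3000'
)), "\n";
print join(' ', karma_given('alice', '[bob]++ thanks! [carol]-- [alice]++')), "\n";
```

Because [\w\@:.-]* doesn't include "/", only the scheme and host part of each URL is captured, which is all the "perlmonks" and dot checks need.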