Confidential Report: After all they've done for truthiness...

Tuesday, August 01, 2006

After all they've done for truthiness...

On this evening's Colbert Report (yeah, that's what I do when Lady C is out of town: watch recently TiVo'd shows at 1:30 in the morning), Colbert asked his viewers to revise entries on African elephants to say that the population of African elephants has grown in the last decade or so. I notice that if you search for "African elephants", all the initial entries are currently under semi-protected status due to acts of vandalism.

Wikipedia, like Mel Gibson, is not having the Best Week Ever. Just last week I read this piece in The Onion and this article in the New Yorker.

Now, I really like Wikipedia. I've defended it before and visit it pretty frequently. When I want to know something like what else Robert W. Chambers wrote besides The King in Yellow or when James Rhodes first appeared in Iron Man comics, I'll check it out on Wikipedia, but I wouldn't rely on it for any important information. The really neat thing about Wikipedia, as I see it, is its role as an experiment in self-organized systems. It is rather impressive that such a vast and decentralized group of contributors has collectively produced so many entries about so many topics in such a short space of time, but we've also learned, I think, that the accuracy and thoroughness of such a decentralized system depend on its editors' tastes for accuracy and thoroughness.

It shouldn't surprise anyone that the Wikipedia entry on The Matrix movies is longer than the entry on matrices. Likewise, some subjects are just too controversial, too contested, to ever be reliable under such a system. My rule of thumb for deciding whether to credit a Wikipedia article is to do so when I would expect to be told reliable information on a subject if I were to ask a stranger who'd told me s/he was interested in the subject. The real question is, why would someone volunteer to write/edit a Wikipedia entry? For some subjects, the likely motivation is to share their accumulated knowledge, born of pure enthusiasm (known in some quarters as geekishness). In other cases, people are interested in shaping other people's perspectives to conform to their own. You can never know for sure what motivated a particular entry, or even how warranted a person's confidence in being well-versed enough to write an entry is, but you can make some educated guesses.

Still, you can never really know for sure what subject someone will want to mess with. I think I'll try to edit the entry on the African Bush Elephant.

4 Comments:

Anonymous said...: Two overly restrictive arguments that were common to many criticisms--and brought up in The New Yorker article--are that (1) pop culture article X has more words than historical/scientific article Y, and that (2) intellectual bullies own the controversies.

On the issue of (1), I wonder really if quantity should be viewed as an imbalance to quality. The Simpsons is 36,144 bytes and Jesus Christ is 56,392 bytes: does that tell us anything? And in your example: The Matrix can be put in the context of cinema, science fiction, philosophy, computer science, and sociology whereas matrices are part of a specific domain of mathematics. One has more valencies than the other. Measuring importance based on volume is more entertaining than informative.

On the issue of (2), I question first how much controversy exists in an encyclopedia: 1% of the entries? 5% of the entries? Beethoven and the internal combustion engine are more than likely the norm and more than likely very truthy entries. Second, I wonder how much of Wikipedia is disastrously biased and not simply imprecise. The best The New Yorker could come up with was the article on global warming, and yet any Wikipedia discussions on bias were held up as obsessively bureaucratic. How are minutiae dealt with in the Encyclopaedia Britannica?

I defend it too much, I know, but I suspect that in its tipping point there's an opportunity for knowledgeable individuals to become the intellectual watchdogs. They already are for many entries. The controversies can be examined without having them destroy the article. I would like to see more and better examinations of quality. Quotes both for and against Wikipedia seem extreme: will Britannica “be crushed out of existence within five years” or is “disaster ... not too strong a word” for Wikipedia? Both positions seem silly to me.; 2:37 PM
Mr. Arkadin said...: "Measuring importance based on volume is more entertaining than informative."

Can't agree. Quantity/length is a very good indication of thoroughness. Matrices have to do with a great deal more than mathematics and there's a lot to matrix theory that isn't covered in the Wikipedia article. Sure, The Matrix can be put into all sorts of contexts, but is it important to do so? I come away from The Matrix article knowing (or is it thinking I know, or knowing what "some people say"?) a lot more than I really need to about it, but look in vain for the definition of a cofactor matrix on the "matrix" page. Sure, you can think of a few examples where quantity doesn't necessarily equate to thoroughness, but most of the time it will. A more thorough analysis would incorporate the proportion of hypertext or the cumulative text covered by the page and its linked pages (if we use that as a measure of "valences," we see that matrices have plenty), but length/words is a perfectly appropriate and likely accurate reflection of thoroughness.

You used the word "importance" above, which is really a better way to approach the issue. Certainly you can accept that the amount of "important" information (set at an arbitrary but consistent level) about Jesus Christ is more than one and a half times the amount about the Simpsons. Not if you rely on Wikipedia as a source of "important" information. Now, if Wikipedia strives to provide information about subjects and to place a valuation on that information/those subjects in a similar fashion, it probably succeeds. Lots of important institutions (markets, for instance) strive to do the same thing, so it's not a shabby achievement.

In regard to (2), I can provide you with plenty of examples drawn from my field where biases and inaccuracies enter due to the identity and point of view of the writer. Check out the pages on "international law", "freedom of contract", or similar pages that don't cite references. I've banned citations to Wikipedia from my undergrad class papers due to the frequency with which I see contentious claims on issues of law asserted with a citation to it. Imprecision is a problem as well, but a less serious one, in my opinion.

In defense of Wikipedia (see, I do it too), if you compare it to other online sources produced by a Google search, I'm sure it will outperform most competitors. I'd much rather rely (and "rely" is an important concept here: it's not just a question of being demonstrably accurate and thorough, it's also a question of deciding whether to trust or base decisions on information from a given source) on a Wikipedia entry than on something posted to freerepublic.com or another clearly biased source not subject to correction or arbitration. As I suggested above, Wikipedia is like a market, meaning that it can produce some things very well, but is subject to market-type failures. It's also amenable to beneficial interventions (some of which Wikipedia exercises or will exercise).; 3:22 PM
Anonymous said...: How can we compare thoroughness of unlike subjects? I suggested that one subject had more general associations and therefore was likely to have a wider-but-shallower net. In one instance you want more thoroughness but with no definition of what should be reserved for textbooks. In another instance you want less but don't explain why the discussion of multiple associations may be excessive. Comparing the metrics of articles across different domains--as if there were an absolute measure of “importance” beyond examples in the extreme--starts to feel silly.

Many subjects on their own could warrant enough attention to fill a book--e.g. a work of art such as The Night Watch or the history of Pi--and so the *possibility* of an informed and high word count can't really be a measure of whether an article is lacking. An article is deficient if it's missing information according to an “expert” on the subject, not because it's shorter than an arbitrary other article. Matrix is deficient because a knowledgeable person says it's deficient, regardless of whether an article on The Matrix exists or not. The Matrix is more complete (as is the article on RAID configurations or Hurricane Katrina) simply because those interested are more internet-savvy (or internet-obsessed). When will the Renaissance music performance and Gaelic literature pundits begin to speak up? Ultimately, I think, each subject contains its own measure of discussability, and the medium of discussion defines the limits to that discussability.

Regarding bias: out of curiosity, why haven't you become an intellectual bully on those articles?; 7:15 PM
Mr. Arkadin said...: Certainly we can all arrive at some judgments about which articles are more "complete" than others; you did it yourself. I think we also agree why that's the case, and I'd suggest that the factors determining the degree of completeness an article has include not just the medium of discussion, but also the non-random set of users (and non-users, by default) of that medium. The degree of completeness of articles is a result of their choices. Sure, we can and should judge the completeness of an article on its individual merit, but if you want to assess the valuation of information on a certain subject arrived at by that medium and its users' actions, it helps to have a scale. As someone who measures and analyzes social phenomena for a living, I appreciate the "good but not perfect" measure of things, and using bytes to measure information is a pretty good one, especially if you want to rely on more than just the isolated example.

As an encyclopedia of general knowledge, not an encyclopedia of pop culture or whatever, Wikipedia invites judgment on how well it authoritatively covers subjects of general interest. If its entries are concentrated primarily on movies and current events, it fails to be, or becomes something other than, a general knowledge encyclopedia.

I haven't edited much on Wikipedia mainly because it won't get me tenure. That's also the reason for neglecting the blog. Also, I'm quite interested in producing new knowledge, rather than cataloging old knowledge. Given the infrequency with which social scientists produce anything I'd call "real knowledge" and Wikipedia's prominence, my time might be better spent in the "public good" sense editing Wikipedia entries, though.; 10:02 AM
