Transparency and MediaWiki

From LLN

by Walt Crawford, published May 5, 2008

Wikis can have many uses within libraries—not only the huge wikis such as Wikipedia but also library-hosted wikis. Most library wikis use the same wiki software as Wikipedia and Citizendium: MediaWiki, created to support Wikipedia and issued as open source software, free for the taking.

MediaWiki is a good choice. It obviously scales well: Your library wiki probably won’t have two million articles or be edited by tens of thousands of people. It has lots of extensions for those who need more than the standard features, and those standard features are fairly extensive. The markup language is no worse than most other wikitext systems and deviates from the norm primarily in one respect, one where I regard MediaWiki’s choice as superior: You don’t link to or create new pages by using CamelCase (that is, words and phrases without internal spaces but with internal capitals); you use explicit markup for links, such as [[Reference desk]].

One of MediaWiki’s strengths is also, potentially, a weakness: It is extremely transparent, at least in a standard install. That is to say: Anyone with access to a typical MediaWiki can find out a lot about how that wiki is being used—perhaps more than you’d like them to know.

Obvious case: Recent changes

The left-hand boxes are one mark of a standard install: typically three or more boxed areas to the left of the article text, one marked navigation, one marked search, one marked toolbox. Sometimes there are more boxes. Wikipedia adds interaction and languages. PLN adds topics. Citizendium adds several, changes one or two, and reformats the boxes as expanded menus. The new Open Access Directory adds interaction. In any case, navigation and toolbox are both standard (if far from universal).

Navigation includes one clear piece of high transparency (sometimes moved to interaction): The Recent changes link. By default, it brings up the most recent 50 changes over the most recent week—and you can adjust those to track up to 30 days and up to 500 changes. (For wikis with multiple namespaces, you can usually choose which namespace you want. You can typically also hide certain categories of changes—and, crucially for a wiki’s manager/editor, you can hide your own changes.) Glancing at Recent changes for a wiki can hint at several things:

  • If a week’s worth of changes is empty or has only one or two items, and that doesn’t change much when you go to 30 days, that means the wiki isn’t being actively edited. That may not be a bad thing, depending on the nature and intent of the wiki, but it’s an interesting thing that you wouldn’t always know about most writable websites.
  • If 50 changes only go back an hour or two, you know you’re at a lively site, unless all those changes are from the same user or they all seem to involve deleting material or undoing other changes. When I checked Wikipedia at 3:15 p.m. (PDT) on a Friday, the first 50 changes went back all of two minutes, and so did the first 200. I don’t think you’ll find anything like that anywhere else. Even at Wiktionary, 50 changes go back half an hour, and at many active sites they will take you back at least a day or two.
  • Who’s making the changes? If it’s all one or two names, that tells you something about the wiki as a collaborative writing project, although nothing about its worth. If it’s 20 names for 50 changes, there’s apparently a lot of collaboration.
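If you check Recent changes regularly, the questions above (how far back do N changes reach, and how many different people made them) can be answered with a few lines of script. This is an illustrative sketch, not part of MediaWiki: it assumes the wiki exposes the standard MediaWiki API (api.php) with list=recentchanges, which many installs do but some restrict. The endpoint URL, the sample data and the helper names (recent_changes_url, summarize_changes) are hypothetical. The script builds the query URL, then summarizes a batch of changes you have already fetched.

```python
from datetime import datetime
from urllib.parse import urlencode


def recent_changes_url(api_endpoint: str, limit: int = 50) -> str:
    """Build a MediaWiki API query URL for the latest `limit` recent changes."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|user|timestamp|sizes",  # sizes adds oldlen/newlen
        "rclimit": limit,
        "format": "json",
    }
    return f"{api_endpoint}?{urlencode(params)}"


def summarize_changes(changes: list[dict]) -> dict:
    """Report how far back a batch of changes reaches and how many people made them.

    Each change needs a 'user' and a 'timestamp' in the API's
    ISO 8601 form, e.g. '2008-05-02T22:15:00Z'.
    """
    if not changes:
        return {"count": 0, "span_hours": 0.0, "distinct_users": 0}
    times = [datetime.strptime(c["timestamp"], "%Y-%m-%dT%H:%M:%SZ") for c in changes]
    span_hours = (max(times) - min(times)).total_seconds() / 3600
    return {
        "count": len(changes),
        "span_hours": span_hours,
        "distinct_users": len({c["user"] for c in changes}),
    }


# Hypothetical endpoint and sample data, for illustration only.
print(recent_changes_url("https://wiki.example.org/api.php"))
sample = [
    {"user": "Alice", "timestamp": "2008-05-02T22:15:00Z"},
    {"user": "Bob", "timestamp": "2008-05-02T21:15:00Z"},
    {"user": "Alice", "timestamp": "2008-05-02T20:15:00Z"},
]
print(summarize_changes(sample))
# {'count': 3, 'span_hours': 2.0, 'distinct_users': 2}
```

Three changes spread over two hours from two people: a modest but collaborative site. Fifty changes spanning two minutes would produce a span_hours value of about 0.03.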

You might also glance at the nature of the changes. If you see lots of cases where there’s an IP address instead of a username and a parenthetical number in the thousands (e.g., “(+9,375)”), and a little more recently a named user with a negative number exactly matching it, you’re seeing spam and spamfighting: the anonymous idiot (or bot) adding huge numbers of links, and some alert editor or user reversing the change. If you wonder why more and more wikis require some level of authentication for editing, wonder no more: Any reasonably popular wiki runs into massive spam problems, and most wiki owners can’t afford to keep monitoring and reversing the damage.
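The spam-and-revert pattern is mechanical enough to spot with a short script. This is a hypothetical sketch, not a MediaWiki feature. It assumes you already have the changes as records carrying a title, a user and the size difference shown in parentheses, ordered oldest first. It treats any addition of 1,000 bytes or more by an IP address as suspect (that threshold is an arbitrary assumption standing in for “a number in the thousands”) and pairs each such addition with a later edit by a named user that removes exactly the same number of bytes from the same page.

```python
import ipaddress

MIN_SPAM_BYTES = 1000  # assumption: "a number in the thousands"


def is_anonymous(user: str) -> bool:
    """Logged-out edits are credited to an IP address instead of a username."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def spam_reverts(changes: list[dict], min_size: int = MIN_SPAM_BYTES) -> list[tuple]:
    """Pair large anonymous additions with later named-user edits that remove
    exactly the same number of bytes from the same page.

    `changes` must be ordered oldest first; each has 'title', 'user' and
    'delta' (the parenthetical size change shown in Recent changes).
    """
    pending: dict[str, list[dict]] = {}  # title -> unmatched suspect additions
    pairs = []
    for change in changes:
        title, user, delta = change["title"], change["user"], change["delta"]
        if is_anonymous(user) and delta >= min_size:
            pending.setdefault(title, []).append(change)
        elif not is_anonymous(user) and delta < 0:
            candidates = pending.get(title, [])
            for addition in candidates:
                if addition["delta"] == -delta:
                    pairs.append((addition, change))
                    candidates.remove(addition)  # safe: we stop iterating next
                    break
    return pairs


# Hypothetical log; 203.0.113.7 is a documentation-only address.
log = [
    {"title": "Main Page", "user": "203.0.113.7", "delta": 9375},
    {"title": "Hours", "user": "Librarian", "delta": 120},
    {"title": "Main Page", "user": "Librarian", "delta": -9375},
]
for added, reverted in spam_reverts(log):
    print(added["user"], added["delta"], "->", reverted["user"], reverted["delta"])
# 203.0.113.7 9375 -> Librarian -9375
```

An exact byte match is a heuristic: it catches straightforward reversions but will miss a revert that also fixed something else on the same page.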

Less obvious cases: Special pages

Recent changes tells the observer something about a wiki—but only about its editorial activity, which isn’t always very important, depending on the nature of the wiki.

More extensive transparency lurks behind this innocent link, usually in the toolbox: Special pages.

How much can you find out about a wiki? More than you might expect. Here are three examples. I’m not going to name them: In one case, the wiki’s too new (and promising) for such scrutiny, and in all cases the name isn’t relevant to this discussion. In each example, I refer to pages by the names they carry in the Special pages list (and most Special pages lists include a lot of pages).

Wiki A: Young and growing

  • Statistics: The wiki has had 6,422 pageviews in total, an average of 6.77 pageviews per edit. (There’s a claim that 14 of 142 pages are “probably legitimate content pages”; that claim may be meaningless.) The most viewed page (other than home and administrative pages) was viewed just over 200 times, which isn’t bad. Note: When I checked a week later, pageviews had more than doubled and the most viewed page was nearing 500 views, both strong indications of a healthy young wiki.
  • Orphaned pages: 50 pages don’t have any links from any other pages, which suggests that interlinks aren’t a primary means of navigation or that quite a few pages haven’t become part of the whole. The empty Categories page indicates another typical means of navigation that this wiki doesn’t use, which leaves only searching and the table of contents on the main page. (Dead-end pages takes a different view: pages that don’t link anywhere else. There are even more dead-end pages, 98, but that makes some sense given the nature of this wiki.)
  • New pages: Another indication of activity—and in this case it’s a strong indication that the wiki’s being developed actively, as 50 new pages go back less than a month. Meanwhile, Oldest pages usually offers a good indication of the age of the wiki—in this case, only four months.
  • While Statistics includes pageview counts for the ten most popular pages, Popular pages offers a sense of how diffuse usage is—whether there are a lot of pages with reasonably high pageviews. For a very young, fairly specialized wiki, a cutoff of 50 views might make sense—and Popular pages shows immediately that 27 pages have at least 50 views. There isn’t an “Unpopular pages” but you can keep pulling up more sets of 50—in this case getting to three pages with two views, one with one—and one that’s never been viewed at all. Note that comparing Popular pages with the page count on Statistics shows one oddity of MediaWiki counts: Some page categories aren’t included in Popular pages. Thus, in this case, the least popular page is #112—leaving 28 mystery pages.
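The Popular pages reasoning above (how many pages clear a cutoff, which have never been viewed, and how many pages Statistics counts that Popular pages never lists) is simple counting. A minimal sketch, assuming you have copied Popular pages into a mapping from page title to view count; the function name and the sample figures are made up for illustration.

```python
from typing import Optional


def popularity_report(views: dict[str, int], cutoff: int = 50,
                      total_pages: Optional[int] = None) -> dict:
    """Summarize a copy of the Popular pages list.

    views: page title -> view count, as listed by Popular pages.
    total_pages: the page total from Statistics, if you have it.
    """
    ranked = sorted(views.items(), key=lambda item: item[1], reverse=True)
    report = {
        "at_or_above_cutoff": sum(1 for _, n in ranked if n >= cutoff),
        "never_viewed": [title for title, n in ranked if n == 0],
        "least_popular_rank": len(ranked),
    }
    if total_pages is not None:
        # Pages Statistics counts but Popular pages never lists
        report["unlisted"] = total_pages - len(ranked)
    return report


# Hypothetical sample, not data from any of the wikis discussed here.
views = {"Main Page": 410, "Hours": 75, "Databases": 50, "Old draft": 2, "Stub": 0}
print(popularity_report(views, cutoff=50, total_pages=7))
# {'at_or_above_cutoff': 3, 'never_viewed': ['Stub'], 'least_popular_rank': 5, 'unlisted': 2}
```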

There’s a whole lot more. Articles with the most revisions provides insight into where the most collaboration is happening—although, without double-checking history and talk/discussion pages, it’s hard to be sure just what it means. There’s also Articles with the fewest revisions, in this case showing a lot of “stable” or relatively non-collaborative pages—26 pages with two revisions each.

Two oddities that the curious may find interesting: Long pages and Short pages. This wiki has a handful of very long pages (from 7,000 to 8,500 words, assuming six characters per word), but even more pages that have been created but have no content (19 pages with 0 bytes each).
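Long pages and Short pages report sizes in bytes, so the word counts here are estimates. A sketch of that conversion using the six-characters-per-word rule of thumb, treating bytes as characters. That holds for plain ASCII text; pages with accented or non-Latin characters use more than one byte per character in UTF-8, so their word estimates run high. The thresholds and function names are hypothetical.

```python
CHARS_PER_WORD = 6  # the article's rule of thumb


def estimated_words(size_bytes: int) -> int:
    """Rough word count, treating one byte as one character (true for ASCII)."""
    return round(size_bytes / CHARS_PER_WORD)


def length_extremes(sizes: dict[str, int], long_bytes: int = 42_000,
                    short_bytes: int = 100):
    """Split pages into long (with word estimates), empty and very short.

    sizes: page title -> size in bytes, as shown by Long pages / Short pages.
    Both thresholds are hypothetical defaults.
    """
    long_pages = {t: estimated_words(s) for t, s in sizes.items() if s >= long_bytes}
    empty = sorted(t for t, s in sizes.items() if s == 0)
    very_short = sorted(t for t, s in sizes.items() if 0 < s < short_bytes)
    return long_pages, empty, very_short


sizes = {"Guide": 51_000, "Policies": 42_000, "Placeholder": 0, "Test": 40, "FAQ": 9_000}
print(length_extremes(sizes))
# ({'Guide': 8500, 'Policies': 7000}, ['Placeholder'], ['Test'])
```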

Finally, for this bit of snooping, and ignoring more than 50 other special pages, there’s All pages—which lacks a counter but which shows alphabetic lists for each namespace. What’s a namespace? A specific kind of page, typically indicated by a prefix in the pagename. For most articles in most wikis, (Main) is the namespace (and there is no prefix). But there’s also a Talk namespace (the talk or discussion pages that appear with each article—but no Talk page will be listed unless there’s actually text on the page), Help and Help Talk namespaces, User and User Talk—and frequently more. (For example, PLN has an Essay namespace for third-party content that’s somewhat more protected than other pages and can only be viewed by registered users—and, to be sure, there’s an Essay Talk namespace to match. Note: These special namespaces are being deleted.)
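All pages groups titles by the namespace prefix before the colon. The sketch below mimics that grouping under stated assumptions: the namespace list is a hypothetical subset of common defaults (a given wiki may add its own, as PLN did with Essay), and a colon marks a namespace only when the text before it is a known namespace name, so a main-namespace title such as “Star Wars: A New Hope” stays in (Main). Real MediaWiki title handling (underscores, capitalization rules, localized namespace names) is more involved than this.

```python
from collections import defaultdict

# Hypothetical subset of common namespace names; real wikis may add or rename some.
NAMESPACES = {"Talk", "User", "User talk", "Help", "Help talk",
              "Category", "Category talk", "Template", "File", "MediaWiki"}


def split_title(title: str, namespaces: set[str] = NAMESPACES) -> tuple[str, str]:
    """Return (namespace, name); titles without a known prefix are in (Main)."""
    prefix, sep, rest = title.partition(":")
    if sep and prefix in namespaces:
        return prefix, rest
    return "(Main)", title


def group_by_namespace(titles: list[str],
                       namespaces: set[str] = NAMESPACES) -> dict[str, list[str]]:
    """Group titles per namespace, alphabetically, roughly as All pages does."""
    groups: dict[str, list[str]] = defaultdict(list)
    for title in titles:
        ns, name = split_title(title, namespaces)
        groups[ns].append(name)
    return {ns: sorted(names) for ns, names in groups.items()}


titles = ["Databases", "Talk:Databases", "User:Example", "Essay:Open access",
          "Star Wars: A New Hope"]
# Add a wiki-specific namespace, like PLN's Essay namespace.
print(group_by_namespace(titles, NAMESPACES | {"Essay"}))
# {'(Main)': ['Databases', 'Star Wars: A New Hope'], 'Talk': ['Databases'], 'User': ['Example'], 'Essay': ['Open access']}
```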

What can we learn from All pages in this case? Four users have text on their pages, so we can read a little about them. Three regular pages have Talk pages, frequently interesting to investigate in an unfamiliar wiki.

In all, you’ve got a fair indication of the level and kind of activity in this wiki by looking at a handful of pages, and, for a typical MediaWiki install, anyone can look at those pages. (Suspect that a wiki has gone dormant, not only in changes but in readership? Check Statistics one day and print it out or jot down some numbers, then check it again a week or a month later. Do be aware that your observations change reality: Every page you look at is a pageview.)
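The check-it-again-later approach amounts to subtracting two Statistics snapshots. A minimal sketch with hypothetical names and numbers. The own_views parameter reflects the point that your own checking adds pageviews, so you may want to subtract the visits you made yourself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StatsSnapshot:
    """Numbers jotted down from a Statistics page on a given day."""
    taken: date
    pageviews: int
    edits: int


def compare(before: StatsSnapshot, after: StatsSnapshot, own_views: int = 0) -> dict:
    """Activity between two snapshots; own_views removes your own checking visits."""
    days = (after.taken - before.taken).days
    new_views = after.pageviews - before.pageviews - own_views
    return {
        "days": days,
        "new_pageviews": new_views,
        "pageviews_per_day": new_views / days if days else float("nan"),
        "new_edits": after.edits - before.edits,
        "pageviews_per_edit": after.pageviews / after.edits if after.edits else float("nan"),
    }


# Hypothetical snapshots, one month apart.
before = StatsSnapshot(date(2008, 4, 1), pageviews=10_000, edits=750)
after = StatsSnapshot(date(2008, 5, 1), pageviews=16_000, edits=800)
print(compare(before, after))
# {'days': 30, 'new_pageviews': 6000, 'pageviews_per_day': 200.0, 'new_edits': 50, 'pageviews_per_edit': 20.0}
```

A high pageviews-per-edit figure points to a wiki used mostly for reading; a figure near 1 means nearly every visit is an edit.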

Wiki B: Small audience, specific focus

Let’s look at another, very different example, one that’s been around for a few years and serves a relatively small, specialized audience at one institution (but is open to anyone). You already know most of the pages I’m looking at. What can we find out about this wiki?

  • It’s being edited, but not heavily. Fifty changes go back a week and involve three different users.
  • Overall usage is impressive for a specialized wiki, with more than 1.2 million pageviews. As it happens, I’d looked at this wiki two months previously, which makes the pageviews even more impressive: The count grew by 175,000 in two months. A lot of use! (At more than 48 pageviews per edit, this is clearly a wiki used for reading more than writing.) The claim is that just over 1,100 of 3,500 total pages are “legitimate content,” and that may be right in this case. Two content pages show more than 20,000 views and the 10th most viewed page is still well above 8,000, which is very good.
  • There are a lot of Orphaned pages (more than 1,000), including a few that are spam and many that are supposedly visible only to special users. (They aren’t: They can be opened from Orphaned pages, which means this wiki may be more transparent than its managers intend. For that matter, they can also be opened from the Restricted category page.)
  • This wiki does use categories, and there are a lot fewer Uncategorized pages than orphaned pages, so categories are a strong navigation tool (though not part of the left-hand boxes). There are more than 1,000 dead-end pages, many of which appear to be orphaned pages: pages stored for convenience but not intended to be part of the main wiki.
  • The wiki’s been around for a while and is relatively stable in terms of topics: only two new pages were added in the last month. Oldest pages suggests that the wiki started in June 2004 (with a trial entry somewhat earlier).
  • As for breadth of use, it’s impressive. 58 pages have been viewed more than 2,000 times; another 90 have more than 1,000 pageviews; and 366 pages have at least 500 views—this in a wiki with a narrow focus and a narrow audience. (More than 1,100 pages have more than 100 views!)
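Whether supposedly restricted pages leak through public listings, as they do on this wiki via Orphaned pages and the Restricted category page, comes down to a set intersection. A hypothetical sketch: it assumes you have the list of titles meant to be restricted and the titles each public listing shows. A title turning up in a listing only shows that the page is discoverable; whether its text can actually be read still has to be checked by clicking through.

```python
def exposed_restricted(restricted: set[str],
                       listings: dict[str, set[str]]) -> dict[str, list[str]]:
    """For each public listing, return the restricted titles it reveals."""
    report = {}
    for listing, titles in listings.items():
        leaked = restricted & titles
        if leaked:
            report[listing] = sorted(leaked)
    return report


restricted = {"Staff schedule", "Budget draft"}  # hypothetical titles
listings = {
    "Orphaned pages": {"Budget draft", "Old test"},
    "Category:Restricted": {"Staff schedule", "Budget draft"},
    "All pages": {"Main Page", "Databases"},
}
print(exposed_restricted(restricted, listings))
# {'Orphaned pages': ['Budget draft'], 'Category:Restricted': ['Budget draft', 'Staff schedule']}
```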

What else? A handful of pages have been frequently revised (21 with more than 100 revisions), while a lot of pages haven’t involved much collaboration (more than 100 pages with two revisions and another 100+ with three). Two oddities: There are a few dozen “double redirects” (redirects that point to another redirect, typically because a page has been renamed more than once) and more than 20 “broken redirects” (redirects that point to nonexistent pages).
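Double and broken redirects can both be found from a map of redirects to their targets. A sketch under assumptions: the redirect map and the set of existing titles are inputs you would gather yourself, and all names are hypothetical. final_target follows a chain to its end (returning None on a loop), which gives the page a double redirect should be repointed to. By default MediaWiki follows only one redirect when a reader visits a page, which is why double redirects are worth fixing.

```python
from typing import Optional


def redirect_problems(redirects: dict[str, str], existing: set[str]):
    """redirects: redirect title -> target title.
    existing: every page title on the wiki, redirects included."""
    double = sorted(src for src, target in redirects.items() if target in redirects)
    broken = sorted(src for src, target in redirects.items() if target not in existing)
    return double, broken


def final_target(title: str, redirects: dict[str, str]) -> Optional[str]:
    """Follow a redirect chain to its end; None means the chain loops."""
    seen = {title}
    while title in redirects:
        title = redirects[title]
        if title in seen:
            return None  # redirect loop
        seen.add(title)
    return title


redirects = {"Hours of operation": "Opening hours", "Opening hours": "Hours",
             "Old FAQ": "FAQ (2006)"}
existing = {"Hours of operation", "Opening hours", "Hours", "Old FAQ"}
print(redirect_problems(redirects, existing))
# (['Hours of operation'], ['Old FAQ'])
print(final_target("Hours of operation", redirects))
# Hours
```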

What about extremes of length? Five pages have more than 100,000 characters (thus, more than 16,000 words) and nearly two dozen in all exceed 42,000 characters (7,000 words)—the point at which MediaWiki sometimes complains about editability. Fewer than 10 pages have no content at all, but some fifty are short enough to suggest that they’re test pages. There’s nothing noteworthy in terms of namespaces.

All in all? The picture of an established specialized wiki that continues to be actively used across a broad range of content. The owners may not be aware that the “restricted” pages aren’t really restricted, but that’s about the only negative comment I can offer.

Wiki C: Wide audience, narrow focus

This wiki theoretically serves many institutions but with a narrow focus—and it’s another one I’d looked at two months ago, allowing me to see how active it is currently.

  • The wiki’s three years old and has just over three-quarters of a million pageviews—including just over 100,000 in the last two months, which is healthy activity. About 10% of all pages appear to be content pages—something over 100.
  • Two dozen pages don’t have links from other pages and six dozen are dead-end. While categories are definitely used, more than 150 pages lack categories—but checking a sampling of those showed strong linkage in most cases.
  • The wiki isn’t getting many new pages: three in the last five months. Neither is it heavily collaborative at the moment: All edits over the last week were either spam or reversion of spam. Oldest pages marks the start of this wiki in March 2005, with a lot of pages added in the first few months.
  • Breadth of use? Quite good. A fair number of pages exceed 10,000 views; a lot have more than 2,000 (more than 60, with another 50-odd exceeding 1,000); and most of the pages that show up in this list (which typically excludes most special categories and namespaces) have more than 500 views: just over 160 out of a total of 198. Basically, whatever’s there is being viewed frequently.
  • Some typical special pages don’t show up on this install, so I can’t tell you which pages are most or least frequently revised or whether there are any double or broken redirects. On the other hand, Long pages and Short pages are here but undramatic. No page exceeds 20,000 characters (roughly 3,300 words) and only a dozen are much more than 1,000 words; there’s one empty page and only a couple more short enough to be accidental or quick definitions.
  • All pages shows rather a lot of Talk pages relative to the total number of articles—which usually means one of two things: The wiki has a lot of real conversation, or there’s a spam problem. Clicking through to a sampling suggests that both are true—and the number of empty but created Talk pages says there’s an ongoing effort to battle spam.
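The Talk-page test described here comes down to two numbers: Talk pages per article, and how many Talk pages exist but are empty. Reading empty Talk pages as blanked spam is an interpretation, not something the numbers prove. A hypothetical sketch, assuming you already have the set of article titles and a mapping from each existing Talk page’s article name to its size in bytes.

```python
def talk_page_profile(articles: set[str], talk_sizes: dict[str, int]) -> dict:
    """Summarize Talk pages relative to articles.

    articles: titles in the main namespace.
    talk_sizes: article name -> byte size of its Talk page (existing Talk pages only).
    """
    return {
        "talk_per_article": len(talk_sizes) / len(articles) if articles else 0.0,
        "empty_talk_pages": sorted(n for n, size in talk_sizes.items() if size == 0),
    }


articles = {"Databases", "Hours", "Interlibrary loan", "Ebooks"}  # hypothetical
talk = {"Databases": 2400, "Hours": 0, "Ebooks": 0}
print(talk_page_profile(articles, talk))
# {'talk_per_article': 0.75, 'empty_talk_pages': ['Ebooks', 'Hours']}
```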

Summing up

If you or your institution has a wiki, particularly a MediaWiki wiki, I’m not suggesting that you panic or find ways to lock things down. I believe most of this transparency is all to the good in most situations—as long as you’re aware of it.

I wouldn’t store sensitive information on supposedly-restricted pages unless you’re sure they’re restricted. I wouldn’t make claims about the activity on your wiki unless internal evidence backs up those claims.

Sure, you can make your wiki more opaque. You can use a different wiki package; of those I’ve observed, most seem to offer a lot less information to outsiders than MediaWiki does. Or you can modify MediaWiki to be less transparent: It’s open source software, after all. If you look at wikindex, “the index for wiki sites,” the MediaWiki section lists more than 3,000 MediaWiki wikis ranked by some combination of usage, size, users and updates. (It also shows some sets of wikis using other software, but none of them have ranks at this writing.) Consider the highest-ranked wikis that aren’t Wikipedias (the English, German, French, Italian, Japanese, Polish and Swedish Wikipedias are seven of the eight highest-ranked wikis). Wikipedia shows most special pages, although pageviews don’t appear, at least on English Wikipedia. What about some other “popular” wikis (many of which appear to come from Wikia, Jimmy Wales’ for-profit operation)?

  • Wikimedia Commons (a repository of open media) and Wiktionary use what appears to be the same modified set of special pages as Wikipedia, which means that information on pageviews (overall or for a given page) and popular pages is omitted.
  • Uncyclopedia, the “content-free encyclopedia” (#9 on May 5, 2008; this collaborative joke has more than 23,000 articles), doesn’t show overall pageviews, and while there’s a “Most popular articles” page, it shows no results. Since articles also don’t show pageviews, it’s anyone’s guess how often people actually look at Uncyclopedia.
  • WeRelate (#11 on May 5, 2008), a genealogy wiki, has Special pages, but you have to look a little. Once there, you find most special pages but not Popular pages, and overall pageviews but not the most widely viewed pages. Granted, wikindex’s ranking isn’t entirely based on pageviews, but it’s interesting that this wiki has had fewer than 7.7 million pageviews, not quite 3% of wikiHow’s total. It’s very active: There are only 1.3 pageviews per edit! (It’s also apparently only two years old, and phenomenally active for such a young wiki.)
  • wikiHow (#12 on May 5, 2008) is transparent. The appearance is heavily modified (and the statistics page, for one, is far more attractive than most), but as of early May 2008, it tells me there have been more than 281 million pageviews (more than 138 per edit) and that the most popular how-to page has 1.8 million views. Popular pages is there too, showing 32 pages with at least half a million views (the “least popular” of those being “Make Jello Shots,” but #49, at 420K views, is much more essential: “Calculate Pi by Throwing Frozen Hot Dogs”).
  • Wookieepedia, the Star Wars wiki (I kid you not: it ranks 13th on May 5, 2008), one of what appear to be hundreds of Wikia pop-culture wikis, includes almost all of the special pages—but also omits Popular pages and omits pageviews in Statistics.

Awareness, not opacity

You can make your wiki more opaque—but why bother? I regard MediaWiki’s transparency as a strength, not a weakness. Better to spend your time establishing a user setup methodology that reduces spam as a problem—and you can be sure that any widely-used wiki will be attacked by spammers, who will even set up accounts if no confirmation is required.

With a little awareness, wiki transparency is a good thing. If you’re wondering: To the best of my knowledge, all the special pages are available for this wiki. For PLN, transparency is a good thing. I can’t think of many library-related wikis that shouldn’t operate with reasonably full transparency.


This article is excerpted from On Wikis and Transparency, which appears in the June 2008 Cites & Insights.
