Aaron Haspel just gave the 1911 Britannica the digital home it deserved

Aaron Haspel published britannica11.org last week. It is a complete, searchable, cross-referenced digital edition of the Encyclopædia Britannica’s celebrated 11th edition (1910 to 1911). One person built it. Project Gutenberg and Wikisource have been chipping away at the same artefact for over twenty years and are still not done.

The credits page is the giveaway. “Thanks most of all to Anthropic and Claude Code Opus, which did nearly all the heavy lifting, and to OpenAI and GPT Codex, which drafted the specification.” Haspel is a New York-based aphorist and essayist who also writes code. That blend is the new shape of solo scholarship.

A bit of context for those who don’t know the 11th. It was published in 29 volumes between 1910 and 1911, edited by Hugh Chisholm and marketed under the slogan “Everything Explained That Is Explainable.” It originally ran to around forty thousand entries; Haspel’s reconstruction yields roughly 37,000 distinct articles. It was the last and best version of the work before Sears Roebuck bought the rights in 1920 and the project drifted into the American mass market.

Borges loved it. T.S. Eliot loved it. Virginia Woolf attended one of its inaugural balls and famously declared, thirteen years later, that “on or about December 1910, human character changed.” She was reaching for hyperbole. The 11th was the last great work of the encyclopaedic age. The last work that genuinely believed all human knowledge could be compiled by mortal hands and arranged in alphabetical order between cloth covers.

It is also a cultural artefact in the most literal sense. The British Empire is at its peak. Imperialism is unchallenged. World War I is three years away and still unimaginable. Some entries are jewels of late-Victorian prose, written by the last generation drilled in Latin and Greek. Some (the notorious entries on KU KLUX KLAN and NEGRO are the obvious examples) read as monuments to the educated consensus of a vanished and frequently odious age. Marie Curie, the most famous living scientist in 1910, has no entry of her own. She is mentioned briefly under her husband.

The voice is also part of the appeal. Articles in the 11th have personality. Authors editorialise. They digress. The COPENHAGEN entry is half geography and half a shot-by-shot retelling of the 1801 naval battle. This is what Haspel says he likes most about the edition: the articles have a personal tone and are less homogenised. Wikipedia’s neutral, committee-edited prose is in many ways more useful, but the trade-off is real and the 11th sits firmly on the other side of it.

The contributors are extraordinary. Hugh Chisholm wrote 43 articles himself, including JOAN OF ARC and NATIONAL DEBT. Edmund Gosse covered everything from PINDARICS to BELGIUM. The leading contributor, the archaeologist Thomas Ashby, wrote 237 entries on Italian towns. A French professor named Ernest Babelon, an authority on Carthage and ancient coins, wrote BASE-BALL. An obscure Scottish meteorologist named John Aitken wrote DUST and apparently knew more about it than seemed humanly possible. The age of the polymath, preserved in amber.

The 11th has been in the public domain for decades. At least six projects have hosted partial or complete copies in various states of OCR mangling: Project Gutenberg, Wikisource, the Internet Archive, the now-defunct LoveToKnow wiki, Theodora.com and StudyLight. None has been satisfactory. Project Gutenberg’s transcription is split across hundreds of alphabetical chunks. Wikisource’s transcription is partial and inconsistently proofread. The Internet Archive has clean scans but no usable text layer.

Haspel didn’t redo the transcription work. He built on top of it. The Wikisource text is the primary input; a vision LLM fills the gaps where Wikisource doesn’t reach, and articles based on unproofread sources are flagged. What he and Claude actually did was the structural reconstruction: detecting article boundaries across multi-page entries, parsing section headings, extracting and linking cross-references, normalising tables and equations, handling footnotes and plates, threading volume and page provenance through every article, and reconstructing the contributor index from author initials. That work is the part Distributed Proofreaders and similar volunteer efforts have never finished, because it is fiddly and unending and exactly the kind of thing humans burn out on.
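
To give a feel for how fiddly two of those steps are, here is a minimal Python sketch of cross-reference extraction and contributor lookup. It is not Haspel’s pipeline, which I have not seen. It assumes cross-references are marked with “(q.v.)” and that signed articles end with parenthesised initials; the `INITIALS` table, the function names and the sample articles are all invented for illustration.

```python
import re
from collections import defaultdict

# Illustrative initials table (an assumption for this sketch); the printed
# edition lists contributor initials at the front of each volume.
INITIALS = {"H. Ch.": "Hugh Chisholm", "T. As.": "Thomas Ashby"}

# A run of capitalised words directly before "(q.v.)", e.g. "Baltic Sea (q.v.)".
QV = re.compile(r"\b([A-Z][\w'-]*(?: [A-Z][\w'-]*)*) \(q\.v\.\)")

# A parenthesised signature such as "(H. Ch.)" at the very end of an article.
SIGNATURE = re.compile(r"\(((?:[A-Z][a-z]*\. ?)+)\)\s*$")


def cross_references(text):
    """Return the upper-cased targets of every "(q.v.)" marker in text."""
    return [m.group(1).upper() for m in QV.finditer(text)]


def contributor(text):
    """Resolve the closing signature to a name, or None if unsigned or unknown."""
    m = SIGNATURE.search(text)
    return INITIALS.get(m.group(1).strip()) if m else None


def contributor_index(articles):
    """Group article titles by resolved contributor name."""
    index = defaultdict(list)
    for title, text in articles.items():
        name = contributor(text)
        if name:
            index[name].append(title)
    return dict(index)


# Invented sample text, not quoted from the encyclopaedia.
sample = {
    "SAMPLE ARTICLE": "The fleet under Nelson (q.v.) entered the Baltic Sea (q.v.) in 1801. (H. Ch.)",
    "UNSIGNED STUB": "An unsigned stub with no references.",
}

print(cross_references(sample["SAMPLE ARTICLE"]))
print(contributor_index(sample))
```

Even this toy version shows where the edge cases come from: a sentence that opens with “See Nelson (q.v.)” would capture “See Nelson” as the target, and OCR that drops a full stop from a signature breaks the lookup. Multiply that by some 37,000 articles and the burnout is easy to understand.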

What you get on the site is the payoff. Every article links back to the source scan. Volume and page numbers sit in the margins for citation. Search runs across full text, by contributor, by article title, with proper operators (min, max, exclusion). The contributor index links each writer to his own biographical entry. The Topic Index from volume 29 is faithfully reproduced. Cross-references inside articles are clickable. Long articles have section navigation rebuilt from the original shoulder-headers.

This is what changes. Two years ago, producing a site like this would have meant a small team, a grant and several years of work. Haspel scoped, specified, built the pipeline, fixed the edge cases and shipped the whole thing as an individual. The pipeline is the interesting part. The encyclopaedia is the visible part. Haspel’s own introduction to this digital edition is itself a piece of great wordsmithing.

The model matters. The 11th is one example. There are thousands of comparable cultural archives sitting in scan form on the Internet Archive, in OCR purgatory, waiting for someone to give them a decent home. Old encyclopaedias, gazetteers, scientific catalogues, parish records, regimental histories, technical journals, the 1952 Britannica Great Books with its extraordinary Syntopicon index. Until very recently, digitising any of them properly was an institutional act. It is now a solo project for someone with patience, taste and a Claude subscription.

That is the celebration. The 11th is a monument to one age. britannica11.org is a monument to another, much closer one, in which the cost of producing a polished, durable digital archive collapsed almost overnight.

Go read DUST. Or BAG-PIPE. Or EAVESDRIP, which informs us with a straight face that “though the offence of eavesdropping still exists at common law, there is no modern instance of a prosecution or indictment.” Then close the tab and remember that until last week none of this was practical to do.

