Why is so much law in PDF files?

In January, I came across Margaret Hagan’s “short manifesto,” Law’s PDF Problem, on the blog of her Open Law Lab project. She argues that too many public-facing legal resources (her examples include several explanatory and instructional charts or tools for the lay public) are locked away within PDF files, often as small parts of much longer PDFs. In her view, the PDF format’s poor responsiveness to different device sizes and screen formats, along with other obstacles to easy end-user interaction with small parts of larger PDFs, is a usability barrier to general public access to law. As it happens, I’ve also been thinking a lot about legal publishing and the PDF, from a somewhat different angle.

PDF as a format for the distribution of legal materials (both primary law and articles and commentary related to law) may strike many readers in the legal and library communities as a “no-brainer,” which is exactly why it is worth taking a deeper look at the ends served by this format selection. Several of the difficulties Hagan describes could and should be minimized by better use of the PDF format and authoring tools. Delivering documents that are likely to be searched or skimmed, or to be useful in small sub-parts, only as large, book-length PDFs (which must be downloaded, rendered on the end user’s device, and possibly searched a second time to locate the relevant bit) is a fairly obvious usability error, regardless of whether shorter PDF files would have been appropriate to that material.

But she raises a good question as to why we retain such a strong habit of distributing legal material online, using a page description language to emulate a print “original,” rather than embracing formats, styles, and conventions that are more naturally suited for reading and use on digital devices. Indeed, this habit seems deeply ingrained even in circumstances where a print “original” may in fact be largely or even entirely hypothetical. An increasing number of documents, even official or quasi-official documents at some level of formality, go through their entire life-cycle from authorship to “publishing” to use without being embedded as ink on paper.

Citation Stability versus Usability

It may be that lawyers and legal scholars just like reading things that look like print (although a PDF document would not be the only way to accomplish that). More likely, our inclination toward document files, and specifically PDF, exists on account of a set of virtues almost directly opposite to those of responsive, mobile-friendly, browser-ready page designs. Citation practices and authenticity concerns both push us directly away from “browser native” markup and the most flexible, device-independent, reflowable designs for our content.

Legal discourse relies on reliable and relatively (Bluebook challenges notwithstanding) simple citation practices that work in two directions: any citation must clearly and unambiguously point to one specific location in one particular written passage (e.g. a statute or a case) and, ideally, each location in each written passage should also be cited in the same way by any Bluebook-compliant writer. Because the law changes, this must be true not only for documents that, in and of themselves, stay the same once they are written (cases, journal articles) but also for materials like statutory or administrative codes that are amended over time. So citation also requires some tool or technology or publishing practice that makes it possible to literally or figuratively “freeze” snapshots of these materials as they were at any given moment in history.

In theory. In practice, there is enough ambiguity and variance in citation practices (and enough messy reality in the resources that are cited) that human judgment and discretion often enter into what we tend to talk about (and encourage law students to think about) as a straightforward and quasi-mechanical process. Generating, interpreting, and even reliably identifying citations via algorithm remains difficult. Moreover, long-running efforts (led by the law library community) to define more vendor- and format-neutral citation practices that are less reliant on printing conventions (often specific publishers’ printing conventions) remain far from having fully taken hold. So we find ourselves tethered to the printed page for purposes of citation.

And this is where it comes back to our sometimes almost fetishistic approach to PDF. We haven’t come up with alternative tools to provide both real and perceived document “stability” for citation. So we value print conventions even to the point where we are emulating the printed page in the context of documents written, published, distributed, and consumed almost solely in electronic formats. We need tools for ensuring both preservation (including very long-term preservation) and sufficient stability for accurate, repeatable citations. And it is true that reduction to a sufficient number of physical printed copies is the only reliable way we know to ensure long-term preservation. But the fact is that many legal documents and materials, probably inevitably including some sources of law itself, will cease to be produced in print. Thinking about best-case preservation and stability for these documents may require that we think beyond the styles and conventions of print and the use of PDF to encapsulate those.

So I share a sense that we in the legal community (and certainly in the academic reaches of it) have perhaps gone too far in relying upon a page description language for most of our “formal” communication in online media. And we haven’t necessarily thought through all the implicit ways in which that reliance continues to tie us, for better or worse, to a “printed word” paradigm. The conservative approach of the legal world to authority, and particularly to citation, has played a role that can’t be lightly dismissed. But I also think that some particular aspects of the history and use of online legal materials, and the past of computer-assisted legal research, have caused us to miss potential advantages of “web native” (e.g. well-structured and well-styled HTML) documents and publications for legal documentation and communication. (Most librarians and older attorneys, at least, remember the hard-to-read nature of the early CALR systems and any printed output from them.)

Ultimately there may be a call for tools that combine the advantages of the PDF for conferring stability and authority (and for preservation that retains verifiable authority) with the optimized machine readability and device flexibility of other formats.

I spent much of 2014 engaged with the development of an institutional scholarly repository in which the underlying resources are pretty much all PDF files. Most, if not all, comparable repositories of legal scholarship and other secondary materials in our field similarly present their core contents as PDF files, often “attached” to a metadata-only connecting page served as HTML from a database. The many journal and scholarly paper repositories on BePress’s Digital Commons platform are obvious examples. SSRN also distributes papers in PDF (and, in their case, without accompanying full-text indexing of the PDFs). Commercial scholarly databases also generally rely upon PDF for the full text of scholarly articles, and academic-platform ebooks are often delivered as chapter-level PDFs.

Online primary sources also most commonly rely on PDF files, especially those that don’t emphatically disclaim their official status versus a print version. (Though given the poor quality of some state-provided or state-sanctioned versions of these resources, such as Lawriter’s Ohio statutory and administrative codes, perhaps this isn’t a bad thing. I talk more about state codes here.)

Except for Commercial Research Platforms

Significantly, the main exception to the PDF-centric approach is the set of major commercial legal research platforms: Bloomberg, Lexis, Westlaw, Fastcase, and Casemaker. All primarily display legal texts to the end user in styled markup, with content drawn from a text repository in a database, and take advantage of both the comprehensive indexing they’ve conducted on the back end and the ability of browser-based tools and the modern Web to structure and deliver interactive experiences with those texts. True, one can sometimes request PDF as an addition to the native default display of those online systems. Westlaw’s PDF versions of cases, formatted to emulate the pages of its printed caselaw reporters and commonly relied upon by law journal cite checkers, are the prime example.

The Westlaw PDFs are prized, especially by law review cite checkers, for two reasons. The first, I think, is just their surface resemblance to printed documents. The second, a very valid one, is that they are not produced until a case has gone through the same editorial process that produces West’s printed slip opinions. So the presence of a Westlaw PDF for a case does indicate that the case has passed further through the editorial stages that result in the official reporters. (Still not so far as to be “official,” which awaits the much later U.S. Reports volume, but at least far enough to be included in West’s “quasi official” Supreme Court Reporter volumes.)

But there’s the rub: by conflating those two reasons to demand the PDF for journal or judicial citation, we risk confusing a decision based on the editorial practices of a particular publishing company (private, but with a very privileged role in the production of U.S. legal materials) with a decision based on a sheer preference for something that looks more official because it shows up in a printer-ready PDF that looks like “print,” whether or not it ever actually appeared in a printed book.

