Discussion of print “versus” online resources, whether in collection building or in research instruction, is outdated and fruitless unless it seeks to clarify the relative existing and potential qualities of each format in terms of permanence, stability, and “structured” versus “random” accessibility.
The classic visual impression of a library is an image of books and periodical volumes printed in ink on paper, bound together, and arranged on ranks of shelving. Depending, perhaps, on the attitude of the one crafting that image, these imagined shelves might hold either picturesque rows of beautifully bound classics or dusty, dark, and utterly impenetrable frustration. But of course all libraries, and particularly academic research libraries, are also deeply engaged with non-tangible resources: not “books,” but the mass of content that we variously refer to as “digital,” “electronic,” or “online” resources, provided to end users, usually via the Internet, out of networked computerized storage.
It is often easy to draw excessively simple dichotomies between print materials (and the ways they are used) and electronic resources. In legal research, for instance, it was long standard practice to teach “print research” and “online research” as parallel, and largely distinct, processes. The firmness with which we present that dichotomy has evolved somewhat as the electronic tools have matured and some of the print tools have been marginalized. But it remains easy to think in essential, absolute terms about “print research” or “electronic research,” and to have positive or negative feelings about one or the other, without really asking what is currently valuable about each format (both as research tools and as part of the cultural and legal record) and how those values can best be embraced and expanded upon. This post is a fairly quick and dirty attempt at defining and organizing the characteristics we are really talking about when we refer to “print” or “digital” materials, at least when we discuss them in value-laden ways. Each of the characteristics or qualities I’ll mention raises issues that have been much discussed and much researched. My aim is simply to organize these qualities or values into a framework that will help us better discuss what values we are upholding when we defend “print,” what we worry about when we worry about over-reliance on electronic resources, and which aims or values of print should be engineered into the digital formats used for the resources we care about most.
The contents of a book printed in multiple copies on reasonably good-quality paper are pretty permanent, as records of human communication and culture go. Books are also “permanent” in being directly human-readable and not relying on external technologies or tools (or on a functioning electrical grid and Internet) to be used. There are, of course, limits to this independence from “external tools” for many printed legal books. How useful is a single volume of a case reporter, in utter isolation, without accompanying tools like digests or other finding aids? But compared to a computer file, even these specialized legal serials are remarkably self-contained and independent of technological infrastructure and mediation.
Permanence for electronic materials is complicated. It is at least theoretically possible for an electronic document encoded in an open and well-documented format to last forever, or at least for as long as there is a functioning human civilization to retain it. But this depends on much that is unknowable. Most importantly, this “permanence” lasts only as long as somebody or something provides ongoing attention to the resource (“attention” in any number of forms: maintaining the data storage in which it resides, periodically migrating storage media, caching or copying, and so on). That attention need not be directed at the particular document or item; witness the amount of digital ephemera that remains online and available long after its producers would rather it had gone away. But it needs to exist on at least some aggregate level.
So while digital information in networked electronic environments can be relatively permanent, it can also be extremely ephemeral. And compared to our familiarity with books and other print-based technologies, we (as a culture, as librarians, as readers and consumers) have far less clear and well-defined understandings and expectations about when, why, or how a resource ends up, in the real world, in the “permanent” or the “ephemeral” category, and about what practices or standards would most effectively direct content toward one fate or the other.
Without discussing digital preservation efforts in detail, or the parallel tendency of some online content intended as ephemera to be preserved excessively, it is easy to note that print provides a relatively predictable level of permanence (bound books in a library are “permanent,” cheaply printed flyers or ad inserts are “ephemeral”), while the permanence of electronic/online content is “complicated”: ambiguous, uncertain, and probably unknowable.
I’ll write at some later point about what this ambiguity about “permanence” might mean for legal materials. But some ongoing inquiry into what we mean by permanence will be necessary as primary law begins to be “born digital” and our community presses states to adopt the Uniform Electronic Legal Material Act (UELMA) and to engage in related programs to ensure “permanent public access” to the legal materials they officially produce online.
What I’m going to call “stability” is easy to conflate with permanence, but is not the same thing. I propose expanding the concept of “stability” to mean that a resource is either fixed in an unalterable form or has qualities that make its status, and the history of any modifications, abundantly and unambiguously clear to any user (essentially, good “versioning”).
As a technology, most published print materials naturally have very high stability of the first, unalterable, type. And this natural stability is augmented by commercial and legal regimes that surround the distribution of published print materials. Books are physically somewhat difficult to alter. And a print run of a publication, once released into the world, further resists any kind of central or mass alteration (and legal rules such as the First-Sale Doctrine tend to accommodate this aspect of once-distributed print). It isn’t impossible to manipulate the print record, post-production. But it has historically been very difficult, especially in Western democratic societies.
Print materials are less of a natural fit for the second kind of “stability”: the one in which the resource can change over time, but in a manner that clearly communicates the status and nature of any modifications to the work and makes the current version very clear. But even here, the print world’s approaches are well established by long usage. They range from the numbering of editions of books, with clearly established practices around title pages and colophons, to the issuance of supplements and “pocket parts,” to the details of how specialist publications such as legal looseleaf services are assembled and maintained.
Some of the lingering distrust of online, and particularly “born digital,” works — especially in law — owes more to this aspect of stability than it really does to permanence. We fear for the authenticity or accuracy of online resources more than we do for their print counterparts (for reasons both founded and unfounded). Concerns about citation to material on the Web, with links intended to point to a resource that can be broken even when that resource still exists online at a different address, are another form of concern about stability. Just as with permanence, the answer to whether digital works have “stability” is complicated. With print, the production and distribution of a significant but finite number of discrete copies, each of which is naturally rather hard to change, serves as a relatively strong guarantee of stability. Digital works are easy to copy, but each copy is also normally easy to alter. This can raise questions about which copy is the “canonical,” official, or governing copy. Multitudinous rival local caches, copies, and versions can vie for authority.
But at the same time the model of a batch of discrete copies, produced at a moment of time and then released into the world (just as books are from a printing plant), with each then to be read and used in isolation, is also incorrect. In a networked world, in a sort of inverse of that “rival copy” scenario, there are also a number of ways in which digital copies can, essentially, remain “in communication” with one another. Updates cascade from servers to mirrors to archives to clients. Search engines crawl, and re-crawl, and update the copies against which the world’s search queries are matched. If a canonical copy, at a known URI, is established and the distribution of “rival” copies is suppressed or managed (which is not something that is done only for nefarious reasons), the entity who controls that copy might be able, whether through error or ill intent, to alter the legal, cultural, or historical record.
The result is that “stability” of resources in networked electronic environments exists only to the extent that it is specifically engineered into systems, and then only to the extent that those systems work as designed and are used and maintained as intended. The technology of print — its characteristics and its powers and its limitations — pushed written communication toward a certain stability. Digital technology, as an environment for written or otherwise digitally encoded communication, has no such inherent tendencies. Stability, of either the “fixed and unalterable” or the “well version-controlled” form, for digital materials shared via networks, has to be engineered into the way those resources and networks are constructed, and ultimately depends on social trust in the maintenance and reliability of those resources and networks.
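As one illustration of what engineering the “well version-controlled” form of stability into a system might involve, here is a minimal Python sketch of an append-only version history in which each entry’s SHA-256 digest also covers the previous entry’s digest, so that quietly rewriting an earlier version becomes detectable. The `VersionLog` class, its methods, and the sample text are hypothetical and not drawn from any real repository or official publishing system; a working system would also need the digests themselves to be published somewhere trusted, which is exactly the dependence on social trust described above.

```python
import hashlib


def chained_digest(text: str, previous: str) -> str:
    """SHA-256 over the previous entry's digest plus this version's text."""
    return hashlib.sha256((previous + text).encode("utf-8")).hexdigest()


class VersionLog:
    """Hypothetical append-only version history for one official document."""

    def __init__(self) -> None:
        # Each entry is (note, text, digest); each digest chains to the prior entry.
        self.entries: list[tuple[str, str, str]] = []

    def publish(self, text: str, note: str) -> str:
        """Record a new version and return its chained digest."""
        previous = self.entries[-1][2] if self.entries else ""
        digest = chained_digest(text, previous)
        self.entries.append((note, text, digest))
        return digest

    def history_intact(self) -> bool:
        """Recompute every digest; altering any earlier text breaks the chain."""
        previous = ""
        for _note, text, digest in self.entries:
            if chained_digest(text, previous) != digest:
                return False
            previous = digest
        return True

    def is_current(self, copy: str) -> bool:
        """Does a circulating copy match the latest published version?"""
        return bool(self.entries) and copy == self.entries[-1][1]


if __name__ == "__main__":
    log = VersionLog()
    log.publish("Sec. 1. Original text.", "enacted")
    log.publish("Sec. 1. Amended text.", "amended")
    print(log.history_intact())                      # True
    print(log.is_current("Sec. 1. Original text."))  # False: superseded version
    # Simulate someone quietly rewriting the first version after the fact.
    note, _text, digest = log.entries[0]
    log.entries[0] = (note, "Sec. 1. Altered text.", digest)
    print(log.history_intact())                      # False
```

The chaining is the design choice that matters here: a plain per-version checksum shows whether a copy matches a version, while the chain also makes the sequence of versions itself tamper-evident.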
My observations about permanence and stability apply to written communication generally, and in their details still apply broadly to written legal materials. The next two observations, about “structured” and about random (or direct) access, are more specific to legal research.
In a legal research context, and often in our discussions about teaching legal research, when we express a preference or fondness for print tools we are often really referring to the rich universe of highly structured access tools that have long accompanied legal resources in print. Indexes and tables of contents were published in individual books. Periodical indexes accompanied the journal literature in various fields. Elaborate headnote and digest systems emerged to guide access to case law. (We could all continue to fill in that list.) These tools guided and structured access to the fixed words of the print library, and librarians and their staffs acted in many ways as the “technicians” of those tools.
It is too easy to forget that, fundamentally, there is nothing about networked electronic systems that excludes such structured access tools or resists their use or value. Some are carried forward in quite literal forms: the List of CFR Sections Affected and other finding aids that accompany the Code of Federal Regulations are reproduced online exactly as they appear in print. Using them online feels mildly inconvenient, particularly compared to keyword search, but isn’t really any more cumbersome than working with the same tools and tables in print, and it captures all of the thoroughness of using them in print (without your having to leave your chair and walk to the library stacks). To the extent that the builders and users of networked electronic tools neglect such controlled or structured tools, the reason is not that they are harder to apply to digital resources than to print. They may well be easier and cheaper to produce for digital resources than for print ones. But from a producer’s standpoint, the cost of producing such tools may keep them from withstanding the disruptive power of full-text search technologies (though even those work best where some structure and contextual cues are well encoded into the resources). Users, meanwhile, may fall victim to the seductive, if sometimes false, ease of the full-text search query. On platforms ranging from Google to WestlawNext, the end user may not even realize the extent to which underlying structural tools and tables may or may not have contributed to the result returned, in the most immediate sense, by a full-text keyword query.
Other structured tools with print-based origins work better online than they do in print. Who would consider returning to the paper card catalog, with linear access only to the full list of materials under a given subject or author entry, no way to combine search facets, and no ready way (without a librarian’s expertise in the classification system) even to harvest a particular subject heading along with its broader, narrower, or otherwise similar terms? Even indexes and tables of contents (to say nothing of controlled-vocabulary indexing of periodical materials) can work better when combined with other tools and techniques that are possible with digital information but not with print. The structured metadata behind these tools, whether produced through a fully manual or a partially automated process, is very compatible with online materials and not confined to the print resources with which those of us who have been in this field for a while tend to associate it.
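To make the card catalog comparison concrete, here is a small Python sketch of harvesting a subject heading together with its narrower (or broader, or related) terms and then combining that with a second facet, a publication-year filter. The thesaurus, catalog records, and function names are all hypothetical toy data; real controlled vocabularies such as the Library of Congress Subject Headings express the same broader/narrower/related relationships, but this code does not read any real vocabulary or catalog.

```python
# Hypothetical toy thesaurus: BT = broader term, NT = narrower term, RT = related term.
THESAURUS = {
    "Contracts": {"BT": ["Obligations"], "NT": ["Sales", "Leases"], "RT": ["Commercial law"]},
    "Sales": {"BT": ["Contracts"], "NT": [], "RT": []},
    "Leases": {"BT": ["Contracts"], "NT": [], "RT": ["Landlord and tenant"]},
}

# Hypothetical catalog records, each with a set of subject headings and a year.
CATALOG = [
    {"title": "Treatise on Sales", "subjects": {"Sales"}, "year": 2012},
    {"title": "Leasing Law Handbook", "subjects": {"Leases"}, "year": 2005},
    {"title": "Commercial Law Overview", "subjects": {"Commercial law"}, "year": 2015},
    {"title": "Contracts Casebook", "subjects": {"Contracts"}, "year": 1999},
]


def expand(heading, relations=("NT",)):
    """Return the heading plus every term reachable through the chosen relations."""
    found = set()
    pending = [heading]
    while pending:
        term = pending.pop()
        if term in found:
            continue  # guards against cycles, e.g. Sales -> Contracts -> Sales via BT/NT
        found.add(term)
        for relation in relations:
            pending.extend(THESAURUS.get(term, {}).get(relation, []))
    return found


def search(heading, relations=("NT",), min_year=None):
    """Combine an expanded subject facet with an optional publication-year facet."""
    terms = expand(heading, relations)
    return [
        record["title"]
        for record in CATALOG
        if record["subjects"] & terms
        and (min_year is None or record["year"] >= min_year)
    ]


print(search("Contracts"))
# ['Treatise on Sales', 'Leasing Law Handbook', 'Contracts Casebook']
print(search("Contracts", min_year=2000))
# ['Treatise on Sales', 'Leasing Law Handbook']
print(search("Contracts", relations=("NT", "RT")))
# ['Treatise on Sales', 'Leasing Law Handbook', 'Commercial Law Overview', 'Contracts Casebook']
```

The expansion and the facet filter are each a line or two of logic once the relationships are recorded as data, which is the sense in which this structured metadata is, if anything, easier to exploit online than in a drawer of cards.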
I’m being loose with the term “random access” here, perhaps: a book can be opened to any page as easily as to any other, and so technically does offer direct or random access. But the tools we associate with print tend to direct researcher attention in structured and relatively linear ways. We go to an index term, and then directly and in order through its alphabetically listed subheadings and cross-references. We go to a topic and key number and then proceed linearly down through the court hierarchy and back through time, skimming the headnotes of cases under that key number. Indexes and tables of contents do in essence facilitate meaningful random access to the pages of a book. But they themselves, when limited to print, are read and analyzed in a relatively linear manner.
Here digital is fundamentally different, and comes into its own. We can have full-text indexing (“indexes” of every single word, not only in a single text but in a whole corpus of texts or even a giant swath of the Internet) to support free-form search queries. We can combine those queries however we like, using Boolean logic or “natural language,” and have results returned, limited only by the algorithmic ingenuity of engineers and the raw computing power available to search engines. Completely unstructured documents can be searched. Documents with structure carefully encoded in machine-parsable markup can be searched. Documents with enough implicit order (larger heading text, variations in paragraph size) for an algorithm to make a best guess at what good machine-readable structure might have been can be searched. Patterns of linking and cross-pollination between documents can be searched and explored. Every word of everything that is written is an index entry.
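The basic data structure behind “every word is an index entry” is the inverted index: a map from each word to the set of documents containing it, against which Boolean queries become set operations. Here is a minimal Python sketch, assuming naive lowercase tokenization and simple AND/OR/NOT combination; the document IDs, sample texts, and the `query` function are hypothetical, and production search engines add ranking, stemming, phrase search, and a great deal more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Naive tokenizer: lowercase alphanumeric runs (no stemming or stop words)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: each word maps to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def query(index, doc_ids, must=(), should=(), must_not=()):
    """Boolean query: AND over `must`, OR over `should`, NOT over `must_not`."""
    result = set(doc_ids)
    for term in must:
        result &= index.get(term.lower(), set())
    if should:
        result &= set().union(*(index.get(t.lower(), set()) for t in should))
    for term in must_not:
        result -= index.get(term.lower(), set())
    return sorted(result)


# Hypothetical mini-corpus.
DOCS = {
    "case-1": "The landlord breached the lease by failing to repair.",
    "case-2": "A contract for the sale of goods was formed by the parties.",
    "case-3": "The tenant withheld rent after the landlord refused to repair the roof.",
}
INDEX = build_index(DOCS)

print(query(INDEX, DOCS, must=["landlord", "repair"]))                      # ['case-1', 'case-3']
print(query(INDEX, DOCS, must=["landlord", "repair"], must_not=["lease"]))  # ['case-3']
print(query(INDEX, DOCS, should=["rent", "goods"]))                         # ['case-2', 'case-3']
```

Nothing in the index depends on the documents having any structure at all, which is why completely unstructured text can be searched this way; encoded structure can still be layered on top, for example by indexing headings as a separate field.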
The challenge of the future, for those of us who think and work with research materials and support researchers, will be to explore what tools and structures we can develop and use to maintain desirable permanence and stability, while blending the strengths of structured and “random” access to keep resources findable.