In the past decade, the large-scale digitization of text collections, combined with the application of sophisticated algorithms and visualization techniques, has led to notable scholarly achievements that can be subsumed under the title of “distant reading”. But these achievements have raised anxieties about close reading, leading some to wonder “in an age where distant reading is possible, is close reading dead?” In our view, these two poles of reading, the close and the distant, have grown apart both methodologically and practically. Researchers are left to work with single collections or to perform only one type of query or data mining task at a time. They find themselves unable to work easily across different collections, and across different kinds of collections. Such limited environments do not allow researchers to take full advantage of the growing body of available digitized text data.
The Intertextual Hub is a pilot project to develop a model that will allow researchers to bridge the gap between these two modes of text analysis. We propose to create an environment in which the conceptual relationships that text mining and other algorithms discover among texts in large, heterogeneous collections can fruitfully guide close reading. Fundamentally, we contend that the core of scholarly reading in the digital age, and the true usefulness of computational analysis of texts, should be the discovery and navigation of intertextual relationships. The model we develop will therefore allow users to navigate between individual texts and larger groups of texts that are related through shared themes, ideas, and passages. Along with the scalable reading tools, we propose an approach to federating collections that can bypass the competing problems of quality (OCR vs. curated) and access (pay vs. public) and still yield meaningful results.
We believe this model would make distant reading an integral part of close reading by finding, exposing, and making searchable the different kinds of relationships that exist between texts. Using an array of techniques, the Intertextual Hub will situate a specific document in a broader context of intertextual relations, whether direct or indirect borrowings, topics shared with other texts or parts of texts, or some other kind of similarity. The project is predicated on what we call a “smarter silo model”: each heterogeneous collection exposes actionable data to a harvester for a wide variety of text mining tasks, while retaining the specific configuration that the individual collection requires. In other words, we propose to merge disparate collections at the points where their texts intersect, rather than merging all documents into one big database or building harvested collections unified by a lowest common denominator.
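To make the idea of merging collections "at the intersection between texts" more concrete, here is a minimal sketch, not the project's actual implementation. It assumes each silo exposes a small record to a harvester (the `HarvestRecord` type, its fields, the collection names, and the sample sentences are all hypothetical), and it uses shared word n-grams as a simple stand-in for whatever text-reuse or similarity techniques the Hub would apply. Only links between documents from different collections are kept; the documents themselves stay in their own silos.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
import re


@dataclass(frozen=True)
class HarvestRecord:
    """Hypothetical minimal record a silo exposes to the harvester."""
    collection: str
    doc_id: str
    text: str


def shingles(text: str, n: int = 4) -> set:
    """Return the set of lowercase word n-grams in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def cross_collection_links(records, n=4, min_shared=1):
    """Link documents from different collections that share word n-grams."""
    index = defaultdict(set)  # n-gram -> {(collection, doc_id), ...}
    for rec in records:
        for sh in shingles(rec.text, n):
            index[sh].add((rec.collection, rec.doc_id))

    shared = defaultdict(int)  # (doc_a, doc_b) -> count of shared n-grams
    for docs in index.values():
        for a, b in combinations(sorted(docs), 2):
            if a[0] != b[0]:  # keep only links that cross collections
                shared[(a, b)] += 1
    return {pair: c for pair, c in shared.items() if c >= min_shared}


# Invented sample data, for illustration only.
records = [
    HarvestRecord("curated_works", "cw-1",
                  "The general will is always right and tends to the public good"),
    HarvestRecord("assembly_debates", "ad-7",
                  "Citizens, the general will is always right, said the deputy"),
    HarvestRecord("newspapers", "np-3",
                  "Bread prices rose again in Paris this week"),
]
links = cross_collection_links(records)
```

In this sketch the harvester stores only links between document identifiers, so each collection can keep its own search and reporting configuration. A real deployment would need fuzzier matching to cope with uncorrected OCR, which exact n-gram overlap does not handle.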
We will test this model by applying it to a number of large text collections, with a specific focus on the ten years of the French Revolution and, more generally, on 18th-century French resources. As shown here, this extraordinarily broad set of collections ranges from the great works of the French Enlightenment and large collections of 18th-century publications to the daily record of the legislative assemblies, published debates and pamphlets, and newspaper runs. To date, these collections have been deployed as distinct textual databases because they are, by their nature, made up of different kinds of documents that need customized search and reporting environments. Some are very well curated collections, with a high degree of accuracy and excellent metadata. Others are based on uncorrected OCR and have relatively minimal metadata. These heterogeneous collections allow a focus on Revolutionary developments in different rhetorical registers, from the formal debates and decrees of the legislatures to the broader discussions of issues in the popular press. The heuristics of intertextual linkage applied across these collections offer the possibility of a much better understanding of how revolutionary discourse developed under the pressure of events, within the larger cultural context of 18th-century intellectual traditions. The project will allow users to navigate intellectual currents across these disparate documents and collections.