Introduction

This section introduces the Distant Reader, its output (called “study carrels”), and the Distant Reader Toolbox, a command-line tool for interacting with study carrels.

The Distant Reader (https://distantreader.org) is a Web-based system intended to supplement the traditional reading process and save the time of the student, researcher, or scholar. The Distant Reader makes it easier to analyze the large volumes of text today’s academic is expected to consume.

Given an almost arbitrary amount of text as input, the Distant Reader creates a corpus from the text, transforms it into plain text, applies text mining and natural language processing to it, saves the results in a myriad of structured data files, distills the whole into a relational database file, summarizes the results, and compresses everything into a network- and computer-independent file affectionately called a “study carrel”. This file – a data set – is designed to be consumed by people as well as computers. It is something like a report, an index, a database, and an interactive tool all rolled into one. For example, the Reader can consume and process things like:

  1. the whole of an undergraduate’s Philosophy 101 reading list

  2. all documents cited in a graduate student’s thesis or dissertation

  3. the literature on mRNA that a researcher needs to review, or

  4. the whole of Dickens’s works, which a scholar wants to compare and contrast

In short, the Distant Reader takes sets of unstructured data (text) and transforms them into structured data amenable to analysis – “reading”.
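The idea of turning unstructured text into structured data can be illustrated with a toy sketch. This is not the Distant Reader's implementation; the function name `text_to_rows`, the document identifier, and the sample sentence are all hypothetical, and simple word counting stands in for the much richer text mining and natural language processing the Reader performs.

```python
import re
from collections import Counter


def text_to_rows(doc_id, text):
    """Turn one plain-text document into (doc_id, token, count) rows.

    Hypothetical helper for illustration only; it is not part of
    the Distant Reader or the Toolbox.
    """
    # Lowercase the text and pull out runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # most_common() orders rows from most to least frequent token.
    return [(doc_id, token, count) for token, count in counts.most_common()]


# A made-up sample document, used only to exercise the function.
rows = text_to_rows("sample", "The reader reads. The reader counts words.")
for row in rows:
    print(row)
```

Each resulting row is a small, regular record that can be sorted, filtered, or loaded into a database, which is what makes the data amenable to analysis.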

The Distant Reader Toolbox is a command-line tool for interacting with study carrels. Because study carrels are sets of structured data, it is almost trivial to write software to query, sort, filter, group, visualize, and ultimately read – use and understand – a given corpus. The Toolbox is one example of such software.
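To suggest how little code such software can require, here is a minimal sketch that filters, groups, and sorts rows in a relational database using Python's standard `sqlite3` module. The table name `ent`, its columns, the function `top_entities`, and the sample rows are assumptions made for illustration; they are not taken from a real study carrel, and the sketch does not use the Toolbox itself.

```python
import sqlite3


def top_entities(connection, entity_type, limit=3):
    """Return the most frequent entities of one type as (entity, frequency).

    Assumes a hypothetical table ent(id, entity, type).
    """
    query = """
        SELECT entity, COUNT(*) AS frequency
        FROM ent
        WHERE type = ?
        GROUP BY entity
        ORDER BY frequency DESC, entity ASC
        LIMIT ?
    """
    return connection.execute(query, (entity_type, limit)).fetchall()


# Build a tiny in-memory database with made-up rows so the sketch runs alone.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE ent (id TEXT, entity TEXT, type TEXT)")
connection.executemany(
    "INSERT INTO ent VALUES (?, ?, ?)",
    [
        ("doc-1", "Pip", "PERSON"),
        ("doc-1", "London", "GPE"),
        ("doc-1", "Pip", "PERSON"),
        ("doc-2", "Oliver", "PERSON"),
    ],
)

result = top_entities(connection, "PERSON")
print(result)
```

Pointing the same query at a database file instead of `:memory:` would only require changing the argument to `sqlite3.connect`, provided the file contains a table with a compatible layout.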

For more information see:

  1. the Toolbox’s home page and source code (https://github.com/ericleasemorgan/reader-toolbox)

  2. this documentation (https://reader-toolbox.readthedocs.io)

  3. the home page of the Distant Reader (https://distantreader.org)

Embrace information overload. Use the Distant Reader.