Supporting software

The Toolbox is largely an amalgamation of other tools, combined to exploit Distant Reader study carrels. They are listed below:

  • Click - implements the command-line interface to the Toolbox, and is valuable because its framework keeps the interface consistent

  • Datasette - used to implement the SQL interface to the Reader’s underlying SQLite database file, and useful because it provides many output formats

  • MALLET - used by the tm subcommand to extract latent themes

  • Matplotlib - used in the cluster subcommand to visualize the results

  • Natural Language Toolkit (NLTK) - used in a number of places throughout the Toolbox; it makes it easy to tokenize a text into words, ngrams, and sentences, and to implement the concordance

  • scikit-learn - used in the cluster subcommand for feature extraction, calculating distances, and multidimensional scaling

  • SciPy - used in the cluster subcommand to compute hierarchies

  • textacy - builds on the functionality of spaCy and provides support for outputting sentence fragments matching particular grammars

  • word2vec - a front-end to the venerable word2vec application; it provides the necessary support for the semantics subcommand
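
To give a feel for how the clustering libraries listed above fit together, here is a minimal sketch in Python. It is not the Toolbox's actual code: the sample documents, the function name cluster_documents, and the choices of TF-IDF features, cosine distance, and average linkage are all assumptions made for illustration. Multidimensional scaling and the Matplotlib visualization are left out to keep the sketch short.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

# Hypothetical stand-in documents; a real study carrel supplies its own texts.
documents = [
    "whales swim through deep ocean water",
    "ocean water hides whales from ships",
    "farmers harvest wheat every autumn",
    "autumn wheat harvest feeds farmers",
]


def cluster_documents(texts, n_clusters=2):
    """Group texts by vocabulary overlap (illustrative helper, not a Toolbox API)."""
    # Feature extraction (scikit-learn): one TF-IDF vector per document.
    features = TfidfVectorizer().fit_transform(texts)

    # Distances (scikit-learn): pairwise cosine distance between documents.
    distances = cosine_distances(features)

    # Hierarchy (SciPy): linkage expects the condensed form of the matrix.
    condensed = squareform(distances, checks=False)
    tree = linkage(condensed, method="average")

    # Cut the hierarchy into at most n_clusters flat groups.
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels


if __name__ == "__main__":
    tree, labels = cluster_documents(documents)
    for label, text in zip(labels, documents):
        print(label, text)
```

Under these assumptions, the two ocean sentences share one label and the two farming sentences share the other, because the pairs have no vocabulary in common with each other. The returned linkage matrix is the same structure that SciPy's dendrogram function accepts when a plot is wanted.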