Supporting software
This Toolbox is really an amalgamation of other tools used to exploit Distant Reader study carrels. They are listed below:
Click - implements the command-line interface to the Toolbox; wonderful because its framework makes the interface consistent
Datasette - used to implement the SQL interface to the Reader’s underlying SQLite database file; nice because it provides so many output formats
MALLET - used by the tm subcommand to extract latent themes
Matplotlib - used in the cluster subcommand to visualize the results
Natural Language Toolkit (NLTK) - used in a number of places throughout the Toolbox; makes it easy to tokenize a text into words, ngrams, and sentences, and to implement the concordance
scikit-learn - used in the cluster subcommand for feature extraction, calculating distances, and multidimensional scaling
SciPy - used in the cluster subcommand to compute hierarchies
textacy - builds on the functionality of spaCy and provides support for outputting sentence fragments matching particular grammars
word2vec - a front-end to the venerable word2vec application; provides the necessary support for the semantics subcommand
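
To give a feel for how the pieces behind the cluster subcommand fit together, here is a minimal sketch that chains scikit-learn feature extraction and distance calculation with a SciPy hierarchy. It is not the Toolbox's own code. The sample sentences, the variable names, and the choices of TF-IDF, cosine distance, and average linkage are all assumptions made for illustration. The multidimensional scaling and Matplotlib plotting steps are left out.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

# Hypothetical toy documents standing in for the items of a study carrel.
documents = [
    "the whale swam through the cold sea",
    "the sea was cold and the whale dove deep",
    "the pond in the woods was quiet in winter",
    "winter in the woods is quiet and still",
]

# Feature extraction: one TF-IDF vector per document (an assumed weighting).
features = TfidfVectorizer().fit_transform(documents)

# Distances: pairwise cosine distance between the document vectors.
distances = cosine_distances(features)

# Hierarchy: linkage() expects a condensed (upper-triangle) distance vector,
# so the square matrix is converted first. Average linkage is an assumption.
condensed = squareform(distances, checks=False)
hierarchy = linkage(condensed, method="average")

# Cut the tree into two flat clusters to inspect the grouping.
labels = fcluster(hierarchy, t=2, criterion="maxclust")

for label, text in zip(labels, documents):
    print(label, text)
```

The hierarchy array could also be passed to scipy.cluster.hierarchy.dendrogram and drawn with Matplotlib, which is the kind of visualization the list above describes.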