Quips

This is a list of remarks – quips – behooving the student, researcher, or scholar to keep in mind when using the Reader Toolbox. The quips are not necessarily presented in priority order.

Quip #1

To paraphrase John Firth, “You shall know a word by the company it keeps.”

This means a word is rather… meaningless unless it is given context. For example, if I say the word “love”, then all sorts of connotations might go through your head. On the other hand, if I say, “I love chocolate”, then the word “love” takes on a more specific connotation. Alternatively, if I say, “I love my country”, then the connotation of the word “love” is different still. Words are merely data sans context. After context is applied words begin to become information, and only then will knowledge begin to emerge.
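One way to see the company a word keeps is a tiny concordance (keyword in context). The following is only a sketch in plain Python, not the Toolbox's own concordance; the function name `concordance`, the `width` parameter, and the sample text are assumptions made for illustration.

```python
import re

def concordance(text, keyword, width=1):
    """Return each occurrence of keyword with up to `width` words on either side."""
    # Crude tokenization: lowercase the text and keep runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    lines = []
    for i, word in enumerate(words):
        if word == keyword:
            left = words[max(0, i - width):i]
            right = words[i + 1:i + 1 + width]
            lines.append(" ".join(left + [word] + right))
    return lines

# Hypothetical sample text built from the examples above.
TEXT = "I love chocolate. I love my country."

for line in concordance(TEXT, "love"):
    print(line)
```

Even with a single neighboring word on each side, the two occurrences of "love" begin to look different from each other.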

Quip #2

As a general rule of thumb, repeatedly double parameter values when modeling your study carrel.

For example, when creating a list of the most frequent ngrams, begin by returning only a single word. This will answer the question, “What is the most frequently used word?” Then double the number of words to two. Repeat this process so the number of words returned is four, eight, sixteen, thirty-two, sixty-four, etc. Using this process, especially if you visualize the result, the frequencies of the words, and how those frequencies compare to each other, will be much more easily grasped.
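The doubling process can be sketched in a few lines of Python. This is an illustration only and does not call the Toolbox; the helper names `top_words` and `doubling_schedule`, and the sample text, are hypothetical.

```python
from collections import Counter
import re

# Hypothetical sample text; in practice this would be the text of a study carrel.
TEXT = "love is patient love is kind love never fails and kind words never fail"

def top_words(text, n):
    """Return the n most frequent lowercase words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(n)

def doubling_schedule(start=1, limit=64):
    """Yield start, 2*start, 4*start, ... for as long as the value does not exceed limit."""
    n = start
    while n <= limit:
        yield n
        n *= 2

# Ask for 1, 2, 4, then 8 words, and compare the lists as they grow.
for n in doubling_schedule(1, 8):
    print(n, top_words(TEXT, n))
```

Each pass returns twice as many words as the one before, so a handful of passes covers a wide range of list sizes.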

When topic modeling, start out with a single topic and a single word. This will answer the question, “If I were to characterize this study carrel using a single word, then what might that word be?” Continue the topic modeling process by doubling the number of topics. Patterns will emerge. Some topics will subdivide, and others will continually dominate. Observing these trends will offer useful insights.

If, when collocating a study carrel, the number of edges is not one and a half to two times greater than the number of nodes, then double (or halve) the collocation command’s -l or -f values. Similarly, if you use Gephi to visualize the output of the collocation process, then cluster the result, and if the number of resulting clusters is too large, then double the modularity parameter until the result is useful.
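The edges-to-nodes check is simple arithmetic, sketched below in plain Python. The sketch assumes the collocations have already been read into a list of (source, target) pairs and that the edges are undirected; the function names and the sample edges are hypothetical, and nothing here reads the Toolbox's actual output format.

```python
def edge_node_ratio(edges):
    """Return (node_count, edge_count, ratio) for an iterable of (source, target) pairs."""
    # Assumption: edges are undirected, so (a, b) and (b, a) count once.
    edge_set = {tuple(sorted(edge)) for edge in edges}
    nodes = {node for edge in edge_set for node in edge}
    ratio = len(edge_set) / len(nodes) if nodes else 0.0
    return len(nodes), len(edge_set), ratio

def in_target_range(ratio, low=1.5, high=2.0):
    """Report whether the ratio falls within the rule-of-thumb range."""
    return low <= ratio <= high

# Hypothetical collocations: four nodes joined by six edges.
EDGES = [
    ("love", "god"), ("love", "man"), ("god", "man"),
    ("man", "war"), ("war", "love"), ("god", "war"),
]

nodes, edges, ratio = edge_node_ratio(EDGES)
print(nodes, edges, ratio, in_target_range(ratio))
```

If the ratio falls outside the range, change the parameter by a factor of two, collocate again, and repeat the check.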

Incrementing or decrementing parameters by a single unit value is usually a waste of time.

Quip #3

Computers do not address questions regarding why or meaning. Only you can do that.

Computers are really stupid, and they only do a few things. They take some sort of input and save it in RAM or on disk. Computers then apply some sort of process to the input, such as finding the square root of a number or determining the lemma of a word. Finally, computers output the result. It is up to the student, researcher, or scholar to determine the meaning of the output.

On the other hand, it is possible to answer most newspaper-reporter types of questions: questions regarding who, what, when, where, and how many. Moreover, computers are very capable when it comes to answering questions regarding quantity, and many research questions are rooted in answers regarding number. Examples include: how did a given idea ebb and flow over time, how many diseases are represented in a given work and how can they be characterized, or to what degree do Shakespeare’s plays describe love?

Put yet another way, the Toolbox only outputs observations, and it is up to you to interpret the observations. In this way, the Toolbox is like a thermometer. Suppose you live in Miami (Florida, United States). Suppose it is the month of July. Suppose the thermometer outside reads 90° Fahrenheit. What might you think? Suppose it is February and the thermometer outside reads 32°. What might you think? Finally, suppose it is July again, and the thermometer reads 115°. In each case you may interpret the results differently. Some people might call the temperature “hot”. Others will call it “normal”. Still others might call it “cold”. Some people might even think the equipment is broken or climate change is being manifested. It is up to you to interpret the observations made by the Toolbox.

The Toolbox only outputs the most mundane of truths, but more sublime truths can be garnered through interpretation.

Quip #4

There are zero 100% correct ways to model a study carrel; model your study carrel in the hopes of telling a compelling story.

Quip #5

Modeling text – whether you use the Toolbox or not – is an iterative process: 1) articulate a research question, 2) identify content which might address the question, 3) obtain the content, 4) model (analyze) the content, 5) address the question, 6) go to Step #1. The process is never done.

Quip #6

The hardest part of effectively exploiting your study carrels is two-fold. First, you must articulate a research question. The question(s) can range from the mundane to the sublime. Examples might include:

Measured in number of words, how big is this study carrel and how difficult might it be to use and understand?

What is each of the items in my study carrel about? How can they be described?

Who or what is mentioned in this study carrel, what do they do, and how are they related to each other?

What are the definitions of love, honor, truth, and beauty?

How does St. Augustine define love, and how is his definition different from Rousseau’s?

To a large degree, your ability to articulate research questions hinges on your ability to think critically.

Second, effectively exploiting study carrels is dependent on your ability to quickly and easily transform one data structure to another. The majority of the time, to accomplish this task, you need to know how to:

redirect the Toolbox’s output to a file

edit the output; this can be done in a text editor, a spreadsheet-like application, or programmatically

save the edits

open the edits in a more specific analysis program or visualization application

Text editors and spreadsheet-like applications are indispensable for the person who does not know how to write software. You must know how to use the applications’ find/replace functions, add or delete columns of content, and export the results to any number of formats: CSV, TSV, plain text, other delimited formats, JSON, etc. All of this is true because more specific analysis programs and visualization applications (such as concordancers, word cloud generators, network analysis applications, topic modelers, etc.) require their input to be in a myriad of data structures.
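For those who do want to do the editing programmatically, the same kind of transformation can be sketched with Python's standard library. The tab-delimited sample below is hypothetical and stands in for output that has been redirected to a file; the column names `word` and `count` and the helper names are assumptions, not the Toolbox's actual output format.

```python
import csv
import io
import json

# Hypothetical tab-delimited output, such as a frequency list saved to a file.
TSV = "word\tcount\nlove\t3\nkind\t2\nnever\t2\n"

def read_tsv(text):
    """Parse tab-delimited text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def to_csv(rows):
    """Serialize a non-empty list of rows as comma-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_json(rows):
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)

rows = read_tsv(TSV)

# The "edit" step: keep only the words occurring more than twice.
frequent = [row for row in rows if int(row["count"]) > 2]

print(to_csv(frequent))
print(to_json(rows))
```

The shape is the same as doing it by hand: read one structure, edit it, and write another that the next program expects.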

To a large degree, your ability to transform one data structure to another is a matter of practice and attention to detail. Believe me, it is not rocket surgery.