This is a list of short remarks – quips – behooving the student, researcher, or scholar to keep in mind when using the Reader Toolbox. The quips are not necessarily presented in priority order.
To paraphrase John Firth, “You shall know a word by the company it keeps.”
This means a word is rather… meaningless unless it is given context. For example, if I say the word “love”, then all sorts of connotations might go through your head. On the other hand, if I say, “I love chocolate”, then the word “love” takes on a more specific connotation. Alternatively, if I say, “I love my country”, then the connotation of the word “love” is different still. Words are merely data sans context. After context is applied words begin to become information, and only then will knowledge begin to emerge.
As a general rule of thumb, repeatedly double parameter values when modeling your study carrel.
For example, when creating a list of the most frequent ngrams, begin by returning only a single word. This will answer the question, “What is the most frequently used word?” Then double the number of words to two. Repeat this process so the number of words returned is four, eight, sixteen, thirty-two, sixty-four, etc. Using this process, especially if you visualize the result, the frequency of the words and their frequency compared to each other will be much more easily grasped.
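The doubling process for frequent words can be sketched in a few lines of Python. This is only an illustration and not the Toolbox's own implementation: the functions doubling_schedule and top_words, and the sample text, are hypothetical names invented for this example, and the tokenization is deliberately naive.

```python
from collections import Counter
import re


def doubling_schedule(start=1, limit=64):
    """Yield start, 2*start, 4*start, ... while the value does not exceed limit."""
    n = start
    while n <= limit:
        yield n
        n *= 2


def top_words(text, n):
    """Return the n most frequent lowercase words in text as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Hypothetical sample text; in practice this would be the text of a study carrel.
sample = "love is love and love is not hate but hate is not love"

# Ask for 1, 2, and 4 words in turn, and compare the lists as they grow.
for n in doubling_schedule(limit=4):
    print(n, top_words(sample, n))
```

Each pass returns twice as many words as the one before, so the shape of the frequency distribution comes into view after only a handful of runs.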
When topic modeling, start out with a single topic and a single word. This will answer the question, “If I were to characterize this study carrel using a single word, then what might that word be?” Continue the topic modeling process by doubling the number of topics. Patterns will emerge. Some topics will subdivide, and others will continually dominate. Observing these trends will offer useful insights.
If, when collocating a study carrel, the number of edges is not one and a half to two times greater than the number of nodes, then double (or halve) the collocation command’s -l or -f values. Similarly, if you use Gephi to visualize the output of the collocation process, then cluster the result, and if the number of resulting clusters is too large, then double the modularity parameter until the result is useful.
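The edge-to-node check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Toolbox's code: the edge list is made up, the functions edge_node_ratio and next_threshold are hypothetical, and the "threshold" stands in for a generic minimum-weight cutoff where a larger value keeps fewer edges. It does not claim to reproduce the meaning of the collocation command's -l or -f values.

```python
def edge_node_ratio(edges):
    """Return the number of edges divided by the number of distinct nodes."""
    nodes = {node for edge in edges for node in edge[:2]}
    return len(edges) / len(nodes) if nodes else 0.0


def next_threshold(ratio, threshold, low=1.5, high=2.0):
    """Suggest the next cutoff: double it if the graph is too dense,
    halve it if the graph is too sparse, otherwise leave it alone.

    Assumption: a larger threshold keeps fewer edges.
    """
    if ratio > high:
        return threshold * 2
    if ratio < low:
        return max(1, threshold // 2)
    return threshold


# Hypothetical collocation output: (source, target, weight) tuples.
edges = [("a", "b", 5), ("b", "c", 3), ("a", "c", 1), ("c", "d", 1)]

ratio = edge_node_ratio(edges)
print(ratio, next_threshold(ratio, threshold=4))
```

Here four edges connect four nodes, so the ratio falls below one and a half and the sketch suggests halving the cutoff before collocating again.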
Incrementing or decrementing parameters by a single unit value is usually a waste of time.
Computers do not address questions of why or of meaning. Only you can do that.
Computers are really stupid, and they only do a few things. They take some sort of input and save it in RAM or on disk. Computers then apply some sort of process to the input, such as finding the square root of a number or determining the lemma of a word. Finally, computers output the result. It is up to the student, researcher, or scholar to determine the meaning of the output.
On the other hand, it is possible to answer most newspaper-reporter types of questions: questions regarding who, what, when, where, and how many. Moreover, computers are very capable when it comes to answering questions regarding quantity, and many research questions are rooted in answers regarding number. Examples include: how did a given idea ebb and flow over time, how many diseases are represented in a given work and how can they be characterized, or to what degree do Shakespeare’s plays describe love?
Put yet another way, the Toolbox only outputs observations, and it is up to you to interpret the observations. In this way, the Toolbox is like a thermometer. Suppose you live in Miami (Florida, United States). Suppose it is the month of July. Suppose the thermometer outside reads 90° Fahrenheit. What might you think? Suppose it is February and the thermometer outside reads 32°. What might you think? Finally, suppose it is July again, and the thermometer reads 115°. In each case you may interpret the results differently. Some people might call the temperature “hot”. Others will call it “normal”. Still others might call it “cold”. Some people might even think the equipment is broken or climate change is being manifested. It is up to you to interpret the observations made by the Toolbox.
The Toolbox only outputs the most mundane of truths, but more sublime truths can be garnered through interpretation.
There are zero 100% correct ways to model a study carrel; model your study carrel in the hopes of telling a compelling story from the result.