Housekeeping:
- Blogs: somewhat informal, but still require proofreading and revision!
- Comments: a challenge for next week: avoid writing “I agree/don’t agree” and instead point out a specific argument your peer made and ask them a concrete, thoughtful question about it.
- Overall, let’s be careful to distinguish between “argument” (a perspective or point of view that requires evidence and debate) and “subject” (the contents of the book/article).
- Next week: Disciplinary Technologies presentation! Let’s review the instructions…
- Questions?
Moretti: what is his argument, after all? What is he trying to persuade his readers about?

Exercise #2: Start by reading this article on your device
Next: Look through the handout and try to analyze the visualization in it. Full-size versions are available under Exercise Images if you need them.
Text Analysis

- What are we looking at?
- Can you guess what text this is?
- What words jump out at you?
Text analysis consists of two processes: analysis (in which the computer breaks down information into smaller units such as words) and synthesis (in which the computer counts these units, manipulates them, and reassembles a new text) (Sinclair and Rockwell 243).
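To make the two processes concrete, here is a minimal Python sketch. The function names (`tokenize`, `word_frequencies`) and the sample sentence are our own choices for illustration; this is not how Voyant or any particular tool does it internally.

```python
import re
from collections import Counter

def tokenize(text):
    """Analysis: break the text down into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(tokens):
    """Synthesis: count the units so they can be reassembled into a new view."""
    return Counter(tokens)

# Sample sentence chosen for illustration only.
sample = "It was the best of times, it was the worst of times."
tokens = tokenize(sample)
frequencies = word_frequencies(tokens)

# List the three most frequent words with their counts.
for word, count in frequencies.most_common(3):
    print(word, count)
```

A word cloud is essentially this frequency table drawn with font sizes instead of numbers.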
- Text analysis systems can search large texts quickly. They do this by preparing electronic indexes to the text so that the computer does not have to read through the entire text. When finding words can be done so quickly that it is “interactive”, it changes how you can work with the text – you can serendipitously explore without being frustrated by the slowness of the search process.
- Text analysis systems can conduct complex searches. Text analysis systems will often allow you to search for lists of words or for complex patterns of words. For example, you can search for the co-occurrence of two words.
- Text analysis systems can present the results in ways that suit the study of texts. Text analysis systems can display the results in a number of ways; for example, a Keyword In Context display shows you all the occurrences of the found word with one line of context.
(http://tactweb.cch.kcl.ac.uk/doc/twwhatis.htm)
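The three capabilities above (an index, a co-occurrence search, and a Keyword In Context display) can be sketched in a few lines of Python. This is a toy version under our own assumptions: the helper names `build_index`, `kwic`, and `cooccurrences` are hypothetical, and real systems such as TACT use far more elaborate indexes.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split the text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(tokens):
    """Map each word to the positions where it occurs, so a search
    does not have to reread the whole text."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        index[token].append(position)
    return index

def kwic(tokens, index, keyword, width=2):
    """Keyword In Context: each hit with `width` words on either side."""
    lines = []
    for position in index.get(keyword, []):
        left = " ".join(tokens[max(0, position - width):position])
        right = " ".join(tokens[position + 1:position + 1 + width])
        lines.append(f"{left} [{keyword}] {right}".strip())
    return lines

def cooccurrences(index, first, second, window=3):
    """Positions of `first` that have `second` within `window` words."""
    return [p for p in index.get(first, [])
            if any(abs(p - q) <= window for q in index.get(second, []))]

text = ("It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness")
tokens = tokenize(text)
index = build_index(tokens)

for line in kwic(tokens, index, "age"):
    print(line)
print(cooccurrences(index, "age", "wisdom"))
```

Because the index already knows where every word is, both lookups only touch the positions of the words you asked for, which is what makes this kind of searching feel interactive.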
What can we do with this? (via Ted Underwood)
1) Categorize documents. You can “categorize” in several different senses.
- a) Information retrieval: retrieve documents that match a query. This is what you do every time you use a search engine.
- b) (Supervised) classification: a program can learn to correctly distinguish texts by a given author, or learn (with a bit more difficulty) to distinguish poetry from prose, tragedies from history plays, or “gothic novels” from “sensation novels.” (See “Quantitative Formalism,” Pamphlet 1 from the Stanford Literary Lab.) The researcher has to provide examples of different categories, but doesn’t have to specify how to make the distinction: algorithms can learn to recognize a combination of features that is the “fingerprint” of a given category.
- c) (Unsupervised) clustering: a program can subdivide a group of documents using general measures of similarity instead of predetermined categories. This may reveal patterns you don’t expect.
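As a rough illustration of (b), here is a toy classifier in Python that labels a new text by finding the most similar labeled example, using cosine similarity over word counts. Everything here is an assumption for teaching purposes: the two example "texts" are invented, the function names are hypothetical, and this nearest-example approach is much simpler than the methods used in studies like “Quantitative Formalism.”

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Represent a text as word counts, ignoring word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Similarity between two word-count vectors (0 = no shared words)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(text, labeled_examples):
    """Return the label of the most similar labeled example."""
    features = bag_of_words(text)
    best_label, _ = max(
        ((label, cosine_similarity(features, bag_of_words(example)))
         for label, example in labeled_examples),
        key=lambda pair: pair[1],
    )
    return best_label

# Invented one-sentence "examples"; a real study would use whole novels.
labeled_examples = [
    ("gothic", "the ruined castle loomed in the dark and a ghost wailed in the crypt"),
    ("sensation", "the secret marriage and the forged letter scandalized the respectable household"),
]
prediction = classify("a ghost haunted the dark castle", labeled_examples)
print(prediction)
```

Note that we supplied examples of each category but never told the program which words matter. The same similarity measure, applied without any labels, is one possible starting point for the clustering described in (c).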
3) Trace the history of particular features (words or phrases) over time. This could be viewed as a special category of corpus comparison, where you’re comparing corpora segmented on the time axis.
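Tracing a word over time can be sketched as counting it within each time slice and normalizing by the size of that slice. The mini-corpus below is invented for illustration, and `relative_frequency_by_year` is a hypothetical helper name, not part of any tool we are using.

```python
import re
from collections import Counter

def relative_frequency_by_year(dated_texts, word):
    """Given (year, text) pairs, return {year: occurrences per 1,000 words}."""
    hits = Counter()
    totals = Counter()
    for year, text in dated_texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    # Skip years with no words at all to avoid dividing by zero.
    return {year: 1000 * hits[year] / totals[year]
            for year in sorted(totals) if totals[year]}

# Invented snippets standing in for texts published in each year.
corpus = [
    (1820, "the coach left at dawn"),
    (1850, "the railway replaced the coach"),
    (1850, "railway shares rose"),
    (1880, "the railway the railway everywhere"),
]
trend = relative_frequency_by_year(corpus, "railway")
print(trend)
```

Normalizing matters: without it, a year with more surviving texts would look like a year in which the word was more popular.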
4) Visualization. Perhaps this isn’t technically a form of analysis, but in practice it’s important enough that it deserves to be treated as a separate analytical step. It’s impractical to list all possible forms of visualization here, but for instance, results can be visualized:
- a) Geographically: to reflect, for instance, density of references to different parts of the world.
- b) As a network graph: to reflect strength of affinity between different entities (characters, or topics, or what have you).
- c) Through “Principal Component Analysis,” if you have multidimensional data that need to be flattened to two dimensions for ease of comprehension.
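For the network graph option, the data behind the picture is usually just a list of weighted connections. Here is a small Python sketch that counts how often two characters are named in the same sentence. The sentences are invented, the name `cooccurrence_edges` is hypothetical, and "mentioned in the same sentence" is only one crude way to measure affinity.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(sentences, entities):
    """Count, for each pair of entities, the sentences that name both."""
    edges = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        # Sort so each pair is always stored in the same order.
        present = sorted(e for e in entities if e.lower() in lowered)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges

# Invented sentences for illustration, not quotations from any novel.
sentences = [
    "Elizabeth danced with Darcy.",
    "Darcy wrote to Elizabeth.",
    "Jane and Elizabeth walked to Netherfield.",
    "Bingley admired Jane.",
]
entities = ["Elizabeth", "Darcy", "Jane", "Bingley"]
edges = cooccurrence_edges(sentences, entities)

for (source, target), weight in edges.items():
    print(source, target, weight)
```

Each printed row is one edge of the graph, and the count is the thickness of the line a visualization tool would draw between the two names.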
Let’s play with a visualization….
You can play with Voyant on your own! Use any text that is freely available on the web (Project Gutenberg is a good go-to) and see what kinds of results it can yield!

