WordSmith Tools is a program for statistically analysing the vocabulary of large samples of language, to see how words are patterned; it was developed by a gentle genius, Dr Mike Scott. I thought it might be interesting to compare RLS’s essays with some other essays, just to get an inkling of what might make them distinctive. To do this, I made a word-frequency list of all of RLS’s essays and compared it with two control corpora via a ‘keywords’ analysis. For WordSmith, ‘key words’ are those whose frequency is unusually high in comparison with a reference corpus. (You can also look at the words that are unusually infrequent in comparison with the other corpus.) I did this all very quickly, so it’s only intended here as an entertainment that might provoke thought.
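For anyone curious about what a ‘keywords’ comparison involves, here is a minimal sketch in Python. It is not WordSmith’s own code, and the settings I used are not recorded here: the sketch assumes the log-likelihood statistic, which is one commonly used measure of keyness, and a deliberately crude tokenizer. The function names `word_counts` and `keyness` are my own, purely for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count word tokens (a crude tokenizer, assumed for illustration)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def keyness(study, reference):
    """Return (word, signed log-likelihood) pairs, most positive first.

    Positive scores mark words over-represented in the study corpus;
    negative scores mark words over-represented in the reference corpus.
    Both corpora must contain at least one token.
    """
    n_study = sum(study.values())
    n_ref = sum(reference.values())
    scores = {}
    for word in set(study) | set(reference):
        a, b = study[word], reference[word]  # Counter gives 0 for absent words
        # Expected counts if the word were equally frequent in both corpora.
        e_study = n_study * (a + b) / (n_study + n_ref)
        e_ref = n_ref * (a + b) / (n_study + n_ref)
        ll = 0.0
        if a:
            ll += a * math.log(a / e_study)
        if b:
            ll += b * math.log(b / e_ref)
        ll *= 2
        # Sign the score by which corpus has the higher relative frequency.
        if a / n_study < b / n_ref:
            ll = -ll
        scores[word] = ll
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Given two frequency lists built with `word_counts`, the top of the list returned by `keyness` corresponds to the ‘key words’ and the bottom to the words with negative keyness. A real analysis would also apply a minimum frequency and a significance threshold, which this sketch leaves out.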
1. Comparison of RLS’s essays with Sampson’s 1912 anthology ‘Nineteenth-century Essays’
Sampson’s Nineteenth-century Essays is a one-volume collection (so quite small) including: Carlyle, “On History”; Macaulay, “Ranke’s History of the Popes”; Bagehot, “Shakespeare — the Man”; Newman, “Literature”; Ruskin, “Sir Joshua and Holbein”; Arnold, “Marcus Aurelius”; and Stevenson, “A Penny Plain and Twopence Coloured” (which I omitted from the corpus file).
The most characteristic words in RLS (I took all his essays) compared with Sampson’s selection are (in descending ‘keyness’):
I MY UPON A ME YOU SOME YOUR AND SOMEWHAT ROAD MOMENT LAST ALTHOUGH ABOUT HIMSELF
and the words most characteristic of the Sampson corpus that are little used by RLS in his essays are (in ascending negative ‘keyness’):
PROTESTANTS SCRIPTURE DOCTRINE THEE CHRISTIANS THEREFORE LANGUAGE ROMAN SCIENCE POWER SPAIN CHRISTIANITY EUROPE THY PROTESTANT GREEK CHURCH CATHOLIC HISTORY THOU ROME WHICH
Who would have thought that ‘which’ was so little used by RLS in comparison with the six other essayists?
The two lists make an interesting random poem: RLS’s key words focussing on subjectivity, partiality (some, somewhat), concession (although), simply perceived phenomena (and), movement (road – one of only two nouns!), experience (moment); while the Victorian sages have those terribly heavy nouns and heavy links (therefore, which).
2. Comparison with Modern English Essays, edited by Ernest Rhys
Rhys’s substantial five-volume collection from 1922 contains RLS’s “Walking Tours” (in vol 2), which I removed from the corpus file. I then made a word-frequency list of the remaining essays and compared it with the wordlist of Stevenson’s essays.
Interestingly the words that stood out as most unusual in comparison with the other texts were again suggestive of subjectivity and interpersonal relations, and once more we find ‘and’:
YOU I YOUR MY AND
The most characteristic Stevenson words include some proper nouns (Knox, Burns, Arethusa – I had included An Inland Voyage as an essay-like text), but also UPON (RLS tends to use this rather than ‘on’), SOME (13th position) and SOMEWHAT (in 20th place), as in the previous list. Other interesting words near the top of the list include: PLEASURE (14), PLEASURES (23); YET (15), the only conjunction in the top group; and, once again, ROAD (24).
What about the words that were, instead, significantly more frequent in the five volumes of ‘Modern English Essays’? Here, the list contains a lot of proper names (MONTAIGNE, JAMES, GEORGE…) as Rhys’s selection tends towards critical essays, but once again the word most over-represented in the control corpus in comparison with Stevenson is WHICH. Curious.