Literature: Word Usage
Application of Informatics
Over the last decade, considerable effort has gone into digitizing historical documents and artifacts. These digital collections provide a wealth of information for understanding the past. In particular, language processing enables both qualitative and quantitative examination of historical texts; it can help reveal a text's underlying structure and provide insight into its interpretation and meaning.
Informatics in Action
In this simple example, we look at word usage in the King James Version of the Bible. To get a sense of the emotional 'tone' of the Old and New Testaments, we count occurrences of the words "kill", "killed", and "killeth" in each book of the Bible and compare them with the usage of "love" and its derivatives. This is a two-step process: first we generate a set of word counts from the Bible text, then we use a JavaScript graphics library to chart the results in your web browser.
Let's now see the results for the words "love", "loved", and "loveth".
As you can see, St John seems to be the most approachable saint!
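The word-count step described above can be sketched roughly as follows. This is a minimal illustration, not the code linked under Resources: the function name `count_words_by_book`, the `(book, text)` input format, and the three sample verses are assumptions made for the example, and it uses a plain regular expression for tokenizing rather than NLTK.

```python
import json
import re
from collections import defaultdict

# Word groups compared in the example above.
KILL_WORDS = {"kill", "killed", "killeth"}
LOVE_WORDS = {"love", "loved", "loveth"}


def count_words_by_book(verses, targets):
    """Count tokens that appear in `targets`, grouped by book.

    `verses` is an iterable of (book, text) pairs; this input format
    is hypothetical and chosen only to keep the sketch self-contained.
    """
    counts = defaultdict(int)
    for book, text in verses:
        # Lowercase the text and split it into alphabetic tokens.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts[book] += sum(1 for token in tokens if token in targets)
    return dict(counts)


# A tiny hand-picked sample standing in for the full Bible text.
sample = [
    ("Exodus", "Thou shalt not kill."),
    ("John", "For God so loved the world"),
    ("1 John", "He that loveth not knoweth not God; for God is love."),
]

kill_counts = count_words_by_book(sample, KILL_WORDS)
love_counts = count_words_by_book(sample, LOVE_WORDS)

# Serialize the counts so a charting script could read them.
print(json.dumps({"kill": kill_counts, "love": love_counts}))
```

Matching whole tokens against a fixed set means that words such as "lovely" or "killing" are not counted. Whether that matches the original analysis is not stated, so treat it as a design choice of this sketch.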
Resources
Data
Code [This code generates the word count data; a JavaScript program is then used to generate the charts above]
- View the Python source code
- Download the Python source code [Right click and Save As]
Other Resources
- Python
- NLTK
- Natural Language Toolkit - a free book on Natural Language Processing
- Project Gutenberg - a vast collection of free e-books
