Informatics

Literature: Word Usage

NLTK

 

Application of Informatics

In the last decade, considerable effort has been spent digitizing historical documents and artifacts. These digital collections of historical data provide a wealth of information for understanding the past. In particular, language processing enables both qualitative and quantitative examination of historical texts; it can help unravel a text's underlying structure and provide insight into its interpretation and meaning.

 

Informatics in Action

In this simple example, we look at word usage in the King James Version of the Bible. To get a sense of the emotional 'tone' of the Old and New Testaments, we count occurrences of the words "kill", "killed", and "killeth" in each book of the Bible, and compare them against usage of "love" and its derivatives. This is a two-step process: first we generate a set of word counts from the Bible text, then we use a JavaScript graphics library to chart the results in your web browser:


Let's now see the results for the words "love", "loved", and "loveth".

 

As you can see, St John seems to be the most approachable saint!
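The first step described above, generating per-book word counts, might look something like the following. This is only a minimal sketch, not the code linked under Resources: the names `WORD_FAMILIES`, `count_families`, `count_by_book`, and the toy `sample_books` input are all hypothetical, and it assumes the Bible text has already been split into one string per book. It uses only the Python standard library; a tokenizer from NLTK could be substituted for the regular expression.

```python
import json
import re
from collections import Counter

# Word forms named in the text above, grouped under a family label.
# The grouping itself is an assumption made for this illustration.
WORD_FAMILIES = {
    "kill": {"kill", "killed", "killeth"},
    "love": {"love", "loved", "loveth"},
}


def count_families(text, families=WORD_FAMILIES):
    """Count whole-word, case-insensitive matches for each family in one book."""
    # Split into lowercase alphabetic tokens so "beloved" is not counted as "loved".
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return {name: sum(tokens[form] for form in forms)
            for name, forms in families.items()}


def count_by_book(books, families=WORD_FAMILIES):
    """Map each book title to its family counts; `books` is {title: text}."""
    return {title: count_families(text, families)
            for title, text in books.items()}


# Toy stand-in for the real text, used only to show the shape of the output.
sample_books = {
    "Book A": "Thou shalt not kill. He that killeth shall be killed.",
    "Book B": "He that loveth not knoweth not God; for God is love. Love one another.",
}

counts = count_by_book(sample_books)

# JSON is one convenient format for handing the counts to a charting script.
print(json.dumps(counts, indent=2))
```

Counting whole tokens rather than substrings is a deliberate choice here: a plain substring search for "love" would also match words such as "beloved" and "glove", which would inflate the totals.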

 

Resources

Data
Code [This code generates the word count data; a JavaScript program then uses that data to draw the charts above]
Other Resources