Informatics

Art&Linguistics: Word Flowers

Application of Informatics

What does classical literature like Shakespeare's Hamlet have to do with Informatics? In the last few years, partly through the efforts of companies like Google, text, documents, and books (in fact almost anything paper-based) have been digitized. Publishers are starting to sell electronic books, and most academic publications are available as digital documents. There has also been considerable effort to digitize literary works (see Project Gutenberg). This not only gives us better access to historical and notable literature, but also gives us the opportunity to analyze the content of those works by computational means.

A research area called computational linguistics has been growing rapidly in recent years.
This discipline attempts to make sense of text by analyzing its content statistically or quantitatively.

 

Informatics in Action

The following example displays a simple statistical analysis of several classical texts from Project Gutenberg:

- Persuasion, by Jane Austen
- Sense and Sensibility, by Jane Austen
- Emma, by Jane Austen
- Bible King James Version
- Poems of William Blake
- Songs of Innocence and Experience, by William Blake
- The Man Who Was Thursday, by G. K. Chesterton
- The Ball and The Cross, by G. K. Chesterton
- The Wisdom of Father Brown, by G. K. Chesterton
- Paradise Lost, by John Milton
- The Tragedy of Julius Caesar, by William Shakespeare
- Macbeth, by William Shakespeare
- Hamlet, by William Shakespeare
- Leaves of Grass, by Walt Whitman

 

Using a Python program, we first count the occurrences of the word "love" and its variants ("loves", "love", "loved", "loveth", "lov", "loue", "loues"). Next, we do the same for the word "kill" and related words ("kill", "killed", "kills", "murder", "murdered", "killeth", "killes"). The results are displayed in the tables below.
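The counting step can be sketched roughly as follows. This is a minimal illustration, not the program used to produce the results: the function names `tokenize` and `count_family`, the regular-expression tokenizer, and the sample sentence are all assumptions made for the example. Only the two word lists come from the text.

```python
import re
from collections import Counter

# Word families as listed in the text.
LOVE_FORMS = {"loves", "love", "loved", "loveth", "lov", "loue", "loues"}
KILL_FORMS = {"kill", "killed", "kills", "murder", "murdered", "killeth", "killes"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def count_family(text, forms):
    """Return (total_words, matching_words) for one text and one word family."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    matching = sum(counts[form] for form in forms)
    return len(tokens), matching


# Hypothetical sample sentence; a real run would read a full book instead.
sample = "I loved her, and she loves me; to kill such love were murder."
total, love_hits = count_family(sample, LOVE_FORMS)
_, kill_hits = count_family(sample, KILL_FORMS)
print(total, love_hits, kill_hits)
```

With a tokenizer like this one, an apostrophe splits a word, so a spelling such as "lov'd" would be counted under "lov". That is one possible reason for including such a form in the list, though the original program may tokenize differently.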

Word Count for "kill"
Book                   Total Words   Selected Words
austen-emma                 192432                4
austen-persuasion            98191                1
austen-sense                141586                2
bible-kjv                  1010735              224
blake-poems                   8360                0
blake-songs                   6849                0
chesterton-ball              97396               30
chesterton-brown             89090               55
chesterton-thursday          69443               14
milton-paradise              97400                2
shakespeare-caesar           26687                8
shakespeare-hamlet           38212               13
shakespeare-macbeth          23992                7
whitman-leaves              154898               19
 
Word Count for "love"
Book                   Total Words   Selected Words
austen-emma                 192432              153
austen-persuasion            98191               60
austen-sense                141586               98
bible-kjv                  1010735              469
blake-poems                   8360               19
blake-songs                   6849               13
chesterton-ball              97396               30
chesterton-brown             89090                8
chesterton-thursday          69443               12
milton-paradise              97400               95
shakespeare-caesar           26687               34
shakespeare-hamlet           38212               50
shakespeare-macbeth          23992               12
whitman-leaves              154898              308
View data in XML format.

 

Next, we present those results in a more engaging form (see the visualization below; drag the mouse to rotate the tree). The title of a book with a higher percentage of the selected words is displayed in a bigger font and in a darker red. What the visualization shows is the proportion of selected words in the total word count, not the absolute count. Thus, although the Bible has the highest absolute count for "kill" (224), its proportion (about 0.022%) is lower than that of Chesterton's The Wisdom of Father Brown (55 of 89,090 words, about 0.062%), which is displayed in bold and dark red below.
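The proportion behind the font sizes can be worked out directly from the "kill" table. The sketch below copies the counts from that table; the `font_size` function and its 10 to 32 point range are hypothetical, chosen only to illustrate a linear mapping from percentage to size, and may not match how the actual visualization scales its titles.

```python
# (book, total words, selected "kill" words), copied from the table above.
KILL_COUNTS = [
    ("austen-emma", 192432, 4),
    ("austen-persuasion", 98191, 1),
    ("austen-sense", 141586, 2),
    ("bible-kjv", 1010735, 224),
    ("blake-poems", 8360, 0),
    ("blake-songs", 6849, 0),
    ("chesterton-ball", 97396, 30),
    ("chesterton-brown", 89090, 55),
    ("chesterton-thursday", 69443, 14),
    ("milton-paradise", 97400, 2),
    ("shakespeare-caesar", 26687, 8),
    ("shakespeare-hamlet", 38212, 13),
    ("shakespeare-macbeth", 23992, 7),
    ("whitman-leaves", 154898, 19),
]


def percentage(matching, total):
    """Share of selected words in the whole text, in percent."""
    return 100.0 * matching / total


def font_size(pct, max_pct, smallest=10, largest=32):
    """Hypothetical linear mapping from a percentage to a font size in points."""
    if max_pct == 0:
        return smallest
    return round(smallest + (largest - smallest) * pct / max_pct)


rows = [(book, percentage(hits, total)) for book, total, hits in KILL_COUNTS]
max_pct = max(pct for _, pct in rows)

# Rank books by proportion rather than by absolute count.
ranked = sorted(rows, key=lambda row: row[1], reverse=True)
for book, pct in ranked:
    print(f"{book:22s} {pct:.4f}%  {font_size(pct, max_pct)}pt")
```

Ranking by proportion puts chesterton-brown first and moves bible-kjv well down the list, even though the Bible has by far the largest absolute count.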

Visualization of Word "Kill" (bigger and darker title has higher percentage)


Visualization of Word "Love" (bigger and darker title has higher percentage)


 

Resources

Data
Code
Other Resources