David Kernot from the Australian National University talks with us about his research into using William Shakespeare’s texts for training an algorithm to uncover authors’ identities and personalities from their written words. His article “Using Shakespeare’s Sotto Voce to Determine True Identity From Text” was published in Frontiers in Psychology on March 15, 2018, co-authored with Terry Bossomaier and Roger Bradbury.

Uncovering Uncertain Identities - David Kernot
@rwatkins says:
To close out our conversation, Doug and I were interested in learning what David had discovered that goes beyond his original aim of uncovering people’s identities and personalities for national security purposes.
@rwatkins says:
Probably the most often-repeated aphorism about statistical models is one from the 1970s attributed to British statistician George Box: "all models are wrong; some are useful." A model, then, is not made better by continually adding to it, but rather by selecting only those features that contribute most to its explanatory power. In this context, David talked with us about how much data his R-PAS algorithm requires to make meaningful predictions.
@rwatkins says:
Beyond suggesting the identities of literary figures, David hopes that his algorithm may be applicable in other areas as well. Here, he describes some of the other potential applications for R-PAS.
@rwatkins says:
David initially used hierarchical clustering – a method for identifying groupings of data that are similar to one another – to differentiate the writings of William Shakespeare from his contemporaries, Christopher Marlowe and Elizabeth Carey. Ryan and I asked David to talk with us about what he found, and to suggest how we might interpret the results of his analysis.
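For readers unfamiliar with hierarchical clustering, here is a minimal sketch of the bottom-up (agglomerative) variant in plain Python. It is not David's implementation: the single-linkage rule, the function name `agglomerate`, and the two-dimensional feature vectors are all assumptions chosen for illustration, and the numbers are made up rather than taken from any text.

```python
import math

def agglomerate(points, n_clusters):
    """Bottom-up clustering with single linkage (an illustrative choice).

    Starts with one cluster per point and repeatedly merges the two
    clusters whose closest members are nearest, until n_clusters remain.
    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters

# Made-up feature vectors: two hypothetical style measurements per sample.
samples = [(0.10, 0.20), (0.12, 0.22), (0.11, 0.19), (0.80, 0.90), (0.82, 0.88)]
result = agglomerate(samples, n_clusters=2)
print(result)  # [[0, 1, 2], [3, 4]]
```

Samples that land in the same group have similar measurements. Whether a group corresponds to a single author is a matter of interpretation, not something the clustering itself establishes.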
@rwatkins says:
Richness is a measure of a person’s ability to use a large vocabulary in their speech and writing. This ability is related to an author’s age and education. As David already discussed, one theory of the authorship of Shakespeare’s works names Christopher Marlowe as their true author. Here, David explains how he tested this claim … and what he found.
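One common way to put a number on vocabulary richness is the type-token ratio: the count of distinct words divided by the total word count. The sketch below assumes that measure purely for illustration. The exact richness formula used in R-PAS is not stated here, and the function name `type_token_ratio` and the tokenizing pattern are hypothetical.

```python
import re

def type_token_ratio(text):
    """Return distinct words / total words, a simple richness proxy."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

line = "To be, or not to be, that is the question"
print(type_token_ratio(line))  # 0.8 (8 distinct words out of 10)
```

The type-token ratio tends to fall as a text gets longer, because common words repeat, so it is only comparable across samples of similar length.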
@rwatkins says:
In machine learning, a single large dataset is typically divided into a “training set” from which a computer develops a model of the relationships among variables, and a “testing set” which is later used to determine the accuracy of that model. But while Shakespeare wrote a substantial amount of material, the contemporaries suspected of being the true authors of his work … didn’t. So David found it necessary to create smaller pieces of text by randomly dividing the works into equally sized “chunks.” Here, David describes how he verified that this process didn’t break up Shakespeare’s texts so much that it would be impossible to recognize them as Shakespeare’s words.
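The chunking and train/test idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not David's procedure: the chunk size, the 25% test fraction, the fixed random seed, and the function names `chunk_words` and `train_test_split` are all hypothetical, and the input is placeholder text rather than any real work.

```python
import random

def chunk_words(text, chunk_size):
    """Split text into consecutive chunks of exactly chunk_size words.

    A trailing remainder shorter than chunk_size is dropped so that
    every chunk has the same length.
    """
    words = text.split()
    full = len(words) - len(words) % chunk_size
    return [words[i:i + chunk_size] for i in range(0, full, chunk_size)]

def train_test_split(chunks, test_fraction=0.25, seed=0):
    """Shuffle chunks reproducibly and hold out a fraction for testing."""
    shuffled = list(chunks)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder text of 103 dummy words; not real data.
sample = " ".join(f"w{i}" for i in range(103))
chunks = chunk_words(sample, chunk_size=10)
train, test = train_test_split(chunks)
print(len(chunks), len(train), len(test))  # 10 8 2
```

Equal-sized chunks let a short body of work from one author be compared against a much larger one on the same footing, at the cost of discarding any leftover words at the end.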
@rwatkins says:
Of course, Doug and I accepted David’s invitation, wondering what kinds of “tells” R-PAS looks for to identify the “hidden voice” present in someone’s writing, and what these might suggest about an author’s mental state.
@rwatkins says:
Having explained the elements of his algorithm, Ryan and I asked David to take a step back and describe the broader aims of his various studies, and what performance gains his R-PAS algorithm achieved over their course.
@rwatkins says:
David developed a computational algorithm which he dubbed “R-PAS,” named for the four features it considers as being indicative of an author’s identity and personality: the richness of their vocabulary, their use of personal pronouns, the degree of their activity in referring to the five senses, and their preference for adjectives referring to those senses. David shared what each of these four fingerprints of an author’s style might say about their psychology.
@rwatkins says:
David’s background is in criminal justice and computer science, specifically as they apply to national security. Here, he describes how his analysis of classic literature set out as a means of detecting those who may be susceptible to radicalization.
@rwatkins says:
For someone regarded as the greatest writer in the English language, it’s surprising how little is known about the life of William Shakespeare. He’s believed to have been born Gulielmus Shaksper in April of 1564, and to have authored nearly 40 plays and more than 150 sonnets in his 52 years of life. But over 200 years after his death, skepticism began to grow about the true authorship of various works attributed to him. Doug and I began our conversation with David by asking him to explain why determining Shakespeare’s identity continues to captivate researchers to this day.

Websites

Bonus Clips

Please note that on July 25th we will be moving bonus clips and access to download episodes to be exclusively available to our patron members. For as little as $1 per month you can become a patron of Parsing Science and continue to have access to these resources.

Exclusion of stage directions from dataset

Detection of depression in modern writers P. D. James vs. Iris Murdoch

Natural Language Processing (NLP) to identify gender via sensory modalities

Measuring depression via concreteness and imagery words

Analyzing sensory adjectives to determine identity and mental state

On Shakespeare’s coining of novel terms

Change in personal pronouns over time

Personal pronouns differentiate Shakespeare, Christopher Marlowe, and Elizabeth Carey

Emotion word use is independent of mood of the story

Depression among Lone Wolf school shooters and women returning from warzones

Tuning R-PAS by comparing the algorithm against extant datasets

Hosts / Producers

Ryan Watkins & Doug Leigh

How to Cite

Watkins, R., Leigh, D., & Kernot, D. (2018, May 17). Parsing Science – Uncovering Uncertain Identities (Version 1). figshare. https://doi.org/10.6084/m9.figshare.6281486.v1

Music

What’s The Angle? by Shane Ivers