Parsing Science

We set out to talk with David Kernot from Australia’s Defence Science and Technology Group about William Shakespeare’s true identity, but soon discovered that his work has implications for national security and suicide prevention, as well as for diagnosing Alzheimer’s years before it can otherwise be identified. In episode 23 of Parsing Science, David talks with us about the many applications of his research into training algorithms to uncover people’s personalities from their written words. His open-access article “Using Shakespeare’s Sotto Voce to Determine True Identity From Text” was published in Frontiers in Psychology on March 15, 2018, co-authored with Terry Bossomaier and Roger Bradbury.

Uncovering Uncertain Identities - David Kernot

@rwatkins says:
To close out our conversation, Doug and I were interested in what David might have learned that goes beyond his original purpose of uncovering people’s identities and personalities for national security purposes.

@rwatkins says:
Probably the most often-repeated aphorism about statistical models is one from the 1970s attributed to British statistician George Box: "all models are wrong; some are useful." A model, then, is not made better by continually adding to it, but rather by selecting only those features that contribute best to its explanatory power. In this context, David talked with us about how much data his R-PAS algorithm requires to make meaningful predictions.

@rwatkins says:
Beyond suggesting the identities of literary figures, David hopes that his algorithm may be applicable in other areas as well. Here, he describes some of the other potential applications for R-PAS.

@rwatkins says:
David initially used hierarchical clustering – a method for identifying groupings of data that are similar to one another – to differentiate the writings of William Shakespeare from his contemporaries, Christopher Marlowe and Elizabeth Carey. Ryan and I asked David to talk with us about what he found, and to suggest how we might interpret the results of his analysis.
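To make the idea of hierarchical clustering concrete, here is a minimal sketch in Python. It is not David's analysis: the sample names and two-number feature vectors are made up for illustration, and single-linkage merging with Euclidean distance is just one common variant of the technique, assumed here for simplicity.

```python
from itertools import combinations
from math import dist


def agglomerate(points, n_clusters):
    """Single-linkage agglomerative clustering.

    points: dict mapping a label to a tuple of numeric features.
    Returns a list of sets of labels, one set per cluster.
    """
    # Start with every sample in its own cluster.
    clusters = [{label} for label in points]
    while len(clusters) > n_clusters:
        # Find the two clusters whose closest members are nearest each other.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: min(
                dist(points[p], points[q])
                for p in clusters[pair[0]]
                for q in clusters[pair[1]]
            ),
        )
        # Merge cluster b into cluster a (a < b, so index a stays valid).
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters


# Hypothetical feature vectors for four text samples (illustrative values only).
samples = {
    "text_A1": (0.10, 0.20),
    "text_A2": (0.12, 0.22),
    "text_B1": (0.80, 0.75),
    "text_B2": (0.82, 0.79),
}
groups = agglomerate(samples, n_clusters=2)
```

With these made-up values, the two "A" samples end up in one group and the two "B" samples in the other, which is the kind of separation one would hope to see between different authors.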

@rwatkins says:
Richness is a measure of a person’s ability to use a large vocabulary in their speech and writing. This ability is related to an author’s age and education. As David already discussed, one theory of the authorship of Shakespeare’s works holds that Christopher Marlowe was their true author. Here, David explains how he tested this claim … and what he found.
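One simple way to put a number on vocabulary richness is the type-token ratio: the count of distinct words divided by the total word count. The sketch below assumes that proxy; it is not necessarily the exact richness measure R-PAS uses, and the `type_token_ratio` helper and its tokenizing rule are illustrative choices.

```python
import re


def type_token_ratio(text):
    """Return distinct words divided by total words (0.0 for empty text)."""
    # Lowercase, then keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)


sample = "To be, or not to be, that is the question"
ratio = type_token_ratio(sample)  # 8 distinct words out of 10 -> 0.8
```

The type-token ratio tends to fall as a text gets longer, so it is usually compared across samples of similar length.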

@rwatkins says:
In machine learning, a single large dataset is typically divided into a “training set” from which a computer develops a model of the relationships among variables, and a “testing set” which is later used to determine the accuracy of that model. But while Shakespeare wrote a substantial amount of material, the contemporaries of his who are suspected as the potential authors of his work … didn’t. So David found it necessary to create smaller pieces of text by dividing them up randomly into equivalently sized “chunks.” Here, David describes how he went about verifying that this process didn’t decompose Shakespeare’s texts so much that it would be impossible to discern them as Shakespeare’s words.
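The chunking and train/test steps described above can be sketched roughly as follows. This is an assumption-laden illustration rather than David's procedure: the chunk size, the 20% test fraction, the fixed random seed, and the `chunk_words` and `train_test_split` helpers are all hypothetical.

```python
import random


def chunk_words(text, chunk_size):
    """Split text into consecutive chunks of exactly chunk_size words."""
    words = text.split()
    # Drop the trailing remainder so that every chunk is the same size.
    usable = len(words) - len(words) % chunk_size
    return [words[i:i + chunk_size] for i in range(0, usable, chunk_size)]


def train_test_split(chunks, test_fraction=0.2, seed=0):
    """Shuffle the chunks, then hold out a fraction of them for testing."""
    shuffled = chunks[:]
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder "text" of 103 numbered tokens standing in for a real work.
text = " ".join(f"w{i}" for i in range(103))
chunks = chunk_words(text, chunk_size=10)
train, test = train_test_split(chunks)
```

Here the 103 placeholder words yield ten 10-word chunks (the last three words are discarded), of which eight go to the training set and two to the testing set.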

@rwatkins says:
Of course, Doug and I accepted David’s invitation, wondering what kinds of “tells” R-PAS looks for to identify the “hidden voice” present in someone’s writing, and what these might suggest about an author’s mental state.

@rwatkins says:
Having explained the elements of his algorithm, Ryan and I asked David to take a step back for us and describe the broader aims of his various studies, and what performance gains his R-PAS algorithm achieved over the course of them.

@rwatkins says:
David developed a computational algorithm which he dubbed “R-PAS,” named for the four features it considers as being indicative of an author’s identity and personality: the richness of their vocabulary, their use of personal pronouns, the degree of their activity in referring to the five senses, and their preference for adjectives referring to those senses. David shared what each of these four fingerprints of an author’s style might say about their psychology.

@rwatkins says:
David’s background is in criminal justice and computer science, specifically as they apply to national security. Here, he describes how his analysis of classic literature started out as a means of detecting those who may be susceptible to radicalization.

@rwatkins says:
For someone regarded as the greatest writer in the English language, it’s surprising how little is known about the life of William Shakespeare. He’s believed to have been born Gulielmus Shaksper in April of 1564, and to have authored nearly 40 plays and more than 150 sonnets in his 52 years of life. But over 200 years after his death, skepticism began to grow about the true authorship of various works attributed to him. Doug and I began our conversation with David by asking him to explain why determining Shakespeare’s identity continues to captivate researchers to this day.

Subscribe: iTunes | Google Podcasts | Google Play | Spotify | RSS

Websites

David’s SciFi books on Amazon.
Supplemental materials from the article.
Complete works of William Shakespeare

Bonus Clips

Patrons of Parsing Science gain exclusive access to bonus clips from all our episodes and can also download mp3s of every individual episode.

Support us for as little as $1 per month at Patreon. Cancel anytime.

Patrons can access bonus content here.

We’re not a registered tax-exempt organization, so unfortunately gifts aren’t tax deductible.

Hosts / Producers

Ryan Watkins & Doug Leigh

How to Cite

Watkins, R., Leigh, D., & Kernot, D. (2018, May 17). Parsing Science – Uncovering Uncertain Identities (Version 1). figshare. https://doi.org/10.6084/m9.figshare.6281486

Music

What’s The Angle? by Shane Ivers
