Can auditory errors and illusions help us better understand how the brain works? In episode 32, Mike Vitevitch from the University of Kansas talks with us about his research into the cognitive mechanisms underlying the Speech-to-Song auditory illusion. His article “An account of the Speech-to-Song Illusion using Node Structure Theory” was published with Nichol Castro, Joshua Mendoza, and Elizabeth Tampke in the June 8, 2018 issue of the open-access journal PLOS One.
Websites and other resources
- Sample audio file from the study (4 words):
- Background on the discovery of the illusion by Diana Deutsch
- Mike’s crowdsourced speech error site
- “Repeating a word until it sounds weird is called semantic satiation”
- Verbal Transformation Effect demonstration
- About Deutsch’s ‘Sometimes Behave So Strangely’ recording
- Example of Deutsch’s Speech-to-Song Illusion:
- “Psychology of Language” (left audio channel only):
Patrons of Parsing Science gain exclusive access to bonus clips from all our episodes and can also download mp3s of every individual episode.
Hosts / Producers
Ryan Watkins & Doug Leigh
How to Cite
Watkins, R., Leigh, D., & Vitevitch, M. S. (2018, September 19). Parsing Science – Speech-to-Song Illusion. figshare. https://doi.org/10.6084/m9.figshare.7109084
What’s The Angle? by Shane Ivers
Mike Vitevitch: It’s these same mechanisms that explain speech perception, speech production, that explain other kinds of speech errors as well.
Doug Leigh: This is Parsing Science: the unpublished stories behind the world’s most compelling science, as told by the researchers themselves. I’m Doug Leigh. Ryan’s out with a cold this week, but he’ll be back next time.
Leigh: In 2003, the British-American psychologist Diana Deutsch released the spoken-word CD Phantom Words and Other Curiosities, which contained this sentence:
(Diana Deutsch’s voice): The sounds as they appear to you are not only different from those that are really present, but they sometimes behave so strangely as to seem quite impossible.
Leigh: While editing this recording in 1995, Deutsch accidentally left the phrase “sometimes behave so strangely” looping on a computer, like this:
sometimes behave so strangely
sometimes behave so strangely
sometimes behave so strangely
sometimes behave so strangely
Leigh: Instead of the phrase appearing to be spoken, Deutsch heard a melody in the sing-song cadence of her voice, and dubbed this effect the Speech-to-Song Illusion. Today, we’re joined by Mike Vitevitch from the University of Kansas. He’ll talk with us about his research into the cognitive mechanisms underlying the illusion. Here’s Mike Vitevitch.
Vitevitch: Hi! I’m Mike Vitevitch. I’m the chair and professor of psychology at the University of Kansas, and I’m a language researcher, I’m a speech researcher. Most of the people that had looked at this before were music cognition people. So, I was bringing just a very different perspective, very different lens, very different set of tools, very different set of theories to bear on this illusion. So, I think that was kind of a neat perspective that we were able to bring as language researchers. There’s been a lot of work looking at these connections between music and language. Seems like a lot of it comes from the music side. So, well, why aren’t we, as, you know, language people, kind of doing a bit more to look at this, and then looking at that as a possible way to see how the system works? So, one of the things that really made this kind of interesting for me is that I really don’t know a lot about music, and one of the other authors, Josh Mendoza, was a composition major. He was in my intro psych class, and when I did this as a demo, his eyes were like the size of saucers, his jaw was on the desk, like, “Oh, this is so cool! How do you do that? Can I, like, work on this?” So, I was like, “Yeah, sure, all right, cool!” So, it was nice to have somebody who really knew a lot about music, you know, working on this.
Leigh: Mike’s research interest is in how information pertaining to words is stored in memory, and how the organization of those words in memory enables us to access that information quickly and accurately. So, we began by asking Mike what got him interested in studying auditory illusions in the first place.
Vitevitch: Speech errors, performance errors, and illusions, they’re all sort of these opportunities to look into how a system is built. These are all sort of mistakes that our system makes, and things don’t break randomly. They kind of break at the weakest points. So, things like auditory illusions and visual illusions are really helpful for seeing how the system is put together, same with speech errors and the tip of the tongue phenomenon and things like that. We’re pretty accurate most of the time, but once in a while, when we have that slip, it’s an opportunity to sort of peek in behind the curtains there to see how we actually are built. You know, it’s like they’re little gems that pop up that are quite useful for helping us figure out how things work.
Leigh: Node Structure Theory is a model that describes people’s perceptions and actions, and comprises three main processes: priming, activation, and satiation. We were eager to learn how the model conceptualizes these three processes.
Vitevitch: Node Structure Theory has these nodes, these detectors, for different sizes of speech. OK, so there’s a detector for a sound, a phoneme; there’s a detector for syllables, sort of groups of segments or groups of phonemes. There’s detectors for words, there’s detectors for the meaning of a word. Okay, so these are all nodes that represent those pieces of information. The priming is sort of what most cognitive models talk about as a spread of activation. So, there’s sort of this psychic energy that flows from one node to the next along these connections and sort of gets that detector going. When a certain threshold is reached, like with a neuron, the neuron fires; in the case of the detector, the detector is activated and that piece of information reaches conscious awareness. So, when a word detector finally gets primed enough, it’s activated: “Oh, I heard this particular word.” The detectors are kind of like muscles: you keep using them and using them and using them, and they wear out, they tire out, right? And this is the idea of satiation. The different nodes, the different detectors, are, again, kind of like the different kinds of muscles you have. You have some muscles that are sort of endurance muscles, that just keep going and going and going. And then you’ve got other muscles that are good for short bursts of energy and then, you know, tire out. So, syllable nodes, because the same syllable could be found in multiple words, these are more like your endurance muscles. So, they’ll keep firing and firing and firing, whereas the word nodes, the word detectors, they fire and then they tire out really quickly.
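The muscle analogy above can be made concrete with a small toy simulation. This is only an illustrative sketch, not the model from the paper: the `Node` class, the energy scale, the threshold, and the fatigue values are all assumptions chosen to show a fast-tiring word detector next to an endurance-like syllable detector.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """A toy detector that fires when primed, as long as it is not satiated."""
    name: str
    fatigue_per_firing: float  # energy lost per firing (assumed, not from the paper)
    energy: float = 1.0        # 1.0 means fully rested (assumed scale)
    threshold: float = 0.5     # below this energy the node no longer activates

    def prime(self) -> bool:
        """Prime the node once; return True if it activates."""
        if self.energy >= self.threshold:
            self.energy = max(0.0, self.energy - self.fatigue_per_firing)
            return True
        return False


def repeat_stimulus(node: Node, repetitions: int) -> List[bool]:
    """Prime the same node back to back and record whether it fired each time."""
    return [node.prime() for _ in range(repetitions)]


# Word nodes tire quickly; syllable nodes behave like endurance muscles.
word = Node("word", fatigue_per_firing=0.2)
syllable = Node("syllable", fatigue_per_firing=0.01)

word_fired = repeat_stimulus(word, 10)
syllable_fired = repeat_stimulus(syllable, 10)
print("word node:    ", word_fired)
print("syllable node:", syllable_fired)
```

With these made-up numbers the word node stops activating after a few repetitions while the syllable node keeps firing, which mirrors the account of satiation given in the interview.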
Leigh: Node Structure Theory has been used as an account of everything from normal memory and language processing to dysfunctional processing, as well as differences in processing due to aging or certain cognitive deficits. Next, Mike describes how he and his team decided to test the theory’s ability to explain what happens in the Speech-to-Song Illusion.
Vitevitch: Initially, I had come across the speech-to-song illusion in a journal, and it kind of just grabbed my attention, because most of the time when you think of illusions, it’s usually visual illusions that people hear about, and read about, and stuff. So, the fact that there was an auditory one was kind of neat. The article had mentioned one or two others as well, and there’s something called the verbal transformation effect, which is hearing the same word over and over and over and over again. And what happens is the word changes. So, there’s a nice example online of the word “flame,” and you hear “flame, flame, flame, flame,” and after a while it starts to sound like “blame” and then “same” and “lame,” and then it switches back to “flame,” and then it becomes “fame.” So, there’s another article, sometime afterward, that had talked about that illusion, and it talked about Node Structure Theory as a possible explanation for it. I was kind of struck by, well, hey, there’s repetition involved in both of these, and it seems like it’s, you know, potentially doing the same thing. I wonder if this Node Structure Theory could explain this speech-to-song illusion. And so we went looking for, well, what does cause the speech-to-song illusion, and really there wasn’t a good explanation out there for it. So, that started us thinking about, well, what could we do to sort of test Node Structure Theory to see if this was actually a reasonable account for this phenomenon, this illusion? And so, we just tried to take apart different parts of the system: what happens if we somehow take out the word nodes, what happens if we only activate the syllable nodes, how many words or syllables can intervene before the nodes recover, before they are rested, are no longer tired out, and can fire again?
So, those were really kind of the general questions that we were interested in asking, and then it was just a matter of, okay, well, how do we test them with the stimuli we have available to us? And so that’s where we were using some non-words to try to remove the word detectors, or using words that you don’t know, so Spanish words for people who don’t speak Spanish, kind of trying to take out the word detectors and seeing if we could still get this illusion.
Leigh: Mike and his team ended up conducting six separate — but related — experiments to test whether Node Structure Theory could explain the Speech-to-Song Illusion. We wondered how they went about designing these studies, as well as what it was that each experiment set out to discover.
Vitevitch: In normal speech, we have these sort of rhythmic, prosodic kinds of music-like things going on. So, we wondered, you know, do we need all that? And that was really the first experiment, where we just stripped everything down to just four words that were somewhat randomly selected, just concatenated together. If it’s really just these word detectors tiring out, then we should be able to get it with kind of a stripped-down stimulus like that. So, the first experiment was really our initial test of whether Node Structure Theory would explain the speech-to-song illusion. So again, we tried to strip away things like prosody and just the meaning that you might have in even a short phrase, and just basically kind of randomly put a handful of words together. And if the explanation from Node Structure Theory was right, then we should be able to get it even from this sort of stripped-down stimulus. And we did, and so that was really kind of like the first bit that, OK, we might be onto something here, and it allowed us to explore other aspects of the theory after that. So, that sort of sparked us to keep testing other parts of Node Structure Theory to see if this is really what is underlying the speech-to-song illusion. So, in the second experiment we looked at words that either had a strong-weak stress pattern, so the stress is on the first syllable, or a weak-strong stress pattern, so the stress is on the second syllable. The strong-weak stress pattern is more common in English, so it should elicit the illusion a little differently than the weak-strong stress pattern. So, that was again just kind of playing around with those syllable detectors.
In the third and fourth experiments, we tried to remove the word nodes altogether and just sort of stimulate the syllable nodes, and then we wanted to sort of play around with this idea of satiation, you know, how quickly can these detectors recover; and so that’s where the fifth and sixth experiments kind of come into play. Is it the number of words that really needs to vary, or the number of syllables? And then we were sort of playing with both of them in the last experiment.
Leigh: Since people who don’t know a foreign language can’t form lexical nodes about those words, the team’s fourth experiment attempted to prime only syllable nodes by using words from a foreign language: Spanish. Ryan and I asked Mike how he and his team developed the Spanish words used in the experiment.
Vitevitch: Those actually came from a different experiment that we did a few years ago; we just kind of had them lying around. So, hey, let’s try them again. That other experiment was kind of a goofy thing too. So, in Spanish there are masculine words and feminine words, and it doesn’t really have anything to do with the object itself, it’s just a feature of the language. So, most masculine words end in an “o” and most of the feminine words end in an “a,” and you use “una” or “la” to describe them. So, we kind of wondered, well, if the speaker is male, does it matter if he’s saying a female-gendered word or a male-gendered word? Does that come into play at all? And so, we did the study with a male and a female speaker of Spanish, and they read off a list of masculine and feminine words, and when people were asked to identify the gender of the speaker, they were able to do that real easily, no problem. But when they were asked to identify whether the word was masculine or feminine, and these were native Spanish speakers we recruited, if the gender of the speaker and the word matched, they were just a little bit faster than if they mismatched. And we were just kind of blown away by this. So, you know, it finally got published, and we just had these words sitting around. So, when we thought about trying to take out the word nodes from the speech-to-song illusion in some way, we thought about using a foreign language, and we happened to have these Spanish words around.
Leigh: The experiment provided evidence in support of the idea that the repeated priming of syllable nodes contributes to a song-like percept in the Speech-to-Song Illusion. So we followed up by asking Mike what’s new in the field of bilingual research.
Vitevitch: There’s some real interesting things going on with bilingual research. The whole field is kind of interesting to begin with, because American psychology thinks bilingualism is exceptional, that it’s a unique kind of thing, right? And it’s a weird kind of skill, right? You’re special. If you go to Europe, bilingualism is kind of exceptional there too, because most people speak like four or five languages. So, it’s kind of a weird field in many ways, because it’s the norm around the world usually, but here in the US it’s kind of a unique thing. It kind of started out being studied as like a skill, like chess, right? And so, that’s where a lot of that early work was done. It was kind of viewed through that lens. But we’re really kind of exquisite creatures. We’re able to pick up on any kind of cue that’s out there, even if you’ve got somebody who’s very fluent in both English and Spanish. You can pick up on those really fine-grained cues to know that, oh, this is a Spanish R, or no, this is an R like in English, or something like that. So, we tend to think we’re like a light switch, we’re either speaking this language or that language, but we can pick up on those really fine cues and know what’s coming.
Leigh: Experiencing perceptual illusions can be a normal response when our perceptions don’t match what’s actually happening in the environment. Given this, we were interested in hearing what auditory illusions — such as the Speech-to-Song Illusion — say about how we interpret the world around us … as Mike explains after this short break.
ad: We Share Science
Leigh: When we left off, Mike was about to discuss what auditory illusions — such as the Speech-to-Song Illusion — say about how we interpret the world around us.
Vitevitch: We don’t realize how much we’re sort of just making things up really, you know, quite literally making things up, filling in the blanks, and stuff like that, until we have one of these errors kind of crop up and show us, oh, whoa, okay, there’s quite a bit here that I really don’t actually know. I’m just sort of filling in a lot of blanks with previous knowledge or some little bits of information out in the environment, and I think that is what makes these things so surprising and jarring, and makes them so interesting. Because we do, I think, take for granted how we get around in the world, and how much really is not out there, right? I mean, yeah, we’re just kind of making stuff up in terms of our perception, and sometimes in our memory too. So, what happens in the speech-to-song illusion is that you’re continually stimulating those word detectors, those word nodes, and so they’re gonna get tired out, so tired that they’re not going to fire anymore, which then leaves the syllable nodes to sort of respond, okay? And syllables carry the rhythmic information of language, and so the perception moves away from the word, because the word is what was reaching threshold and being activated, and now it’s no longer being activated, it’s tiring out. So, now you’re getting just this sort of information from the syllable detectors, and so now you’re hearing it more as this rhythmic kind of music-like percept instead of the word that it was before. It’s these same mechanisms that explain speech perception, speech production, that explain other kinds of speech errors as well. And so, we thought this was kind of a neat general theory that might be able to account for this kind of kooky phenomenon, this kooky illusion.
Leigh: Mike and his team used intro psych students as subjects in their experiments. In episode 30 of Parsing Science we heard from Yune Lee about how deficiencies in hearing acuity may be linked to declines in people’s cognitive processing abilities, even among young people with normal hearing. So we were curious to hear how Mike and his team decided to select undergraduates to test Node Structure Theory’s ability to explain the Speech-to-Song Illusion, as well as whether they plan on investigating the phenomenon among older adults.
Vitevitch: Some of the earlier work by Diana Deutsch, who sort of discovered this by accident, initially used musically trained people, and then used people who were musical novices (not trained), and it seems to occur in both groups. It’s not like you need musical training for this to happen. So, like in most psychology departments, our intro psych students are asked to engage in the research process in some way, so they’re either asked to read current research and sort of write summaries of it, or to actually take part in research experiments that the faculty and graduate students and so on are conducting. So, that was our pool of participants for this particular set of studies, our intro psych students. But older adults tend to experience the tip of the tongue phenomenon more often than younger adults. So, the tip of the tongue phenomenon is where you know a word but you just can’t bring it to mind, and so it’s on the tip of your tongue, right? So, like the thing you look through when you’re in a submarine: it’s not a microscope, it’s not a telescope, it’s, oh, what is that thing? Oh yeah, periscope, right? So, the idea in Node Structure Theory is that basically the spread of activation between the nodes kind of is less effective as you get older, and so you can activate, in this case, sort of the detectors for the meaning of the word. But activating the detectors for the sound of the word isn’t all that effective. If that’s true, and if this theory really does a good job of accounting for the speech-to-song illusion, then we could make this prediction that older adults might experience the speech-to-song illusion less than younger adults. That’s on the drawing board; hopefully we’ll be able to test that out and see if that’s actually right. But that’s one of the neat things about this theory.
It allowed us to make some predictions like this that we hadn’t seen other people do before, in terms of trying to explain it or trying to make some predictions about how things might work. So, for the older adult study, we’ll probably have to recruit folks from the community or somewhere else.
Leigh: Among the countless memes that the internet brings us, it was the “Yanni or Laurel” auditory illusion that dominated the early summer of 2018. In it, a recording of the word “laurel” was heard as “Yanni” by 53% of the 500,000 people polled on Twitter, and as “Laurel” by the remainder of respondents. Mike, however, heard “Lonnie.” Given that such auditory illusions aren’t always perceived by listeners, we asked him why this might be the case with the Speech-to-Song Illusion as well.
Vitevitch: This is one of those things where you either seem to get it or you don’t. We think that it’s a rough proxy of how quickly the word detectors recover. As for why some people may and some people may not experience the illusion: if the presentation rate of the stimulus is kind of set to your recovery rate for your detectors, you might get it, but if there’s a mismatch, you may not experience the illusion, or not experience it as strongly as somebody else. And so, this was sort of our way of doing that, by manipulating the number of words you heard. If you heard the right number of words, you’d satiate those word detectors and just hit the syllable nodes. But if you had too many words, you’d basically give those word detectors a chance to recover, because something else is being stimulated, and so the illusion would either not be experienced or not be as strong. And so, that’s kind of what we think is going on there. It’s sort of how quickly those word detectors can recover, so you’re getting a breather with more words, basically, because you’re not being hit as many times per second, let’s say, as if there were fewer words. And there does seem to be kind of a sweet spot here. So, if the stimulus is too short, all right, like one word, you’re not going to get this speech-to-song illusion. You may get some other auditory illusion, but you don’t get this speech-to-song illusion. And if you go up to something like 10 words in it, again, you’re not going to experience the speech-to-song illusion. So, there seems to be this sweet spot around three or four words that gives you this speech-to-song illusion.
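The recovery idea described above, that longer word lists give each word detector a breather between presentations, can be illustrated with a toy calculation. The sketch below is hypothetical: the function name `fraction_of_word_firings` and the fatigue, recovery, and threshold values are assumptions, not parameters from the study. It also only captures the long-list side of the "sweet spot"; it does not model why a one-word stimulus yields a different illusion.

```python
def fraction_of_word_firings(words_in_loop: int, loops: int = 20,
                             fatigue: float = 0.3, recovery: float = 0.05,
                             threshold: float = 0.5) -> float:
    """Loop a list of words and return the share of presentations on which
    the presented word's detector still activated (all values are assumed)."""
    energy = [1.0] * words_in_loop  # every word detector starts fully rested
    fired = 0
    presentations = 0
    for _ in range(loops):
        for current in range(words_in_loop):
            for i in range(words_in_loop):
                if i == current:
                    # The word being heard primes its own detector.
                    presentations += 1
                    if energy[i] >= threshold:
                        fired += 1
                        energy[i] = max(0.0, energy[i] - fatigue)
                else:
                    # Every other detector rests and recovers a little.
                    energy[i] = min(1.0, energy[i] + recovery)
    return fired / presentations


four_words = fraction_of_word_firings(4)
ten_words = fraction_of_word_firings(10)
print(f"4-word loop:  word detectors fired on {four_words:.0%} of presentations")
print(f"10-word loop: word detectors fired on {ten_words:.0%} of presentations")
```

Under these assumed values, the detectors in the ten-word loop always recover before their word comes around again, whereas in the four-word loop they fall behind and stop firing on some presentations, which is the condition the interview associates with the song-like percept.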
Leigh: “Effect size” refers to the extent to which variables investigated in a research study have a meaningful association with one another. In the 1960s the statistician Jacob Cohen provided general guidelines which, for better or worse, have become the de facto standard for interpreting effects in light of their magnitude. As his study’s six experiments all had medium to large effects, we asked Mike his thoughts on the relevance of effect sizes when investigating relatively new phenomena, such as the Speech-to-Song Illusion.
Vitevitch: You know, sometimes it’s not how big the effect is. It’s, does it even happen, right? I mean, that’s sort of the initial stage that we’re at: does this even happen this way? Once you know the answer to that, then you can see how big the effect is, and that’s a different kind of question. You can also do these sort of cost-benefit analyses. Well, you know, if I do this intervention, is it gonna have a big effect? But how much does it cost to do this intervention? So yes, it has an effect, but it’s a small effect and it costs like a million dollars to do it. So, maybe we’ll invest that money somewhere else. These are all very valuable questions, very useful questions, but they’re different questions, and you ask them sort of at different stages of the research process. So, hopefully this illusion has potential impact in terms of trying to understand either speech and language disorders or problems with music perception, and there are people that actually can’t really perceive music all that well. So, potentially we might be able to address some of those issues.
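For readers unfamiliar with the guidelines mentioned above, here is a minimal sketch of one common effect-size measure, Cohen's d, along with Cohen's conventional benchmarks of 0.2, 0.5, and 0.8 for small, medium, and large effects. The ratings are invented for illustration only, the "below small" label is not Cohen's term, and the statistic reported in the paper itself may be a different one.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def cohen_label(d):
    """Map |d| onto Cohen's conventional benchmarks (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical "how song-like did it sound?" ratings on a 1-5 scale.
ratings_after_repetition = [4, 5, 3, 4, 5, 4]
ratings_single_hearing = [2, 3, 2, 3, 2, 3]

d = cohens_d(ratings_after_repetition, ratings_single_hearing)
print(f"Cohen's d = {d:.2f} ({cohen_label(d)})")
```

The label answers only the "how big" question; whether the effect happens at all, and whether it is worth the cost of pursuing, are the separate questions raised in the interview.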
Leigh: In addition to his lab-based experiments, Mike collects speech errors that people may say or overhear at his website: http://spedi.ku.edu/. One contributor, for example, reported mishearing “I have my keys in my purse” as “I have my peas in my purse.” Another reported mis-stating “I’ve booked the tickets” as “I’ve ticked the buckets.” Since undergraduate psychology courses typically focus on formal experimental methods of research, we asked Mike what other approaches he uses in teaching research methods to his students.
Vitevitch: In the research methods classes I’ve taught, I try to make the students do something other than an experiment. So, I teach a graduate-level methods class. I am assuming that their advisor is doing a good job explaining how an experiment works. Because as psychologists, that’s what we do like 90% of the time, you know. Most of the journals are all laboratory-based experiments. So, like Amazon MTurk, that’s kind of a neat way to get out of the lab. It’s still an experiment usually, but it’s a good way to get out of the lab. There’s lots of really neat methodologies to use that answer slightly different questions than an experiment does, like I said before about the difference between whether there is an effect, versus the size of the effect, versus a sort of cost-benefit analysis of whether we want to invest in trying to get this effect. These different research methods help you answer different questions, and they’re helpful for pushing you to ask more questions, or ask slightly different questions. The speech error stuff, I love that, right? I mean, you’re collecting observational data from the real world. People have developed ways to make you make mistakes in the laboratory, which match up pretty well with the naturally occurring ones. But that’s like Jane Goodall, except instead of looking at the apes, right, you’re looking at the speech that people mess up. It’s fantastic, it’s just right there in the real world. And there’s all sorts of case studies, right? HM is like the most written-about individual in psychology. This is an individual who had his hippocampus removed. He had intractable epilepsy, and so the neurosurgeon removed the hippocampus, because that was the source of the epilepsy, and it basically sucked out his short-term memory. There’s thousands of papers about HM and this, you know, lack of short-term memory. That’s one guy, a case study, and yet our understanding of how human memory works really grew exponentially because of this one guy.
So, there’s all these methods out there that you can find examples of in psychology, but not as many as the sort of classic laboratory experiment. So, I really hope students go back to using some of those other things, because they answer some pretty interesting questions too.
Leigh: Mike’s research into speech errors has the potential to inform how artificially intelligent systems such as Apple’s Siri and Amazon’s Alexa make sense of our speech. So, Ryan and I wrapped up our conversation by asking him his thoughts on the prospect of developing speech recognition systems which might rival that of humans.
Vitevitch: Yeah, we’ve gone to Skype for Business for our phone system, which will do voicemail, voice messages, but it also sends you an email with a transcription of it. I have yet to read one of those that actually makes sense. I don’t know how they go about the speech recognition stuff, what their process is, but yeah, some of them are comically so far off base. They’re just hilarious! So, there’s some neat work that is being done trying to see if the way that humans do speech recognition can be implemented in a machine, and it doesn’t seem to work so well. We’re pretty special; we still haven’t figured out the special secret sauce there about how humans do speech recognition and do it so well. You know, we’re able to recognize a young child speaking to us, an older adult, a male, a female, somebody with a foreign accent. There was a great YouTube video of a guy with a Japanese accent speaking to Siri, and it was just horrible. He was just trying to get it to, you know, “please call my work,” and it just couldn’t even get that. And in a very constrained sort of situation, yeah, it was painful to kind of watch. But, you know, it sort of highlights how good humans are at this kind of stuff and how far we still have to go with some automatic speech recognition stuff.
Leigh: That was Mike Vitevitch, discussing the article “An account of the Speech-to-Song Illusion using Node Structure Theory,” which he published with Nichol Castro, Joshua Mendoza, and Elizabeth Tampke in the June 8, 2018 issue of PLOS One. You’ll find a link to their paper at parsingscience.org/e32, along with bonus content and other material that he discussed during the episode.
Leigh: Reviewing Parsing Science on iTunes is a great way to help others discover the show. If you haven’t already done so, head over to parsingscience.org/review to learn how. Or, if you have a comment or suggestion for future topics or guests, visit us at parsingscience.org/suggest or leave us a message toll-free at 1-866-XPLORIT.
Leigh: Next time on Parsing Science, we’ll be joined by John Lewis from the University of Alberta. He’ll talk with us about his research into the promise of using molecular tools to almost completely block the spread of deadly cancers.
John Lewis: What in fact looked like a two-fold increase or a three-fold increase was actually, you know, a thousandfold increase. So, when we went back and ran the numbers, it was like, “Oh my god, every single one inhibits over 99%.” And so, that completely changed the temperature of the lab and our enthusiasm about getting the information out there.
Leigh: We hope that you’ll join us again.