In striving to develop expertise, are 10,000 hours of deliberate practice really required, and must that practice be guided by a teacher or coach? In episode 59, we're joined by Brooke Macnamara from Case Western Reserve University. She'll discuss her attempted replication of the study that led to the mantra, popularized by Malcolm Gladwell, that these criteria are necessary to master a task. Her article, "The role of deliberate practice in expert performance: revisiting Ericsson, Krampe & Tesch-Römer (1993)", was published with Megha Maitra on August 21, 2019 in the journal Royal Society Open Science.
Websites and other resources
Press and Media
Patrons of Parsing Science gain exclusive access to bonus clips from all our episodes and can also download mp3s of every individual episode.
Hosts / Producers
Ryan Watkins & Doug Leigh
How to Cite
Watkins, R., Leigh, D., & Macnamara, B. (2019, October 1). Parsing Science – Does practice make perfect?. figshare. https://doi.org/10.6084/m9.figshare.9937238
What’s The Angle? by Shane Ivers
Macnamara: It could be at this point, since people now know about the 10,000 hour rule that they could assume that they’re practicing more and that’s why they’re good.
Watkins: This is Parsing Science, the unpublished stories behind the world’s most compelling science, as told by the researchers themselves. I’m Ryan Watkins
Leigh: and I'm Doug Leigh. Today, in episode 59 of Parsing Science, we're joined by Brooke Macnamara from Case Western Reserve University. She'll discuss her attempt to replicate the 1993 study behind the mantra, popularized by Malcolm Gladwell, that 10,000 hours of deliberate practice on a skill are necessary to become an expert in it. Here's Brooke Macnamara.
Macnamara: Hi, I'm Brooke Macnamara. I'm currently an Associate Professor at Case Western Reserve University; I'm a cognitive psychologist in the Department of Psychological Sciences. I was born in rural Virginia, in the Shenandoah Valley. I actually had a career before this current one. When I was 16, I decided that I wanted to be an American Sign Language-English interpreter, so I went to college for that initially; my bachelor's degree is in American Sign Language-English interpretation. I ended up getting interested in aptitude for interpreting, so I did a master's degree at Union Institute and University, which was a sort of non-traditional program where you find your own mentors. And I got the research bug and just couldn't let it go. There was nowhere else for me to go in terms of getting more research experience and education except a traditional PhD program, so I went to Princeton University and studied under Andy Conway, who examines individual differences in cognitive abilities. Initially the idea was to get the PhD and then go back to interpreting, but again, research just kind of held on to me and wouldn't let go. So I stayed on the academic route, and now I study skill acquisition and expertise a bit more broadly, across domains.
Motivation for the replication study
Watkins: The importance of deliberate practice in developing expertise has been a long-held foundation of the psychological literature ever since a seminal 1993 study by K. Anders Ericsson and his colleagues. Especially since Ericsson was brought onto the faculty at Florida State University while Doug and I were graduate students there, we were curious what motivated Brooke to investigate whether she could replicate their results.
Macnamara: Individual differences research, which looks at individual differences in cognitive abilities, and skill acquisition and expertise research are often examining the same questions, but they hardly acknowledge each other, and they use different tasks. That's what I was interested in. Thinking back to my background in simultaneous interpreting, I realized the idea of deliberate practice would not really be all that useful there. If you're a freelance interpreter, your day might begin with interpreting a surgery, then an accounting class, then a political rally, and then a play. You really can't deliberately practice anything, because you're probably never going to hear the same sentence twice. If you actually tried to practice one specific thing over and over again, that could even screw things up, because it wouldn't give you the flexibility to think, 'this needs to be stated differently because there's underlying context, so it actually has a different meaning,' things like that. With individual differences research, I noticed that they tended to choose tasks where people didn't have previous knowledge, trying to avoid practice effects, because they weren't interested in practice effects; they wanted to isolate differences in cognitive abilities. Whereas the tasks showing up in the skill acquisition and expertise literature, so things written by Ericsson and others, were the opposite: not always, but certainly often, domains where you could see that practice was important. So I was interested, and have continued to be interested, in how characteristics of the task probably moderate how important cognitive abilities are versus practice.
[ Back to topics ]
The original study
Leigh: While the idea that experts spend at least 10,000 hours dedicated to deliberate practice was popularized by Malcolm Gladwell in his book Outliers, few people know much about what the original study actually involved. Here Brooke describes Ericsson's original study.
Macnamara: The original study was conducted at the Music Academy of West Berlin. They went to the faculty at this music conservatory and asked them to nominate who they thought were the best violin students, those most likely to have an international solo career. Then they asked them to nominate a number of good violinists who were doing well in the program, but just not as good as the best violinists. Then they went to the music education department at the same conservatory, where the performance requirements to get in were lessened; these were the less accomplished violinists. They asked the students to rate a number of activities, asked them some things about when they first started playing the violin and how many music teachers they had, and, most importantly, asked them to estimate, beginning from when they first started playing the violin, how many hours per week they practiced alone. They did that per year, multiplied by the weeks in a year, and then summed across the time up to age 18. They compared the best and good violinists, and found that the best violinists had practiced significantly more than the good violinists. Then they combined the best and good violinists into one group, compared them to the less accomplished music education students, and found that the combined best and good violinists had practiced more than the less accomplished violinists. So they concluded that deliberate practice largely explains differences in performance even at elite levels. They rejected the idea that innate talent was important in any way, except for height and body size in some sports, and said that their account was sufficient to explain differences in expertise. So they made lots of claims based on this study.
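The estimation procedure Brooke describes, recalled weekly hours of solitary practice for each year of playing, multiplied by the weeks in a year and summed up to age 18, can be sketched in a few lines of Python. The yearly figures below are hypothetical illustrations, not data from the original study:

```python
# Hypothetical retrospective estimates: for each age, the violinist recalls
# their typical hours of solitary practice per week that year.
# (Illustrative numbers only; not from Ericsson, Krampe & Tesch-Römer, 1993.)
hours_per_week_by_age = {
    5: 2, 6: 3, 7: 4, 8: 6, 9: 8, 10: 10,
    11: 12, 12: 14, 13: 16, 14: 18, 15: 20,
    16: 22, 17: 24, 18: 26,
}

WEEKS_PER_YEAR = 52


def accumulated_practice(hours_by_age: dict) -> int:
    """Sum estimated weekly practice across years, as in the original procedure."""
    return sum(hours * WEEKS_PER_YEAR for hours in hours_by_age.values())


total = accumulated_practice(hours_per_week_by_age)
print(f"Estimated accumulated solitary practice by age 18: {total:,} hours")
# → Estimated accumulated solitary practice by age 18: 9,620 hours
```

Note how sensitive this total is to the recalled weekly figures: a one-hour-per-week misremembering in each year shifts the sum by over 700 hours, which is one reason retrospective estimates like these are treated cautiously.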
[ Back to topics ]
Watkins: Prior to seeking to replicate Ericsson's study, Brooke examined the combined effects identified in previous studies of deliberate practice carried out by other researchers. The domains in those studies ranged from music to education, as well as a variety of professional and amateur sports. Doug and I were interested in hearing what she learned from her meta-analyses, as well as what eventually led her to seek to replicate Ericsson's 1993 work.
Macnamara: What we found was that deliberate practice seems to be pretty important, but it wasn't as important as had been claimed in the original paper. Something to be clear about here is that there's a difference between intra-individual differences, change within an individual, and inter-individual differences, the differences across people. After that meta-analysis I got a fair amount of hate mail from people saying, 'How dare you say practice isn't important. When I practice the piano I get better,' and that's absolutely true. If you're thinking about change within an individual, practice is going to be the key, probably explaining almost a hundred percent of why you are improving. But that doesn't mean that an hour of practice for one person is the same as an hour of practice for somebody else. Practice doesn't explain as much between individuals, because everybody's different: we have different starting points, different learning rates, and different asymptotes. Generally we found that it varied by domain, but practice didn't largely explain variance across people, as had originally been claimed. I had done a few other meta-analyses since then, and then decided that I wanted to conduct this replication of the original study.
[ Back to topics ]
Defining deliberate practice
Leigh: Unambiguous statements of how researchers intend to define and measure their variables are necessary before setting about any research study. While there might be multiple defensible ways that these operational definitions can be crafted, part of Brooke's concern was how the construct of deliberate practice was operationalized in Ericsson's original study, as well as how she ought to define and measure it in her replication.
Macnamara: One of the things that was interesting in the original paper is that a lot of the claims about deliberate practice, and the theoretical definition of deliberate practice that was given, held that it's practice designed by a teacher to improve specific aspects of performance. But when asking the violin students about practice, they asked them to estimate amounts of practice alone. So the operational definition of deliberate practice was practice alone, even though the theoretical definition was practice designed by a teacher. The issue is that if you're making claims about practice designed by a teacher, but you're not measuring practice designed by a teacher, and instead operationalize deliberate practice as something else, in this case practice alone, then you can't back up the claims you made about the theoretical definition, because you didn't measure it. And this has been something interesting in the literature ever since. I included in my replication a lot of quotes by Dr. Ericsson, sometimes writing by himself, sometimes with his colleagues. In some of them he specifically says that the original study defined deliberate practice as practice designed by a teacher, and at other times he says the original study defined deliberate practice as practice designed by a teacher or by the performers themselves. Both definitions have been used ever since to make arguments for deliberate practice. For example, Ericsson has argued against my work, saying that I didn't actually measure deliberate practice because it wasn't designed by a teacher. So I thought that this replication could be a good way to test both of these definitions, to see if they're really very different or are really capturing the same variance, and to try to put to rest this debate about what deliberate practice is.
[ Back to topics ]
Watkins: Brooke's article points out that Ericsson's original publication included bold claims, such as that it's impossible for an individual with less accumulated practice at some age to catch up with the best individuals. If that were the case, then only a single contradicting example would be necessary to falsify the claim. Brooke shared her thoughts on why such strong conclusions may capture headlines, but not the nuances of science.
Macnamara: Well, you know, saying that you've 'proved' anything is banned. I teach our research methods class to undergraduates, and they're not allowed to say that a hypothesis was proved or anything like that. It's really hard; it's the language you hear in the news and the media, and people want to say 'proved' when they mean 'found evidence for.' I think a lot of the language should have been much more tempered, and maybe that's something that's changing in the field as well. For example, later, in 2014, when Ericsson was defending his view, he noted that the 95% confidence interval around the mean hours of practice alone at age 18 for the best violinists ranged from under 3,000 hours to almost 12,000 hours. Interestingly, the mean of the good violinists at that time was about 5,000 hours, so it falls right inside that 95% confidence interval. So it's perhaps surprising that they found significant differences between the best and good violinists at the time, but I think now that we're giving effect sizes and confidence intervals and things like that, reviewers would say you probably need to temper these claims, and that's changing. Unfortunately, big claims are what gets attention. Things that get you attention make you and your work more well known, and let you publish more; there are lots of benefits to that. So there's an incentive structure to make big claims, and I believe that is also changing with open science, but slowly. When a scientist engages in open science and chooses to pre-register their study, their data collection and analysis plans are specified in advance of gathering data, and they typically commit to those plans indelibly on public websites such as the OSF. Pre-registration separates exploratory data analyses from confirmatory hypothesis tests.
Leigh: As Brooke describes after this short break.
Leigh: Here again is Brooke Macnamara.
Macnamara: That process was really enlightening, because it really makes you think through every little detail and aspect, at least in the format that I used. There are several options for pre-registration, and some are relatively simple and straightforward: you just answer seven questions. The one I chose was very detailed, and so it made me think in very detailed terms about how to go through all of this. The original paper is actually very, very long, and there are lots and lots of analyses that I also conducted as part of the replication. They're just in the supplemental materials, because even in the original study it's unclear why there's so much detail on fairly irrelevant factors that don't really say anything. For example, there were analyses on napping, and how frequently people napped between the hours of 10:00 and 12:00 and between 12:00 and 2:00. Everything was analyzed over and over again, and on some factors there were certainly over ten analyses conducted. There were a lot that I didn't want to do, but for the sake of completeness I went through all of these. It did make me wonder why there were all of these analyses on factors for which there were no hypotheses, unless it was just exploratory; I'm unsure. But I learned that in designing future studies I don't want to have so many factors that aren't contributing to the theory, which you then have to deal with. And a main thing I learned is just how to be really careful with this: I can't tell you how many times I read through the paper. Sometimes there would be semi-conflicting information, and you had to really think through what was in the methods to figure out what was correct and what they did.
And so it was an interesting process. It was a fairly difficult study to run, I think not so much because it's a replication but because it's a special population. I spent about four years of recruitment efforts trying to get the sample, so it made me really appreciate the hard work that was put into the original as well.
[ Back to topics ]
Watkins: As in the original study, Brooke's replication asked violinists to recollect their practice habits since beginning the violin. Since recalling events that transpired long ago can be fraught with bias, Doug and I were interested in hearing how Brooke increased confidence in the accuracy of participants' responses to such questions.
Macnamara: Part of what we did is, along with the retrospective estimates of practice from when they first started playing the violin, at times when they were four or five years old, we also asked: in a current, typical week, how many hours do you spend on all of these activities? Then we sent them home with diary logs and asked them to fill them out every day and code them by activity. And we looked to see whether the diary matched their estimates, what a real week looked like compared to what they said a real week looked like. Like Ericsson et al., we found that pretty much everybody overestimated: relative to their diaries, they believed that they practiced more than they actually did. But importantly, that didn't interact with group, because if it had been the case that, say, the best violinists overestimate more than the less accomplished violinists, then you couldn't make any conclusions from these retrospective estimates. The idea is that they all overestimate by about the same amount, so the retrospective estimates are probably inflated, but presumably the relative differences are still in place. It could be, at this point, since people now know about the 10,000 hour rule, that they assume they're practicing more and that's why they're good, depending on their beliefs, since that's the most heavily believed account. But there aren't good ways around this. Obviously a longitudinal study would be ideal, but it would be so hard to find a bunch of four-year-olds who are practicing the violin, and to have enough in the sample that some actually make it to a music conservatory. Expertise research kind of has these difficulties associated with it: a longitudinal study would be ideal, but its feasibility is pretty rough. So we have these not-great measures that we use in their place.
[ Back to topics ]
What's different in Brooke's replication
Leigh: Brooke's study is a direct replication of Ericsson's original research, though it makes a few departures from his original methods. This led Ryan and I to wonder how Brooke and her co-author decided which aspects of the study should be updated and which should remain the same.
Macnamara: There were some really interesting decision points. The double-blinding and the analyses were part of the idea of the study: would this replicate if we fixed these things? And then there were a few other things that were just differences between now and 1993. For example, in the original study they mailed carbon-copy diary sheets, with self-addressed stamped envelopes, to the participants for them to mail back every day; we just used Excel and emailed them. We thought, we're just going to take advantage of the new technology, and we did not believe that anybody would mail us back carbon copies of their diary sheets; we're going to make this easier. We also changed a couple of the activities to rate. In the original study, participants were asked to rate a number of musical activities for relevance to improvement on the violin, enjoyment, and effort, and then a number of everyday activities like household chores and schoolwork. We added checking email and social media, because that's something we know people spend time on every day that did not really exist in 1993, so we felt we needed to add it. But the main difference from the original study is that we added teacher-designed practice in the music category. We first asked participants to estimate practice alone, exactly as Ericsson et al. did, and then, after they had done that, we asked them to estimate the amount of time spent practicing activities that were designed by a teacher. We generally found that they accounted for similar amounts of variance. The violinists rated practice alone as more relevant to improvement on the violin than teacher-designed practice, which I thought was interesting.
And there was a significant difference in that, but there was not a significant difference in the amount of variance explained by practice alone versus teacher-designed practice: in both cases it was roughly 25 percent of the variance. That's essentially all coming from the difference between the best and good violinists on the one hand and the less accomplished violinists on the other. We did not find significant differences between the best and good violinists; in fact, in all cases, numerically though not significantly, the good violinists had accumulated more practice than the best violinists.
[ Back to topics ]
Watkins: Analysis of variance, or ANOVA for short, is a statistical technique used in the original study to compare the number of music competitions engaged in by the best versus good violinists, excluding the less accomplished violinists. Ericsson then ran a second ANOVA in which the best and good violinists, grouped together, were compared against the less accomplished violinists. But ANOVAs were probably not appropriate for the data Ericsson collected for this measure, as Brooke explains next.
Macnamara: He used two ANOVAs consistently, and they were a little odd. They would compare the best and good violinists, but use the full sample's degrees of freedom, which of course makes it a little bit easier to find a significant effect, because you're telling the model that it has more power than it does. That was one thing we didn't do in this replication: we conducted one analysis to see if there was a main effect of group, and then did planned comparisons to see if the pattern of results followed what was claimed. And in the original study, the second ANOVA they ran combined the best and good violinists as one group and compared them to the less accomplished violinists, which is untraditional. We didn't want to follow that, because it increases the chances of finding erroneous significance. So they ran these two ANOVAs, with this slicing and dicing, combining and uncombining, and then said 'hence there is complete correspondence between the skill level of the groups and their average accumulation of practice time alone with the violin.' That statement really doesn't follow from the types of analyses that they did, so we wanted to test it in a way that actually tested that statement. I think there are parts of this paper that are just indicative of the time. For example, there are no effect sizes, because it's relatively new for people to give effect sizes and confidence intervals and things like that; that's not something you would expect from a 1993 paper. But running the two ANOVAs separately, I don't think that was standard at that time either.
So in terms of what we can do better, I think having open data is probably the most important thing. It's now so much easier to be really clear about your methods, because you can just post additional materials on the Open Science Framework or somewhere else. I think there are a lot of ways that we can be more open, which I think is really helpful for the field.
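The degrees-of-freedom point Brooke raises can be made concrete with a small from-scratch one-way ANOVA. All numbers below are hypothetical practice-hour data, not the study's: the error degrees of freedom for a two-group comparison come only from those two groups (n1 + n2 - 2 = 8 here), so borrowing the full three-group sample's error degrees of freedom (12 here) lowers the critical F value and makes significance easier to reach.

```python
import statistics

# Hypothetical accumulated practice hours for three groups of violinists
# (illustrative numbers only).
best = [7200, 7900, 8400, 7600, 8100]
good = [6800, 7400, 7100, 6500, 7000]
less_accomplished = [3900, 4500, 4200, 3600, 4100]


def anova_f(*groups):
    """One-way ANOVA F statistic, with its numerator and denominator df."""
    all_obs = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_obs)
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        (x - statistics.fmean(g)) ** 2 for g in groups for x in g
    )
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# A single omnibus test across all three groups, as in the replication:
f_omnibus, df1, df2 = anova_f(best, good, less_accomplished)
print(f"Omnibus F({df1}, {df2}) = {f_omnibus:.1f}")

# A planned comparison of best vs. good uses only those groups' error df (8);
# evaluating the same F against 12 error df, as if the whole sample
# contributed, would use a smaller critical value than is warranted.
f_pair, p1, p2 = anova_f(best, good)
print(f"Best vs. good: F({p1}, {p2}) = {f_pair:.1f}")
```

With these toy numbers the two-group F is tested against F(1, 8), whose 5% critical value is larger than F(1, 12)'s, illustrating why inflating the denominator degrees of freedom overstates significance.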
[ Back to topics ]
Leigh: Though neither the original study nor Brooke's replication measured feedback on performance, it's often associated with developing mastery, and even expertise, of a skill. Brooke talked with us about what current research suggests about its effect on practice.
Macnamara: Going back to the beginning of this conversation about operationalization: feedback has often been part of the theoretical definition of deliberate practice, but it's really unclear what's meant by that. Sometimes it's talked about in terms of very explicit feedback from a teacher, and other times in terms of environmental feedback. If you're a surgeon, you get the haptic feedback of the cut, and you can see visually whether your stitches are clean. Those are two very different types of feedback, and it's unclear which is important. Generally, I think getting feedback tends to be important. There are some interesting studies looking at immediate feedback versus delayed feedback, and it seems that in many cases immediate feedback is really helpful, but in other cases delayed feedback can be more helpful, if the student or learner has to think through things themselves, as opposed to just hitting a button and getting the information, which they usually don't process quite as well. But even what's meant by feedback, and how it works, probably also depends on the task, the person, and how knowledgeable they are. There are always so many factors that are left out. Something I've said about deliberate practice, and a few other theories, is that I think it falls prey to the single-cause fallacy. Human performance is complex, and a single factor explaining it all is probably not going to be accurate, because something simple usually doesn't explain something complex. It's not even just what kind of practice is best; it's what kind of practice is best for this person, for this task, for their performance level, for their age, for their cognitive profile, and everything else. I think all of these things interact and make it really complicated.
It's not 'is it nature or is it nurture.' Rather, there are genetic factors, environmental factors, and their interactions, all of which need to be taken into account to really get a better handle on learning and expertise.
[ Back to topics ]
Pushback from luminaries
Watkins: Since Brooke's study calls into question the research at the foundation of what has become a lucrative expertise-building industry over the past thirty years, it's of little surprise that some don't appreciate her line of research. From her initial meta-analysis through today, she's encountered many vocal opponents to her findings and conclusions. So we closed out our conversation by asking her what it's like to be embroiled in such debates with luminaries from her field.
Macnamara: For years, what would initially happen is I would read a paper like that and get this feeling of dread and go, 'oh my gosh, I can't believe we didn't think of this or do this,' because some people are very good, convincing writers. Then I would read it again a bit more carefully and go, 'wait a minute, this is completely antithetical to what you have said before.' So much of it is writing and how persuasive a writer you are. So much of what gets believed depends on whether you already believe it and want to believe it; then you have confirmation bias, and if someone writes it well, you just go, 'oh my gosh, yeah, they're right and these other people are wrong.' To counter that, I have quite a number of chapters and commentaries and replies, and some of it is trying to correct the record: bringing up when there are ambiguities in criteria, when goalposts are being moved, and when terms are shifted to create different arguments. Usually these commentaries are pretty easy to put together; you just say, 'well, this is what we found based on criteria that you have put forth, and when we use this other criterion, which you've also put forth, there's no difference,' and spell it out. In that way it's nice to get to respond, because if people just think these things to themselves and you never know that's what everybody's thinking, then you can't clarify your work. So I feel like it's important, because if I'm not doing it, then the big names who are willing to go out there and talk about it become the only information people get.
[ Back to topics ]
Links to article, bonus audio and other materials
Leigh: That was Brooke Macnamara, discussing her article "The role of deliberate practice in expert performance: revisiting Ericsson, Krampe & Tesch-Römer (1993)", which she co-authored with Megha Maitra and published in Royal Society Open Science on August 21st, 2019. You'll find a link to their open access paper at parsingscience.org/e59, along with bonus audio and other materials we discussed during the episode.
Watkins: Interested in the latest developments in science? Then sign up for our weekly roundup of the latest science news from across the disciplines at parsingscience.org/newsletter. Or, if you'd like to check out our first 56 issues, just head over to parsingscience.org/news.
[ Back to topics ]
Preview of next episode
Leigh: Next time, in episode 60 of Parsing Science, we'll be joined by Michelle Hampson from Yale University. She'll discuss her research suggesting that people suffering from obsessive-compulsive disorder and Tourette's syndrome may benefit from real-time fMRI neurofeedback, both while inside a brain scanner as well as in the weeks following.
Hampson: You know, if they had the same pattern as us, where the subjects who responded to the intervention continued to respond and get better and better for a month, then the time point they most need to sample is a month out, because that's the time point of greatest power. So you're investing millions of dollars into collecting data, but if they're only sampling immediately afterwards, they're losing a humongous amount of their power, and they could come up with a null result for an intervention that was actually effective.
Leigh: We hope that you’ll join us again.