
I’ve been experimenting with using readability metrics lately (code for the below is here). They offer a very straightforward way of measuring textual difficulty, usually consisting of some ratio of sentence and word length. They date back to the work of Rudolf Flesch, who developed the “Flesch Reading Ease” metric. Today, there are over 30 such measures.

Flesch was a Viennese immigrant who fled the Nazis in Austria and came to the U.S. He ended up as a student in Lyman Bryson’s Readability Lab at Columbia University. The study of “readability” emerged as a full-fledged science in the 1930s when the U.S. government began to invest more heavily in adult education during the Great Depression.

Flesch’s insight, which was based on numerous surveys and studies of adult readers, was simple. While there are many factors behind what makes a book or story comprehensible (i.e. “readable”), the two most powerful predictors are sentence length and word length. The longer a book’s sentences and the more long words it uses, the more difficult readers will likely find it. Flesch reduced this insight into a single predictive, and somewhat bizarre, formula:

Reading Ease = 206.835 - 1.015 (W / St) - 84.6 (Sy / W)

W = # words, St = # sentences, Sy = # syllables

According to Flesch’s measure, Rudyard Kipling’s The Jungle Book has a higher readability score (87.5) than James Joyce’s Ulysses (81.0). Presidential inaugural speeches have been getting more readable over time. The question that I began to ask was, have novels as well?

Below you see a plot of the mean readability score per decade for a sample of novels drawn from the Stanford Literary Lab collection and the Chicago Text Lab. The higher the value, the more “readable” the text. The calculations are made by taking 20 sample passages of 15 sentences from each novel and calculating the Flesch reading ease for every passage. Then for every decade I use a bootstrapping process to estimate the mean reading ease for that decade. Error bars give you some idea of the variability around the mean per decade. What this masks is a very high variability at the passage level. Nevertheless, the overall average is clearly moving up in significant ways.

One question that immediately came to mind was the extent to which these scores are being driven by an increase in dialogue. Dialogue is notably “simpler” in structure, with considerably shorter sentences and potentially shorter words to capture spoken language. I wondered whether this might be behind the change.

Below you see a second graph with the percentage of quotation marks per decade. Here I simply calculated the number of quotation marks (pairs) per novel and used bootstrapping to estimate the decade mean. As you can see, the two measures rise in very similar fashion, though with a noticeable break where the two data sets are joined together. Mark Algee-Hewitt has a lot to say on this issue of combining data sets.
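
For readers who want to see the Flesch Reading Ease calculation spelled out, here is a minimal Python sketch. It is not the code linked above: the function names `count_syllables` and `flesch_reading_ease` are hypothetical, sentences are split naively on end punctuation, and syllables are approximated by counting vowel groups, which is only a rough heuristic. The constants are those of the standard Flesch formula.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting groups of consecutive vowels (heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent ("made"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * (W / St) - 84.6 * (Sy / W)."""
    # Assumption: sentences end at '.', '!' or '?'; real tokenizers do better.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    W = len(words)
    St = len(sentences)
    Sy = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (W / St) - 84.6 * (Sy / W)


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat on the mat. It was warm."))
```

Because the syllable counter is approximate, scores from this sketch will differ somewhat from those produced by dedicated readability libraries.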

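
The sampling and bootstrapping steps can be sketched in the same spirit. This is an illustration under stated assumptions, not the procedure in the linked code: I assume passages are contiguous runs of sentences whose start positions are drawn at random (so passages may overlap), that the bootstrap resamples scores with replacement 1,000 times, and that the error bars are a 95% percentile interval. The names `sample_passages` and `bootstrap_mean` are hypothetical.

```python
import random
from statistics import mean


def sample_passages(sentences, n_passages=20, passage_len=15, rng=None):
    """Draw n_passages contiguous runs of passage_len sentences from one novel."""
    if rng is None:
        rng = random.Random()
    if len(sentences) < passage_len:
        return []  # novel too short to yield a full passage
    last_start = len(sentences) - passage_len
    starts = [rng.randint(0, last_start) for _ in range(n_passages)]
    return [sentences[s:s + passage_len] for s in starts]


def bootstrap_mean(scores, n_resamples=1000, rng=None):
    """Return (estimate, lower, upper) for the mean via a percentile bootstrap."""
    if rng is None:
        rng = random.Random()
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original sample size.
        resample = rng.choices(scores, k=len(scores))
        means.append(mean(resample))
    means.sort()
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return mean(means), lower, upper


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical passage scores for one decade, for illustration only.
    decade_scores = [rng.gauss(75, 10) for _ in range(200)]
    print(bootstrap_mean(decade_scores, rng=rng))
```

The same `bootstrap_mean` helper could be applied to per-novel quotation mark counts to estimate the decade means for the dialogue measure.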