SBR: Distributions, deviations, ranges, oh my!
Andrea Hartzell
by Ted Carter, firstname.lastname@example.org
Recently, I shared an article on averages and how they can be misleading. This article will expand on that and talk about how the nature of the data being represented can have a huge impact on the averages being discussed.
Previously, we talked about the mean (the mathematical average calculated by adding up all the numbers and dividing by the count of numbers), the median (the middle number in a series of numbers), and the mode (the most commonly occurring number). The things we will talk about today could impact all three of these types of averages, but we are going to focus on the mean, since it is the most commonly reported average calculation.
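As a quick refresher, all three averages can be computed with Python's standard statistics module. The scores below are made up for illustration and are not the data behind this article's charts.

```python
import statistics

# Hypothetical scores, invented for this example only.
scores = [40, 45, 50, 50, 55, 60, 85]

mean = statistics.mean(scores)      # sum of the scores divided by how many there are
median = statistics.median(scores)  # middle value once the scores are sorted
mode = statistics.mode(scores)      # most frequently occurring value

print("mean:", mean, "median:", median, "mode:", mode)
```

Notice that the single high score of 85 pulls the mean above the median and the mode, which is one reason the mean deserves the closer look it gets here.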
When you see an average, it is natural to assume that the other numbers are arranged evenly around this “middle” number. The arrangement of the numbers is called a “distribution,” and assuming that they are spread symmetrically around the mean, with most values near the middle and fewer toward the extremes, amounts to assuming a “normal distribution.”
This is what a normal distribution looks like:
This one is based on hypothetical student scores with a mean of 52.4.
The normal distribution is sometimes called the bell curve, because as you can see, the bars resemble a bell – tall in the middle, then tapering on the sides.
But what if the distribution of your scores is not normal? Take a look at this chart:
This one still has a mean of 52.4, but as you can see the distribution of scores is much different, with a higher frequency of high scores. This one would be considered “negatively skewed” because the “tail” on the lower side is longer.
The one below, still with a mean of 52.4, is “positively skewed”: more scores fall in the lower half of the distribution, and the “tail” on the higher side is longer:
So, when looking at reported averages, it is important to ask questions about how the numbers are distributed. The three charts above represent very different sets of student scores, but if you are only looking at the average (mean), you would never see these differences.
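To make this concrete, here is a minimal sketch in Python. The three score lists are hypothetical (they are not the data behind the charts above), and the skewness helper is an illustrative function using one common definition: the third central moment divided by the 1.5 power of the variance. All three lists share the same mean, yet their skewness comes out zero, negative, and positive respectively.

```python
import statistics

def skewness(values):
    """Illustrative helper: population skewness (third central moment / variance ** 1.5)."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])  # variance
    m3 = statistics.fmean([(v - m) ** 3 for v in values])  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical score sets, each constructed to have a mean of exactly 50.
symmetric = [30, 40, 45, 50, 50, 55, 60, 70]
negatively_skewed = [5, 20, 50, 60, 62, 65, 68, 70]   # long tail on the low side
positively_skewed = [30, 32, 35, 38, 40, 50, 80, 95]  # long tail on the high side

for name, data in [("symmetric", symmetric),
                   ("negatively skewed", negatively_skewed),
                   ("positively skewed", positively_skewed)]:
    print(name, "mean:", statistics.fmean(data), "skewness:", round(skewness(data), 2))
```

Someone who saw only the three identical means would have no way of telling these sets apart.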
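Along with thinking about how the scores are distributed on a curve, it is also important to think about how far, on average, the scores sit from the reported average (mean). The usual measure of this is the standard deviation, which tells you how much variation there is in your numbers. (Strictly speaking, it is the square root of the average squared distance from the mean, so it weights large differences more heavily than a plain average of the distances would.)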
In the examples provided above, each with a mean of 52.4, the normal distribution has a standard deviation of 10.9, meaning a typical score differed from the mean by roughly 10.9 points. The negatively skewed distribution, on the other hand, has a standard deviation of 22.8, and the positively skewed distribution has a standard deviation of 21.9, meaning the scores in these two sets were spread more widely around the mean than the scores in the normal distribution.
Higher standard deviations indicate the average (mean) reported is less representative of any one number in the sample.
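For readers curious about the arithmetic, the sketch below computes a standard deviation by hand and compares it with a plain average of the distances from the mean. The scores are hypothetical, and the population formula (dividing by the number of scores) is an assumption; the article does not say which formula produced its figures.

```python
import math
import statistics

# Hypothetical scores, invented for this example only.
scores = [30, 40, 45, 50, 50, 55, 60, 70]
mean = statistics.fmean(scores)

# Plain average of the distances from the mean (mean absolute deviation).
mean_abs_dev = statistics.fmean([abs(s - mean) for s in scores])

# Standard deviation: square root of the average squared distance from the mean.
# Assumption: population formula (divide by n), which matches statistics.pstdev.
variance = statistics.fmean([(s - mean) ** 2 for s in scores])
std_dev = math.sqrt(variance)

print("mean absolute deviation:", round(mean_abs_dev, 2))
print("standard deviation:", round(std_dev, 2))
```

The two measures are close but not identical. Because the distances are squared before averaging, scores far from the mean count for more in the standard deviation.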
The third thing to keep in mind is the total range of scores. The range is the distance between the highest and lowest numbers in your sample. Again using our distributions, the scores run from 29 to 73 (a range of 44 points) for the normal distribution, from 9 to 83 (74 points) for the negatively skewed one, and from 21 to 95 (74 points) for the positively skewed one. In other words, our normal distribution covers a tighter range of scores than the other two.
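The width of a range is simply the highest score minus the lowest. A small sketch using the lowest and highest scores quoted above for the three distributions (the variable names are my own):

```python
# Lowest and highest scores as reported in the article for each distribution.
extremes = {
    "normal": (29, 73),
    "negatively skewed": (9, 83),
    "positively skewed": (21, 95),
}

# Range width = highest score minus lowest score.
ranges = {name: high - low for name, (low, high) in extremes.items()}

for name, width in ranges.items():
    print(f"{name}: {width} points")
```

With a full list of scores in hand, the same figure is max(scores) - min(scores).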
This is not necessarily the case for all normal distributions, however. Consider the chart below. This one is still normally distributed, and still has a mean of 52.4, but the scores run from 5 to 100 (a range of 95 points), meaning they are much more spread out than they are in the first normal distribution we looked at. And because the scores are more spread out, the standard deviation is also higher: 26.0 for this distribution compared to 10.9 for the other normal distribution.
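The same effect can be reproduced with two small, symmetric sets of scores. The lists below are hypothetical, the summarize helper is an illustrative function of my own, and the standard deviation uses the population formula as an assumption.

```python
import statistics

def summarize(values):
    """Illustrative helper: mean, range width, and population standard deviation."""
    return {
        "mean": statistics.fmean(values),
        "range": max(values) - min(values),
        "std_dev": statistics.pstdev(values),
    }

# Hypothetical symmetric score sets with the same mean but different spread.
narrow = [30, 40, 45, 50, 50, 55, 60, 70]
wide = [5, 20, 35, 50, 50, 65, 80, 95]

narrow_summary = summarize(narrow)
wide_summary = summarize(wide)

for name, summary in [("narrow", narrow_summary), ("wide", wide_summary)]:
    print(name,
          "mean:", summary["mean"],
          "range:", summary["range"],
          "std dev:", round(summary["std_dev"], 1))
```

Both sets report the same mean, but the wider set has both a larger range and a larger standard deviation, so its mean says less about any individual score.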
Though I have presented each of these three notions separately, the distribution, standard deviation, and range all interact, and all three need to be considered when looking at reported averages.
Charts and graphs versus tables
One final note, on the topic of charts and tables. It is easy to manipulate data sets to get a chart to look the way you want, as I have for this article. When you are presented with data in a chart or graph, it is always a good idea to see whether you can also get the data in tabular format. Charts and graphs can seem more intuitive, but they are usually created with a specific point in mind and can be misleading. For that reason, here is the data used for the four frequency charts.