Data Interpretation in L&D part III: Messing around with visuals

Disclaimer: This is NOT a guide! I’m sharing examples so you can recognize potentially misleading data visualizations, not so you can create them ;-)

This is part 3 on data interpretation in L&D. See here for part I and part II.

Let’s start with a test…

Question 1: What is the link between the cover image of this post and the content?

Question 2: Have a look at the data visual below. Which of the 4 visuals provides an honest and accurate story?

I’ll answer question 1 at the end. For now, let’s focus on the second question.

Chart 1 presents the total learning hours per month between January and April. Everyone seeing this chart will be wowed by the amazing growth in learning activity in April, which is almost 4 times the number of learning hours in March! There must have been an absolutely amazing learning campaign launched in that month to generate such growth!

Chart 2 is a simple comparison between training A and B in evaluation results and we can clearly see that training A scores better than training B.

Chart 3 is a little bit more complicated as it displays the number of training titles within a specific range of duration: how many titles in the learning catalogue have a duration under 15 minutes, how many between 15 and 30 minutes, how many between 30 and 60 minutes, etc. It is a somewhat concerning chart if you are running a micro learning strategy, as you can clearly see that the majority of training titles have a duration between 8 and 40 hours, which is rather long!

Chart 4 provides the results of a survey amongst employees asking them what technical skills they want to learn in 2023. Data Analysis with 65% of the votes is a clear winner, followed (at a considerable distance) by Artificial Intelligence.

And the answer is…

What if I tell you that all charts have been manipulated to tell a specific story? And that the correct answer to question 2 is none of them? Would you be surprised?

Many of us will be. And sadly we’re exposed to these manipulations a lot through media, the internet and even official government channels that all use data to support their version of a story. There is a reason that expressions like “if you torture the data long enough, it will confess to anything” (Ronald Coase) and “Lies, damned lies, and statistics” (Benjamin Disraeli) exist. And it’s a reason I am writing this series on data interpretation in L&D in the first place: so that you can recognize the tricks that are being used and take them into consideration when you read any article, white paper or position paper that uses data.

It’s very sad to see the many ‘mistakes’ in L&D (one must always wonder if these are honest mistakes or decisions made with a clear purpose…). Only recently I flagged an incorrect data visualization shared on LinkedIn by a learning provider who displayed ‘how people want to learn new skills’. At least one of the visuals was wrong as it displayed only a total of ~70% across the 4 given ways, leaving room for a possible fifth option that was not mentioned, but could have been the top contender!

Notice the middle bar chart on the left… If you add all the percentages using the scale below, you only come to ~70%. So where is the missing 30%? The bar chart in the bottom left also does not add up to 100%. What is going on? Funnily enough, the bar chart on the right adds up to way more than 100%. That should also never be the case…

So let’s look a bit closer at my examples and see what is wrong with each of them.

Chart 1: Manipulating the y-axis.

A very common trick to make a fairly small difference look huge is to manipulate the y-axis of a bar chart (more accurately, the value axis, as with horizontal bars it could be the x-axis that is manipulated!). In this example I’ve made the y-axis start at 3,000 learning hours rather than at zero. This causes the number of recorded learning hours in April to look like it’s almost 4x higher than March, while in reality it is ‘only’ a 37% increase if you read the labels correctly. Now a 37% increase for me is still pretty impressive! But it is far from the almost fourfold growth that the bars suggest.

The typical response when commenting on situations where the axes are manipulated (and trust me, you will recognize this a lot when you start to pay a bit more attention to all bar charts in newspapers, online media and TV) is that the chart is factually correct. It does not tell an untruth. And technically that is correct. It is just that very few people actually read labels in these cases; most interpret the data through the bars and the relative size of the bars alone. Even more, a well designed dashboard or data visual should enable correct data interpretation with a minimum of labels or even none at all!

The correct way of displaying the data from chart 1 is simply to let the y-axis start at zero!
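To get a feel for how much a shifted axis exaggerates a difference, you can compare the ratio of the bars as drawn with the ratio of the actual values. Below is a minimal sketch in Python. The March and April values are hypothetical numbers that I picked to resemble chart 1 (the real figures are not listed in this post), and the function names are made up for this illustration.

```python
def drawn_ratio(small: float, large: float, axis_start: float) -> float:
    """Ratio of the bar heights as drawn when the value axis starts at axis_start."""
    if small <= axis_start:
        raise ValueError("the smaller value must lie above the axis start")
    return (large - axis_start) / (small - axis_start)


def actual_growth_pct(small: float, large: float) -> float:
    """Real percentage growth from small to large."""
    return (large - small) / small * 100


# Hypothetical learning hours, NOT the data behind chart 1.
march_hours = 3450
april_hours = 4725

truncated = drawn_ratio(march_hours, april_hours, axis_start=3000)
honest = drawn_ratio(march_hours, april_hours, axis_start=0)
growth = actual_growth_pct(march_hours, april_hours)

print(f"Bar ratio with the axis starting at 3000: {truncated:.1f}x")
print(f"Bar ratio with the axis starting at 0:    {honest:.2f}x")
print(f"Actual growth: {growth:.0f}%")
```

With the axis starting at zero, the ratio of the bars equals the ratio of the values, which is exactly why that is the honest choice for a bar chart.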

Chart 2: Using only a selection of the data

Chart 2 is an example I see a lot in L&D. We all want to display evaluation scores for our training to show that we do a good job. I always recommend being very cautious with this as there is a major pitfall: in many cases the response rate of evaluations (unless you make completing them mandatory) is very low, in the range of a few % of all people who completed the training, maybe up to 20%. You cannot draw the conclusion that a training was well received by just looking at the average evaluation scores without also looking at the response rate. If the response rate is low, the reliability of the average score is low as well due to the fact that you are missing most of the data (=responses). This is also related to the ‘survivorship bias’ I referred to in part I of this series: always take any missing data into consideration.

In my opinion (and this is an assumption, not a fact!) people who are unfavorable towards a training (or anything for that matter) are less likely to comment. If you agree with that assumption and have data to back it up, a low response rate would actually be interpreted as a low evaluation score.

Using non-representative data

While the above is my assumption and should not be blindly used in any analysis without giving it a good think, concluding that your training was amazing based on just a few % of respondents is for sure NOT the right conclusion. People sometimes bring the perspective to the table that even 5% could be considered a sufficiently large sample size for larger populations, allowing the data to represent all participants. The sample size of 5% is correct (although I prefer 10% if possible). However, when we talk about samples representing the entire population, we really need a sample that is not only sufficiently large, but also actually represents the entire population.
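To illustrate why a sample can be large enough and still not representative, here is a small sketch in Python. Everything in it is hypothetical: the population of opinions is made up, and the response shares simply encode my assumption (again: an assumption, not a proven fact) that unhappy participants are less likely to fill in an evaluation.

```python
# Hypothetical population: how many participants would give each rating (1-5)
# if everybody answered. These numbers are made up for illustration.
population = {1: 100, 2: 150, 3: 250, 4: 300, 5: 200}

# Assumed share of each group that actually fills in the evaluation.
# The assumption (not a proven fact) is that unhappy participants respond less.
response_share = {1: 0.02, 2: 0.04, 3: 0.06, 4: 0.15, 5: 0.25}


def mean_rating(counts: dict[int, float]) -> float:
    """Average rating for a mapping of rating -> number of people."""
    total = sum(counts.values())
    return sum(rating * n for rating, n in counts.items()) / total


# Expected number of respondents per rating under the assumed response shares.
respondents = {r: population[r] * response_share[r] for r in population}

true_mean = mean_rating(population)
observed_mean = mean_rating(respondents)
response_rate = sum(respondents.values()) / sum(population.values())

print(f"Response rate:            {response_rate:.1%}")
print(f"Average of all opinions:  {true_mean:.2f}")
print(f"Average of the responses: {observed_mean:.2f}")
```

In this made-up case the response rate is above 10%, yet the average of the responses is clearly higher than the average opinion of all participants. The size of the sample is not the problem here; who ends up in it is.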

My hypothesis that people with a negative attitude towards a training are less likely to rate it or complete an evaluation does not come out of thin air. It is for sure not scientifically proven as far as I know, but there is evidence that suggests I might be right.

If you are not into statistics at all, you are more than welcome to skip this part and continue to read after the next image. If you still remember statistics 101… you might want to read on. A while back we did some statistical analysis on tens of thousands of course rating data points for a customer who wanted to assess the quality of their learning catalog. A course rating on a scale of 1-5 is the easiest way of getting some sort of evaluation data and was therefore thought to be useful. The expected distribution of the course rating data with a representative sample of the full audience would be in the shape of a bell curve, a normal distribution as it’s called in statistics. See also the image below. A bell curve means that the numbers of more positive and more negative ratings are more or less equal: around the same number of people rate the training above the average as rate it below the average. This has nothing to do with the average rating itself and whether it’s high or low. For both courses with a high rating and courses with a lower rating you would normally expect this bell curve.

However, the data we analyzed (and this was data from all over the world, across a multitude of trainings and training types) instead showed a distribution that was heavily skewed towards the higher end of the scale. In some cases we even found only scores of 4 and 5 in specific data sets. In statistics, skewed data makes it more complex to perform all kinds of calculations on it, and when there is too much skewness in the data, you cannot simply use it as is to calculate the average as you would normally do.

So, always be very careful with any data analysis where there is only a limited supply of data (for whatever reason) and where the distribution of the data shows a high level of skew. It means for sure that you cannot use that data in the same way as you would use more balanced and distributed data. It possibly also suggests that you are missing data.
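If you want to check how lopsided a set of ratings is, one common measure is the skewness: it is around zero for a symmetric distribution and negative when the ratings pile up at the high end with a tail towards the low ratings. The sketch below calculates it in plain Python using the simple population formula (third central moment divided by the standard deviation cubed). Both sets of rating counts are hypothetical and are not the customer data described above.

```python
import math


def skewness(counts: dict[int, int]) -> float:
    """Skewness (third standardized moment) of ratings given as rating -> count."""
    n = sum(counts.values())
    mean = sum(r * c for r, c in counts.items()) / n
    m2 = sum(c * (r - mean) ** 2 for r, c in counts.items()) / n  # variance
    m3 = sum(c * (r - mean) ** 3 for r, c in counts.items()) / n  # third central moment
    if m2 == 0:
        raise ValueError("all ratings are identical, skewness is undefined")
    return m3 / math.pow(m2, 1.5)


# Hypothetical rating counts (rating -> number of ratings), for illustration only.
balanced = {1: 5, 2: 20, 3: 50, 4: 20, 5: 5}
top_heavy = {1: 1, 2: 2, 3: 7, 4: 40, 5: 50}

print(f"Balanced ratings:  skewness {skewness(balanced):+.2f}")
print(f"Top-heavy ratings: skewness {skewness(top_heavy):+.2f}")
```

A clearly negative value for your own rating data is a signal to be careful with the plain average, and possibly a hint that the more critical participants are missing from the data.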

[source wikipedia] A bell curve or normal distribution like the curve in the middle is what you would expect when analyzing a large sample of course ratings. Typically, however, the data is bunched up at the high end of the scale (right image; this is called a negative skew because the long tail points towards the low ratings). If this is the case, you cannot apply the same statistical calculations like an average rating without additional (complex) analysis. It is also an indication that you are possibly missing data, as people with a more negative opinion of the training did not rate it.

For the non-statisticians…

Without all the statistical justification and in case you skipped the above part, it’s sufficient to remember that any data that only represents a small fraction of the total population should always be evaluated and interpreted with care!

Unlike the y-axis manipulation (which has a simple solution: always make sure the y-axis starts at zero), there is, I think, no easy way to correctly display partial data like evaluation data. You could for example create a threshold where you leave out all training that has less than a 25% response rate. But that could mean you end up with very few data points. And I would argue that evaluation data is always informative and useful even when response rates are small. What I always recommend is to add, as a minimum, the response rate to the dashboard, so that anybody who views it can see what the response rate is and take that into consideration when exploring the data.
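As a rough idea of what ‘adding the response rate’ could look like in practice, here is a minimal Python sketch that builds a label showing the score together with the response rate, and adds a caution when the rate falls below a threshold. The class, the function, the training titles and all numbers are hypothetical; the 25% threshold is just the example value mentioned above, not a statistical rule.

```python
from dataclasses import dataclass


@dataclass
class TrainingEvaluation:
    title: str
    completions: int      # people who completed the training
    responses: int        # people who filled in the evaluation
    average_score: float  # average rating on a 1-5 scale

    @property
    def response_rate(self) -> float:
        return self.responses / self.completions if self.completions else 0.0


def dashboard_label(item: TrainingEvaluation, caution_below: float = 0.25) -> str:
    """Text label that always shows the response rate next to the score."""
    label = (f"{item.title}: {item.average_score:.1f} / 5 "
             f"({item.responses} of {item.completions} responded, "
             f"{item.response_rate:.0%})")
    if item.response_rate < caution_below:
        label += " [low response rate, interpret with care]"
    return label


# Hypothetical trainings and numbers, for illustration only.
evaluations = [
    TrainingEvaluation("Training X", completions=400, responses=28, average_score=4.6),
    TrainingEvaluation("Training Y", completions=120, responses=66, average_score=4.1),
]

for item in evaluations:
    print(dashboard_label(item))
```

Note that nothing is filtered out here: the low-response training stays visible, but the viewer gets the context needed to interpret its score.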

Chart 3: unequal comparisons

Chart 3 displays how many training titles have a duration that falls within a specified range. The ranges go from less than 15 minutes to more than 40 hours. The creation of the ranges is called binning. Binning is used when you have a long list of discrete values that would otherwise clog up your visual. Imagine we would display all available different values of ‘learning hours’ that we have in the system. We might look at 30 or more variations and that would make the chart unreadable or at the least very ugly like shown below.

The same data as chart 3 above, but without binning

The use of binning itself is therefore a common and useful practice. However, in chart 3 shown at question 2 at the start of this article, the binning has been somewhat arbitrary. This is something I see a lot in L&D. We tend to break up the hour into a few (typically 2 or 4) smaller parts, and as the training duration increases, we make the ‘bins’ much longer. Consider that the second-to-last bin of 8-40 hours covers a whopping 32 hours, while the first bin only covers 15 minutes: that is 128 times smaller! And like with the y-axis manipulation from chart 1, chart 3 is technically not incorrect. However, as said before, many if not most people don’t really look at the x-axis values. And when interpreting the data, most people will not take into consideration the fact that some bins are 128 times larger than others. They will see a massive amount of training that has a very long duration and only a very small number that has a short duration, while in reality this is not the case (as you can already see from the above image).
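The effect of unequal bins becomes visible as soon as you divide the number of titles in a bin by the width of that bin. The Python sketch below does that. Please note the assumptions: only the first three bins and the 8-40 hours bin come from chart 3 as described in this post; the bins in between and all the counts are hypothetical.

```python
# Each bin: (label, start in minutes, end in minutes, number of titles).
# Only the first three bins and the "8-40 hrs" bin are named in the text;
# the bins in between and ALL counts are hypothetical.
bins = [
    ("<15 min", 0, 15, 40),
    ("15-30 min", 15, 30, 55),
    ("30-60 min", 30, 60, 70),
    ("1-4 hrs", 60, 240, 120),
    ("4-8 hrs", 240, 480, 130),
    ("8-40 hrs", 480, 2400, 160),
]

narrowest = min(end - start for _, start, end, _ in bins)
widest = max(end - start for _, start, end, _ in bins)
print(f"The widest bin is {widest / narrowest:.0f} times wider than the narrowest")

# Titles per hour of bin width: a fairer basis for comparing unequal bins.
density = {
    label: titles / ((end - start) / 60)
    for label, start, end, titles in bins
}

for label, _, _, titles in bins:
    print(f"{label:>10}: {titles:>4} titles, {density[label]:6.1f} per hour of bin width")
```

In this made-up catalogue the 8-40 hours bin has the tallest bar, but per hour of duration it is by far the emptiest one. The bar height mostly reflects how wide the bin is.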

The solution is to always comply with the rule that your bins must be of equal size, even if that means that you end up with a lot of bins, or with fairly large bins. In the image below that shows how charts 1 to 4 should look, I used bins of 1 hour. This does limit our ability to look at training with a duration of less than 1 hour. However, I have solved that by providing a tooltip that specifically looks at all titles with a duration of less than an hour when you ‘hover over’ the “<1 hr” bar in the chart. This will provide immediate additional information on that more detailed distribution. However, with 1 mistake… can you spot it?

Chart 4: Percentages not adding up

Chart 4 is possibly my favorite example. It’s also the one that I see almost on a daily basis, so it looks like it is not just me who loves messing around with percentages! Unfortunately this means that many people are either not taking the right data visualization rules into account, or, even worse, are purposefully misleading. There are so many variations on ‘not getting percentages right’ that I sometimes feel I could turn it into a movie. Not calling it “A million ways to die in the west” (a great feel good movie by the way!), but “a million ways to get your percentages wrong. By accident or on purpose.” Or something like that…

A root cause of having over 100% in your data visual is the use of questions that allow multiple answers. I used the example of the question: “what tech skill do you want to develop in 2023?” and gave people a list of options to choose from. You could select one, you could select all of them. This method of data collection can be rather tricky and I would not recommend using it for any other purpose than making it into an example. Because you can select multiple answers, you basically open the doors for misleading visuals. And the most common misleading visual is displaying the % of people who provided a given answer.

Arguably the most famous example of misuse of percentages is Colgate. They used to claim in commercials that “80% of dentists recommend Colgate”. That lasted until the Advertising Standards Authority in the UK decided in 2007 that Colgate was breaching the “Code of Non-broadcast Advertising, Sales Promotion and Direct Marketing”, reasoning that the advertisement suggested that “80% of dentists recommended Colgate over and above other brands and the remaining 20% of dentists would recommend different brands rather than Colgate”. In reality, each dentist could mention multiple brands in their answer to an annual survey, and Colgate was not even the brand that was mentioned the most!

These types of datasets can be used all too easily to artificially enlarge or reduce the gap between options, especially and most notoriously when used in combination with y-axis manipulation, or (please don’t!) in combination with a pie chart!

In my example, it looks like data analytics is almost twice as popular as AI, the next skill in line. In reality this is not the case. A person could for example have indicated data analytics as the last and therefore least important skill on his or her list of 4 or 5 skills, while another person could have voted for data analytics only. It’s a fact that not all respondents will provide the exact same number of responses. As said earlier, some might only select one, some might select all.

Even worse, because we cannot add the numbers to 100%, we have no way of telling if data was omitted. Maybe there was a skill that scored higher than the 65% of data analytics, but I decided to simply leave it out as it did not suit my narrative, like Colgate did. In that sense, the example of Future Learn shared earlier in the article is a rare example of how not to use percentages, as it contains graphs that add up to less than 100% (on the left) as well as more than 100% (on the right).

The one and only reason to use percentages is to provide relative data and insights (as opposed to absolute data that shows the number): the portion of a variable against ‘the whole’, i.e. the 100%. If you do not know what the 100% is, you lose that vital piece of information that makes reporting in % worthwhile!

The way to handle percentages is simply to ensure they always add up to 100%! In this case of questions with multiple answers, you can do that by taking the sum of all answers as the 100% and simply establishing the % of all answers (not all people!) for each individual option. If you display the data like that, the skill ‘data analytics’ will (naturally) still come out on top, but the difference with the next skill (AI) is somewhat smaller, and (most importantly) you can correctly display a grouped category “other” that contains all skills that did not make the cut, so that everything adds up to 100%.
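The difference between ‘% of people’ and ‘% of all answers’ is easy to show with a few lines of Python. The survey responses below are hypothetical (five respondents, made-up skills), and the choice to show only the top two skills individually is just for the sake of the example.

```python
from collections import Counter

# Hypothetical multi-select survey: each inner list holds one respondent's picks.
responses = [
    ["Data Analytics", "AI", "Cloud"],
    ["Data Analytics"],
    ["AI", "Cybersecurity"],
    ["Data Analytics", "Cloud", "Cybersecurity", "UX Design"],
    ["Data Analytics", "AI"],
]

votes = Counter(skill for picks in responses for skill in picks)
total_people = len(responses)
total_votes = sum(votes.values())

# Share of respondents per skill: these can add up to far more than 100%.
share_of_people = {s: n / total_people * 100 for s, n in votes.items()}

# Share of all answers per skill: adds up to 100% by construction.
top_n = 2  # skills shown individually; everything else goes into "Other"
top = votes.most_common(top_n)
share_of_votes = {s: n / total_votes * 100 for s, n in top}
other_votes = total_votes - sum(n for _, n in top)
share_of_votes["Other"] = other_votes / total_votes * 100

print(f"Sum of 'share of people': {sum(share_of_people.values()):.0f}%")
for skill, pct in share_of_votes.items():
    print(f"{skill}: {pct:.0f}% of all answers")
```

Because every answer lands either in a named skill or in “Other”, the second set of percentages always covers the whole, and nobody has to wonder what was left out.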

Better even would be to change the question and ask people to name their top three ranked from 1 (=most important), to 3 (=least important). That way you have all the data you need to do all the analytics you want.

Correcting the 4 most common misleading data visualization techniques

The examples shared are demonstrations of what I believe are the 4 most common examples of misleading data visualization, or honest mistakes made by people trying to convey a story using data visualization.

Shifting the value axis (mostly the y-axis)
Not mentioning (small) sample sizes with partial datasets
Not having equal bins
Not having percentages that add to a 100%

So if I were to recreate the dashboard page from the introduction, I would produce something like this:

With these visuals I would be able to derive the following (more accurate) insights:

Chart 1: We saw a very healthy increase of ~25% in consumed learning hours in the month of April. Still pretty good and impressive! But nowhere near a triple growth.

Chart 2: Training A received a higher average evaluation score compared to Training B. However, as the response rate was only 31%, we cannot conclude that it’s a better received training.

Note: I actually constructed the dataset in such a way that all people who rated Training A also rated Training B, but with a higher average rating! That would suggest that the evaluation score of Training A with a 100% response rate would be lower. This could be a possible follow-up analysis as part of a more in-depth investigation…

Chart 3: Training titles with a duration of less than 1 hour and between 1-2 hours are well represented in our catalogue, as are training titles with a duration of 3-4 hours (half a day) and 6-7 hours (a full day excluding breaks). Any conclusion drawn from these numbers depends on what your strategy is…

Chart 4: Data Analysis is still the most popular ‘skill to work on in 2023’ followed by AI. However, there is an even bigger ‘bucket’ of “other” skills that makes the #1 Data Analysis sit in a completely different context. It’s way less dominating than before and even leaves room for a skill that could have received more than the 19% that ‘data analysis’ received.

Most likely it contains a lot of different other skills that each received less than 6% of the votes. And that is perfectly fine. Skills taxonomies are after all rather huge these days. But by including the ‘other’ category, the percentages add up to 100%. This tells a story that is much clearer and more accurate.

More importantly, it provides a story that is more credible: By making sure the percentages add up to 100% you show that nothing is left out.

On the slice of pie…

You must be wondering by now what the slice of pie has to do with all of this.

Well…

There’s an infamous type of data visualization called the pie chart. Pie charts are among the most misused and misleading data visualization techniques out there.

To quote Bernard Marr, a world leading business, tech and data leader: “The reason we put charts and graphs in our reports and presentations is to present information in a way that’s easier to understand. In general, charts and graphs should make data easier to understand, make it easier to compare different data sets, and do it all without increasing the complexity of what’s being presented. Pie charts stink at that.” (bernardmarr.com)

Other experts like Stephen Few advise us to “save the pies for dessert”.

You only have to google “why pie charts are bad” to see lists and lists of examples. One of the most famous is an illustration used by Fox in the 2012 elections, shown below. It ‘nicely’ shows the major objection to using pie charts: surface areas are distorted depending on how you position the chart, making it hard if not impossible to estimate and compare the sizes of the segments. You are forced to add the numbers, otherwise all of us will think the 60% green segment for Romney is far bigger than the 70% red segment for Palin. Few rightfully explains that any data visualization where you are basically forced to add numbers and labels to make people understand what they are seeing completely misses the point of being a self-explanatory visual and can be expressed much better in a simple table. Fox ‘kindly’ also included a second often-made mistake, which is the percentages not adding up to 100% (sounds familiar, right?).

So for many, the pie chart is the ultimate manifestation of how data is being misused to tell a (false) story. Unfortunately, many people still insist on using pie charts because they are so easy to create. My advice: when you see a pie chart, be suspicious…

Conclusion

Pie charts could be the popular choice when people by mistake or on purpose tell a misleading story with data. But many other techniques exist and I hope I was able to explain how data visualizations can easily be misread. And more important, how you can make sure that all visuals that you design and create tell a true story!

It is sad to see that respected media and respected brands (also in L&D) make these mistakes. And whether they were honest mistakes or not, in a world where data is more important than ever before, it becomes a crucial skill to recognize flawed data visualizations. At best a badly designed visual distracts you from the key message. At worst it’s designed to make you think something different from what a correct analysis would show.

While I was finishing this article, I stumbled upon the book “Calling Bullshit” from University of Washington professors Carl Bergstrom and Jevin West. They also see the challenge of a data-driven “world awash in bullshit“:

Politicians are unconstrained by facts. Science is conducted by press release. Higher education rewards bullshit over analytic thought. Startup culture elevates bullshit to high art. Advertisers wink conspiratorially and invite us to join them in seeing through all the bullshit — and take advantage of our lowered guard to bombard us with bullshit of the second order. The majority of administrative activity, whether in private business or the public sphere, seems to be little more than a sophisticated exercise in the combinatorial reassembly of bullshit.
We’re sick of it. It’s time to do something, and as educators, one constructive thing we know how to do is to teach people. So, the aim of this course is to help students navigate the bullshit-rich modern environment by identifying bullshit, seeing through it, and combating it with effective analysis and argument.
Calling Bullshit: The Art of Skepticism in a Data-Driven World (2020; Carl T. Bergstrom & Jevin D. West)

While the book does not cover any examples from L&D or education, I was glad to notice that some of the examples cover the exact same situations that I refer to in this blogpost and earlier parts of the series on (correct) data interpretation in L&D.

I’ve made a pledge to start calling out ‘bullshit’ whenever I come across it online (especially LinkedIn). Not to name and shame, that is not who I am, but to educate. If we want others, especially the c-suite, to take L&D seriously we must eliminate bullshit in our industry. And that is no easy thing to do as, also called out by Bergstrom and West, “creating bullshit is easy and fast, correcting bullshit takes a lot of time and effort”. But for me it’s a cause worthy of that effort.