“Data science doesn’t make any sense without data” – Cassie Kozyrkov, Head of Decision Intelligence, Google
In the years I have been working in learning analytics, I have continued to be amazed by how often I found myself in situations where people wanted to do learning analytics on elements they had no data on.
The first step of learning analytics is setting your business and learning objectives (please make them SMART). The second step is making sure you can get the right data to prove your program is working and reaching those objectives. When you do not have the right data, you cannot do the required analytics. It’s that simple. So here are a few pointers to consider in the early stages of your program design that can help ensure that, once the program is rolled out, you can get your hands on the data you need. A bit of a spoiler alert: as an L&D professional, you are much more a part of the solution than you might realize!
Your business and learning objectives define what data you need for analytics.
- If you want to analyze skills, make sure you clearly identify the skills you want to build and link these (ideally with proficiency level) to your learning programs and experiences.
- If you want to report on learning hours, the more accurately you define the learning hours for your programs and the more learning activity you capture, the more data you will have and the more accurate your analytics will be.
- If you want to analyze your audience to understand their needs, the more data you have on them, the better your understanding of those needs will be.
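The idea that objectives define data needs can be sketched in a few lines of Python. This is a hypothetical illustration only: the objective names and field names below are made up for the example, not taken from any real system.

```python
# Hypothetical mapping from a learning objective to the data fields it requires.
# Objective and field names are illustrative, not from any specific platform.
data_requirements = {
    "analyze skills": ["skill", "proficiency_level", "program_id"],
    "report learning hours": ["duration_minutes", "activity_records"],
    "understand audience needs": ["role", "location", "prior_training"],
}

def can_analyze(objective, available_fields):
    """An objective is only analyzable if every required field is captured."""
    return set(data_requirements[objective]) <= set(available_fields)

# Missing proficiency levels means the skills objective cannot be analyzed yet.
print(can_analyze("analyze skills", ["skill", "program_id"]))  # False
```

The point of the sketch: before rolling out a program, you can walk each objective through a check like this and find the data gaps on paper, long before a data analyst finds them the hard way.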
One of the best examples of ‘critical’ data not really being available is the still widespread practice of paper-based evaluation forms. It is great to have evaluation forms. Even better is to make them digital. But the best thing you can do is to make evaluations digital and record the outcomes in a single location, in a consistent way, so you can collate the results across programs and compare them to understand what works and what doesn’t.
Another example that is very hip and trendy is analytics on skills. A key question we ask ourselves a lot these days is what knowledge and skills our employees should have and what they are acquiring through all the available learning experiences. What keeps surprising me is that most L&D organizations do not consistently link their learning programs and experiences with skills and proficiency levels. Assuming you know what skills are handled in the program and at what level, it is a fairly simple matter of recording that information as data with your program in whatever learning platform you have (LMS or LXP). If we do not record this, we will not have much data to do the analysis with. I personally think this is one of the easier actions we as learning professionals can undertake ourselves to make sure we start capturing the data we need for skills analytics: link each learning experience (ideally as granular as possible) with a skill and proficiency level.
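To make the skills-linking idea concrete, here is a minimal sketch of what that metadata could look like and how easy it becomes to spot the gaps. The catalog structure, IDs and proficiency scale are assumptions for illustration; your LMS or LXP will have its own fields.

```python
# Hypothetical catalog: each learning experience linked to a skill and a
# proficiency level. Field names and IDs are illustrative only.
catalog = [
    {"id": "LX-001", "title": "Intro to SQL", "skill": "SQL", "proficiency": "Beginner"},
    {"id": "LX-002", "title": "Advanced Data Modeling", "skill": "Data Modeling", "proficiency": "Advanced"},
    {"id": "LX-003", "title": "Stakeholder Workshop", "skill": None, "proficiency": None},
]

def missing_skill_links(catalog):
    """Return the IDs of experiences that cannot be used for skills analytics."""
    return [item["id"] for item in catalog if not item["skill"]]

print(missing_skill_links(catalog))  # ['LX-003']
```

Once every experience carries this small amount of structured metadata, skills analytics stops being a data collection project and becomes a straightforward query.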
If there is too much data, no worries: good data analysts can isolate the data they need. But if you are not capturing the data necessary to do the analytics, you should not expect data scientists to give you any actionable insights, because they cannot. You will first have to figure out how to capture the data. Maybe update your ways of working, create or update processes, or get new tools? Or simply enable your existing tools to capture more data. This brings us to the next pointer: ownership.
Who is responsible for capturing and recording the data that is relevant for learning analytics? This is not necessarily an easy question to answer. People data (name, function, location, etc.) is often owned by HR, or even better by the team lead/manager or the individual employees. Outdated people data can have a great impact on analytics. I’ve had plenty of examples where leaders came to me telling me the learning dashboard was wrong because it contained people who were no longer working for the company. In all these cases it turned out that the core HR system still listed these people as employees. So it’s worthwhile agreeing on who should actually take the action to update these records. Business performance data is something we have no control over, so we usually take what we get. I always recommend trying (as early as possible) to consult and support the business as much as possible in how to capture the right data for performance analysis and monitoring; naturally, only when they are not (yet) doing this themselves. By considering the data aspect of your program early on, you have much better assurance that once you start the program, you will also start to generate the right learning and business performance data.
There is, however, a significant blind spot within learning and development that I encounter a lot. We are responsible for making sure we capture the right data on our learning programs; for designing our programs in such a way that they generate the data we need for analytics; for accurate and high-quality titles, objectives, audiences, levels, skills, topics, time investments, types and categories of learning. I’ve developed several data strategies for learning at large organizations, and the question of ownership has always been a challenging one. I am not sure why. We want to produce high-quality, high-impact learning experiences, right? So why don’t we provide equally high-quality and high-impact metadata that accurately and understandably describes the experience?
Technically, text in a Word file (a learning needs analysis, for example) is data. The same goes for Excel lists on your local drive containing the results of course evaluations. However, it will be hard to extract and process this data, or at least it will take a lot of time to collect and bring everything together. So rather than keeping data ‘locked’ in local files on local computers or ‘hidden’ in text, you should consider creating a centralized infrastructure to capture the data you need for your analysis. Your learning tools are one option, but also think of applications like Teams/SharePoint or ServiceNow. By bringing all that data together in a single location, in such a way that it can be easily structured, aggregated, analyzed and compared, you will all of a sudden be able to perform data analysis that you could never have done before!
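A small sketch of what centralization buys you. Assume two evaluation exports that used to live in separate local spreadsheets; here they are represented as inline CSV strings for illustration, and the column names are made up. Once combined into one structured dataset, the cross-program comparison the paragraph describes becomes trivial.

```python
import csv
import io

# Two hypothetical evaluation exports that previously lived in local files.
file_a = "program,score\nLeadership 101,4.2\nLeadership 101,3.8\n"
file_b = "program,score\nData Basics,4.6\nData Basics,4.9\n"

def combine(*raw_files):
    """Merge evaluation exports with a shared layout into one list of rows."""
    rows = []
    for raw in raw_files:
        rows.extend(csv.DictReader(io.StringIO(raw)))
    return rows

combined = combine(file_a, file_b)

# With everything in one place, comparing programs is a few lines of code.
scores = {}
for row in combined:
    scores.setdefault(row["program"], []).append(float(row["score"]))
summary = {program: round(sum(v) / len(v), 2) for program, v in scores.items()}
print(summary)  # {'Leadership 101': 4.0, 'Data Basics': 4.75}
```

The same aggregation is effectively impossible while each program's results sit in a differently formatted file on someone's laptop.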
A special mention must be given here to all your 3rd-party content providers. We have so many to choose from these days, but I almost never see clear agreements on data exchange as part of contracts. By that I mean clear agreements on (1) what data will be made available to you, (2) at what level of granularity, (3) in what format and (4) at what frequency. Sure, they all provide reports themselves, but mostly only aggregated data at, for example, country or topic level. This level of reporting is interesting for the service itself, but of very little use once you want to provide an overview across all internal and external training providers. To do this kind of analytics, you must have access to data at the level of each individual and each activity, ideally in a structure that allows easy integration with your other learning data. If you do not make contractual agreements on this, you could end up in a situation where you do not have access to the right data, and this will prevent you from doing analytics.
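What an individual-and-activity-level data feed could look like can be pinned down as a simple record schema. The field names below are hypothetical, not from any real provider; the point is that a contract can name a concrete set of agreed fields, and incoming records can then be validated against it automatically.

```python
# Hypothetical agreed schema for a 3rd-party provider feed: one record per
# learner per activity. Field names are illustrative only.
required_fields = {
    "learner_id", "activity_id", "activity_title",
    "completion_status", "completed_at", "duration_minutes",
}

def missing_fields(record):
    """Return the agreed fields that are absent from one incoming record."""
    return required_fields - record.keys()

sample = {
    "learner_id": "E12345",
    "activity_id": "EXT-0042",
    "activity_title": "Negotiation Fundamentals",
    "completion_status": "completed",
    "completed_at": "2023-05-04",
    "duration_minutes": 45,
}
print(missing_fields(sample))  # set() -> record matches the agreed schema
```

Granularity, format and frequency can be agreed in the same contractual breath: one row per learner per activity, delivered as CSV or JSON, on a weekly or monthly schedule.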
4. Quality, Quality, Quality
Did I mention quality? IBM estimates the cost of poor data quality to be 3.1 trillion US$ per year in the US alone! These costs have several components:
- Poor data quality increases the risk of poor outcomes in decision making and automation.
- Poor data quality pretty much renders your Machine Learning and other AI tools useless. ML/AI runs entirely on data. If the data is bad, outcomes driven by ML/AI will be bad. Think about ML-based learning recommendations: the quality of these recommendations is only partially determined by the quality of the algorithms. Even the best algorithm will not work well with poor data.
- Poor data quality slows down decision making. Poor data will lead to poor statistics and analytics, and poor statistics and analytics can easily be recognized by leaders who then start to distrust the data. Distrust in data leads to more discussions, more ‘second opinions’ and more (manual) checks and re-checks.
- Poor data quality requires data scientists to spend up to 80% of their time cleaning the data. Learning analytics experts are very scarce; it’s a new skillset that we need to build. Do you really want these few experts (provided you can find them) to waste their expertise cleaning up poor-quality data that is often the result of a lack of governance, processes and discipline? You even run the risk that they leave the company or the team for an opportunity where they can do more real analytics!
- Poor data quality reflects very badly on you with your customers. Would you buy online from a webshop that has procurement codes in the product titles? That is inconsistent in the currencies of its prices? That has no clear filters that allow you to select specific product brands or categories? I for sure would not, and my guess is that you would not either, unless it’s the only webshop that has a product you really, really want. So why, I wonder, do we structurally do such a poor job of describing our learning programs? Why do we use titles that do not make sense? Why do we record training durations in different formats? Why do we not link our training to a clear and user-friendly taxonomy of topics? No wonder employees get lost and turn their backs on your learning platforms.
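The catalog problems in the list above (procurement codes in titles, inconsistent duration formats, missing topics) lend themselves to automated checks. Here is a minimal sketch; the three rules and field names are assumptions chosen to mirror the examples, not a standard.

```python
import re

# Hypothetical quality rules for one catalog record: durations in whole
# minutes, no procurement-style codes in titles, a topic always attached.
def quality_issues(record):
    """Return a human-readable list of quality problems in one record."""
    issues = []
    if not re.fullmatch(r"\d+", str(record.get("duration_minutes", ""))):
        issues.append("duration not recorded in whole minutes")
    if re.search(r"\b[A-Z]{2,}-\d{3,}\b", record.get("title", "")):
        issues.append("procurement-style code found in title")
    if not record.get("topic"):
        issues.append("no topic from the taxonomy attached")
    return issues

bad = {"title": "PO-12345 Excel Basics", "duration_minutes": "1.5h", "topic": ""}
print(quality_issues(bad))  # all three rules fire on this record
```

Run across a whole catalog, a handful of rules like this turns data quality from an opinion into a number you can track over time.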
Two final things to consider on data quality.
The first is that data quality is not only the result of correctly and completely describing (and entering) training programs in our learning tools, but also of how accurately we execute the processes related to assigning training, registering participants and managing completions. LMS and LXP data is surprisingly useful for evaluating the quality of your core learning processes! You can use active data quality management not just to spot data entry errors, but also to identify process errors and then fix them.
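One simple way to use LMS data for process checks is to cross-reference records that should always appear in pairs. The example below is a hypothetical sketch with made-up IDs: a completion with no matching registration points at a broken step somewhere in the process, not just a typo.

```python
# Hypothetical LMS extracts: (employee_id, course_id) pairs.
registrations = {("E001", "C10"), ("E002", "C10")}
completions = {("E002", "C10"), ("E003", "C10")}

# A completion without a registration signals a process error, not a typo:
# someone got through the course outside the official registration flow.
unregistered_completions = completions - registrations
print(unregistered_completions)  # {('E003', 'C10')}

# The reverse gap is useful too: registered but never completed.
open_registrations = registrations - completions
print(open_registrations)  # {('E001', 'C10')}
```

Both sets are worth reviewing regularly; each non-empty result is either a record to correct or a process step to fix.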
Secondly, I already mentioned that the way we design our learning programs has a huge impact on the data we are able to collect and use for analytics. I refer to this as data-driven learning design. You can also use data to test and monitor the quality of your programs. SLT Consulting is working with an innovative start-up that is developing a script you can use to expand the data extracted from SCORM modules. This script generates additional data around, for example, interactions, which can help you see where people drop off, where they get stuck and which questions are most often answered incorrectly.
With all of the considerations above, it is worthwhile to bring data closer to the heart of your learning organization and strategy. It even makes sense to build a data strategy in which all of the above can be formalized and which helps you secure the investment needed for data management and analytics capabilities, so you can really start doing amazing things!