Blog post

WHY DATA SCIENCE NEEDS BOTH MACHINE LEARNING AND CAUSAL DATA ANALYSIS

Machine learning answers the question of what a business target (sales, number of customers, customer churn) is going to be, based on past data on inputs and business targets. Machine learning algorithms are very good at finding associations between inputs and business targets, and they give us precise and complex functions that map inputs to targets. However, these prediction questions are only a small part of business decision making.

A large part of business decision making consists of what-if questions. What if we increase one of the inputs, e.g. spend more on marketing: are we going to improve important business targets, such as sales from existing customers or the number of new customers? For those questions, machine learning alone is typically not the right answer, because an association between an input and a target does not tell us what happens to the target when we actively change the input.

The following illustration shows why data science needs both machine learning and causal data analysis. Your goal is to create a personalized advertisement algorithm to better target your customers. You begin by collecting past data about product sales, marketing efforts and your customer base. Based on this data you train a machine learning algorithm that sends personalized advertisements to each customer in order to maximize sales. Once development is finished, you show your results to your business colleagues. They are somewhat skeptical and challenge you to show that your approach outperforms their business rules.

Causal data analysis offers a way to test the two approaches against each other. You randomly assign customers either to a treatment group, which receives the personalized advertisements, or to a control group, which is targeted according to the old business rules. Because the assignment is random, the two groups are comparable on average, so a difference in how their sales develop can be attributed to the algorithm. After comparing the sales of the two groups before and after introducing the new personalization algorithm in the treatment group, you can confidently point out the benefits of your algorithm to the business.
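The comparison described above can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the analysis of a real campaign: the function names, the number of customers, the sales figures and the assumed lift of 5.0 per customer are all made up for the example.

```python
import numpy as np


def simulate_ab_test(n_customers=10_000, true_lift=5.0, seed=0):
    """Simulate sales before and after a randomized rollout (all numbers are assumptions)."""
    rng = np.random.default_rng(seed)
    # Random assignment: each customer has a 50% chance to get the new algorithm.
    treated = rng.random(n_customers) < 0.5
    # Customer-specific spending level, unrelated to the assignment.
    baseline = rng.normal(100.0, 20.0, n_customers)
    sales_before = baseline + rng.normal(0.0, 10.0, n_customers)
    # After the rollout everybody spends a bit more (+2.0, e.g. seasonality),
    # and treated customers additionally gain `true_lift`.
    sales_after = baseline + 2.0 + true_lift * treated + rng.normal(0.0, 10.0, n_customers)
    return treated, sales_before, sales_after


def diff_in_diff(treated, sales_before, sales_after):
    """Average sales change in the treatment group minus the change in the control group."""
    change = sales_after - sales_before
    return change[treated].mean() - change[~treated].mean()


treated, before, after = simulate_ab_test()
estimated_lift = diff_in_diff(treated, before, after)
print(f"Estimated lift per customer: {estimated_lift:.2f}")
```

Subtracting the change in the control group removes everything that affected both groups alike, such as the general upward trend in the simulation, so the remaining difference estimates the effect of the personalization algorithm itself.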

But which techniques can data science use to answer what-if questions for business decision making? In my illustration I have already talked about randomized experiments, but in many business situations experiments are either not possible or too expensive. In those situations, alternative techniques from computer science, like causal graph analysis, or from the social sciences, like quasi-natural experiments (instrumental variables, regression discontinuity, difference-in-differences), can help to gain insights into causal effects from non-experimental data.

Causal data analysis shifts one more important aspect. Not only does it answer what-if questions instead of what questions, it also fosters a closer collaboration between business and data science. For example, in a causal graph analysis a data scientist encodes a causal graph, which represents the business knowledge about the question, in order to estimate the effects of key inputs on targets. In quasi-natural experiments, a data scientist uses sources of external variation in key inputs that are otherwise unrelated to the target to understand the causal effect of those inputs on the target. All techniques require a healthy dose of business knowledge and data knowledge.
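To make the two non-experimental ideas more concrete, here is a small sketch on simulated data. Everything in it is an assumption chosen for illustration: a hidden "demand" factor drives both marketing spend and sales (a confounder), an "ad price shock" changes spend without affecting sales directly (an instrument), and the true effect of spend on sales is set to 2.0. The helper names are hypothetical as well.

```python
import numpy as np


def simulate(n=50_000, true_effect=2.0, seed=1):
    """Simulate non-experimental data with a confounder and an instrument (assumed setup)."""
    rng = np.random.default_rng(seed)
    demand = rng.normal(size=n)          # confounder: raises both spend and sales
    ad_price_shock = rng.normal(size=n)  # instrument: shifts spend, no direct effect on sales
    spend = demand - ad_price_shock + rng.normal(size=n)
    sales = true_effect * spend + 3.0 * demand + rng.normal(size=n)
    return demand, ad_price_shock, spend, sales


def ols_slope(y, *regressors):
    """Least-squares coefficient of the first regressor (an intercept is added)."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def iv_estimate(y, x, z):
    """Simple instrumental-variable (Wald) estimate: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


demand, shock, spend, sales = simulate()

# Pure association: biased upwards, because demand pushes spend and sales together.
naive = ols_slope(sales, spend)
# Causal graph analysis: the graph says demand is a common cause, so we adjust for it.
adjusted = ols_slope(sales, spend, demand)
# Quasi-natural experiment: use only the variation in spend caused by the price shock.
iv = iv_estimate(sales, spend, shock)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, instrumental variable: {iv:.2f}")
```

The adjustment only works if the confounder is known and measured, and the instrument only works if it really has no direct influence on sales. Both are assumptions that come from business knowledge, not from the data.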

I hope this short article has shown you the benefit of looking at causal data analysis to answer what-if questions. Causal data analysis is a collection of techniques, such as experiments, quasi-natural experiments and causal graph analysis, that are useful to know next to machine learning to support business decision making. Any questions? Don’t hesitate to contact me: sergej.kaiser@keyrus.com

