Data Strategy – A Roadmap for Successful Implementation of Data Science

The maturation of the Data Science field has brought with it the realization that a well-grounded data strategy is essential. Why? The spell is broken: big investments alone are not enough for success in Data Science. The challenges of harnessing the power of data and turning it into measurable outcomes, let alone becoming a truly data-driven company, are well documented.

For example, in a survey by NewVantage, executives from major companies were asked about their perceptions of Artificial Intelligence investments and outcomes. Their answers are striking: 77% reported finding the adoption of Big Data or Artificial Intelligence initiatives very challenging. There appears to be a great dissonance between the urgency they feel about maintaining or increasing investments in AI (88% see the matter as urgent and 92% report that their companies are accelerating their AI investments) and the actual results these investments have delivered. 53% report that their companies do not see data as a business asset, and 52% do not see their companies as real competitors on data and analytics. The trend is clear: fewer and fewer firms dare to identify themselves as data-driven, falling from 37% in 2017 to 32.4% in 2018 and only 31% in 2019.

What are the perceived reasons for this lack of success? 93% of the surveyed executives identify problems with people and processes as one of the main obstacles. Data has not become part of company culture, which leads to resistance and a lack of business adoption. Moreover, companies still struggle with the basics that make data analysis possible: the data itself (extraction, handling and quality assurance), choosing the right methods and ensuring the right technological infrastructure.

This, however, need not lead to despair. On the contrary, by changing perspective, we can focus on what the other roughly 30 percent of companies, the ones that are harnessing the power of data, are doing right. We have long experience working with both big and small companies at varying levels of maturity in their data strategy. In our experience, the challenge of creating a successful data analytics implementation can be tackled with a clear data strategy framework that addresses the following points:

Setting flexible but clear objectives driven by business strategy
Clarity about all data-related responsibilities (funding, sponsorship, leadership)
A sound understanding of the Data Science implementation process
Appropriate technological infrastructure that allows efficient implementation of models, experiments and reporting systems
The right set-up for building a Data Science Team, which means determining both the best place to seed analytics activities within the company and the right structure and mix of roles and responsibilities inside the Data Team

We want this guide to be truly useful, so we developed it as a two-part series. In this first part, we explore the first four points above. In the second part, we dive deep into all the considerations needed to build the right Data Science Team.

Data Strategy – It All Starts with the Right Motivation

Data Science and Artificial Intelligence have gone mainstream. Although relatively few people truly understand what they encompass, everyone has heard of them and everyone wants to be part of them. Companies are afraid of being left behind, and many have started Data Science Teams without a clear view of what their objectives should be. AI also carries a certain status. It is not uncommon to see companies make big investments in Data Science Teams just to please investors. These cases never end well.

A successful data strategy always starts with alignment to concrete business problems. At a minimum, there should be a clear understanding of the causal relationship between the outcomes a company seeks and the right way of measuring them. Before starting any data project, it should be clear how the project will contribute to core business objectives and how success will be concretely defined in each case.

The right motivation should be accompanied by a deep understanding that putting your company on the data path means profound change. In particular, there must be a willingness to accept new decision-making processes. Here, again, there seems to be some dissonance between wishes and reality. In a survey by Accenture (https://www.accenture.com/us-en/~/media/accenture/conversion-assets/dotcom/documents/global/pdf/industries_2/accenture-building-analytics-driven-organization.pdf), 62% of the participating companies stated that they consider data-driven decision-making quicker or more effective; however, only 25% reported relying on data insights in their daily business.

The change needed is big, but taking it step by step is usually a good idea. Rather than trying to change everything at once, it often makes sense to start by identifying specific departments, areas or projects in which data-driven transformation is easier and then, capitalizing on their success, to start spreading the data culture. People tend to be more open when they see concrete success stories in which the benefits of data-driven decisions were quantified and made visible.

Clarity in the data strategy framework – definition of responsibilities

Making data truly impact business outcomes requires more than hiring a given number of Data Scientists and grouping them into a Data Science Team. Work behind the scenes is needed. That is, the data strategy framework needs to define some crucial responsibilities that, although not involved in data analysis per se, build a stable basis for it:

Leadership refers to defining the position(s) within the company responsible for implementing the vision set out in the data strategy. One of the main responsibilities of this role is to promote a data culture and hold people accountable for the results. This person is also responsible for setting the guidelines on data governance and ethics (privacy concerns, data security). In big companies, there is a growing tendency to create a dedicated position for this role, usually at C-level, with titles like “Chief Analytics Officer” or “Chief Data Officer”. In smaller companies, this role is usually taken on by the senior Data Scientist or Data Analyst.

Sponsorship is a key role that is often overlooked. The sponsor sits outside the Data Science Team but champions and promotes the adoption of the data strategy vision. This role complements the leadership role by articulating, from an outside perspective, the benefits of adopting data-driven decision-making. The sponsor is usually someone with enough seniority in the company.

Funding is not a role that is necessarily embodied by a person. However, any complete data strategy needs to define how the Data Analysis and Data Science efforts are going to be financed. The departmental structure of a company will largely determine how financing is conceived. In models where the Data Science Team is centralized and a unit in itself, the main funding model is direct funding from the enterprise. In other cases, where each department runs its own analytics efforts, the costs tend to be included in that department’s budget.

The definition of responsibilities also includes a clear strategy for how analytics activities are going to be nested within the company. The different models are described in detail in the second part of this series. It is important to understand, however, that the way the Data Team is organized within the company (for example, whether it is centralized or spread out across departments and business units) will have a huge impact on its dynamics and on the satisfaction of both the Data Scientists themselves and everyone else involved. There is no single right way of organizing data analysis activities; in each case, the characteristics and dynamics of the company will play a huge role.

Understanding the Data Science Process

A fundamental part of any data strategy is understanding that every Data Science project has a life cycle, and deciding how this life cycle will be implemented within the company. The classic model for the life cycle of data analysis is the Cross Industry Standard Process for Data Mining (CRISP-DM). This model, developed in the 90s, defines six phases of data analytics projects: 1) Business Understanding, 2) Data Understanding, 3) Data Preparation, 4) Modeling, 5) Evaluation and 6) Deployment. It is a very simple model, but in many cases it is enough.

However, for more complex applications of Data Science, in particular Machine Learning projects, CRISP-DM is not enough. A more recent alternative is the Team Data Science Process (TDSP), developed by Microsoft. Like its predecessor, it divides the Data Science project life cycle into phases: 1) Business Understanding, 2) Data Acquisition and Understanding (emphasizing that data acquisition has become more complex and more important than before), 3) Modeling, 4) Deployment and 5) Customer Acceptance.
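To make the iterative nature of these life cycles more concrete, here is a minimal Python sketch. It is our own illustration, not part of the CRISP-DM or TDSP specifications: it encodes the two phase lists above and loops back to Business Understanding whenever a "gate" phase (where results are checked against the business success criterion) fails, logging what happened in each iteration. The function `run_lifecycle`, the `passes_gate` callback and the `IterationLog` record are hypothetical names chosen for this example; in a real project each phase is actual work, not a function call.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Phase lists as named in the text above.
CRISP_DM = ["Business Understanding", "Data Understanding", "Data Preparation",
            "Modeling", "Evaluation", "Deployment"]
TDSP = ["Business Understanding", "Data Acquisition and Understanding",
        "Modeling", "Deployment", "Customer Acceptance"]


@dataclass
class IterationLog:
    """Hypothetical record of what each pass through the life cycle taught us."""
    entries: List[str] = field(default_factory=list)


def run_lifecycle(phases: List[str],
                  gate_phase: str,
                  passes_gate: Callable[[int], bool],
                  max_iterations: int = 5) -> IterationLog:
    """Walk the phases in order; if the gate phase fails, log it and restart.

    `gate_phase` is where results are checked against the business success
    criterion (e.g. "Evaluation" in CRISP-DM). `passes_gate(i)` is a
    hypothetical stand-in for that check on iteration i.
    """
    if gate_phase not in phases:
        raise ValueError(f"{gate_phase!r} is not one of the phases")
    log = IterationLog()
    for iteration in range(1, max_iterations + 1):
        for phase in phases:
            if phase == gate_phase and not passes_gate(iteration):
                log.entries.append(
                    f"iteration {iteration}: failed at {phase}, back to {phases[0]}")
                break  # start again from the first phase
        else:
            # The inner loop finished without a failed gate.
            log.entries.append(
                f"iteration {iteration}: completed all {len(phases)} phases")
            return log
    log.entries.append("stopped: iteration budget exhausted")
    return log


if __name__ == "__main__":
    # Pretend the success criterion is only met on the third attempt.
    result = run_lifecycle(CRISP_DM, "Evaluation", lambda i: i >= 3)
    for line in result.entries:
        print(line)
```

Running the script prints one line per iteration: the first two passes fail at Evaluation and the third completes all six phases. Passing `TDSP` with "Customer Acceptance" as the gate runs the same loop for Microsoft's process. The iteration log is one simple way to keep track of what was learned in each pass.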

For a company that is starting a Data Analytics implementation, it is advisable to begin with simple projects that can be completed in a foreseeable amount of time. Data projects are iterative by nature, so at the beginning it is especially important to be able to identify problems of any kind quickly: in the definition of your data strategy, in the definition of the problem to solve, in its possible (mis)alignment with business problems, in getting the data or in its quality, etc. Once a company has gathered some experience, it is better suited to tackle more complex applications of Data Science.

Data Technology and Data Framework – Having the right infrastructure

This is the point most often misunderstood by business stakeholders, yet it is quite clear to (sadly, often frustrated) Data Scientists. Data Science models are iterative by nature. This means that your data infrastructure needs to be agile enough to support high failure rates. Data Scientists enjoy spending time modeling and working with data, but when a company lacks the right data infrastructure, their efforts are slowed down and productivity and precious talent are wasted. Companies handling large amounts of data need very strict organization, since many data projects will be tested directly in the production environment (the cost of maintaining a separate development environment becomes prohibitive as data volumes grow). In this case, strict organization also means keeping control over many simultaneous processes without ever neglecting the user experience.

Within this view, one key point the data strategy must prioritize is finding the best ways to ensure data quality. Even the most sophisticated Data Science algorithms will suffer if the quality of the data is poor. Conversely, very simple methods can produce very reliable results if the data is accurate and trustworthy. This simple idea can provide a key competitive advantage, so it is worth spending time finding ways to optimize data quality.
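As a minimal illustration of what "ensuring data quality" can look like in practice, the sketch below computes two common quality measures for each field of a small set of records: completeness (the share of non-missing values) and validity (the share of present values that pass a rule). The field names, the validation rules, the sample records and the 95% threshold are all assumptions made up for this example; in a real setting they would come from your own data governance definitions.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def completeness(records: List[Record], field: str) -> float:
    """Share of records where `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def validity(records: List[Record], field: str, rule: Callable[[Any], bool]) -> float:
    """Share of the present (non-None) values of `field` that satisfy `rule`."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if rule(v)) / len(values)


def quality_report(records: List[Record],
                   rules: Dict[str, Callable[[Any], bool]],
                   threshold: float = 0.95) -> Dict[str, Dict[str, Any]]:
    """Per-field completeness and validity, flagged against an assumed threshold."""
    report = {}
    for field, rule in rules.items():
        c = completeness(records, field)
        v = validity(records, field, rule)
        report[field] = {"completeness": round(c, 3),
                         "validity": round(v, 3),
                         "ok": c >= threshold and v >= threshold}
    return report


if __name__ == "__main__":
    # Hypothetical customer records with typical defects.
    customers = [
        {"id": 1, "email": "ana@example.com", "age": 34},
        {"id": 2, "email": None, "age": 29},
        {"id": 3, "email": "bo@example", "age": -5},
        {"id": 4, "email": "cy@example.org", "age": 41},
    ]
    # Deliberately simplistic rules, for illustration only.
    rules = {
        "email": lambda e: "@" in e and "." in e.split("@")[-1],
        "age": lambda a: 0 <= a <= 120,
    }
    for name, stats in quality_report(customers, rules).items():
        print(name, stats)
```

On these sample records, both fields fall below the threshold: `email` is only 75% complete and one of its three values fails the format rule, while `age` is complete but contains an impossible negative value. Checks like these are cheap to run on every data load, which makes quality problems visible before they reach a model.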

Roadmap to a successful data strategy – A checklist

In this first part of the series, we presented the basic framework for a successful data strategy. To make this knowledge more useful, we conclude with a checklist that should help you define the right data strategy for your company. Especially if you are just starting out, you do not need to answer every question in detail. Perfecting your data strategy is a process of trial and error. However, all these points have proven very important for success, so it is highly advisable to keep them in mind.

Definition of the motivations for your data strategy
Who owns the leadership? What exactly are the responsibilities assigned to this role?
Definition of data governance and ethics concerning data (privacy and security of data)
Who owns the sponsorship role? What exactly are the responsibilities assigned to this role?
How will analytics activities be nested in the company: within departments or as a centralized team?
How will the supply and demand for analytics activities be handled and prioritized?
How is the funding of your Data Activities ensured?
Design of your Data Science Process: adaptations of CRISP-DM / TDSP? How do you track your learnings in each iteration?
Design of your data framework: is it scalable, is it flexible, is it feasible, is it consistent with the volume of data you have and expect?
How will your company optimize data variety and quality?

We are specialists in finding the right data talent according to your data strategy. Contact us for more information.