Over recent years, technology has had a huge impact on data teams. As a result, the data community is discussing what a modern data team should look like and how it can best use “the modern data stack”.
I was keen to explore this topic further by looking at:
- What has changed?
- What does this mean for data teams in the future?
What has changed?
Ease of Ingestion – It has never been easier for companies to ingest their data, thanks to new applications and tools that are also far more reliable than their predecessors. With the “Extract” and “Load” steps of ELT now this easy, it makes less sense for the most technical members of a data team, the data engineers, to continue to own this task.
Data Transformation – dbt is a total gamechanger for data transformation! Again, the need for data engineers is reduced, as anyone with a solid grasp of SQL can write dbt models. Software engineers are also increasingly involved in the “transform” stage of ELT, and this diffusion of ownership further reduces the need for data engineers to be involved in ELT.
The rise of the semantic layer – Companies are using multiple different data tools (BI and visualisation), depending on the needs of individual teams or the problem they are trying to solve. For this to work, data functions are opting for version-controlled data that syncs to the various tools.
Increased data literacy and desire for self-service – The increased use of data over recent years has resulted in everyday employees seeing the benefits of engaging with analysis. This, coupled with the ease of use of tools in the semantic layer, has resulted in employees wanting to self-serve their data.
So what does this mean for data teams? What data roles will we have in the future, and how should data teams interact with the business?
What data roles will we have in the future?
Historically, data teams in their simplest form can be broken down into:
- Data Engineers and Architects – responsible for building the data pipeline.
- Data Analysts – responsible for creating reports and dashboards.
- Data Scientists – responsible for advanced analytics problems.
Let's look at each of these roles individually.
Data Engineers – Long term the Data Engineering role could become extinct!
We have already seen many forward-thinking companies introduce the newest data role – “Analytics Engineers.” With more and more companies seeing the benefits of dbt, I see the analytics engineer role becoming ever more important and common.
The role of an analytics engineer is still open to debate, but I currently see it as someone who can own the entire data stack: setting up the data pipeline services (Stitch or Fivetran) to ingest data, keeping the data warehouse tidy and writing complex data transformations in SQL using dbt. They are also involved in training and upskilling the wider data team and stakeholders on the tools within the semantic layer to promote self-service.
We will also have people in roles focused on data infrastructure and data platforms, where a large component of the role will be building custom tools.
Data Analysts – With an increase in self-service data, analysts will be more involved in governance and best practice for data tools. We will also see analysts doing more analysis, driving real insight and benefit for the organisation.
Data Science – With less need for statistical minds to be involved in day-to-day analysis, I believe the role of data scientist will be broken down into two areas:
- Research Scientists – looking at which future problems can be solved and monetised.
- Machine Learning Engineers – focusing on productionising the models. I see MLE becoming far more specialised and product focused in 2023 and beyond.
How should data teams interact with the wider business?
It will become more important for data professionals, analytics engineers in particular, to ensure software engineering teams understand the side effects of the changes they make, whether that is mutated data or new data. This will require data professionals to be more strategic and business-minded when interacting with the wider business.
Data contracts will become the norm. Many still find “data contracts” a polarising topic, but I see an ever-increasing need for them to reduce some of the side effects of changes to projects and software.
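To make the idea concrete, here is a minimal sketch of what checking a record against a data contract could look like in Python. Everything in it is an assumption for illustration: the `Field` class, the `ORDERS_CONTRACT` definition and the `violations` helper are hypothetical names, not part of any particular tool, and a real contract would normally live in a shared, version-controlled schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One agreed column in the contract (hypothetical structure)."""
    name: str
    dtype: type
    nullable: bool = False


# Hypothetical contract for an "orders" record, agreed between the
# software engineering team (producer) and the data team (consumer).
ORDERS_CONTRACT = [
    Field("order_id", int),
    Field("customer_id", int),
    Field("amount", float),
    Field("coupon_code", str, nullable=True),
]


def violations(record: dict, contract: list[Field]) -> list[str]:
    """Return human-readable contract violations for a single record."""
    problems = []
    for field in contract:
        # Every contracted field must be present, even nullable ones.
        if field.name not in record:
            problems.append(f"missing field: {field.name}")
            continue
        value = record[field.name]
        if value is None:
            if not field.nullable:
                problems.append(f"null not allowed: {field.name}")
        elif not isinstance(value, field.dtype):
            problems.append(
                f"wrong type for {field.name}: expected "
                f"{field.dtype.__name__}, got {type(value).__name__}"
            )
    # Fields the producer added without updating the contract.
    expected = {field.name for field in contract}
    for name in sorted(record.keys() - expected):
        problems.append(f"unexpected field: {name}")
    return problems
```

A check like this could run in the producing team's tests, so that a renamed column or a changed type is caught before it reaches the warehouse rather than after a dashboard breaks.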
The centrality of the data warehouse will require data teams to place far more emphasis on the quality of the data within it.
As companies continue to see the benefit of data teams, with many senior data professionals finding a spot in the C-suite, I see data playing an ever more important role in day-to-day business decisions.