Self-Service Data: How to Avoid Pitfalls

Vincent Loucel

September 2024 - 7 min read

The demand for self-service in data has skyrocketed in recent years. After discovering the potential of data utilization, business users increasingly express the need to access data autonomously, quickly, and simply.

What Is Self-Service Data?

Self-service data is the ability for anyone within a company, whether they have a technical background or not, to autonomously access data, query it, transform it, and visualize it themselves. An appealing idea at first glance, but it requires good practices to avoid certain pitfalls.

Solid Data Governance

The first step for effective self-service is to have solid data governance. In simple terms, data governance is the set of processes governing the collection, processing, and use of data within an organization. In other words, it’s about answering two key questions:

- What is the data? (Having a precise and shared definition of the data across the organization, as well as its lineage.)

- Who can access this data? (Setting appropriate permissions to allow certain groups to access specific data.)
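To make these two questions concrete, here is a minimal sketch of a catalog entry that stores both a shared definition (with lineage) and the groups allowed to read the data. Every name in it (`DatasetEntry`, `CATALOG`, `can_access`, the datasets and groups) is hypothetical and only illustrates the idea; a real governance tool would handle this for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One catalog entry: what the data is, and who may read it."""
    name: str
    definition: str                 # shared business definition
    upstream: tuple[str, ...]       # lineage: datasets this one is built from
    allowed_groups: frozenset[str]  # groups permitted to query it


# Hypothetical catalog; names, definitions, and groups are made up.
CATALOG = {
    "sales_daily": DatasetEntry(
        name="sales_daily",
        definition="Net revenue per store and day, refunds excluded.",
        upstream=("raw_orders", "raw_refunds"),
        allowed_groups=frozenset({"sales", "finance"}),
    ),
    "payroll": DatasetEntry(
        name="payroll",
        definition="Gross salary per employee and month.",
        upstream=("raw_hr_contracts",),
        allowed_groups=frozenset({"finance"}),
    ),
}


def can_access(user_groups: set[str], dataset: str) -> bool:
    """Return True if the user belongs to at least one allowed group."""
    entry = CATALOG[dataset]
    return bool(entry.allowed_groups & user_groups)


print(can_access({"sales"}, "sales_daily"))  # True
print(can_access({"sales"}, "payroll"))      # False
```

A sales user can read the daily sales table and look up its definition and lineage, but is kept away from payroll data.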

Many tools exist to facilitate data governance, but it remains a significant challenge. All data team members, whether Data Engineers, Data Analysts, or BI and Analytics Engineers, must collaborate to ensure rigorous governance. The earlier it’s implemented in the company, the better, as the situation can quickly get out of control without it.

Why is it important for self-service? First, sensitive data should not be accessible to everyone. For example, financial or personal data should be restricted to authorized personnel. Secondly, the idea behind self-service is to provide fast and efficient access to data that the user may not fully understand. To calculate a KPI, it must be simple to trace relevant components across all available data.

At What Granularity Level Should Data Be Accessible?

Giving everyone access to every kind of data is not advisable. A business user likely doesn’t need raw, uncleaned data or highly granular tables, and should not have access to them. Overloading users with unnecessary data can cause slowdowns and frustration.

What to do then? It’s essential to create a layer of pre-processed and pre-modeled data to make it easier for business users to work with. This may include aggregated data or pre-joined tables (dimensions and facts), ready for use. A strong data modeling and preprocessing team is needed to handle this. Typically, this is the role of an Analytics Engineer or BI Engineer.
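As an illustration of such a layer, the sketch below rolls hypothetical line-level order data up into one row per day and store, which is the kind of table a business user would actually query. The data and the `build_daily_sales` function are assumptions made for this example; in practice this step would usually live in SQL or a modeling tool rather than in plain Python.

```python
from collections import defaultdict

# Hypothetical raw data: one row per order line (too granular for most users).
RAW_ORDER_LINES = [
    {"order_id": 1, "day": "2024-09-01", "store": "Paris", "amount": 40.0},
    {"order_id": 1, "day": "2024-09-01", "store": "Paris", "amount": 10.0},
    {"order_id": 2, "day": "2024-09-01", "store": "Lyon", "amount": 25.0},
    {"order_id": 3, "day": "2024-09-02", "store": "Paris", "amount": 30.0},
]


def build_daily_sales(lines):
    """Aggregate order lines into one row per (day, store)."""
    totals = defaultdict(lambda: {"revenue": 0.0, "orders": set()})
    for line in lines:
        key = (line["day"], line["store"])
        totals[key]["revenue"] += line["amount"]
        totals[key]["orders"].add(line["order_id"])  # count each order once
    return [
        {
            "day": day,
            "store": store,
            "revenue": values["revenue"],
            "order_count": len(values["orders"]),
        }
        for (day, store), values in sorted(totals.items())
    ]


daily_sales = build_daily_sales(RAW_ORDER_LINES)
for row in daily_sales:
    print(row)
```

The granularity of the resulting table is explicit (one row per day and store), so users don't have to rebuild it from raw lines themselves.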

Training and Data Culture

Two essential elements of self-service are:

- The tools used.

- Training on those tools, as well as best practices for SQL, data visualization, and data culture.

For non-technical users, having user-friendly tools like Tableau or Power BI, which allow for the creation of visualizations via an intuitive interface, is crucial. However, beyond the tools, users must be trained to understand how to ask the right questions and how to interpret the results. Misinterpreting data can lead to incorrect decisions. For example, it’s important to fully grasp the concepts of aggregation and granularity. Most visualization tools aggregate the KPIs they are asked to display. If the granularity of the tables used is not well understood, there’s a risk that the resulting visualizations will be incorrect (e.g., summing values at the wrong aggregation level, which inflates the numbers), resulting in faulty conclusions.
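The inflated-numbers problem is easy to reproduce. In the sketch below, with made-up data, an order-level total is joined to a line-level table, so the total is repeated once per line. Summing it over the joined rows then counts the same order several times.

```python
# Hypothetical tables with different granularities.
ORDERS = [  # one row per order
    {"order_id": 1, "order_total": 100.0},
    {"order_id": 2, "order_total": 50.0},
]
ORDER_LINES = [  # one row per product in an order
    {"order_id": 1, "product": "A"},
    {"order_id": 1, "product": "B"},
    {"order_id": 1, "product": "C"},
    {"order_id": 2, "product": "A"},
]

# Join: every order line receives a copy of its order's total.
totals_by_order = {o["order_id"]: o["order_total"] for o in ORDERS}
joined = [
    {**line, "order_total": totals_by_order[line["order_id"]]}
    for line in ORDER_LINES
]

# Wrong: order 1 has three lines, so its total is summed three times.
naive_revenue = sum(row["order_total"] for row in joined)

# Right: sum at the granularity where the value is defined (one row per order).
correct_revenue = sum(o["order_total"] for o in ORDERS)

print(naive_revenue)    # 350.0
print(correct_revenue)  # 150.0
```

A visualization tool that sums `order_total` on the joined table would show 350 instead of 150 without raising any error, which is why users need to know the granularity of the tables they use.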

Challenges and Risks of Self-Service

Despite its advantages, self-service data presents significant challenges. First, it’s crucial to limit access to critical or sensitive data, such as financial or personal information. Secondly, a lack of training can lead to misinterpretations of data, inefficient queries, or poorly designed dashboards, ultimately affecting the system’s overall performance. In fact, it’s not uncommon in some companies for business users to create dashboards displaying the same KPIs but with different figures. This dissonance between multiple dashboards representing the same data undermines the credibility of the data among users. Often, the issue is simply due to differences in definition and context.
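Here is a small sketch of how two dashboards can show different figures for the "same" KPI. The transactions, the statuses, and the function names are all hypothetical. The point is only that each dashboard applies its own implicit definition of revenue, and that a single shared definition removes the gap.

```python
# Hypothetical transactions shared by both dashboards.
TRANSACTIONS = [
    {"amount": 120.0, "status": "paid"},
    {"amount": 80.0, "status": "paid"},
    {"amount": 60.0, "status": "refunded"},
    {"amount": 40.0, "status": "pending"},
]


def revenue_dashboard_a(rows):
    """Dashboard A: 'revenue' means every transaction, whatever its status."""
    return sum(r["amount"] for r in rows)


def revenue_dashboard_b(rows):
    """Dashboard B: 'revenue' means paid transactions only."""
    return sum(r["amount"] for r in rows if r["status"] == "paid")


# One shared, documented definition that every dashboard reuses.
REVENUE_STATUSES = {"paid"}


def revenue(rows):
    """Governed definition: sum of amounts for statuses that count as revenue."""
    return sum(r["amount"] for r in rows if r["status"] in REVENUE_STATUSES)


print(revenue_dashboard_a(TRANSACTIONS))  # 300.0
print(revenue_dashboard_b(TRANSACTIONS))  # 200.0
print(revenue(TRANSACTIONS))              # 200.0
```

Neither dashboard is "wrong" in isolation. They simply answer different questions under the same label.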

The Role of AI and Smart Tools

The arrival of artificial intelligence in the field of self-service data opens new perspectives. Tools like ThoughtSpot or even virtual assistants based on ChatGPT allow users to ask questions in natural language and receive answers in the form of visualizations. This further reduces technical barriers and allows non-specialist users to easily access data.

But What Is the Value of Self-Service Data?

The key question is: What value will my self-service data bring? Will it allow for reliable, maintainable, and coherent reporting over the long term? Probably not without proper training. A brief introduction to the basics of data visualization and SQL won’t be enough to produce complex dashboards, such as those needed to monitor a company’s financial health. However, for users who are experts in their field, self-service allows for quick ad hoc analyses, helping them make informed decisions based on both their expertise and the data.

Conclusion

Self-service data is becoming more widespread and enables businesses to make faster, more rational decisions. With the advent of artificial intelligence, this trend will accelerate further. However, it’s essential to avoid common pitfalls that could create more complexity than value. A solid governance framework, appropriate tools, and training in data culture are the keys to successfully transitioning to self-service data.

If you have any questions or needs, feel free to contact me :)