


ProSiebenSat.1: Building a Unified Data Lake
Successfully migrating a complex data platform to the AWS Cloud
Initial Situation & Challenge
As a central unit, ProSiebenSat.1 Tech & Services provides a data platform and, in this context, offers services for implementing data-driven use cases. For the migration of the entire data platform to the AWS Cloud, ProSiebenSat.1 relies on the support of b.telligent, building on the close, cooperative partnership through which many different projects have already been delivered successfully. Seven joint teams of ProSiebenSat.1 and b.telligent employees are working on this extensive migration project. Responsible for migrating the unified data lake, the lake team connects all source systems centrally and makes them available to the product teams in prepared form.
Motivation & Solution
With the cloud migration, ProSiebenSat.1 is laying the foundation for a future data landscape with over 1,000 direct users. The starting point was an on-premises, Hadoop-based platform with limited scaling options and high effort for operation and further development. In addition, there were other data lakes and smaller platforms with partly redundant data storage. The new platform and the technologies chosen for it free up capacity to use data effectively and generate added value in the specialist departments.
The change in technology is accompanied by a modernization of the data architecture and a centralization of the areas of BI/DWH, data science, and AI. This also reduces the cost of operating the platform and creates opportunities for future expansion as well as new use cases. The central challenge is the number and diversity of the more than 80 source systems, including their historical data of over 600 TB.
In combination with native AWS services, Databricks was chosen as the technological basis for the unified data lake. Infrastructure as code forms the foundation of the data architecture itself. Implementation on AWS, ProSiebenSat.1's preferred cloud provider, ensures that the collected data can be consumed easily, securely, and efficiently throughout the company.

Due to the number of sources, the focus was on a highly standardized data architecture with a modular structure, oriented towards software-development standards. The unified data lake handles the connection of the source systems as well as their technical and functional harmonization. As a result, centrally usable, well-prepared data is available to the product teams and specialist departments.

At the heart of the processing is a metadata-driven, Spark-based ELT framework. Thanks to connectors, new source systems can be onboarded through configuration alone. After ingestion and the associated harmonization into Delta format, technical rules (data types, casting, naming, transformations, ...) are applied in a final step. The framework's technological basis is Amazon Aurora PostgreSQL (RDS) for managing the operational metadata and Amazon DynamoDB for the source configurations.
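To illustrate the metadata-driven approach, the following is a minimal sketch in plain Python: a source configuration (mimicking a DynamoDB item) drives renaming and casting of incoming records. The field names (`column_map`, `casts`) and the table are purely illustrative assumptions, not the project's actual schema; the real framework applies such rules to Spark DataFrames rather than Python dictionaries.

```python
# Hypothetical source configuration, modeled on a DynamoDB item.
# Field names and values are illustrative only.
SOURCE_CONFIG = {
    "source": "crm_orders",
    "column_map": {"OrderID": "order_id", "OrderDate": "order_date"},
    "casts": {"order_id": int},
}

def harmonize(record: dict, config: dict) -> dict:
    """Apply the configured naming and casting rules to one record."""
    out = {}
    for src_name, value in record.items():
        # Rename columns according to the configured mapping;
        # unmapped columns fall back to lower-case names.
        name = config["column_map"].get(src_name, src_name.lower())
        # Cast values where a target type is configured.
        cast = config["casts"].get(name)
        out[name] = cast(value) if cast and value is not None else value
    return out

raw = {"OrderID": "4711", "OrderDate": "2024-05-01"}
print(harmonize(raw, SOURCE_CONFIG))
# {'order_id': 4711, 'order_date': '2024-05-01'}
```

Because all source-specific logic lives in the configuration, onboarding a new source means adding a configuration item rather than writing new pipeline code.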
In addition to providing the actual data, the platform offers other services for use by the product teams:
- Central governance based on Unity Catalog for granular permission control and for documentation of content, such as the categorization of personal data.
- Cross-system orchestration of data pipelines with Airflow.
- Ensuring data quality through fully automated tests in Soda, including alerting via integration with Slack and Jira.
- AWS DataSync to ensure 1:1 migrations and risk-free transition from the legacy system, source by source.
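The automated data-quality tests mentioned above could, for example, look like the following SodaCL sketch. The table and column names are illustrative assumptions, not the project's actual checks:

```yaml
# Hypothetical Soda checks for one harmonized table in the unified data lake.
checks for orders:
  - row_count > 0                    # table must not be empty
  - missing_count(order_id) = 0      # key column is complete
  - duplicate_count(order_id) = 0    # key column is unique
  - freshness(loaded_at) < 1d        # data was loaded within the last day
```

Failed checks can then be routed to Slack and Jira via Soda's alerting integrations, as described above.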
The first consumers are the DWH product teams, who will subsequently establish a core layer in Snowflake, modeled according to Data Vault using Datavault Builder. Delivery to end users is then performed via tools such as Tableau or Longview, or via direct database access, depending on the use case.
b.telligent Services at a Glance
Cloud Migration
Successful migration of a complex Hadoop-based data platform to AWS Cloud for better scalability and efficiency.
Data Architecture & Centralization
Modernization of the data architecture with a code-based infrastructure and unification of BI, Data Science, and AI.
Data Integration & Harmonization
Connection of over 80 source systems with standardized processes for data ingestion, transformation, and provisioning.
Automated Governance & QA
Implementation of the Unity Catalog for access control and automated data quality checks using SODA.
Enablement & Knowledge Transfer
Close collaboration with internal teams to empower employees and ensure sustainable use of the new platform.
Data Pipeline Orchestration
Development of a cross-system orchestration with Airflow for automation and management of data processes.

Results & Successes
Fast implementation: After four months, the first sources became available.
Efficient and cost-effective: Scaling, standardization, and optimization brought infrastructure costs below both on-premises levels and the original cloud estimates.
High acceptance and sustainability: The platform relieves teams, promotes innovation and strengthens internal independence.
Results
Four months after the project start, the first sources became available to the product teams on the new platform and have been running steadily ever since. In the meantime, all planned sources have been migrated. The final project phase has focused on expanding monitoring and alerting capabilities and on consolidating operational processes. At the same time, additional teams are already being onboarded onto the platform. This includes, among other things, the migration of AI use cases, which now use centrally pre-processed data and no longer depend on a separate source connection.
Efficient and cost-effective: The technological setup has already proven itself several times during the migration. The project benefited from the scaling possibilities when merging historical data during the step-by-step migration of sources. Thanks to consistent standardization and optimization of the loading steps, infrastructure costs are lower than those of the on-premises solution, and even lower than the originally planned costs of the new cloud setup. The advantages of Databricks are also evident in central governance via Unity Catalog and in the high-performance Spark-based processing, which delivers significant improvements, especially for previously lengthy loading jobs.
Successes
In addition to the advantages described, the project's success is particularly evident in the acceptance of the new platform. Additional teams began onboarding onto the unified data lake during the migration phase. As originally hoped, the central added value is a reduction of the effort required to operate the infrastructure and processing. This sharpens the focus on value-generating data products, which the product teams can realize in the form of data modeling or the implementation of AI use cases. The integrated team setup with b.telligent made it possible to establish the capacities required for the migration. External expertise played a decisive role in the design of the architecture as well as in the implementation and use of the metadata-based framework. At the same time, the close cooperation has upskilled internal colleagues, allowing a gradual phase-out of b.telligent.
The Tech Behind the Success

Amazon Web Services (AWS)
As an Advanced Partner of AWS, b.telligent supports its customers in the migration and setup of data platforms in the AWS Cloud.

Databricks
Databricks gives data teams the ability to process large amounts of data in the cloud and apply cutting-edge AI technologies to it.

