


ProSiebenSat.1: Building a Unified Data Lake
Successfully migrating a complex data platform to the AWS Cloud
Initial Situation & Challenge
As a central unit, ProSiebenSat.1 Tech & Services provides a data platform and, in this context, offers services for implementing data-driven use cases. For the migration of the entire data platform to the AWS Cloud, ProSiebenSat.1 relies on the support of b.telligent, building on the close, cooperative partnership through which many different projects have already been delivered successfully. Seven joint teams of ProSiebenSat.1 and b.telligent employees are working on this extensive migration project. Responsible for migrating the unified data lake, the lake team connects all source systems centrally and makes them available to the product teams in prepared form.
Motivation & Solution
With the cloud migration, ProSiebenSat.1 is laying the foundation for a future data landscape with over 1,000 direct users. The starting point was an on-premises, Hadoop-based platform with limited scaling options and high effort for operation and further development. In addition, there were other data lakes and smaller platforms with partly redundant data storage. The new platform and the technologies chosen for it free up capacity to use data effectively and generate added value in the specialist departments.
The change in technology is accompanied by a modernization of the data architecture and a centralization of the areas of BI/DWH, data science, and AI. This also reduces the cost of operating the platform and creates opportunities for future expansion as well as new use cases. The central challenge is the number and diversity of the more than 80 source systems, including their historical data of over 600 TB.
In combination with native AWS services, Databricks was chosen as the technological basis for the unified data lake. Infrastructure as code forms the foundation of the data architecture itself. Implementation on AWS, ProSiebenSat.1's preferred cloud provider, ensures that the collected data can be consumed easily, securely, and efficiently throughout the company.

Due to the number of sources, the focus was on a highly standardized data architecture with a modular structure, oriented towards software-development standards. The unified data lake handles the connection of the source systems as well as their technical and functional harmonization. As a result, centrally usable, well-prepared data is available to the product teams and specialist departments.

At the heart of the processing is a metadata-driven, Spark-based ELT framework. Thanks to connectors, new source systems can be onboarded through configuration alone. After ingestion and the associated harmonization into Delta format, technical rules (data types, casting, naming, transformations, ...) are applied in a final step. The framework's technological basis is Amazon Aurora PostgreSQL (RDS) for managing the operational metadata and Amazon DynamoDB for the source configurations.
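To illustrate the metadata-driven approach, the following is a minimal sketch in plain Python: a source configuration (mimicking a DynamoDB item) drives renaming and casting of incoming records. The field names (`column_map`, `casts`) and the table are purely illustrative assumptions, not the project's actual schema; the real framework applies such rules to Spark DataFrames rather than Python dictionaries.

```python
# Hypothetical source configuration, modeled on a DynamoDB item.
# Field names and values are illustrative only.
SOURCE_CONFIG = {
    "source": "crm_orders",
    "column_map": {"OrderID": "order_id", "OrderDate": "order_date"},
    "casts": {"order_id": int},
}

def harmonize(record: dict, config: dict) -> dict:
    """Apply the configured naming and casting rules to one record."""
    out = {}
    for src_name, value in record.items():
        # Rename columns according to the configured mapping;
        # unmapped columns fall back to lower-case names.
        name = config["column_map"].get(src_name, src_name.lower())
        # Cast values where a target type is configured.
        cast = config["casts"].get(name)
        out[name] = cast(value) if cast and value is not None else value
    return out

raw = {"OrderID": "4711", "OrderDate": "2024-05-01"}
print(harmonize(raw, SOURCE_CONFIG))
# {'order_id': 4711, 'order_date': '2024-05-01'}
```

Because all source-specific logic lives in the configuration, onboarding a new source means adding a configuration item rather than writing new pipeline code.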
In addition to providing the actual data, the platform offers other services for use by the product teams:
- Central governance based on Unity Catalog for granular permission control and for documentation of content, such as the categorization of personal data.
- Cross-system orchestration of data pipelines with Airflow.
- Ensuring data quality through fully automated tests in Soda, including alerting via integration with Slack and Jira.
- AWS DataSync to ensure 1:1 migrations and risk-free transition from the legacy system, source by source.
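The automated data-quality tests mentioned above could, for example, look like the following SodaCL sketch. The table and column names are illustrative assumptions, not the project's actual checks:

```yaml
# Hypothetical Soda checks for one harmonized table in the unified data lake.
checks for orders:
  - row_count > 0                    # table must not be empty
  - missing_count(order_id) = 0      # key column is complete
  - duplicate_count(order_id) = 0    # key column is unique
  - freshness(loaded_at) < 1d        # data was loaded within the last day
```

Failed checks can then be routed to Slack and Jira via Soda's alerting integrations, as described above.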
The first consumers are the DWH product teams, who will subsequently establish a core layer in Snowflake, modeled according to Data Vault using Datavault Builder. Delivery to end users is then performed via tools such as Tableau or Longview, or via direct database access, depending on the use case.
b.telligent Services at a Glance
Cloud Migration
Successful migration of a complex Hadoop-based data platform to AWS Cloud for better scalability and efficiency.
Data Architecture & Centralization
Modernization of the data architecture with a code-based infrastructure and unification of BI, Data Science, and AI.
Data Integration & Harmonization
Connection of over 80 source systems with standardized processes for data ingestion, transformation, and provisioning.
Automated Governance & QA
Implementation of the Unity Catalog for access control and automated data quality checks using SODA.
Enablement & Knowledge Transfer
Close collaboration with internal teams to empower employees and ensure sustainable use of the new platform.
Data Pipeline Orchestration
Development of a cross-system orchestration with Airflow for automation and management of data processes.

Results & Successes
Fast implementation: After four months, the first sources became available.
Efficient and cost-effective: Scaling, standardization, and optimization brought infrastructure costs below both on-premises levels and the original cloud estimates.
High acceptance and sustainability: The platform relieves teams, promotes innovation and strengthens internal independence.
Results
Four months after the project start, the first sources became available to the product teams on the new platform and have been running steadily ever since. In the meantime, all planned sources have been migrated. The final project phase has focused on expanding monitoring and alerting capabilities and on consolidating operational processes. At the same time, additional teams are already being onboarded onto the platform. This includes, among other things, the migration of AI use cases, which now use centrally pre-processed data and no longer depend on a separate source connection.
Efficient and cost-effective: The technological setup has already proven itself several times during the migration. The project benefited from the scaling possibilities when merging historical data during the step-by-step migration of sources. Thanks to consistent standardization and optimization of the loading steps, infrastructure costs are lower than those of the on-premises solution, and even lower than the originally planned costs of the new cloud setup. The advantages of Databricks are also evident in central governance via Unity Catalog and in the high-performance Spark-based processing, which delivers significant improvements, especially for previously lengthy loading jobs.
Successes
In addition to the advantages described, the project's success is particularly evident in the acceptance of the new platform. Additional teams began onboarding onto the unified data lake during the migration phase. As originally hoped, the central added value is a reduction of the effort required to operate the infrastructure and processing. This sharpens the focus on value-generating data products, which the product teams can realize in the form of data modeling or the implementation of AI use cases. The integrated team setup with b.telligent made it possible to establish the capacities required for the migration. External expertise played a decisive role in the design of the architecture as well as in the implementation and use of the metadata-based framework. At the same time, the close cooperation has upskilled internal colleagues, allowing a gradual phase-out of b.telligent.
The Tech Behind the Success

Amazon Web Services (AWS)
As an Advanced Partner of AWS, b.telligent supports its customers in the migration and setup of data platforms in the AWS Cloud.

Databricks
Databricks gives data teams the ability to process large amounts of data in the cloud and apply cutting-edge AI technologies to it.

