Apache Spark training for data engineering

Big data enthusiasts with little or no prior knowledge

3 days

Learn about Apache Spark for Data Engineers

The “Apache Spark for Data Engineering” course is a three-day training course that covers the fundamentals and practical applications of Apache Spark in big data projects. Participants will learn how Spark differs from traditional databases and gain insights into using Spark for distributed data processing, performance tuning, and implementing ETL processes.

The course includes techniques for connecting Spark to other technologies, working with the DataFrame API in Python and Scala, and topics such as structured streaming and machine learning. Debugging, monitoring, and testing Spark applications are also part of the program.

Next Training

On Request

Location

Online

Duration

3 days

Costs

from 1,495 EUR

Information About the Training

Working with big data technologies differs significantly from working with conventional database technologies. The Apache Spark framework opens up many new possibilities in the field of data engineering. The 3-day course “Apache Spark for Data Engineering” teaches the necessary methods and procedures for using Spark for data engineering.

In addition to the necessary technical background, the course covers the different types of application and the special features of distributed data processing with Apache Spark. Working with the DataFrame API in Python (and partly in Scala) is demonstrated through various practical examples.

Course participants first receive all the essential information about Apache Spark; practical skills for successfully implementing data engineering projects are then taught:

  • What is Apache Spark and what position does it occupy in the big data universe?
  • Where is it used for which use cases?
  • Connectivity of Spark with other technologies
  • Concepts and consequences of distributed processing with Spark
  • Possibilities for running Spark (notebooks, shell, …)
  • DataFrames, Spark SQL
  • Performance factors and possibilities of performance tuning
  • Debugging and monitoring of applications via the Spark UI
  • Configuration of Spark jobs
  • Implementation of ETLs based on the DataFrame API
  • Creating dynamic queries with the DataFrame API
  • Structured streaming
  • Testing of Spark jobs
  • High-level concepts of machine learning based on Spark

The course is aimed at anyone interested in big data, data engineering, or data science who has little or no previous knowledge and wants to use Apache Spark for ETL tasks. Prerequisites are:

  • Confident handling of SQL
  • Programming experience in Python or Scala (see Jump Start Python pre-course)

A laptop with a VirtualBox client and at least 8 GB RAM is required for the practical exercises.

Request The Training Now

This training consists of six in-depth topic modules, each of which can also be booked separately. If you are interested in a module that has already taken place, please register and we will contact you to arrange a date. We have been a multi-certified Microsoft partner for over 15 years.

Meet Our Trainers

Stefan Seltmann

Management Consultant

Stefan is a graduate psychologist and consultant with a focus on data science & AI. He is an expert in machine learning across various languages and technologies and already uses Apache Spark successfully in projects with Python and Scala. As an experienced data science practitioner with more than 15 years of project experience, he knows the world of relational databases as well as big data technologies, and he looks forward to inspiring course participants to use Apache Spark.

More Trainings

Power Apps & Power Automate

Power App & Power Automate users and developers

1 day per session

Learn more

Infrastructure as Code With Terraform and Azure

Cloud Engineer, Cloud Architect

1 day

Learn more

Genesee Academy

Data Modelers, Data Architects, Business Analysts

2-3 days

Learn more

Longview Analytics Application Design

Analytics and reporting managers

1 day

Learn more

Keen to Learn More?

In addition to our training courses, we have a range of other exciting resources for you, such as white papers, webinars, and our blog. Find out how we can support you with practical expertise on all aspects of data & analytics.

Expertise on Demand

Discover white papers and webinars designed to make a difference: practical insights, best practices and the latest trends in data & analytics, provided by experienced experts.

Tips & Trends in Our Blog

From data strategy to AI - you will find practical tips and in-depth insights that will help you get ahead. Get inspired and discover solutions for your data-driven future.