Apache Spark training for data engineering

Big data enthusiasts with little or no prior knowledge

3 days

Learn about Apache Spark for Data Engineers

The “Apache Spark for Data Engineering” course is a three-day training course that covers the fundamentals and practical applications of Apache Spark in big data projects. Participants will learn how Spark differs from traditional databases and gain insights into using Spark for distributed data processing, performance tuning, and implementing ETL processes.

The course includes techniques for connecting Spark to other technologies, working with the DataFrame API in Python and Scala, and topics such as structured streaming and machine learning. Debugging, monitoring, and testing Spark applications are also part of the program.

Next Training

On Request

Location

Online

Duration

3 days

Costs

from 1,495 EUR

Information About the Training

Working with big data technologies differs significantly from working with conventional database technologies. The Apache Spark framework opens up many new possibilities in the field of data engineering. The 3-day course “Apache Spark for Data Engineering” teaches the necessary methods and procedures for using Spark for data engineering.

In addition to the necessary technical background, the course covers the different types of application and the special features of distributed data processing with Apache Spark. Working with the DataFrame API in Python (and partly in Scala) is demonstrated through various practical examples.

Course participants first receive all the essential information about Apache Spark; practical skills for successfully implementing data engineering projects are then taught:

  • What is Apache Spark and what position does it occupy in the big data universe?
  • Where is it used for which use cases?
  • Connectivity of Spark with other technologies
  • Concepts and consequences of distributed processing with Spark
  • Possibilities for running Spark (notebooks, shell, …)
  • DataFrames, Spark SQL
  • Performance factors and possibilities of performance tuning
  • Debugging and monitoring of applications via the Spark UI
  • Configuration of Spark jobs
  • Implementation of ETLs based on the DataFrame API
  • Creating dynamic queries with the DataFrame API
  • Structured streaming
  • Testing of Spark jobs
  • High-level concepts of machine learning based on Spark

The course is aimed at anyone interested in big data, data engineering, or data science who has little or no previous knowledge and wants to use Apache Spark for ETL tasks. Prerequisites are:

  • Confident handling of SQL
  • Programming experience in Python or Scala (see Jump Start Python pre-course)

A laptop with a VirtualBox client and at least 8 GB RAM is required for the practical exercises.

Request The Training Now

This training consists of six in-depth topic modules, each of which can also be booked separately. If you are interested in a module that has already taken place, please register and we will contact you to arrange a date. We have been a multi-certified Microsoft partner for over 15 years.

Meet Our Trainers

Stefan Seltmann

Management Consultant

Stefan is a graduate psychologist and consultant with a focus on data science & AI. He is an expert in machine learning across various languages and technologies and already uses Apache Spark successfully in projects with Python and Scala. As an experienced data science practitioner with more than 15 years of project experience, he knows the world of relational databases as well as big data technologies, and he looks forward to inspiring course participants to use Apache Spark.

More Trainings

Power Apps & Power Automate

Power App & Power Automate users and developers

1 day per session

Learn more

Infrastructure as Code With Terraform and Azure

Cloud Engineer, Cloud Architect

1 day

Learn more

Genesee Academy

Data Modelers, Data Architects, Business Analysts

2-3 days

Learn more

Longview Analytics Application Design

Analytics and reporting managers

1 day

Learn more

Keen to Learn More?

In addition to our training courses, we have a range of other exciting resources for you, such as white papers, webinars, and our blog. Find out how we can support you with practical expertise on all aspects of data & analytics.

Expertise on Demand

Discover white papers and webinars designed to make a difference: practical insights, best practices and the latest trends in data & analytics, provided by experienced experts.

Tips & Trends in Our Blog

From data strategy to AI - you will find practical tips and in-depth insights that will help you get ahead. Get inspired and discover solutions for your data-driven future.