Apache Spark training for data engineering
Big data enthusiasts with little or no prior knowledge
3 days
Learn about Apache Spark for Data Engineers
“Apache Spark for Data Engineering” is a three-day training course that covers the fundamentals and practical applications of Apache Spark in big data projects. Participants learn how Spark differs from traditional databases and gain insight into using Spark for distributed data processing, performance tuning and implementing ETL processes.
The course includes techniques for connecting Spark to other technologies, working with the DataFrame API in Python and Scala, and topics such as structured streaming and machine learning. Debugging, monitoring and testing Spark applications are also part of the program.
Next Training
On Request
Location
Online
Duration
3 days
Costs
from 1,495 EUR
Information About the Training
Working with big data technologies differs significantly from working with conventional database technologies. The Apache Spark framework opens up many new possibilities in the field of data engineering. The 3-day course “Apache Spark for Data Engineering” teaches the necessary methods and procedures for using Spark for data engineering.
The course covers the necessary technical background, the different application scenarios and the particular characteristics of data processing with Apache Spark on distributed systems. Work with the DataFrame API in Python (and partly in Scala) is demonstrated through various practical examples.
Participants first receive the essential information about Apache Spark. They then learn the practical skills needed to implement data engineering projects successfully. The course covers:
- What is Apache Spark and what position does it occupy in the big data universe?
- Where is it used, and for which use cases?
- Connectivity of Spark with other technologies
- Concepts and consequences of distributed processing with Spark
- Options for running Spark (notebooks, shell, ...)
- DataFrames, Spark SQL
- Performance factors and options for performance tuning
- Debugging and monitoring of applications via the Spark UI
- Configuration of Spark jobs
- Implementation of ETL processes based on the DataFrame API
- Creating dynamic queries with the DataFrame API
- Structured streaming
- Testing of Spark jobs
- High-level concepts of machine learning based on Spark
The course is aimed at anyone interested in big data, data engineering or data science who has little or no prior knowledge of Apache Spark and wants to use it for ETL tasks. Prerequisites are:
- Confident handling of SQL
- Programming experience in Python or Scala (see Jump Start Python pre-course)
A laptop with a VirtualBox client and at least 8 GB RAM is required for the practical exercises.
Request The Training Now
This training consists of six in-depth topic modules, each of which can also be booked separately. If you are interested in a module that has already taken place, please register and we will contact you to arrange a date. We have been a multi-certified Microsoft partner for over 15 years.
Meet Our Trainers
Stefan Seltmann
Management Consultant
Stefan is a graduate psychologist and consultant with a focus on data science & AI. He is an expert in machine learning across various languages and technologies and has used Apache Spark successfully in a range of projects with Python and Scala. As an experienced data science practitioner with more than 15 years of project experience, he knows the world of relational databases as well as big data technologies and looks forward to inspiring course participants to use Apache Spark.
More Trainings
Power Apps & Power Automate
Power Apps & Power Automate users and developers
1 day per session
Infrastructure as Code With Terraform and Azure
Cloud Engineer, Cloud Architect
1 day
Genesee Academy
Data Modelers, Data Architects, Business Analysts
2-3 days
Longview Analytics Application Design
Analytics and reporting managers
1 day
Keen to Learn More?
In addition to our training courses, we have a range of other exciting resources for you, such as white papers, webinars, and our blog. Find out how we can support you with practical expertise on all aspects of data & analytics.
Expertise on Demand
Discover white papers and webinars designed to make a difference: practical insights, best practices and the latest trends in data & analytics, provided by experienced experts.
Tips & Trends in Our Blog
From data strategy to AI: here you will find practical tips and in-depth insights that will help you get ahead. Get inspired and discover solutions for your data-driven future.