Example Code
The following code demonstrates the solution with an example: all views of the schema EXA_STATISTICS are exported to separate CSV files.
Creating a Connection to Exasol
First, create a connection to the Exasol database. It makes sense to address the full IP range of all nodes; in the code below, the range 11..14 covers the four nodes of the cluster.
CREATE OR REPLACE CONNECTION exa_connection
TO '192.168.6.11..14:8563'
USER 'tech_user'
IDENTIFIED BY 'secret';
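The 11..14 shorthand in the connection string is expanded by the driver into one host entry per node. As a rough illustration of what that expansion means, the following sketch resolves such a range into individual host:port pairs; the helper name expand_dsn_range is invented for this example and is not part of any Exasol API.

```python
import re

def expand_dsn_range(dsn):
    """Expand Exasol's host-range shorthand, e.g.
    '192.168.6.11..14:8563' into one host:port entry per node.
    Illustrative helper only, not an Exasol or pyexasol function."""
    m = re.match(r"^(\d+\.\d+\.\d+\.)(\d+)\.\.(\d+):(\d+)$", dsn)
    if not m:
        # No range shorthand: return the DSN unchanged
        return [dsn]
    prefix, lo, hi, port = m.group(1), int(m.group(2)), int(m.group(3)), m.group(4)
    return ["%s%d:%s" % (prefix, n, port) for n in range(lo, hi + 1)]

print(expand_dsn_range("192.168.6.11..14:8563"))
# -> ['192.168.6.11:8563', '192.168.6.12:8563', '192.168.6.13:8563', '192.168.6.14:8563']
```

Because every node is listed, a client can fail over or balance connections across the whole cluster rather than depending on a single node being up.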
Implementing the Python Script
The second step is to implement the Python script that executes the statements in parallel. The script uses the pyexasol package to open its own session against the cluster via the previously defined connection; DSN, username and password are taken from that connection object. A client name is also set, which can later be used as a filter on the executed statements to check for correct execution. For each statement, the script emits two values: "Success" and the number of affected rows if all went well, or "Failed" together with the statement that could not be executed. The script can then be called from various Lua scripts to execute statements in parallel.
--/
CREATE OR REPLACE PYTHON SET SCRIPT SCRIPTS.PYT_PARALLEL_EXECUTION(stmt VARCHAR(20000))
EMITS(success_state VARCHAR(10), outputs VARCHAR(20000)) AS
import pyexasol

# Open one pyexasol session per script instance, using the
# previously defined connection object
con = exa.get_connection("exa_connection")
C = pyexasol.connect(dsn=con.address, user=con.user, password=con.password,
                     autocommit=True, encryption=True,
                     client_name="PARALLEL_EXECUTION")

def run(ctx):
    while True:
        stmt = ctx.stmt
        try:
            result = C.execute(stmt)
            ctx.emit("Success", str(result.rowcount()))
            C.commit()
        except Exception:
            ctx.emit("Failed", stmt)
            C.rollback()
        if not ctx.next():
            break
    C.close()
/
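The Success/Failed contract of the script can be illustrated outside the database with a mock row context. FakeContext and fake_execute below are invented stand-ins for the UDF context and the pyexasol session, not Exasol or pyexasol APIs; only the control flow of run() mirrors the script above.

```python
class FakeContext:
    """Invented stand-in for the UDF context: iterates input rows
    and collects everything the script emits."""
    def __init__(self, stmts):
        self._stmts = stmts
        self._pos = 0
        self.emitted = []

    @property
    def stmt(self):
        return self._stmts[self._pos]

    def next(self):
        self._pos += 1
        return self._pos < len(self._stmts)

    def emit(self, state, out):
        self.emitted.append((state, out))


def fake_execute(stmt):
    """Pretend executor: 'fails' whenever the statement mentions a bad table."""
    if "BAD_TABLE" in stmt:
        raise RuntimeError("object not found")
    return 42  # pretend row count


def run(ctx):
    # Same control flow as the UDF: one try/except per input row,
    # so a single failing statement does not abort the whole batch.
    while True:
        stmt = ctx.stmt
        try:
            rowcount = fake_execute(stmt)
            ctx.emit("Success", str(rowcount))
        except Exception:
            ctx.emit("Failed", stmt)
        if not ctx.next():
            break


ctx = FakeContext(["EXPORT A ...", "EXPORT BAD_TABLE ...", "EXPORT C ..."])
run(ctx)
print(ctx.emitted)
# -> [('Success', '42'), ('Failed', 'EXPORT BAD_TABLE ...'), ('Success', '42')]
```

The per-row try/except is the design point: the calling Lua script receives a full report of which statements succeeded and which failed, instead of the batch stopping at the first error.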
Implementing the Lua Script
The final step is to implement the Lua script. It creates a table and populates it with the statements that export the EXA_STATISTICS tables, then invokes the Python script to execute those statements, and finally drops the table containing the export statements. Bear in mind that the Python script opens a new session and terminates before the Lua script does; you therefore have to issue COMMITs in the Lua script to ensure that the changes made by the Python script are not overwritten.
--/
CREATE OR REPLACE LUA SCRIPT SCRIPTS.LUA_PARALLEL_EXECUTION () AS
require "string"
nClock = os.clock()
-- Example case: export all stats tables
-- create tmp table
local suc, res = pquery([[CREATE OR REPLACE TABLE TABLES.TEMP_TABLE (i VARCHAR(20000))]])
if suc == true then
output("Temp table created")
output("Execution time: " .. os.clock()-nClock)
elseif suc == false then
output("ERROR: It was not possible to create the temp table")
output("Script stopped")
output("Execution time: " .. os.clock()-nClock)
exit()
end
-- Generate export statements
-- Get Table names
local suc, res = pquery([[SELECT OBJECT_NAME FROM EXA_SYSCAT WHERE SCHEMA_NAME = 'EXA_STATISTICS']])
-- Fill tmp table with statements for parallel execution
suc_sum = 0
for i=1, #res do
exp_stmt = "EXPORT EXA_STATISTICS."..res[i][1].." INTO LOCAL CSV FILE '/tmp/"..res[i][1]..".csv'"
local suc, res1 = pquery([[INSERT INTO TABLES.TEMP_TABLE VALUES(:s)]], {s=exp_stmt})
if suc == true then
suc_sum = suc_sum + 1
elseif suc == false then
output("WARNING: It was not possible to create the following export statement: ")
output(exp_stmt)
end
end
output(suc_sum.." Insert statements created and saved")
-- Commit, as described above, so the changes are not lost when the Python script's session commits
query([[COMMIT]])
-- Execute python script to parallelize the export
res = query([[SELECT SCRIPTS.PYT_PARALLEL_EXECUTION(i) FROM TABLES.TEMP_TABLE GROUP BY iproc()]])
-- Return total number of exported rows
total = 0
for i=1, #res do
local stmt_return = res[i][1]
if stmt_return == "Failed" then
output("WARNING: It was not possible to execute the following statement")
output(res[i][2])
elseif stmt_return == "Success" then
total = total + tonumber(res[i][2])
end
end
-- drop temp Table
local suc, res = pquery([[DROP TABLE TABLES.TEMP_TABLE]])
if suc == true then
output("Temp table dropped")
output("Execution time: " .. os.clock()-nClock)
elseif suc == false then
output("ERROR: Temp table could not be dropped")
end
output("Number of exported rows: " .. total)
output("Script finished successfully")
output("Execution time: " .. os.clock()-nClock)
/
The statements in column i, which are held in the temporary table TEMP_TABLE, are passed to the function. The GROUP BY iproc() clause distributes the statements across the nodes of the cluster; one instance of the Python function is started on each node, and the statements are then executed in parallel.
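The effect of GROUP BY iproc() can be modeled in a few lines: many statements are split into one group per node, and one script instance processes each group. The modulo assignment below is only an illustration; the real assignment follows the physical distribution of the rows across the cluster.

```python
def distribute(statements, n_nodes):
    """Rough model of GROUP BY iproc(): each statement lands in one of
    n_nodes groups, and one script instance runs per group.
    The modulo assignment is illustrative, not Exasol's actual placement."""
    groups = {node: [] for node in range(n_nodes)}
    for i, stmt in enumerate(statements):
        groups[i % n_nodes].append(stmt)
    return groups

# Ten export statements spread over a four-node cluster
stmts = ["EXPORT T%d ..." % i for i in range(10)]
groups = distribute(stmts, 4)
for node, batch in sorted(groups.items()):
    print("node %d runs %d statements" % (node, len(batch)))
```

With four groups, four instances of PYT_PARALLEL_EXECUTION run concurrently, so the wall-clock time approaches the duration of the slowest batch rather than the sum of all statements.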