All connectors include an execute function which assumes an SQL statement as a string parameter and which can be executed on the part of the database. However, the use of Python doesn’t really make sense until SQLs are generated dynamically and data-driven.
It is at this point that I would like to interrupt and demonstrate various alternatives – beginning with the most simple but also the least intelligent methods – and end with a best practice for how SQL strings should be transmitted.
To begin with, I first have to make it clear that every alternative except for the final one constitute potentially dangerous gaps in security. It is possible, if no other security precautions have been taken, that sensitive data may be retrieved or even deleted.
The Most Naive And Most Dangerous Approach: String Concatenation
First of all, we create a test database in SQLite. The reasons I am using SQLite for the demonstration are that SQLite comes with Python, it is possible to create databases directly in the memory for the runtime of the script meaning that the examples can therefore be replicated for each one. However, I can only guarantee error-free execution of the examples starting with Python 3.x
import sqlite3
db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE staff (person_id int, lastname CHAR); ")
db.execute("INSERT INTO staff (person_id, lastname) VALUES (1, 'Pavlov') ")
db.execute("INSERT INTO staff (person_id, lastname) VALUES (2, 'Skinner') ")
In the sample code, a nameless database was initialised in the memory by entering ':memory:' as the storage location in the connect command after importing the module.
Then an employee spreadsheet named "staff" is generated and filled with the first set of data.
So far, so good. But we don’t want to have to write an entire insert command for each and every employee that we are going to add to the spreadsheet.
If the names of the employees are already available as a list, then this really lends itself to using a loop.
First Attempt:
db.execute("CREATE TABLE staff (person_id int, lastname CHAR); ")
for person_id, lastname in enumerate(staff_names):
db.execute("INSERT INTO staff (person_id, lastname) VALUES (" + person_id + ", '" + lastname + "') ")
Despite all the good intentions, it failed. The error notification "TypeError: Can't convert 'int' object to str implicitly" implies that we forgot to cast the data type of the person_id from integer to str. Although Python is flexible in almost everything, it is still a very strongly typified language and immutable strings cannot be combined with integers.
For this, the compiler will have to get up earlier. Next attempt:
db.execute("CREATE TABLE staff (person_id int, lastname CHAR);")
for person_id, lastname in enumerate(staff_names):
db.execute("INSERT INTO staff (person_id, lastname) VALUES (" + str(person_id) + ", '" + lastname + "') ")
Well, it is running but it doesn’t look nice at all. Especially when a great many of these concatenations are used in the code, then the statement is very fragmented. What’s more, I always have to take care of the type conversion myself.
The Old Approach: String Templates With %s
When leafing through Python literature, even the more current stuff, and reading various forum entries in Stack Overflow or anywhere else, techniques can be seen that look like this in our example:
db.execute("CREATE TABLE staff (person_id int, lastname CHAR); ")
for person_id, lastname in enumerate(staff_names):
db.execute("INSERT INTO staff (person_id, lastname) VALUES (%d, '%s') " % (person_id, lastname))
Works like a dream. "%d" is the placeholder for a digit and "%s" is the placeholder for a string.
If, however, this value is required multiple times when using this notation, it becomes rather jumbled again. Let us imagine an example in which we check various conditions in a query.
sql = """SELECT lastname , CASE WHEN %d > 10 THEN 'greater' WHEN %d = 10 THEN 'equal' WHEN %d < 10 THEN 'lesser' END vergleich FROM staff WHERE lastname <> '%s' and %d > 0 """ % (person_id, person_id, person_id, lastname, person_id)
As soon as we enter another placeholder here, the risk of error grows and we have to count off the positions in the code every time. Here it is better to select appointed placeholders.
sql = """SELECT lastname , CASE WHEN %(person_id)d > 10 THEN 'greater' WHEN %(person_id)d = 10 THEN 'equal' WHEN %(person_id)d < 10 THEN 'lesser' END vergleich FROM staff WHERE lastname <> '%(lastname)d' and %(person_id)d > 0 """ % {'person_id': person_id, 'lastname': lastname)
Wonderful! The code is now much easier to read because we can now see what we insert and where we insert it right away.
This notation poses only one small problem, namely that is considered obsolete and, at least in Python 3, has been replaced by a better one. About 3 or 4 years ago, I read multiple times in forums that this notation was even considered deprecated. This means that it should not be used any more because its continuance in later Python versions is not guaranteed. These days, however, it hasn’t been abandoned yet – presumably because it is still very widespread in modules.
The New Approach: String Templates With {}
The new official notation uses curly brackets. Not only does it look different, it also harbours much more potential as regards formatting options. When the new notation is adopted anyway because it can do more, why not use it consistently?
Let us look at the simple version first:
db.execute("CREATE TABLE staff (person_id int, lastname CHAR); ")
for person_id, lastname in enumerate(staff_names):
db.execute("INSERT INTO staff (person_id, lastname) VALUES ({}, '{}') ".format(person_id, lastname))
It is important to note here that when the appearance plays no role as regards comma placement or leading zeros, etc., then there does not need to be a difference between strings or numerical values for the placeholders. Yes, it is even possible to use tuples, for example:
db.execute("SELECT * FROM staff WHERE person_id in {}".format((1,3,4)))
The format function of the string calls the __str__ method of each object. This then corresponds to each str(object).
There is notation with labels here too but no dictionary is transmitted. Instead, the allocation is written in the form of functional parameters.
db.execute("CREATE TABLE staff (person_id int, lastname CHAR); ")
for person_id, lastname in enumerate(staff_names):
db.execute("INSERT INTO staff (person_id, lastname) VALUES ({person_id}, '{lastname}') ".format(person_id=person_id, lastname=lastname))
Everyone who is lazy about writing can also use tuple packing and tuple unpacking for themselves.
db.execute("CREATE TABLE staff (person_id int, lastname CHAR); ")
for row in enumerate(staff_names):
db.execute("INSERT INTO staff (person_id, lastname) VALUES ({}, '{}') ".format(*row))
row is a tuple in every cycle of the loop because enumerate() returns two values as a tuple. These are then forced into the variable row. With the notation ".format(*row)" the tuple can be unpacked again and the values can be called in the appropriate order.
The same works with dictionaries.
data = {'lastname': 'Mustermann', 'person_id': 12345}
db.execute("INSERT INTO staff (person_id, lastname) VALUES ({person_id}, '{lastname}') ".format(**data))
The most extreme example for wanting to avoid typing is this theoretical example of an insert function.
def insert_staff(person_id, lastname, other_parameter1, other_parameter2):
db.execute("INSERT INTO staff (person_id, lastname) VALUES ({person_id}, '{lastname}') ".format(**globals())
Here, the format command simply retrieves the data from the variables defined in the namespace of the function call, in this case, the functional parameters. I have used this example myself in Python 2.7. In Python 3.x, however, this does not work any more and I believe that it is better this way.
Now, let’s imagine that we are getting a new employee, the fiftieth, named OʼReilly. The name is added quickly:
db.execute("INSERT INTO staff (person_id, lastname) VALUES ({person_id}, '{lastname}') ".format(person_id=50, lastname="O'Reilly"))
SQLite complains here "sqlite3.OperationalError: near "Reilly": syntax error". What is that all about? Quite obviously, the name is added to the SQL statement but the apostrophe in the name is interpreted as the end of the name string, that is " , 'O'Reilly') ".
It is possible to find relief in the middle of a coding frenzy here by deleting the apostrophe. This varies according to database. In SQLite, the apostrophe must be doubled.
db.execute("INSERT INTO staff (person_id, lastname) VALUES ({person_id}, '{lastname}') ".format(person_id=50, lastname="O'Reilly".replace("'", "''")))
Well, that works for now, but it's just a cheap workaround.
So, now we are getting another new employee, the fifty-first, named Mr "');DROP TABLE staff;". Weird name but if that’s how the user entered it then it must be right.
If we wouldn’t be deleting various characters from the event with Mr O’Reilly and if we wouldn’t be working with SQLite, a programme that does not allow 2 statements in an execute string, the query string would appear as follows.
INSERT INTO staff (person_id, lastname) VALUES(51, '');DROP TABLE staff;
The entire spreadsheet was deleted here thanks to unsanitised user input. These SQL injections are real dangers for the security of the database. Just a few years ago, while I was trying things out, I stumbled across major online retailers that didn't even sanitise the apostrophe in their product search.