Update 2015-05-25: see the video tutorial at the end of this article.
A dynamic SQL query is a query that operates accordingly through parameters.
Sometimes it is necessary to query the database filtering the rows in base at the user input. Like in a search, an user would choose several filters and not others.
The examples are in PostgreSQL and Python, but can be easily adapted to any database and language.
Imagine a table of customers:
CREATE TABLE "customers" ( "id" serial PRIMARY KEY, "name" varchar(64) NOT NULL, "active" bool NOT NULL, "country" varchar(32) );
The classic approach
The classic approach is to compose the SQL joining strings and conditional constructs, for example:
# Function that performs the query # The parameters have null defaults def customers_filtered(active=None, country=None): # This is the beginning of the query query = ' SELECT * FROM customers WHERE' filters = [] # Add the active filter, if not null if active is not None: filters.append('active=%(active)s') # Add the country filter, if not null if country is not None: filters.append('country=%(country)s') # Join all the filter queries chunks with an AND query += ' AND '.join(filters) # Prepare the parameters parameters = { 'active': active, 'country': country } # Execute the query and return the results return execute(query, parameters)
The resulting code is long, the query takes times to write and is unclear to read.
The database have to make an execution plan before executing the query. The execution plan is re-used when the SQL query is static, but this query is dynamic so the execution plan will be not re-used a second time and making the query a bit slower.
Optional WHERE clause
There is a solution with a negligible impact on perfomances, much easier to write and to read. The trick is to use NULL conditionals, so the query will be:
# Function that performs the query # The parameters have the null defaults def customers_filtered(active=None, country=None): # This is the full query! query = ' SELECT * FROM customers WHERE (%(active)s IS NULL OR active=%(active)s) AND (%(country)s IS NULL OR country=%(country)s)' # Prepare the parameters parameters = { 'active': active, 'country': country } # Execute the query and return the results return execute(query, parameters)
The trick is to validate the right part of the OR clause only if the parameter is not NULL.
Optional JOINs
A similar approach can be applied for the JOIN clauses. First, create the “orders” table:
CREATE TABLE "orders" ( "id" serial PRIMARY KEY, "customer_id" int NOT NULL, "date" timestamp NOT NULL, "value" int NOT NULL );
It is possible to use a dynamic JOIN in order to get or not expensive data to calculate. The following SUM will be performed only if needed, setting the “flag” as true:
SELECT u.id, u.name, u.active, SUM(o.value) AS total FROM customers AS u LEFT JOIN orders AS o ON (true=%(flag)s AND o.customer_id=u.id) GROUP BY u.id
Also in this case the performance impact is risible and the query remains very clear to read.
Conclusions
The classic method of composing the queries can be a source of bugs and headaches, this should be avoided. These solutions can reduce the code and increase the readability, especially when the ORM cannot be used or with the CQRS pattern (Command Query Responsibility Separation).
A video tutorial
Nat Dunn has created a nice video tutorial inspired by this article, it is part of his Python course:
Discussion on
Hacker News: https://news.ycombinator.com/item?id=8842677
Reddit: http://www.reddit.com/r/Database/comments/2rh0ak/dynamic_pure_sql_where_clauses_and_optional_joins/
Twitter: https://twitter.com/davmuz_en/status/552295916385087489