SQL string constant gotcha

There's a little SQL gotcha involving string constants that had me scratch my head for a while. I explain what it is and how to avoid it.

I was working on a project that uses PostgreSQL and as part of my task I needed to write a migration. The migrations are written as plain SQL and applied using the migrate library. The migration itself was not that complex, some rows needed to be updated where a column matched one of the values in a list.

In my rush, I’ve opted for a rather simple query that went something like this:

UPDATE some_table SET some_column = 'some_value'
WHERE some_other_column IN (
    'value_1',
    'value_2',
    'value_3',
    'value_4',
    -- ...
    'value_26',
    'value_27'
    'value_28',
    'value_29'
);

When the migration got executed on the testing environment, no errors were reported and all looked well. However, after some time I noticed that the migration failed to update some rows that I had anticipated.

The observant readers probably noticed that there’s a missing comma after value_27 in the previous snippet and this is indeed what caused the issue. What surprised me is that the query did not result in an error. Why is that? Let’s explore.

I’ll create a simple table and insert a few entries.

CREATE TABLE sessions(id SERIAL, browser varchar(50));

INSERT INTO sessions(browser)
VALUES ('chrome'), ('brave'), ('firefox');

Running the following query returns 3 rows, as expected.

SELECT * FROM sessions 
WHERE browser in (
	'chrome',
	'brave',
	'firefox'
);

-------------
-- id|browser|
-- --+-------+
--  1|chrome |
--  2|brave  |
--  3|firefox|

However, if I fail to include a comma after brave the resulting query returns a single row:

SELECT * FROM sessions 
WHERE browser in (
	'chrome',
	'brave'
	'firefox'
);

-------------
-- id|browser|
-- --+-------+
--  1|chrome |

The query does not result in an error and still yields some results. To better understand what is happening we can further simplify this example to a simple SELECT:

SELECT 'brave'
'firefox';

-- -------------
-- ?column?    |
-- ------------+
-- bravefirefox|

The original query failed to include the sessions with brave and firefox browsers because brave and firefox got concatenated resulting in the following query:

SELECT * FROM sessions 
WHERE browser in (
	'chrome',
	'bravefirefox'
);

What I find odd is that this behavior only exists if you include at least one newline between the two string constants. This means that

SELECT 'brave' 'firefox';
results in a syntax error.

This behavior is documented in the PostgreSQL documentation:

This slightly bizarre behavior is specified by SQL; PostgreSQL is following the standard.

After scratching my head for a bit, I then went on to try and figure out a way to minimize the chance of this happening to me in the future. If you’re working with strings using ANY and providing an array as a parameter to it should keep you safe:

SELECT * FROM sessions 
WHERE browser = ANY('{
  "chrome",
  "brave",
  "firefox"
}');

Should you omit a comma it will let you know that you’ve got a malformed array literal.

Comments