I was working on a project that uses PostgreSQL and as part of my task I needed to write a migration. The migrations are written as plain SQL and applied using the migrate library. The migration itself was not that complex, some rows needed to be updated where a column matched one of the values in a list.
In my rush, I’ve opted for a rather simple query that went something like this:
UPDATE some_table SET some_column = 'some_value'
WHERE some_other_column IN (
'value_1',
'value_2',
'value_3',
'value_4',
-- ...
'value_26',
'value_27'
'value_28',
'value_29'
);
When the migration got executed on the testing environment, no errors were reported and all looked well. However, after some time I noticed that the migration failed to update some rows that I had anticipated.
The observant readers probably noticed that there’s a missing comma after value_27
in the previous snippet and this is indeed what caused the issue. What surprised me is that the query did not result in an error. Why is that? Let’s explore.
I’ll create a simple table and insert a few entries.
CREATE TABLE sessions(id SERIAL, browser varchar(50));
INSERT INTO sessions(browser)
VALUES ('chrome'), ('brave'), ('firefox');
Running the following query returns 3 rows, as expected.
SELECT * FROM sessions
WHERE browser in (
'chrome',
'brave',
'firefox'
);
-------------
-- id|browser|
-- --+-------+
-- 1|chrome |
-- 2|brave |
-- 3|firefox|
However, if I fail to include a comma after brave
the resulting query returns a single row:
SELECT * FROM sessions
WHERE browser in (
'chrome',
'brave'
'firefox'
);
-------------
-- id|browser|
-- --+-------+
-- 1|chrome |
The query does not result in an error and still yields some results. To better understand what is happening we can further simplify this example to a simple SELECT:
SELECT 'brave'
'firefox';
-- -------------
-- ?column? |
-- ------------+
-- bravefirefox|
The original query failed to include the sessions with brave
and firefox
browsers because brave and firefox got concatenated resulting in the following query:
SELECT * FROM sessions
WHERE browser in (
'chrome',
'bravefirefox'
);
What I find odd is that this behavior only exists if you include at least one newline between the two string constants. This means that
SELECT 'brave' 'firefox';
This behavior is documented in the PostgreSQL documentation:
This slightly bizarre behavior is specified by SQL; PostgreSQL is following the standard.
After scratching my head for a bit, I then went on to try and figure out a way to minimize the chance of this happening to me in the future. If you’re working with strings using ANY
and providing an array as a parameter to it should keep you safe:
SELECT * FROM sessions
WHERE browser = ANY('{
"chrome",
"brave",
"firefox"
}');
Should you omit a comma it will let you know that you’ve got a malformed array literal
.