I ran into an error today. The query is as follows:
["WITH distinct_location AS(
SELECT MIN(city.city_id) AS city_id, location::TEXT as location_str FROM choicelist.city
WHERE city.name ~ ?
GROUP BY location::TEXT
)
SELECT * FROM choicelist.city AS city
INNER JOIN distinct_location USING (city_id)
ORDER BY city.city_id;" "EU Only)"]
Which gives this very concerning error message: org.postgresql.util.PSQLException: ERROR: invalid regular expression: parentheses () not balanced
This sounds an awful lot like sql injection...indeed, the input somehow leaked into the SQL query; what library is it?
It's generated by HugSQL
And the query indeed works fine as long as the parameter doesn't have unbalanced parentheses
So, the chain is HugSQL -> jdbc.next -> PostgreSQL JDBC
have you looked into this? https://www.hugsql.org/#faq-sql-injection
Yes, and according to that, value parameters get deferred "to the underlying database library to perform SQL parameter binding to prevent SQL injection issues" which would be jdbc.next in this case
In order to debug this, can you provide the original HugSQL query and the call to the generated HugSQL function?
Sure, the original HugSQL query is:
-- :name get-city-by-name :? :1
WITH distinct_location AS(
SELECT MIN(city.city_id) AS city_id, location::TEXT as location_str FROM choicelist.city
WHERE city.name ~ :name
GROUP BY location::TEXT
)
SELECT * FROM choicelist.city AS city
INNER JOIN distinct_location USING (city_id)
ORDER BY city.city_id;
And the call: (get-city-by-name db {:name "EU Only)"})
(or, for the sake of getting that intermediate output, (get-city-by-name-sqlvec {:name "EU Only)"})
)
So, the error is in the regular expression syntax expected by Postgresql. invalid regular expression: parentheses () not balanced
Ok, so no injection going on then
That's a relief
correct
I would've been very surprised by the way, this would be such a basic thing, pretty sure this is exactly what the parametrized queries are for in the first place
Now to find out why PostgreSQL wants to treat it as a regex... The database field type is CITEXT
, a case-insensitive text field
Ah lol, of course, because of the ~
operator
Alright, thank you very much for helping me figure things out!
you’re welcome!