Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add MATCH operator for faster IN lookups #35

Open
cldellow opened this issue Jul 19, 2018 · 1 comment
Open

add MATCH operator for faster IN lookups #35

cldellow opened this issue Jul 19, 2018 · 1 comment

Comments

@cldellow
Copy link
Owner

See discussion at #34.

SQLite handles SELECT * FROM tbl WHERE col IN (a, b, c) by unioning three queries (col = a, col = b, col c). The overhead to seek to a rowgroup and uncompress a large batch just for a single match is inefficient.

Instead, let's implement the MATCH function. eg:

-- for a numeric field
SELECT * FROM tbl WHERE col MATCH '1,2,3'

-- for a string field
SELECT * FROM tbl WHERE col MATCH '''a'',''b'',''c'''

This should support dynamically constructing the clause, e.g.:

SELECT * FROM tbl WHERE col MATCH (SELECT group_concat(quote(value)) FROM lookup WHERE pred)

Ideally we'd support integers, doubles and strings. Integers are highest priority since doubles are less likely to be used as a join key and strings have reasonable support via LIKE already.

Also it'd be best to support this at both the row group and row level.

@cldellow
Copy link
Owner Author

cldellow commented Jul 20, 2018

By providing a single string that specifies multiple values, this would also make it possible to express variadic IN style queries in datasette's canned query feature.

ATM you need to do a slow/hacky workaround: instr(',35300027,35300028,', ',' || uid || ',') -- would be much cleaner to write uid MATCH '35300027,35300028'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant