Joining two tables on a list of values
Hello, I would like to know if is there some (efficient!) way to join two tables on a list of values contained in (likely a single column) of the joined table, whatever form could have this list of values ? For example, I have a main table containing: main_table: +------------+------------------+------+-----+------------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+------------------+------+-----+------------+----------------+ | index | int(11) | NO | PRI | 0 | | | to_join | int(11) | NO | | 0 | | | <other fields, irrelevant> | | || | +------------+------------------+------+-----+------------+----------------+ … and a secondary table containing: joined_table: +------------+------------------+------+-----+------------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+------------------+------+-----+------------+----------------+ | index | int(11) | NO | PRI | | auto_increment | | int_list | <list_of_ints> | YES | | (empty) | | | <other fields, irrelevant> | | | | | +------------+------------------+------+-----+------------+----------------+ We could suppose, here, as a simple test case, that main_table has five rows where to_join contains respectively 1, 2, 3, 4, and 5, and the joined_table has a single row with the single column int_list combining 2 and 4 in any form (but of course, this list could actually have any length). I'd like to join these tables using a statement like: SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON M.to_join <some_operator to> J.int_list; … in such a way that main_table would be joined to joined_table using any value contained in int_list, and the statement is expected to give two rows as a result. I tried to set int_list as a TINYTEXT containing "2,4" and use: SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON M.to_join IN (J.int_list); … but this doesn't work, of course, my comma separated list is seen as a whole single value. I also tried, still using TINYTEXT, the list as "[2][4]" and use: SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON J.int_listLIKE CONCAT('%[', M.to_join , ']%'); This works (more or less), but it is excruciatingly slow, especially knowing that main_table can actually contain millions of rows, and joined_table several thousands. Is there a better solution for doing that? As the final goal implies, somewhere later, associating to_join in main_table, via int_list, to the index auto increment primary key of the joined_table, using multiple rows there is not an option. Regards, Gingko
Well not really an answer given the requirement you posted but you can break the list in the second table in a temporary table and join that one. so you will go from index, list to index list[0] index list[1] index list[2] ... I have a stored procedure somewhere to split the list but it is slow to do in MariaDB but it will be ok if the number of rows is not very big (some thousands). For these kind of task I prefer to break to awk or shell in general. Hope that helps On 9/13/24 5:09 PM, Gingko via discuss wrote:
Hello,
I would like to know if is there some (efficient!) way to join two tables on a list of values contained in (likely a single column) of the joined table, whatever form could have this list of values ?
For example, I have a main table containing:
main_table: +------------+------------------+------+-----+------------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+------------------+------+-----+------------+----------------+ | index | int(11) | NO | PRI | 0 | | | to_join | int(11) | NO | | 0 | | | <other fields, irrelevant> | | | | | +------------+------------------+------+-----+------------+----------------+
… and a secondary table containing:
joined_table: +------------+------------------+------+-----+------------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+------------------+------+-----+------------+----------------+ | index | int(11) | NO | PRI | | auto_increment | | int_list | <list_of_ints> | YES | | (empty) | | | <other fields, irrelevant> | | | | | +------------+------------------+------+-----+------------+----------------+
We could suppose, here, as a simple test case, that main_table has five rows where to_join contains respectively 1, 2, 3, 4, and 5, and the joined_table has a single row with the single column int_list combining 2 and 4 in any form (but of course, this list could actually have any length).
I'd like to join these tables using a statement like:
SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON M.to_join <some_operator to> J.int_list;
… in such a way that main_table would be joined to joined_table using any value contained in int_list, and the statement is expected to give two rows as a result.
I tried to set int_list as a TINYTEXT containing "2,4" and use:
SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON M.to_join IN (J.int_list);
… but this doesn't work, of course, my comma separated list is seen as a whole single value.
I also tried, still using TINYTEXT, the list as "[2][4]" and use:
SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON J.int_listLIKE CONCAT('%[', M.to_join , ']%');
This works (more or less), but it is excruciatingly slow, especially knowing that main_table can actually contain millions of rows, and joined_table several thousands.
Is there a better solution for doing that?
As the final goal implies, somewhere later, associating to_join in main_table, via int_list, to the index auto increment primary key of the joined_table, using multiple rows there is not an option.
Regards,
Gingko
_______________________________________________ discuss mailing list --discuss@lists.mariadb.org To unsubscribe send an email todiscuss-leave@lists.mariadb.org
Hi, You could use JSON_TABLE() <https://mariadb.com/kb/en/json_table/> if you store the values as JSON. The nice thing about JSON is that MariaDB already has a lot of functions to work with it. The syntax for converting each list in a row into its own row is IMO a bit tricky to get right the first time but it seems to work. Here's an example that I tested it with: CREATE TABLE main_table(id INT PRIMARY KEY NOT NULL, to_join INT NOT NULL); CREATE TABLE joined_table(id INT PRIMARY KEY NOT NULL, int_join JSON NOT NULL); INSERT INTO main_table VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5); INSERT INTO joined_table VALUES (123, '[2,4]'), (456, '[3,5]'), (789, '[1, 5, 3]'); -- This one splits the join table's values into individual rows SELECT jt.*, js_t.* FROM joined_table AS jt CROSS JOIN JSON_TABLE( jt.int_join, '$[*]' COLUMNS( id INT PATH '$' ) ) js_t; -- This one joins it to the main table SELECT mt.id AS main_table_id, jt.id AS joined_table_id, js_t.id AS joined_on FROM joined_table AS jt CROSS JOIN JSON_TABLE( jt.int_join, '$[*]' COLUMNS( id INT PATH '$' ) ) js_t JOIN main_table mt ON (js_t.id = mt.to_join) ORDER BY jt.id; The SQLFiddle for it can be found here: https://sqlfiddle.com/mariadb/online-compiler?id=5f6d5772-815f-48a4-977c-990... Markus Mäkelä On 9/13/24 17:09, Gingko via discuss wrote:
Hello,
I would like to know if is there some (efficient!) way to join two tables on a list of values contained in (likely a single column) of the joined table, whatever form could have this list of values ?
For example, I have a main table containing:
main_table: +------------+------------------+------+-----+------------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+------------------+------+-----+------------+----------------+ | index | int(11) | NO | PRI | 0 | | | to_join | int(11) | NO | | 0 | | | <other fields, irrelevant> | | | | | +------------+------------------+------+-----+------------+----------------+
… and a secondary table containing:
joined_table: +------------+------------------+------+-----+------------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+------------------+------+-----+------------+----------------+ | index | int(11) | NO | PRI | | auto_increment | | int_list | <list_of_ints> | YES | | (empty) | | | <other fields, irrelevant> | | | | | +------------+------------------+------+-----+------------+----------------+
We could suppose, here, as a simple test case, that main_table has five rows where to_join contains respectively 1, 2, 3, 4, and 5, and the joined_table has a single row with the single column int_list combining 2 and 4 in any form (but of course, this list could actually have any length).
I'd like to join these tables using a statement like:
SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON M.to_join <some_operator to> J.int_list;
… in such a way that main_table would be joined to joined_table using any value contained in int_list, and the statement is expected to give two rows as a result.
I tried to set int_list as a TINYTEXT containing "2,4" and use:
SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON M.to_join IN (J.int_list);
… but this doesn't work, of course, my comma separated list is seen as a whole single value.
I also tried, still using TINYTEXT, the list as "[2][4]" and use:
SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON J.int_listLIKE CONCAT('%[', M.to_join , ']%');
This works (more or less), but it is excruciatingly slow, especially knowing that main_table can actually contain millions of rows, and joined_table several thousands.
Is there a better solution for doing that?
As the final goal implies, somewhere later, associating to_join in main_table, via int_list, to the index auto increment primary key of the joined_table, using multiple rows there is not an option.
Regards,
Gingko
_______________________________________________ discuss mailing list --discuss@lists.mariadb.org To unsubscribe send an email todiscuss-leave@lists.mariadb.org
-- Markus Mäkelä, Senior Software Engineer MariaDB Corporation
Hello, and thank you very much. Actually I thought to that one already. But I have reasons to think this doesn't have a lot of chances to be more efficient than the “LIKE” method against TINYTEXT. How could I expect MariaDB to use any index with JSON_TABLE()? Regards, Gingko
*De :* Markus Mäkelä via discuss [mailto:discuss@lists.mariadb.org] *Envoyé :* samedi 14 septembre 2024 à 9:19 AM *Pour :* discuss@lists.mariadb.org *Objet :* [MariaDB discuss] Re: Joining two tables on a list of values
Hi,
You could use JSON_TABLE() <https://mariadb.com/kb/en/json_table/> if you store the values as JSON. The nice thing about JSON is that MariaDB already has a lot of functions to work with it.
The syntax for converting each list in a row into its own row is IMO a bit tricky to get right the first time but it seems to work. Here’s an example that I tested it with:
CREATE TABLE main_table(id INT PRIMARY KEY NOT NULL, to_join INT NOT NULL); CREATE TABLE joined_table(id INT PRIMARY KEY NOT NULL, int_join JSON NOT NULL); INSERT INTO main_table VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5); INSERT INTO joined_table VALUES (123, '[2,4]'), (456, '[3,5]'), (789, '[1, 5, 3]');
-- This one splits the join table’s values into individual rows SELECT jt.*, js_t.* FROM joined_table AS jt CROSS JOIN JSON_TABLE( jt.int_join, '$[*]' COLUMNS( id INT PATH '$' ) ) js_t;
-- This one joins it to the main table SELECT mt.id AS main_table_id, jt.id AS joined_table_id, js_t.id AS joined_on FROM joined_table AS jt CROSS JOIN JSON_TABLE( jt.int_join, '$[*]' COLUMNS( id INT PATH '$' ) ) js_t JOIN main_table mt ON (js_t.id = mt.to_join) ORDER BY jt.id;
The SQLFiddle for it can be found here: https://sqlfiddle.com/mariadb/online-compiler?id=5f6d5772-815f-48a4-977c-990...
Markus Mäkelä
On 9/13/24 17:09, Gingko via discuss wrote:
Hello,
I would like to know if is there some (efficient!) way to join two tables on a list of values contained in (likely a single column) of the joined table, whatever form could have this list of values ?
For example, I have a main table containing:
main_table: +------------+------------------+------+-----+------------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+------------------+------+-----+------------+----------------+ | index | int(11) | NO | PRI | 0 | | | to_join | int(11) | NO | | 0 | | | <other fields, irrelevant> | | | | | +------------+------------------+------+-----+------------+----------------+
… and a secondary table containing:
joined_table: +------------+------------------+------+-----+------------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+------------------+------+-----+------------+----------------+ | index | int(11) | NO | PRI | | auto_increment | | int_list | <list_of_ints> | YES | | (empty) | | | <other fields, irrelevant> | | | | | +------------+------------------+------+-----+------------+----------------+
We could suppose, here, as a simple test case, that main_table has five rows where to_join contains respectively 1, 2, 3, 4, and 5, and the joined_table has a single row with the single column int_list combining 2 and 4 in any form (but of course, this list could actually have any length).
I’d like to join these tables using a statement like:
SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON M.to_join <some_operator to> J.int_list;
… in such a way that main_table would be joined to joined_table using any value contained in int_list, and the statement is expected to give two rows as a result.
I tried to set int_list as a TINYTEXT containing "2,4" and use:
SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON M.to_join IN (J.int_list);
… but this doesn’t work, of course, my comma separated list is seen as a whole single value.
I also tried, still using TINYTEXT, the list as "[2][4]" and use:
SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON J.int_list LIKE CONCAT('%[', M.to_join , ']%');
This works (more or less), but it is excruciatingly slow, especially knowing that main_table can actually contain millions of rows, and joined_table several thousands.
Is there a better solution for doing that?
As the final goal implies, somewhere later, associating to_join in main_table, via int_list, to the index auto increment primary key of the joined_table, using multiple rows there is not an option.
Regards,
Gingko _______________________________________________
discuss mailing list --discuss@lists.mariadb.org To unsubscribe send an email todiscuss-leave@lists.mariadb.org
-- Markus Mäkelä, Senior Software Engineer MariaDB Corporation
_______________________________________________ discuss mailing list --discuss@lists.mariadb.org To unsubscribe send an email todiscuss-leave@lists.mariadb.org
Hi, Any chance you can split that out into a related table? This would also allow adding foreign key constraints, which implicitly enforces you to have appropriate indexes loaded. Kind regards, Jaco On 2024/09/14 14:59, Gingko via discuss wrote:
Hello, and thank you very much.
Actually I thought to that one already.
But I have reasons to think this doesn't have a lot of chances to be more efficient than the “LIKE” method against TINYTEXT.
How could I expect MariaDB to use any index with JSON_TABLE()?
Regards,
Gingko
*De :* Markus Mäkelä via discuss [mailto:discuss@lists.mariadb.org] *Envoyé :* samedi 14 septembre 2024 à 9:19 AM *Pour :* discuss@lists.mariadb.org *Objet :* [MariaDB discuss] Re: Joining two tables on a list of values
Hi,
You could use JSON_TABLE() <https://mariadb.com/kb/en/json_table/> if you store the values as JSON. The nice thing about JSON is that MariaDB already has a lot of functions to work with it.
The syntax for converting each list in a row into its own row is IMO a bit tricky to get right the first time but it seems to work. Here’s an example that I tested it with:
CREATE TABLE main_table(id INT PRIMARY KEY NOT NULL, to_join INT NOT NULL); CREATE TABLE joined_table(id INT PRIMARY KEY NOT NULL, int_join JSON NOT NULL); INSERT INTO main_table VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5); INSERT INTO joined_table VALUES (123, '[2,4]'), (456, '[3,5]'), (789, '[1, 5, 3]');
-- This one splits the join table’s values into individual rows SELECT jt.*, js_t.* FROM joined_table AS jt CROSS JOIN JSON_TABLE( jt.int_join, '$[*]' COLUMNS( id INT PATH '$' ) ) js_t;
-- This one joins it to the main table SELECT mt.id AS main_table_id, jt.id AS joined_table_id, js_t.id AS joined_on FROM joined_table AS jt CROSS JOIN JSON_TABLE( jt.int_join, '$[*]' COLUMNS( id INT PATH '$' ) ) js_t JOIN main_table mt ON (js_t.id = mt.to_join) ORDER BY jt.id;
The SQLFiddle for it can be found here: https://sqlfiddle.com/mariadb/online-compiler?id=5f6d5772-815f-48a4-977c-990...
Markus Mäkelä
On 9/13/24 17:09, Gingko via discuss wrote:
Hello,
I would like to know if is there some (efficient!) way to join two tables on a list of values contained in (likely a single column) of the joined table, whatever form could have this list of values ?
For example, I have a main table containing:
main_table: +------------+------------------+------+-----+------------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+------------------+------+-----+------------+----------------+ | index | int(11) | NO | PRI | 0 | | | to_join | int(11) | NO | | 0 | | | <other fields, irrelevant> | | | | | +------------+------------------+------+-----+------------+----------------+
… and a secondary table containing:
joined_table: +------------+------------------+------+-----+------------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+------------------+------+-----+------------+----------------+ | index | int(11) | NO | PRI | | auto_increment | | int_list | <list_of_ints> | YES | | (empty) | | | <other fields, irrelevant> | | | | | +------------+------------------+------+-----+------------+----------------+
We could suppose, here, as a simple test case, that main_table has five rows where to_join contains respectively 1, 2, 3, 4, and 5, and the joined_table has a single row with the single column int_list combining 2 and 4 in any form (but of course, this list could actually have any length).
I’d like to join these tables using a statement like:
SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON M.to_join <some_operator to> J.int_list;
… in such a way that main_table would be joined to joined_table using any value contained in int_list, and the statement is expected to give two rows as a result.
I tried to set int_list as a TINYTEXT containing "2,4" and use:
SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON M.to_join IN (J.int_list);
… but this doesn’t work, of course, my comma separated list is seen as a whole single value.
I also tried, still using TINYTEXT, the list as "[2][4]" and use:
SELECT <anything> FROM main_table AS M LEFT JOIN joined_table AS J ON J.int_list LIKE CONCAT('%[', M.to_join , ']%');
This works (more or less), but it is excruciatingly slow, especially knowing that main_table can actually contain millions of rows, and joined_table several thousands.
Is there a better solution for doing that?
As the final goal implies, somewhere later, associating to_join in main_table, via int_list, to the index auto increment primary key of the joined_table, using multiple rows there is not an option.
Regards,
Gingko _______________________________________________
discuss mailing list --discuss@lists.mariadb.org To unsubscribe send an email todiscuss-leave@lists.mariadb.org
-- Markus Mäkelä, Senior Software Engineer MariaDB Corporation
_______________________________________________ discuss mailing list --discuss@lists.mariadb.org To unsubscribe send an email todiscuss-leave@lists.mariadb.org
_______________________________________________ discuss mailing list --discuss@lists.mariadb.org To unsubscribe send an email todiscuss-leave@lists.mariadb.org
This scales badly but one possible option depending on use case. int_list would be comma separated values stored as a string ("1,2,3,4"). And couldn't really find any info is there a limit on the length of the string find_in_set can handle (tried with int_list containing 6000 numbers comma separated and worked). SELECT t1.index,t1.to_join,t2.index FROM main_table AS t1 LEFT JOIN joined_table AS t2 ON FIND_IN_SET(t1.to_join,t2.int_list)>0 -- Kaj
participants (5)
-
Gingko
-
Jaco Kroon
-
kpesio@gmail.com
-
Markus Mäkelä
-
Vassilis Virvilis