Sat 29 Jul 2006
Those of you who have been using MySQL for sometime now would know that the MySQL 5.0 Online Reference Manual is not just a manual but also a repository of user comments exploring and solving common and/or deeper problems. If you are stuck with a particular problem that you find unable to frame a SQL for, the comments on the manual pages would usually have a solution.
For example, consider a common scenario: You have some data in table1 and table2, and you want to find the data in table1 which does not exist in table2. This is a typical outer join problem, and an anonymous user comment on the MySQL Manual page for JOIN syntax gives the solution:
SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id WHERE table2.id IS NULL
Now suppose you want to search not from the entire table2, but from a subset of table2 (say CityID = 1). How do you do that? You can try and see for yourself that the following SQL is not the solution:
SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id
WHERE table2.id IS NULL
and table2.CityID = 1
Note that this is because we are looking for records that do not exist in table2 and hence cannot have a where clause on table2 data.
Read the manual page a little further, and another user comment points out:
Conditions for the “right table” go in the ON clause.
Conditions for the “left table” go in the WHERE clause,
except for the joining conditions themselves.
That makes the solution as:
SELECT table1.* FROM table1 LEFT JOIN table2 ON (table1.id=table2.id and table2.CityID = 1)
WHERE table2.id IS NULL
Happy digging!
August 26th, 2006 at 9:48 am
[…] Original post by Nishith and software by Elliott Back […]
August 26th, 2006 at 10:27 pm
[…] Original post by Nishith and software by Elliott Back […]
March 9th, 2008 at 10:08 am
I’m confused why the subset of table2 is useful. If there is no match what does qualifying the join condition provide you?
Could you provide an example.
Thanks,
Brian
March 13th, 2008 at 11:09 am
Hi Brian,
Sometimes such results are needed. Lets say you have a customer table and a purchases table, and you want to target all the customers ‘that have not made a purchase yet’ for a special campaign.
This can typically be solved as:
select distinct cust_id from customers
minus
select distinct cust_id from purchases
However, this approach cannot be used for very large tables (say for telecom data) and in those cases the technique outlined in the post can be used.
Guess I should have explained this in the post itself - as the technique and its usage is non-obvious.
Glad you asked!
April 12th, 2008 at 1:36 am
Hi,
Interesting read. Just wanted to add more to Nishith’s example.
CustomerId, Location
====================
Microsoft, US
Microsoft, India
Microsoft, Japan
Sun, Japan
Sun, India
Sun, US
Purchase
========
Microsoft, US
Sun, India
Now if I want to target all those Customers in Japan who have not yet made a purchase, I would use the condition table2.country=’Japan’. Otherwise I would get both India and Japan for a customer Microsoft, and US and Japan for Sun.
regards
Anant