join in SQL

In this post, I'll provide a summary on the usage of join statement in SQL and use Leetcode 175 as a concrete example to show several equivalent join statement usage.

To be honest, this is my third time visiting this material. The very first time happened when I took DB course in college, and the second time was when I joined the federation team at IBM and learned about DB2. Unfortunately, I didn't keep my study notes well in the first two tries and I don't write SQL a lot during my day to day work. Things, again, get rusty very quickly. This time I want to do a better job by, at least, saving my notes in a good place ¹.

Summary
Motivation
- Cartesian product
join
- SQL perspective
  - inner join
  - outer join
    - Motivation
    - outer join
- Relational algebra perspective
Examples
- Leetcode 175
- Example Two
Links to resources

Summary

natural join produces a relation from two relations by considering only those pairs of tuples with the same value on those attributes that appear in the schemas of both relations.
join ... using specifies a subset of common attributes to join.
join ... on specifies a predicate to use on join.
outer join is used when we want to preserve the tuples that may have null value on the common attributes of either or both of the relations that we want to join on.

Motivation

Before we directly jump into the SQL, I want to briefly talks about the motivation for the join statement. Specifically, why do we need it? In short, join is used as a shorthand for a widely-used type of query where we want to equate two columns in two tables in the where clause (e.g., T1.a = T2.a).

To retrieve data from multiple relations (i.e. more than one table), we can either use a cartesian product or join of columns of the same data type.

Cartesian product

The cartesian product happens to the relations listed in the from clause of a SQL. The end result of cartesian product is a relation that has all attributes from all the relations in the from clause. The following iterative process shows how the cartesian product of the relations in the from clause get generated

for each tuple t1 in relation r1
  for each tuple t2 in relation r2
    . . .
    for each tuple tm in relation rm
      Concatenate t1, t2, . . . , tm into a single tuple t
      Add t into the result relation

Another perspective to understand cartesian product is from relation algebra cross-product \(R \times S\), which is defined as: returns a relation instance whose schema contains all the fields of \(R\) (in the same order as they appear in \(R\)) followed by all the fields of \(S\) (in the same order as they appear in \(S\)). The result of \(R \times S\) contains all tuples \((r,s)\) (the concatenation of tuples \(r\) and \(s\)) for each pair of tuples \(r \in R\), \(s \in S\).

To see a concrete example, let's consider the following SQL

select * from teaches, instructor;

The teaches and instructor table look like below

sqlite> select * from teaches;
ID|course_id|sec_id|semester|year
10101|CS-101|1|Fall|2009
10101|CS-315|1|Spring|2010
10101|CS-347|1|Fall|2009
12121|FIN-201|1|Spring|2010
15151|MU-199|1|Spring|2010
22222|PHY-101|1|Fall|2009
32343|HIS-351|1|Spring|2010
45565|CS-101|1|Spring|2010
45565|CS-319|1|Spring|2010
76766|BIO-101|1|Summer|2009
76766|BIO-301|1|Summer|2010
83821|CS-190|1|Spring|2009
83821|CS-190|2|Spring|2009
83821|CS-319|2|Spring|2010
98345|EE-181|1|Spring|2009

sqlite> select * from instructor;
ID|name|dept_name|salary
10101|Srinivasan|Comp. Sci.|65000
12121|Wu|Finance|90000
15151|Mozart|Music|40000
22222|Einstein|Physics|95000
32343|El Said|History|60000
33456|Gold|Physics|87000
45565|Katz|Comp. Sci.|75000
58583|Califieri|History|62000
76543|Singh|Finance|80000
76766|Crick|Biology|72000
83821|Brandt|Comp. Sci.|92000
98345|Kim|Elec. Eng.|80000

teaches table has 15 rows; instructor table has 12 rows. Then, if we run our SQL above, we get our result looks like

sqlite> select * from instructor, teaches limit 20;
ID|name|dept_name|salary|ID|course_id|sec_id|semester|year
10101|Srinivasan|Comp. Sci.|65000|10101|CS-101|1|Fall|2009
10101|Srinivasan|Comp. Sci.|65000|10101|CS-315|1|Spring|2010
10101|Srinivasan|Comp. Sci.|65000|10101|CS-347|1|Fall|2009
10101|Srinivasan|Comp. Sci.|65000|12121|FIN-201|1|Spring|2010
10101|Srinivasan|Comp. Sci.|65000|15151|MU-199|1|Spring|2010
10101|Srinivasan|Comp. Sci.|65000|22222|PHY-101|1|Fall|2009
10101|Srinivasan|Comp. Sci.|65000|32343|HIS-351|1|Spring|2010
10101|Srinivasan|Comp. Sci.|65000|45565|CS-101|1|Spring|2010
10101|Srinivasan|Comp. Sci.|65000|45565|CS-319|1|Spring|2010
10101|Srinivasan|Comp. Sci.|65000|76766|BIO-101|1|Summer|2009
10101|Srinivasan|Comp. Sci.|65000|76766|BIO-301|1|Summer|2010
10101|Srinivasan|Comp. Sci.|65000|83821|CS-190|1|Spring|2009
10101|Srinivasan|Comp. Sci.|65000|83821|CS-190|2|Spring|2009
10101|Srinivasan|Comp. Sci.|65000|83821|CS-319|2|Spring|2010
10101|Srinivasan|Comp. Sci.|65000|98345|EE-181|1|Spring|2009
12121|Wu|Finance|90000|10101|CS-101|1|Fall|2009
12121|Wu|Finance|90000|10101|CS-315|1|Spring|2010
12121|Wu|Finance|90000|10101|CS-347|1|Fall|2009
12121|Wu|Finance|90000|12121|FIN-201|1|Spring|2010
12121|Wu|Finance|90000|15151|MU-199|1|Spring|2010

The result relation has 180 rows, which exactly equal to \(12 \times 15\).

Quite often, we use where clause to restrict the combinations created by the cartesian product to those that are meaningful for the desired answer. For example:

select name, course id
from instructor, teaches
where instructor.ID = teaches.ID;

In this query, we combine information from the instructor and teaches table and the matching condition requires instructor.ID to be equal to teaches.ID. In fact, these are the only attributes in the two relations that have the same name. In general, we may often find us writing SQLs that requires all attributes with matching names to be equated in the where clause. This case is so common that we use join to save us some effort.

join

In this section, I'll talk about join from SQL perspective, and then I'll also present how join is actually defined in relational algebra.

SQL perspective

There are two basic types of join: inner join and outer join. inner keyword is optional. In other words, if only join appears in the SQL statement, we usually assume it to be inner join. Under outer join, we can further specify whether it is left outer join, right outer join, or full outer join. Here is a graphic summary for the text above

- (inner) join
- outer join
   |- left outer join
   |- right outer join
   |- full outer join

In addition, there are join conditions that we can use in combination with the join form mentioned above. Any form of join (inner, left outer, right outer, or full outer) can be combined with any join condition (natural, using, or on). The table below provides a summary of join types and join conditions

Join types	Join conditions
inner join	natural
left outer join	on
right outer join	using (\(A_1, A_2, \dots, A_n\))
full outer join

Then the SQL syntax is

[table1] <natural> [Join types] [table2] <on | using>

inner join

natural join

We start this section by considering natural join. The natural join works on two relations and produces a relation as the result. Unlike the cartesian product of two relations, which concatenates each tuple of the first relation with every tuple of the second, natural join considers only those pairs of tuples with the same value on those attributes that appear in the schemas of both relations.

select * from instructor natural join teaches;

Consider the query above, computing instructor natural join teaches considers only those pairs of tuples where both the tuple from instructor and the tuple from teaches have the same value on the common attribute, ID.

sqlite> select * from instructor natural join teaches;
ID|name|dept_name|salary|course_id|sec_id|semester|year
10101|Srinivasan|Comp. Sci.|65000|CS-101|1|Fall|2009
10101|Srinivasan|Comp. Sci.|65000|CS-315|1|Spring|2010
10101|Srinivasan|Comp. Sci.|65000|CS-347|1|Fall|2009
12121|Wu|Finance|90000|FIN-201|1|Spring|2010
15151|Mozart|Music|40000|MU-199|1|Spring|2010
22222|Einstein|Physics|95000|PHY-101|1|Fall|2009
32343|El Said|History|60000|HIS-351|1|Spring|2010
45565|Katz|Comp. Sci.|75000|CS-101|1|Spring|2010
45565|Katz|Comp. Sci.|75000|CS-319|1|Spring|2010
76766|Crick|Biology|72000|BIO-101|1|Summer|2009
76766|Crick|Biology|72000|BIO-301|1|Summer|2010
83821|Brandt|Comp. Sci.|92000|CS-190|1|Spring|2009
83821|Brandt|Comp. Sci.|92000|CS-190|2|Spring|2009
83821|Brandt|Comp. Sci.|92000|CS-319|2|Spring|2010
98345|Kim|Elec. Eng.|80000|EE-181|1|Spring|2009

Since join is really a shorthand of writing a type of SQL with cartesian product, we can get the same result using select * from instructor, teaches where instructor.ID = teaches.ID;. From above query result we can see that:

We do not repeat those attributes that appear in the schemas of both relations; rather they appear only once (e.g. only one ID column, not two).
The order in which the attributes are listed: first the attributes common to the schemas of both relations, second those attributes unqiue to the schema of the first relation, and finally, those attribute unique to the schema of the second relation.
All the columns from instructor table (4 columns) and teaches (5 columns) table show up in the final result (8 columns in total except ID column showing up once).

In addition, natural join will consider ALL the attributes that appear in the schemas of both relations. Consider the following query

select *
from instructor natural join teaches natural join course;

course table looks like the following

sqlite> select * from course;
course_id|title|dept_name|credits
BIO-101|Intro. to Biology|Biology|4
BIO-301|Genetics|Biology|4
BIO-399|Computational Biology|Biology|3
CS-101|Intro. to Computer Science|Comp. Sci.|4
CS-190|Game Design|Comp. Sci.|4
CS-315|Robotics|Comp. Sci.|3
CS-319|Image Processing|Comp. Sci.|3
CS-347|Database System Concepts|Comp. Sci.|3
EE-181|Intro. to Digital Systems|Elec. Eng.|3
FIN-201|Investment Banking|Finance|3
HIS-351|World History|History|3
MU-199|Music Video Production|Music|3
PHY-101|Physical Principles|Physics|4

instructor has attributes (ID|name|dept_name|salary); teaches has attributes (ID|course_id|sec_id|semester|year); course has attributes (course_id|title|dept_name|credits). The first natural join will first do cartesian product of instructor and teaches and keep the tuples that have the same value on ID. Then, the resulting relation will do the second natural join with course and will keep the tuples that have the same value on course_id and dept_name.

sqlite> select * from instructor natural join teaches natural join course;
ID|name|dept_name|salary|course_id|sec_id|semester|year|title|credits
10101|Srinivasan|Comp. Sci.|65000|CS-101|1|Fall|2009|Intro. to Computer Science|4
10101|Srinivasan|Comp. Sci.|65000|CS-315|1|Spring|2010|Robotics|3
10101|Srinivasan|Comp. Sci.|65000|CS-347|1|Fall|2009|Database System Concepts|3
12121|Wu|Finance|90000|FIN-201|1|Spring|2010|Investment Banking|3
15151|Mozart|Music|40000|MU-199|1|Spring|2010|Music Video Production|3
22222|Einstein|Physics|95000|PHY-101|1|Fall|2009|Physical Principles|4
32343|El Said|History|60000|HIS-351|1|Spring|2010|World History|3
45565|Katz|Comp. Sci.|75000|CS-101|1|Spring|2010|Intro. to Computer Science|4
45565|Katz|Comp. Sci.|75000|CS-319|1|Spring|2010|Image Processing|3
76766|Crick|Biology|72000|BIO-101|1|Summer|2009|Intro. to Biology|4
76766|Crick|Biology|72000|BIO-301|1|Summer|2010|Genetics|4
83821|Brandt|Comp. Sci.|92000|CS-190|1|Spring|2009|Game Design|4
83821|Brandt|Comp. Sci.|92000|CS-190|2|Spring|2009|Game Design|4
83821|Brandt|Comp. Sci.|92000|CS-319|2|Spring|2010|Image Processing|3
98345|Kim|Elec. Eng.|80000|EE-181|1|Spring|2009|Intro. to Digital Systems|3

join ... using

Automatically equate all the attributes with the same name from both schemas of relations may be too strong. Quite often, we may want to do natural join on specific subsets of the common-shared attributes. This leads to our join condition grammar: join ... using (\(A_1, A_2, \dots, A_n\)). The operation requires a list of attribute names to be specified. Both inputs must have attributes with the specified names. Consider the operation \(r_1\) join \(r_2\) using ( \(A_1, A_2\) ). The operation is similar to \(r_1\) natural join \(r_2\), execpt that a pair of tuples \(t_1\) from \(r_1\) and \(t_2\) from \(r_2\) match if \(t_1.A_1 = t_2.A_1\) and \(t_1.A_2 = t_2.A_2\); even if \(r_1\) and \(r_2\) both have an attribute named \(A_3\), it is not required that \(t_1.A_3 = t_2.A_3\).

An example query look like

select name, title
from (instructor natural join teaches) join course using (course_id);

join ... on

Another form of join condition is the on condition, which allows a general predicate over the relations being joined. The predicte is written like a where clause predicate except for the use of the keyword on rahter than where.

For example, the below two queries are equivalent with each other (i.e. gives the same result)

select * from student join takes on student.ID = takes.ID;

select * from student, takes where student.ID = takes.ID;

Note

The query is almost the same as using student natural join takes, except that the "ID" columns twices in the result set.

One question we may ask is why do we need this on operation if it may look like working exactly the same as where clause? The answer is

on conditions behaves different from where conditions when we work with outer join.
SQL query is often more readable by humans if the join condition is specified in the on clause and the rest of the conditions appear in the where clause.

outer join

Motivation

Let's consider student and takes tables look like the below

sqlite> select * from student;
ID|name|dept_name|tot_cred
00128|Zhang|Comp. Sci.|102
12345|Shankar|Comp. Sci.|32
19991|Brandt|History|80
23121|Chavez|Finance|110
44553|Peltier|Physics|56
45678|Levy|Physics|46
54321|Williams|Comp. Sci.|54
55739|Sanchez|Music|38
70557|Snow|Physics|0
76543|Brown|Comp. Sci.|58
76653|Aoi|Elec. Eng.|60
98765|Bourikas|Elec. Eng.|98
98988|Tanaka|Biology|120

sqlite> select * from takes;
ID|course_id|sec_id|semester|year|grade
00128|CS-101|1|Fall|2009|A
00128|CS-347|1|Fall|2009|A-
12345|CS-101|1|Fall|2009|C
12345|CS-190|2|Spring|2009|A
12345|CS-315|1|Spring|2010|A
12345|CS-347|1|Fall|2009|A
19991|HIS-351|1|Spring|2010|B
23121|FIN-201|1|Spring|2010|C+
44553|PHY-101|1|Fall|2009|B-
45678|CS-101|1|Fall|2009|F
45678|CS-101|1|Spring|2010|B+
45678|CS-319|1|Spring|2010|B
54321|CS-101|1|Fall|2009|A-
54321|CS-190|2|Spring|2009|B+
55739|MU-199|1|Spring|2010|A-
76543|CS-101|1|Fall|2009|A
76543|CS-319|2|Spring|2010|A
76653|EE-181|1|Spring|2009|C
98765|CS-101|1|Fall|2009|C-
98765|CS-315|1|Spring|2010|B
98988|BIO-101|1|Summer|2009|A
98988|BIO-301|1|Summer|2010|

Suppose we wish to display a list of all students along with the courses they have taken. We come up a query that looks like

select *
from student natural join takes;

This query actually is not right for our purpose because it will not show the student who takes no course. This is because his ID will only appear in student table not in takes table. If we do natural join, the value of ID will not equal (one is a number and the other is null), which will not show up in our final result set. Example in our case will be student Snow with ID 70557, who has not taken any course.

sqlite> select * from student natural join takes;
ID|name|dept_name|tot_cred|course_id|sec_id|semester|year|grade
00128|Zhang|Comp. Sci.|102|CS-101|1|Fall|2009|A
00128|Zhang|Comp. Sci.|102|CS-347|1|Fall|2009|A-
12345|Shankar|Comp. Sci.|32|CS-101|1|Fall|2009|C
12345|Shankar|Comp. Sci.|32|CS-190|2|Spring|2009|A
12345|Shankar|Comp. Sci.|32|CS-315|1|Spring|2010|A
12345|Shankar|Comp. Sci.|32|CS-347|1|Fall|2009|A
19991|Brandt|History|80|HIS-351|1|Spring|2010|B
23121|Chavez|Finance|110|FIN-201|1|Spring|2010|C+
44553|Peltier|Physics|56|PHY-101|1|Fall|2009|B-
45678|Levy|Physics|46|CS-101|1|Fall|2009|F
45678|Levy|Physics|46|CS-101|1|Spring|2010|B+
45678|Levy|Physics|46|CS-319|1|Spring|2010|B
54321|Williams|Comp. Sci.|54|CS-101|1|Fall|2009|A-
54321|Williams|Comp. Sci.|54|CS-190|2|Spring|2009|B+
55739|Sanchez|Music|38|MU-199|1|Spring|2010|A-
76543|Brown|Comp. Sci.|58|CS-101|1|Fall|2009|A
76543|Brown|Comp. Sci.|58|CS-319|2|Spring|2010|A
76653|Aoi|Elec. Eng.|60|EE-181|1|Spring|2009|C
98765|Bourikas|Elec. Eng.|98|CS-101|1|Fall|2009|C-
98765|Bourikas|Elec. Eng.|98|CS-315|1|Spring|2010|B
98988|Tanaka|Biology|120|BIO-101|1|Summer|2009|A
98988|Tanaka|Biology|120|BIO-301|1|Summer|2010|

More generally, some tuples in either or both of the relations being joined may be "lost" in this way. The outer join operation works in a manner similar to the join operations we studied above, but preserve those tuples that would be lost in a join, by creating tuples in the result containing null values.

outer join

There are three forms of outer join:

The left outer join preserves tuples only in the relation named before (to the left of) the left outer join operation.
The right outer join preserves tuples only in the relation named after (to the right of) the right outer join operation.
The full outer join preserves tuples in both relations.

For our example, the actual query should be

select * 
from student natural left outer join takes;

which returns result that includes student Snow with nulls for the attributes that appear only in the schema of the take relation

sqlite> select * from student natural left outer join takes;
ID|name|dept_name|tot_cred|course_id|sec_id|semester|year|grade
00128|Zhang|Comp. Sci.|102|CS-101|1|Fall|2009|A
00128|Zhang|Comp. Sci.|102|CS-347|1|Fall|2009|A-
12345|Shankar|Comp. Sci.|32|CS-101|1|Fall|2009|C
12345|Shankar|Comp. Sci.|32|CS-190|2|Spring|2009|A
12345|Shankar|Comp. Sci.|32|CS-315|1|Spring|2010|A
12345|Shankar|Comp. Sci.|32|CS-347|1|Fall|2009|A
19991|Brandt|History|80|HIS-351|1|Spring|2010|B
23121|Chavez|Finance|110|FIN-201|1|Spring|2010|C+
44553|Peltier|Physics|56|PHY-101|1|Fall|2009|B-
45678|Levy|Physics|46|CS-101|1|Fall|2009|F
45678|Levy|Physics|46|CS-101|1|Spring|2010|B+
45678|Levy|Physics|46|CS-319|1|Spring|2010|B
54321|Williams|Comp. Sci.|54|CS-101|1|Fall|2009|A-
54321|Williams|Comp. Sci.|54|CS-190|2|Spring|2009|B+
55739|Sanchez|Music|38|MU-199|1|Spring|2010|A-
70557|Snow|Physics|0|||||
76543|Brown|Comp. Sci.|58|CS-101|1|Fall|2009|A
76543|Brown|Comp. Sci.|58|CS-319|2|Spring|2010|A
76653|Aoi|Elec. Eng.|60|EE-181|1|Spring|2009|C
98765|Bourikas|Elec. Eng.|98|CS-101|1|Fall|2009|C-
98765|Bourikas|Elec. Eng.|98|CS-315|1|Spring|2010|B
98988|Tanaka|Biology|120|BIO-101|1|Summer|2009|A
98988|Tanaka|Biology|120|BIO-301|1|Summer|2010|

We mention earlier that on and where behave differently for outer join. Let's consider the following example

select *
from student left outer join takes on true
where student.ID = takes.ID

This gives the following result ²

sqlite> select * from student left outer join takes where student.ID = takes.ID;
ID|name|dept_name|tot_cred|ID|course_id|sec_id|semester|year|grade
00128|Zhang|Comp. Sci.|102|00128|CS-101|1|Fall|2009|A
00128|Zhang|Comp. Sci.|102|00128|CS-347|1|Fall|2009|A-
12345|Shankar|Comp. Sci.|32|12345|CS-101|1|Fall|2009|C
12345|Shankar|Comp. Sci.|32|12345|CS-190|2|Spring|2009|A
12345|Shankar|Comp. Sci.|32|12345|CS-315|1|Spring|2010|A
12345|Shankar|Comp. Sci.|32|12345|CS-347|1|Fall|2009|A
19991|Brandt|History|80|19991|HIS-351|1|Spring|2010|B
23121|Chavez|Finance|110|23121|FIN-201|1|Spring|2010|C+
44553|Peltier|Physics|56|44553|PHY-101|1|Fall|2009|B-
45678|Levy|Physics|46|45678|CS-101|1|Fall|2009|F
45678|Levy|Physics|46|45678|CS-101|1|Spring|2010|B+
45678|Levy|Physics|46|45678|CS-319|1|Spring|2010|B
54321|Williams|Comp. Sci.|54|54321|CS-101|1|Fall|2009|A-
54321|Williams|Comp. Sci.|54|54321|CS-190|2|Spring|2009|B+
55739|Sanchez|Music|38|55739|MU-199|1|Spring|2010|A-
76543|Brown|Comp. Sci.|58|76543|CS-101|1|Fall|2009|A
76543|Brown|Comp. Sci.|58|76543|CS-319|2|Spring|2010|A
76653|Aoi|Elec. Eng.|60|76653|EE-181|1|Spring|2009|C
98765|Bourikas|Elec. Eng.|98|98765|CS-101|1|Fall|2009|C-
98765|Bourikas|Elec. Eng.|98|98765|CS-315|1|Spring|2010|B
98988|Tanaka|Biology|120|98988|BIO-101|1|Summer|2009|A
98988|Tanaka|Biology|120|98988|BIO-301|1|Summer|2010|

Here, left outer join esentially returns a cartesian product of two relations. Since there is no tuple in take with ID = 70577, every time a tuple appears in the outer join with name = "Snow", the values for student.ID and takes.ID must be different, and such tuples would be eliminated by the where clause predicate. Thus, student Snow never appears in the result of the latter query.

Relational algebra perspective

In the previous section, we spend quite some time understanding join from SQL perspective. In this section, we try to under the clause from relational algebra perspective.

In relational algebra, a operator accepts (one or two) relation instances as argument and returns a relation instance as the result: \(f(R_1, R_2) \to R_3\). We have the following operators:

\(\sigma\): select rows from a relation (i.e. \(\sigma_{\text{grade} < B}(takes)\))
\(\pi\): extract columns from a relation (i.e. \(\pi_{\text{ID, name}}(student)\))

Then, for join we have following definitions:

condition joins: \(R \bowtie_c S = \sigma_c (R \times S)\) (i.e. \(\text{student} \bowtie_{\text{student.id < takes.id}} \text{takes}\))

Does this look familar to you? Yes, this exactly corresponds to join ... on ... usage.

equijoin: \(R \bowtie_c S\) ( \(c\) consists solely of equalities) (i.e. \(\text{student} \bowtie_{\text{student.id = takes.id}} \text{takes}\))

This is equivalent to join ... using (...) usage.

natural join: \(R \bowtie S\) (equijoin where equalities are specified on all fields having the same name in \(R\) and \(S\)) (i.e. \(\text{student} \bowtie \text{takes}\)).

Well, this is exactly the natural join usage.

In fact, this perspective is acutally how people explain join to others. There are two excellent pages offer graphical explaination to this concept. Their links are attached in the "Links to resources" section.

Examples

Leetcode 175

Now, let's take a look at leetcode 175. Combine Two tables for practice.

The problem asks us to

Write a SQL query for a report that provides the following information (FirstName, LastName, City, State) for each person in the Person table, regardless if there is an address for each of those people.

"regardless if there is an address for each of those people" is a clear indicator for us to use outer join because we still want to keep all the person even they may not appear in the "Address" table. There are several queries we can write

# Solution 1
select FirstName, LastName, City, State from Person natural left outer join Address

# Solution 2
select FirstName, LastName, City, State from Person left outer join Address on Person.PersonId = Address.PersonId

# Solution 3
select FirstName, LastName, City, State from Person left outer join Address using (PersonId)

# Solution 4
select FirstName, LastName, City, State from Address right outer join Person using (PersonId)

# Solution 5
select FirstName, LastName, City, State from Address right outer join Person on Address.PersonId = Person.PersonId

# Solution 6
select FirstName, LastName, City, State from Address natural right outer join Person

We should have no trouble to understand these queries now.

Example Two

Suppose we have two tables t1 and t2 that are created like:

create table t1 (a int, b char(1));
create table t2 (b char(1), c char(10));

Also, t1.b contains 7 'x' and t2.b contains 3 'x'. If we do select * from t1 inner join t2 on t1.b = t2.b:

what are the columns of the resulting queries?

The result contains 4 columns: a, b, b, c. Note that one difference between join ... on and natural join or join ... using is that join ... on will keep the columns with same attributes from two tables (e.g. b in this case) while the other two will only keep one column only.

How many rows does the result set contain?

Remember join is a special case of cartesian product: only keep the rows that have the same value on the shared attributes between two tables. In this example, since there are 7 'x' in t1 and 3 'x' in t2, we will have 21 rows in the end.

Links to resources

Here are some of the resources I found helpful while preparing this article:

Database System Concepts Chapter 3, 4
DB2 10.1 fundamentals certification exam 610 prep, Part 4
Database Management Systems Chapter 4
Code project page and SO post have nice graphic explanation to this concept.
What is natural join in SQLite?

Tables appeard in this post are from the supplementary resources of the Database System Concepts book. ↩
on true is equivalent to student left outer join takes. ↩