The phrases “natural join” and “equi-join” are often used as synonyms, but there is a slight difference between them. The equi-join operation always has one or more pairs of columns that have identical values in every row. The operation that eliminates such columns from the equi-join is called a natural join. Natural join is best explained through the use of an example, so check out Example 6.57
Get full details of each employee; that is, besides the employee’s number, first and last names, and corresponding department number, also get the name of his or her department and its location, with duplicate columns displayed.
The following is the explicit join syntax:
USE sample;
SELECT employee.*, department.*
FROM employee INNER JOIN department
ON employee.dept_no = department.dept_no;
Code language: PHP (php)
The SELECT list in Example 6.57 includes all columns of the employee and department tables. The FROM clause in the SELECT statement specifies the tables that are joined as well as the explicit name of the join form (INNER JOIN). The ON clause is also part of the FROM clause; it specifies the join columns from both tables.The condition employee.dept_no = department.dept_no specifies a join condition, and both columns are said to be join columns.
The equivalent solution with the old-style, implicit join syntax is as follows:
USE sample;
SELECT employee.*, department.*
FROM employee, department
WHERE employee.dept_no = department.dept_no;
Code language: PHP (php)
This syntax has two significant differences from the explicit join syntax: the FROM clause of the query contains the list of tables that are joined, and the corresponding join condition is specified in the WHERE clause using join columns.
The result is
emp_no | emp_fname | emp_lname | dept_no | dept_no | dept_name | location |
---|---|---|---|---|---|---|
25348 | Matthew | Smith | d3 | d3 | Marketing | Dallas |
10102 | Ann | Jones | d3 | d3 | Marketing | Dallas |
18316 | John | Barrimore | d1 | d1 | Research | Dallas |
29346 | James | James | d2 | d2 | Accounting | Seattle |
9031 | Elsa | Bertoni | d2 | d2 | Accounting | Seattle |
2581 | Elke | Hansel | d2 | d2 | Accounting | Seattle |
28559 | Sybill | Moser | d1 | d1 | Research | Dallas |
Example 6.57 can be used to show how a join operation works. Note that this is just an illustration of how you can think about the join process; the Database Engine actually has several strategies from which it chooses to implement the join operator. Imagine each row of the employee table combined with each row of the department table. The result of this combination is a table with 7 columns (4 from the table employee and 3 from the table department) and 21 rows (see Table 4-1).
In the second step, all rows from Table 4-1 that do not satisfy the join condition employee.dept_no = department.dept_no are removed. These rows are prefixed in Table 4-1 with the * sign. The rest of the rows represent the result of Example 6.57.
Table 4-1 Result of the Cartesian Product Between the Tables employee and department
emp_no | emp_fname | emp_lname | dept_no | dept_no | dept_name | location |
---|---|---|---|---|---|---|
*25348 | Matthew | Smith | d3 | d1 | Research | Dallas |
*10102 | Ann | Jones | d3 | d1 | Research | Dallas |
18316 | John | Barrimore | d1 | d1 | Research | Dallas |
*29346 | James | James | d2 | d1 | Research | Dallas |
*9031 | Elsa | Bertoni | d2 | d1 | Research | Dallas |
*2581 | Elke | Hansel | d2 | d1 | Research | Dallas |
28559 | Sybill | Moser | d1 | d1 | Research | Dallas |
*25348 | Matthew | Smith | d3 | d2 | Accounting | Seattle |
*10102 | Ann | Jones | d3 | d2 | Accounting | Seattle |
*18316 | John | Barrimore | d1 | d2 | Accounting | Seattle |
29346 | James | James | d2 | d2 | Accounting | Seattle |
9031 | Elsa | Bertoni | d2 | d2 | Accounting | Seattle |
2581 | Elke | Hansel | d2 | d2 | Accounting | Seattle |
*28559 | Sybill | Moser | d1 | d2 | Accounting | Seattle |
25348 | Matthew | Smith | d3 | d3 | Marketing | Dallas |
10102 | Ann | Jones | d3 | d3 | Marketing | Dallas |
*18316 | John | Barrimore | d1 | d3 | Marketing | Dallas |
*29346 | James | James | d2 | d3 | Marketing | Dallas |
*9031 | Elsa | Bertoni | d2 | d3 | Marketing | Dallas |
*2581 | Elke | Hansel | d2 | d3 | Marketing | Dallas |
*28559 | Sybill | Moser | d1 | d3 | Marketing | Dallas |
The semantics of the corresponding join columns must be identical. This means both columns must have the same logical meaning. It is not required that the corresponding join columns have the same name (or even an identical type), although this will often be the case.
The sample database contains three pairs of columns in which each column of the pair has the same logical meaning (and they have the same names as well). The employee and department tables can be joined using the columns employee.dept_no and department.dept_no. The join columns of the employee and works_on tables are the columns employee.emp_no and works_on.emp_no. Finally, the project and works_on tables can be joined using the join columns project.project_no and works_on.project_no.
The names of columns in a SELECT statement can be qualified. “Qualifying” a column name means that, to avoid any possible ambiguity about which table the column belongs to, the column name is preceded by its table name (or the alias of the table), separated by a period: table_name.column_name.
In most SELECT statements a column name does not need any qualification, although the use of qualified names is generally recommended for readability. If column names within a SELECT statement are ambiguous (like the columns employee.dept_no and department.dept_no in Example 6.57), the qualified names for the columns must be used.
In a SELECT statement with a join, the WHERE clause can include other conditions in addition to the join condition, as shown in Example 6.58.
Get full details of all employees who work on the project Gemini.
Explicit join syntax:
USE sample;
SELECT emp_no, project.project_no, job, enter_date, project_name, budget
FROM works_on JOIN project
ON project.project_no = works_on.project_no
WHERE project_name = 'Gemini';
Code language: PHP (php)
Old-style join syntax:
USE sample;
SELECT emp_no, project.project_no, job, enter_date, project_name, budget
FROM works_on, project
WHERE project.project_no = works_on.project_no
AND project_name = 'Gemini';
Code language: PHP (php)
The result is
emp_no | project_no | job | enter_date | project_name | budget |
---|---|---|---|---|---|
25348 | p2 | Clerk | 2007-02-15 | Gemini | 95000.0 |
18316 | p2 | NULL | 2007-06-01 | Gemini | 95000.0 |
29346 | p2 | NULL | 2006-12-15 | Gemini | 95000.0 |
28559 | p2 | Clerk | 2008-02-01 | Gemini | 95000.0 |
From this point forward, all examples will be implemented using the explicit join syntax only.
Example 6.59 shows another use of the inner join.
Get the department number for all employees who entered their projects on October 15, 2007:
USE sample;
SELECT dept_no
FROM employee JOIN works_on
ON employee.emp_no = works_on.emp_no
WHERE enter_date = '10.15.2007';
Code language: PHP (php)
The result is
dept_no |
---|
d2 |
Joining More Than Two Tables
Theoretically, there is no upper limit on the number of tables that can be joined using a SELECT statement. (One join condition always combines two tables!) However, the Database Engine has an implementation restriction: the maximum number of tables that can be joined in a SELECT statement is 64.
Example 6.60 joins three tables of the sample database.
Get the first and last names of all analysts whose department is located in Seattle:
USE sample;
SELECT emp_fname, emp_lname
FROM works_on JOIN employee ON works_on.emp_no=employee.emp_no
JOIN department ON employee.dept_no=department.dept_no
AND location = 'Seattle'
AND job = 'analyst';
Code language: PHP (php)
The result is
emp_fname | emp_lname |
---|---|
Elke | Hansel |
The result in Example 6.60 can be obtained only if you join at least three tables: works_on, employee, and department. These tables can be joined using two pairs of join columns:
(works_on.emp_no, employee.emp_no)
(employee.dept_no, department.dept_no)
Code language: CSS (css)
Example 6.61 uses all four tables from the sample database to obtain the result set.
Get the names of projects (with redundant duplicates eliminated) being worked on by employees in the Accounting department:
USE sample;
SELECT DISTINCT project_name
FROM project JOIN works_on
ON project.project_no = works_on.project_no
JOIN employee ON works_on.emp_no = employee.emp_no
JOIN department ON employee.dept_no = department.dept_no
WHERE dept_name = 'Accounting';
Code language: PHP (php)
The result is
project_name |
---|
Apollo |
Gemini |
Mercury |
Notice that when joining three tables, you use two join conditions (linking two tables each) to achieve a natural join. When you join four tables, you use three such join conditions. In general, if you join n tables, you need n – 1 join conditions to avoid a Cartesian product. Of course, using more than n – 1 join conditions, as well as other conditions, is certainly permissible to further reduce the result set.
Cartesian Product
The previous section illustrated a possible method of producing a natural join. In the first step of this process, each row of the employee table is combined with each row of the department table. This intermediate result was made by the operation called Cartesian product. Example 6.62 shows the Cartesian product of the tables employee and department.
USE sample;
SELECT employee.*, department.*
FROM employee CROSS JOIN department;
Code language: PHP (php)
The result of Example 6.62 is shown in Table 4-1. A Cartesian product combines each row of the first table with each row of the second table. In general, the Cartesian product of two tables such that the first table has n rows and the second table has m rows will produce a result with n times m rows (or n*m). Thus, the result set in Example 6.62 contains 7*3 = 21 rows.
In practice, the use of a Cartesian product is highly unusual. Sometimes users generate the Cartesian product of two tables when they forget to include the join condition in the WHERE clause of the old-style join syntax. In this case, the output does not correspond to the expected result because it contains too many rows. (The existence of many and unexpected rows in the result is a hint that a Cartesian product of two tables, rather than the intended natural join, has been produced.)