Subqueries are a query within a query. Subqueries let you return records from another table or database and use the subquery data set to then manipulate records in other parts of your database. Subqueries are SELECT statements nested within your other SQL statements that return a subset of data, which is usually in an external source that you can't use in a standard WHERE or IN clause.
Revisiting a SELECT Statement
Before we get into subqueries, let's revisit the concept of SELECT statements. A SELECT statement returns a list of column values. We'll use the following Customer table and Order table in our examples.
Customer
CustomerId |
First_name |
Last_name |
City |
State |
321 |
Frank |
Loe |
Dallas |
TX |
455 |
Ed |
Thompson |
Atlanta |
GA |
456 |
Ed |
Thompson |
Atlanta |
GA |
457 |
Joe |
Smith |
Miami |
FL |
458 |
Frank |
Doe |
Dallas |
TX |
Order
OrderId |
CustomerId |
Total |
OrderDate |
1 |
321 |
10 |
1/2/2014 |
2 |
455 |
40 |
3/2/2014 |
3 |
456 |
20 |
3/10/2014 |
The following SQL code is a standard SQL statement that gets customer first and last names.
SELECT first_name, last_name
FROM Customer
WHERE State='tx'
If you remember from previous chapters, the WHERE clause filters rows from the returned SELECT statement and only shows records where the customer has a State value of "TX."
Moving On to a Subquery
A subquery is basically a SELECT query. The difference with a subquery is that you must be careful of the number of columns you return. For instance, the following SQL statement has a subquery that will return an error.
SELECT * FROM Customer
WHERE State IN (SELECT * FROM Order)
The error is in the WHERE clause. The IN condition asks for a State value. The subquery returns all rows from the Order table, so the outer IN clause can't identify which column it's supposed to query against.
The subquery should return one query for comparison. For instance, suppose you wanted to return a list of customers that were also in the Order table. You can see that the CustomerId is also in the Order table, so you can use a subquery to return a list of customer ID records and then use the data set to filter the Customer records.
The following SQL statement gives you an example.
SELECT * FROM Customer
WHERE CustomerId IN (SELECT CustomerId FROM Order)
Notice that the inner subquery returns only one column – the CustomerId column. This column is then used to filter records in the main, outer query in the Customer table. The result is the following data set.
CustomerId |
First_name |
Last_name |
City |
State |
321 |
Frank |
Loe |
Dallas |
TX |
455 |
Ed |
Thompson |
Atlanta |
GA |
456 |
Ed |
Thompson |
Atlanta |
GA |
The returned data set matches the customer records in the Order table.
Since a subquery is the same as a regular SELECT statement, you can also add all of the conditions and clauses we've learned so far. First, you can add a WHERE clause to the SELECT statement.
Suppose you want a list of customers that placed orders in March. The order date is in the Order table, but you only have the customer ID in your Order table. You need the Customer table as the outer query and the subquery on the Order table as the inner query. The following code performs the query.
SELECT * FROM Customer
WHERE CustomerId IN (SELECT CustomerId FROM Order WHERE OrderDate BETWEEN ‘3/1/2014' AND ‘3/31/2014')
When the above statement runs, the SQL engine first runs the subquery SELECT statement. This subquery returns a record set with just a list of customer IDs that match the filtered WHERE clause. In this example, the results are any orders created in March. The inner subquery returns two records for March or two customer ID records.
Next, the outer SELECT query is executed. The asterisk is used to return all columns for this query, which is valid in an outer SELECT. The SQL engine matches customer ID records returned by the subquery and returns the filtered results. The results are the following data set.
CustomerId |
First_name |
Last_name |
City |
State |
455 |
Ed |
Thompson |
Atlanta |
GA |
456 |
Ed |
Thompson |
Atlanta |
GA |
Even though the customer information is the same, the customer IDs are different. The result is a data set that contains two records 455 and 456.
Querying External Databases
Subquery statements are often used to query external databases. Your databases are organization structures that contain tables. These tables contain your data. For instance, you have an ecommerce store in one database and a content database to separate the ecommerce data from the content data. Suppose the ecommerce database name is "ecommerce" and the content database name is "content."
Let's set up the data environment for the content database. You have a table for comments on your articles or products. This table is called "Comment" in the content database and it contains the customer ID for the comment.
Here is a sample Comment table.
CustomerId |
ArticleId |
Comment |
457 |
4 |
Test comment. |
458 |
5 |
This product was not for me. |
You're tasked with getting a list of customers who have made comments on your products. You have a table in the Content database that contains comments, so you can use a subquery to grab those records and use them to filter against your main Customer table. The following SQL statement performs your query.
SELECT * FROM Ecommerce.Customer
WHERE Ecommerce.Customer.CustomerId IN (SELECT Content.Comment.CustomerId FROM Content.Comment)
In the above statement, some identifiers are used. Instead of assuming the query runs against tables in the current database, the SQL statement specifies the database name, the table name and the column name. The first identifier is the database name. The outer query uses the Ecommerce database and the inner query uses the Content database.
The next identifier is the table name. "Ecommerce.Customer" means "use the Customer table in the Ecommerce database." In the inner query, the Content.Comment database and table name are identified. The CustomerId identifier is the column name returned from the inner query. Because you have two comments in the Comment table, the SQL statement returns two records. The following record set is returned.
CustomerId |
First_name |
Last_name |
City |
State |
457 |
Joe |
Smith |
Miami |
FL |
458 |
Frank |
Doe |
Dallas |
TX |
When the SQL statement returns records, it returns the columns and data specified in the outer SQL statement, so you see the data from the Customer table and not the Comment table even though SQL does return the customer ID from the Comment table.
Subqueries are also useful when you want to INSERT or UPDATE columns. You've seen how to use a subquery to INSERT data from one table to another. The same type of subquery is also used to UPDATE data. Suppose you add a date column named "OrderDate" to your Customer table. Your Customer table now looks like the following.
CustomerId |
First_name |
Last_name |
City |
State |
OrderDate |
321 |
Frank |
Loe |
Dallas |
TX |
|
455 |
Ed |
Thompson |
Atlanta |
GA |
|
456 |
Ed |
Thompson |
Atlanta |
GA |
|
457 |
Joe |
Smith |
Miami |
FL |
|
458 |
Frank |
Doe |
Dallas |
TX |
You need to add an OrderDate to each customer, but you want to use the date found in your Order table. Look at the following SQL statement.
UPDATE Customer
SET OrderDate = (SELECT TOP 1 OrderDate FROM Order WHERE Order.CustomerId=Customer.CustomerId)
The above UPDATE statement has no WHERE clause, so all records are updated with a date. The "TOP 1" clause is new here. The TOP 1 phrase gets only one record from the Order table. If you didn't use the TOP 1 phrase and had more than one record for a particular customer, SQL returns an error since more than one record is returned in your subquery.
The subquery matches the value of the customer ID in the Customer table with the customer ID in the Order table. This result ensures that you get a date that matches the right customer ID. The subquery is then used to change the value in the Customer OrderDate column. The subquery only gets one record, so if you have multiple records there is no guarantee that the right date will be updated into the Customer table. You can add a WHERE clause to the inner subquery to get the first or last date from the Order table. You can also use the ORDER BY clause to your inner query. Remember that the inner query is a standard SELECT statement that can have any WHERE clause to ensure you return the right data.
The following data set shows your new Customer table.
CustomerId |
First_name |
Last_name |
City |
State |
OrderDate |
321 |
Frank |
Loe |
Dallas |
TX |
1/2/2014 |
455 |
Ed |
Thompson |
Atlanta |
GA |
3/2/2014 |
456 |
Ed |
Thompson |
Atlanta |
GA |
3/10/2014 |
457 |
Joe |
Smith |
Miami |
FL |
NULL |
458 |
Frank |
Doe |
Dallas |
TX |
NULL |
Notice that the last two records have NULL for date values. Look at your Order table. There is no order for customers 457 and 458. The result of the inner subquery is NULL, so NULL is updated into your table. If you don't allow NULL values in your table column, an error will be returned instead of running the SQL statement successfully.
Subqueries can be used in any of your SQL statements. Think of subqueries as standard SELECT queries, build your statements, and then add them to your outer SQL statements.