In most reporting tasks, you need to group your data based on a specific value. For instance, you might need a count of customers in each city. You group the customer count by city and display the results to the viewer. The SQL language incorporates the GROUP BY operator to group data into sections or parts.
Basic GROUP BY Syntax
The following table is our Customer table.
CustomerId |
First_name |
Last_name |
City |
State |
321 |
Frank |
Loe |
Dallas |
TX |
455 |
Ed |
Thompson |
Atlanta |
GA |
456 |
Ed |
Thompson |
Atlanta |
GA |
457 |
Joe |
Smith |
Miami |
FL |
458 |
Frank |
Doe |
Dallas |
TX |
Suppose you want to get a count of customers in each city. In previous examples, we specified a city and returned records that matched those cities in the WHERE clause. In this example, we just want a count of every customer in each city. To get this count, SQL gives you the GROUP BY phrase.
Review the following SQL statement.
SELECT COUNT(CITY) as CityCount, City
GROUP BY City
The result is the following data set.
CityCount |
City |
2 |
Atlanta |
2 |
Dallas |
1 |
Miami |
Notice that we used a city alias for the COUNT function. The CityCount alias makes it easier to read, and you can use these column names in your report. Next, you have the City column. Notice that the Atlanta and Dallas cities are only listed once in the result set even though it's in two records. The COUNT column tells you that the city is recorded twice, and then shows you only the city once because you grouped your values.
Incidentally, if you do not group the city column in your SQL statement, SQL returns an error. When you list columns with aggregate functions, you must include those columns in your GROUP BY clause. For instance, the following SQL statement will give you an error and tell you that you must include City in a GROUP BY clause.
SELECT COUNT(CITY) as CityCount, City
Using WHERE with GROUP BY
The GROUP BY clause doesn't have to be used without any filtering. SQL lets you use the WHERE clause and filter records with the GROUP BY phrase just like any other SQL query.
Suppose you only want a count of customers within a city that match a specific CustomerId range. This type of query is useful when you have a customer range for a specific date and want to get a count of new customers who signed up during a date range. Since we don't have dates in our Customer table, we can use the CustomerId as the range.
Here is the Customer table again.
CustomerId |
First_name |
Last_name |
City |
State |
321 |
Frank |
Loe |
Dallas |
TX |
455 |
Ed |
Thompson |
Atlanta |
GA |
456 |
Ed |
Thompson |
Atlanta |
GA |
457 |
Joe |
Smith |
Miami |
FL |
458 |
Frank |
Doe |
Dallas |
TX |
The following SQL statement searches and groups customer count by city but bases the search using a WHERE clause filter.
SELECT COUNT(CITY) as CityCount, City
WHERE CustomerId > 400
GROUP BY City
Since we only have five records, you can see that only four records are returned by the WHERE clause. The result is the following data set.
CityCount |
City |
2 |
Atlanta |
1 |
Dallas |
1 |
Miami |
Notice Dallas now only has 1 for the count, because the WHERE clause filtered out a record.
The HAVING Operator: Filtering Groups
The WHERE clause filters records, but sometimes you want to filter groups after the GROUP BY phrase is applied. For instance, you want to see a count of customers where you have more than one customer in that particular city. You need the GROUP BY phrase because it gives you a count by city, but you can't use the WHERE clause since it does not account for counts. The answer is to use the HAVING phrase.
The following code is an example of the HAVING clause.
SELECT COUNT(CITY) as CityCount, City
GROUP BY City
HAVING count(*) > 1
The result of the above SQL statement is the following.
CityCount |
City |
2 |
Atlanta |
2 |
Dallas |
Notice that the Miami record was excluded in this record set. The HAVING filter removed all records where the count was less than 2. If you used the ">=" sign, SQL would include records that have a count of 1.
The following SQL statement includes records that have a count of 1.
SELECT COUNT(CITY) as CityCount, City
GROUP BY City
HAVING count(*) >= 1
The above result is the following.
CityCount |
City |
2 |
Atlanta |
1 |
Dallas |
1 |
Miami |
Notice that the HAVING clause is similar. The clause incorporates the ">=" condition, which includes counts that equal one.
You can use the WHERE clause with the HAVING clause. Suppose you want to get a list of records that not only have a count within a certain range but also meet a certain criteria. You need the WHERE clause for the specific criteria but the HAVING clause also takes care of the counting criteria for the grouping. The following SQL statement incorporates the WHERE clause into your HAVING clause.
SELECT COUNT(CITY) as CityCount, City
WHERE City LIKE ‘Da%'
GROUP BY City
HAVING count(*) >= 1
In the above example, the SQL statement tells the SQL engine to filter out all cities except the ones that start with "Da." In this small sample, you only have two records that start with "Da." You could use this WHERE clause to filter all records except ones that start with the letter "D." The result would give you a data set with cities that started with a specific letter.
One syntax rule for the HAVING, GROUP BY, and WHERE clauses is that they must be set in a specific order. The WHERE clause is first because you want to filter based on the records you return. Think of the WHERE clause as a raw filtering of specific records. Next is the GROUP BY clause. The GROUP BY clause takes the records that you want to return and groups them. After you've grouped the records, you then want to use the HAVING clause, which filters your groups. If you use these phrases out of order, your SQL engine will return an error.
Using the ORDER BY on Your Groups
The ORDER BY clause isn't just for a WHERE clause. You can also use ORDER BY on your grouped records. For instance, you now have a count of records that match a particular HAVING and WHERE condition, but your records will still be sorted in any random order. When you list counts in your record sets, you probably want to then sort them in alphabetical order. Alphabetical order makes the report easier to read than random record sets.
Using the following SQL syntax, you can sort your groups.
SELECT COUNT(CITY) as CityCount, City
WHERE City LIKE ‘Da%'
GROUP BY City
HAVING count(*) >= 1
ORDER BY City
In the above statement, your records are ordered by city, which makes the report easier to read by users. Note that just like other statements, you must place your ORDER BY statement at the end of your SQL statement. Rearranging the order you use for conditions will return a SQL error when you attempt to run the statement.