Summarizing Data
SQL is the language of reporting. When you build reports, you use the SQL language even if you use more point-and-click interfaces such as Microsoft Access that build statements for you. In any reporting system, some kind of SQL is used (except in Big Data concepts where NoSQL is standard). In most reporting requirements, you need to use aggregate functions. Aggregate functions are premade SQL functions that you can use instead of building your own code. There are several functions to choose from, and the syntax depends on your platform. In this example, we'll use MSSQL to show you how to use important SQL functions.
The SUM Function: Adding Values
The SUM function is an aggregate function that adds up all values in a specific column. You can only use the SUM function with numeric values either integers or decimals.
For instance, suppose you have a list of orders in a table. You want to sum up the total order for a specific date. You can use the SUM function to add up the total value of your orders without writing more complex SQL to sum up values for you. The following is an example table that contains a list of orders.
OrderId |
CustomerId |
Total |
OrderDate |
1 |
321 |
10 |
1/2/2014 |
2 |
455 |
40 |
3/2/2014 |
3 |
456 |
20 |
3/10/2014 |
In this example, you want to sum up the total orders for the month of March. The following SQL statement would sum up these orders.
SELECT SUM(Total) FROM Orders
WHERE OrderDate BETWEEN '3/1/2014' AND '3/31/2014'
The return value from the above SQL statement is 60. Your frontend code is responsible for formatting the result. If you notice, a dollar sign is not stored in the table. Unless you set a column as a string column, SQL does not store formatted numbers such as dollar signs with values. You must format the frontend report with the dollar sign and trailing zeros after the decimal point.
In the above example, you use the dates to return the results. You can use any parameters in the WHERE clause in the same way you use any other SELECT statement.
The AVG Function: Average Your Results
The AVG function works in a similar way as SUM. The difference is that the AVG function adds up or sums up all values and then calculates the average. The average is based on the number of records returned by the SQL statement, so you receive different results based on your WHERE clause.
Let's use the same Order table as we previously used. The following data is in our Order table.
OrderId |
CustomerId |
Total |
OrderDate |
1 |
321 |
10 |
1/2/2014 |
2 |
455 |
40 |
3/2/2014 |
3 |
456 |
20 |
3/10/2014 |
We can again use the same WHERE clause used in the SUM function above. The following SQL statement shows you how to use the AVG function.
SELECT AVG(Total) FROM Orders
WHERE OrderDate BETWEEN '3/1/2014' AND '3/31/2014'
The new value returned by the AVG function is 30. The average is calculated by summing up all of the values and dividing by the total records returned to get the summed results. In this example, there are two records returned that add up to 60, so dividing this number by two returns 30.
The COUNT Function: Count the Number of Records Returned
Sometimes, you just want to get a quick count of the number of records returned by a query. You can also use the COUNT function to count the number of records in a table. Suppose you want to know how many orders you have in total. You can use the COUNT function to get a total number of order records.
Using the same table, we have the following records.
OrderId |
CustomerId |
Total |
OrderDate |
1 |
321 |
10 |
1/2/2014 |
2 |
455 |
40 |
3/2/2014 |
3 |
456 |
20 |
3/10/2014 |
With this small table, it's easy to identify the number of records, but usually you'll have thousands of records in a table.
The following SQL code shows you how to sum up your records.
SELECT COUNT(*) FROM Orders
The result is 3, which is the total number of records in your table. Notice the asterisk symbol is used in this SQL statement. You can also use a column name and the same results are returned. For instance, three is still returned for the following SQL statement.
SELECT COUNT(OrderId) FROM Orders
The MIN and MAX Functions
The MIN and MAX functions find the minimum or maximum value in a record set. For instance, you might want to find the maximum or highest order total that's you've had while in business. You might also want to know the minimum or lowest order total value. With these business requirements, you use the MIN and MAX functions.
We'll use the same Order table as before.
OrderId |
CustomerId |
Total |
OrderDate |
1 |
321 |
10 |
1/2/2014 |
2 |
455 |
40 |
3/2/2014 |
3 |
456 |
20 |
3/10/2014 |
Suppose you want to find the highest order total in your whole Order table. The following SQL statement gets the highest order.
SELECT MAX(Total) FROM Order
The result of the above statement is "40." Again, any formatting must be done after the record set returns, because SQL does not store formats like currency.
Notice there is no WHERE clause in the above statement. The above SQL statement will go through all records and find the highest order total value. You can add a WHERE clause filter to limit the search. For instance, suppose you only want to return the maximum order total value for the month of March. The following SQL statement performs your search.
SELECT MAX(Total) FROM Order
WHERE OrderDate BETWEEN '3/1/2014' AND '3/31/2014'
Again, the above SQL statement returns 40.
Just like the MAX function, you can also get the lowest total value in your Order table. The following SQL statement uses MIN.
SELECT MIN(Total) FROM Order
With this statement, a value of 10 is returned. It's the lowest value in your table in the Total column. Like the MAX function, you can also use a WHERE clause to filter your record set. The following code incorporates the WHERE clause.
SELECT MIN(Total) FROM Order
WHERE OrderDate BETWEEN '3/1/2014' AND '3/31/2014'
The return value for this statement is 20, because the record with a Total value of 10 is filtered out using the WHERE clause.
Combining Functions
You're not limited to using one function when you work with SQL statements and functions. You can combine them. For instance, suppose you want to get a SUM of your order totals, but you also want to know how many orders you have. You could write two separate statements to get each value separately. However, this is not the way to write SQL for performance. Whenever you write your SQL statements, you should write with performance in mind. Running two separate SQL statements probably won't harm performance when you only have a few people running your application, but what happens when you have 100, 1000, or 10000 people running your app at the same time? When you have thousands of people using your app concurrently, writing two separate SQL statements makes a huge different when you could just combine them and write one.
Let's take a look in the Order table.
OrderId |
CustomerId |
Total |
OrderDate |
1 |
321 |
10 |
1/2/2014 |
2 |
455 |
40 |
3/2/2014 |
3 |
456 |
20 |
3/10/2014 |
Notice that there are three records, but you will probably have thousands of records in a real-world table. You may want to sum up orders for the entire table, but most applications sum up totals using a date range.
First, let's sum up totals with no WHERE clause and use the COUNT clause to identify the number of records.
SELECT SUM(Total), COUNT(Total) FROM Orders
The above statement returns a SUM function total of 70, and the COUNT function returns 3. In most cases, you'll need a WHERE clause in your SQL statement even if it's summing up multiple records. For instance, the following SQL statement uses the same WHERE clause as our other statements.
SELECT SUM(Total), COUNT(Total) FROM Orders
WHERE OrderDate BETWEEN '3/1/2014' AND '3/31/2014'
The above statement returns a summed total of 60 and a count of 2 because the record set is limited to only values in March.
You could even add the AVG function into your above statement. For instance, the following SQL statement gets the total amount for the month of March, the count of your orders and the average amount per order.
SELECT SUM(Total), COUNT(Total), AVG(Total) FROM Orders
WHERE OrderDate BETWEEN '3/1/2014' AND '3/31/2014'
The above statement would return 60, 2, and 30 respectively.
One last comment on the previous SQL statements. You can alias columns to return a column name with your functions. When you don't use aliases, SQL will automatically assign a column name such as "Column1" to the summarized data set. You can overwrite this default by specifying a column name, which is also called aliasing. Using the recent example, column names are set up for the returned data set in the below SQL statement.
SELECT SUM(Total) as SumTotal, COUNT(Total) as CountTotal, AVG(Total) as AverageTotal FROM Orders
WHERE OrderDate BETWEEN '3/1/2014' AND '3/31/2014'
Now, instead of just returning random column names, you get the aliased column names with the associated values. You can use aliased names with any column in your table.
There are several other functions you can use with SQL. The format and spelling are different between some platforms. Some other examples include ABS, RAND, ROUND, UPPER, LOWER, LTRIM, RTRIM, and CONVERT. You can even create your own functions once you get more comfortable with the SQL language.
Working with functions can reduce the time it takes to code your SQL procedures and make reporting an easier feat when you're assigned these tasks.