Aggregate functions are functions that are used to get summary values. All aggregate functions can be divided into several groups:
- Convenient aggregate functions
- Statistical aggregate functions
- User-defined aggregate functions
- Analytic aggregate functions
The first three types are described in the following sections, while analytic aggregate functions are explained in detail in a different chapter.
Convenient Aggregate Functions
The Transact-SQL language supports six aggregate functions:
- MIN
- MAX
- SUM
- AVG
- COUNT
- COUNT_BIG
All aggregate functions operate on a single argument, which can be either a column or an expression. (The only exception is the second form of the COUNT and COUNT_BIG functions, COUNT(*) and COUNT_BIG(*).) The result of each aggregate function is a constant value, which is displayed in a separate column of the result.
The aggregate functions appear in the SELECT list, which can include a GROUP BY clause. If there is no GROUP BY clause in the SELECT statement, and the SELECT list includes at least one aggregate function, then no simple columns can be included in the SELECT list (other than as arguments of an aggregate function). Therefore, Example 6.24 is wrong.
USE sample;
SELECT emp_lname, MIN(emp_no)
FROM employee;
Code language: PHP (php)
The emp_lname column of the employee table must not appear in the SELECT list of Example 6.24 because it is not the argument of an aggregate function. On the other hand, all column names that are not arguments of an aggregate function may appear in the SELECT list if they are used for grouping.
The argument of an aggregate function can be preceded by one of two keywords:
- ALL – Indicates that all values of a column are to be considered (ALL is the default value)
- DISTINCT – Eliminates duplicate values of a column before the aggregate function is applied
MIN and MAX Aggregate Functions
The aggregate functions MIN and MAX compute the lowest and highest values in the column, respectively. If there is a WHERE clause, the MIN and MAX functions return the lowest or highest of values from selected rows. Example 6.25 shows the use of the aggregate function MIN.
Get the lowest employee number:
USE sample;
SELECT MIN(emp_no) AS min_employee_no
FROM employee;
Code language: PHP (php)
The result is
min_employee_no |
---|
2581 |
The result of Example 6.25 is not user friendly. For instance, the name of the employee with the lowest number is not known. As already shown, the explicit the name of the employee with the lowest employee number, use a subquery, as shown in Example 6.26, where the inner query contains the SELECT statement of the previous example.
Get the number and the last name of the employee with the lowest employee number:
USE sample;
SELECT emp_no, emp_lname
FROM employee
WHERE emp_no =
(SELECT MIN(emp_no)
FROM employee);
Code language: PHP (php)
The result is
emp_no | emp_lname |
---|---|
2581 | Hansel |
Example 6.27 shows the use of the aggregate function MAX.
Get the employee number of the manager who was entered last in the works_on table:
USE sample;
SELECT emp_no
FROM works_on
WHERE enter_date =
(SELECT MAX(enter_date)
FROM works_on
WHERE job = 'Manager');
Code language: PHP (php)
The result is
emp_no |
---|
10102 |
The argument of the functions MIN and MAX can also be a string value or a date. If the argument has a string value, the comparison between all values will be provided using the actual collating sequence. For all arguments of temporal data types, the earliest date specifies the lowest value in the column and the latest date specifies the highest value in the column.
The DISTINCT option cannot be used with the aggregate functions MIN and MAX. All NULL values in the column that are the argument of the aggregate function MIN or MAX are always eliminated before MIN or MAX is applied.
SUM Aggregate Function
The aggregate function SUM calculates the sum of the values in the column. The argument of the function SUM must be numeric. Example 6.28 shows the use of the SUM function.
Calculate the sum of all budgets of all projects:
USE sample;
SELECT SUM(budget) sum_of_budgets
FROM project;
Code language: PHP (php)
The result is
sum_of_budgets |
---|
401500 |
The aggregate function in Example 6.28 groups all values of the projects’ budgets and determines their total sum. For this reason, the query in Example 6.28 (as does each analog query) implicitly contains the grouping function. The grouping function from Example 6.28 can be written explicitly in the query, as shown in Example 6.29.
SELECT SUM(budget) sum_of_budgets
FROM project
GROUP BY();
The use of this syntax for the GROUP BY clause is recommended because it defines a grouping explicitly. The use of the DISTINCT option eliminates all duplicate values in the column before the function SUM is applied. Similarly, all NULL values are always eliminated before SUM is applied.
AVG Aggregate Function
The aggregate function AVG calculates the average of the values in the column. The argument of the function AVG must be numeric. All NULL values are eliminated before the function AVG is applied. Example 6.30 shows the use of the AVG aggregate function.
Calculate the average of all budgets with an amount greater than $100,000:
USE sample;
SELECT AVG(budget) avg_budget
FROM project
WHERE budget > 100000;
Code language: PHP (php)
The result is
avg_budget |
---|
153250 |
COUNT and COUNT_BIG Aggregate Functions
The aggregate function COUNT has two different forms:
COUNT([DISTINCT] col_name)
COUNT(*)
Code language: CSS (css)
The first form calculates the number of values in the col_name column. When the DISTINCT keyword is used, all duplicate values are eliminated before COUNT is applied. This form of COUNT does not count NULL values for the column. Example 6.31 shows the use of the first form of the aggregate function COUNT.
Count all different jobs in each project:
USE sample;
SELECT project_no, COUNT(DISTINCT job) job_count
FROM works_on
GROUP BY project_no;
Code language: PHP (php)
The result is
project_no | job_count |
---|---|
p1 | 3 |
p2 | 1 |
p3 | 3 |
As can be seen from the result of Example 6.31, all NULL values are eliminated before the function COUNT(DISTINCT job) is applied. (The sum of all values in the column is 8 instead of 11.)
The second form of the function COUNT, COUNT(*), counts the number of rows in the table. Or, if there is a WHERE clause in the SELECT statement, COUNT(*) returns the number of rows for which the WHERE condition is true. In contrast to the first form of the function COUNT, the second form does not eliminate NULL values, because this function operates on rows and not on columns. Example 6.32 shows the use of COUNT(*).
Get the number of each job in all projects:
USE sample;
SELECT job, COUNT(*) job_count
FROM works_on
GROUP BY job;
Code language: PHP (php)
The result is
job | job_count |
---|---|
NULL | 3 |
Analyst | 2 |
Clerk | 4 |
Manager | 2 |
The COUNT_BIG function is analogous to the COUNT function. The only difference between them is in relation to their return values: COUNT_BIG always returns a value of the BIGINT data type, while the COUNT function always returns a value of the INTEGER data type.
Statistical Aggregate Functions
The following aggregate functions belong to the group of statistical aggregate functions:
- VAR – Computes the variance of all the values listed in a column or expression
- VARP – Computes the variance for the population of all the values listed in a column or expression
- STDEV – Computes the standard deviation (which is computed as the square root of the corresponding variance) of all the values listed in a column or expression
- STDEVP – Computes the standard deviation for the population of all the values listed in a column or expression
Examples showing statistical aggregate functions will be provided in Chapter “Business Intelligence and Transact-SQL”.
User-Defined Aggregate Functions
The Database Engine also supports the implementation of user-defined aggregate functions. Using these functions, you can implement and deploy aggregate functions that do not belong to aggregate functions supported by the system. These functions are a special case of user-defined functions, which will be described in detail in Chapter “Stored Procedures and User-Defined Functions”.