Microsoft SQL Server 2012 Analysis Services: DAX Basics
- 7/15/2012
Common DAX Functions
Now that you have seen the fundamentals of DAX and how to handle error conditions, take a brief tour through the most commonly used functions and expressions of DAX. In this section, we show the syntax and the meaning of various functions. In the next section, we show how to create a useful report by using these basic functions.
Aggregate Functions
Almost every Tabular data model must operate on aggregated data. DAX offers a set of functions that aggregate the values of a column in a table and return a single value. We call this group of functions aggregate functions. For example, the expression
= SUM( Sales[Amount] )
calculates the sum of all the numbers in the Amount column of the Sales table. This expression aggregates all the rows of the Sales table if it is used in a calculated column, but it considers only the rows that are filtered by slicers, rows, columns, and filter conditions in a pivot table whenever it is used in a measure.
In Table A-1 of the Appendix, you can see the complete list of aggregated functions available in DAX. The four main aggregation functions (SUM, AVERAGE, MIN, and MAX) operate on only numeric values. These functions work only if the column passed as argument is of numeric or date type.
DAX offers an alternative syntax to these functions to make the calculation on columns that can contain both numeric and nonnumeric values such as a text column. That syntax adds the suffix A to the name of the function, just to get the same name and behavior as Excel. However, these functions are useful for only columns containing TRUE/FALSE values because TRUE is evaluated as 1 and FALSE as 0. Any value for a text column is always considered 0. Empty cells are never considered in the calculation, so even if these functions can be used on nonnumeric columns without returning an error, there is no automatic conversion to numbers for text columns. These functions are named AVERAGEA, COUNTA, MINA, and MAXA.
The only interesting function in the group of A-suffixed functions is COUNTA. It returns the number of cells that are not empty and works on any type of column. If you are interested in counting all the cells in a column containing an empty value, you can use the COUNTBLANK function. Finally, if you want to count all the cells of a column regardless of their content, you want to count the number of rows of the table, which can be obtained by calling the COUNTROWS function. (It gets a table as a parameter, not a column.) In other words, the sum of COUNTA and COUNTBLANK for the same column of a table is always equal to the number of rows of the same table.
You have four functions by which to count the number of elements in a column or table:
COUNT operates only on numeric columns.
COUNTA operates on any type of columns.
COUNTBLANK returns the number of empty cells in a column.
COUNTROWS returns the number of rows in a table.
Finally, the last set of aggregation functions performs calculations at the row level before they are aggregated. This is essentially the same as creating a column calculation and a measure calculation in one formula. This set of functions is quite useful, especially when you want to make calculations by using columns of different related tables. For example, if a Sales table contains all the sales transactions and a related Product table contains all the information about a product, including its cost, you might calculate the total internal cost of a sales transaction by defining a measure with this expression.
Cost := SUMX( Sales, Sales[Quantity] * RELATED( Product[StandardCost] ) )
This function calculates the product of Quantity (from the Sales table) and StandardCost of the sold product (from the related Product table) for each row in the Sales table, and it returns the sum of all these calculated values.
Generally speaking, all the aggregation functions ending with an X suffix behave this way: they calculate an expression (the second parameter) for each of the rows of a table (the first parameter) and return a result obtained by the corresponding aggregation function (SUM, MIN, MAX, or COUNT) applied to the result of those calculations. We explain this behavior further in Chapter 5. Evaluation context is important for understanding how this calculation works. The X-suffixed functions available are SUMX, AVERAGEX, COUNTX, COUNTAX, MINX, and MAXX.
Among the counting functions, one of the most used is DISTINCTCOUNT, which does exactly what its name suggests: counts the distinct values of a column, which it takes as its only parameter.
DISTINCTCOUNT deserves a special mention among the various counting functions because of its speed. If you have some knowledge of counting distinct values in previous versions of SSAS, which implemented Multidimensional only, you already know that counting the number of distinct values of a column was problematic. If your database was not small, you had to be very careful whenever you wanted to add distinct counts to the solution and, for medium and big databases, a careful and complex handling of partitioning was necessary to implement distinct counts efficiently. However, in Tabular, DISTINCTCOUNT is amazingly fast due to the nature of the columnar database and the way it stores data in memory. In addition, you can use DISTINCTCOUNT on any column in your data model without worrying about creating new structures, as in Multidimensional.
Following what you have already learned in Chapter 1, “Introducing the Tabular Model,” if you have a previous SSAS cube that has many problematic DISTINCTCOUNT results, measuring performance of the same solution rewritten in Tabular is definitely worth a try; you might have very pleasant surprises and decide to perform the transition of the cube for the sole presence of DISTINCTCOUNT.
Logical Functions
Sometimes you might need to build a logical condition in an expression—for example, to implement different calculations depending on the value of a column or to intercept an error condition. In these cases, you can use one of the logical functions in DAX. You have already seen in the previous section, “Handling Errors in DAX Expressions,” the two most important functions of this group, which are IF and IFERROR. In Table A-3 of the Appendix, you can see the list of all these functions (which are AND, FALSE, IF, IFERROR, NOT, TRUE, and OR) and their syntax. If, for example, you want to compute the Amount as Quantity multiplied by Price only when the Price column contains a correct numeric value, you can use the following pattern.
Amount := IFERROR( Sales[Quantity] * Sales[Price], BLANK() )
If you did not use the IFERROR and the Price column contains an invalid number, the result for the calculated column would be an error because if a single row generates a calculation error, the error is propagated to the whole column. The usage of IFERROR, however, intercepts the error and replaces it with a blank value.
Another function you might put inside this category is SWITCH, which is useful when you have a column containing a low number of distinct values, and you want to get different behaviors, depending on the value. For example, the column Size in the DimProduct table contains L, M, S, and XL, and you might want to decode this value in a more meaningful column. You can obtain the result by using nested IF calls.
SizeDesc := IF (DimProduct[Size] = "S", "Small", IF (DimProduct[Size] = "M", "Medium", IF (DimProduct[Size] = "L", "Large", IF (DimProduct[Size] = "XL", "Extra Large", "Other"))))
The following is a more convenient way to express the same formula, by using SWITCH.
SizeDesc := SWITCH (DimProduct[Size], "S", "Small", "M", "Medium", "L", "Large", "XL", "Extra Large", "Other" )
The code in this latter expression is more readable, even if it is not faster, because, internally, switch statements are translated into nested IF calls.
Information Functions
Whenever you must analyze the data type of an expression, you can use one of the information functions that are listed in Table A-4 of the Appendix. All these functions return a TRUE/FALSE value and can be used in any logical expression. They are: ISBLANK, ISERROR, ISLOGICAL, ISNONTEXT, ISNUMBER, and ISTEXT.
Note that when a table column is passed as a parameter, the ISNUMBER, ISTEXT, and ISNONTEXT functions always return TRUE or FALSE, depending on the data type of the column and on the empty condition of each cell.
You might be wondering whether ISNUMBER can be used with a text column just to check whether a conversion to a number is possible. Unfortunately, you cannot use this approach; if you want to test whether a text value can be converted to a number, you must try the conversion and handle the error if it fails.
For example, to test whether the column Price (which is of type String) contains a valid number, you must write the following.
IsPriceCorrect = ISERROR( Sales[Price] + 0 )
To get a TRUE result from the ISERROR function, for example, DAX tries to add a zero to the Price to force the conversion from a text value to a number. The conversion fails for the N/A price value, so you can see that ISERROR is TRUE.
If, however, you try to use ISNUMBER, as in the following expression
IsPriceCorrect = ISNUMBER( Sales[Price] )
you will always get FALSE as a result because, based on metadata, the Price column is not a number but a string.
Mathematical Functions
The set of mathematical functions available in DAX is very similar to those in Excel, with the same syntax and behavior. You can see the complete list of these functions and their syntax in Table A-5 of the Appendix. The mathematical functions commonly used are ABS, EXP, FACT, LN, LOG, LOG10, MOD, PI, POWER, QUOTIENT, SIGN, and SQRT. Random functions are RAND and RANDBETWEEN.
There are many rounding functions, summarized here.
FLOOR = FLOOR( Tests[Value], 0.01 ) TRUNC = TRUNC( Tests[Value], 2 ) ROUNDDOWN = ROUNDDOWN( Tests[Value], 2 ) MROUND = MROUND( Tests[Value], 0.01 ) ROUND = ROUND( Tests[Value], 2 ) CEILING = CEILING( Tests[Value], 0.01 ) ROUNDUP = ROUNDUP( Tests[Value], 2 ) INT = INT( Tests[Value] ) FIXED = FIXED(Tests[Value],2,TRUE) ISO = ISO.CEILING( Tests[Value], 0.01 )
In Figure 4-10, you can see the different results when applied to some test values.
Figure 4-10 Different rounding functions lead to different values.
As you can see, FLOOR, TRUNC, and ROUNDDOWN are very similar, except in the way you can specify the number of digits on which to round. In the opposite direction, CEILING and ROUNDUP are very similar in their results. You can see a few differences in the way the rounding is done (see row B, in which the 1.265 number is rounded in two ways on the second decimal digit) between the MROUND and ROUND functions. Finally, note that FLOOR and MROUND functions do not operate on negative numbers, whereas other functions do.
Text Functions
Table A-6 of the Appendix contains a complete description of the text functions available in DAX: they are CONCATENATE, EXACT, FIND, FIXED, FORMAT, LEFT, LEN, LOWER, MID, REPLACE, REPT, RIGHT, SEARCH, SUBSTITUTE, TRIM, UPPER, and VALUE.
These functions are useful for manipulating text and extracting data from strings that contain multiple values, and are often used in calculated columns to format strings or find specific patterns.
Conversion Functions
You learned that DAX performs automatic conversion of data types to adjust them to the need of the operators. Even if it happens automatically, a set of functions can still perform explicit conversion of types.
CURRENCY can transform an expression into a currency type, whereas INT transforms an expression into an integer. DATE and TIME take the date and time parts as parameters and return a correct DATETIME. VALUE transforms a string into a numeric format, whereas FORMAT gets a numeric value as the first parameter and a string format as its second parameter, and can transform numeric values into strings.
Date and Time Functions
In almost every type of data analysis, handling time and date is an important part of the job. DAX has a large number of functions that operate on date and time. Some of them make simple transformations to and from a datetime data type, such as the ones described in Table A-7 of the Appendix. These are DATE, DATEVALUE, DAY, EDATE, EOMONTH, HOUR, MINUTE, MONTH, NOW, SECOND, TIME, TIMEVALUE, TODAY, WEEKDAY, WEEKNUM, YEAR, and YEARFRAC. To make more complex operations on dates, such as comparing aggregated values year over year or calculating the year-to-date value of a measure, there is another set of functions, called time intelligence functions, which is described in Chapter 8, “Understanding Time Intelligence in DAX.”
As mentioned before in this chapter, a datetime data type internally uses a floating-point number by which the integer part corresponds to the number of days starting from December 30, 1899, and the decimal part indicates the fraction of the day in time. (Hours, minutes, and seconds are converted into decimal fractions of the day.) Thus, adding an integer number to a datetime value increments the value by a corresponding number of days. However, most of the time, the conversion functions are used to extract day, month, and year from a date.
Relational Functions
Two useful functions that enable you to navigate through relationships inside a DAX formula are RELATED and RELATEDTABLE. In Chapter 5, you learn all the details of how these functions work; because they are so useful, it is worth describing them here.
You already know that a calculated column can reference column values of the table in which it is defined. Thus, a calculated column defined in FactResellerSales can reference any column of the same table. But what can you do if you must refer to a column in another table? In general, you cannot use columns in other tables unless a relationship is defined in the model between the two tables. However, if the two tables are in relationship, then the RELATED function enables you to access columns in the related table.
For example, you might want to compute a calculated column in the FactResellerSales table that checks whether the product that has been sold is in the Bikes category and, if it is, apply a reduction factor to the standard cost. To compute such a column, you must write an IF that checks the value of the product category, which is not in the FactResellerSales table. Nevertheless, a chain of relationships starts from FactResellerSales, reaching DimProductCategory through DimProduct and DimProductSubcategory, as you can see in Figure 4-11.
Figure 4-11 FactResellerSales has a chained relationship with DimProductCategory.
It does not matter how many steps are necessary to travel from the original table to the related one; DAX will follow the complete chain of relationship and return the related column value. Thus, the formula for the AdjustedCost column can be
=IF ( RELATED (DimProductCategory[EnglishProductCategoryName]) = "Bikes", [ProductStandardCost] * 0.95, [ProductStandardCost] )
In a one-to-many relationship, RELATED can access the one side from the many side because, in that case, only one row, if any, exists in the related table. If no row is related with the current one, RELATED returns BLANK.
If you are on the one side of the relationship and you want to access the many side, RELATED is not helpful because many rows from the other side are available for a single row in the current table. In that case, RELATEDTABLE will return a table containing all the related rows. For example, if you want to know how many products are in this category, you can create a column in DimProductCategory with this formula.
= COUNTROWS (RELATEDTABLE (DimProduct))
This calculated column will show, for each product category, the number of products related, as you can see in Figure 4-12.
Figure 4-12 Count the number of products by using RELATEDTABLE.
As is the case for RELATED, RELATEDTABLE can follow a chain of relationships, always starting from the one side and going in the direction of the many side.