Introducing DAX
- By Alberto Ferrari and Marco Russo
- 7/20/2019
In this sample chapter from The Definitive Guide to DAX, authors Russo and Ferrari start talking about the DAX language. Here you learn the syntax of the language, the difference between a calculated column and a measure (also called calculated field, in certain old Excel versions), and the most commonly used functions in DAX.
In this chapter, we start talking about the DAX language. Here you learn the syntax of the language, the difference between a calculated column and a measure (also called calculated field, in certain old Excel versions), and the most commonly used functions in DAX.
Because this is an introductory chapter, it does not cover many functions in depth. In later chapters, we explain them in more detail. For now, introducing the functions and starting to look at the DAX language in general are enough. When we reference features of the data model in Power BI, Power Pivot, or Analysis Services, we use the term Tabular even when the feature is not present in all the products. For example, “DirectQuery in Tabular” refers to the DirectQuery mode feature available in Power BI and Analysis Services but not in Excel.
Understanding DAX calculations
Before working on more complex formulas, you need to learn the basics of DAX. This includes DAX syntax, the different data types that DAX can handle, the basic operators, and how to refer to columns and tables. These concepts are discussed in the next few sections.
We use DAX to compute values over columns in tables. We can aggregate, calculate, and search for numbers, but in the end, all the calculations involve tables and columns. Thus, the first syntax to learn is how to reference a column in a table.
The general format is to write the table name enclosed in single quotation marks, followed by the column name enclosed in square brackets, as follows:
'Sales'[Quantity]
We can omit the single quotation marks if the table name does not start with a number, does not contain spaces, and is not a reserved word (like Date or Sum).
The table name is also optional in case we are referencing a column or a measure within the table where we define the formula. Thus, [Quantity] is a valid column reference, if written in a calculated column or in a measure defined in the Sales table. Although this option is available, we strongly discourage you from omitting the table name. At this point, we do not explain why this is so important, but the reason will become clear when you read Chapter 5, “Understanding CALCULATE and CALCULATETABLE.” Nevertheless, it is of paramount importance to be able to distinguish between measures (discussed later) and columns when you read DAX code. The de facto standard is to always use the table name in column references and always avoid it in measure references. The earlier you start adopting this standard, the easier your life with DAX will be. Therefore, you should get used to this way of referencing columns and measures:
Sales[Quantity] * 2 -- This is a column reference [Sales Amount] * 2 -- This is a measure reference
You will learn the rationale behind this standard after learning about context transition, which comes in Chapter 5. For now, just trust us and adhere to this standard.
DAX data types
DAX can perform computations with different numeric types, of which there are seven. Over time, Microsoft introduced different names for the same data types, creating some sort of confusion. Table 2-1 provides the different names under which you might find each DAX data type.
Table 2-1 Data Types
DAX Data Type |
Power BI Data Type |
Power Pivot and Analysis Services Data Type |
Correspondent Conventional Data Type (e.g., SQL Server) |
Tabular Object Model (TOM) Data Type |
---|---|---|---|---|
Integer |
Whole Number |
Whole Number |
Integer / INT |
int64 |
Decimal |
Decimal Number |
Decimal Number |
Floating point / DOUBLE |
double |
Currency |
Fixed Decimal Number |
Currency |
Currency / MONEY |
decimal |
DateTime |
DateTime, Date, Time |
Date |
Date / DATETIME |
dateTime |
Boolean |
True/False |
True/False |
Boolean / BIT |
boolean |
String |
Text |
Text |
String / NVARCHAR(MAX) |
string |
Variant |
- |
- |
- |
variant |
Binary |
Binary |
Binary |
Blob / VARBINARY(MAX) |
binary |
In this book, we use the names in the first column of Table 2-1 adhering to the de facto standards in the database and Business Intelligence community. For example, in Power BI, a column containing either TRUE or FALSE would be called TRUE/FALSE, whereas in SQL Server, it would be called a BIT. Nevertheless, the historical and most common name for this type of value is Boolean.
DAX comes with a powerful type-handling system so that we do not have to worry about data types. In a DAX expression, the resulting type is based on the type of the term used in the expression. You need to be aware of this in case the type returned from a DAX expression is not the expected type; you would then have to investigate the data type of the terms used in the expression itself.
For example, if one of the terms of a sum is a date, the result also is a date; likewise, if the same operator is used with integers, the result is an integer. This behavior is known as operator overloading, and an example is shown in Figure 2-1, where the OrderDatePlusOneWeek column is calculated by adding 7 to the value of the Order Date column.
Sales[OrderDatePlusOneWeek] = Sales[Order Date] + 7
Figure 2-1 Adding an integer to a date results in a date increased by the corresponding number of days.
The result is a date.
In addition to operator overloading, DAX automatically converts strings into numbers and numbers into strings whenever required by the operator. For example, if we use the & operator, which concatenates strings, DAX converts its arguments into strings. The following formula returns “54” as a string:
= 5 & 4
On the other hand, this formula returns an integer result with the value of 9:
= "5" + "4"
The resulting value depends on the operator and not on the source columns, which are converted following the requirements of the operator. Although this behavior looks convenient, later in this chapter you see what kinds of errors might happen during these automatic conversions. Moreover, not all the operators follow this behavior. For example, comparison operators cannot compare strings with numbers. Consequently, you can add one number with a string, but you cannot compare a number with a string. You can find a complete reference here: https://docs.microsoft.com/en-us/power-bi/desktop-data-types. Because the rules are so complex, we suggest you avoid automatic conversions altogether. If a conversion needs to happen, we recommend that you control it and make the conversion explicit. To be more explicit, the previous example should be written like this:
= VALUE ( "5" ) + VALUE ( "4" )
People accustomed to working with Excel or other languages might be familiar with DAX data types. Some details about data types depend on the engine, and they might be different for Power BI, Power Pivot, or Analysis Services. You can find more detailed information about Analysis Services DAX data types at http://msdn.microsoft.com/en-us/library/gg492146.aspx, and Power BI information is available at https://docs.microsoft.com/en-us/power-bi/desktop-data-types. However, it is useful to share a few considerations about each of these data types.
Integer
DAX has only one Integer data type that can store a 64-bit value. All the internal calculations between integer values in DAX also use a 64-bit value.
Decimal
A Decimal number is always stored as a double-precision floating-point value. Do not confuse this DAX data type with the decimal and numeric data type of Transact-SQL. The corresponding data type of a DAX decimal number in SQL is Float.
Currency
The Currency data type, also known as Fixed Decimal Number in Power BI, stores a fixed decimal number. It can represent four decimal points and is internally stored as a 64-bit integer value divided by 10,000. Summing or subtracting Currency data types always ignores decimals beyond the fourth decimal point, whereas multiplication and division produce a floating-point value, thus increasing the precision of the result. In general, if we need more accuracy than the four digits provided, we must use a Decimal data type.
The default format of the Currency data type includes the currency symbol. We can also apply the currency formatting to Integer and decimal numbers, and we can use a format without the currency symbol for a Currency data type.
DateTime
DAX stores dates as a DateTime data type. This format uses a floating-point number internally, wherein the integer corresponds to the number of days since December 30, 1899, and the decimal part identifies the fraction of the day. Hours, minutes, and seconds are converted to decimal fractions of a day. Thus, the following expression returns the current date plus one day (exactly 24 hours):
= TODAY () + 1
The result is tomorrow’s date at the time of the evaluation. If you need to take only the date part of a DateTime, always remember to use TRUNC to get rid of the decimal part.
Power BI offers two additional data types: Date and Time. Internally, they are a simple variation of DateTime. Indeed, Date and Time store only the integer or the decimal part of the DateTime, respectively.
Boolean
The Boolean data type is used to express logical conditions. For example, a calculated column defined by the following expression is of Boolean type:
= Sales[Unit Price] > Sales[Unit Cost]
You will also see Boolean data types as numbers where TRUE equals 1 and FALSE equals 0. This notation sometimes proves useful for sorting purposes because TRUE > FALSE.
String
Every string in DAX is stored as a Unicode string, where each character is stored in 16 bits. By default, the comparison between strings is not case sensitive, so the two strings “Power BI” and “POWER BI” are considered equal.
Variant
The Variant data type is used for expressions that might return different data types, depending on the conditions. For example, the following statement can return either an integer or a string, so it returns a variant type:
IF ( [measure] > 0, 1, "N/A" )
The Variant data type cannot be used as a data type for a column in a regular table. A DAX measure, and in general, a DAX expression can be Variant.
Binary
The Binary data type is used in the data model to store images or other nonstructured types of information. It is not available in DAX. It was mainly used by Power View, but it might not be available in other tools such as Power BI.
DAX operators
Now that you have seen the importance of operators in determining the type of an expression, see Table 2-2, which provides a list of the operators available in DAX.
Table 2-2 Operators
Operator Type |
Symbol |
Use |
Example |
---|---|---|---|
Parenthesis |
( ) |
Precedence order and grouping of arguments |
(5 + 2) * 3 |
Arithmetic |
+ − * / |
Addition Subtraction/negation Multiplication Division |
4 + 2 5 − 3 4 * 2 4 / 2 |
Comparison |
= <> > >= < <= |
Equal to Not equal to Greater than Greater than or equal to Less than Less than or equal to |
[CountryRegion] = “USA” [CountryRegion] <> “USA” [Quantity] > 0 [Quantity] >= 100 [Quantity] < 0 [Quantity] <= 100 |
Text concatenation |
& |
Concatenation of strings |
“Value is” & [Amount] |
Logical |
&& || IN NOT |
AND condition between two Boolean expressions OR condition between two Boolean expressions Inclusion of an element in a list Boolean negation |
[CountryRegion] = “USA” && [Quantity]>0 [CountryRegion] = “USA” || [Quantity] > 0 [CountryRegion] IN {“USA”, “Canada”} NOT [Quantity] > 0 |
Moreover, the logical operators are also available as DAX functions, with a syntax similar to Excel’s. For example, we can write expressions like these:
AND ( [CountryRegion] = "USA", [Quantity] > 0 ) OR ( [CountryRegion] = "USA", [Quantity] > 0 )
These examples are equivalent, respectively, to the following:
[CountryRegion] = "USA" && [Quantity] > 0 [CountryRegion] = "USA" || [Quantity] > 0
Using functions instead of operators for Boolean logic becomes helpful when writing complex conditions. In fact, when it comes to formatting large sections of code, functions are much easier to format and to read than operators are. However, a major drawback of functions is that we can pass in only two parameters at a time. Therefore, we must nest functions if we have more than two conditions to evaluate.
Table constructors
In DAX we can define anonymous tables directly in the code. If the table has a single column, the syntax requires only a list of values—one for each row—delimited by curly braces. We can delimit multiple rows by parentheses, which are optional if the table is made of a single column. The two following definitions, for example, are equivalent:
{ "Red", "Blue", "White" } { ( "Red" ), ( "Blue" ), ( "White" ) }
If the table has multiple columns, parentheses are mandatory. Every column should have the same data type throughout all its rows; otherwise, DAX will automatically convert the column to a data type that can accommodate all the data types provided in different rows for the same column.
{ ( "A", 10, 1.5, DATE ( 2017, 1, 1 ), CURRENCY ( 199.99 ), TRUE ), ( "B", 20, 2.5, DATE ( 2017, 1, 2 ), CURRENCY ( 249.99 ), FALSE ), ( "C", 30, 3.5, DATE ( 2017, 1, 3 ), CURRENCY ( 299.99 ), FALSE ) }
The table constructor is commonly used with the IN operator. For example, the following are possible, valid syntaxes in a DAX predicate:
'Product'[Color] IN { "Red", "Blue", "White" } ( 'Date'[Year], 'Date'[MonthNumber] ) IN { ( 2017, 12 ), ( 2018, 1 ) }
This second example shows the syntax required to compare a set of columns (tuple) using the IN operator. Such syntax cannot be used with a comparison operator. In other words, the following syntax is not valid:
( 'Date'[Year], 'Date'[MonthNumber] ) = ( 2007, 12 )
However, we can rewrite it using the IN operator with a table constructor that has a single row, as in the following example:
( 'Date'[Year], 'Date'[MonthNumber] ) IN { ( 2007, 12 ) }
Conditional statements
In DAX we can write a conditional expression using the IF function. For example, we can write an expression returning MULTI or SINGLE depending on the quantity value being greater than one or not, respectively.
IF ( Sales[Quantity] > 1, "MULTI", "SINGLE" )
The IF function has three parameters, but only the first two are mandatory. The third is optional, and it defaults to BLANK. Consider the following code:
IF ( Sales[Quantity] > 1, Sales[Quantity] )
It corresponds to the following explicit version:
IF ( Sales[Quantity] > 1, Sales[Quantity], BLANK () )