Design a multidimensional business intelligence (BI) semantic model
- 9/7/2017
- Skill 1.1: Create a multidimensional database by using Microsoft SQL Server Analysis Services (SSAS)
- Skill 1.2: Design and implement dimensions in a cube
- Skill 1.3: Implement measures and measure groups in a cube
- Chapter summary
- Thought experiment
- Thought experiment answers
In this sample chapter from Exam Ref 70-768 Developing SQL Data Models, learn how to create a multidimensional database by using Microsoft SQL Server Analysis Services (SSAS), design and implement dimensions in a cube, and implement measures and measure groups in a cube.
A business intelligence semantic model is a semantic layer that you create to represent and enhance data for use in reporting and analysis applications. Microsoft SQL Server Analysis Services (SSAS) supports two types of business intelligence semantic models: multidimensional and tabular. In this chapter, we review the skills you need to create a multidimensional database, whereas we explore the skills necessary for creating a tabular model in Chapter 2, “Design a tabular BI semantic model.” We start with the steps required to physically instantiate a multidimensional database on an SSAS server. Then we work through the steps to perform, and the decisions to consider, for the two main objects in a multidimensional database, dimensions and measures.
Skill 1.1: Create a multidimensional database by using Microsoft SQL Server Analysis Services (SSAS)
Before you start the development process for a multidimensional database, you should spend some time thinking about its design and preparing your data for the new database. You are then ready to set up the database on the SSAS server and choose how you want SSAS to store data in the database.
Design, develop, and create multidimensional databases
The design process begins with an understanding of how people ask questions about their business. That is, you must decide how to translate your business requirements into a data model that is suitable as a source for a multidimensional database. Then after loading data into this data model, a process that is not covered in the exam, you proceed by creating a multidimensional database project in SQL Server Data Tools for Visual Studio 2015 (SSDT).
During the development process, you create supporting objects such as a data source and a data source view to define connectivity to your data model and to provide an abstraction layer for that data that you use for developing dimensions, measures, and cubes. As you work through each step, you deploy each newly created object to a multidimensional database on the SSAS server so you can test your work and ensure business requirements are met.
Source table design
An online transactional processing (OLTP) database is structured in third normal form with efficiency of storage and optimization of write operations, or low-volume read operations in mind. An SSAS multidimensional database is an online analytical processing (OLAP) database that is optimized for read operations of high-volume data. If you do not already have a data warehouse to use as a source for your multidimensional database, you should design a new data model in a relational database in which to store data that loads into SSAS.
Before you start the design of a new data model to use as a source for your multidimensional database, you should spend time understanding how the business users want to analyze data. You can interview them to find out the types of questions they want to answer with data analysis, and review the reports they use to find clues about important analytical elements. In particular, you want to discover the measures and dimensions that you need to create in the new data model. Measures are the numeric values to be analyzed, such as total sales, and dimensions are the people, places, things, and dates that provide context to these values. In this chapter, you learn how to develop a multidimensional database that can answer the following types of questions, also known as the business requirements, based on data for a fictional company, Wide World Importers:
What is the quantity sold of items by date, customer, salesperson, or location?
How many items are sold by color or by size?
How many items that require chilling are sold as compared to stock items that do not require chilling (dry items)?
How many sales occurred by date, customer, salesperson, or location?
What are total sales (with and without tax included), taxes, and profit by date, salesperson, location, or item?
When reviewing individual transactions, what is the tax rate and what is the unit price per item sold?
For each customer billed for sales, how do those sales break down by the customers receiving the items?
What reasons do customers give for making a purchase and how do sales dollars and sales counts break down by sales reason?
How many distinct items are sold by date, customer, salesperson, or location?
How many items are in inventory by date and what are the target stock levels and reorder points for each item?
When you evaluate questions, look for clues to measure, such as “how many,” “total,” “dollars,” or “count.” You should also note whether a value can be obtained directly from the OLTP system, or whether it must be calculated. If a value is calculated, decide whether it can be calculated on a scalar basis (row by row), and whether summing the calculated results can derive a grand total. Be on the alert for different terms that refer to the same measure, and then consult with your business users to determine which term to use in the multidimensional database. Using these criteria, the following measures emerge from the business requirements:
Quantity
Stock Item Distinct Count
Chiller Items Count
Dry Items Count
Sale Count
Sales Amount With Tax
Sales Amount Without Tax
Tax Amount
Tax Rate
Unit Price
Quantity On Hand
Reorder Level
Target Stock Level
Your next step is to review the business requirements again to identify dimensions. A common clue for a dimension is the word “by” in front of a candidate dimension, although sometimes it is only implicitly included in the requirements. A second review of the business requirements for Wide World Importers yields the following dimensions:
Date
Sometimes there are multiple dates associated with a transaction. It is important to know how each user community within your organization associates data with dates. At Wide World Importers, the sales department is interested in analyzing sales by invoice date, whereas the warehouse department wants to review sales by delivery date.
Customer
Employee
City
Stock Item
One of the Wide World Importers requirements is to analyze sales by color or size of an item. Although the word “by” is a clue, color and size are more accurately descriptors or characteristics of an item and therefore become part of a single dimension table for items. You do not normally create separate dimension tables for characteristics like this.
In an ideal data model for a multidimensional database, data is denormalized to minimize the number of joins across multiple tables by using a star schema, which consists of dimension tables and at least one fact table in which measures are stored. If you create a diagram by placing the fact table in the center and surround it by related dimensions, the diagram resembles a star shape, as shown in Figure 1-1. This example of a star schema is a selection of six tables from the WideWorldImportersDW sample database for SQL Server 2016 that answer some of the questions established as the requirements for the multidimensional database that you build throughout this chapter.
FIGURE 1-1 Star schema for a subset of tables in WideWorldImportersDW
Although a star schema is not required as a source for SSAS, it is a preferred structure for enterprise-scale multidimensional models for several reasons. First, the impact of accessing the OLTP system from SSAS adds resource contention to your environment that you can avoid by creating a separate data source. (The degree of impact on the OLTP depends on the storage model you select as explained in the “Select a storage model” section in this chapter.) Second, the amount of time to move data from the OLTP system into the multidimensional database is sometimes less optimal than it can be if you restructure the data first. Third, sometimes analysis requires access to historical information that is no longer preserved in the source system. Having a separate data model in which you store data as it changes becomes a necessity in that case. Other reasons for creating a separate data model include, but are not limited to, cleansing data that cannot be corrected in the OLTP system, having the results available not only to the multidimensional database, but other downstream reporting systems (often with better performance than querying the OLTP directly), and integrating data from multiple OLTP systems.
To load the star schema with data on a periodic basis, you use an extract-transform-load (ETL) tool, such as SQL Server Integration Services (SSIS). The ETL tool is typically scheduled to run nightly to load new and changed rows into the star schema, although business requirements might dictate a different frequency, such as every five minutes when low latency is required, or once per week for source data that changes infrequently.
A dimension table contains data about the entities that a business user wants to analyze—typically a person, place, thing, or point in time. One consideration when designing a dimension table is whether to track history. A slowly changing dimension (SCD) is a dimension for which you implement specific types of columns and ETL techniques specifically to address how to manage table updates when data changes. You make this design decision for each dimension separately based on business requirements. The two more common approaches to managing history include the following SCD types:
Type 1 Only current data is tracked. The ETL process updates columns with changed values and loses the values previously stored in those columns. For example, you might not track history for employee name changes, as shown in Figure 1-2, because the effect on sales is not likely to be relevant.
FIGURE 1-2 SCD Type 1 replaces existing values when source data changes
Type 2 Both current data and historical data is tracked. The ETL process expires the row containing the original data, thus identifying it as historical data, and creates a new active row containing the current data. When data in one of the columns changes in a row, the ETL process updates the Valid To date on the existing row to reflect the current date, and thus expires that record, and then adds a new record with the current values for each column and sets the Valid From date to the current date. Depending on the business rules, the Valid To date can be NULL or a future date such as 12/31/9999.
As an example, you might track history for changes to a stock item’s data such as retail price, as show in Figure 1-3, because you might want to monitor whether sales change when the retail price changes.
FIGURE 1-3 SCD Type 2 expires original row and inserts new row when source data changes
Let’s take a closer look at City, one of the dimension tables in the WideWorldImportersDW database, to understand the types of columns that it contains. Figure 1-2 shows the following types of columns commonly found in a dimension:
Surrogate key A primary key for the dimension table to uniquely identify each row. It normally has no business meaning and is often defined as an identity column. In the City table, the [City Key] column is the surrogate key. The purpose of a surrogate key is to prevent duplicate rows when combining data from multiple sources or capturing historical data for slowly changing dimensions. In addition, the source data for a dimension table might not use an integer value for its primary key whereas enforcing an integer-based key in a dimension table ensures optimal performance when processing the dimension to load its data in a multidimensional database.
Natural or business key The primary key in the source table for the dimension, which is [WWI City ID] in the City table. This column is typically not displayed to business users for analytical purposes, but is used to match rows from the source system to existing rows in the dimension table as part of the ETL process.
Attributes Descriptive columns about a row in the dimension table. Attributes in the City table include City, State Province, Country, Continent, Sales Territory, Region, Subregion, Location, and Latest Recorded Population. Attributes can be used explicitly for analysis to aggregate numeric values from the fact table, much like you use a GROUP BY clause in a Transact-SQL (T-SQL) SELECT statement.
Slowly changing dimension history A pair of columns to show the date range for which a row is valid. Slowly changing dimension history columns specify the date range for which the set of attributes in the row is valid in a Type 2 SCD. In WideWorldImportersDW, Valid From and Valid To are the columns fulfilling this role. Another common naming convention is StartDate and EndDate.
Although the City table in WideWorldImportersDW is implemented as if it were a slowly changing dimension, because it includes history columns Valid From and Valid To, the data in the table does not reflect changed data across the historical records. Instead, you can observe an example of slowly changing dimension history in the Stock Item dimension table, which shows a change in Color that resulted in multiple rows for the WWI Stock Item ID 2, as shown in Figure 1-4.
FIGURE 1-4 Multiple rows for a single stock item demonstrate Type 2 SCD handling
Notice there are three surrogate keys, Stock Item Key, for the same natural key, and WWI Stock Item ID. Each surrogate key is associated with a separate range of Valid From and Valid To dates.
Lineage or audit key An optional column in a dimension table that has a foreign key relationship to another table in which information about ETL processes is maintained. In WideWorldImportersDW, the City table, shown in Figure 1-5, has the Lineage Key column that relates to the Integration.Lineage table. This latter table includes the date and time when the ETL process started and ended, the table affected, and the success of the process.
FIGURE 1-5 City dimension table in WideWorldImportersDW database
A fact table contains many columns of numeric data. A business user analyzes the data in many of these columns in aggregate—typically by calculating sums, averages, or counts. Other numeric columns containing foreign keys are used to protect relational integrity with dimension tables through foreign key relationships. A fact table supports comparisons of these aggregate values over different time periods, such as this year’s total sales versus last year’s total sales, or across different groups, such as brands or colors of stock items. Figure 1-6 shows an example of Sale, a fact table in WideWorldImportersDW that contains the following types of columns:
Surrogate key A primary key for the fact table to uniquely identify each row. Like a surrogate key for a dimension table, it normally has no business meaning and is often defined as an identity column. Often, a fact table has no surrogate key because the collection of foreign key columns represents a unique composite key. The inclusion of a surrogate key for a fact table is a matter of preference by the data modeler.
Foreign keys One foreign key column for each dimension table that relates to the fact table. In the Sale table, the following columns are foreign key columns for dimensions: City Key, Customer Key, Bill To Customer Key, Stock Item Key, Invoice Date Key, Delivery Date Key, and Salesperson Key. The combination of foreign keys represents the granularity, or level of detail, of the fact table.
Degenerate dimension One or more optional columns in a fact table that represents a dimension value that is not stored in a separate table. Technically speaking, the column can be stored in a separate table because it represents a “thing,” such as an invoice that could be the subject of analysis. Often the value in a degenerate dimension is unique in each row, such as an invoice identifier. However, due to the cardinality between the degenerate dimension and the fact table, the model is more efficient when the degenerate dimension is part of the fact table. In other words, no join is required to join two potentially large tables. An example of this type of degenerate dimension column in the Sale fact table is the WWI Invoice ID column.
Another reason to create a degenerate dimension in a fact table is to optimize reporting for frequently requested data by avoiding a join between tables. In the Sale fact table, the Description and Package columns are examples of this other type of degenerate dimension.
Measure One or more columns that contain numeric data that describes an event or business process. The Sale fact table presents individual sales, so each row contains the following measure columns: Quantity, Unit Price, Tax Rate, Total Excluding Tax, Tax Amount, Profit, Total Including Tax, Total Dry Items, and Total Chiller Items.
Sometimes, a fact table contains no measure columns. In that case, it is known as a factless fact table. You might use a factless fact table when you need to count occurrences of an event and have no other measurements related to the event. Another type of factless fact table is a table that serves as a bridge table between a fact table and a dimension table, as described in the “Many-to-many dimension model” section in this chapter.
Lineage or audit key An optional column in a fact table just like the same type of column in a dimension table. In the Sale table, Lineage Key is the lineage column.
FIGURE 1-6 Sales fact table in WideWorldImportersDW database
Project creation
Once you have a star schema implemented and loaded with data, you are ready to create a project in which you perform the development of your multidimensional database. To do this, you use SSDT for which there is a link to the current version at https://msdn.microsoft.com/en-us/library/mt204009.aspx.
To create a new project, perform the following steps:
In the File menu, point to New, and then select Project.
In the New Project dialog box, select Analysis Services in the Business Intelligence group of templates, and then select Analysis Services Multidimensional and Data Mining Project.
At the bottom of the dialog box, type a name for the project, select a location, and optionally type a new name for the project’s solution. The project for the examples in this chapter is named 70-768-Ch1.
Data source development
Your next step is to define a relational data source for your multidimensional database. SSAS supports the following data sources:
Microsoft Access 2010 or higher
Microsoft SQL Server 2008 or higher
Microsoft Azure SQL Database
Microsoft Azure SQL Data Warehouse
Microsoft Analytics Platform System
Oracle 9i or higher
Teradata V2R6 or V12
Informix V11.10
IBM DB2 8.1
Sybase Adaptive Server Enterprise 15.0.2
A data source accessible by using an OLE DB provider
To add a data source, perform the following steps:
Right-click the Data Sources folder in the Solution Explorer window, and select New Data Source.
Click Next in the Data Source Wizard, and then click New on the Select How To Define The Connection page of the wizard.
In the Connection Manager dialog box, select a data provider and provide connection details for your data source. To follow the examples in this chapter, use the default provider, Native OLE DB\SQL Server Native Client 11.0. Type your server name or type a period (.), (local), or localhost if you are running SSDT on the same computer as your SQL Server database engine. If you have your database set up for SQL Server authentication, select SQL Server Authentication in the Authentication drop-down list, and then provide a user name and password in the respective text boxes. Last, select WideWorldImportersDW in the Select Or Enter A Database Name drop-down list, as shown in Figure 1-7.
FIGURE 1-7 Connection Manager dialog box for a data source
Click OK to close the Connection Manager dialog box, click Next in the Data Source Wizard, and then choose one of the following options for Impersonation Information, as shown in Figure 1-8, as authentication for the data source:
Use Specific Windows User Name And Password Use this option when you need to connect to your data source with a specific login. A security best practice is to establish a Windows login with low privileges that has read permission to the data source. When you deploy the data source to the server, SSAS encrypts the password to protect it. If you later need to script out the database, such as you might when you want to move from a development server to a production server, you must provide the password again.
Use The Service Account To follow the examples in this chapter, the selection of this option is recommended. With this option selected, the account running the SSAS service, which by default is NT Service\MSSQLServerOLAPService, is used to connect to the data source.
Use The Credentials Of The Current User Do not select this option when you are developing a multidimensional database. It is included to support authentication for data mining queries.
Inherit This option uses the impersonation information that is set for the DataSourceImpersonationInfo database property. Your options include Use A Specific Windows User Name And Password, Use The Service Account, Use The Credentials Of The Current User, or Default. If it is set to Default, the Inherit selection at the data source level uses the service account.
FIGURE 1-8 The Impersonation Information page of the Data Source Wizard
Click Next in the Data Source Wizard, change the value of the Data Source Name if you like, and then click Finish.
A new file is added to your project, Wide World Importers DW.ds. This file, like all the other files that you add to your project during the development process, is an Extensible Markup Language for Analysis (XMLA) file that defines the object to create in the multidimensional database. You can view the contents of any XMLA file by right-clicking it in the Solution Explorer window, and then selecting View Code.
Data source view design
The data source definition defines where the data is located and how to authenticate, but does not specify which data to use for loading into database objects, such as dimensions and cubes. A data source view (DSV) is the definition, which identifies the specific tables and columns within tables to use when populating the multidimensional database. It is also an abstraction layer that is useful when you have only read permission on the source database, or when you need a simplified view of its structures. You can make logical changes to the tables and views in the DSV to support specific requirements in SSAS without the need to change the physical data source. Generally speaking, you should attempt to change the physical data source as needed whenever possible because implementing logical changes in the DSV can introduce challenges into ongoing maintenance in addition to the troubleshooting process.
To create a DSV, perform the following steps:
Right-click the Data Source Views folder in Solution Explorer, and select New Data Source View.
In the Data Source View Wizard, click Next, select the data source that you created for the project in the previous section, and click Next.
Then, while pressing the CTRL key, select the following tables in the Available Objects list:
City (Dimension)
Customer (Dimension)
Date (Dimension)
Employee (Dimension)
Stock Item (Dimension)
Sale (Fact)
Then click the Right (>) arrow to add the selected tables to the Included Objects list.
Click Next, and then click Finish to complete the wizard. A new file, Wide World Importers DW.dsv, is added to the project, and Data Source View Designer is opened as a document window in SSDT, as shown in Figure 1-9.
FIGURE 1-9 A data source view
When the source tables have primary keys and foreign key relationships defined, which is the ideal situation, the DSV includes them. If you are working with a well-defined star schema, the DSV probably does not require any modification.
If a primary key is not defined in a dimension table, either by design or because the source object is a view, you can add a logical key by right-clicking the column name in the table diagram or in the Tables pane, and then selecting Set Logical Primary Key. This command is not available when a key already exists for the table. If you press the CTRL key, and then select multiple columns, you can configure a composite key as the logical primary key.
To create a relationship between tables, such as between a fact table and a dimension table, click the foreign key column in the fact table and then drag it to the primary key column in the dimension table. The two columns must have the same data type before you can create the relationship. The definition of relationships is not required, but is useful because SSDT can make recommendations or automatically configure certain objects based on the DSV relationships that it detects.
The structure of the DSV is important for the development of dimension and cube objects as you continue building out your multidimensional database. To satisfy specific structural requirements for subsequent development steps, or to simplify a star schema, you can make any of the following changes to the DSV:
Rename objects When you rename tables and columns in the DSV, the wizards that you use to create objects for the database reflect the new names, which should be a user-friendly name. Although you can later rename an object created by a wizard at any time, sometimes you might reference the same DSV object multiple times and can minimize the number of name changes required elsewhere in your project.
Create a named calculation The addition of a named calculation to a DSV is like adding an expression to a SELECT statement to add a derived column to a view. Ideally, when you need data structured in a particular way for SSAS, you design the source table to include that structure or create a view. A common reason to restructure data is to concatenate columns, such as you might need to do with First Name and Last Name columns to create one column for a person’s name.
If you are unable to make the change in the source, create a named calculation in the DSV by right-clicking the header of the table to update, and selecting New Named Calculation. In the Create Named Calculation dialog box, type a name in the Column Name box, optionally type a description for the calculation in the Description box, and then type a platform-specific Structured Query Language (SQL) expression for the new column in the Expression box. There is no validation of the expression when you click OK to close the dialog box, so be sure to test your expression in a query tool first. You use the syntax applicable to the version of SQL applicable to your data source, such as Transact-SQL when your data source is SQL Server.
Create a named query A named query in a DSV is the logical equivalent of a view in a relational database. Unlike a named calculation, which is limited to a SQL expression, a named query is a SELECT statement. You can use it to reduce the number of columns from a table to simplify the DSV, create new columns by using SQL expressions, or join tables together to simplify the data structures in the DSV and avoid a snowflake design. You can find examples of adding named queries to the DSV in Skill 1.2, “Design and implement dimensions in a cube,” and Skill 1.3, “Implement measures and measure groups in a cube.”
Develop a dimension
A dimension in a cube is based on one or more dimension tables in a star schema. Skill 1.2 details the various options you must consider when developing different types of dimension models in SSAS. Regardless of model type, the development process for all dimensions begins by performing the same steps explained in this section.
Let’s review the basic dimension development process by adding the City dimension to the current database project. To this, perform the following steps:
Right-click the Dimensions folder in Solution Explorer, and click New Dimension.
In the Dimension Wizard, click Next on the first page, and then, on the Select Creation Method page, choose the Use An Existing Table option. Click Next.
On the Specify Source Information page of the wizard, select the name of the table containing the most granular level of detail for a dimension in the Main Table drop-down list. For the current example, select City. When the DSV is designed correctly, the key column in the source table displays automatically in the Key Columns list. The selection of a Name Column, such as City, as shown in Figure 1-10, is optional, but recommended when the key column is a surrogate key.
FIGURE 1-10 Specify Source Information page of the Dimension Wizard
If you follow proper dimensional modeling design principles as described in the “Source table design” section earlier in this chapter, your key column is an integer value. If you do not specify a name column during dimension development, business users will see the dimension’s key column value when exploring a cube, which has no meaning to them. The Name Column should be set to a column in the dimension table that contains a meaningful value. Unlike the key column, which can be a composite key, the name column must reference a single column. If necessary, you can use a named calculation or named query to combine values from a multiple column into one column.
Click Next to continue. On the Select Dimension Attributes page of the Dimension Wizard, notice the existing selection of the City Key check box, and then select the check box for the following attribute columns, as shown in Figure 1-11:
State Province
Country
Continent
Sales Territory
Region
Subregion
Latest Recorded Population
FIGURE 1-11 Select Dimension Attributes page of the Dimension Wizard
When business users analyze data by using a multidimensional model, they use attributes primarily for grouping aggregate values in a report, or to filter values, as shown in Figure 1-12. In this example, SSAS calculates the sum of the Sales Amount With Tax measure for each State Province attribute member appearing on rows. An attribute member is a distinct value from the source column that is bound to the attribute. Figure 1-12 shows attribute members for the State Province attribute such as Alaska, California, and so on. Above the set of State Province members, you can see a filter applied based on another attribute, Sales Territory. Specifically, the filter uses the Far West member of the Sales Territory attribute. SSAS ignores the values for any State Province attribute member that is not related to the Far West Sales Territory and eliminates those attribute members from the query results.
FIGURE 1-12 Attribute members used for aggregate grouping and filtering
As explained earlier in the “Source table design” section, a dimension table can include columns that are never used for analysis, such as the Valid From and Valid To columns in the City dimension table. These columns are used for managing historical data in the ETL process, but are not useful for business analysis and are therefore normally excluded from the dimension in the SSAS project.
To complete the wizard, click Next, and then click Finish. The City.dim file is added to the project, and the dimension designer window is displayed in SSDT, as shown in Figure 1-13. The attributes that you selected in the wizard now appear in the Attributes pane of the Dimension Structure page of the dimension designer. In addition, a diagram of the source table displays in the Data Source View pane. After you develop a cube for the multidimensional database, you can add this newly created dimension to that cube. However, there might be additional steps to perform as part of the dimension development process to satisfy specific analysis requirements, as described next, and later in Skill 1.2.
FIGURE 1-13 Dimension designer for City
When you create a dimension and then browse the members in the dimension, occasionally, a member named Unknown appears that does not exist in the source table. The addition of this member to a dimension is a feature in SSAS that gracefully accommodates data quality issues in a fact table. The Unknown member that SSAS creates does not correspond to a row in your dimension table, but serves as a bucket for values for which a key in the foreign key column in the fact table is missing or invalid. That way, when queries calculate totals for a measure, all rows in the fact table are included (after applying the applicable groupings and filters) even when they cannot be matched to existing dimension members.
The addition of the SSAS-generated member is determined by the value of the UnknownMember property for a dimension, which you can access in the Properties window from the Dimension Structure page of the dimension designer after selecting the dimension object in the Attributes pane. The member is added if the UnknownMember property value is Visible or Hidden. Either way, the grand total values for a dimension show the correct aggregated value for a fact table. When this property is set to Visible and your query includes the dimension’s members in the results, you can see the aggregated value for fact table rows with a missing or incorrect key for that dimension assigned to the Unknown member. When this property is set to Hidden, the Unknown member does not display in the query results, although the grand total correctly includes its values. However, hiding the Unknown member can be confusing to business users who might notice the aggregation of visible members does not equal the grand total. For this reason, it is not considered best practice to hide the Unknown member.
In Skill 1.2, we review specific usage scenarios that require additional development steps. Meanwhile, as a general practice for all dimensions that you develop, you should evaluate which, if any, of the following tasks are necessary:
Rename objects For example, notice in the Attributes pane that there is a key icon next to City Key to designate it as the key attribute. In other words, this is the attribute that is referenced in the fact table. In the Dimension Wizard, you specified City Key as the key column and City as the name column. The attribute’s name is inherited from the key column name. In this case, the name City Key is not user-friendly because many business users might not understand what Key means. More generally, consider avoiding technical terms and naming objects by using embedded spaces, capitalization, and business terms to create user-friendly objects. To rename a dimension or an attribute, select it in the Attributes pane. You can either right-click the object, select Rename, and type the new name, or replace the Name property in the Attributes pane. In this case, rename the City Key attribute to City.
Change the sort order Each attribute member has an OrderBy property that sets the default sort order that determines the arrangement of attribute members in a group, such as when you display many attribute members on rows in a pivot table. In most cases, the attribute members display in alphabetical order, but you can override this behavior if you have a column with values that define an alternate sort order. You can choose one of the following values for the OrderBy property.
Key This value is the default for every attribute except the key attribute if you assigned it a name column in the Dimension Wizard because every attribute always has a key column, but optionally has a name column. The sort order is alphabetical if the key column is a string data type (Char or WChar), or numerical if the key column data type is not a string.
Name The sort order is set to Name for the key attribute by default when you specify a name column in the Dimension Wizard. Any attribute in a dimension can have separate columns assigned as the key column or name column, so you can define the sort order based on the name column if that is the case.
AttributeKey You can also define the sort order for an attribute based on the key column value of another attribute, but only if an attribute relationship (described in Skill 1.2) exists between the two attributes. As an example, in an Account dimension, you can have an Account Name attribute that you want to sort by the related Account Number attribute rather than alphabetically by the account’s name.
AttributeName This option is like the AttributeKey sort order, except that the ordering is based on the name column of the related attribute instead of the key column.
Convert an attribute into a member property A member property is an attribute that is not used in a pivot table for placement on rows or columns, or as a pivot table filter, but is used instead for reporting purposes. For example, you can display sales by customer and include each customer’s telephone number in a report, but would not typically show sales by telephone number. You can also use a member property as a filter in an ad hoc MDX query.
In the City dimension, change the AttributeHierarchyEnabled property to False for the Latest Recorded Population attribute to convert it to a member property. When you do this, the attribute does not display with the other attributes in client applications, although you can still reference it in MDX queries. Its availability for use in reporting depends on the tool. As an example, you can display a member property as a tooltip when you hover your cursor over a related attribute in a Microsoft Excel pivot table.
Group attribute members into ranges A less commonly used, but no less important, feature of SSAS is its ability to break down a large set of attribute members into discrete ranges. For example, if you have 100,000 customers, it is impractical for a user to add all customers at once to a pivot table. Instead, you can configure SSAS to create groups of customers by setting the DiscretizationBucketCount and DiscretizationMethod properties.
Add translations Global organizations can build a single multidimensional database and then add translations to display captions for dimensions and attributes that are specific to a user’s locale. You can also optionally bind an attribute to columns containing translated attribute member names.
A best practice in programming is to build code once, and then to reference that code many times. You might think that a similar best practice exists in SSAS. Let’s say that you have multiple SSAS servers set up in your company to support different departments, and host different multidimensional databases on each server. If you follow dimensional modeling best practices, you have conformed dimensions in your data mart. That is, you have dimension tables that you relate to different fact tables, such as a Date dimension that is common to all analysis. If you build a date dimension for use in one multidimensional database, why not use the same dimension object in the other multidimensional database? SSAS allows you to add a linked dimension to that other multidimensional database so that you only have one dimension to build and maintain. However, the use of linked dimensions is not considered to be best practice in SSAS development because it can introduce performance problems in large cubes.
Develop a cube
A cube is the object that business users explore when interacting with a multidimensional database. It combines measure data from fact tables with dimension data. At minimum, you associate a single fact table with a cube, although it is common to associate multiple fact tables with one cube.
Like dimension development, cube development begins by using a wizard. To add a cube to your project, perform the following steps:
Launch the Cube Wizard by right-clicking the Cubes folder in Solution Explorer, and selecting New Cube.
Click Next on the first page of the wizard, keep the default option, Use Existing Tables, and then click Next.
On the Select Measure Group Tables page of the wizard, select a fact table, such as Sale, and then click Next. In SSAS, a measure group table is synonymous with a fact table. The reason for a more generic term is due to the ability to create a measure group from any type of table as long as it contains rows that can be counted. Therefore, think about a measure group as a container of measures.
On the Select Measures page of the Cube Wizard, only columns with a numeric data type in the fact table are displayed. Clear each check box that you do not want to use for analysis, such as WWI Invoice ID and Lineage Key in this example, as shown in Figure 1-14. Notice the inclusion of Sale Count, which is not a column in the fact table. SSAS suggests this column because counting the number of rows in a fact table and grouping the result by dimension attributes is a common business requirement. You can clear the check box for this derived measure if it is not necessary in your own project.
FIGURE 1-14 Select Measures page of the Cube Wizard
Click Next to continue to the Select Existing Dimensions page of the wizard. As shown in Figure 1-15, this page displays dimensions that currently exist in the project.
FIGURE 1-15 Select Existing Dimensions page of the Cube Wizard
A common development approach is to build one dimension, build a cube, deploy the project, and then review the results. (Troubleshooting any problems that arise is easier when you deploy one object at a time.) You then continue iteratively by defining another dimension, adding it to the cube, deploying the project, and then reviewing the changes.
However, you can decide to define multiple dimensions first. If you do, you see the available dimensions listed here and can include or exclude them as needed when using the Cube Wizard. On the other hand, when you create dimensions after you complete the Cube Wizard, you follow different steps to add them to the cube as explained in Skill 1.2.
Click Next, and then, on the Select New Dimensions page of the wizard, clear all of the check boxes, as shown in Figure 1-16, to continue the current example in which you are developing a simple cube with one dimension. You can do this in one step by clearing the Dimension check box at the top of the page.
FIGURE 1-16 Select New Dimensions page of the Cube Wizard
The purpose of this page in the wizard is to identify dimension tables for which a relationship exists with the fact table or tables selected for the cube. If you keep a table selected on this page, the Cube Wizard creates a basic dimension for that table and adds it to the project as a new .dim file. You can then configure the dimension’s properties as needed to meet your analysis requirements.
To finish the wizard, click Next, and then click Finish to add the Wide World Importers DW.cube file to the project and display the cube designer as an SSDT window.
Expand Sale in the Measures pane of the Cube Structure page of the cube designer to see all of the measures added to the cube, as shown in Figure 1-17.
FIGURE 1-17 Cube designer
The cube designer also includes a Dimensions pane that displays the one dimension that is included in the cube, City. In addition, the Data Source View pane displays a diagram of the fact table, displayed with a yellow header, and the related dimension table, displayed with a blue header.
Create a multidimensional database
At this point, you have created the definitions of a data source, a data source view, a dimension, and a cube, which exist only in the context of a project, but the database does not yet exist on the server and no one can explore the cube until it does. To create the database on the server, you must deploy the project from SSDT. To ensure you deploy the project to the correct server, review, and if necessary, update the project’s properties. To do this, right-click the project name in Solution Explorer and select Properties. In the 70-768-Ch1 Property Pages dialog box, select the Deployment page, as shown in Figure 1-18, and then update the Server text box with the name of your SSAS server if you need to deploy to a remote server rather than locally.
FIGURE 1-18 SSAS project deployment properties
The following additional deployment options are also available on this page:
Processing Option The default value is Default, which evaluates the processed state of each object in the solution and takes the necessary steps to put the object into a processed state when you deploy the project. You can change this value to Do Not Process if you want to manually instruct the server to process objects as a separate step, or Full if you want the database to process completely each time you deploy your project from SSDT. For now, the default value is sufficient for working through the examples in this chapter. In Chapter 4, these processing options are described in greater detail.
Transactional Deployment The default value is False, which means that deployment is not transactional. In that case, deployment can succeed, but processing can fail. Choose True if you want to roll back deployment if processing fails.
Server Mode The default value is Deploy Changes Only, which deploys only objects from your project that do not yet exist on the server, or are different from those on the server. This option is faster than the Deploy All option, which deploys all objects in your project to the server.
When you are ready to create the database on the server, right-click the project in Solution Explorer, and select Deploy. The first time you perform this step, the deployment process creates the database on the server and adds any objects that you have defined in the project. Each subsequent time that you deploy the project, as long as you have kept the default deployment options in the project properties, the deployment process preserves the existing database and adds any new database objects, and updates any database objects that you have modified in the project.
You can confirm the creation of the multidimensional database by opening SSMS and connecting to Analysis Services. In Object Explorer, expand the Databases folder to view the existing databases, and then expand the subfolders to see the objects currently in the database, as shown in Figure 1-19.
FIGURE 1-19 Multidimensional database objects visible in the SQL Server Management Studio Object Explorer
Now the cube is ready to explore using any tool that is compatible with SSAS, such as Excel, Microsoft Power BI Desktop, or SSRS, but it is not yet set up as well as it can be. There are more dimensions and fact tables to add, as well as many configuration changes to the dimension and cube to consider and implement that you learn about in Skills 1.2 and 1.3.
Select a storage model
The storage model you select for your multidimensional model determines the type of data that SSAS stores and how it physically stores that data on disk. SSAS supports the following storage models:
Multidimensional OLAP (MOLAP)
Relational OLAP (ROLAP)
Hybrid OLAP (HOLAP)
Regardless of which storage model you select, cube data is always stored separately from dimension data. Furthermore, you can specify a different storage model for cube data than you do for dimension data. You can even separate cube data into multiple partitions, each using a different storage model. For now, consider which storage mode is most appropriate for your requirements in general. In Chapter 4, you learn the reasons why you might configure different storage models for partitions and how to implement storage models by partition.
MOLAP
MOLAP is the default storage mode. Generally speaking, it is also the preferred storage mode unless you have specific requirements that only one of the other two storage modes can meet because it performs fastest. On the other hand, it requires the most storage space.
MOLAP storage compresses data and distributes it more efficiently on disk than the other storage models, but it also requires the most time to process data. When SSAS processes data for MOLAP storage, it uses the DSV to generate the necessary SQL statements that retrieve data from the relational source, restructures the results for storage on disk in a proprietary format, and then calculates and stores aggregations for measures, as shown in Figure 1-20. Aggregations are calculated only if you have defined aggregation rules as described in Chapter 4. When SSAS receives a request for data from a Multidimensional Expression (MDX) query, it gets the aggregated data or detail data in MOLAP storage to prepare the results.
FIGURE 1-20 MOLAP storage
ROLAP
With ROLAP storage, SSAS does not retrieve data from the relational source during processing. Instead, processing involves only checking the consistency of the data and therefore runs much faster than MOLAP processing. If you configure aggregations in addition to ROLAP storage, the aggregated values are stored in the data source, as shown in Figure 1-21. When SSAS receives an MDX request for data, it uses the DSV to translate the request into an SQL statement and sends that statement to the data source. For example, if the data source is SQL Server, the MDX is translated into T-SQL.
FIGURE 1-21 ROLAP storage
ROLAP storage is the best choice when your business users require near real-time access to data or when a dimension table has hundreds of millions of rows. A potential trade-off is slower query performance due to the translation of the query from MDX to SQL. However, there are techniques you can apply to mitigate the performance degradation as explained in Chapter 4.
HOLAP
HOLAP storage is a combination of MOLAP and ROLAP. Detail data is kept in the relational data source, but aggregate data and indexes are loaded into MOLAP storage during processing, as shown in Figure 1-22. When SSAS processes an MDX query, it determines where the data necessary to resolve the query resides. If it needs detail data, it translates the query to SQL and sends the translated statement to the data source. Otherwise, it retrieves aggregated data from MOLAP storage.
FIGURE 1-22 ROLAP storage
HOLAP queries are potentially slower to resolve. Furthermore, storage requirements on the SSAS server are not truly lower because SSAS must read all the data from the data source to build the aggregations and indexes and requires enough space to do this. SSAS disposes of the data afterwards, thus reducing storage requirements. Because you must factor in enough storage overhead for processing, HOLAP is not a good choice if your goal is to minimize storage.