Design and Implement an Azure Storage Strategy

3/11/2015

Contents

Objective 4.1: Implement Azure Storage blobs and Azure files
Objective 4.2: Implement Azure Storage tables
Objective 4.3: Implement Azure storage queues
Objective 4.4: Manage access
Objective 4.5: Monitor storage
Objective 4.6: Implement SQL databases
Answers

Objective 4.2: Implement Azure Storage tables

Azure Storage is a non-relational (NoSQL) entity storage service on Microsoft Azure. When you create a storage account, it includes the Table service alongside the Blob and Queue services. Table services can be accessed through a URL format. It looks like this:

http://<storage account name>.table.core.windows.net/<table name>.

There are many forms of NoSQL databases:

Key-value stores that organize data with a unique key per record and often allow for jagged entries where each row might not have a complete set of values.
Document databases that are similar to key-value stores with semi-structured, easy-to-query documents. Usually, information is stored in JavaScript Object Notation (JSON) format.
Columnar stores that are used to organize large amounts of distributed information.
Graph databases that do not use columns and rows; instead, they use a graph model for storage and query, usually for large amounts of highly distributed data.

Table storage is a key-value store that uses a partition key to help with scale out distribution of data and a row key for unique access to a particular entry. Together, these keys are used to uniquely identify a record in the account.

Using basic CRUD operations

In this section, you learn how to access table storage programmatically.

Creating a table

Create a C# console application.

In your app.config file, add an entry under the Configuration element, replacing the account name and key with your own storage account details:

<configuration>
  <appSettings>
    <add key="StorageConnectionString" value="DefaultEndpointsProtocol=https;Accou
ntName=<your account name>;AccountKey=<your account key>" />
  </appSettings>
</configuration>

Use NuGet to obtain the Microsoft.WindowsAzure.Storage.dll. An easy way to do this is by using the following command in the NuGet console:
```
Install-package windowsazure.storage –version 3.0.3
```

Add the following using statements to the top of your Program.cs file:

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Table;
using Microsoft.WindowsAzure;
using System.Configuration

Add a reference to System.Configuration.

Type the following command to retrieve your connection string in the Main function of Program.cs:

var storageAccount = CloudStorageAccount.Parse(
var storageAccount =CloudStorageAccount.Parse( ConfigurationManager.AppSettings["S
torageConnectionString"]);

Use the following command to create a table if one doesn’t already exist:

CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("customers");
table.CreateIfNotExists();

Inserting records

To add entries to a table, you create objects based on the TableEntity base class and serialize them into the table using the Storage Client Library. The following properties are provided for you in this base class:

Partition Key Used to partition data across storage infrastructure
Row Key Unique identifier in a partition
Timestamp Time of last update maintained by Azure Storage
ETag Used internally to provide optimistic concurrency

The combination of partition key and row key must be unique within the table. This combination is used for load balancing and scaling, as well as for querying and sorting entities.

Follow these steps to add code that inserts records:

Add a class to your project, and then add the following code to it:

using Microsoft.WindowsAzure.Storage.Table;
public class OrderEntity : TableEntity
{
 public OrderEntity(string customerName, String orderDate)
 {
  this.PartitionKey = customerName;
  this.RowKey = orderDate;
 }
 public OrderEntity() { }
 public string OrderNumber { get; set; }
 public DateTime RequiredDate { get; set; }
 public DateTime ShippedDate { get; set; }
 public string Status { get; set; }
}

Add the following code to the console program to insert a record:

CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
CloudConfigurationManager.GetSetting("StorageConnectionString"));
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("orders");
OrderEntity newOrder = new OrderEntity("Archer", "20141216");
newOrder.OrderNumber = "101";
newOrder.ShippedDate = Convert.ToDateTime("20141218");
newOrder.RequiredDate = Convert.ToDateTime("20141222");
newOrder.Status = "shipped";
TableOperation insertOperation = TableOperation.Insert(newOrder);
table.Execute(insertOperation);

Inserting multiple records in a transaction

You can group inserts and other operations into a single batch transaction. All operations in the batch must take place on the same partition. You can have up to 100 entities in a batch. The total batch payload size cannot be greater than 4 MB.

The following code illustrates how to insert several records as part of a single transaction:

CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
CloudConfigurationManager.GetSetting("StorageConnectionString"));
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("orders");
TableBatchOperation batchOperation = new TableBatchOperation();

OrderEntity newOrder1 = new OrderEntity("Lana", "20141217");
newOrder1.OrderNumber = "102";
newOrder1.ShippedDate = Convert.ToDateTime("1/1/1900");
newOrder1.RequiredDate = Convert.ToDateTime("1/1/1900");
newOrder1.Status = "pending";
OrderEntity newOrder2 = new OrderEntity("Lana", "20141218");
newOrder2.OrderNumber = "103";
newOrder2.ShippedDate = Convert.ToDateTime("1/1/1900");
newOrder2.RequiredDate = Convert.ToDateTime("12/25/2014");
newOrder2.Status = "open";
OrderEntity newOrder3 = new OrderEntity("Lana", "20141219");
newOrder3.OrderNumber = "103";
newOrder3.ShippedDate = Convert.ToDateTime("12/17/2014");
newOrder3.RequiredDate = Convert.ToDateTime("12/17/2014");
newOrder3.Status = "shipped";

batchOperation.Insert(newOrder1);
batchOperation.Insert(newOrder2);
batchOperation.Insert(newOrder3);
table.ExecuteBatch(batchOperation);

Getting records in a partition

You can select all of the entities in a partition or a range of entities by partition and row key. Wherever possible, you should try to query with the partition key and row key. Querying entities by other properties does not work well because it launches a scan of the entire table.

Within a table, entities are ordered within the partition key. Within a partition, entities are ordered by the row key. RowKey is a string property, so sorting is handled as a string sort. If you are using a date value for your RowKey property use the following order: year, month, day. For instance, use 20140108 for January 8, 2014.

The following code requests all records within a partition using the PartitionKey property to query:

CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
CloudConfigurationManager.GetSetting("StorageConnectionString"));
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("orders");
TableQuery<OrderEntity> query = new TableQuery<OrderEntity>().Where(
TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "Lana"));

foreach (OrderEntity entity in table.ExecuteQuery(query))
{
 Console.WriteLine("{0}, {1}\t{2}\t{3}", entity.PartitionKey, entity.RowKey,
 entity.Status, entity.RequiredDate);
}

Updating records

One technique you can use to update a record is to use InsertOrReplace(). This creates the record if one does not already exist or updates an existing record. Here’s an example:

CloudStorageAccount storageAccount = CloudStorageAccount.
Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("orders1");
TableOperation retrieveOperation = TableOperation.Retrieve<OrderEntity>("Lana",
"20141217");
TableResult retrievedResult = table.Execute(retrieveOperation);
OrderEntity updateEntity = (OrderEntity)retrievedResult.Result;
if (updateEntity != null)
{
  updateEntity.Status = "shipped";
  updateEntity.ShippedDate = Convert.ToDateTime("12/20/2014");
  TableOperation insertOrReplaceOperation = TableOperation.
InsertOrReplace(updateEntity);
  table.Execute(insertOrReplaceOperation);
}

Deleting a record

To delete a record, first retrieve the record as shown in earlier examples, and then delete it with code, such as this:

TableOperation deleteOperation = TableOperation.Delete(deleteEntity);
table.Execute(deleteOperation);
Console.WriteLine("Entity deleted.");

Querying using ODATA

The Storage API for tables supports OData, which exposes a simple query interface for interacting with table data. Table storage does not support anonymous access, so you must supply credentials using the account key or a Shared Access Signature (SAS) (discussed in “Manage Access”) before you can perform requests using OData.

To query what tables you have created, provide credentials, and issue a GET request as follows:

https://myaccount.table.core.windows.net/Tables

To query the entities in a specific table, provide credentials, and issue a GET request formatted as follows:

https://<your account name>.table.core.windows.net/<your table
name>(PartitionKey='<partition-key>',RowKey='<row-key>')?$select=<comma separated
property names>

Designing, managing, and scaling table partitions

The Azure Table service can scale to handle massive amounts of structured data and billions of records. To handle that amount, tables are partitioned. The partition key is the unit of scale for storage tables. The table service will spread your table to multiple servers and key all rows with the same partition key co-located. Thus, the partition key is an important grouping, not only for querying but also for scalability.

There are three types of partition keys to choose from:

Single value There is one partition key for the entire table. This favors a small number of entities. It also makes batch transactions easier since batch transactions need to share a partition key to run without error. It does not scale well for large tables since all rows will be on the same partition server.
Multiple values This might place each partition on its own partition server. If the partition size is smaller, it’s easier for Azure to load balance the partitions. Partitions might get slower as the number of entities increases. This might make further partitioning necessary at some point.
Unique values This is many small partitions. This is highly scalable, but batch transactions are not possible.

For query performance, you should use the partition key and row key together when possible. This leads to an exact row match. The next best thing is to have an exact partition match with a row range. It is best to avoid scanning the entire table.

THOUGHT EXPERIMENT: Partitioning data

In this thought experiment, apply what you’ve learned about this objective. You can find answers to these questions in the “Answers” section at the end of this chapter.

Your organization produces lasers that etch patterns for processors and memory. On a regular interval, the laser sends the shot count of how many times the laser fired to your application. The shot count is cumulative by day. Your company built more than 500 of these lasers and distributed them around the world. Each laser has its own machine identifier. Each time the shot count is sent, it includes a time stamp. The analysts are mostly concerned with the most recent shot count sent.

What should you use for the partition key? How many partitions should you create?
How should you create the row key?
How many tables should you build? What’s in each table?

Objective summary

Table storage is a non-relational database implementation (NoSQL) following the key-value database pattern.
Table entries each have a partition key and row key. The partition key is used to logically group rows that are related; the row key is a unique entry for the row.
The Table service uses the partition key for distributing collections of rows across physical partitions in Azure to automatically scale out the database as needed.
A Table storage query returns up to 1,000 records per request, and will time out after five seconds.
Querying Table storage with both the partition and row key results in fast queries. A table scan is required for queries that do not use these keys.

Objective review

Answer the following questions to test your knowledge of the information in this objective. You can find the answers to these questions and explanations of why each answer choice is correct or incorrect in the “Answers” section at the end of this chapter.

Which of the following is not a method for replicating a Table storage account?
1. Transactional replication
2. Zone redundant storage
3. Read access geo-redundant storage
4. Geo-redundant storage
How should you choose a good partition key for a Table storage implementation? (Choose all that apply.)
1. They should always be unique, like a primary key in a SQL table.
2. You should always use the same partition key for all records.
3. Think about how you’re likely to update the data using batch transactions.
4. Find an even way to split them so that you have relatively even partition sizes.
Which of the following statements are correct for submitting operations in a batch? (Choose all that apply.)
1. All operations have to be in the same partition.
2. Total batch size can’t be greater than 4 MB.
3. Max operation count is 100.
4. Minimum operation count is three.