Design a security operations strategy
- By Yuri Diogenes, Gladys Rodriguez, Mark Simos, Sarah Young
- 6/26/2023
In this sample chapter from Exam Ref SC-100 Microsoft Cybersecurity Architect, learn how to define critical events, centralize log collection, and determine which security logs are valuable. With effective use cases and optimized log retention, you can stay one step ahead of cyber threats.
Security operations is often thought of as the “cool” bit of security. Aside from the stereotype of people in black hoodies hunched over their keyboards, it is often the next thing that comes to mind when the public and non-security stakeholders think about IT security. The general stereotype is of many dedicated security operations center (SOC) analysts in an operations room with a big screen, à la WarGames, where attacks are stopped in their tracks.
While aspects of this stereotype are indeed based on reality, crafting a modern security operations strategy ideally starts long before you hire a single SOC analyst. Before you can detect attacks, you need to be able to define the events you need to collect in your environment, select the right tooling, and so on.
Skills covered in this chapter:
Skill 2-1: Design a logging and auditing strategy to support security operations
Skill 2-2: Develop security operations to support a hybrid or multi-cloud environment
Skill 2-3: Design a strategy for SIEM and SOAR
Skill 2-4: Evaluate security workflows
Skill 2-5: Evaluate a security operations strategy for the incident management lifecycle
Skill 2-6: Evaluate a security operations strategy for sharing technical threat intelligence
Skill 2-1: Design a logging and auditing strategy to support security operations
From a top-down IT organizational view, logging and auditing isn’t an activity that is necessarily exclusive to the security domain. Many other IT functions need logs from the IT environment; most commonly, these are the IT operations and internal audit functions within an organization. However, the logs these functions require are unlikely to be the same as those required for a security operations function (although there is often some overlap). This section of the chapter covers the skills necessary to design a logging and auditing strategy to support security operations according to the Exam SC-100 outline.
Centralizing log collection
In the past, it wasn’t unusual to have logs collected in several different stores throughout an organization’s IT environment because having a central place to store logs could be costly and complicated to set up. With the advent of cloud services, those challenges have largely dissipated, so you should aim to have logs collected in one central store: this makes management and querying of the logs more straightforward and efficient. In global organizations, local regulatory requirements may exist to keep log data within a certain jurisdiction. If this is the case, endeavor to minimize the number of log stores required to keep the log collection architecture as simple as possible.
Deciding which logs have security value
Security operations always start with logs. Without logs, we can have no visibility into what is happening in the IT environment. However, while most components of an IT environment create logs—whether they be user accounts, applications, virtual machines (VMs), or firewalls—not all logs have security value. Logging and storing logs for the sake of logging and storing logs is an unsustainable practice, albeit one that has been used countless times in the past. The disadvantages to this approach are many, but the three key downsides are:
Cost and management of log storage: The more logs you have, the more storage you need for them. As all good architects are aware, cost is king. Most stakeholders who hold the purse strings will need justification for the spend on log storage, so it is important to be able to demonstrate that the logs you are collecting and storing are actively being used in security operations. In Log Analytics (the underlying log storage mechanism for Microsoft Sentinel), the cost of the platform is based on the volume of data ingested into a workspace and how long it is retained.
Performance impact of log searches: The more logs your query must search through, the slower the query may become. The impact on query performance will depend on the log-searching tool you use. For example, if you use a cloud-based security information and event management (SIEM) tool such as Microsoft Sentinel that can increase its compute power as required, the performance hit will be smaller than with an on-premises tool that has fixed compute capacity. However, it is still something to consider when designing your logging and auditing strategy. Even Microsoft Sentinel queries using Kusto Query Language (KQL) can time out if the query takes too long to run.
Complexity of queries: With more logs to search through, the complexity of queries and detections being run as part of the security operations function may increase, which can put more overhead on SOC analysts and the general management of your SOC.
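To make the cost argument concrete for budget holders, the relationship between ingestion volume, retention, and spend can be sketched as a small calculation. This is a minimal illustration only: the function name, the 30-day month, the steady-state assumption, and every price used with it are hypothetical placeholders, not actual Microsoft rates.

```python
def estimate_monthly_log_cost(
    daily_ingest_gb: float,
    retention_days: int,
    ingest_price_per_gb: float,
    retention_price_per_gb_month: float,
    free_retention_days: int = 0,
) -> float:
    """Rough steady-state monthly cost of a log store.

    All prices are caller-supplied assumptions; check the current price
    list for your platform and region before relying on any figure.
    """
    if daily_ingest_gb < 0 or retention_days < 0 or free_retention_days < 0:
        raise ValueError("volumes and day counts must be non-negative")

    # Ingestion charge: assume a 30-day month of constant daily volume.
    ingestion_cost = daily_ingest_gb * 30 * ingest_price_per_gb

    # Retention charge: only days beyond the free period are billable.
    billable_days = max(retention_days - free_retention_days, 0)

    # At steady state the store holds `billable_days` worth of chargeable data.
    stored_gb = daily_ingest_gb * billable_days
    retention_cost = stored_gb * retention_price_per_gb_month

    return round(ingestion_cost + retention_cost, 2)
```

With the placeholder values `estimate_monthly_log_cost(50, 180, 2.00, 0.10, free_retention_days=90)`, the sketch returns 3450.0: 3,000 for ingestion plus 450 for the 90 billable days of retained data. Halving the daily volume by dropping logs with no security use halves both terms.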
But how do you decide which logs have value in the context of security operations? This is a question that has long plagued security architects, and sadly there is no one-size-fits-all answer that can be applied to every organization. However, a rule of thumb is this: if a log isn’t going to be used for reactive security operations (such as detections) and it isn’t going to be used for proactive security operations (such as hunting), then, as a security architect, you need to seriously question whether that log needs to be collected. If you can’t demonstrate a tangible security use for a log type, then it probably shouldn’t be collected by your SOC.
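The rule of thumb can be written down as a simple triage over a log inventory. The sketch below is illustrative: the `LogSource` type, its field names, and the `triage` helper are hypothetical and not part of any Microsoft tooling.

```python
from dataclasses import dataclass, field


@dataclass
class LogSource:
    """One candidate log type and the security uses claimed for it."""
    name: str
    detections: list[str] = field(default_factory=list)  # reactive uses
    hunts: list[str] = field(default_factory=list)       # proactive uses


def has_security_value(source: LogSource) -> bool:
    # A log earns its place if it feeds at least one detection or one hunt.
    return bool(source.detections or source.hunts)


def triage(sources: list[LogSource]) -> tuple[list[str], list[str]]:
    """Split log names into (collect, question), preserving input order."""
    collect = [s.name for s in sources if has_security_value(s)]
    question = [s.name for s in sources if not has_security_value(s)]
    return collect, question
```

Anything that lands in the second list is not automatically discarded; it is the set of logs whose collection the architect should be asked to justify.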
Too often, this step is overlooked. When connecting or activating a new device in the environment, a security engineer will simply select every available log rather than evaluating what each one will be used for, as shown in Figure 2-1.
FIGURE 2-1 Log selection for Azure Key Vault in the Azure portal
Designing security operations use cases
The most effective way to create a logging and auditing strategy is to know your organization’s security operations use cases. Having a range of use cases from different parts of your organization allows you to work back from each use case and determine which logs and audit trails are required for it to be realized in your security operations.
For example, let’s say the accounting department needs the security operations team to take action and investigate if any privilege escalation occurs in their SAP system that handles payment of expenses and employee salaries. This is a high-priority issue because it could disrupt the payment of employee salaries. A malicious actor could start changing the bank accounts that are paid. Because this issue is so sensitive, the accounting team wants any account involved in such an incident to be disabled immediately. There are three steps to work through:
First, we need to ask: What logs do we need to be able to create a detection for this in our SIEM? In this case, we will need the audit logs from SAP, and if it’s SAP on Azure, we might also require the Azure AD sign-in and audit logs to get more detail about user actions.
Second, after we have determined what logs we need, we create a detection that will pick up on this activity. In Microsoft Sentinel, we would create an analytics rule with the appropriate KQL to detect this activity in the logs. At this stage, we would also need to determine how frequently the analytics rule needs to run. Because this is a high-priority use case, the rule will likely run more frequently (maybe every 10 minutes), and the SOC team needs to be alerted quickly and be able to take action if an incident is raised.
Third, we can now design automation to go along with this incident: The accounting team has asked that any user account involved in such an incident be disabled. We can create remediation automation in Azure Logic Apps to immediately disable the account if an incident is raised.
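One way to keep this traceability explicit is to record each use case alongside the logs, detection schedule, and automation it implies, and then derive the list of required logs from those records. The sketch below is a hypothetical data model, not a Microsoft Sentinel feature; the `UseCase` type and its field names are assumptions, while the values come from the SAP example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """A security operations use case and what it requires."""
    name: str
    required_logs: tuple[str, ...]
    detection_interval_minutes: int
    automation: str


def trace_logs_to_use_cases(use_cases: list[UseCase]) -> dict[str, list[str]]:
    """Map each required log to the use cases that justify collecting it."""
    trace: dict[str, list[str]] = {}
    for use_case in use_cases:
        for log in use_case.required_logs:
            trace.setdefault(log, []).append(use_case.name)
    return trace


# The accounting department's request, expressed in the model.
SAP_PRIVILEGE_ESCALATION = UseCase(
    name="SAP privilege escalation",
    required_logs=(
        "SAP audit logs",
        "Azure AD sign-in logs",
        "Azure AD audit logs",
    ),
    detection_interval_minutes=10,  # high priority, so run frequently
    automation="Disable the involved account via Azure Logic Apps",
)
```

Calling `trace_logs_to_use_cases([SAP_PRIVILEGE_ESCALATION])` yields a dictionary in which every log is keyed to the use case that needs it, which is the answer to give when someone asks why a particular log is being collected.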
No doubt, creating security operations use cases takes time and effort, but it allows security architects to have full traceability of the whys when questioned about a logging and auditing strategy.
The MITRE ATT&CK coverage screen in Microsoft Sentinel is shown in Figure 2-2.
FIGURE 2-2 Mapping MITRE ATT&CK coverage in Microsoft Sentinel
Determining log retention periods
The rule of thumb and the use cases discussed in the previous sections are a great starting point for deciding which logs have security value. Still, even among the logs with value, not all logs are created equal. Many organizations split their security logs into:
Low fidelity: Logs that are noisy and high-volume but provide less useful signals to be used in security operations (for example, Azure Firewall logs)
High fidelity: Consolidated alerts from specialized security tools that have already analyzed the raw logs (such as Microsoft Defender for Cloud alerts)
In that same vein, not all logs collected for use in security operations need to be retained for the same period. Determining the exact retention periods required for the security and audit logs for your organization will come down to several factors, which may include:
Use cases
Budget
An organization’s internal IT standards
External regulatory IT standards
For example, in Log Analytics, you can configure retention periods on a per-table basis and thus keep some tables for longer than others as required, as shown in Figure 2-3.
FIGURE 2-3 Configuring Data Retention settings in Log Analytics
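The idea of per-table retention can be modeled as a lookup with a workspace-wide default. The sketch below is a plain illustration of that policy logic and does not call any Azure API; the retention values and the default are hypothetical, and the choice to keep a high-fidelity alert table longer than a noisy diagnostic table is an assumption made for the example.

```python
from datetime import date, timedelta

# Hypothetical policy: days to keep each table. Values are illustrative only.
DEFAULT_RETENTION_DAYS = 90
TABLE_RETENTION_DAYS = {
    "SecurityAlert": 365,     # consolidated, high-fidelity alerts
    "SigninLogs": 180,        # identity activity used in investigations
    "AzureDiagnostics": 30,   # noisy, high-volume resource logs
}


def retention_days_for(table: str) -> int:
    """Return the table's retention, falling back to the default."""
    return TABLE_RETENTION_DAYS.get(table, DEFAULT_RETENTION_DAYS)


def is_past_retention(table: str, ingested_on: date, today: date) -> bool:
    """True if a record ingested on `ingested_on` is older than the policy allows."""
    expires_on = ingested_on + timedelta(days=retention_days_for(table))
    return today > expires_on
```

Keeping the policy in one reviewable structure makes it easier to compare against the use cases, budget, and internal and external standards listed above before the settings are applied in the workspace.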
Azure Data Explorer (ADX) and Blob Storage are other options for the long-term storage of logs.