Patching Windows VMs with GCP’s VM Manager


Introduction

Whilst I am a huge fan of short-lived, immutable VMs with system state turned over to managed services like Cloud SQL, sometimes this simply isn’t practical or possible.

In these situations we are often left with long-running, stateful instances that require the same sort of maintenance as ‘traditional’ infrastructure. But how do we manage these without the pain and ancillary infrastructure this often requires?

Recently I have been working with a client that has a stateful .NET application running on top of Windows Server 2016 GCE instances. The client tasked me with determining a patching strategy for these VMs to ensure stability and, more importantly, security.

Now, I have plenty of experience patching Windows servers from previous jobs and have seen it done both well and badly. Some particularly bad examples that stick out to me are:

Over the course of my career I have had the pleasure (or misfortune) of working with both WSUS and SCCM. These tools, however, can be complex and temperamental and, in the case of SCCM, require deep pockets.

Generally I have found the best patching strategies to follow these ideas:

Anyway, back to my client’s request. In April Google launched their OS patch management service to the masses. This was a tool I had been curious about but hadn’t had a requirement to implement at the time. This, though, felt like the right opportunity to test it.

Prerequisites and Setup

Initially I wanted to see what the tool could show me with the OS Inventory Management functionality before committing to patching. As I did this in Terraform, here is my code:

resource "google_compute_project_metadata_item" "guest_attributes" {
  key   = "enable-guest-attributes"
  value = "TRUE"
}

resource "google_compute_project_metadata_item" "osconfig" {
  key   = "enable-osconfig"
  value = "TRUE"
}

resource "google_project_service" "osconfig" {
  service            = "osconfig.googleapis.com"
  disable_on_destroy = false
}

The above is really the result of the setup requirements found in the Google documentation, but in summary it requires a couple of metadata values to be set and an API to be enabled.

The next step is to ensure that the OS you want to monitor has the required agent. Fortunately, as the instances used a recent Google-baked 2016 image, I could skip this step. If, however, you aren't so fortunate, installation instructions are here.

At this point I also want to state that automatic Windows updates had been disabled previously by setting a registry key in the startup script, using the PowerShell below.

Set-ItemProperty -Path HKLM:\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU -Name AUOptions -Value 1
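As a rough sketch of how that startup script could be attached in Terraform, the instance below passes the same PowerShell line through the `windows-startup-script-ps1` metadata key. The resource name, instance name, machine type, zone and network are all placeholders of mine rather than the client's real configuration, and the `win-patch` label is included only because the patch deployment later in this post filters on it.

```hcl
# Hypothetical instance: name, machine type, zone and network are placeholders.
resource "google_compute_instance" "win_app" {
  name         = "win-app-1"
  machine_type = "n1-standard-2"
  zone         = "europe-west2-a"

  boot_disk {
    initialize_params {
      # Google-provided Windows Server 2016 image family.
      image = "windows-cloud/windows-2016"
    }
  }

  network_interface {
    network = "default"
  }

  # Label that a patch deployment's instance_filter can match on.
  labels = {
    win-patch = "true"
  }

  metadata = {
    # Runs on every boot. Backslashes are literal inside a heredoc.
    # Assumes the ...\WindowsUpdate\AU registry key already exists.
    windows-startup-script-ps1 = <<-EOT
      Set-ItemProperty -Path HKLM:\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU -Name AUOptions -Value 1
    EOT
  }
}
```

Note that `Set-ItemProperty` fails if the registry path is missing, so on a fresh image the key may need to be created first.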

Compliance Graphs

Like most ops guys I do like a pretty graph (and could probably stare at Grafana dashboards for hours!) and in this regard Google doesn’t disappoint with clear reporting on a pie chart, even broken down by OS.

For the eagle-eyed among you, the three VMs reporting back no data are actually GKE nodes running Google’s Container-Optimized OS, which Google is responsible for maintaining.

By selecting View details and then a specific VM, I can get a breakdown of available patches, including their categories, KB numbers and when they were published.

Turning Insight into Action

Satisfied with the insight I now had into the VMs, I was keen to try out the functionality to apply patches. This is done by creating a Google OS Patch Deployment, either in the Console or with Terraform.

resource "google_os_config_patch_deployment" "win-patch" {
  patch_deployment_id = "win-patch"

  instance_filter {
    group_labels {
      labels = {
        win-patch = "true"
      }
    }
    zones = ["europe-west2-a", "europe-west2-b", "europe-west2-c"]
  }

  patch_config {
    reboot_config = "DEFAULT"
    windows_update {
      classifications = ["CRITICAL", "SECURITY", "DEFINITION"]
    }
  }

  duration = "3600s"

  recurring_schedule {
    time_zone {
      id = "Europe/London"
    }
    time_of_day {
      hours = 2
    }
    weekly {
      day_of_week = var.win_patch_day
    }
  }

  rollout {
    mode = "ZONE_BY_ZONE"
    disruption_budget {
      fixed = 5
    }
  }
}
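The resource above references `var.win_patch_day` without showing its declaration. A minimal sketch of what that variable might look like follows; the default value and the validation rule are my own assumptions, and the `validation` block needs Terraform 0.13 or later.

```hcl
variable "win_patch_day" {
  type        = string
  description = "Day of the week on which the Windows patch deployment runs."
  default     = "SUNDAY" # hypothetical default, not taken from the original setup

  # Reject anything that is not one of the day names the API accepts.
  validation {
    condition = contains(
      ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY"],
      var.win_patch_day
    )
    error_message = "The win_patch_day value must be an upper-case day name such as MONDAY."
  }
}
```

Keeping the day as a variable makes it straightforward to patch different environments on different days, for example a test project earlier in the week than production.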

The Terraform documentation for this resource is quite an interesting read. There are, for example, options available for pre- and post-patching scripts, which may be very useful in environments where automated testing exists.
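I didn't use the pre- and post-patching steps myself, but a sketch of how they might be wired in looks something like the following. The deployment ID, the script paths and the fixed patch day are all hypothetical, and the scripts are assumed to already exist on the VM's local disk.

```hcl
resource "google_os_config_patch_deployment" "win_patch_with_steps" {
  patch_deployment_id = "win-patch-with-steps" # hypothetical ID

  instance_filter {
    group_labels {
      labels = {
        win-patch = "true"
      }
    }
  }

  patch_config {
    reboot_config = "DEFAULT"

    windows_update {
      classifications = ["CRITICAL", "SECURITY"]
    }

    # Runs before patching; a non-zero exit code is treated as a failure.
    pre_step {
      windows_exec_step_config {
        interpreter           = "POWERSHELL"
        local_path            = "C:\\scripts\\pre-patch.ps1" # hypothetical path
        allowed_success_codes = [0]
      }
    }

    # Runs after patching, for example to smoke test the application.
    post_step {
      windows_exec_step_config {
        interpreter           = "POWERSHELL"
        local_path            = "C:\\scripts\\post-patch.ps1" # hypothetical path
        allowed_success_codes = [0]
      }
    }
  }

  recurring_schedule {
    time_zone {
      id = "Europe/London"
    }
    time_of_day {
      hours = 2
    }
    weekly {
      day_of_week = "SUNDAY" # hypothetical day
    }
  }
}
```

The scripts can also be pulled from a Cloud Storage bucket using a `gcs_object` block in place of `local_path`, which avoids having to bake them into the image.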

To break down the code above it:

As another quick note, the above resource is new and was only added in provider version 3.30.0. I had to update to a newer provider, as we were a couple of versions behind.
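One way to make that requirement explicit is to pin a minimum provider version. This is a sketch assuming Terraform 0.13 or later, where the `required_providers` block takes a `source` address; on 0.12 the constraint would instead sit in the `provider "google"` block.

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
      # google_os_config_patch_deployment needs at least this version.
      version = ">= 3.30.0"
    }
  }
}
```

With the constraint in place, `terraform init` fails early on an older provider rather than erroring on an unknown resource type at plan time.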

Reviewing Patch Jobs

When a patch job is executed, its progress can be watched in real time or, more likely (with patching done in the early hours), reviewed the following morning within VM Manager.

It is also possible to drill even further into the logs on specific machines to see how the job progressed. As you can see, this job also required a reboot, which the machine executed automatically.

Concluding Thoughts

Simply put, I am a fan. This solution has enabled me to deploy Windows patches in an automated, reasonably (though not completely) controlled way at minimal to no cost. However, I did have to make some compromises and assumptions:

Finally, I hope this post has given some food for thought and at least presented another method to help maintain long-lived GCP VMs.

I am a GCP Platform Engineer based in the UK. Thoughts here are my own and don’t necessarily represent my employer.