Service Fabric Common-Name Certificate: Cluster Upgrade & Rollover

Updating a Service Fabric cluster certificate that is referenced by thumbprint involves multiple time-consuming cluster upgrades. This can be avoided by configuring the cluster to reference certificates by common name instead of by thumbprint.

Unlike a thumbprint, the common name stays the same between certificate versions, so updating to a newer version is as simple as registering the certificate with Key Vault and the VMSS.

Because the cluster configuration does not need to change, no cluster upgrade is required when updating certificates. Once the new, later-expiring certificate has been registered on the VMSS, Service Fabric automatically uses it in place of the older version.
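To make the "latest expiry wins" behaviour concrete, here is a minimal PowerShell sketch of that selection rule. It is only an illustration of the rule as described above, not Service Fabric's actual implementation. Select-LatestCertificate is a hypothetical helper, the sample certificates are made-up data, and matching on an exact "CN=..." subject is a simplifying assumption.

```powershell
# Illustrative only: mimics the "latest expiry wins" rule described above.
# Select-LatestCertificate is a hypothetical helper, not a Service Fabric API.
function Select-LatestCertificate {
    param(
        [Parameter(Mandatory)] [object[]] $Certificates,
        [Parameter(Mandatory)] [string]   $CommonName
    )

    # Keep only certificates whose subject matches the common name
    # (assumption: the subject is exactly "CN=<name>"), then pick the
    # one that expires last.
    $Certificates |
        Where-Object { $_.Subject -eq "CN=$CommonName" } |
        Sort-Object -Property NotAfter -Descending |
        Select-Object -First 1
}

# Made-up sample data standing in for two versions of the same certificate
# plus one unrelated certificate.
$certs = @(
    [pscustomobject]@{ Subject = 'CN=cluster.example.com'; Thumbprint = 'AAAA'; NotAfter = [datetime]'2024-01-01' },
    [pscustomobject]@{ Subject = 'CN=cluster.example.com'; Thumbprint = 'BBBB'; NotAfter = [datetime]'2025-01-01' },
    [pscustomobject]@{ Subject = 'CN=other.example.com';   Thumbprint = 'CCCC'; NotAfter = [datetime]'2026-01-01' }
)

$selected = Select-LatestCertificate -Certificates $certs -CommonName 'cluster.example.com'
$selected.Thumbprint
```

On a Windows node, the same helper could be fed the output of Get-ChildItem Cert:\LocalMachine\My to see which installed certificate would be preferred under this rule.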

Upgrading to Common-Name

As described in the documentation, the secondary thumbprint should be removed before configuring the cluster & VMSS for common-name:

If you have two thumbprints declared in your template, you need to perform two deployments. The first deployment is done before following the steps in this article. The first deployment sets your thumbprint property in the template to the certificate being used and removes the thumbprintSecondary property. For the second deployment, follow the steps in this article.
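As a rough illustration of that first deployment, a cluster certificate block that declares two thumbprints might look like the fragment below before the change. The parameter name secondaryCertificateThumbprint is hypothetical; the other names are the ones used elsewhere in this article. The first deployment removes the thumbprintSecondary line and leaves thumbprint pointing at the certificate currently in use.

```json
"certificate": {
  "thumbprint": "[parameters('certificateThumbprint')]",
  "thumbprintSecondary": "[parameters('secondaryCertificateThumbprint')]",
  "x509StoreName": "[parameters('certificateStoreValue')]"
},
```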

The following ARM templates highlight the configuration changes that need to be deployed for the common-name upgrade.  Ensure a certificate with the common-name used in your template is either already installed in the VMSS, or is installed as part of this upgrade.

Cluster:

The following config should be removed from cluster > properties.

"certificate": {
  "thumbprint": "[parameters('certificateThumbprint')]",
  "x509StoreName": "[parameters('certificateStoreValue')]"
},

And replaced with:

"certificateCommonNames": {
  "commonNames": [
    {
      "certificateCommonName": "[parameters('certificateCommonName')]",
      "certificateIssuerThumbprint": ""
    }
  ],
  "x509StoreName": "[parameters('certificateStoreValue')]"
},

VMSS:

Replace the certificate > thumbprint property with commonNames in virtualMachineProfile > extensionProfile > extensions > [type='ServiceFabricNode']. The result should look like this:

"certificate": {
  "commonNames": [
    "[parameters('certificateCommonName')]"
  ],
  "x509StoreName": "[parameters('certificateStoreValue')]"
}
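Both the cluster and VMSS fragments reference a certificateCommonName parameter, which has to be declared in the template's parameters section. A minimal declaration, assuming the parameter is a plain string, could look like the following; the description text is illustrative.

```json
"certificateCommonName": {
  "type": "string",
  "metadata": {
    "description": "Common name (subject CN) of the cluster certificate"
  }
}
```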

Once the above template modifications have been deployed, the cluster is configured to find certificates by common name. We can now easily automate the certificate update process.

A certificate update can be performed by adding the new certificate to the VMSS vault using your favourite Azure API. The following PowerShell imports the new certificate into an existing Key Vault and then adds it to the VMSS vault certificates:

$subscriptionId            = "sub-id"
$vmssResourceGroupName     = "vmss-rg-name"
$vmssName                  = "vmss-name"
$vaultName                 = "kv-name"
$primaryCertName           = "kv-cert-name"
$certFilePath              = "...\.pfx"
$certPassword              = ConvertTo-SecureString -String "password" -AsPlainText -Force

# Sign in to your Azure account and select your subscription
Login-AzAccount -SubscriptionId $subscriptionId

# Update primary certificate within the Key Vault
$primary = Import-AzKeyVaultCertificate `
    -VaultName $vaultName `
    -Name $primaryCertName `
    -FilePath $certFilePath `
    -Password $certPassword

$certConfig = New-AzVmssVaultCertificateConfig -CertificateUrl $primary.SecretId -CertificateStore "My"

# Get VM scale set 
$vmss = Get-AzVmss -ResourceGroupName $vmssResourceGroupName -VMScaleSetName $vmssName

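# Add the new certificate version to the first secrets group
# (assumption: Secrets[0] is the group whose source vault is $vaultName)
$vmss.VirtualMachineProfile.OsProfile.Secrets[0].VaultCertificates.Add($certConfig)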

# Update the VM scale set 
Update-AzVmss -ResourceGroupName $vmssResourceGroupName -Verbose `
    -Name $vmssName -VirtualMachineScaleSet $vmss

Service Fabric will now use the newer, later-expiring certificate. That's it!

Update

After using this process for a few months, cluster and application health has been fine. However, Service Fabric has been raising error events on the nodes: Failed to get private key file. x509FindValue: {commonName}, x509StoreName: My, findType: FindBySubjectName, Error E_FAIL

The following Stack Overflow post is from a user with the same issue:

ServiceFabric standalone: Failed to get private key file

Stay tuned for a less manual mitigation for this error.
