How to delete a failed PKS Cluster using the BOSH CLI
UPDATE: I received some feedback on this blog post from some of my ITQ colleagues. I should point out that this is a very quick and dirty approach suitable for my homelab environment, because it deletes not only the failed PKS cluster but also the PKS deployment itself. BOSH easily fixes this when you apply changes in Operations Manager … but this takes some time. For my homelab playground this is not a problem, but you might want to dig a bit deeper if you just want to remove the failed cluster.
When I tried to delete a failed PKS cluster in my homelab, I was presented with an error message, and there was no way to delete the cluster with the command
pks delete-cluster [CLUSTERNAME]
Following some of the Pivotal troubleshooting instructions, I was eventually able to delete the cluster with the BOSH CLI by removing the BOSH Deployment itself. The Pivotal docs are pretty good, but I got sent through several articles and steps. I’m still very new to BOSH, PKS and other Pivotal technologies at a technical level, so I tried to gather all the bits and pieces in one document. If there are smarter, better or faster ways of doing things … please let me know!
What is BOSH CLI?
According to the official bosh.io website, “BOSH is a project that unifies release engineering, deployment, and lifecycle management of small and large-scale cloud software”. In my own words, BOSH is the magic sauce that takes care of the platform automation. PKS relies on a BOSH Director VM that takes care of everything that needs to be done on the platform layer. It provisions and configures all the VMs needed to run a PKS environment. Really cool stuff! The BOSH CLI is a command line interface that lets you directly interact with the BOSH Director and manage your BOSH Deployments.
If you want to know more about BOSH, I highly recommend watching some of the YouTube videos my colleague Christiaan made!
What is a BOSH Deployment?
“A deployment is a collection of VMs, built from a stemcell, that has been populated with specific releases and disks that keep persistent data. These resources are created in the IaaS based on a deployment manifest and managed by the Director, a centralized management server.” So, my failed PKS cluster is a BOSH Deployment that I can manage directly with the BOSH CLI.
Using the BOSH CLI
The instructions for installing the BOSH CLI are described here in full detail, but it basically comes down to grabbing the binary from GitHub, running chmod +x on the downloaded file and moving it to /usr/local/bin/bosh. Type bosh -v to validate that the BOSH CLI is installed correctly.
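The install steps above can be sketched as a small shell function. This is only a sketch: install_bosh_cli is a hypothetical helper name, and the downloaded file name depends on the release you grabbed from GitHub, so it is passed in as an argument.

```shell
# Hypothetical helper: make a downloaded BOSH CLI binary executable
# and move it into place as "bosh".
#   $1  path to the downloaded binary (file name varies per release)
#   $2  target directory (optional, defaults to /usr/local/bin)
install_bosh_cli() {
  chmod +x "$1" || return 1
  mv "$1" "${2:-/usr/local/bin}/bosh"
}
```

Something like install_bosh_cli ./my-downloaded-bosh-binary followed by bosh -v should then confirm the install. Writing to /usr/local/bin usually requires root privileges, so you may need to run this from a root shell.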
Alternatively, you can SSH into the OpsManager VM using ssh ubuntu@[OPSManagerIP] and the admin password you used during installation. The OpsManager VM already has the BOSH CLI installed.
PS. Pivotal Ops Manager is the management VM used to administer a Pivotal environment.
Setting up BOSH CLI with PKS
First we need to create an Environment Alias:
bosh alias-env [ENVIRONMENT] \
-e [BOSH-DIRECTOR-IP] \
--ca-cert /var/tempest/workspaces/default/root_ca_certificate
I went with ‘pks’ as the [ENVIRONMENT] and my homelab BOSH Director has IP 172.16.11.83 so the entire command is:
bosh alias-env pks -e 172.16.11.83 --ca-cert /var/tempest/workspaces/default/root_ca_certificate
Now we need to login:
bosh -e [ENVIRONMENT] login
Next up is grabbing the login credentials. The easiest way is to navigate to
Managing BOSH Deployments using the BOSH CLI
Now we can get a list of all the BOSH deployments:
bosh -e pks deployments
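The list will contain more than just your clusters: the PKS deployment itself shows up too. As far as I can tell, PKS cluster deployments are named service-instance_ followed by a UUID, so a small filter can narrow the list down. This is a sketch built on that naming assumption, and filter_cluster_deployments is a hypothetical helper name.

```shell
# Hypothetical helper: read deployment names on stdin (one per line) and
# keep only those that look like PKS cluster deployments.
# Assumption: cluster deployments are named service-instance_<UUID>.
filter_cluster_deployments() {
  # Take the first whitespace-separated field so trailing tabs or
  # spaces in the CLI output do not end up in the result.
  awk '$1 ~ /^service-instance_/ { print $1 }'
}
```

Assuming your BOSH CLI version supports the --column flag, the usage would be something like bosh -e pks deployments --column=name | filter_cluster_deployments.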
Finally, we can delete the failed PKS cluster (which is a BOSH Deployment) that we couldn’t delete with pks delete-cluster by running:
bosh -e MY-ENVIRONMENT delete-deployment -d DEPLOYMENT-NAME
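Because delete-deployment will happily remove whatever deployment you point it at, including the PKS deployment itself, a small guard around the command can help. The sketch below assumes that cluster deployments are named service-instance_ followed by a UUID. Both delete_cluster_deployment and the BOSH_BIN variable are hypothetical names I made up for illustration; BOSH_BIN only exists so the function can be tried out with echo instead of the real CLI.

```shell
# Hypothetical wrapper: only delete deployments that look like PKS clusters.
#   $1  BOSH environment alias (e.g. pks)
#   $2  deployment name
# Assumption: cluster deployments are named service-instance_<UUID>.
delete_cluster_deployment() {
  case "$2" in
    service-instance_*) ;;
    *)
      echo "Refusing to delete '$2': not a service-instance_ deployment" >&2
      return 1
      ;;
  esac
  # BOSH_BIN is a made-up override for dry runs; defaults to the real bosh CLI.
  "${BOSH_BIN:-bosh}" -e "$1" delete-deployment -d "$2"
}
```

Setting BOSH_BIN=echo first prints the arguments that would be passed to bosh instead of deleting anything, which is a cheap way to double-check before running it for real.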
Wow, this was a classic case of ‘yak shaving’. I wanted to play around with PKS, create Kubernetes clusters, maybe deploy a test workload and so on. Instead I fell into the world of BOSH. I learned a ton though, and I guess this is the whole point of playing around with new technology in the safety of a homelab environment. Learn, break stuff, struggle, learn, fix stuff, break stuff again, pull out your hair, learn, fix stuff …
Again, I’m really new to BOSH and PKS so if there is a smarter way of doing things, please let me know! Now to find out why the initial deployment of my PKS cluster failed in the first place…