Back in 2016, Braze—then known as Appboy—began experimenting with using the open-source Kubernetes (K8s) platform as part of an effort to more effectively handle a range of microservices that the company was beginning to spin up. Now, eight years later, K8s has become a core part of the Braze platform's software infrastructure, supporting our customers' ability to understand, reach, engage, and build relationships at a massive scale.
However, while the scalability and flexibility of K8s is a big asset, our DevOps organization remains focused on finding ways to iterate and reimagine how we're approaching its use, with the goal of supporting better performance and outcomes over time. One key effort in that area revolves around our use of VMware's Velero tool to more effectively support stateful backups of K8s resources. To learn more about this effort, we sat down with Joseph Heyburn, Senior DevOps Engineer II, Braze, to explore the ins-and-outs of the team's approach and the improvements it drove in terms of cost, speed, and reliability.
At Braze, we use Kubernetes in connection with a wide range of different processes and systems, touching everything from data ingestion to our sender services. While Kubernetes has many benefits, running stateful services on Kubernetes can be complex, requiring our team to find solutions that might have been solved natively by some of the platforms and technologies we used before moving to K8s. In particular, we've been looking to find a better approach to backing up Kubernetes resources in an automated, ongoing, and reliable way.
At Braze, one of the technologies we leverage to support our customer engagement platform is Redis, an in-memory database that can store many different data types. We partition Redis based on the function it performs (such as caching, or serving as a backend for the Sidekiq background job framework); internally, we call these Redis Types. Each Type is then sharded again so that we can scale it horizontally; we call those partitions Redis shards.
For example, Redis shards are labeled as redis-shard-N, where N is the shard index number. Each shard is managed by a StatefulSet, which creates three pods; one will become the primary Redis instance, and the remaining two will become replicas, providing redundancy should the primary pod fail. Redis Sentinel manages this highly available setup, health checking against the primary and coordinating failovers.
[Figure: A Redis shard's StatefulSet, with one primary pod and two replica pods coordinated by Redis Sentinel]
To support backups, Redis provides a SAVE configuration setting that dumps the contents of memory into a file in the Redis DataBase (RDB) file format. When Redis first boots up, it looks at the configured dbfilename location for an existing RDB file and, if one exists, loads it into memory before accepting any connections.
The SAVE configuration takes in pairs of Redis arguments in the following format: [Seconds] [Changes]. For instance, SAVE 60 10 would write to disk every 60 seconds, as long as there had been at least 10 changes to the database since the previous save. You can also add multiple pairs to a given configuration in order to define additional criteria. For example, SAVE 120 5 60 10 tells Redis to save every 120 seconds as long as there have been at least five changes since the previous save, AND every 60 seconds as long as there have been at least 10 changes since the last save.
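For illustration, here's roughly how those snapshot settings appear in a redis.conf file (the file name and directory below are placeholders, not our production values):

```conf
# Snapshot to disk if at least 5 changes occurred within 120 seconds,
# OR at least 10 changes occurred within 60 seconds.
# (On older Redis versions, each pair goes on its own "save" line.)
save 120 5 60 10

# Name and location of the RDB dump that Redis loads on startup.
dbfilename dump.rdb
dir /data
```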
Redis also allows you to execute on-demand dumps via its SAVE and BGSAVE commands. As a rule, the SAVE command blocks all other commands while it performs a given save, while BGSAVE is a "background save" that doesn't block other commands, since the act of writing to disk is handled by a forked child process; that means BGSAVE is generally the preferred command when performing an on-demand Redis backup. The saved RDB file can then be moved to another location for backup as needed, or migrated to another platform.
Here's how that plays out at Braze: For our legacy Amazon Web Services (AWS) EC2 clusters that predate our move to K8s, we have a backup job that runs twice daily, performing an on-demand BGSAVE within Redis. When that process completes, the job uploads the resulting RDB file to Amazon S3 so that we can restore from it if the need arises.
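As a rough sketch of what that legacy job does (the paths, bucket name, and file naming here are illustrative, and the real job adds error handling and monitoring):

```bash
#!/bin/sh
# Trigger a non-blocking dump, then wait for it to finish by polling LASTSAVE,
# which returns the Unix timestamp of the most recent successful save.
LAST_SAVE=$(redis-cli LASTSAVE)
redis-cli BGSAVE
while [ "$(redis-cli LASTSAVE)" = "$LAST_SAVE" ]; do
  sleep 5
done

# Upload the resulting RDB file to S3 (bucket name is a placeholder).
aws s3 cp /var/lib/redis/dump.rdb \
  "s3://example-redis-backups/$(hostname)/$(date +%Y%m%d-%H%M).rdb"
```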
That's been a successful process for us; the issue was that we needed an approach that could work for our Kubernetes clusters, too. For that situation, we needed to have the RDB files written to a Persistent Volume (PV), a storage volume backed by whichever cloud platform the resource runs in (for example, Amazon Elastic Block Store (EBS) volumes for an AWS K8s cluster). But rather than creating a bespoke solution that could only apply to Redis (such as a sidecar container that copies the RDB file to object storage), we were looking for an approach that we could apply successfully to any stateful workload running on K8s.
To make that happen, my team made the decision to take advantage of Velero, an open-source tool designed to back up K8s resources as defined by their YAML manifests.
Velero, which was written by VMware, allows you to back up, store, and migrate resources associated with K8s clusters and PVs. At Braze, we took advantage of Velero's ability to back up a variety of Kubernetes resources, including pods (the smallest deployable unit of a K8s application) and Custom Resource Definitions (objects that let you extend the Kubernetes API with your own resource types in a given cluster).
This tool can also back up PVs; however, backing up their YAML manifest definitions alone won't allow you to restore their data. To make that happen, you also need to integrate with the PV's storage backend. Velero addresses this need by including plugins for a range of different cloud providers, leveraging those backends to manage the integration natively.
Here's what that can look like: If a Velero backup is issued against a PV that's backed by an AWS EBS volume, Velero invokes an EBS snapshot to support the recovery of the relevant data. Every time Velero creates a backup in this scenario, it records the snapshot ID in the backup metadata. That means that when you restore from that backup, the volume can be recreated from the snapshot and then automatically attached to that PV.
Velero follows the Kubernetes Operator pattern. That means it's managed in a K8s cluster as a Deployment that watches for backups to be requested. To execute an on-demand backup, you install the Velero command line interface (CLI) tool, which lets you interact with the server component running in the relevant cluster.
When executing an on-demand backup, you can define a selector to filter which resources should be included. In addition, you can tell Velero what types of resources it should include via the --include-resources flag, and you can use --ttl to define how long that backup should be kept before it's deleted.
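For example, an on-demand backup scoped to a single Redis shard and its volumes might look something like this (the backup name is arbitrary, and the label comes from the shard naming described later in this post):

```bash
velero backup create redis-shard-0-adhoc \
  --selector app=redis-shard-0 \
  --include-resources pods,persistentvolumeclaims,persistentvolumes \
  --ttl 72h0m0s
```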
One of the key benefits of using Velero for this sort of work is its ability to automate key parts of the process. At Braze, we're using those capabilities to automatically create backups on a predetermined schedule via the Schedule API, which the server watches (and manages) for us. If you're deploying via Helm, this schedule might look like the following:
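Here's an illustrative sketch of such a schedule, assuming the community vmware-tanzu/velero Helm chart (the cadence and TTL are placeholders, and the velero-backup-schedule=redis label is the one discussed later in this post):

```yaml
# values.yaml (vmware-tanzu/velero chart)
schedules:
  redis:
    disabled: false
    # Cron expression: run the backup twice a day.
    schedule: "0 */12 * * *"
    template:
      # Keep each backup for ten days before Velero deletes it.
      ttl: 240h0m0s
      # Only back up pods carrying the label our Redis initContainer applies,
      # plus the volume resources attached to them.
      labelSelector:
        matchLabels:
          velero-backup-schedule: redis
      includedResources:
        - pods
        - persistentvolumeclaims
        - persistentvolumes
```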
You can define where the backup metadata should be saved using your Velero server config. That serves as a source of truth for your backups, while also allowing the Velero server deployment to be stateless, supporting increased scalability. Braze operates across multiple clouds and systems, so how this functions depends a bit on which cloud is involved: For AWS K8s clusters, we save this metadata to Amazon S3, and for Microsoft Azure K8s clusters, we save it to Azure Storage Accounts. That said, you aren't limited to these providers, as there are plenty of others available.
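On AWS, for instance, that boils down to a BackupStorageLocation resource along these lines (the bucket, prefix, and region are placeholders):

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  # The velero-plugin-for-aws provider handles the S3 integration.
  provider: aws
  objectStorage:
    bucket: example-velero-backups
    prefix: redis
  config:
    region: us-east-1
```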
In order for a volume snapshot to be carried out for any stateful workload running on K8s, we knew that we needed to target both the PV and the Persistent Volume Claim (PVC) resources, while also ensuring that the snapshot was taken against the latest RDB file. So, to have Velero perform a BGSAVE for us in this situation, we wanted to define a backup pre-hook script that runs against the K8s pod being backed up, just before the snapshot.
There are a couple of different ways to achieve this. When we first started using Velero, we defined hooks as annotations on the specific pods that were in scope for backup.
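Those annotations look roughly like this on the Redis pod template (the container name and timeout are assumptions, and in practice the hook script would also wait for the BGSAVE to finish before returning):

```yaml
metadata:
  annotations:
    # Run a BGSAVE in the redis container before Velero snapshots the pod's volumes.
    pre.hook.backup.velero.io/container: redis
    pre.hook.backup.velero.io/command: '["/bin/sh", "-c", "redis-cli BGSAVE"]'
    pre.hook.backup.velero.io/timeout: 5m
```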
That’s still supported, but we recently made the decision to migrate to a different method where we define the hooks when the backup is being created by a schedule:
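Here's a sketch of what that looks like on the Schedule itself, with the same caveats as above about placeholder names and the hook script being simplified:

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: redis
  namespace: velero
spec:
  schedule: "0 */12 * * *"
  template:
    labelSelector:
      matchLabels:
        velero-backup-schedule: redis
    ttl: 240h0m0s
    hooks:
      resources:
        - name: redis-bgsave
          labelSelector:
            matchLabels:
              velero-backup-schedule: redis
          pre:
            - exec:
                container: redis
                command: ["/bin/sh", "-c", "redis-cli BGSAVE"]
                onError: Fail
                timeout: 5m
```

One appeal of defining the hooks on the schedule is that the backup logic lives in one place rather than being spread across every pod template.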
Once the backups are created by that method, we can view them by taking advantage of the Velero CLI, like so:
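For instance (these commands only list objects tracked by the Velero server; they don't modify anything):

```bash
# List the configured backup schedules.
velero schedule get

# List all backups, along with their status, creation time, and expiry.
velero backup get
```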
In scenarios where we need more detailed backup information, we can leverage the describe subcommand to retrieve that additional context:
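For example (the backup name here is a placeholder; schedule-created backups are named after the schedule plus a timestamp):

```bash
# Show what a single backup contains, including the resources captured
# and the volume snapshots taken.
velero backup describe redis-20240101020000 --details
```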
Then, in order to carry out a restoration, we can leverage the Velero CLI:
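A minimal sketch, again with a placeholder backup name:

```bash
# Restore only the volume resources from the chosen backup; Velero recreates
# the EBS volume from its snapshot and binds it back to the PV and PVC.
velero restore create --from-backup redis-20240101020000 \
  --include-resources persistentvolumes,persistentvolumeclaims
```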
In these circumstances, we intentionally restore only the PV and PVC. Just as Velero automatically manages volume snapshots for us, the tool manages the volume restore from a snapshot in this case. We don't include the K8s pod because that's managed by the Redis StatefulSet for that particular shard. In other words, we restore the PV and PVC first, then redeploy the Redis Helm chart that creates the StatefulSet for us, and then the relevant pods are created.
We can also include another selector argument if needed, allowing us to filter what resources should be restored. So while our backup schedule will automatically back up all Redis instance types that are in scope (along with their labels), we can choose to filter on those instance labels for scenarios where we only want to restore a subset of those instances.
For Redis, Velero allows us to restore either an entire Redis instance via --selector release=redis-shard or a specific shard via --selector app=redis-shard-0. Redis shards are managed by StatefulSets; there are three replicas, named as follows:
Replica 1: redis-shard-0-server-0
Replica 2: redis-shard-0-server-1
Replica 3: redis-shard-0-server-2

As mentioned before, Redis is built to load an RDB file on application startup if one already exists. But if we needed to do a full recovery, we wouldn't have any K8s pods running. Assuming that we restore from a backup, the PVs and PVCs will exist for the three different replicas, making it safe for us to bring up the Redis pods via the StatefulSet.
When you create a StatefulSet, the initial ordinal pod will come up first (redis-shard-0-server-0), identify that there is an RDB file, and then begin to load it into memory. Since that's the first pod coming up for the shard, it will become the primary one for that shard. The next ordinal pod (redis-shard-0-server-1) will perform the same action on start, seeing an RDB file and loading it into memory. However, because it isn't the first pod to come up for this shard, it will be configured to replicate from the primary instance (that is, redis-shard-0-server-0). Because of this, a full resync is performed, clearing the contents of the in-memory database and requesting the data from the primary instance.
The upshot? In essence, the process wasted the time it spent loading the RDB file into memory when the application first booted up, drawing on computing power and increasing costs without any corresponding benefit. In addition, when our system has large RDB files that need to be loaded into memory, our mean time to recovery (MTTR) goes up, hurting our ability to swiftly recover from a given incident or issue.
Given that, we now only perform backups against the first ordinal pods in a StatefulSet, avoiding that increase in MTTR. That's effective because the initial ordinal pod is the first pod to get created when we restore from nothing and, accordingly, the only one that matters to our recovery efforts; after all, the remaining pods in the StatefulSet will replicate against this first pod as their means of restoration.
So, how do we ensure that backups are only performed against that first Pod? Well, for Redis types that are configured for Velero backups, we have an additional initContainer which tags the first ordinal pod with the label we defined in our backup schedule (velero-backup-schedule=redis). That way, it will get included in the backup while the other two pods are excluded:
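Here's a rough sketch of that initContainer (the image, RBAC setup, and script details are assumptions rather than our exact manifest; the pod's ServiceAccount needs permission to label pods):

```yaml
initContainers:
  - name: velero-backup-label
    # Placeholder image; anything with kubectl available works.
    image: bitnami/kubectl:1.29
    env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
    command:
      - /bin/sh
      - -c
      - |
        # StatefulSet pods are named <shard>-server-<ordinal>; only ordinal 0
        # should carry the label that the Velero schedule selects on.
        if [ "${POD_NAME##*-}" = "0" ]; then
          kubectl label pod "$POD_NAME" velero-backup-schedule=redis --overwrite
        fi
```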
Because we use Kubernetes in connection with so many different processes and systems related to the Braze platform, using Velero to more effectively support stateful backups of K8s resources in an automated, ongoing, and reliable way has had significant positive impacts on our systems and our business. Since this approach automates the backup process while avoiding the creation of unnecessary volume snapshots, we've been able to reduce the financial costs associated with this process. At the same time, our use of Velero in connection with K8s has also allowed our system to achieve a faster recovery time when issues do occur, supporting increased resilience and reducing the impact of these situations on our infrastructure and our customers. That's the definition of a win-win situation!
As Braze has taken increased advantage of Kubernetes to support aspects of our customer engagement platform and related infrastructure, we’ve seen new possibilities open up for our systems. At the same time, we’ve found that we sometimes need to identify additional (or different) solutions for challenges that previously had been solved natively. Velero has been a great help when it comes to natively backing up Persistent Volumes in K8s, allowing us to reduce MTTR, cut costs, and support a more reliable and scalable recovery process.
Interested in working on these sorts of ambitious projects? Braze DevOps is hiring. Check out our careers page to learn more about our open roles and our culture.