How to Build a Distributed Cache
A brief and intuitive overview of a foundational internet component.
Background
In a typical client-server relationship, the client makes requests through a web service and will often access resources stored in a database.
This is simple to understand, but it comes with some performance issues. The amount of data stored in modern systems is HUGE, so data stores are optimized not for super-low latency but for ACID compliance, scalability, availability, low price, and all those other awesome requirements. Those properties often come at the expense of latency, and your application may not be happy about that.
If only there was a way to get the best of both worlds….
What is a Cache?
Don’t repeat yourself (DRY). And don’t recompute if you don’t need to, either.
Caches are one of the many artifacts of this tenet: they are what you use to save the result of some expensive operation for future re-use.
They act as an intermediary between the ‘requester’ and the ‘provider’. In this simple example, the provider is a data store — but the provider of the result could be anything. Let's say the service was asking a GPU cluster for the result of a super complicated operation — “ChatGPT, what is the meaning of life?”. If this question was asked multiple times — would you want to recompute the result each time? Of course not.
Functional Requirements
Our basic cache has a simple interface.
get(key) -> value
set(key, value)
A cache will always hold a subset of the total data your application manages. Why? Your cache will typically be more performant than other storage media, but it will also be more expensive, and unless you have infinitely deep pockets, smaller in capacity.
What does this mean for you? You need an eviction policy. You can’t just let your cache grow indefinitely — you need to intelligently trim the size of your cache.
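For concreteness, a minimal sketch of that interface in Python might look like the following; the class and method names are illustrative rather than taken from any particular library.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class Cache(ABC):
    """Illustrative cache interface: get/set, with eviction left to the concrete policy."""

    @abstractmethod
    def get(self, key: str) -> Optional[Any]:
        """Return the cached value for `key`, or None on a cache miss."""

    @abstractmethod
    def set(self, key: str, value: Any) -> None:
        """Store `value` under `key`, evicting another entry first if the cache is full."""
```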
Some Low-Level Details First
When you think of a cache — you probably think of a hash map of some kind (if not, that’s okay).
A hash map provides constant-time access and deletion — perfect!
What it doesn’t provide a mechanism for is a sensible eviction policy.
Enter the Least Recently Used (LRU) cache.
Now, our hash map values will point to nodes in a doubly linked list of fixed capacity. When we exceed that capacity, we simply remove the last node in the list (the least recently used item) and delete the hash map entry associated with it. Easy (and constant time).
When we add a new element or access an existing one, we just unlink its node from the doubly linked list (if it is already there) and move it to the front of the list. Constant time again!
For this simple example, we are using the LRU eviction policy — but in reality, there are many different policies, each with its own advantages and shortcomings.
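To make the LRU version concrete, here is a minimal single-machine sketch in Python, built exactly as described above: a hash map whose values point at nodes in a doubly linked list. The class names and capacity are my own illustrative choices.

```python
class _Node:
    """A node in the doubly linked list, holding one cached key/value pair."""
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    """Hash map + doubly linked list: the most recently used items live at the head."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.map = {}                                # key -> _Node
        self.head, self.tail = _Node(), _Node()      # sentinel nodes bracket the list
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None                              # cache miss
        self._unlink(node)                           # it is now the most recently used
        self._push_front(node)
        return node.value

    def set(self, key, value):
        if key in self.map:                          # update in place, mark as recently used
            node = self.map[key]
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.map) >= self.capacity:
            lru = self.tail.prev                     # least recently used sits at the back
            self._unlink(lru)
            del self.map[lru.key]
        node = _Node(key, value)
        self.map[key] = node
        self._push_front(node)


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")          # "a" becomes the most recently used entry
cache.set("c", 3)       # evicts "b", the least recently used entry
assert cache.get("b") is None
```

In practice, Python's `collections.OrderedDict` gives you the same behavior with much less code; the hand-rolled version just makes the two data structures visible.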
From Monolithic to Distributed
Running the LRU cache we specified on a single machine is simple enough — but a distributed cache allows us to scale horizontally effortlessly. Let's look at how we might accomplish that.
Separating the cache into its own layer allows it to scale with the demands of whatever client is using it.
It also allows us to choose hardware that is more optimized for an in-memory cache (maybe our service has a totally different computational profile).
Given a cache item key, the client(s) just need to convert this key into a cache host. As long as all clients have a deterministic way to do this, then the individual cache hosts will maintain a high cache hit rate and require minimal requests to the persistent store, which is what we are trying to accomplish.
There is a problem, though: many simple schemes (like hashing the key and taking it modulo the number of hosts) have a terrible flaw. If you lose or gain a host in your distributed cache fleet, the entire mapping of keys to hosts changes! This is undesirable, since it will hammer your persistent store as if you were in a “cold start”.
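A quick sketch of that flaw in Python (key names and host counts are arbitrary): with a hash-then-modulo scheme, going from four hosts to five remaps the vast majority of keys.

```python
import hashlib


def host_for(key: str, num_hosts: int) -> int:
    """Naive scheme: hash the key, then take the result modulo the number of hosts."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest % num_hosts


keys = [f"video-{i}" for i in range(10_000)]
before = {k: host_for(k, 4) for k in keys}   # four cache hosts
after = {k: host_for(k, 5) for k in keys}    # one host is added to the fleet

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys now map to a different host")
# Roughly 80% of keys move, so almost every cached item is suddenly on the wrong host.
```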
Choosing a Cache Host — consistent hashing
The key is to use a consistent hashing function, which minimizes the amount of rehashing that the cache cluster will need to perform, even as it scales in and out horizontally, or hosts die.
Naturally, there are downsides to most consistent hashing algorithms, but that is for another time.
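For illustration, here is a minimal consistent hash ring in Python; the virtual-node count and host names are arbitrary choices of mine. Each host is hashed onto the ring at many points, and a key is served by the first host encountered clockwise from the key's own hash.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Consistent hashing with virtual nodes to spread keys more evenly across hosts."""

    def __init__(self, hosts, vnodes: int = 100):
        self._ring = {}        # point on the ring -> host
        self._points = []      # sorted ring points, for binary search
        self._vnodes = vnodes
        for host in hosts:
            self.add_host(host)

    def add_host(self, host: str) -> None:
        for i in range(self._vnodes):
            point = _hash(f"{host}#{i}")
            self._ring[point] = host
            bisect.insort(self._points, point)

    def remove_host(self, host: str) -> None:
        for i in range(self._vnodes):
            point = _hash(f"{host}#{i}")
            del self._ring[point]
            self._points.remove(point)

    def host_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the next host point on the ring."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._ring[self._points[idx]]


ring = HashRing(["cache-1", "cache-2", "cache-3"])
print(ring.host_for("video-42"))
ring.add_host("cache-4")   # only the keys that now fall before cache-4's points move
```

With N hosts, adding or removing one host only remaps roughly 1/N of the keys, rather than nearly all of them.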
Separation of Concerns — the Cache Client
The selection of a cache server is performed by a very lightweight process called the Cache Client.
- This client knows about every cache server.
- The service attempts to connect to a cache server, and if the server is unavailable, it proceeds as though it was a cache miss.
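Here is a rough sketch of that behavior; the `CacheServer` and `CacheClient` classes below are in-process stand-ins I invented for illustration, not a real network client. An unreachable server is treated exactly like a miss: fall back to the persistent store, then repopulate the cache on a best-effort basis.

```python
class CacheServer:
    """In-process stand-in for a remote cache host; a real client would talk over the network."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}
        self.alive = True

    def get(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.store.get(key)

    def set(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is unreachable")
        self.store[key] = value


class CacheClient:
    """Maps each key to a cache host and falls back to the database on a miss or failure."""

    def __init__(self, servers, db):
        self.servers = sorted(servers, key=lambda s: s.name)
        self.db = db

    def _pick(self, key) -> CacheServer:
        # Illustrative only: a real client would consult the consistent hash ring here.
        return self.servers[hash(key) % len(self.servers)]

    def get(self, key):
        server = self._pick(key)
        try:
            value = server.get(key)
            if value is not None:
                return value                  # cache hit
        except ConnectionError:
            pass                              # unreachable server is treated like a miss
        value = self.db[key]                  # fall back to the persistent store
        try:
            server.set(key, value)            # best-effort repopulation of the cache
        except ConnectionError:
            pass
        return value


db = {"video-42": "metadata for video 42"}
client = CacheClient([CacheServer("cache-1"), CacheServer("cache-2")], db)
print(client.get("video-42"))                 # miss: loaded from the database, then cached
client._pick("video-42").alive = False        # take down the host responsible for this key
print(client.get("video-42"))                 # still served, straight from the database
```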
How would each cache client maintain an up-to-date list of cache server hosts? Since this layer will be scaling horizontally in and out, this list is dynamic and will constantly be changing.
Managing a configuration file manually, either via a local file that is updated or a shared storage medium, is tiresome and bug-prone.
Enter Zookeeper.
Luckily, configuration services are a class of tools designed specifically for this problem. The configuration service’s job is to maintain an awareness of which cache servers are available at any point in time, and our main service simply needs to query Zookeeper for this up-to-date list.
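As a sketch, assuming the kazoo ZooKeeper client library and a `/cache/servers` path where every cache server registers itself as an ephemeral znode (both of those details are my assumptions, not anything prescribed above), the cache client can watch that path and rebuild its hash ring whenever membership changes.

```python
# A sketch assuming the `kazoo` client library and a hypothetical /cache/servers path.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path("/cache/servers")

live_hosts = set()


@zk.ChildrenWatch("/cache/servers")
def on_membership_change(children):
    """Invoked by kazoo whenever the set of registered cache servers changes."""
    global live_hosts
    live_hosts = set(children)
    # A real cache client would rebuild its consistent hash ring here,
    # e.g. HashRing(live_hosts) from the earlier sketch.
    print("live cache servers:", sorted(live_hosts))


# Each cache server would register itself with an ephemeral znode on startup,
# so it disappears from the list automatically if the server or its session dies:
# zk.create("/cache/servers/10.0.0.5:6379", ephemeral=True, makepath=True)
```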
There are also other ways to accomplish “service discovery”, like gossip protocols.
The Hot Shard Problem — achieving high availability
There is a problem with the current design: what if a particular shard within our distributed cache sees the majority of the traffic? Let's say we are building a service for YouTube where we are caching YouTube videos. What would happen when a single viral video gets a billion views?
The consistent hashing function would map every single request for that video to the same shard in our caching cluster. Not good!
To deal with this, let's add another layer of horizontal scaling — this time to each individual cache shard.
In this basic setup, each cache shard has been replaced with a Leader-Follower model: a single Leader handles ‘set’ operations, as before, but a scalable number of read replicas handle the ‘get’ volume.
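A minimal sketch of that routing logic, with the 'network' replaced by in-process objects and all names purely illustrative: writes go through the shard's Leader, while reads are spread across its replicas.

```python
import random


class CacheNode:
    """Stand-in for a single cache process (a leader or a read replica)."""

    def __init__(self):
        self.store = {}


class ReplicatedShard:
    """One shard of the distributed cache: a single leader plus N read replicas."""

    def __init__(self, num_replicas: int = 3):
        self.leader = CacheNode()
        self.replicas = [CacheNode() for _ in range(num_replicas)]

    def set(self, key, value):
        # All writes go through the leader, which then fans out to the replicas.
        self.leader.store[key] = value
        for replica in self.replicas:          # naive, synchronous replication
            replica.store[key] = value

    def get(self, key):
        # Reads are spread across the replicas, so a hot key's traffic is shared.
        return random.choice(self.replicas).store.get(key)


shard = ReplicatedShard(num_replicas=3)
shard.set("viral-video", "a billion views worth of metadata")
print(shard.get("viral-video"))
```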
Future Considerations
In this example, we focused on the core functionality of a distributed cache.
However, there are some topics that we did not explore:
- Managing consistency in distributed systems
- Caching strategies
- Cache data expiration (not eviction!!)
- Security
- Monitoring and logging
- Read / write lock contention
Some Popular Distributed Caches
You probably won’t want to design a distributed cache from scratch — but now that you have the general gist of how they work, take a look at some popular options:
- Redis — an open-source, in-memory data structure store, used as a database, cache, and message broker.
- Memcached — a free and open-source, high-performance, distributed memory object caching system.
- Amazon ElastiCache — an AWS-managed distributed cache.