SpiceDB is 100% open source. Please help us by starring our GitHub repo.

Consistent Hash Load Balancing for gRPC

/assets/team/evan-cordell.jpg
November 24, 2021|4 min read

When you send a query to SpiceDB, our distributed authorization database, it will often choose to dispatch to other nodes in the cluster that are more likely to have results for subqueries already cached. Dispatching and caching help keep the latencies for SpiceDB queries low.

Keeping Cache Usage High

Not all nodes cache all data. Instead, each node is responsible for caching a subset of query results for the cluster. This makes it critical that requests get dispatched to nodes that will have the data cached (if such a node exists), especially on Lookup requests which have lots of overlapping subproblems.

SpiceDB makes use of a consistent hash ring to map requests to nodes. There are several good resources online to learn about consistent hashing, but briefly:

  • A unique identifier (based on the grpc subconnection) for each SpiceDB node is hashed and mapped to many locations in the hash ring. The number of copies (virtual nodes) is controlled by the replicationFactor.
  • Queries are hashed and mapped to locations in the hash ring
  • SpiceDB dispatches to the next N "clockwise" nodes in the ring. This N is often called the spread.

When a SpiceDB node receives a dispatch request, it will first check its cache for the answer. If the result isn't already cached, it will find the answer by querying the datastore or by decomposing the query into smaller subproblems and dispatching again. The consistent hashring means SpiceDB can get good cache utilization without an external service coordinating where requests should be sent.

The consistent property of the consistent hash ring ensures that if a node is added or removed, the smallest possible number of keys gets redistributed to other nodes.

Discovering SpiceDB nodes

Each SpiceDB node builds its own consistent hash ring so that no consensus with other nodes is required.

In the past, SpiceDB used a bespoke discovery service to discover what other nodes were available for dispatch. Recently, we switched to kuberesolver, a resolver plug-in for grpc-go that watches Kubernetes endpoints to find available SpiceDB nodes. It's a simple matter to enable kuberesolver in a project:

import "github.com/sercand/kuberesolver/v3"

func main() {
    kuberesolver.RegisterInCluster()
    // ...
}

Once registered, resolver plugins are referenced by name in the scheme portion of the gRPC address. If you're running SpiceDB on Kubernetes, you can use the Kubernetes resolver via the dispatch flags:

--dispatch-upstream-addr=kubernetes:///spicedb.default:50053

which will find all instances behind the spicedb service in the default namespace.

Extending gRPC With a Consistent Hash Load Balancer

With kuberesolver, the grpc-go client can find all instances of SpiceDB for dispatch and transparently handle creating and destroying subconnections as needed.

But now, we need to tell the client which nodes to use for any given request. Luckily, grpc-go has a way for us to register our own implementation of a load balancer that is backed by a consistent hash ring.

Like kuberesolver, it's straightforward to start using the consistent hash load balancer:

import (
	"github.com/cespare/xxhash"
	"google.golang.org/grpc/balancer"

	consistentbalancer "github.com/authzed/spicedb/pkg/balancer"
)

func main() {
    balancer.Register(consistentbalancer.NewConsistentHashringBuilder(xxhash.Sum64, hashringReplicationFactor, backendsPerKey))

    // ..
}

The consistent hashring load balancer allows you to configure the hash used, the replication factor for the hashring, and the spread of keys across the nodes. Once the balancer is registered, it can be referenced in an option to the gRPC dialer:

conn, err := grpc.Dial("kubernetes:///spicedb.default:50053",
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"consistent-hashring"}`)
)

Any request sent via the gRPC connection can then set a value in the context which will be hashed into the ring for node selection.

It's worth noting that there is an existing internal ringhash load balancer implementation in grpc-go, but it can't be used without using xds.

Hopefully, we've shed a little light on the lightly-documented extension points of grpc-go. If this load balancer sounds like something you'd like to see published as a standalone library, why not let us know on Discord?

Additional Reading

If you’re interested in learning more about Authorization and Google Zanzibar, we recommend reading the following posts:

Get started for free

Join 1000s of companies doing authorization the right way.