SpiceDB is the open-source Zanzibar implementation that drives Authzed, and it’s important that it provide all the guarantees described by the Zanzibar paper. One such property, often overlooked by other descriptions or implementations of Zanzibar, is protection against the New Enemy problem.
“New Enemy” is the name Google gave to two undesirable behaviors we wish to prevent in a distributed authorization system:

- neglecting the ordering of ACL updates, and
- misapplying old ACLs to new content.

In other words, the New Enemy problem occurs whenever you can issue an ACL check that returns results different from what you would reasonably expect based on the order of your previous actions. Don’t worry if this feels a little nebulous - we’ll look at some concrete examples later (or you can take a detour now and read our in-depth introduction to the New Enemy problem).
The system described by the Zanzibar paper relies on TrueTime and Spanner’s snapshot reads to prevent the New Enemy problem.
TrueTime is Google’s internal time API described in the Spanner paper. Instead of returning a single value for “now” when requesting the current time, it returns an interval [earliest, latest]
such that “now” must lie within the interval. TrueTime can provide this guarantee (and return a usefully small interval) thanks to its use of GPS receivers and atomic clocks.
TrueTime is used in Spanner to provide external consistency:
if T2 starts to commit after T1 finishes committing, then the timestamp for T2 is greater than the timestamp for T1
This is true for any two transactions, hitting any node, anywhere in the world.
If this is the first description you’ve seen of Spanner and TrueTime, this property might seem magical and potentially CAP-theorem-breaking.
But if we have an API like TrueTime available, we can write down exactly how to achieve external consistency:

1. When committing a transaction, ask TrueTime for the current interval [earliest, latest].
2. Assign the transaction the timestamp latest.
3. Before acknowledging the commit, wait until TrueTime reports an earliest that is past the assigned timestamp.
When all timestamps are selected this way, they become totally ordered according to “real” time. But this method will force writes to wait, so it’s important that the interval be as small as possible - this is where atomic clocks and GPS are crucial (Google reports typical intervals of less than 10ms).
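To make the rule concrete, here is a minimal sketch in Go of “pick the latest possible now, then wait it out”, written against a hypothetical TrueTime-like interface (nothing here is Spanner’s actual API):

```go
package commitwait

import "time"

// Interval is a TrueTime-style answer to "what time is it?": the true time is
// guaranteed to lie somewhere within [Earliest, Latest].
type Interval struct {
	Earliest, Latest time.Time
}

// TrueTime is a hypothetical stand-in for Spanner's TrueTime API.
type TrueTime interface {
	Now() Interval
}

// commitTimestamp chooses a commit timestamp and then "commit waits": it only
// returns once TrueTime guarantees the chosen timestamp is in the past. Any
// transaction that starts committing after this returns must therefore pick a
// strictly greater timestamp, which gives us external consistency.
func commitTimestamp(tt TrueTime) time.Time {
	ts := tt.Now().Latest // the latest instant "now" could possibly be

	// Commit wait: block until even the earliest possible "now" is past ts.
	for !tt.Now().Earliest.After(ts) {
		time.Sleep(time.Millisecond)
	}
	return ts
}
```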
Spanner also supports snapshot reads at specific timestamps, which allows Zookies to be issued to ensure ACL checks are evaluated at a specific point in time. Together, external consistency and snapshot reads prevent the New Enemy problem.
SpiceDB today supports several backends (and will likely support more in the future): In-Memory, Postgres, and CockroachDB.
Note that none of these backends are Spanner, and none have a TrueTime-like API available. Nor is such a thing readily available: various cloud providers tout “TrueTime-like” time precision for their NTP sources, but none provide the uncertainty interval and guarantees described by TrueTime as required for this application. So what do we do?
For the In-Memory Driver, all communication happens through the same process, so we can easily ensure total ordering. The in-memory backend is for testing and demo purposes only.
With the Postgres Driver, external consistency is not a concern - we get it for free from Postgres. But Postgres doesn’t directly support snapshot reads, so we layer them in via an MVCC-like transactions table, which is worthy of its own separate post.
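As a rough illustration of the technique (the table and column names here are hypothetical, not SpiceDB’s actual schema): each relationship row records the transaction that created it and, eventually, the one that deleted it, and a “snapshot read” at a given revision only sees rows that were alive at that revision.

```go
package pgsnapshot

import (
	"context"
	"database/sql"
)

// Relationship is a simplified view of a stored relationship row.
type Relationship struct {
	ResourceType, ResourceID, Relation string
	SubjectType, SubjectID             string
}

// readAtRevision performs a snapshot read at revision rev against a
// hypothetical MVCC-style relationships table: a row is visible if it was
// created at or before rev and not yet deleted as of rev.
func readAtRevision(ctx context.Context, db *sql.DB, rev uint64) ([]Relationship, error) {
	rows, err := db.QueryContext(ctx, `
		SELECT resource_type, resource_id, relation, subject_type, subject_id
		FROM relationships
		WHERE created_txn <= $1
		  AND (deleted_txn IS NULL OR deleted_txn > $1)`, rev)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var rels []Relationship
	for rows.Next() {
		var r Relationship
		if err := rows.Scan(&r.ResourceType, &r.ResourceID, &r.Relation,
			&r.SubjectType, &r.SubjectID); err != nil {
			return nil, err
		}
		rels = append(rels, r)
	}
	return rels, rows.Err()
}
```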
Postgres, while excellent, does not scale horizontally and can’t provide the global distribution and availability we require from a permissions service.
For this, we turn to the CockroachDB Driver. CockroachDB often draws comparisons to Spanner, but it is different in some important ways.
CockroachDB doesn’t have a TrueTime-like time source, but still provides many of the same properties that Spanner does (including snapshot reads).
Cockroach has written extensively about their consistency model and how they get by without atomic clocks. There’s a lot of great information in these Cockroach resources (and they’re worth a read), but to quickly summarize the parts that impact the New Enemy problem:
- Transactions that write overlapping keys are guaranteed timestamps that reflect their real-time ordering; transactions over non-overlapping keys are not.
- There is a configured `max_offset` for the cluster - a clock skew beyond which a node is removed from the cluster for being too far off.

This means that SpiceDB backed by CockroachDB could exhibit the New Enemy problem if mitigations are not put into place. To see how this could happen, we need to take a closer look at how CockroachDB stores data and synchronizes clocks.
Without any mitigations in place, is there a series of writes to a CockroachDB-backed SpiceDB that could result in a “new enemy”?
As stated above, we’re not guaranteed external consistency for non-overlapping keys. It helps to understand how Cockroach translates relational data into its underlying KV store, but for the purposes of this discussion it is enough to know that for the transactions to definitively overlap, they must touch the same row in the same (relational) table.
Relationships in SpiceDB are stored in one table, and are roughly:
<resource type> <resource id> <relation> <subject type> <subject id> (optional <relation>)
Let’s cut to the chase and look at an example of a schema and some relationships that could result in the New Enemy Problem:
```
definition user {}

definition resource {
    relation direct: user
    relation excluded: user

    permission allowed = direct - excluded
}
```
This defines two object types: `user` and `resource`. A `user` is `allowed` to access a `resource` if:

- the `resource` has the relation `direct` to the `user`, and
- the `resource` does not have the relation `excluded` to the `user`.

This is not an unusual schema: perhaps a document is viewable by all employees except external contractors.
To demonstrate the New Enemy problem, we’ll perform two writes and a check:

1. Write `resource:thegoods#excluded@user:me` (call its commit timestamp T1)
2. Write `resource:thegoods#direct@user:me` (timestamp T2)
3. Check `resource:thegoods#allowed@user:me` at snapshot T2

Each of these calls happens in sequence (not in parallel). If everything goes as intended, the Check will fail, because the snapshot T2 will contain both relationships (since T1 < T2).
However, the rows written at T1 and T2 do not overlap. This means that CockroachDB could assign timestamps such that T2 < T1. In that case, the check in step 3 will succeed - despite the writes having been issued in the proper order externally. This is exactly the “new enemy” problem: access was granted to a resource that should have been protected.
This doesn’t mean we can’t use CockroachDB; it just means that we need to find a way to prevent this particular scenario from happening.
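For concreteness, here is roughly what the Write → Write → Check sequence above looks like against the SpiceDB API using the authzed-go client. This is only a sketch (error handling is minimal, and exact field names may vary between client versions), not the actual test code:

```go
package newenemy

import (
	"context"

	pb "github.com/authzed/authzed-go/proto/authzed/api/v1"
	authzed "github.com/authzed/authzed-go/v1"
)

// writeRel writes one relationship on resource:thegoods for user:me and
// returns the ZedToken for the write (this is where T1 and T2 come from).
func writeRel(ctx context.Context, client *authzed.Client, relation string) (*pb.ZedToken, error) {
	resp, err := client.WriteRelationships(ctx, &pb.WriteRelationshipsRequest{
		Updates: []*pb.RelationshipUpdate{{
			Operation: pb.RelationshipUpdate_OPERATION_TOUCH,
			Relationship: &pb.Relationship{
				Resource: &pb.ObjectReference{ObjectType: "resource", ObjectId: "thegoods"},
				Relation: relation,
				Subject: &pb.SubjectReference{
					Object: &pb.ObjectReference{ObjectType: "user", ObjectId: "me"},
				},
			},
		}},
	})
	if err != nil {
		return nil, err
	}
	return resp.WrittenAt, nil
}

// checkAllowed evaluates the "allowed" permission at an exact snapshot.
func checkAllowed(ctx context.Context, client *authzed.Client, at *pb.ZedToken) (bool, error) {
	resp, err := client.CheckPermission(ctx, &pb.CheckPermissionRequest{
		Consistency: &pb.Consistency{
			Requirement: &pb.Consistency_AtExactSnapshot{AtExactSnapshot: at},
		},
		Resource:   &pb.ObjectReference{ObjectType: "resource", ObjectId: "thegoods"},
		Permission: "allowed",
		Subject: &pb.SubjectReference{
			Object: &pb.ObjectReference{ObjectType: "user", ObjectId: "me"},
		},
	})
	if err != nil {
		return false, err
	}
	return resp.Permissionship == pb.CheckPermissionResponse_PERMISSIONSHIP_HAS_PERMISSION, nil
}

// attempt runs the three steps in order. It should always return false; if it
// ever returns true, the timestamps were reversed and the new enemy appeared.
func attempt(ctx context.Context, client *authzed.Client) (bool, error) {
	if _, err := writeRel(ctx, client, "excluded"); err != nil { // T1
		return false, err
	}
	t2, err := writeRel(ctx, client, "direct") // T2
	if err != nil {
		return false, err
	}
	return checkAllowed(ctx, client, t2)
}
```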
Before we look at solutions to the problem, it would be nice to have a test that can reproduce it in the real world, so that we can verify the efficacy of our mitigations.
We know that CockroachDB can assign transactions inconsistent timestamps for non-overlapping keys, but how and when does it actually happen?
We started by running the three steps above (Write, Write, Check) against a CockroachDB cluster in a loop. If this is an event that can occur, we should be able to see it with enough tries, right?
But after hundreds of thousands of tries, we couldn’t reproduce the issue.
There are a few reasons why:
We started by running against a staging cluster in CockroachCloud, but CockroachCloud provides nodes that are very highly synchronized.
Here’s a snapshot of our Clock Offset:
The scale here is microseconds, and the `max_offset` for the cluster is 500 milliseconds. With these numbers, it would be incredibly difficult to see a timestamp reversal.
Cockroach translates relational data into key-value range writes. Each range is a raft cluster with a leader coordinating the writes. A transaction’s closed timestamps ensure that any writes into the same range will have properly ordered timestamps.
Ranges are split and merged as needed by Cockroach, and you can’t tell Cockroach which range a particular write should hit. Even with heavy write traffic, we’d need to get lucky and have the two specific writes land in separate ranges.
Any write into a range propagates changes to the timestamp cache to nodes holding followers for the range.
This means that not only do our two writes need to land in different ranges, they need to land in ranges with leaders on nodes that don’t contain followers (in the raft sense) of the other range.
By default, ranges for relational data in CockroachDB have a replica count of 3. This means that in a Cockroach cluster with 3 or fewer nodes, the timestamp caches will be synchronized after every write, and it won’t be possible to see out of order timestamps.
Luckily, Cockroach has parameters that we can tune that will help us get writes with “reversed” timestamps.
This is the SQL required to configure Cockroach with these (bad!) settings:
```sql
ALTER DATABASE spicedb CONFIGURE ZONE USING range_min_bytes = 0, range_max_bytes = 65536, num_replicas = 1;
```
But this was still not enough to see the problem in practice.
Time sources in GKE are still generally good enough that even a poorly configured Cockroach cluster couldn’t demonstrate the problem.
We experimented with several methods of lying about time to a CockroachDB node, but the approaches we tried only affect `CLOCK_REALTIME` and therefore don’t help in this instance. Ultimately, we turned to chaos-mesh, a CNCF sandbox project that happens to provide support for faking vDSO time calls. The subproject watchmaker allows us to quickly adjust the time in a running CockroachDB instance.
Another advantage of this approach is that because we’re not relying on separate machines for time skew, we can run multiple Cockroach nodes on the same machine and still produce the problem.
Note: watchmaker does not yet support vDSO swapping on ARM, and `ptrace` is not currently supported in Docker’s QEMU x86 emulation. This means there’s not a good way to run watchmaker against CockroachDB on an M1 Mac.
Cockroach nodes, even without transactions forcing the timestamp caches into sync, are always heartbeating, and even with the time skew in place it could sometimes take a long time to see the issue in practice. To keep those heartbeats from quickly re-synchronizing the nodes, we also introduce network delay between them. This was simpler to achieve than skewing time, though we again turn to chaos-mesh (this time its chaosd tool) to easily adjust the delay at runtime.
With these tools in our toolkit, the test comes together: stand up a small CockroachDB cluster with the (bad!) zone configuration above, skew the clock on one node with watchmaker, and add network delay between the nodes.

Then, in a loop:

1. Write `resource:thegoods#excluded@user:me` (timestamp T1)
2. Write `resource:thegoods#direct@user:me` (timestamp T2)
3. Check `resource:thegoods#allowed@user:me` at snapshot T2

Most of the time, this setup can produce T2 < T1 within the first few tries, and the Check (which should fail) succeeds instead.
The relevant properties of CockroachDB suggest two approaches to preventing the New Enemy problem:

1. Wait out the `max_offset` after a write before returning to the client (and prevent further writes during that time).
2. Force the relevant transactions to overlap, so that CockroachDB orders their timestamps for us.

(1) is similar to the approach Spanner takes, but without highly accurate time sources, the time we would have to wait will unacceptably slow down writes. For SpiceDB, we started exploring option (2).
From the CockroachDB docs, it appears that all we would need to do to force the transactions to overlap is read a common key.
However, through testing (now that we have a test!), we determined that we needed both transactions to write to the same key in order for Cockroach to assign the timestamps properly.
In SpiceDB, you can pick between two strategies for writing these overlap keys. The simplest (and the default) is the `static` strategy: all relationship writes touch the same row. This adds a small amount of latency to the write path, but it leaves the read path performant, which matches the expected usage of a permission system.

The second strategy is `prefix`. If you are careful with how you design and interact with permissions in SpiceDB, you can choose to have multiple overlap domains by prefixing the definitions in your schema.
For example:
```
definition sys1/user {}

definition sys1/resource {
    relation direct: sys1/user
    relation excluded: sys1/user

    permission allowed = direct - excluded
}

definition sys2/user {}
```
Writes to `sys1`-prefixed definitions will overlap with other writes to `sys1`-prefixed definitions, but not with writes to `sys2`-prefixed definitions.
This allows advanced users to optimize overlap strategy, but we don’t think it’s something you’ll need to worry about in general. The default is safe and fast.
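To make the overlap mechanics concrete, here is a rough sketch of the idea (the overlap table and helper names are hypothetical, not SpiceDB’s actual implementation): every relationship write also touches a row keyed by its overlap domain inside the same transaction, so CockroachDB is forced to order any two such transactions.

```go
package overlap

import (
	"context"
	"database/sql"
	"strings"
)

// overlapKey picks the key whose row a relationship write will touch.
// With the "static" strategy every write shares one key; with the "prefix"
// strategy writes only overlap within the same schema prefix (e.g. "sys1").
func overlapKey(strategy, resourceType string) string {
	if strategy == "prefix" {
		if idx := strings.Index(resourceType, "/"); idx >= 0 {
			return resourceType[:idx]
		}
	}
	return "static"
}

// writeRelationshipWithOverlap writes a relationship and, in the same
// transaction, upserts a row in a hypothetical "overlap" table. Because any
// two such transactions now write a common key, CockroachDB guarantees their
// commit timestamps reflect their real-world order.
func writeRelationshipWithOverlap(ctx context.Context, db *sql.DB, strategy,
	resourceType, resourceID, relation, subjectType, subjectID string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()

	if _, err := tx.ExecContext(ctx,
		`INSERT INTO relationships (resource_type, resource_id, relation, subject_type, subject_id)
		 VALUES ($1, $2, $3, $4, $5)`,
		resourceType, resourceID, relation, subjectType, subjectID); err != nil {
		return err
	}

	// The write that forces the overlap.
	if _, err := tx.ExecContext(ctx,
		`UPSERT INTO overlap (key, touched_at) VALUES ($1, now())`,
		overlapKey(strategy, resourceType)); err != nil {
		return err
	}

	return tx.Commit()
}
```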
The tricky thing about our test for the New Enemy problem is that there are still enough variables outside our control that we can’t guarantee it will happen in one try.
Usually it will happen within a few tries, but sometimes it may take dozens or even hundreds of tries.
How then can we be confident that the problem has been fixed?
We take a few tricks from statistics to increase our confidence:

1. First, run the unprotected sequence in a loop until the problem manifests, and record how many iterations it took.
2. Then, run the protected sequence for a number of iterations derived from (1) - enough that, if the problem were still present, we would expect to have seen it.

In the best case scenarios, the iterations we pick in (2) are very small (because we successfully showed the problem very quickly in (1)). But occasionally the number picked in (2) is very high, and the test runs for a bit longer to increase confidence.
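As a sketch of that shape of test (the helpers and the safety multiplier here are illustrative, not the real test’s logic):

```go
package newenemy

// attemptsToReproduce runs the unprotected Write/Write/Check sequence until a
// timestamp reversal is observed and returns how many iterations that took.
func attemptsToReproduce(runUnprotected func() bool) int {
	attempts := 1
	for !runUnprotected() {
		attempts++
	}
	return attempts
}

// mitigationHolds first measures how hard the problem is to reproduce without
// protection, then runs the protected sequence for a multiple of that count.
// If no reversal shows up in that many protected iterations, we have good
// (though never absolute) confidence that the mitigation works.
func mitigationHolds(runUnprotected, runProtected func() bool) bool {
	needed := attemptsToReproduce(runUnprotected)

	const safetyMultiplier = 10 // illustrative only
	for i := 0; i < needed*safetyMultiplier; i++ {
		if runProtected() {
			return false // a reversal was still observable: mitigation failed
		}
	}
	return true
}
```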
To recap, we now have:

- a way to reliably reproduce timestamp reversals (and, through them, the New Enemy problem) against a CockroachDB-backed SpiceDB,
- a mitigation that forces the relevant transactions to overlap so that CockroachDB orders them for us, and
- a test that gives us statistical confidence that the mitigation works.

You can check out the result of this work in the SpiceDB tests.
SpiceDB is always improving, and we follow the work that CockroachDB does closely. We’re excited to try out new approaches to this problem as CockroachDB evolves, and now we have a test framework to quickly try new solutions.
If you have an idea for how we can improve this, don’t be shy! Let us know what you think.
if clocks get very quickly out of sync, this may not hold ↩︎