Replacing Keycloak's Infinispan Caches with Redis/Valkey (Keycloak DevDay 2026)
At Keycloak DevDay 2026, we shared our work on replacing Keycloak's distributed Infinispan caches with Redis/Valkey.
For the full technical deep dive, we will release slides when the talk is published on YouTube.
This post focuses on the core technical content from the presentation and summarizes what we built, what we learned, and what comes next.
Why We Took This On
The recurring operator pain points were clear: restart complexity, JGroups/Infinispan operational friction, and upgrade risk in high-availability environments.
Our goals for the proof-of-concept were practical:
- Functional parity with Keycloak's distributed cache behavior.
- At least 50% of embedded Infinispan performance.
- Fast startup (sub-5s per pod) with reliable behavior.
- Seamless upgrades and horizontal scaling without rebalance pain.
- Extension-based implementation on current Keycloak, without forking.
We also set explicit non-goals, including replacing local Infinispan caches, integrating/migrating persistent sessions, supporting every Redis topology, and solving multi-region concerns in v1.
What We Implemented (High Level)
We implemented a new DatastoreProvider-based approach to replace distributed caches while leaving the rest of Keycloak's behavior intact.
At a high level:
- Stored entities as Redis hashes (session and related objects).
- Added secondary indexes with Redis sets for fast lookup paths (for example, user-to-sessions, client-to-sessions).
- Built a changelog-based transaction implementation of KeycloakTransaction to batch and minimize writes at the appropriate stage of the Keycloak request lifecycle.
- Used Redis MULTI/EXEC for batched commit behavior.
- Implemented Lua-based CAS behavior for single-round-trip atomic updates.
- Replaced the replicated work cache, which handles local cache invalidation, with a Redis PUBSUB-backed ClusterProvider.
- Exposed Redis client and command metrics in Keycloak's Prometheus endpoint (vendor_jedis_*).
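To make the hash-plus-index layout concrete, here is a minimal in-memory sketch in Python. It assumes a hypothetical key scheme (kc:session:{id} for the session hash, kc:idx:user-sessions:{userId} and kc:idx:client-sessions:{clientId} for the index sets), and FakeRedis is a stand-in that mimics only HSET, HGETALL, DEL, SADD, SREM, and SMEMBERS. It is not the extension's actual code or key naming.

```python
from collections import defaultdict


class FakeRedis:
    """In-memory stand-in for a handful of Redis hash and set commands."""

    def __init__(self):
        self.hashes = defaultdict(dict)
        self.sets = defaultdict(set)

    def hset(self, key, mapping):
        self.hashes[key].update(mapping)

    def hgetall(self, key):
        return dict(self.hashes.get(key, {}))

    def delete(self, key):
        self.hashes.pop(key, None)
        self.sets.pop(key, None)

    def sadd(self, key, member):
        self.sets[key].add(member)

    def srem(self, key, member):
        self.sets[key].discard(member)

    def smembers(self, key):
        return set(self.sets.get(key, set()))


# Hypothetical key scheme, for illustration only.
def session_key(session_id):
    return f"kc:session:{session_id}"


def user_index_key(user_id):
    return f"kc:idx:user-sessions:{user_id}"


def client_index_key(client_id):
    return f"kc:idx:client-sessions:{client_id}"


def save_session(r, session_id, user_id, client_ids):
    # The entity lives in one hash; the index sets hold only the session id.
    r.hset(session_key(session_id), {
        "id": session_id,
        "userId": user_id,
        "clients": ",".join(sorted(client_ids)),
    })
    r.sadd(user_index_key(user_id), session_id)
    for cid in client_ids:
        r.sadd(client_index_key(cid), session_id)


def sessions_for_user(r, user_id):
    # Index lookup, then one hash read per id; ids whose hash is gone are skipped.
    result = []
    for sid in sorted(r.smembers(user_index_key(user_id))):
        data = r.hgetall(session_key(sid))
        if data:
            result.append(data)
    return result


def remove_session(r, session_id):
    # Clean up the index entries together with the hash itself.
    data = r.hgetall(session_key(session_id))
    if not data:
        return
    r.srem(user_index_key(data["userId"]), session_id)
    for cid in filter(None, data["clients"].split(",")):
        r.srem(client_index_key(cid), session_id)
    r.delete(session_key(session_id))
```

Because the index sets store only ids, a lookup costs one SMEMBERS plus one read per id, and the entity data is never duplicated across indexes.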
For detailed APIs, object mapping rules, and transaction semantics, the slide deck has the full implementation flow and diagrams.
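The changelog-based KeycloakTransaction mentioned above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the extension's code: ChangelogTransaction and its methods are hypothetical, and a plain dict stands in for Redis. It shows the core idea of coalescing every write a request makes into at most one change per key, then applying the batch at commit time, which is the point where a Redis-backed version would queue the commands between MULTI and EXEC (for example with Jedis's multi() and exec()).

```python
class ChangelogTransaction:
    """Hypothetical model of a changelog-based transaction.

    Writes made during a request are recorded per key and coalesced, then
    flushed as one batch on commit. A plain dict stands in for Redis.
    """

    def __init__(self):
        # key -> (op, payload); op is "hset", "replace" (DEL + HSET) or "del".
        # Dicts keep insertion order, so the batch follows first-touch order.
        self.changes = {}
        self.active = True

    def hset(self, key, mapping):
        op, pending = self.changes.get(key, ("hset", {}))
        if op == "del":
            # Writing to a key deleted earlier in this transaction recreates it.
            op, pending = "replace", {}
        self.changes[key] = (op, {**pending, **mapping})

    def delete(self, key):
        # A delete supersedes any earlier pending writes to the same key.
        self.changes[key] = ("del", None)

    def commit(self, store):
        """Apply all coalesced changes in one pass; returns the number of keys touched."""
        if not self.active:
            raise RuntimeError("transaction is no longer active")
        for key, (op, payload) in self.changes.items():
            if op == "hset":
                store.setdefault(key, {}).update(payload)
            elif op == "replace":
                store[key] = dict(payload)
            else:
                store.pop(key, None)
        touched = len(self.changes)
        self.changes.clear()
        self.active = False
        return touched

    def rollback(self):
        # Nothing has been sent yet, so rolling back just drops the changelog.
        self.changes.clear()
        self.active = False
```

Coalescing is what keeps the write count down: two updates to the same session during one request become a single HSET, and a write followed by a delete becomes just a DEL.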
Notable Engineering Details
A few areas mattered most in practice:
- Atomicity model: Lua CAS operations reduced contention and retry complexity versus WATCH-based optimistic workflows.
- Expiration correctness: TTL handling required careful validation to avoid stale references (for example, refresh token edge cases).
- Behavior parity: Reproducing nuanced Infinispan behavior (especially around offline session flows and invalidation semantics) required close reading of existing Keycloak internals.
- Index discipline: Clean separation between hash storage and set-based secondary indexes kept lookups efficient and made invalidation predictable.
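A compare-and-set update of the kind described above might look like the following. The Lua script is a hypothetical example: it assumes each entity hash carries a version field and would be run with EVAL or EVALSHA, so the check and the write happen atomically on the server in one round trip. The cas_update function is a Python model of the same logic so it can be exercised without a Redis server; neither is the extension's actual code.

```python
# Hypothetical CAS script: update the hash only if its "version" field still
# matches the version the caller read. Returns 1 on success, 0 on conflict.
CAS_SCRIPT = """
local current = redis.call('HGET', KEYS[1], 'version')
if current == ARGV[1] then
  redis.call('HSET', KEYS[1], 'version', ARGV[2], 'data', ARGV[3])
  return 1
end
return 0
"""


def cas_update(store, key, expected_version, new_version, data):
    """Python model of CAS_SCRIPT's logic; a dict of dicts stands in for Redis."""
    entry = store.get(key)
    # A missing key or a changed version means someone else won the race.
    if entry is None or entry.get("version") != expected_version:
        return 0
    entry["version"] = new_version
    entry["data"] = data
    return 1
```

With WATCH, the client must watch the key, read it, and then submit a MULTI/EXEC transaction that fails if the key changed in between; the script folds the compare and the write into a single server-side step, and a caller that gets 0 back only needs to re-read and try again.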
Benchmarks and Outcomes
In our benchmark setup, the Redis implementation exceeded the original performance target of at least 50% of embedded Infinispan performance. We used the Keycloak-provided keycloak-benchmark project to run performance tests against a 3-node Keycloak cluster using Redis, and compared the results with a standard 3-node Keycloak setup using embedded Infinispan. Redis matched or beat Infinispan on every tested flow except one (the request to the login endpoint), while also meeting the startup-speed and extension/no-fork constraints.
This approach hit the proof-of-concept goals, with room for additional optimization and broader validation.
| Benchmark | Infinispan, 99th pct (ms) | Redis, 99th pct (ms) | Delta |
|---|---|---|---|
| All Requests | 67 | 47 | 29.9% |
| Browser to Log In Endpoint | 71 | 86 | -21.1% |
| Browser posts correct credentials | 83 | 57 | 31.3% |
| Exchange Code | 22 | 10 | 54.5% |
| RefreshToken | 36 | 12 | 66.7% |
| Browser logout | 43 | 9 | 79.1% |
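Note: lower 99th-percentile values are better. Delta is the reduction relative to Infinispan, (Infinispan - Redis) / Infinispan, so a positive value means Redis was faster. The "Browser to Log In Endpoint" row is the one regression shown in the original chart.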
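The Delta column can be reproduced directly from the two latency columns. This short Python check uses only the values in the table; the function names are illustrative.

```python
# 99th-percentile response times copied from the table: (Infinispan, Redis).
P99_MS = {
    "All Requests": (67, 47),
    "Browser to Log In Endpoint": (71, 86),
    "Browser posts correct credentials": (83, 57),
    "Exchange Code": (22, 10),
    "RefreshToken": (36, 12),
    "Browser logout": (43, 9),
}


def improvement_pct(infinispan_ms, redis_ms):
    """Reduction in p99 latency relative to Infinispan; negative means Redis was slower."""
    return round((infinispan_ms - redis_ms) / infinispan_ms * 100, 1)


def regressions(results):
    """Names of flows where Redis had a higher p99 than Infinispan."""
    return [name for name, (ispn, redis) in results.items() if redis > ispn]


if __name__ == "__main__":
    for name, (ispn, redis) in P99_MS.items():
        print(f"{name:35} {improvement_pct(ispn, redis):6.1f}%")
```

For example, the "All Requests" row works out to (67 - 47) / 67, or about 29.9%, and regressions(P99_MS) returns only "Browser to Log In Endpoint".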
Benchmark screenshots
Redis benchmark test results:

Infinispan benchmark test results:

Timeline and Context
This work started in September 2025.
A successful implementation would not have been possible without years of prior work in Keycloak internals, extension development, and production operations. Existing knowledge around map-store patterns, DatastoreProvider internals, transaction behavior, and real-world customer requirements was the foundation that made this tractable.
What's Next
Near-term next steps include broader performance validation across workloads, clearer packaging/licensing for source and images, and additional multi-region validation/documentation.
We also plan to release a full image soon at quay.io/repository/phasetwo/keycloak-redis.
If you want all the implementation specifics, design tradeoffs, and diagrams, please stay tuned for the release of the slides and videos of our Keycloak DevDay presentation.
Want to go deeper?
- Reach out to our team: support@phasetwo.io
- Try our hosting and extensions: Phase Two Dash
- Contact us for additional implementation details and architecture guidance: https://phasetwo.io/contact