Replacing Keycloak's Infinispan Caches with Redis/Valkey (Keycloak DevDay 2026)

Phase Two
Hosted Keycloak and Keycloak Support

At Keycloak DevDay 2026, we shared our work on replacing Keycloak's distributed Infinispan caches with Redis/Valkey.

We will release the slides for the full technical deep dive once the talk is published on YouTube.

This post focuses on the core technical content from the presentation and summarizes what we built, what we learned, and what comes next.

Why We Took This On

The recurring operator pain points were clear: restart complexity, JGroups/Infinispan operational friction, and upgrade risk in high-availability environments.

Our goals for the proof-of-concept were practical:

  1. Functional parity with Keycloak's distributed cache behavior.
  2. At least 50% of embedded Infinispan performance.
  3. Fast startup (sub-5s per pod) with reliable behavior.
  4. Seamless upgrades and horizontal scaling without rebalance pain.
  5. Extension-based implementation on current Keycloak, without forking.

We also set explicit non-goals, including replacing local Infinispan caches, integrating/migrating persistent sessions, supporting every Redis topology, and solving multi-region concerns in v1.

What We Implemented (High Level)

We implemented a new DatastoreProvider-based approach to replace distributed caches while leaving the rest of Keycloak behavior intact.

At a high level:

  1. Stored entities as Redis hashes (session and related objects).
  2. Added secondary indexes with Redis sets for fast lookup paths (for example, user-to-sessions, client-to-sessions).
  3. Built a changelog-based transaction implementation of KeycloakTransaction to batch and minimize writes at the appropriate stage of the Keycloak request lifecycle.
  4. Used Redis MULTI/EXEC for batched commit behavior.
  5. Implemented Lua-based CAS behavior for single-round-trip atomic updates.
  6. Replaced the replicated work cache (used for local cache invalidation) with a Redis Pub/Sub-backed ClusterProvider.
  7. Exposed Redis client and command metrics in Keycloak's Prometheus endpoint (vendor_jedis_*).
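
The changelog idea in item 3 can be illustrated with a small, self-contained sketch. It is not the implementation from the talk and it does not implement Keycloak's actual KeycloakTransaction interface; the names `HashStore`, `ChangelogTransaction`, and `RecordingStore` are hypothetical. It only shows the core behavior: changes made during a request are merged per key in memory, and the store sees at most one delete and one write per key at commit time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sink for committed writes; a real one would wrap a Redis client. */
interface HashStore {
    void writeFields(String key, Map<String, String> fields);
    void delete(String key);
}

/** Buffers per-key changes during a request and flushes them once on commit. */
class ChangelogTransaction {
    private static final class Change {
        boolean deleteFirst; // the key was removed earlier in this transaction
        final Map<String, String> fields = new LinkedHashMap<>();
    }

    private final Map<String, Change> pending = new LinkedHashMap<>();
    private final HashStore store;
    private boolean active;

    ChangelogTransaction(HashStore store) {
        this.store = store;
    }

    void begin() {
        pending.clear();
        active = true;
    }

    /** Later writes to the same field overwrite earlier ones in the buffer. */
    void set(String key, String field, String value) {
        requireActive();
        pending.computeIfAbsent(key, k -> new Change()).fields.put(field, value);
    }

    /** Marks the key for deletion and drops any buffered field writes for it. */
    void remove(String key) {
        requireActive();
        Change change = pending.computeIfAbsent(key, k -> new Change());
        change.deleteFirst = true;
        change.fields.clear();
    }

    /** Flushes the merged changes: at most one delete and one write per key. */
    void commit() {
        requireActive();
        for (Map.Entry<String, Change> entry : pending.entrySet()) {
            Change change = entry.getValue();
            if (change.deleteFirst) {
                store.delete(entry.getKey());
            }
            if (!change.fields.isEmpty()) {
                store.writeFields(entry.getKey(), change.fields);
            }
        }
        pending.clear();
        active = false;
    }

    /** Discards everything buffered; the store is never touched. */
    void rollback() {
        pending.clear();
        active = false;
    }

    private void requireActive() {
        if (!active) {
            throw new IllegalStateException("transaction not active");
        }
    }
}

/** Test double that records what would have been sent to the store. */
class RecordingStore implements HashStore {
    final List<String> calls = new ArrayList<>();

    public void writeFields(String key, Map<String, String> fields) {
        calls.add("HSET " + key + " " + fields);
    }

    public void delete(String key) {
        calls.add("DEL " + key);
    }
}
```

With this shape, several `set` calls on the same session key collapse into a single hash write at commit, and nothing reaches the store if the request rolls back. In a Redis-backed version, the commit loop is presumably where a MULTI/EXEC batch would be issued.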

For detailed APIs, object mapping rules, and transaction semantics, the slide deck has the full implementation flow and diagrams.
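
Until the slides are available, the object mapping can be pictured roughly as follows. This is a minimal sketch under stated assumptions: the key prefixes (`kc:<realm>:session:...`, `kc:<realm>:user-sessions:...`, `kc:<realm>:client-sessions:...`), the `SessionKeys` class, and the exact command order are all hypothetical and are not taken from the talk. It builds the command list for writing one session hash plus its two set-based index entries inside a single MULTI/EXEC batch, without talking to a Redis server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical key naming and command layout for one session write. */
class SessionKeys {
    static String session(String realm, String sessionId) {
        return "kc:" + realm + ":session:" + sessionId;
    }

    static String userIndex(String realm, String userId) {
        return "kc:" + realm + ":user-sessions:" + userId;
    }

    static String clientIndex(String realm, String clientId) {
        return "kc:" + realm + ":client-sessions:" + clientId;
    }

    /**
     * Returns the Redis commands, as argument lists, that store the session
     * as a hash with a TTL and add its id to the user and client index sets.
     */
    static List<List<String>> writeCommands(String realm, String sessionId, String userId,
                                            String clientId, Map<String, String> fields,
                                            long ttlSeconds) {
        String key = session(realm, sessionId);
        List<List<String>> commands = new ArrayList<>();
        commands.add(List.of("MULTI"));

        // HSET key field value [field value ...]
        List<String> hset = new ArrayList<>(List.of("HSET", key));
        fields.forEach((field, value) -> {
            hset.add(field);
            hset.add(value);
        });
        commands.add(hset);
        commands.add(List.of("EXPIRE", key, Long.toString(ttlSeconds)));

        // Secondary indexes: set members are session ids.
        commands.add(List.of("SADD", userIndex(realm, userId), sessionId));
        commands.add(List.of("SADD", clientIndex(realm, clientId), sessionId));

        commands.add(List.of("EXEC"));
        return commands;
    }
}
```

One consequence of a layout like this is that the index sets do not expire together with the hash, so a reader that finds a session id in an index but no hash behind it has to treat the entry as stale and remove it (for example with SREM).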

Notable Engineering Details

A few areas mattered most in practice:

  1. Atomicity model: Lua CAS operations reduced contention and retry complexity versus WATCH-based optimistic workflows.
  2. Expiration correctness: TTL handling required careful validation to avoid stale references (for example, refresh token edge cases).
  3. Behavior parity: Reproducing nuanced Infinispan behavior (especially around offline session flows and invalidation semantics) required close reading of existing Keycloak internals.
  4. Index discipline: Clean separation between hash storage and set-based secondary indexes kept lookups efficient and made invalidation predictable.
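
To make the atomicity point concrete, here is a minimal sketch of a version-checked compare-and-set written as a Lua script. It is an assumption-laden illustration, not the script from the talk: the `version` hash field, the return codes (1 applied, 0 version conflict, -1 missing), and the `VersionedCas` class are hypothetical. The script would be sent with the Redis EVAL or EVALSHA command through whatever client is in use. The `applyLocally` method is an in-memory model of the same logic so the semantics can be checked without a Redis server.

```java
import java.util.Map;

/** Hypothetical compare-and-set on a hash that carries a "version" field. */
class VersionedCas {
    /**
     * KEYS[1] = hash key, ARGV[1] = expected version, ARGV[2] = new version,
     * ARGV[3..] = alternating field/value pairs to write.
     * Redis runs the whole script atomically, so the check and the write
     * cannot be interleaved with another client's commands.
     */
    static final String SCRIPT = String.join("\n",
        "local current = redis.call('HGET', KEYS[1], 'version')",
        "if current == false then return -1 end",
        "if current ~= ARGV[1] then return 0 end",
        "redis.call('HSET', KEYS[1], 'version', ARGV[2], unpack(ARGV, 3))",
        "return 1");

    /** In-memory model of SCRIPT, for illustration and testing only. */
    static int applyLocally(Map<String, Map<String, String>> db, String key,
                            String expectedVersion, String newVersion,
                            String... fieldValues) {
        Map<String, String> hash = db.get(key);
        if (hash == null || !hash.containsKey("version")) {
            return -1; // key or version field missing
        }
        if (!hash.get("version").equals(expectedVersion)) {
            return 0; // someone else updated the entity first
        }
        hash.put("version", newVersion);
        for (int i = 0; i + 1 < fieldValues.length; i += 2) {
            hash.put(fieldValues[i], fieldValues[i + 1]);
        }
        return 1;
    }
}
```

Compared with a WATCH-based flow, which needs WATCH, a read, MULTI, the writes, and EXEC, and must restart when EXEC reports that the watched key changed, a script like this does the check and the write in one round trip, and a conflict comes back as an ordinary return value.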

Benchmarks and Outcomes

In our benchmark setup, the Redis implementation exceeded the original performance target of at least 50% of embedded Infinispan performance: its 99th-percentile latency was lower than Infinispan's on every measured step except the initial browser request to the login endpoint. It also met the startup-speed and extension/no-fork constraints. We used the Keycloak-provided keycloak-benchmark project to run performance tests against a three-node cluster using Redis caching, and compared the results with a standard Keycloak setup on the same number of nodes using embedded Infinispan.

This approach hit the proof-of-concept goals, with room for additional optimization and broader validation.

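| Benchmark | Infinispan (99th pct) | Redis (99th pct) | Delta |
| --- | --- | --- | --- |
| All Requests | 67 | 47 | 29.9% |
| Browser to Log In Endpoint | 71 | 86 | -21.1% |
| Browser posts correct credentials | 83 | 57 | 31.3% |
| Exchange Code | 22 | 10 | 54.5% |
| RefreshToken | 36 | 12 | 66.7% |
| Browser logout | 43 | 9 | 79.1% |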

Note: lower 99th percentile values are better. The "Browser to Log In Endpoint" row is the one regression shown in the original chart.
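
The post does not define the Delta column, but the published figures are consistent with the relative reduction in the 99th percentile versus Infinispan, that is (Infinispan - Redis) / Infinispan, rounded to one decimal place. The sketch below encodes that inferred formula; the `P99Delta` class name is hypothetical.

```java
/** Inferred Delta formula: percent reduction of the Redis p99 relative to Infinispan. */
class P99Delta {
    /** Positive means Redis was faster; negative means a regression. */
    static double deltaPercent(double infinispanP99, double redisP99) {
        double fraction = (infinispanP99 - redisP99) / infinispanP99;
        // Round to one decimal place of a percentage.
        return Math.round(fraction * 1000.0) / 10.0;
    }
}
```

For example, the "All Requests" row (67 versus 47) gives 29.9, and the "Browser to Log In Endpoint" row (71 versus 86) gives -21.1, matching the figures reported above.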

Benchmark screenshots

[Screenshot: Redis benchmark test results]

[Screenshot: Infinispan benchmark test results]

Timeline and Context

This work started in September 2025.

A successful implementation would not have been possible without years of prior work in Keycloak internals, extension development, and production operations. Existing knowledge around map-store patterns, DatastoreProvider internals, transaction behavior, and real-world customer requirements was the foundation that made this tractable.

What's Next

Near-term next steps include broader performance validation across workloads, clearer packaging/licensing for source and images, and additional multi-region validation/documentation.

We also plan to release a full image soon at quay.io/repository/phasetwo/keycloak-redis, so stay tuned for more.

If you want all the implementation specifics, design tradeoffs, and diagrams, please stay tuned for the release of the slides and videos of our Keycloak DevDay presentation.


Want to go deeper?

  1. Reach out to our team: support@phasetwo.io
  2. Try our hosting and extensions: Phase Two Dash
  3. Contact us for additional implementation details and architecture guidance: https://phasetwo.io/contact