
Why Kafka was selected for Corda

Divya Taori, Senior Developer Evangelist | June 27, 2023

A distributed ledger made up of mutually distrusting nodes would allow for a single global database that records the state of deals, obligations, and other agreements between institutions and people. This would eliminate much of the manual, time-consuming effort currently required to keep disparate ledgers synchronized with each other. It would also allow for greater levels of code sharing than is currently typical in the financial and other industries, reducing transaction costs for everyone.

Richard Gendal Brown, August 2016

These are the foundational principles set forth by Richard Brown, the Chief Technology Officer of R3. Seven years later, as the Next Generation of Corda is introduced, these principles remain steadfast. Our vision for Corda is to create an ecosystem that fosters digital trust at its core, enabling mutually distrustful parties to seamlessly engage in transactions using the Corda platform.

Introducing Next-Gen Corda

With Corda 5, the Corda platform has undergone significant enhancements to meet the needs of our customers. It builds on the trust, consistency, and security assurances of the previous Corda version while also enabling high availability (HA) and horizontal scalability and reducing total cost of ownership. Corda 5 is a rebuilt, completely fresh implementation of the architecture that still retains the important features of distributed ledger technology (DLT), such as native smart contracts, a distributed ledger, immutability, privacy, and composability.

The Corda platform is built around an event-driven worker architecture that emphasizes reliability and scalability. This approach involves two key steps: first, it decouples the attested identity from any single compute instance; second, it divides functionality among distinct components to optimize total cost of ownership. Let’s look at each of these steps in more detail.

The first step means that a single Corda compute instance no longer has to be tied to exactly one identity. Through sandboxing, the architecture lets you host multiple identities on a single compute instance. These hosted identities are referred to as virtual nodes.
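To make the idea of virtual nodes concrete, here is a minimal Python sketch of one compute instance hosting several identities, each with its own isolated state. Everything in it is hypothetical: `HoldingIdentity`, `Sandbox`, `ComputeInstance`, and the 12-character short-hash scheme are illustrative assumptions rather than Corda APIs, and real Corda sandboxing is far more involved than a per-identity dictionary.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HoldingIdentity:
    """Hypothetical identity: an X.500-style name plus a network/group ID."""
    x500_name: str
    group_id: str

    def short_hash(self) -> str:
        # Illustrative assumption: derive a short, stable key from the identity.
        digest = hashlib.sha256(f"{self.x500_name}|{self.group_id}".encode()).hexdigest()
        return digest[:12].upper()


@dataclass
class Sandbox:
    """Per-identity state; nothing in it is shared with other identities."""
    identity: HoldingIdentity
    vault: dict = field(default_factory=dict)


class ComputeInstance:
    """One process hosting many virtual nodes, each in its own sandbox."""

    def __init__(self) -> None:
        self._sandboxes: dict[str, Sandbox] = {}

    def create_virtual_node(self, identity: HoldingIdentity) -> str:
        key = identity.short_hash()
        # Creating the same identity twice reuses the existing sandbox.
        self._sandboxes.setdefault(key, Sandbox(identity))
        return key

    def sandbox_for(self, key: str) -> Sandbox:
        return self._sandboxes[key]

    def virtual_node_count(self) -> int:
        return len(self._sandboxes)
```

Writing to Alice’s sandbox leaves Bob’s untouched, even though both live in the same process, which is the property that lets one compute instance serve many identities.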

The second step divides operations into multiple processes, referred to as “workers”. Each worker handles a specific part of the DLT system, such as RESTful user interaction, cryptographic operations, or the execution of user code (flows). Workers are designed to be stateless, which allows them to serve any of the identities hosted on the same compute instance.
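Statelessness can be sketched as a pure handler: the worker receives the previous state and a new event, and returns the updated state plus any output records, keeping nothing in memory between calls. The Python below is a hypothetical illustration; `FlowState`, `Event`, `process`, and the output format are assumptions, and Corda’s actual processor interfaces differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowState:
    """Checkpoint for one flow; it is stored outside the worker, never inside it."""
    flow_id: str
    holding_id: str
    step: int


@dataclass(frozen=True)
class Event:
    flow_id: str
    holding_id: str
    payload: str


def process(state: Optional[FlowState], event: Event) -> tuple[FlowState, list[str]]:
    """Pure handler: (previous state, event) -> (new state, output records).

    Because nothing is kept between calls, any worker replica can handle the
    next event for any flow belonging to any virtual node.
    """
    if state is None:
        # First event for this flow: start from an empty checkpoint.
        state = FlowState(event.flow_id, event.holding_id, step=0)
    new_state = FlowState(state.flow_id, state.holding_id, state.step + 1)
    outputs = [f"{event.holding_id}:{event.flow_id}:step{new_state.step}:{event.payload}"]
    return new_state, outputs
```

Since the inputs fully determine the outputs, two different replicas given the same checkpoint and event produce identical results, so losing a replica costs nothing but the in-flight call.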

The workers communicate with each other through Apache Kafka, an open-source distributed event store and stream-processing platform developed by the Apache Software Foundation and written in Java and Scala. Workers are best hosted in a Kubernetes (K8s) environment, which lets a cluster administrator use the built-in capabilities of K8s for automatic scaling and failover of the deployed workers. This scaling is what underpins Corda’s HA guarantees: Corda can be deployed in a Hot-Hot/Active-Active configuration, departing from the Hot-Warm/Active-Passive strategy of Corda 4.

The diagram below represents a single cluster, on which multiple identities (virtual nodes) can be hosted. A network (application network) may comprise multiple identities hosted on the same or different clusters. To learn more about these concepts, refer to the documentation.
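Active-Active scaling and failover rest on how Kafka spreads topic partitions across the members of a consumer group. The following Python sketch models that idea with a simple round-robin assignment that is recomputed whenever membership changes, for example when Kubernetes adds a replica or a pod fails. It is a simplification: the worker names are hypothetical, and Kafka itself performs assignment through its group coordinator and configurable assignors (range, round-robin, sticky, and others), not through a function like this.

```python
def assign_partitions(partitions: int, workers: list[str]) -> dict[str, list[int]]:
    """Round-robin assignment of topic partitions to the live worker replicas.

    Every partition is owned by exactly one live member, and the plan is
    recomputed from scratch whenever the set of members changes.
    """
    if not workers:
        raise ValueError("at least one live worker is required")
    members = sorted(workers)  # deterministic order so every member derives the same plan
    assignment: dict[str, list[int]] = {w: [] for w in members}
    for p in range(partitions):
        assignment[members[p % len(members)]].append(p)
    return assignment
```

With six partitions, `assign_partitions(6, ["flow-worker-0", "flow-worker-1"])` gives each worker three partitions; adding `flow-worker-2` leaves each worker with two; if `flow-worker-1` then disappears, its partitions are redistributed among the survivors. Every replica stays busy rather than waiting on standby.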

The Role of Kafka in Corda

In Corda 5, the message bus is no longer part of the Corda protocol itself. Instead, it serves as a reliable, fault-tolerant message delivery system within a single cluster. Because Corda accesses the bus through an abstraction layer, alternatives to Kafka could be explored in the future. For now, however, Kafka sits at the heart of every Corda cluster.
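The value of a bus abstraction layer is that workers depend on an interface rather than on Kafka directly. The Python sketch below shows the pattern with a `typing.Protocol`; `MessageBus`, `InMemoryBus`, `RestWorker`, and the `flow.start` topic name are all hypothetical and do not mirror Corda’s real messaging API. A Kafka-backed class exposing the same two methods could be substituted without changing the worker.

```python
from collections import defaultdict
from typing import Callable, Protocol

Handler = Callable[[str, bytes], None]


class MessageBus(Protocol):
    """Hypothetical bus abstraction: workers code against this, not against Kafka."""

    def publish(self, topic: str, key: str, value: bytes) -> None: ...

    def subscribe(self, topic: str, handler: Handler) -> None: ...


class InMemoryBus:
    """Test double that satisfies MessageBus by delivering synchronously."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def publish(self, topic: str, key: str, value: bytes) -> None:
        for handler in self._handlers[topic]:
            handler(key, value)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)


class RestWorker:
    """Depends only on the MessageBus protocol, so the transport can be swapped."""

    def __init__(self, bus: MessageBus) -> None:
        self._bus = bus

    def start_flow(self, flow_id: str, request: str) -> None:
        self._bus.publish("flow.start", flow_id, request.encode())
```

An in-memory implementation like this is also handy for unit-testing workers without running a broker.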

Reasoning Behind the Selection of Kafka

When designing the runtime infrastructure for Corda, the primary goal was a hot-hot, high-availability configuration with automatic work sharding, maximizing throughput while reducing costs. This immediately called for multiple worker processes communicating with one another, together with distributed algorithms running within the cluster to decide which work items each individual worker handles. Several options were considered:

  • Picking a message bus and implementing the work distribution by hand.
  • Using Apache Flink, which has a streaming architecture similar to that of Corda flows but lacks Kafka’s level of industry adoption.
  • Using Akka Cluster and implementing everything as algorithms within it, an approach that can be challenging to maintain.
  • Choosing Kafka, which already implements all the required functionality and is widely used in production at scale.
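Automatic work sharding of the kind Kafka provides can be illustrated with keyed records: every record carrying the same key (for instance, a flow ID) lands on the same partition, and therefore on whichever worker currently owns that partition. The Python below is an illustrative sketch, not Corda code: CRC32 stands in for the murmur2 hash that Kafka’s default partitioner applies to keyed records, and the worker names and assignment are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition.

    CRC32 is only a simple, stable stand-in for Kafka's murmur2-based
    default partitioner; the point is that equal keys give equal partitions.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def owner_of(key: str, assignment: dict[str, list[int]], num_partitions: int) -> str:
    """Return the worker that currently owns the partition a key maps to."""
    partition = partition_for(key, num_partitions)
    for worker, partitions in assignment.items():
        if partition in partitions:
            return worker
    raise LookupError(f"partition {partition} is unassigned")
```

Given an assignment such as `{"flow-worker-0": [0, 2, 4], "flow-worker-1": [1, 3, 5]}`, all events for `flow-7` are routed to one worker, which preserves their order without any explicit coordination between workers.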

This evaluation concluded that Kafka, combined with the Corda processor framework, offered the best approach. Kafka’s status as an industry standard for high-availability, low-latency messaging further confirmed its suitability for Corda.

Benefits of Kafka

The following are some of the reasons Kafka has become the de facto standard for messaging across many applications, with over 35% of Fortune 500 companies among its users:

  • Storage Capabilities: Unlike most message queues, which remove messages as soon as they are consumed, Kafka can retain data for as long as required, providing durability and persistence.
  • Stream Processing: Kafka enables dynamic computation of derived streams and datasets, rather than simply passing batches of messages.
  • Message Replay: Because messages are retained after consumption, Kafka supports replaying them, although this retention does incur a storage overhead.
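The storage and replay properties follow from Kafka’s log model: consuming a record only advances the consumer’s offset and deletes nothing, so a consumer can rewind and read again, and several consumers can read the same log independently. This minimal in-memory Python sketch shows that model; the `Log` and `Consumer` classes are hypothetical. In real Kafka, how long records survive is governed by topic settings such as `retention.ms` and `retention.bytes`, which is where the storage overhead comes from, and the Java client exposes rewinding through `KafkaConsumer.seek`.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only log: reading never deletes, so records can be re-read."""
    records: list[bytes] = field(default_factory=list)

    def append(self, value: bytes) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record


@dataclass
class Consumer:
    """Each consumer tracks its own position (offset) in the shared log."""
    log: Log
    offset: int = 0

    def poll(self, max_records: int = 10) -> list[bytes]:
        batch = self.log.records[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        # Rewinding the offset is all it takes to replay history.
        self.offset = offset
```

After a consumer has read every record, calling `seek(1)` and polling again returns the same records from offset 1 onward, while a second consumer starting at offset 0 sees the full history.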

Conclusion

Apache Kafka emerged as the clear choice for Corda due to its maturity, scalability, and widespread adoption. With Kafka as the backbone of its communication infrastructure, Corda 5 achieves the desired high availability, horizontal scalability, and reduced total cost of ownership, ultimately meeting the rigorous needs of our customers.

All of this is available today! Try out Corda for free here.

Special thanks to Dr. Kat Baker, Matthew Nesbit, and Dries Samyn for their help with this blog.
