Data Streaming Archives - Kai Waehner

Agentic AI with the Agent2Agent Protocol (A2A) and MCP using Apache Kafka as Event Broker

Kai Waehner — Mon, 26 May 2025 05:32:01 +0000

Agentic AI is gaining traction as a design pattern for building more intelligent, autonomous, and collaborative systems. Unlike traditional task-based automation, agentic AI involves intelligent agents that operate independently, make contextual decisions, and collaborate with other agents or systems—across domains, departments, and even enterprises.

In the enterprise world, agentic AI is more than just a technical concept. It represents a shift in how systems interact, learn, and evolve. But unlocking its full potential requires more than AI models and point-to-point APIs—it demands the right integration backbone.

That’s where Apache Kafka as event broker for true decoupling comes into play together with two emerging AI standards: Google’s Application-to-Application (A2A) Protocol and Antrophic’s Model Context Protocol (MCP) in an enterprise architecture for Agentic AI.

Inspired by my colleague Sean Falconer’s blog post, “Why Google’s Agent2Agent Protocol Needs Apache Kafka”, this blog post explores the Agentic AI adoption in enterprises and how an event-driven architecture with Apache Kafka fits into the AI architecture.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and follow me on LinkedIn or X (former Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, including various AI examples across industries.

Business Value of Agentic AI in the Enterprise

For enterprises, the promise of agentic AI is compelling:

Smarter automation through self-directed, context-aware agents
Improved customer experience with faster and more personalized responses
Operational efficiency by connecting internal and external systems more intelligently
Scalable B2B interactions that span suppliers, partners, and digital ecosystems

But none of this works if systems are coupled by brittle point-to-point APIs, slow batch jobs, or disconnected data pipelines. Autonomous agents need continuous, real-time access to events, shared state, and a common communication fabric that scales across use cases.

Model Context Protocol (MCP) + Agent2Agent (A2A): New Standards for Agentic AI

The Model Context Protocol (MCP) coined by Anthropic offers a standardized, model-agnostic interface for context exchange between AI agents and external systems. Whether the interaction is streaming, batch, or API-based, MCP abstracts how agents retrieve inputs, send outputs, and trigger actions across services. This enables real-time coordination between models and tools—improving autonomy, reusability, and interoperability in distributed AI systems.

Source: Anthropic

Google’s Agent2Agent (A2A) protocol complements this by defining how autonomous software agents can interact with one another in a standard way. A2A enables scalable agent-to-agent collaboration—where agents discover each other, share state, and delegate tasks without predefined integrations. It’s foundational for building open, multi-agent ecosystems that work across departments, companies, and platforms.

Source: Google

Why Apache Kafka Is a Better Fit Than an API (HTTP/REST) for A2A and MCP

Most enterprises today use HTTP-based APIs to connect services—ideal for simple, synchronous request-response interactions.

In contrast, Apache Kafka is a distributed event streaming platform designed for asynchronous, high-throughput, and loosely coupled communication—making it a much better fit for multi-agent (A2A) and agentic AI architectures.

API-Based Integration	Kafka-Based Integration
Synchronous, blocking	Asynchronous, event-driven
Point-to-point coupling	Loose coupling with pub/sub topics
Hard to scale to many agents	Supports multiple consumers natively
No shared memory	Kafka retains and replays event history
Limited observability	Full traceability with schema registry & DLQs

Kafka serves as the decoupling layer. It becomes the place where agents publish their state, subscribe to updates, and communicate changes—independently and asynchronously. This enables multi-agent coordination, resilience, and extensibility.

MCP + Kafka = Open, Flexible Communication

As the adoption of Agentic AI accelerates, there’s a growing need for scalable communication between AI agents, services, and operational systems. The Model-Context Protocol (MCP) is emerging as a standard to structure these interactions—defining how agents access tools, send inputs, and receive results. But a protocol alone doesn’t solve the challenges of integration, scaling, or observability.

This is where Apache Kafka comes in.

By combining MCP with Kafka, agents can interact through a Kafka topic—fully decoupled, asynchronous, and in real time. Instead of direct, synchronous calls between agents and services, all communication happens through Kafka topics, using structured events based on the MCP format.

This model supports a wide range of implementations and tech stacks. For instance:

A Python-based AI agent deployed in a SaaS environment
A Spring Boot Java microservice running inside a transactional core system
A Flink application deployed at the edge performing low-latency stream processing
An API gateway translating HTTP requests into MCP-compliant Kafka events

Regardless of where or how an agent is implemented, it can participate in the same event-driven system. Kafka ensures durability, replayability, and scalability. MCP provides the semantic structure for requests and responses.

The result is a highly flexible, loosely coupled architecture for Agentic AI—one that supports real-time processing, cross-system coordination, and long-term observability. This combination is already being explored in early enterprise projects and will be a key building block for agent-based systems moving into production.

Stream Processing as the Agent’s Companion

Stream processing technologies like Apache Flink or Kafka Streams allow agents to:

Filter, join, and enrich events in motion
Maintain stateful context for decisions (e.g., real-time credit risk)
Trigger new downstream actions based on complex event patterns
Apply AI directly within the stream processing logic, enabling real-time inference and contextual decision-making with embedded models or external calls to a model server, vector database, or any other AI platform

Agents don’t need to manage all logic themselves. The data streaming platform can pre-process information, enforce policies, and even trigger fallback or compensating workflows—making agents simpler and more focused.

Technology Flexibility for Agentic AI Design with Data Contracts

One of the biggest advantages of Kafka-based event-driven and decoupled backend for agentic systems is that agents can be implemented in any stack:

Languages: Python, Java, Go, etc.
Environments: Containers, serverless, JVM apps, SaaS tools
Communication styles: Event streaming, REST APIs, scheduled jobs

The Kafka topic is the stable data contract for quality and policy enforcement. Agents can evolve independently, be deployed incrementally, and interoperate without tight dependencies.

Microservices, Data Products, and Reusability – Agentic AI Is Just One Piece of the Puzzle

To be effective, Agentic AI needs to connect seamlessly with existing operational systems and business workflows.

Kafka topics enable the creation of reusable data products that serve multiple consumers—AI agents, dashboards, services, or external partners. This aligns perfectly with data mesh and microservice principles, where ownership, scalability, and interoperability are key.

A single stream of enriched order events might be consumed via a single data product by:

A fraud detection agent
A real-time alerting system
An agent triggering SAP workflow updates
A lakehouse for reporting and batch analytics

This one-to-many model is the opposite of traditional REST designs and crucial for enabling agentic orchestration at scale.

Agentic Al Needs Integration with Core Enterprise Systems

Agentic AI is not a standalone trend—it’s becoming an integral part of broader enterprise AI strategies. While this post focuses on architectural foundations like Kafka, MCP, and A2A, it’s important to recognize how this infrastructure complements the evolution of major AI platforms.

Leading vendors such as Databricks, Snowflake, and others are building scalable foundations for machine learning, analytics, and generative AI. These platforms often handle model training and serving. But to bring agentic capabilities into production—especially for real-time, autonomous workflows—they must connect with operational, transactional systems and other agents at runtime. (See also: Confluent + Databricks blog series | Apache Kafka + Snowflake blog series)

This is where Kafka as the event broker becomes essential: it links these analytical backends with AI agents, transactional systems, and streaming pipelines across the enterprise.

At the same time, enterprise application vendors are embedding AI assistants and agents directly into their platforms:

SAP Joule / Business AI – Embedded AI for finance, supply chain, and operations
Salesforce Einstein / Copilot Studio – Generative AI for CRM and sales automation
ServiceNow Now Assist – Predictive automation across IT and employee services
Oracle Fusion AI / OCI – ML for ERP, HCM, and procurement
Microsoft Copilot – Integrated AI across Dynamics and Power Platform
IBM watsonx, Adobe Sensei, Infor Coleman AI – Governed, domain-specific AI agents

Each of these solutions benefits from the same architectural foundation: real-time data access, decoupled integration, and standardized agent communication.

Whether deployed internally or sourced from vendors, agents need reliable event-driven infrastructure to coordinate with each other and with backend systems. Apache Kafka provides this core integration layer—supporting a consistent, scalable, and open foundation for agentic AI across the enterprise.

Agentic AI Requires Decoupling – Apache Kafka Supports A2A and MCP as an Event Broker

To deliver on the promise of agentic AI, enterprises must move beyond point-to-point APIs and batch integrations. They need a shared, event-driven foundation that enables agents (and other enterprise software) to work independently and together—with shared context, consistent data, and scalable interactions.

Apache Kafka provides exactly that. Combined with MCP and A2A for standardized Agentic AI communication, Kafka unlocks the flexibility, resilience, and openness needed for next-generation enterprise AI.

It’s not about picking one agent platform—it’s about giving every agent the same, reliable interface to the rest of the world. Kafka is that interface.

The post Agentic AI with the Agent2Agent Protocol (A2A) and MCP using Apache Kafka as Event Broker appeared first on Kai Waehner.

Replacing Legacy Systems, One Step at a Time with Data Streaming: The Strangler Fig Approach

Kai Waehner — Thu, 27 Mar 2025 06:37:24 +0000

Organizations looking to modernize legacy applications often face a high-stakes dilemma: Do they attempt a complete rewrite or find a more gradual, low-risk approach? Enter the Strangler Fig Pattern, a method that systematically replaces legacy components while keeping the existing system running. Unlike the “Big Bang” approach, where companies try to rewrite everything at once, the Strangler Fig Pattern ensures smooth transitions, minimizes disruptions, and allows businesses to modernize at their own pace. Data streaming transforms the Strangler Fig Pattern into a more powerful, scalable, and truly decoupled approach. Let’s explore why this approach is superior to traditional migration strategies and how real-world enterprises like Allianz are leveraging it successfully.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and follow me on LinkedIn or X (former Twitter) to stay in touch. And download my free book about data streaming architectures and use cases, including various use cases from the retail industry.

What is the Strangler Fig Design Pattern?

The Strangler Fig Pattern is a gradual modernization approach that allows organizations to replace legacy systems incrementally. The pattern was coined and popularized by Martin Fowler to avoid risky “big bang” system rewrites.

Inspired by the way strangler fig trees grow around and eventually replace their host, this pattern surrounds the old system with new services until the legacy components are no longer needed. By decoupling functionalities and migrating them piece by piece, businesses can minimize disruptions, reduce risk, and ensure a seamless transition to modern architectures.

When combined with data streaming, this approach enables real-time synchronization between old and new systems, making the migration even smoother.

Why Strangler Fig is Better than a Big Bang Migration or Rewrite

Many organizations have learned the hard way that rewriting an entire system in one go is risky. A Big Bang migration or rewrite often:

Takes years to complete, leading to stale requirements by the time it’s done
Disrupts business operations, frustrating customers and teams
Requires a high upfront investment with unclear ROI
Involves hidden dependencies, making the transition unpredictable

The Strangler Fig Pattern takes a more incremental approach:

It allows gradual replacement of legacy components one service at a time
Reduces business risk by keeping critical systems operational during migration
Enables continuous feedback loops, ensuring early wins
Keeps costs under control, as teams modernize based on priorities

Here is an example from the Industrial IoT space for the strangler fig pattern leveraging data streaming to modernizing OT Middleware:

If you come from the traditional IT world (banking, retail, etc.) and don’t care about IoT, then you can explore my article about mainframe integration, offloading and replacement with data streaming.

Instead of replacing everything at once, this method surrounds the old system with new components until the legacy system is fully replaced—just like a strangler fig tree growing around its host.

Better Than Reverse ETL: A Migration with Data Consistency across Operational and Analytical Applications

Some companies attempt to work around legacy constraints using Reverse ETL—extracting data from analytical systems and pushing it back into modern operational applications. On paper, this looks like a clever workaround. In reality, Reverse ETL is a fragile, high-maintenance anti-pattern that introduces more complexity than it solves.

Reverse ETL carries several critical flaws:

Batch-based by nature: Data remains at rest in analytical lakehouses like Snowflake, Databricks, Google BigQuery, Microsoft Fabric or Amazon Redshift. It is then periodically moved—usually every few hours or once a day—back into operational systems. This results in outdated and stale data, which is dangerous for real-time business processes.
Tightly coupled to legacy systems: Reverse ETL pipelines still depend on the availability and structure of the original legacy systems. A schema change, performance issue, or outage upstream can break downstream workflows—just like with traditional ETL.
Slow and inefficient: It introduces latency at multiple stages, limiting the ability to react to real-world events at the right moment. Decision-making, personalization, fraud detection, and automation all suffer.
Not cost-efficient: Reverse ETL tools often introduce double processing costs—you pay to compute and store the data in the warehouse, then again to extract, transform, and sync it back into operational systems. This increases both financial overhead and operational burden, especially as data volumes scale.

In short, Reverse ETL is a short-term fix for a long-term challenge. It’s a temporary bridge over the widening gap between real-time operational needs and legacy infrastructure.

Why Data Streaming with Apache Kafka and Flink is an Excellent Fit for the Strangler Fig Pattern

Many modernization efforts fail because they tightly couple old and new systems, making transitions painful. Data streaming with Apache Kafka and Flink changes the game by enabling real-time, event-driven communication.

1. True Decoupling of Old and New Systems

Data streaming using Apache Kafka with its event-driven streaming and persistence layer enables organizations to:

Decouple producers (legacy apps) from consumers (modern apps)
Process real-time and historical data without direct database dependencies
Allow new applications to consume events at their own pace

This avoids dependency on old databases and enables teams to move forward without breaking existing workflows.

2. Real-Time Replication with Persistence

Unlike traditional migration methods, data streaming supports:

Real-time synchronization of data between legacy and modern systems
Durable persistence to handle retries, reprocessing, and recovery
Scalability across multiple environments (on-prem, cloud, hybrid)

This ensures data consistency without downtime, making migrations smooth and reliable.

3. Supports Any Technology and Communication Paradigm

Data streaming’s power lies in its event-driven architecture, which supports any integration style—without compromising scalability or real-time capabilities.

The data product approach using Kafka Topics with data contracts handles:

Real-time messaging for low-latency communication and operational systems
Batch processing for analytics, reporting and AI/ML model training
Request-response for APIs and point-to-point integration with external systems
Hybrid integration—syncing legacy databases with cloud apps, uni- or bi-directionally

This flexibility lets organizations modernize at their own pace, using the right communication pattern for each use case while unifying operational and analytical workloads on a single platform.

4. No Time Pressure – Migrate at Your Own Pace

One of the biggest advantages of the Strangler Fig Pattern with data streaming is flexibility in timing.

No need for overnight cutovers—migrate module by module
Adjust pace based on business needs—modernization can align with other priorities
Handle delays without data loss—thanks to durable event storage

This ensures that companies don’t rush into risky migrations but instead execute transitions with confidence.

5. Intelligent Processing in the Data Migration Pipeline

In a Strangler Fig Pattern, it’s not enough to just move data from old to new systems—you also need to transform, validate, and enrich it in motion.

While Apache Kafka provides the backbone for real-time event streaming and durable storage, Apache Flink adds powerful stream processing capabilities that elevate the modernization journey.

With Apache Flink, organizations can:

Real-time preprocessing: Clean, filter, and enrich legacy data before it’s consumed by modern systems.
Data product migration: Transform old formats into modern, domain-driven event models.
Improved data quality: Validate, deduplicate, and standardize data in motion.
Reusable business logic: Centralize processing logic across services without embedding it in application code.
Unified streaming and batch: Support hybrid workloads through one engine.

Apache Flink allows you to roll out trusted, enriched, and well-governed data products—gradually, without disruption. Together with Kafka, it provides the real-time foundation for a smooth, scalable transition from legacy to modern.

Allianz’s Digital Transformation and Transition to Hybrid Cloud: An IT Modernization Success Story

Allianz, one of the world’s largest insurers, set out to modernize its core insurance systems while maintaining business continuity. A full rewrite was too risky, given the critical nature of insurance claims and regulatory requirements. Instead, Allianz embraced the Strangler Fig Pattern with data streaming.

This approach allows an incremental, low-risk transition. By implementing data streaming with Kafka as an event backbone, Allianz could gradually migrate from legacy mainframes to a modern, scalable cloud architecture. This ensured that new microservices could process real-time claims data, improving speed and efficiency without disrupting existing operations.

Source: Allianz at Data in Motion Tour Frankfurt 2022

To achieve this IT modernization, Allianz leveraged automated microservices, real-time analytics, and event-driven processing to enhance key operations, such as pricing, fraud detection, and customer interactions.

A crucial component of this shift was the Core Insurance Service Layer (CISL), which enabled decoupling applications via a data streaming platform to ensure seamless integration across various backend systems.

With the goal of migrating over 75% of applications to the cloud, Allianz significantly increases agility, reduced operational complexity, and minimized technical debt, positioning itself for long-term digital success by transitioning many applications to the cloud, as you can read (in German) in the CIO.de article:

“As one of the largest insurers in the world, Allianz plans to migrate over 75 percent of its applications to the cloud and modernize its core insurance system.”

Beyond technology, Allianz recognized that successful modernization also required cultural transformation. To drive internal adoption of event-driven architectures, the company launched Allianz Eventing Day—an initiative to educate teams on the benefits of Kafka-based streaming.

Hosted in partnership with Confluent, the event brought together Allianz experts and industry leaders to discuss real-world implementations, best practices, and future innovations. This collaborative environment reinforced Allianz’s commitment to continuous learning and agile development, ensuring that data streaming became a core pillar of its IT strategy.

Allianz also extended this engagement to the broader insurance industry, organizing the first Insurance Roundtable on Event-Driven Architectures with top insurers from across Germany and Switzerland. Experts from Allianz, Generali, HDI, Swiss Re, and others exchanged insights on topics like real-time data analytics, API decoupling, and event discovery. The discussions highlighted how streaming technologies drive competitive advantage, allowing insurers to react faster to customer needs and continuously improve their services. By embracing domain-driven design (DDD) and event storming, Allianz ensured that event-driven architectures were not just a technical upgrade, but a fundamental shift in how insurance operates in the digital age.

The Future of IT Modernization and Legacy Migrations with Strangler Fig using Data Streaming

The Strangler Fig Pattern is the most pragmatic approach to modernizing enterprise systems, and data streaming makes it even more powerful.

By decoupling old and new systems, enabling real-time synchronization, and supporting multiple architectures, data streaming with Apache Kafka and Flink provides the flexibility, reliability, and scalability needed for successful migrations and long-term integrations between legacy and cloud-native applications.

As enterprises continue to strengthen, this approach ensures that modernization doesn’t become a bottleneck, but a business enabler. If your organization is considering legacy modernization, data streaming is the key to making the transition seamless and risk-free.

Are you ready to transform your legacy systems without the risks of a big bang rewrite? What’s one part of your legacy system you’d “strangle” first—and why? If you could modernize just one workflow with real-time data tomorrow, what would it be?

Subscribe to my newsletter for insights into data streaming and connect with me on LinkedIn to continue the conversation. And download my free book about data streaming architectures, use cases and success stories in the retail industry.

The post Replacing Legacy Systems, One Step at a Time with Data Streaming: The Strangler Fig Approach appeared first on Kai Waehner.

Modernizing OT Middleware: The Shift to Open Industrial IoT Architectures with Data Streaming

Kai Waehner — Mon, 17 Mar 2025 12:45:14 +0000

Operational Technology (OT) has traditionally relied on legacy middleware to connect industrial systems, manage data flows, and integrate with enterprise IT. However, these monolithic, proprietary, and expensive middleware solutionsstruggle to keep up with real-time, scalable, and cloud-native architectures.

Just as mainframe offloading modernized enterprise IT, offloading and replacing legacy OT middleware is the next wave of digital transformation. Companies are shifting from vendor-locked, heavyweight OT middleware to real-time, event-driven architectures using Apache Kafka and Apache Flink—enabling cost efficiency, agility, and seamless edge-to-cloud integration.

This blog explores why and how organizations are replacing traditional OT middleware with data streaming, the benefits of this shift, and architectural patterns for hybrid and edge deployments.

Why Replace Legacy OT Middleware?

Industrial environments have long relied on OT middleware like OSIsoft PI, proprietary SCADA systems, and industry-specific data buses. These solutions were designed for polling-based communication, siloed data storage, and batch integration. But today’s real-time, AI-driven, and cloud-native use cases demand more.

Challenges: Proprietary, Monolithic, Expensive

High Costs – Licensing, maintenance, and scaling expenses grow exponentially.
Proprietary & Rigid – Vendor lock-in restricts flexibility and data sharing.
Batch & Polling-Based – Limited ability to process and act on real-time events.
Complex Integration – Difficult to connect with cloud and modern IT systems.
Limited Scalability – Not built for the massive data volumes of IoT and edge computing.

Just as PLCs are transitioning to virtual PLCs, eliminating hardware constraints and enabling software-defined industrial control, OT middleware is undergoing a similar shift. Moving from monolithic, proprietary middleware to event-driven, streaming architectures with Kafka and Flink allows organizations to scale dynamically, integrate seamlessly with IT, and process industrial data in real time—without vendor lock-in or infrastructure bottlenecks.

The Data Streaming Approach: Kafka & Flink as the Foundation for Modern OT Middleware

Data streaming is NOT a direct replacement for OT middleware, but it serves as the foundation for modernizing industrial data architectures. With Kafka and Flink, enterprises can offload or replace OT middleware to achieve real-time processing, edge-to-cloud integration, and open interoperability.

While Kafka and Flink provide real-time, scalable, and event-driven capabilities, last-mile integration with PLCs, sensors, and industrial equipment still requires OT-specific SDKs, open interfaces, or lightweight middleware. This includes support for MQTT, OPC UA or open-source solutions like Apache PLC4X to ensure seamless connectivity with OT systems.

Apache Kafka: The Backbone of Real-Time OT Data Streaming

Kafka acts as the central nervous system for industrial data to ensure low-latency, scalable, and fault-tolerant event streaming between OT and IT systems.

Aggregates and normalizes OT data from sensors, PLCs, SCADA, and edge devices.
Bridges OT and IT by integrating with ERP, MES, cloud analytics, and AI/ML platforms.
Operates seamlessly in hybrid, multi-cloud, and edge environments, ensuring real-time data flow.
Works with open OT standards like MQTT and OPC UA, reducing reliance on proprietary middleware solutions.

And just to be clear: Apache Kafka and similar technologies support “IT real-time” (meaning milliseconds of latency and sometimes latency spikes). This is NOT about hard real-time in the OT world for embedded systems or safety critical applications.

Apache Flink: The Real-Time Processing Engine for OT Data

Flink powers real-time analytics, complex event processing, and anomaly detection for streaming industrial data.

Processes sensor data in real-time for predictive maintenance, energy optimization, and fault detection.
Enables stateful stream processing, supporting event correlation, time-series aggregation, and trend analysis.
Runs on edge devices, data centers, or cloud platforms, providing scalability and deployment flexibility.

By leveraging Kafka and Flink, enterprises can process OT and IT data only once, ensuring a real-time, unified data architecture that eliminates redundant processing across separate systems. This approach enhances operational efficiency, reduces costs, and accelerates digital transformation while still integrating seamlessly with existing industrial protocols and interfaces.

Unifying Operational (OT) and Analytical (IT) Workloads

As industries modernize, a shift-left architecture approach ensures that operational data is not just consumed for real-time operational OT workloads but is also made available for transactional and analytical IT use cases—without unnecessary duplication or transformation overhead.

The Shift-Left Architecture: Bringing Advanced Analytics Closer to Industrial IoT

In traditional architectures, OT data is first collected, processed, and stored in proprietary or siloed middleware systems before being moved later to IT systems for analysis. This delayed, multi-step process leads to inefficiencies, including:

High latency between data collection and actionable insights.
Redundant data storage and transformations, increasing complexity and cost.
Disjointed AI/ML pipelines, where models are trained on outdated, pre-processed data rather than real-time information.

A shift-left approach eliminates these inefficiencies by bringing analytics, AI/ML, and data science closer to the raw, real-time data streams from the OT environments.

Instead of waiting for batch pipelines to extract and move data for analysis, a modern architecture integrates real-time streaming with open table formats to ensure immediate usability across both operational and analytical workloads.

Open Table Format with Apache Iceberg / Delta Lake for Unified Workloads and Single Storage Layer

By integrating open table formats like Apache Iceberg and Delta Lake, organizations can:

Unify operational and analytical workloads to enable both real-time data streaming and batch analytics in a single architecture.
Eliminate data silos, ensuring that OT and IT teams access the same high-quality, time-series data without duplication.
Ensure schema evolution and ACID transactions to enable robust and flexible long-term data storage and retrieval.
Enable real-time and historical analytics, allowing engineers, business users, and AI/ML models to query both fresh and historical data efficiently.
Reduce the need for complex ETL pipelines, as data is written once and made available for multiple workloadssimultaneously. And no need to use the anti-pattern of Reverse ETL.

The Result: An Open, Cloud-Native, Future-Proof Data Historian for Industrial IoT

This open, hybrid OT/IT architecture allows organizations to maintain real-time industrial automation and monitoring with Kafka and Flink, while ensuring structured, queryable, and analytics-ready data with Iceberg or Delta Lake. The shift-left approach ensures that data streams remain useful beyond their initial OT function, powering AI-driven automation, predictive maintenance, and business intelligence in near real-time rather than relying on outdated and inconsistent batch processes.

By adopting this unified, streaming-first architecture to build an open and cloud-native data historian, organizations can:

Process data once and make it available for both real-time decisions and long-term analytics.
Reduce costs and complexity by eliminating unnecessary data duplication and movement.
Improve AI/ML effectiveness by feeding models with real-time, high-fidelity OT data.
Ensure compliance and historical traceability without compromising real-time performance.

This approach future-proofs industrial data infrastructures, allowing enterprises to seamlessly integrate IT and OT, while supporting cloud, edge, and hybrid environments for maximum scalability and resilience.

Key Benefits of Offloading OT Middleware to Data Streaming

Lower Costs – Reduce licensing fees and maintenance overhead.
Real-Time Insights – No more waiting for batch updates; analyze events as they happen.
One Unified Data Pipeline – Process data once and make it available for both OT and IT use cases.
Edge and Hybrid Cloud Flexibility – Run analytics at the edge, on-premise, or in the cloud.
Open Standards & Interoperability – Support MQTT, OPC UA, REST/HTTP, Kafka, and Flink, avoiding vendor lock-in.
Scalability & Reliability – Handle massive sensor and machine data streams continuously without performance degradation.

A Step-by-Step Approach: Offloading vs. Replacing OT Middleware with Data Streaming

Companies transitioning from legacy OT middleware have several strategies by leveraging data streaming as an integration and migration platform:

Hybrid Data Processing
Lift-and-Shift
Full OT Middleware Replacement

1. Hybrid Data Streaming: Process Once for OT and IT

Why?

Traditional OT architectures often duplicate data processing across multiple siloed systems, leading to higher costs, slower insights, and operational inefficiencies. Many enterprises still process data inside expensive legacy OT middleware, only to extract and reprocess it again for IT, analytics, and cloud applications.

A hybrid approach using Kafka and Flink enables organizations to offload processing from legacy middleware while ensuring real-time, scalable, and cost-efficient data streaming across OT, IT, cloud, and edge environments.

How?

Connect to the existing OT middleware via:

A Kafka Connector (if available).
HTTP APIs, OPC UA, or MQTT for data extraction.
Custom integrations for proprietary OT protocols.
Lightweight edge processing to pre-filter data before ingestion.

Use Kafka for real-time ingestion, ensuring all OT data is available in a scalable, event-driven pipeline.

Process data once with Flink to:

Apply real-time transformations, aggregations, and filtering at scale.
Perform predictive analytics and anomaly detection before storing or forwarding data.
Enrich OT data with IT context (e.g., adding metadata from ERP or MES).

Distribute processed data to the right destinations, such as:

Time-series databases for historical analysis and monitoring.
Enterprise IT systems (ERP, MES, CMMS, BI tools) for decision-making.
Cloud analytics and AI platforms for advanced insights.
Edge and on-prem applications that need real-time operational intelligence.

Result?

Eliminate redundant processing across OT and IT, reducing costs.
Real-time data availability for analytics, automation, and AI-driven decision-making.
Unified, event-driven architecture that integrates seamlessly with on-premise, edge, hybrid, and cloud environments.
Flexibility to migrate OT workloads over time, without disrupting current operations.

By offloading costly data processing from legacy OT middleware, enterprises can modernize their industrial data infrastructure while maintaining interoperability, efficiency, and scalability.

2. Lift-and-Shift: Reduce Costs While Keeping Existing OT Integrations

Why?

Many enterprises rely on legacy OT middleware like OSIsoft PI, proprietary SCADA systems, or industry-specific data hubs for storing and processing industrial data. However, these solutions come with high licensing costs, limited scalability, and an inflexible architecture.

A lift-and-shift approach provides an immediate cost reduction by offloading data ingestion and storage to Apache Kafka while keeping existing integrations intact. This allows organizations to modernize their infrastructure without disrupting current operations.

How?

Use the Stranger Fig Design Pattern as a gradual modernization approach where new systems incrementally replace legacy components, reducing risk and ensuring a seamless transition:

“The most important reason to consider a strangler fig application over a cut-over rewrite is reduced risk.” Martin Fowler

Replace expensive OT middleware for ingestion and storage:

Deploy Kafka as a scalable, real-time event backbone to collect and distribute data.
Offload sensor, PLC, and SCADA data from OSIsoft PI, legacy brokers, or proprietary middleware.
Maintain the connectivity with existing OT applications to prevent workflow disruption.

Streamline OT data processing:

Store and distribute data in Kafka instead of proprietary, high-cost middleware storage.
Leverage schema-based data governance to ensure compatibility across IT and OT systems.
Reduce data duplication by ingesting once and distributing to all required systems.

Maintain existing IT and analytics integrations:

Keep connections to ERP, MES, and BI platforms via Kafka connectors.
Continue using existing dashboards and reports while transitioning to modern analytics platforms.
Avoid vendor lock-in and enable future migration to cloud or hybrid solutions.

Result?

Immediate cost savings by reducing reliance on expensive middleware storage and licensing fees.
No disruption to existing workflows, ensuring continued operational efficiency.
Scalable, future-ready architecture with the flexibility to expand to edge, cloud, or hybrid environments over time.
Real-time data streaming capabilities, paving the way for predictive analytics, AI-driven automation, and IoT-driven optimizations.

A lift-and-shift approach serves as a stepping stone toward full OT modernization, allowing enterprises to gradually transition to a fully event-driven, real-time architecture.

3. Full OT Middleware Replacement: Cloud-Native, Scalable, and Future-Proof

Why?

Legacy OT middleware systems were designed for on-premise, batch-based, and proprietary environments, making them expensive, inflexible, and difficult to scale. As industries embrace cloud-native architectures, edge computing, and real-time analytics, replacing traditional OT middleware with event-driven streaming platforms enables greater flexibility, cost efficiency, and real-time operational intelligence.

A full OT middleware replacement eliminates vendor lock-in, outdated integration methods, and high-maintenance costs while enabling scalable, event-driven data processing that works across edge, on-premise, and cloud environments.

How?

Use Kafka and Flink as the Core Data Streaming Platform

Kafka replaces legacy data brokers and middleware storage by handling high-throughput event ingestion and real-time data distribution.
Flink provides advanced real-time analytics, anomaly detection, and predictive maintenance capabilities.
Process OT and IT data in real-time, eliminating batch-based limitations.

Replace Proprietary Connectors with Lightweight, Open Standards

Deploy MQTT or OPC UA gateways to enable seamless communication with sensors, PLCs, SCADA, and industrial controllers.
Eliminate complex, costly middleware like OSIsoft PI with low-latency, open-source integration.
Leverage Apache PLC4X for industrial protocol connectivity, avoiding proprietary vendor constraints.

Adopt a Cloud-Native, Hybrid, or On-Premise Storage Strategy

Store time-series data in scalable, purpose-built databases like InfluxDB or TimescaleDB.
Enable real-time query capabilities for monitoring, analytics, and AI-driven automation.
Ensure data availability across on-premise infrastructure, hybrid cloud, and multi-cloud deployments.

Modernize IT and Business Integrations

Enable seamless OT-to-IT integration with ERP, MES, BI, and AI/ML platforms.
Stream data directly into cloud-based analytics services, digital twins, and AI models.
Build real-time dashboards and event-driven applications for operators, engineers, and business stakeholders.

Result?

Fully event-driven and cloud-native OT architecture that eliminates legacy bottlenecks.
Real-time data streaming and processing across all industrial environments.
Scalability for high-throughput workloads, supporting edge, hybrid, and multi-cloud use cases.
Lower operational costs and reduced maintenance overhead by replacing proprietary, heavyweight OT middleware.
Future-ready, open, and extensible architecture built on Kafka, Flink, and industry-standard protocols.

By fully replacing OT middleware, organizations gain real-time visibility, predictive analytics, and scalable industrial automation, unlocking new business value while ensuring seamless IT/OT integration.

Helin is an excellent example for a cloud-native IT/OT data solution powered by Kafka and Flink to focus on real-time data integration and analytics, particularly in the context of industrial and operational environments. Its industry focus on maritime and energy sector, but this is relevant across all IIoT industries.

The next generation of OT architectures is being built on open standards, real-time streaming, and hybrid cloud.

Most new industrial sensors, machines, and control systems are now designed with Kafka, MQTT, and OPC UA compatibility.
Modern IT architectures demand event-driven data pipelines for AI, analytics, and automation.
Edge and hybrid computing require scalable, fault-tolerant, real-time processing.

Use Kafka Cluster Linking for seamless bi-directional data replication and command&control, ensuring low-latency, high-availability data synchronization across on-premise, edge, and cloud environments.

Enable multi-region and hybrid edge to cloud architectures with real-time data mirroring to allow organizations to maintain data consistency across global deployments while ensuring business continuity and failover capabilities.

It’s Time to Move Beyond Legacy OT Middleware to Open Standards like MQTT, OPC-UA, Kafka

The days of expensive, proprietary, and rigid OT middleware are numbered (at least for new deployments). Industrial enterprises need real-time, scalable, and open architectures to meet the growing demands of automation, predictive maintenance, and industrial IoT. By embracing open IoT and data streaming technologies, companies can seamlessly bridge the gap between Operational Technology (OT) and IT, ensuring efficient, event-driven communication across industrial systems.

MQTT, OPC-UA and Apache Kafka are a match in heaven for industrial IoT:

MQTT enables lightweight, publish-subscribe messaging for industrial sensors and edge devices.
OPC-UA provides secure, interoperable communication between industrial control systems and modern applications.
Kafka acts as the high-performance event backbone, allowing data from OT systems to be streamed, processed, and analyzed in real time.

Whether lifting and shifting, optimizing hybrid processing, or fully replacing legacy middleware, data streaming is the foundation for the next generation of OT and IT integration. With Kafka at the core, enterprises can decouple systems, enhance scalability, and unlock real-time analytics across the entire industrial landscape.

Stay ahead of the curve! Subscribe to my newsletter for insights into data streaming and connect with me on LinkedIn to continue the conversation. And make sure to download my free book about data streaming use cases and industry success stories.

The post Modernizing OT Middleware: The Shift to Open Industrial IoT Architectures with Data Streaming appeared first on Kai Waehner.

CIO Summit: The State of AI and Why Data Streaming is Key for Success

Kai Waehner — Thu, 13 Mar 2025 07:31:33 +0000

This week, I had the privilege of engaging in insightful conversations at the CIO Summit organized by GDS Group in Amsterdam, Netherlands. The event brought together technology leaders from across Europe and industries such as financial services, manufacturing, energy, gaming, telco, and more. The focus? AI – but with a much-needed reality check. While the potential of AI is undeniable, the hype often outpaces real-world value. Discussions at the summit revolved around how enterprises can move beyond experimentation and truly integrate AI to drive business success.

Key Learnings on the State of AI

The CIO Summit in Amsterdam provided a reality check on AI adoption across industries. While excitement around AI is high, success depends on moving beyond the hype and focusing on real business value. Conversations with technology leaders revealed critical insights about AI’s maturity, challenges, and the key factors driving meaningful impact. Here are the most important takeaways.

AI is Still in Its Early Stages – Beware of the Buzz vs. Value

The AI landscape is evolving rapidly, but many organizations are still in the exploratory phase. Executives recognize the enormous promise of AI but also see challenges in implementation, scaling, and achieving meaningful ROI.

The key takeaway? AI is not a silver bullet. Companies that treat it as just another trendy technology risk wasting resources on hype-driven projects that fail to deliver tangible outcomes.

Generative AI vs. Predictive AI – Understanding the Differences

There was a lot of discussion about Generative AI (GenAI) vs. Predictive AI, two dominant categories that serve very different purposes:

Predictive AI analyzes historical and real-time data to forecast trends, detect anomalies, and automate decision-making (e.g., fraud detection, supply chain optimization, predictive maintenance).
Generative AI creates new content based on trained data (e.g., text, images, or code), enabling applications like automated customer service, software development, and marketing content generation.

While GenAI has captured headlines, Predictive AI remains the backbone of AI-driven automation in enterprises. CIOs must carefully evaluate where each approach adds real business value.

Good Data Quality is Non-Negotiable

A critical takeaway: AI is only as good as the data that fuels it. Poor data quality leads to inaccurate AI models, bad predictions, and failed implementations.

To build trustworthy and effective AI solutions, organizations need:

Accurate, complete, and well-governed data

Real-time and historical data integration

Continuous data validation and monitoring

Context Matters – AI Needs Real-Time Decision-Making

Many AI use cases rely on real-time decision-making. A machine learning model trained on historical data is useful, but without real-time context, it quickly becomes outdated.

For example, fraud detection systems need to analyze real-time transactions while comparing them to historical behavioral patterns. Similarly, AI-powered supply chain optimization depends on up-to-the-minute logistics datarather than just past trends.

The conclusion? Real-time data streaming is essential to unlocking AI’s full potential.

Automate First, Then Apply AI

One common theme among successful AI adopters: Optimize business processes before adding AI.

Organizations that try to retrofit AI onto inefficient, manual processes often struggle with adoption and ROI. Instead, the best approach is:

1⃣ Automate and optimize workflows using real-time data

2⃣ Apply AI to enhance automation and improve decision-making

By taking this approach, companies ensure that AI is applied where it actually makes a difference.

ROI Matters – AI Must Drive Business Value

CIOs are under pressure to deliver business-driven, NOT tech-driven AI projects. AI initiatives that lack a clear ROI roadmap often stall after pilot phases.

Two early success stories for Generative AI stand out:

Customer support – AI chatbots and virtual assistants enhance response times and improve customer experience.
Software engineering – AI-powered code generation boosts developer productivity and reduces time to market.

The lesson? Start with AI applications that deliver clear, measurable business impact before expanding into more experimental areas.

Data Streaming and AI – The Perfect Match

At the heart of AI’s success is data streaming. Why? Because modern AI requires a continuous flow of fresh, real-time data to make accurate predictions and generate meaningful insights.

Data streaming not only powers AI with real-time insights but also ensures that AI-driven decisions directly translate into measurable business value:

Here’s how data streaming powers both Predictive and Generative AI:

Predictive AI + Data Streaming

Predictive AI thrives on timely, high-quality data. Real-time data streaming enables AI models to process and react to events as they happen. Examples include:

Fraud detection: AI analyzes real-time transactions to detect suspicious activity before fraud occurs.

Predictive maintenance: Streaming IoT sensor data allows AI to predict equipment failures before they happen.

Supply chain optimization: AI dynamically adjusts logistics routes based on real-time disruptions.

Here is an example from Capital One bank about fraud detection and prevention in real-time, preventing $150 of fraud on average a year/customer:

Source: Confluent

Generative AI + Data Streaming

Generative AI also benefits from real-time data. Instead of relying on static datasets, streaming data enhances GenAI applications by incorporating the latest information:

AI-powered customer support: Chatbots analyze live customer interactions to generate more relevant responses.

AI-driven marketing content: GenAI adapts promotional messaging in real-time based on customer engagement signals.

Software development acceleration: AI assistants provide real-time code suggestions as developers write code.

In short, without real-time data, AI is limited to outdated insights.

Here is an example for GenAI with data streaming in the travel Industry by Expedia where 60% of travelers are self-servicing in chat, saving 40+% of variable agent cost:

Source: Confluent

The Future of AI: Agentic AI and the Role of Data Streaming

As AI evolves, we are moving toward Agentic AI – systems that autonomously take actions, learn from feedback, and adapt in real time.

For example:

AI-driven cybersecurity systems that detect and respond to threats instantly

Autonomous supply chains that dynamically adjust based on demand shifts

Intelligent business operations where AI continuously optimizes workflows

But Agentic AI can only work if it has access to real-time operational AND analytical data. That’s why data streaming is becoming a critical foundation for the next wave of AI innovation.

The Path to AI Success

The CIO Summit reinforced one key message: AI is here to stay, but its success depends on strategy, data quality, and business value – not just hype.

Organizations that:

Focus on AI applications with clear business ROI

Automate before applying AI

Prioritize real-time data streaming

… will be best positioned to drive AI success at scale.

As AI moves towards autonomous decision-making (Agentic AI), data streaming will become even more critical. The ability to process and act on real-time data will separate AI leaders from laggards.

Now the real question: Where is your AI strategy headed? Let’s discuss!

Stay ahead of the curve! Subscribe to my newsletter for insights into data streaming and connect with me on LinkedIn to continue the conversation. And make sure to download my free book focusing on data streaming use cases, industry stories and business value.

The post CIO Summit: The State of AI and Why Data Streaming is Key for Success appeared first on Kai Waehner.

How Data Streaming and AI Help Telcos to Innovate: Top 5 Trends from MWC 2025

Kai Waehner — Fri, 07 Mar 2025 06:44:11 +0000

The telecommunications and technology industries are at a pivotal moment. As innovation accelerates, businesses must leverage cutting-edge technologies to stay ahead. For MWC 2025, McKinsey highlighted five crucial themes shaping the future: IT excellence in telecom, sustainability challenges, the evolution of 6G, the rise of generative AI, and AI-driven software development.

MWC (Mobile World Congress) 2025 serves as the global stage where industry leaders, telecom operators, and technology pioneers converge to discuss the next wave of connectivity and digital transformation. As organizations gear up for a data-driven future, real-time data streaming emerges as the critical enabler of efficiency, agility, and value creation.

This blog explores each of McKinsey’s key themes from MWC 2025 and how data streaming helps businesses innovate and gain a competitive advantage in the hyper-connected world ahead.

1. IT Excellence: Driving Telecom Innovation and Cost Efficiency

Telecom operators are under immense pressure to monetize massive infrastructure investments while maintaining cost efficiency. McKinsey’s benchmarking study shows that leading telecom tech players spend less on IT while achieving superior cost efficiency and innovation. Successful operators integrate business and IT transformations holistically, optimizing cloud strategies, IT architectures, and AI-driven processes.

How Data Streaming Powers IT Excellence

Real-Time IT Monitoring: Streaming data pipelines provide continuous observability into IT performance, reducing downtime and optimizing infrastructure costs.
Automated Network Operations: Event-driven architectures allow operators to dynamically allocate resources, minimizing network congestion and improving service quality.
Cloud-Native AI Models: By continuously feeding AI models with live data, telecom leaders ensure optimal network performance and predictive maintenance.

Business Impact: Faster time-to-market, lower IT costs, and improved network reliability.

A great example of this transformation is Dish Wireless, which built a fully cloud-native, software-driven 5G network powered by Apache Kafka. By leveraging real-time data streaming, Dish ensures low-latency, scalable, and event-driven operations, allowing it to optimize network performance, automate infrastructure management, and provide next-generation connectivity for enterprise applications.

Dish’s data-first approach demonstrates how streaming technologies are redefining telecom infrastructure and unlocking new business models.

Read more about how Apache Kafka powers Dish Wireless’ 5G infrastructure or watch the following webinar with Dish:

2. Tackling Telecom Emissions: A Sustainable Future

The telecom industry faces increasing regulatory pressure and consumer expectations to decarbonize operations. While many companies have reduced Scope 1 (direct emissions) and Scope 2 (energy consumption) emissions, the real challenge lies in Scope 3 emissions from supply chains. McKinsey’s research suggests that 60% of an integrated operator’s emissions can be reduced for less than $100 per ton of CO₂.

How Data Streaming Supports Sustainability Efforts

Energy Optimization in Real Time: Streaming analytics continuously monitor energy usage across network infrastructure, automatically adjusting power consumption.
Carbon Footprint Tracking: Data pipelines aggregate real-time emissions data, enabling operators to meet sustainability goals efficiently.
Predictive Maintenance for Energy Efficiency: AI-driven insights help optimize network hardware lifespan, reducing waste and unnecessary energy consumption.

Business Impact: Reduced carbon footprint, cost savings on energy consumption, and regulatory compliance.

Beyond telecom, data streaming is transforming sustainability efforts across industries. For example, in manufacturing and real estate, companies like Ampeers Energy and PAUL Tech AG use Apache Kafka and Flink to optimize energy distribution, reduce emissions, and improve ESG ratings.

These real-time data platforms analyze IoT sensor data, weather forecasts, and energy consumption patterns to automate decision-making and lower energy waste. Similarly, EverySens leverages streaming data to decarbonize freight transport, eliminating hundreds of thousands of unnecessary truck rides each year. These use cases demonstrate how data-driven sustainability strategiescan be scaled across sectors to achieve meaningful environmental impact.

3. Shaping the Future of 6G: Beyond Connectivity

6G is expected to revolutionize industries by enabling ultra-low latency, ubiquitous connectivity, and AI-driven network optimization. However, the transition from 5G to 6G requires overcoming legacy infrastructure challenges and developing multi-capability platforms that go beyond simple connectivity.

How Data Streaming Powers the 6G Revolution

Network Sensing and Intelligent Routing: Streaming architectures process real-time network telemetry, enabling adaptive, self-optimizing networks.
AI-Enhanced Edge Computing: Real-time analytics ensure minimal latency for mission-critical applications such as autonomous vehicles and smart cities.
Cross-Sector Data Monetization: Operators can leverage streaming data to offer network-as-a-service (NaaS) solutions, opening new revenue streams.

Business Impact: New monetization opportunities, improved network efficiency, and enhanced customer experience.

Source: Dish Wireless

As the 6G era approaches, real-time data streaming is already proving its value in 5G deployments, unlocking low-latency, high-bandwidth use cases.

A great example is Verizon’s Mobile Edge Computing (MEC) initiative, which uses data streaming and AI-powered analytics to support real-time applications like autonomous drone monitoring, vehicle-to-everything (V2X) communication, and predictive maintenance in industrial settings. By processing data at the network edge, telcos minimize latency and optimize bandwidth—capabilities that will be even more critical in 6G.

With cloud-native, event-driven architectures, data streaming enables telcos to evolve from traditional connectivity providers to technology leaders. As 6G advances, expect faster network automation, more sophisticated AI integration, and deeper partnerships between telecom operators and enterprise customers.

4. Generative AI: A Profitability Game-Changer for Telcos

McKinsey highlights generative AI’s potential to boost telco profitability by up to 10% in annual EBITDA through automation, hyper-personalization, and AI-driven customer engagement. Leading telcos are already leveraging AI to improve customer service, marketing, and network operations.

How Data Streaming Enhances Gen AI in Telecom

Real-Time Customer Insights: AI-powered recommendation engines deliver personalized offers and dynamic pricing in milliseconds.
Automated Call Center Operations: Real-time transcription and sentiment analysis improve chatbot accuracy and agent productivity.
Proactive Network Management: AI models trained on continuous streaming data predict and prevent network failures before they occur.

Business Impact: Higher customer satisfaction, reduced operational costs, and increased revenue per user.

As telecom providers integrate Generative AI (GenAI) into their business models, real-time data streaming is a foundational technology that enables efficient AI inference and model retraining. One compelling example is the GenAI Demo with Kafka, Flink, LangChain, and OpenAI, which illustrates how streaming architectures power AI-driven sales and customer interactions.

This demo showcases how real-time CRM data from Salesforce is enriched with web and LinkedIn data via streaming ETL using Apache Flink. Then, AI models process this context using LangChain and OpenAI, generating personalized, context-specific sales recommendations—a workflow that can be extended to telecom call centers and customer engagement platforms.

Expedia’s success story further highlights how GenAI combined with data streaming improves customer interactions. Facing a massive surge in support requests during COVID-19, Expedia automated responses with AI-driven chatbots, significantly reducing agent workloads. By integrating Apache Kafka with AI models, 60% of travelers began self-servicing their inquiries, resulting in over 40% cost savings in customer support operations.

Source: Confluent

For telecom providers, similar AI-driven automation can optimize call centers, personalized customer offers, fraud detection, and even predictive maintenance for network infrastructure. Data streaming ensures that AI models continuously learn from fresh data, making GenAI solutions more accurate, responsive, and cost-effective.

5. AI-Driven Software Development: Faster, Smarter, Better

AI is fundamentally transforming software development, accelerating the product development lifecycle (PDLC) and improving product quality. AI-assisted coding, automated testing, and real-time feedback loops are enabling companies to deliver customer-centric solutions at unprecedented speed.

How Data Streaming Transforms AI-Driven Software Development

Continuous Feedback and Iteration: Streaming analytics provide instant feedback from user behavior, enabling faster iterations and bug fixes.
Automated Code Quality Checks: AI-driven continuous integration (CI/CD) pipelines validate new code in real-time, ensuring seamless software deployments.
Live Performance Monitoring: Streaming data enables real-time anomaly detection, ensuring optimal application performance.

Business Impact: Faster time-to-market, higher software reliability, and reduced development costs.

For telecom providers, AI-driven software development is key to maintaining a reliable, scalable, and secure network infrastructure while rolling out new customer-facing services at speed. Data streaming accelerates software development by enabling real-time feedback loops, automated testing, and AI-powered observability—bringing the industry closer to a true “Shift Left” approach.

The Shift Left Architecture in software development means moving testing, security, and quality assurance earlier in the development lifecycle, reducing costly errors and vulnerabilities late in production. Data streaming enables this shift by continuously feeding AI-driven CI/CD pipelines with real-time insights, allowing developers to detect issues earlier, optimize network performance, and iterate faster on new services.

A relevant AI-powered automation example comes from the “GenAI for Development vs. Visual Coding” article, which discusses how automation is shifting from traditional code-based development to AI-assisted software engineering. Instead of manual coding, AI-driven workflows help telcos streamline DevOps, automate CI/CD pipelines, and enhance software quality in real time.

For telecom providers, this transformation means proactive issue detection, faster rollouts of network upgrades, and automated AI-driven security monitoring—all powered by real-time data streaming and a Shift Left mindset.

Data Streaming as the Ultimate Competitive Advantage for Telcos

Across all five of McKinsey’s key trends, real-time data streaming is the backbone of transformation. Whether optimizing IT efficiency, reducing emissions, unlocking 6G’s potential, enabling generative AI and Agentic AI, or accelerating software development, streaming technologies provide the agility and intelligence businesses need to win in 2025 and beyond.

The path forward isn’t just about adopting AI or cloud-native infrastructure—it’s about embracing real-time, event-driven architectures to drive innovation at scale.

As organizations take bold steps to lead the future, those who harness the power of data streaming will emerge as the industry’s true pioneers.

The post How Data Streaming and AI Help Telcos to Innovate: Top 5 Trends from MWC 2025 appeared first on Kai Waehner.

Data Streaming as the Technical Foundation for a B2B Marketplace

Kai Waehner — Wed, 05 Mar 2025 06:26:59 +0000

A B2B data marketplace is a groundbreaking platform enabling businesses to exchange, monetize, and use data in real time. Beyond the basic promise of data sharing, these marketplaces are evolving into self-service platforms with features such as subscription management, usage-based billing, and secure data monetization. This post explores the core technical and functional aspects of building a data marketplace for subscription commerce using data streaming technologies like Apache Kafka. Drawing inspiration from real-world implementations like AppDirect, the post examines how these capabilities translate into a robust and scalable architecture.

Subscription Commerce with a Digital Marketplace

Subscription commerce refers to business models where customers pay recurring fees—monthly, annually, or usage-based—for access to products or services, such as SaaS, streaming platforms, or subscription boxes.

Digital marketplaces are online platforms where multiple vendors can sell their products or services to customers, often incorporating features like catalog management, payment processing, and partner integrations.

Together, subscription commerce and digital marketplaces enable businesses to monetize recurring offerings efficiently, manage customer relationships, and scale through multi-vendor ecosystems. These solutions enables organizations to sell own or third-party recurring technology services through a white-labeled marketplace, or streamline procurement with an internal IT marketplace to manage and acquire services. The platform empowers digital growth for businesses of all sizes across direct and indirect go-to-market channels.

The Competitive Landscape for Subscription Commerce

The subscription commerce and digital marketplace space includes several prominent players offering specialized solutions.

Zuora leads in enterprise-grade subscription billing and revenue management, while Chargebee and Recurly focus on flexible billing and automation for SaaS and SMBs. Paddle provides global payment and subscription management tailored to SaaS businesses. AppDirect stands out for enabling SaaS providers and enterprises to manage subscriptions, monetize offerings, and build partner ecosystems through a unified platform.

For marketplace platforms, CloudBlue (from Ingram Micro) enables as-a-service ecosystems for telcos and cloud providers, and Mirakl excels at building enterprise-level B2B and B2C marketplaces.

Solutions like ChannelAdvisor and Vendasta cater to resellers and localized businesses with marketplace and subscription tools. Each platform offers unique capabilities, making the choice dependent on specific needs like scalability, industry focus, and integration requirements.

What Makes a B2B Data Marketplace Technically Unique?

A data marketplace is more than a repository; it is a dynamic, decentralized platform that enables the continuous exchange of data streams across organizations. Its key distinguishing features include:

Real-Time Data Sharing: Enables instantaneous exchange and consumption of data streams.
Decentralized Design: Avoids reliance on centralized data hubs, reducing latency and risk of single points of failure.
Fine-Grained Access Control: Ensures secure and compliant data sharing.
Self-Service Capabilities: Simplifies the discovery and consumption of data through APIs and portals.
Usage-Based Billing and Monetization: Tracks data usage in real time to enable flexible pricing models.

These characteristics require a scalable, fault-tolerant, and real-time data processing backbone. Enter data streaming with the de facto standard Apache Kafka.

Data Streaming as the Backbone of a B2B Data Marketplace

At the heart of a B2B data marketplace lies data streaming, a technology paradigm enabling continuous data flow and processing. Kafka’s publish-subscribe architecture aligns seamlessly with the marketplace model, where data producers publish streams that consumers can subscribe to in real time.

Why Apache Kafka for a Data Marketplace?

A data streaming platform uniquely combines different characteristics and capabilities:

Scalability and Fault Tolerance: Kafka’s distributed architecture allows for handling large volumes of data streams, ensuring high availability even during failures.
Event-Driven Design: Kafka provides a natural fit for event-driven architectures, where data exchanges trigger workflows, such as subscription activation or billing.
Stream Processing with Kafka Streams or ksqlDB: Real-time transformation, filtering, and enrichment of data streams can be performed natively, ensuring the data is actionable as it flows.
Integration with Ecosystem: Kafka’s connectors enable seamless integration with external systems such as billing platforms, monitoring tools, and data lakes.
Security and Compliance: Built-in features like TLS encryption, SASL authentication, and fine-grained ACLs ensure the marketplace adheres to strict security standards.

I wrote a separate article that explores how an Event-driven Architecture (EDA) and Apache Kafka build the foundation of a streaming data exchange.

Architecture Overview

Modern architectures for data marketplaces are often inspired by Domain-Driven Design (DDD), microservices, and the principles of a data mesh.

Domain-Driven Design helps structure the platform around distinct business domains, ensuring each part of the marketplace aligns with its core functionality, such as subscription management or billing.
Microservices decompose the marketplace into independently deployable services, promoting scalability and modularity.
A Data mesh decentralizes data ownership, empowering individual teams or providers to manage and share their datasets while adhering to shared governance policies.

Together, these principles create a flexible, scalable, and business-aligned architecture. A high-level architecture for such a marketplace involves:

Data Providers: Publish real-time data streams to Kafka Topics. Use Kafka Connect to ingest data from external sources.
Data Marketplace Platform: A front-end portal backed by Kafka for subscription management, search, and discovery. Kafka Streams or Apache Flink for real-time processing (e.g., billing, transformation). Integration with billing systems, identity management, and analytics platforms.
Data Consumers: Subscribe to Kafka Topics, consuming data tailored to their needs. Integrate the marketplace streams into their own analytics or operational workflows.

A data streaming platoform enable simple and secure data sharing within or across organizations with chargeback capabilities built-in to build cost APIs and new business models. The following is an implementation leveraging Confluent’s Stream Sharing functionality in Confluent Cloud:

Source: Confluent

Data Marketplace Features and Their Technical Implementation

A robust B2B data marketplace should offer the following vendor-agnostic features:

Self-Service Data Discovery

Functionality: Allows users to browse available datasets or streams, explore metadata, and request subscriptions.
Technical Implementation: Build a front-end portal or API leveraging Kafka’s schema registry for metadata management. Use Kafka Topics to represent each data stream or dataset in the data mesh.

Real-Time Subscription Management

Functionality: Enables users to subscribe to data streams with customizable preferences, such as data filters or frequency of updates.
Technical Implementation: Use Kafka’s consumer groups to manage subscriptions. Implement filtering logic with Kafka Streams or ksqlDB to tailor streams to user preferences.

Usage-Based Billing

Functionality: Tracks the volume or type of data consumed by each user and generates invoices dynamically.
Technical Implementation: Use Kafka’s log retention and monitoring tools to track data consumption. Integrate with a billing engine via Kafka Connect or RESTful APIs for real-time invoice generation.

Functionality: Facilitates revenue sharing between data providers and marketplace operators.
Technical Implementation: Build a revenue-sharing logic layer using Kafka Streams or Apache Flink, processing data usage metrics. Store provider-specific pricing models in a database connected via Kafka Connect.

Compliance and Data Governance

Functionality: Ensures data sharing complies with regulations (e.g., GDPR, HIPAA) and provides an audit trail.
Technical Implementation: Leverage Kafka’s immutable event log as an auditable record of all data exchanges. Implement data contracts for Kafka Topics with policies, role-based access control (RBAC), and encryption for secure sharing.

Dynamic Pricing Models

Functionality: Enables data providers to offer pricing models based on volume, frequency, or quality of data.
Technical Implementation: Deploy stream processing applications to calculate pricing dynamically. Use APIs to display pricing updates in real time on the marketplace portal.

Marketplace Analytics

Functionality: Offers insights into usage patterns, revenue streams, and operational metrics.
Technical Implementation: Aggregate Kafka stream data into analytics platforms such as Snowflake, Databricks, Elasticsearch or Microsoft Fabri.

Real-World Success Story: AppDirect’s Subscription Commerce Platform Powered by a Data Streaming Platform

AppDirect is a leading subscription commerce platform that helps businesses monetize and manage software, services, and data through a unified digital marketplace. It provides tools for subscription billing, usage tracking, partner management, and revenue sharing, enabling seamless B2B transactions.

Source: AppDirect

AppDirect serves customers across industries such as telecommunications (e.g., Telstra, Deutsche Telekom), technology (e.g., Google, Microsoft), and cloud services, powering ecosystems for software distribution and partner-driven monetization.

The Challenge

AppDirect enables SaaS providers to monetize their offerings, but faced significant challenges in scaling its platform to handle the growing complexity of real-time subscription billing and data flow management.

As the number of vendors and consumers on the platform increased, ensuring accurate, real-time tracking of usage and billing became increasingly difficult. Additionally, the legacy systems struggled to support seamless integration, dynamic pricing models, and real-time updates required for a competitive marketplace experience.

The Solution

AppDirect implemented a data streaming backbone with Apache Kafka leveraging Confluent’s data streaming platform. This enabled:

Real-time billing for subscription services.
Accurate usage tracking and monetization.
Improved scalability with a distributed, event-driven architecture.

The Outcome

90% reduction in time-to-market for new features.
Enhanced customer experience with real-time updates.
Seamless scaling to handle increasing vendor participation and data loads.

Advantages Over Competitors in the Subscription Commerce and Data Marketplace Business

Powered by the event-driven architecture and a data streaming platform, AppDirect distinguishes itself with from competitors in the subscription commerce and data marketplace business:

A unified approach to subscription management, billing, and partner ecosystem enablement.
Strong focus on the telecommunications and technology sectors.
Deep integrations for vendor and reseller ecosystems.

The technical backbone of a B2B data marketplace relies on data streaming to deliver real-time data sharing, scalable subscription management, and secure monetization. Platforms like Apache Kafka and Confluent enable these features through their distributed, event-driven architecture, ensuring resilience, compliance, and operational efficiency.

By implementing these principles, organizations can build a modern, self-service data marketplace that fosters innovation and collaboration. The success of AppDirect highlights the potential of this approach, offering a blueprint for businesses looking to capitalize on the power of data streaming.

Whether you’re a data provider seeking additional revenue streams or a business aiming to harness external insights, a well-designed data marketplace is your gateway to unlocking value in the digital economy.

The post Data Streaming as the Technical Foundation for a B2B Marketplace appeared first on Kai Waehner.

Free Ebook: Data Streaming Use Cases and Industry Success Stories Featuring Apache Kafka and Flink

Kai Waehner — Wed, 05 Feb 2025 15:42:37 +0000

In today’s fast-moving world, real-time data isn’t just an advantage—it’s a necessity. Over the past decade, data streaming has transformed from niche tech to a must-have capability, empowering businesses to gain real-time insights, streamline operations, and uncover new opportunities.

Having spent 7+ years at Confluent, I’ve been privileged to witness and contribute to this evolution firsthand. From our beginnings as a small Silicon Valley startup to becoming a global leader with a ~$1 billion USD run rate, Confluent has been at the forefront of this transformation. Along the way, I’ve seen countless organizations unlock the power of real-time data, and it’s been incredible to watch Apache Kafka and Flink evolve into essential tools for innovation.

Now, I’m excited to share my free eBook: “The Ultimate Data Streaming Guide: Concepts, Use Cases, Industry Stories”, which I wrote with my colleague Evi Schneider over the past few months. Inside, you’ll find:

Core Concepts: Understand why data streaming is reshaping industries.
Industry Use Cases: Explore how leaders in automotive, finance, retail, and beyond are leveraging real-time data.
Customer Success Stories: Learn from companies like BMW and others leading the way with Apache Kafka and Flink.
Building a Data Streaming Organization: Best practices and real lief examples from our living and breathing customer communities

TL;DR: Download the new, free ebook “THE ULTIMATE DATA STREAMING GUIDE – Concepts, Use Cases, Industry Stories” to learn about data streaming use cases and customer stories with Apache Kafka and Flink across all industries.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and follow me on LinkedIn or X (former Twitter) to stay in touch.

Chapter 1: Data Streaming with Apache Kafka and Flink in an Event-driven Architecture

Data streaming is the continuous flow of data in motion to allow businesses processing and acting on information as events occur. Unlike traditional batch processing systems, which collect and analyze data at intervals, data streaming enables real-time data processing and decision-making, leading to faster insights and more agile operations.

At the heart of data streaming are Apache Kafka and Apache Flink, two open-source technologies that have become the de facto standards in the field.

Apache Kafka: Originally developed as a distributed event-streaming platform, Kafka has evolved into the backbone for real-time data pipelines and business applications. It supports high throughput and scalability, enabling organizations to collect, store, and distribute events across applications and systems seamlessly for analytical and operational workloads.
Apache Flink: As a stream processing framework, Flink complements Kafka by providing powerful capabilities for real-time analytics and complex event processing. Flink’s stateful processing and advanced features make it ideal for applications such as fraud detection, predictive maintenance, and personalization.

The first book chapter explores the transformative power of event-driven architecture in addressing the challenges of fragmented, batch-oriented data systems. It highlights the value of real-time data streaming technologies, such as Apache Kafka and Flink, in enabling operational efficiency, actionable insights, and forward-thinking architectures like the “Shift Left Architecture” to empower businesses with faster, more reliable decision-making.

Chapter 2: Open Source, Cloud and AI: The Winning Combination

Kafka and Flink’s open-source nature offers unparalleled flexibility and innovation. Their vibrant communities ensure the constant evolution and adoption of cutting-edge technologies. Companies often deploy these platforms in combination with cloud-native services like Confluent Cloud, a fully managed Kafka-based offering, or Apache Flink on major cloud providers, simplifying deployment and operations.

This synergy between open source, cloud, and AI ensures businesses can scale while maintaining agility and reducing operational complexity. From hybrid cloud setups to edge computing, data streaming platforms powered by Kafka and Flink are ready to meet diverse requirements.

The second chapter of the free ebook walks through diverse technical use cases for data streaming, showcasing its critical role in enabling modern data products, IT modernization, observability, and hybrid/multi-cloud strategies. It highlights how event-driven microservices, IoT, AI, and data governance leverage streaming to ensure scalability, accuracy, and compliance while also driving innovation in areas like ESG reporting and continuous data sharing.

Chapter 3: Use Cases: Driving Business Value Across Industries

Data streaming addresses a broad range of business challenges and use cases. Here are some examples of how organizations are leveraging Kafka and Flink to deliver business value:

Let’s look at a few case studies explored in the free ebook on how data streaming is transforming businesses:

BMW Group (Automotive & Manufacturing): By implementing Kafka and Flink, BMW Group has reduced downtime, enhanced manufacturing efficiency, and introduced real-time telemetry features in its vehicles.
Migros (Retail): One of the leading Swiss retailers uses Kafka to personalize customer journeys and optimize stock levels, ensuring timely delivery of products.
Erste Group (Financial Services): With data streaming, this European bank has built robust fraud detection systems, improving customer trust and reducing financial risks.
Schiphol Airport (Travel & Logistics): Using Kafka, the Dutch airport integrates data from various systems to optimize passenger flow and improve overall travel experiences.

These stories underline the versatility and transformative potential of data streaming across industries. Chapter 3 of the book presents a comprehensive collection of industry success stories demonstrating the transformative impact of data streaming across sectors such as financial services, manufacturing, retail, telecommunications, gaming, government, energy, transportation, insurance, healthcare, and technology.

Each scenario highlights real-world implementations, showcasing how Apache Kafka and related technologies enable real-time operations, enhanced efficiency, innovation, and compliance in diverse use cases like fraud detection, predictive maintenance, smart grids, loyalty programs, and medical diagnostics.

You will also be able to navigate yourself through the different horizontal use cases across industries and the concrete real-life examples by an interactive matrix that you will find at the beginning of the book.

Chapter 4: Data Streaming Organization and (Internal) Community Building

The shift to real-time data isn’t just a technological upgrade; it’s a strategic imperative. By enabling real-time insights, Kafka and Flink empower organizations to:

Enhance Customer Experiences: Personalized recommendations and real-time support increase customer satisfaction.
Improve Operational Efficiency: Proactive maintenance and seamless integrations reduce downtime and costs.
Drive Innovation: Real-time data powers AI and machine learning use cases, fostering innovation.

Chapter 4 explores the process of building and positioning a data streaming organization, emphasizing the importance of fostering a strong internal community. It outlines key drivers, tactics, and strategies for developing and sustaining engagement within a data streaming community to ensure long-term success and innovation.

Start Your Data Streaming Journey with my Free Ebook about Use Cases and Industry Success Stories

Data streaming, powered by technologies like Apache Kafka and Apache Flink, redefines how businesses operate and innovate. Having spent over seven years at Confluent, I’ve had the privilege to see firsthand how real-time data transforms organizations—from driving efficiency to uncovering entirely new opportunities.

Whether you’re an IT leader navigating modernization or a business executive aiming to improve ROI, understanding these tools is essential. That’s why Evi Schneider and I created The Ultimate Data Streaming Guide: Concepts, Use Cases, Industry Stories. This free ebook is packed with real-world examples and actionable insights to help you harness the full potential of data streaming.

Let this guide be your roadmap to revolutionizing your organization’s data strategy with real-time insights. Apache Kafka and Apache Flink are more than technologies—they’re catalysts for innovation and long-term success.

[Download the ebook here] to get started today!

Evi and I are already working on the second edition, which will feature even more customer stories and insights. If you’re interested in contributing or learning more, don’t hesitate to reach out—I’d love to hear from you!

And let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Free Ebook: Data Streaming Use Cases and Industry Success Stories Featuring Apache Kafka and Flink appeared first on Kai Waehner.

Fully Managed (SaaS) vs. Partially Managed (PaaS) Cloud Services for Data Streaming with Kafka and Flink

Kai Waehner — Sat, 18 Jan 2025 11:33:44 +0000

The cloud revolution has transformed how businesses deploy, scale, and manage data streaming solutions. While Software-as-a-Service (SaaS) and Platform-as-a-Service (PaaS) cloud models are often used interchangeably in marketing, their distinctions have significant implications for operational efficiency, cost, and scalability. In the context of data streaming around Apache Kafka and Flink, understanding these differences and recognizing common misconceptions—such as the overuse of the term “serverless”—can help you make an informed decision. Additionally, the emergence of Bring Your Own Cloud (BYOC) offers yet another option, providing organizations with enhanced control and flexibility in their cloud environments.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and follow me on LinkedIn or X (former Twitter) to stay in touch.

The Data Streaming Landscape: Kafka, Flink, Cloud, and More

The Data Streaming Landscape 2025 highlights how data streaming has evolved into a key software category, moving from niche adoption to a fundamental part of modern data architecture.

With frameworks like Apache Kafka and Flink at its core, the landscape now spans self-managed, BYOC, and fully managed SaaS solutions, driving real-time use cases, unifying transactional and analytical workloads, and enabling innovation across industries.

If you’re still grappling with the fundamentals of stream processing, this article is a must-read: “Stateless vs. Stateful Stream Processing with Kafka Streams and Apache Flink“.

What is SaaS in Data Streaming?

SaaS data streaming solutions are fully managed services where the provider handles all aspects of deployment, maintenance, scaling, and updates. SaaS offerings are designed for ease of use, providing a serverless experience where developers focus solely on building applications rather than managing infrastructure.

Characteristics of SaaS in Data Streaming

Serverless Architecture: Resources scale automatically based on demand. True SaaS solutions eliminate the need to provision or manage servers.
Low Operational Overhead: The provider manages hardware, software, and runtime configurations, including upgrades and security patches.
Pay-As-You-Go Pricing: Consumption-based pricing aligns costs directly with usage, reducing waste during low-demand periods.
Rapid Deployment: SaaS enables users to start processing streams within minutes, accelerating time-to-value.

Examples of SaaS in Data Streaming:

Confluent Cloud: A fully managed Kafka platform offering serverless scaling, multi-tenancy, and a broad feature set for both stateless and stateful processing.
Amazon Kinesis Data Analytics: A managed service for real-time analytics with automatic scaling.

What is PaaS in Data Streaming?

PaaS offerings sit between fully managed SaaS and infrastructure-as-a-service (IaaS). These solutions provide a platform to deploy and manage applications but still require significant user involvement for infrastructure management.

Characteristics of PaaS in Data Streaming

Partial Management: The provider offers tools and frameworks, but users must manage servers, clusters, and scaling policies.
Manual Configuration: Deployment involves provisioning VMs or containers, tuning parameters, and monitoring resource usage.
Complex Scaling: Scaling is not always automatic; users may need to adjust resource allocation based on workload changes.
Higher Overhead: PaaS requires more expertise and operational involvement, making it less accessible to teams without dedicated DevOps resources.

Examples of PaaS in Data Streaming (Kafka, Flink)

PaaS offerings in data streaming, while simplifying some infrastructure tasks, still require significant user involvement compared to fully serverless SaaS solutions. Below are three common examples, along with their benefits and pain points compared to serverless SaaS:

Apache Flink (Self-Managed on Kubernetes Cloud Service like EKS, AKS or GKE)
- Benefits: Full control over deployment and infrastructure customization.
- Pain Points: High operational overhead for managing Kubernetes clusters, manual scaling, and complex resource tuning.
Amazon Managed Service for Apache Flink (Amazon MSF)
- Benefits: Simplifies infrastructure management and integrates with some other AWS services.
- Pain Points: Users still handle job configuration, scaling optimization, and monitoring, making it less user-friendly than serverless SaaS solutions.
Amazon MSK (Managed Streaming for Apache Kafka)
- Benefits: Eases Kafka cluster maintenance and integrates with the AWS ecosystem.
- Pain Points: Requires users to design and manage producers/consumers, manually configure scaling, and handle monitoring responsibilities. MSK also excludes support for Kafka if you have any operational issues with the Kafka piece of the infrastructure.

SaaS vs. PaaS: Key Differences

SaaS and PaaS differ in the level of management and user responsibility, with SaaS offering fully managed services for simplicity and PaaS requiring more user involvement for customization and control.

Feature	SaaS	PaaS
Infrastructure	Fully managed by the provider	Partially managed; user controls clusters
Scaling	Automatic and server less	Manual or semi-automatic scaling
Deployment Speed	Immediate, ready to use	Slower; requires configuration
Operational Complexity	Minimal	Moderate to high
Cost Model	Consumption-based, no idle costs	May incur idle resource costs

The big benefit of PaaS over SaaS is greater flexibility and control, allowing users to customize the platform, integrate with specific infrastructure, and optimize configurations to meet unique business or technical requirements. This level of control is often essential for organizations with strict compliance, security, or data sovereignty requirements.

SaaS is NOT Always Better than PaaS!

Be careful: The limitations and pain points of PaaS do NOT mean that SaaS is always better.

A concrete example: Amazon MSK Serverless simplifies Apache Kafka operations with automated scaling and infrastructure management but comes with significant limitations, including the lack of fully-managed connectors, advanced data governance tools, and native multi-language client support.

Amazon MSK also excludes Kafka engine support, leading to potential operational risks and cost unpredictability, especially when integrating with additional AWS services for a complete data streaming pipeline. I explored these challenges in more detail in my article “When NOT to choose Amazon MSK Serverless for Apache Kafka?“.

Bring Your Own Cloud (BYOC) as Alternative to PaaS

BYOC (Bring Your Own Cloud) offers a middle ground between fully managed SaaS and self-managed PaaS solutions, allowing organizations to host applications in their own VPCs.

BYOC provides enhanced control, security, and compliance while reducing operational complexity. This makes BYOC a strong alternative to PaaS for companies with strict regulatory or cost requirements.

As an example, here are the options of Confluent for deploying the data streaming platform: Serverless Confluent Cloud, Self-managed Confluent Platform (some consider this a PaaS if you leverage Confluent’s Kubernetes Operator and other automation / DevOps tooling) and WarpStream as BYOC offering:

Source: Confluent

While BYOC complements SaaS and PaaS, it can be a better choice when fully managed solutions don’t align with specific business needs. I wrote a detailed article about this topic: “Deployment Options for Apache Kafka: Self-Managed, Fully-Managed / Serverless and BYOC (Bring Your Own Cloud)“.

“Serverless” Claims: Don’t Trust the Marketing

Many cloud data streaming solutions are marketed as “serverless,” but this term is often misused. A truly serverless solution should:

Abstract Infrastructure: Users should never need to worry about provisioning, upgrading, or cluster sizing.
Scale Transparently: Resources should scale up or down automatically based on workload.
Eliminate Idle Costs: There should be no cost for unused capacity.

However, many products marketed as serverless still require some degree of infrastructure management or provisioning, making them closer to PaaS. For example:

A so-called “serverless” PaaS solution may still require setting initial cluster sizes or monitoring node health.
Some products charge for pre-provisioned capacity, regardless of actual usage.

Do Your Own Research

When evaluating data streaming solutions, dive into the technical documentation and ask pointed questions:

Does the solution truly abstract infrastructure management?
Are scaling policies automatic, or do they require manual configuration?
Is there a minimum cost even during idle periods?

By scrutinizing these factors, you can avoid falling for misleading “serverless” claims and choose a solution that genuinely meets your needs.

Choosing the Right Model for Your Data Streaming Business: SaaS, PaaS, or BYOC

When adopting a data streaming platform, selecting the right model is crucial for aligning technology with your business strategy:

Use SaaS (Software as a Service) if you prioritize ease of use, rapid deployment, and operational simplicity. SaaS is ideal for teams looking to focus entirely on application development without worrying about infrastructure.
Use PaaS (Platform as a Service) if you require deep customization, control over resource allocation, or have unique workloads that SaaS offerings cannot address.
Use BYOC (Bring Your Own Cloud) if your organization demands full control over its data but sees benefits in fully managed services. BYOC enables you to run the data plane within your cloud VPC, ensuring compliance, security, and architectural flexibility while leveraging SaaS functionality for the control plane .

In the rapidly evolving world of data streaming around Apache Kafka and Flink, SaaS data streaming platforms like Confluent Cloud provide the best of both worlds: the advanced features of tools like Apache Kafka and Flink, combined with the simplicity of a fully managed serverless experience. Whether you’re handling stateless stream processing or complex stateful analytics, SaaS ensures you’re scaling efficiently without operational headaches.

What deployment option do you use today for Kafka and Flink? Any changes planned in the future? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Fully Managed (SaaS) vs. Partially Managed (PaaS) Cloud Services for Data Streaming with Kafka and Flink appeared first on Kai Waehner.

How Microsoft Fabric Lakehouse Complements Data Streaming (Apache Kafka, Flink, et al.)

Kai Waehner — Sat, 12 Oct 2024 06:58:00 +0000

In today’s data-driven world, understanding data at rest versus data in motion is crucial for businesses. Data streaming frameworks like Apache Kafka and Apache Flink enable real-time data processing, offering quick insights and seamless system integration. They are ideal for applications that require immediate responses and handle transactional workloads. Meanwhile, lakehouses like Snowflake, Databricks, and Microsoft Fabric excel in long-term data storage and detailed analysis, perfect for reports and AI training. By leveraging both data streaming and lakehouse systems, businesses can effectively meet both short-term and long-term data needs. This blog post explores how these technologies complement each other in enterprise architecture.

This is part two of a blog series about Microsoft Fabric and its relation to other data platforms on the Azure cloud:

What is Microsoft Fabric for Azure Cloud (Beyond the Buzz) and how it Competes with Snowflake and Databricks
How Microsoft Fabric Lakehouse Complements Data Streaming (Apache Kafka, Flink, et al.)
When to Choose Apache Kafka vs. Azure Event Hubs vs. Confluent Cloud for a Microsoft Fabric Lakehouse

Subscribe to my newsletter to get an email about a new blog post every few weeks.

Data at Rest (Lakehouse) vs. Data in Motion (Data Streaming)

Data streaming technologies like Apache Kafka and Apache Flink enable continuous data processing while the data is in motion in an event-driven architecture. Data streaming enables immediate insights and seamless integration of data across systems. Kafka provides a robust real-time messaging and persistence platform, while Flink excels in low-latency stream processing, making them ideal for dynamic, stateful applications. A data streaming platform supports operational/transactional and analytical use cases.

Data lakes and data warehouses store data at rest before processing the data. The platforms are optimized for batch processing and long-term analytics, including AI/ML use cases such as model training. Some components provide near real-time capabilities, e.g. data ingestion or dashboards. Data lakes offer scalable, flexible storage for raw data, and data warehouses provide structured, high-performance environments for business intelligence and reporting, complementing the real-time capabilities of streaming technologies. Most leading data platforms provide a unified combination of data lake and data warehouse called lakehouse. Lakehouses are almost exclusively used for analytical workloads as they typically lack the SLAs and tight latency required for operational/transactional use cases.

Data streaming and lakehouses are complementary, with some overlaps but different sweet spots. If you want to learn more, check out these articles:

I also created a short ten minute video explaining the above concepts:

Let’s explore why data streaming and a lakehouse like Microsoft Fabric are complementary (with a few overlaps). I explained in the first blog of this series what Microsoft Fabric is. To understand the differences, it is important to understand what a data streaming platform really is.

Data Streaming in an Event-driven Architecture with Apache Kafka and Flink

There is a lot of confusion in the market. For instance, some folks still compare Apache Kafka to a message broker like RabbitMQ or IBM MQ. I mainly focus on Apache Kafka and Apache Flink as these are the de facto standards for data streaming across industries. Before talking about technologies and solutions, we need to start with the concept of an event-driven architecture as the foundation of data streaming.

Event-driven Architecture for Operational and Analytical Workloads

In today’s digital world, getting real-time data quickly is more important than ever. Traditional methods that process data in batches or via request-response APIs often cannot keep up when you need immediate insights.

Event-driven architecture offers a different approach by focusing on handling events – like transactions or user actions – as they happen. One of the key benefits of an event-driven architecture is its ability to decouple systems, meaning that different parts of a system can work independently. This makes it easier to scale and adapt to changes. An event-driven architecture excels in handling both operational and analytical workloads.

Source: Confluent

For operational tasks, the event-driven architecture enables real-time data processing, automating processes, enhancing customer experiences, and boosting efficiency. In e-commerce, for example, an event-driven system can instantly update inventory, trigger marketing campaigns, and detect fraud.

On the analytical side, the event-driven architecture allows organizations to derive insights from data as it flows, enabling real-time analytics and trend identification without the delays of batch processing. This is invaluable in sectors like finance and healthcare, where timely insights are crucial.

Building an event-driven architecture with data streaming technologies like Apache Kafka and Apache Flink enhances its potential. These platforms provide the infrastructure for high-throughput, low-latency data streams, enabling scalable and resilient event-driven systems.

Apache Kafka – The De Facto Standard for Event-driven Messaging and Integration

Apache Kafka has become the go-to platform for event-driven messaging and integration, transforming how organizations manage data in motion. Developed by LinkedIn and open-sourced, Kafka is a distributed streaming platform adept at handling real-time data feeds. Over 150,000 organizations use Kafka in the meantime.

Kafka’s architecture is based on a distributed commit log, ensuring data durability and consistency. It decouples data producers and consumers, allowing for flexible and scalable data architectures. Producers publish data to topics, and consumers subscribe independently, facilitating system evolution.

Beyond messaging, Kafka serves as a robust integration platform, connecting diverse systems and enabling seamless data flow. Its ecosystem of connectors allows integration with databases, cloud services, and legacy systems. This helps organizations in modernizing their data infrastructure step-by-step.

Kafka’s stream processing capabilities, through Kafka Streams and integration with Apache Flink, further enhance the value of the streaming data pipelines. Kafka Streams allows real-time data processing within Kafka to enable complex transformations and enrichments, driving data-driven innovation.

Apache Flink – The De Facto Standard for Stream Processing

Apache Flink stands out as the leading framework for stream processing. It offers a versatile platform for streaming ETL and stateful business applications. Flink processes data streams with low latency and high throughput, suitable for diverse use cases.

Source: Apache

Flink provides a unified programming model for batch and stream processing that allows developers to use the same API for real-time transactional or analytical batch tasks. This flexibility is a significant advantage as it enables varied data processing without separate tools.

A key feature of Flink is its stateful stream processing. This is crucial for maintaining state across events in real-time applications. Flink’s state management ensures accurate processing in complex scenarios. In contrast to many other stream processing solutions, Flink can do stateful processing even at an extreme scale (i.e., with a throughput of gigabytes per second).

Flink’s event time processing capabilities handle out-of-order or delayed events and ensure consistent results. Developers can define windows and triggers based on event timestamps, accommodating late-arriving data.

Apache Flink supports multiple programming languages, including Java, Python, and SQL, offering developers the flexibility to use their preferred language for building stream processing applications. This is a key differentiator to other stream processing engines, such as Kafka Streams or KSQL.

Kafka and Flink are a “Match Made in Heaven” for Data Streaming

The integration of Flink with Apache Kafka enhances its capabilities. Kafka serves as a reliable data source for Flink to enable seamless real-time data ingestion and processing. With Kafka’s persistent commit log, you can travel back in time and replay historical data in guaranteed ordering for analytical use cases. This combination supports high-volume, low-latency data pipelines, unlocking transactional real-time scenarios and batch analytics.

In summary, Apache Flink’s robust stream processing, combined with Apache Kafka, offers a powerful solution for organizations seeking to leverage real-time data. Whether for operational tasks, real-time analytics, or complex event processing (CEP), Flink provides the necessary flexibility and performance for data-driven innovation.

Microsoft Fabric and Data Streaming (Kafka, Flink): Complementary Forces, NOT Competitors

In the growing landscape of data management, it’s crucial to understand the complementary roles of Microsoft Fabric and data streaming technologies. While some may perceive these technologies as competitors, they actually serve distinct yet interconnected purposes that enhance an organization’s data strategy. And keep in mind that Microsoft Fabric is not just an offering for Azure cloud. Hybrid edge scenarios in the IoT space are perfect for Microsoft Fabric and data streaming together.

Microsoft Fabric’s Streaming Ingestion: A Common Feature Among Lakehouses

Microsoft Fabric, like other modern lakehouse platforms such as Snowflake and Databricks, offers streaming ingestion capabilities. This feature is essential for handling near real-time data flows. It allows organizations to capture and process data as it arrives in the lakehouse. However, it’s important to distinguish between operational and analytical workloads when considering the role of streaming ingestion.

Operational workloads benefit from the immediacy of streaming data, enabling real-time decision-making and process automation. In contrast, analytical workloads often require data to be stored at rest for in-depth analysis and reporting. Microsoft Fabric’s architecture focuses on streaming ingestion into robust storage solutions for analytical purposes.

The Fabric Real Time Intelligence Hub: Understanding Its Capabilities and Limitations

The integration of streaming ingestion into Microsoft Fabric is part of its Real Time Intelligence Hub, which aims to provide a comprehensive platform for managing real-time data. However, beyond the marketing and buzz around Fabric Real Time Intelligence Hub, it’s important to note that it doesn’t operate in true real-time.

Instead, Fabric’s “Real Time Intelligence Hub” uses Spark Streaming jobs to manipulate data, which can introduce some latency. And the infrastructure is not meant for critical SLAs that are required by operational / transactional systems. Additionally, the ingestion process is throttled when using Power BI and other batch analytics tools via an API gateway with a Kafka client.

Microsoft is strong in introducing new names for products or feature for Fabric that are actually just a new brand of existing services. If you find new terms such as “eventhouse” or “event streams feature in the Microsoft Fabric Real-Time Intelligence”, make sure to evaluate if this is really a new component or just some Fabric marketing.

Therefore, despite some overlapping with a data streaming platform, the collaboration between Microsoft Fabric and data streaming vendors like Confluent (Kafka, Flink) underscores the complementary nature of these platforms. By leveraging the strengths of both, organizations can build a robust data infrastructure that supports real-time operations and comprehensive analytics.

In conclusion, Microsoft Fabric and data streaming technologies such as Kafka and Flink are not competitors but complementary tools that, when used together, can significantly enhance an organization’s ability to manage and analyze data. By understanding the distinct roles each plays, businesses can create a more agile and responsive data strategy that meets both operational and analytical needs.

Enterprise Architecture with Data Streaming like Kafka / Flink and Lakehouse(s) like Microsoft Fabric

In the modern enterprise architecture landscape, data streaming and lakehouse platforms are pivotal in creating a robust and flexible data ecosystem. Data streaming technologies enable continuous data ingestion and processing for operational and analytical use cases.

Lakehouse platforms, like Microsoft Fabric, Snowflake and Databricks, provide a unified architecture that combines the best of data lakes and data warehouses, offering scalable storage and advanced analytics capabilities.

Together, these technologies empower businesses to handle both operational and analytical workloads efficiently, breaking down data silos and fostering a data-driven culture. By integrating data streaming with lakehouse architectures, enterprises can achieve seamless data flow and comprehensive insights across their operations.

Reverse ETL is an Anti-Pattern

Reverse ETL is the process of moving data from a data store at rest back into operational systems. It is often considered an anti-pattern in modern data architecture. This approach can lead to data inconsistencies, increased complexity, and higher maintenance costs, as it essentially reverses the natural flow of data in motion. Do NOT store data in MIcrosoft Fabric Lakehouse just to reverse it later into other operational systems!

Instead of relying on reverse ETL, organizations should focus on building real-time data pipelines that enable direct integration between data sources and operational systems. By leveraging an event-driven architecture and data streaming technologies, businesses can ensure that data is consistently updated and available where it’s needed most. This approach not only simplifies data management, but also enhances the accuracy and timeliness of insights.

Apache Iceberg as the De Facto Standard for an Open Table Format – Store Once, Analyze with any Tool

Apache Iceberg has emerged as the de facto standard for an open table format. It offers a opportunity for storing data once in an object store like Amazon S3 and analyzing data across various tools. With its ability to handle large-scale datasets and support ACID transactions, Iceberg provides a reliable and efficient way to manage data in a lakehouse environment.

Organizations can use their preferred analytics and processing tools without being locked into a specific vendor. This flexibility is crucial for businesses looking to maximize their data investments and adapt to changing technological landscapes. By adopting Apache Iceberg together with data streaming, enterprises can ensure data consistency and accessibility across all business units to drive better data-quality, insights and decision-making.

Shift Left Architecture to Support Operational and Analytical Workloads with an Event-driven Architecture

Traditionally, many organizations use data streaming with Kafka as a dumb pipeline to ingest all raw data into a data lake. The consequences are high compute cost for multiple (re-)processing of the raw data, inconsistencies across business units, and slow time to market for new applications.

The Shift Left Architecture is a forward-thinking approach that integrates operational and analytical workloads within an event-driven architecture. By shifting data processing closer to the source, this architecture enables real-time data ingestion and analysis, improving data quality for lakehouse ingestion, reducing latency and improving responsiveness.

Event-driven architectures, powered by technologies like Apache Kafka and Flink, facilitate the seamless flow and processing of data across systems. Shift Left ensures that both operational and analytical needs are met. This approach not only enhances the agility of data-driven applications, but also supports continuous improvement and innovation. By adopting a Shift Left Architecture, organizations can streamline their data processes, improve efficiency, and gain a competitive edge in the market.

Example: Confluent (Data Streaming) + Microsoft Fabric (Lakehouse) + Snowflake (Another Lakehouse)

An example of integrating data streaming and lakehouse technologies is the combination of Confluent, Microsoft Fabric, and Snowflake.

Confluent, built on Apache Kafka and Flink, provides a robust platform for real-time data streaming, enabling organizations to integrate with operational and analytical workloads.

Microsoft Fabric and Snowflake, both lakehouse platforms, offer scalable storage and advanced analytics capabilities to allow businesses performing in-depth analysis and reporting on historical data, near real-time analytics and AI model training.

Apache Iceberg enables storing data once and connects any analytical engine to the data, including lakehouses such as Microsoft Fabric or Snowflake, and unified batch and streaming frameworks such as Apache Flink. Iceberg improves the overall data quality for data sharing, reduces storage cost and enables a much faster rollout of new analytical applications.

By leveraging Confluent for data streaming and integrating it with Microsoft Fabric and Snowflake, organizations can create a comprehensive data architecture that supports both real-time operations and long-term analytics. This synergy not only enhances data accessibility and consistency but also empowers businesses to make data-driven decisions with confidence.

Microsoft Fabric Lakehouse + Data Streaming (Kafka, Flink) = Match Made in Heaven

In conclusion, the synergy between Microsoft Fabric and data streaming technologies like Apache Kafka and Apache Flink creates a powerful combination for modern data management. While Microsoft Fabric excels in providing robust analytics and storage capabilities, data streaming platforms offer real-time data processing and integration, ensuring that businesses can respond swiftly to operational demands.

By leveraging both technologies together, organizations can build a comprehensive data architecture that supports both immediate and long-term needs, enhancing their ability to make informed, data-driven decisions. This complementary relationship not only breaks down data silos, but also fosters a more agile and responsive data strategy. As businesses continue to navigate the complexities of data management, understanding and using the strengths of both Microsoft Fabric and data streaming with data streaming vendors like Confluent will be key to achieving a competitive edge.

The Shift Left Architecture, when paired with Apache Iceberg’s open table format, simplifies the integration of data streaming with one or more lakehouses. This combination enhances data quality for all data consumers and significantly reduces overall storage costs.

In part three of this blog series, I will dig deeper into the data streaming alternatives. When to choose open source frameworks such as Apache Kafka and Flink, a leading data streaming platform such as Confluent, or a native Azure service like Event Hubs. Primer: The trade-offs are huge. Do a proper evaluation BEFORE choosing your data streaming solution.

How do you see the combination of a lakehouse like Microsoft Fabric with data streaming? Do you already use both together? And what is your strategy for other data lakes and data warehouses you already have in your enterprise architecture, such as Databricks or Snowflake? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post How Microsoft Fabric Lakehouse Complements Data Streaming (Apache Kafka, Flink, et al.) appeared first on Kai Waehner.

Deployment Options for Apache Kafka: Self-Managed, Fully-Managed / Serverless and BYOC (Bring Your Own Cloud)

Kai Waehner — Thu, 12 Sep 2024 13:43:31 +0000

BYOC (Bring Your Own Cloud) is an emerging deployment model for organizations looking to maintain greater control over their cloud environments. Unlike traditional SaaS models, BYOC allows businesses to host applications within their own VPCs to provide enhanced data privacy, security, and compliance. This approach leverages existing cloud infrastructure. It offers more flexibility for custom configurations, particularly for companies with stringent security needs. In the data streaming sector around Apache Kafka, BYOC is changing how platforms are deployed. Organizations get more control and adaptability for various use cases. But it is clearly NOT the right choice for everyone!

BYOC (Bring Your Own Cloud) – A New Deployment Model for Cloud Infrastructure

BYOC (Bring Your Own Cloud) is a deployment model where organizations choose their preferred cloud infrastructure to host applications or services, rather than using a serverless / fully managed cloud solution selected by a software vendor; typically known as Software as a Service (SaaS). This model gives businesses flexibility to leverage their existing cloud services (like AWS, Google Cloud, Microsoft Azure, or Alibaba) while integrating third-party applications that are compatible with multiple cloud environments.

BYOC helps companies maintain control over their cloud infrastructure, optimize costs, ensure compliance with security standards. BYOC is typically implemented within an organization’s own cloud VPC. Unlike SaaS models, BYOC offers enhanced privacy and compliance by maintaining control over network architecture and data management.

However, BYOC also has some serious drawbacks! The main challenge is scaling a fleet of co-managed clusters running in customer environments with all the reliability expectations of a cloud service. Confluent has shied away from offering a BYOC deployment model for Apache Kafka based on Confluent Platform because doing BYOC at scale requires a different architecture. WarpStream has built this architecture, with a BYOC-native platform that was designed from the ground up to avoid the pitfalls of traditional BYOC.

The Data Streaming Landscape

Data Streaming is a separate software category of data platforms. Many software vendors built their entire businesses around this category. The data streaming landscape shows that most vendors use Kafka or implement its protocol because Apache Kafka has become the de facto standard for data streaming.

New software companies have emerged in this category in the last few years. And several mature players in the data market added support for data streaming in their platforms or cloud service ecosystem. Most software vendors use Kafka for their data streaming platforms. However, there is more than Kafka. Some vendors only use the Kafka protocol (Azure Event Hubs) or utterly different APIs (like Amazon Kinesis).

The following Data Streaming Landscape 2024 summarizes the current status of relevant products and cloud services.

The Data Streaming Landscape evolves. Last year, I added WarpStream as a new entrant into the market. WarpStream uses the Kafka protocol and provides a BYOC offering for Kafka in the cloud. In my next update of the data streaming landscape, I need to do yet another update: WarpStream is now part of Confluent. There are also many other new entrants. Stay tuned for a new “Data Streaming Landscape 2025” in a few weeks (subscribe to my newsletter to stay up-to-date with all the things data streaming).

Confluent Acquisition of WarpStream

Confluent had two product offerings:

Confluent Platform: A self-managed data streaming platform powered by Kafka, Flink, and much more that you can deploy everywhere (on-premise data center, public cloud VPC, edge like factory or retail store, and even stretched across multiple regions or clouds).
Confluent Cloud: A fully managed data streaming platform powered by Kafka, Flink, and much more that you can leverage as a serverless offering in all major public cloud providers (Amazon AWS, Microsoft Azure, Google Cloud Platform).

Why did Confluent acquire WarpStream? Because many customers requested a third deployment option: BYOC for Apache Kafka.

As Jay Kreps described in the acquisition announcement: “Why add another flavor of streaming? After all, we’ve long offered two major form factors–Confluent Cloud, a fully managed serverless offering, and Confluent Platform, a self-managed software offering–why complicate things? Well, our goal is to make data streaming the central nervous system of every company, and to do that we need to make it something that is a great fit for a vast array of use cases and companies.”

Read more details about the acquisition of WarpStream by Confluent in Jay’s blog post: Confluent + WarpStream = Large-Scale Streaming in your Cloud. In summary, WarpStream is not dead. The WarpStream team clarified the status quo and roadmap of this BYOC product for Kafka in its blog post: “WarpStream is Dead, Long Live WarpStream“.

Let’s dig deeper into the three deployment options and their trade-offs.

Deployment Options for Apache Kafka

Apache Kafka can be deployed in three primary ways: self-managed, fully managed/serverless, and BYOC (Bring Your Own Cloud).

In self-managed deployments, organizations handle the entire infrastructure, including setup, maintenance, and scaling. This provides full control but requires significant operational effort.
Fully managed or serverless Kafka is offered by providers like Confluent Cloud or Azure Event Hubs. The service is hosted and managed by a third-party, reducing operational overhead but with limited control over the underlying infrastructure.
BYOC deployments allow organizations to host Kafka within their own cloud VPC. BYOC combines some of the benefits of cloud flexibility with enhanced security and control, while it outsources most of Kafka’s management to specialized vendors.

Confluent’s Kafka Products: Self-Managed Platform vs. BYOC vs. Serverless Cloud

Using the example of Confluent’s product offerings, we can see why there are three product categories for data streaming around Apache Kafka.

There is no silver bullet. Each deployment option for Apache Kafka has its pros and cons. The key differences are related to the trade-offs between “ease of management” and “level of control”.

Source: Confluent

If we go into more detail, we see that different use cases require different configurations, security setups, and levels of control while also focusing on being cost effective and providing the right SLA and latency for each use case.

Trade-Offs of Confluent’s Deployment Options for Apache Kafka

On a high level, you need to figure out if you want or have to managed the data plane(s) and control plane of your data streaming infrastructure:

Source: Confluent

If you follow my blog, you know that a key focus is exploring various use cases, architectures and success stories across all industries. And use cases such as log aggregation or IoT sensor analytics required very different deployment characteristics than an instant payment platform or fraud detection and prevention.

Choose the right Kafka deployment model for your use case. Even within one organization, you will probably need different deployments because of security, data privacy and compliance requirements, but also staying cost efficient for high-volume workloads.

BYOC for Apache Kafka with WarpStream

Self-managed Kafka and fully managed Kafka are pretty well understood in the meantime. However, why is BYOC needed as a third option and how to do it right?

I had plenty of customer conversations across industries. Common feedback is that most organizations have a cloud-first strategy, but many also (have to) stay hybrid for security, latency or cost reasons.

And let’s be clear: If a data streaming project goes to the cloud, fully managed Kafka (and Flink) should always be the first option as it is much easier to manage and operate to focus on fast time to market and business innovation. Having said that, sometimes, security, cost or other reasons require BYOC.

How Is BYOC Implemented in WarpStream?

Let’s explore why WarpStream is an excellent option for Kafka as BYOC deployment and when to use it instead of serverless Kafka in the cloud:

WarpStream provides BYOC, meaning single-tenant service so that a customer has its own “instance” of Kafka (to use the protocol, it is not Apache Kafka under the hood).
However, under the hood, the system still uses cloud-native serverless systems like Amazon S3 for scalability, cost-efficiency and high availability (but the customer does not see this complexity and does not have to care about it).
As a result, the data plane is still customer managed (that’s what they need for security or other reasons), but in contrary to self-managed Kafka, the customer does not need to worry about the complexity under the hood (like rebalancing, rolling upgrades, backups) – that is what S3 and other magic code of the WarpStream service takes over.
The magic is the stateless agents in the customer VPC. It makes this solution scalable and still easy to operate (compared to the self-managed deployment option) while the customer has its own instance.
Many use cases are around lift and shift of existing Kafka deployments (like self-managed Apache Kafka or another vendor like Kafka as part of Cloudera or Red Hat). Some companies want to “lift and shift” and keep the feeling of control they are used to, while still offloading most of the management to the vendor.

I wrote this summary after reading the excellent article of my colleague Jack Vanlightly: BYOC, Not “The Future Of Cloud Services” But A Pillar Of An Everywhere Platform. This article goes into much more technical detail and is a highly recommended read for any architect and developer.

Benefits of WarpStream’s BYOC Implementation for Kafka

Most vendors have dubios BYOC implementations.

For instance, if the vendor needs to access the VPC of the customercheaper than AK self managed because cloud native (zero disks, zero interzone networking fees) and headaches for responsibilities in the case of failures.

WarpStream’s BYOC-native implementation differs from other vendors and provides various benefits because of its novel architecture:

WarpStream does not need access to the customer VPC. The data plane (i.e., the brokers in the customer VPC) are stateless. The metadata/consensus is in the control plane (i.e., the cloud service in the WarpStream VPC).
The architecture solves sovereignty challenges and is a great fit for security and compliance requirements.
The cost of WarpStream’s BYOC offering is cheaper than self-managed Apache Kafka because it is built with cloud-native concepts and technologies in mind (e.g., zero disks and zero interzone networking fees, leveraging cloud object storage such as Amazon S3).
The stateless architecture in the customer VPC makes autoscaling and elasticity very easy to implement/configure.

The Main Drawbacks of BYOC for Apache Kafka

BYOC is an excellent choice if you have specific security, compliance or cost requirements that need this deployment option. However, there are some drawbacks:

The latency is worse than self-managed Kafka or serverless Kafka as WarpStream directly touches the Amazon S3 object storage (in contrast to “normal Kafka”).
Kafka using BYOC is NOT fully managed, like e.g. Confluent Cloud, so you have more efforts to operate it. Also, keep in mind that most Kafka cloud services are NOT serverless but just provision Kafka for you and you still need to operate it.
Additional components of the data streaming platform (such as Kafka Connect connectors and stream processors such as Kafka Streams or Apache Flink) are not part of the BYOC offering (yet). This adds some complexity to operations and development.

Therefore, once again, I recommend to only look at BYOC options for Apache Kafka in the public cloud if a fully managed and serverless data streaming platform does NOT work for you because of cost, security or compliance reasons!

BYOC Complements Self-Managed and Serverless Apache Kafka – But BYOC Should NOT be the First Choice!

BYOC (Bring Your Own Cloud) offers a flexible and powerful deployment model, particularly beneficial for businesses with specific security or compliance needs. By allowing organizations to manage applications within their own cloud VPCs, BYOC combines the advantages of cloud infrastructure control with the flexibility of third-party service integration.

But once again: If a data streaming project goes to the cloud, fully managed Kafka (and Flink) should always be the first option as it is much easier to manage and operate to focus on fast time to market and business innovation. Choose BYOC only if fully managed does not work for you, e.g. because of security requirements.

In the data streaming domain around Apache Kafka, the BYOC model complements existing self-managed and fully managed options. It offers a middle ground that balances ease of operation with enhanced privacy and security. Ultimately, BYOC helps companies tailor their cloud environments to meet diverse and developing business requirements.

What is your deployment option for Apache Kafka? A self-managed deployment in the data center or at the edge? Serverless Cloud with a service such as Confluent Cloud? Or did you (have to) choose BYOC? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Deployment Options for Apache Kafka: Self-Managed, Fully-Managed / Serverless and BYOC (Bring Your Own Cloud) appeared first on Kai Waehner.

Data Streaming Archives - Kai Waehner

Agentic AI with the Agent2Agent Protocol (A2A) and MCP using Apache Kafka as Event Broker

Business Value of Agentic AI in the Enterprise

Model Context Protocol (MCP) + Agent2Agent (A2A): New Standards for Agentic AI

Why Apache Kafka Is a Better Fit Than an API (HTTP/REST) for A2A and MCP

MCP + Kafka = Open, Flexible Communication

Stream Processing as the Agent’s Companion

Technology Flexibility for Agentic AI Design with Data Contracts

Microservices, Data Products, and Reusability – Agentic AI Is Just One Piece of the Puzzle

Agentic Al Needs Integration with Core Enterprise Systems

Agentic AI Requires Decoupling – Apache Kafka Supports A2A and MCP as an Event Broker

Replacing Legacy Systems, One Step at a Time with Data Streaming: The Strangler Fig Approach

What is the Strangler Fig Design Pattern?

Why Strangler Fig is Better than a Big Bang Migration or Rewrite

Better Than Reverse ETL: A Migration with Data Consistency across Operational and Analytical Applications

Why Data Streaming with Apache Kafka and Flink is an Excellent Fit for the Strangler Fig Pattern

1. True Decoupling of Old and New Systems

2. Real-Time Replication with Persistence

3. Supports Any Technology and Communication Paradigm

4. No Time Pressure – Migrate at Your Own Pace

5. Intelligent Processing in the Data Migration Pipeline

Allianz’s Digital Transformation and Transition to Hybrid Cloud: An IT Modernization Success Story

Event-Driven Innovation: Community and Knowledge Sharing at Allianz

The Future of IT Modernization and Legacy Migrations with Strangler Fig using Data Streaming

Modernizing OT Middleware: The Shift to Open Industrial IoT Architectures with Data Streaming

Why Replace Legacy OT Middleware?

Challenges: Proprietary, Monolithic, Expensive

The Data Streaming Approach: Kafka & Flink as the Foundation for Modern OT Middleware

Apache Kafka: The Backbone of Real-Time OT Data Streaming

Apache Flink: The Real-Time Processing Engine for OT Data

Unifying Operational (OT) and Analytical (IT) Workloads

The Shift-Left Architecture: Bringing Advanced Analytics Closer to Industrial IoT

Open Table Format with Apache Iceberg / Delta Lake for Unified Workloads and Single Storage Layer

The Result: An Open, Cloud-Native, Future-Proof Data Historian for Industrial IoT

Key Benefits of Offloading OT Middleware to Data Streaming

A Step-by-Step Approach: Offloading vs. Replacing OT Middleware with Data Streaming

1. Hybrid Data Streaming: Process Once for OT and IT

Why?

How?

Result?

2. Lift-and-Shift: Reduce Costs While Keeping Existing OT Integrations

Why?

How?

Result?

3. Full OT Middleware Replacement: Cloud-Native, Scalable, and Future-Proof

Why?

How?

Result?

Why This Matters: The Future of OT is Real-Time & Open for Data Sharing

It’s Time to Move Beyond Legacy OT Middleware to Open Standards like MQTT, OPC-UA, Kafka

CIO Summit: The State of AI and Why Data Streaming is Key for Success

Key Learnings on the State of AI

AI is Still in Its Early Stages – Beware of the Buzz vs. Value

Generative AI vs. Predictive AI – Understanding the Differences

Good Data Quality is Non-Negotiable

Context Matters – AI Needs Real-Time Decision-Making

Automate First, Then Apply AI

ROI Matters – AI Must Drive Business Value

Data Streaming and AI – The Perfect Match

Predictive AI + Data Streaming

Generative AI + Data Streaming

The Future of AI: Agentic AI and the Role of Data Streaming

The Path to AI Success

How Data Streaming and AI Help Telcos to Innovate: Top 5 Trends from MWC 2025

1. IT Excellence: Driving Telecom Innovation and Cost Efficiency

How Data Streaming Powers IT Excellence

2. Tackling Telecom Emissions: A Sustainable Future

How Data Streaming Supports Sustainability Efforts

3. Shaping the Future of 6G: Beyond Connectivity

How Data Streaming Powers the 6G Revolution

4. Generative AI: A Profitability Game-Changer for Telcos

How Data Streaming Enhances Gen AI in Telecom

5. AI-Driven Software Development: Faster, Smarter, Better

How Data Streaming Transforms AI-Driven Software Development

Data Streaming as the Ultimate Competitive Advantage for Telcos

Data Streaming as the Technical Foundation for a B2B Marketplace

Subscription Commerce with a Digital Marketplace

The Competitive Landscape for Subscription Commerce

What Makes a B2B Data Marketplace Technically Unique?

Data Streaming as the Backbone of a B2B Data Marketplace