Introduction
Event-driven architecture (EDA) is transforming how systems are built, enabling real-time responsiveness and loose coupling between services. But where do traditional SQL databases fit into this paradigm? There’s a common perception that one must adopt NoSQL or specialized event stores to achieve an event-driven system at scale. In truth, relational databases like MySQL and PostgreSQL can keep up with event-driven architectures – provided we employ the right patterns and technologies. In this comprehensive guide, we’ll explore how SQL databases can serve as the backbone of an event-driven system, covering Change Data Capture (CDC), the Transactional Outbox pattern, CQRS, and how cloud providers support these use cases. We’ll also highlight real-world examples and best practices, offering insights for both hands-on architects and technology decision-makers.
Introduction to Event-Driven Architecture (EDA)
Event-driven architecture is an approach where events – significant changes in state or activity – drive the flow of information between decoupled components. Instead of one component directly calling another (as in a REST API call), a component emits an event (such as “order placed” or “user updated”) and moves on. Other services react to those events asynchronously, which leads to a highly decoupled and scalable system. This style is prevalent in modern microservices architectures and real-time systems, as it improves responsiveness and resilience. For example, a payment service might emit an event when a payment is completed, and independent listener services (for invoicing, shipping, analytics, etc.) will consume that event and act accordingly, without tight integration.
One core challenge in EDA is ensuring reliability and consistency: if one service updates a database and another service needs to know about it, how do we transmit that event without losing data or breaking atomicity? This is where patterns like Change Data Capture (CDC) and the Outbox pattern come into play. These techniques allow a traditional relational database to become a first-class participant in an event-driven system, acting as both the system of record and an event publisher.
Can Relational Databases Work in an Event-Driven World?
At first glance, relational databases (RDBMS) seem at odds with the asynchronous, distributed nature of EDA. Relational systems like MySQL and PostgreSQL have historically been used for synchronous, transactional workloads. Yet, they hold critical business data and maintain strong consistency guarantees (ACID transactions) – qualities we often want to preserve even in a modern architecture. The good news is that SQL databases can indeed be integrated into event-driven workflows. In fact, some of the world’s largest tech companies have successfully leveraged relational databases for real-time event streams:
- Netflix built an internal tool called DBLog to capture changes from MySQL and PostgreSQL databases and stream them to target systems for search and analytics. This CDC pipeline ensures Netflix can propagate data changes (e.g. new movie added or user preferences updated) in sub-second time without heavy batch jobs.
- Airbnb developed a CDC platform called SpinalTap to reliably stream MySQL changes across its systems. This was crucial for keeping listings availability and pricing in sync across their platform in real time.
- Uber likewise created a real-time CDC solution to synchronize data across multiple datastores, with requirements of extremely low latency and zero data loss – again using databases (among other sources) as the point of change.
What makes these integrations possible are patterns that capture database changes as they happen, and deliver them as events. Two fundamental approaches are:
- Change Data Capture (CDC): reading the database’s transaction log (commit log) to identify and extract changes (inserts, updates, deletes) in order, then publishing those changes as events. CDC is typically log-based and can stream events without locking or slowing down the source database.
- Transactional Outbox Pattern: designing your application to write events into an “outbox” table within the same transaction as your business data changes, ensuring atomicity. A separate process then pulls from this outbox table and publishes events to a message broker.
These patterns allow a system to maintain a single source of truth in an SQL database while also achieving the asynchronous propagation of events to other components.
In the following sections, we’ll dive deeper into how MySQL and PostgreSQL implement CDC, how the Outbox pattern works in practice, how CQRS ties into event-driven databases, and what managed services from AWS, GCP, and Azure can offer. We’ll keep both a technical lens (for architects and developers) and a strategic lens (for CTOs and decision-makers) to evaluate the trade-offs and benefits.
Change Data Capture (CDC) in MySQL and PostgreSQL
Change Data Capture (CDC) is a technique for identifying and capturing changes in a database and then delivering those changes elsewhere (often in real time). Instead of periodically polling the database for new or updated records, CDC lets us react to changes as soon as they occur. This is typically accomplished by tapping into the database’s append-only log of transactions.
Every modern RDBMS has some form of transaction log (also called write-ahead log in Postgres or binary log in MySQL). These logs were originally intended for replication and recovery, but they turn out to be a goldmine for CDC: by reading the log, we get a sequence of all committed changes in the database, in order. Importantly, reading the log does not lock any tables or impede the primary workload, making it efficient for capturing high-volume changes.
Let’s examine how CDC works in MySQL and PostgreSQL:
MySQL and CDC (Binlog-Based Change Capture)
MySQL uses a binary log (binlog) to record all database write operations (transactions) in sequential order. The binlog can be configured in row-based mode, which records every row change (insert/update/delete) along with the changed data. This is ideal for CDC, because it provides a detailed stream of data changes.
How MySQL CDC works: A CDC tool (or connector) can register as a replica to the MySQL server and start reading the binary log just like a MySQL replica would. For example, the open-source Debezium connector for MySQL operates this way – it connects and reads the binlog to produce change events for each row-level insert, update, or delete, and then emits those events to a downstream system (often Apache Kafka). This allows client applications or microservices to consume a real-time stream of database changes from Kafka topics, rather than polling the database.
MySQL’s binlog must be enabled (on managed services it is often on by default, though a few parameters still need to be configured). Typically, one ensures binlog_format = ROW and log_bin = ON in MySQL settings. Tools like mysqlbinlog (a command-line utility) can directly read and even replay the binlog. MySQL binlogs are used not only for CDC but also for point-in-time recovery and replication – meaning the mechanism is robust and proven. Many CDC solutions, including third-party and cloud offerings, rely on this. For instance, AWS Database Migration Service (DMS) can read MySQL RDS binlogs to continuously replicate changes to a streaming target like Amazon Kinesis.
It’s worth noting that CDC via binlog has minimal performance impact on the source MySQL database. Since the database is already writing the binlog for its own purposes (replication, crash recovery), a CDC reader just tails this log. There is some overhead in writing the binlog itself (disk I/O and extra storage), so you should monitor the DB’s I/O and allocate enough storage for logs (or use a retention policy). MySQL binlogs can be purged after a certain timeframe; CDC systems typically ensure they keep up so as not to lose information when old logs rotate out.
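To make the mechanics concrete, here is a minimal sketch of tailing the binlog from Python using the open-source python-mysql-replication library (an assumption – it is not mentioned above and must be installed separately). It registers with the server as a replica and prints row-level changes; production connectors such as Debezium follow the same principle with far more robustness (initial snapshots, offset tracking, schema handling).

```python
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent,
)

# Hypothetical connection details; the user needs replication privileges
# and the server must run with log_bin = ON and binlog_format = ROW.
stream = BinLogStreamReader(
    connection_settings={"host": "127.0.0.1", "port": 3306,
                         "user": "cdc_user", "passwd": "secret"},
    server_id=4242,          # any id not already used by a real replica
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    blocking=True,           # wait for new events instead of exiting at end of log
    resume_stream=True,
)

try:
    for event in stream:
        for row in event.rows:
            # Inserts carry {'values': ...}; updates carry 'before_values'/'after_values'.
            print(event.schema, event.table, row)
finally:
    stream.close()
```

In a real pipeline the loop body would publish each change to a broker (and persist the binlog position) rather than print it.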
MySQL CDC in action (Real-World): A famous example is Airbnb’s SpinalTap service, which monitors MySQL binlogs (as well as other data sources) to detect data mutations with low latency. Airbnb uses this to keep caches and search indexes in sync with the source-of-truth database changes in real time. Another example is Netflix, which built a tool called DBLog – it captures MySQL changes via the binary log and streams them for uses like feeding Elasticsearch for their search feature.
PostgreSQL and CDC (Logical Decoding)
PostgreSQL approaches CDC through a feature called logical decoding, available since PostgreSQL 9.4. PostgreSQL’s primary write-ahead log (WAL) is normally used for physical replication (binary replication to standbys), but when wal_level is set to logical, Postgres will record sufficient information to allow the WAL to be decoded into individual row changes. Logical decoding requires creating a replication slot and (in most cases) a decoding plugin that transforms the binary log records into a user-friendly format (e.g., SQL statements, JSON, etc.).
How PostgreSQL CDC works: A logical decoding plugin attaches to a replication slot and reads the WAL as it is generated. The plugin outputs each change (e.g., “row X inserted in table Y with these values”) in a format that a consumer can understand. For instance, the popular plugin wal2json converts each change into a JSON object, which can then be consumed by an application or CDC tool. Another plugin, pgoutput, is built-in and used for Postgres’s native logical replication feature – it outputs a standard logical replication stream that can be consumed by another Postgres instance or by tools like Debezium.
Debezium’s PostgreSQL connector creates a logical replication slot and can use the pgoutput plugin (or others) to capture all row-level changes from the WAL. The first time it connects, it may take a consistent snapshot of the current state, and then it continuously streams new changes as they commit. Those changes are then published as events (for example, into Kafka topics, one topic per table by default). The end result is similar to MySQL’s case: a stream of insert/update/delete events that consumers can react to.
Setup considerations: To use logical decoding, you need to configure the Postgres server: set wal_level=logical (or the cloud-specific equivalent flag) and ensure there’s a replication role with appropriate permissions. In a managed environment like Cloud SQL for PostgreSQL (Google Cloud) or Azure Database for PostgreSQL, you can enable logical replication via flags. For example, Cloud SQL supports logical decoding by enabling the cloudsql.logical_decoding flag, and it even supports output plugins like wal2json by default. Azure’s flexible server for Postgres likewise allows wal_level=logical and supports plugins or the native logical replication feature.
One powerful aspect of logical decoding is that you can create publications to stream only specific tables if you want (for example, you might not need changes from every table in the database, just a subset relevant to your use case). This can reduce the volume of events and target it to relevant consumers.
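For a hands-on feel, the snippet below creates a replication slot and pulls decoded changes with plain SQL via psycopg2. It assumes wal_level=logical, a role with the REPLICATION attribute, and the wal2json plugin being installed – all assumptions beyond what the text above guarantees. Connectors like Debezium or Datastream do essentially the same thing continuously over the streaming replication protocol instead of polling.

```python
import json
import psycopg2

# Hypothetical connection string and slot name.
conn = psycopg2.connect("dbname=app user=cdc_user password=secret host=127.0.0.1")
conn.autocommit = True
cur = conn.cursor()

# One-time setup: a logical replication slot using the wal2json output plugin.
cur.execute("SELECT pg_create_logical_replication_slot('demo_slot', 'wal2json');")

# ... normal application transactions commit in the meantime ...

# Fetch (and consume) the changes accumulated in the slot since the last call.
# Use pg_logical_slot_peek_changes instead to read without consuming them.
cur.execute("SELECT lsn, data FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);")
for lsn, data in cur.fetchall():
    change = json.loads(data)   # one JSON document per transaction with its row changes
    print(lsn, change)
```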
PostgreSQL CDC in action: Debezium’s public references showcase many companies using Postgres logical decoding. Netflix uses CDC (via their DBLog tool) to capture changes from PostgreSQL as well as MySQL. Netflix needed to synchronize data (like their catalog of shows) with downstream systems (search indexes, analytics) in real time without impacting the main database – logical decoding made that feasible. Uber built a similar pipeline for their platforms, using PostgreSQL CDC to ensure data consistency across services handling rides and food orders. These examples underline that Postgres, when configured for logical decoding, can handle high-throughput change streams that feed an event-driven architecture.
Triggers vs. Log-Based CDC
It’s worth briefly noting that one could use database triggers to capture changes (e.g., on insert/update in a table, write a message to a queue or a log table). While triggers can work for small-scale scenarios, they have some drawbacks in an event-driven design. Triggers execute as part of the transaction, which means they can add latency to the user’s transaction and potentially impact throughput. They also couple the database tightly to external systems if, say, the trigger directly calls an API or sends a message (not to mention the complexity of handling failures in triggers). In contrast, log-based CDC (reading the WAL/binlog) cleanly separates the concerns: the database focuses on committing transactions, and an external agent (CDC connector) asynchronously pulls the changes off the log. This tends to be more scalable and resilient – the source database doesn’t slow down for the sake of notifying others, aside from the manageable overhead of writing its log.
Thus, most modern EDA setups with SQL databases prefer log-based CDC (with tools like Debezium, AWS DMS, etc.) for efficiency. Even cloud vendors have embraced this: for example, Amazon DMS’s CDC functionality and Google Cloud’s Datastream service both rely on reading DB logs to stream changes.
Key Takeaway: Both MySQL and PostgreSQL provide mechanisms to continuously stream database changes. MySQL’s binary log and Postgres’s logical WAL decoding enable external systems to receive events for every data change with minimal impact on the operational workload. These features, combined with CDC platforms (open-source or cloud-based), allow a relational database to act as an event producer in an event-driven architecture.
The Transactional Outbox Pattern for Reliable Event Publishing
One of the thorniest problems in distributed architecture is ensuring that when something changes in your database, the event notification about that change is delivered reliably. Imagine a service that, in one operation, must update its database and send an event (message) to a message broker (like Kafka or RabbitMQ) to notify other components. If we simply do these two actions one after the other, we introduce a potential race condition: what if the database update commits but the service crashes before the event is sent? Or what if the event is sent, but the database transaction rolls back? We’d end up with an inconsistent system state. This is often referred to as the “dual-write” problem – two separate systems (DB and message broker) being updated and the challenge of keeping them in sync.
Distributed transactions (e.g., two-phase commit across the DB and the message broker) are generally not available or not advisable (Kafka, for instance, does not participate in XA transactions). Instead, the industry uses the Transactional Outbox pattern to solve this reliably.
What is the Outbox pattern? It’s a design where any service that needs to publish events does so by writing to its own database. In practice, you add an “outbox” table to the service’s database. Whenever you handle a request that modifies data and needs to emit an event, you perform one transaction that (a) writes the necessary changes to your business tables and (b) inserts a record into the outbox table representing the event to send. Because this is one atomic transaction, you guarantee that either both the data and the event are recorded, or neither is. The event record in the outbox typically contains the event type, the payload (often as JSON), timestamps, and a status (like “pending”). The service does not directly communicate with the message broker in the request’s flow.
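A minimal sketch of the pattern, using PostgreSQL and psycopg2 with hypothetical table and column names (an orders table is assumed to already exist): the business row and the outbox row are written in one transaction, so they commit or roll back together.

```python
import json
import uuid
import psycopg2

OUTBOX_DDL = """
CREATE TABLE IF NOT EXISTS outbox (
    id             UUID PRIMARY KEY,
    aggregate_type TEXT        NOT NULL,   -- e.g. 'order'
    aggregate_id   TEXT        NOT NULL,   -- e.g. the order id
    event_type     TEXT        NOT NULL,   -- e.g. 'OrderPlaced'
    payload        JSONB       NOT NULL,
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    published_at   TIMESTAMPTZ             -- NULL until the relay has sent it
);
"""

def place_order(conn, order_id, customer_id, total):
    """Write the order and its outbox event in a single transaction."""
    with conn:                                 # commit on success, rollback on error
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO orders (id, customer_id, total) VALUES (%s, %s, %s)",
                (order_id, customer_id, total),
            )
            cur.execute(
                "INSERT INTO outbox (id, aggregate_type, aggregate_id, event_type, payload) "
                "VALUES (%s, 'order', %s, 'OrderPlaced', %s)",
                (str(uuid.uuid4()), order_id,
                 json.dumps({"orderId": order_id, "customerId": customer_id, "total": total})),
            )
```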
How do the events actually get sent then? This is handled by an Outbox Relay (also known as a publisher or dispatcher). This could be a background thread in the same service, a separate process, or even an external CDC mechanism. The relay process continuously looks at the outbox table for new events that haven’t been sent yet. It then publishes those events to the message broker (e.g., produces a message to a Kafka topic or sends to an event bus), and after confirmation, marks them as sent (or removes them) in the outbox table. If the publish fails, the relay can retry – since the event is still sitting in the database, it’s not lost. This decouples event sending from the main transaction that created the data.
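A polling relay can be as simple as the sketch below (same hypothetical outbox table as above; `publish` is a placeholder for your broker client, for example a Kafka producer’s send call – not a real library function). Note the at-least-once semantics: if the process dies between publishing and committing the status update, those events are re-sent on restart.

```python
import time

def run_outbox_relay(conn, publish, poll_interval=1.0):
    """Poll the outbox for unsent events, publish them, then mark them as sent."""
    while True:
        with conn:                                        # one transaction per batch
            with conn.cursor() as cur:
                # SKIP LOCKED lets several relay instances run without double-claiming rows.
                cur.execute(
                    "SELECT id, event_type, payload FROM outbox "
                    "WHERE published_at IS NULL "
                    "ORDER BY created_at LIMIT 100 FOR UPDATE SKIP LOCKED"
                )
                for event_id, event_type, payload in cur.fetchall():
                    publish(topic=event_type, key=str(event_id), value=payload)
                    cur.execute(
                        "UPDATE outbox SET published_at = now() WHERE id = %s",
                        (event_id,),
                    )
        time.sleep(poll_interval)
```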
The Transactional Outbox pattern ensures that changes (e.g., new orders) and their corresponding events are saved in the same database transaction. A separate outbox relay process reads the Outbox table and publishes events to a message broker asynchronously. This guarantees that an event is never lost – if the transaction commits, the event will eventually be delivered.
The outbox pattern’s benefits are clear: it guarantees that if your state change is stored, the event describing that change will not be forgotten. Conversely, if the state change fails, no event gets sent. There’s no need for distributed transactions or 2PC coordination between DB and broker. Events will also be published in the same order as the transactions that produced them, because the outbox relay will read in commit order (for example, by an auto-increment ID or commit timestamp). This preserves causality (e.g., if transaction T1 updated an order status and transaction T2 canceled the order afterwards, you won’t have an “OrderCanceled” event arriving before an “OrderUpdated” event).
However, there are trade-offs and considerations:
- The application code becomes responsible for writing to the outbox. It’s extra logic, though patterns and frameworks can abstract it. Developers must remember to include the event in the transaction, otherwise the pattern fails (one mitigation is using domain event publishing mechanisms within your ORM/unit-of-work).
- The outbox table will grow; you need a strategy to archive or prune old events after they are sent and no longer needed. This could be a scheduled job or an automated cleanup (ensuring not to delete unsent events).
- The outbox relay might publish an event more than once in failure scenarios. For example, if it crashes after sending an event but before marking it as sent, it will send it again on restart. This means consumers of events must be idempotent (able to handle duplicate events). In practice, this is usually handled by including unique event IDs and having consumers track processed IDs, or designing events so that processing twice has no adverse effects. This is a general requirement even outside the outbox pattern, because message brokers often deliver duplicates or at-least-once by default.
- There is a small delay between when a transaction commits and when its event is observed by others (the time for the relay to read and forward it). In a well-tuned system, this delay can be on the order of milliseconds to a few seconds. For most use cases (even many real-time-ish ones), this is acceptable, but truly synchronous requirements might not fit this model. The trade-off for guaranteed reliability is a bit of latency.
Implementing the Outbox Relay: There are two common approaches:
- Polling Publisher: The service (or a companion service) queries the outbox table periodically (e.g., every few seconds or via a short-lived polling loop) to find new events and publish them. This is straightforward and works with any database. The polling interval can be tuned to balance load vs. latency.
- Transaction Log Tailing (CDC): This is a slick approach where you don’t explicitly poll the table; instead, you use CDC on the outbox table itself. For example, Debezium has an Outbox Event Router module that is designed to recognize “outbox” table entries and directly route them to Kafka topics. Essentially, Debezium tails the DB log (as described earlier) and when it sees a new row in the outbox table, it produces an event to the configured topic and can even transform the event using the payload in that row. This approach can reduce custom coding and offers very low latency – as soon as the outbox insert is in the WAL/binlog, Debezium will capture it and forward it. Many teams choose this pattern: they get the atomicity of the outbox and the convenience of CDC to handle the publishing.
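To illustrate the log-tailing variant, here is a sketch of registering a Debezium connector with the outbox event router via the Kafka Connect REST API. Host names, credentials, and the column mapping are hypothetical, and the property names follow Debezium’s PostgreSQL connector and EventRouter transform as commonly documented – verify them against the Debezium documentation for your version before relying on them.

```python
import requests

connector = {
    "name": "orders-outbox-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",                 # logical decoding plugin to use
        "database.hostname": "orders-db",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "secret",
        "database.dbname": "orders",
        "topic.prefix": "orders",
        "table.include.list": "public.outbox",     # capture only the outbox table
        "transforms": "outbox",
        "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
        # Route each event to a topic based on the outbox row's aggregate_type column;
        # additional properties map non-default column names (id, payload, etc.).
        "transforms.outbox.route.by.field": "aggregate_type",
    },
}

resp = requests.post("http://connect:8083/connectors", json=connector, timeout=10)
resp.raise_for_status()
print("Connector registered:", resp.json()["name"])
```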
From a business perspective, the outbox pattern significantly reduces the risk of data loss or inconsistency in cross-system communication. It does introduce some operational overhead (managing the relay and outbox table), but numerous frameworks and tools exist. For instance, in the Java ecosystem, Debezium’s Quarkus outbox extension and various Spring-based libraries can automate outbox writes; in .NET, there are libraries that support the outbox (like CAP or MassTransit’s outbox). Many organizations have rolled their own lightweight outbox frameworks internally given how common the need is.
Why not just integrate the broker directly? – Decision-makers might ask why we need this complexity instead of simply writing to a queue within the transaction. The reality is that most databases and brokers do not share transactions, so you can’t commit them atomically together. The outbox is a pragmatic pattern that uses the DB’s strengths (strong transactions) to avoid losing events, which is often more palatable than dealing with half-baked two-phase commit solutions. It leverages the existing infrastructure (the database) to ensure consistency.
In summary, the outbox pattern enables reliable event publication from an SQL database, turning database changes into durable events without exotic technology. It’s a key piece that allows SQL to play nicely in an event-driven architecture – you get the ACID guarantees for your primary data and eventual consistency for other services that subscribe to the events, all with a simple design.
CQRS and SQL in an Event-Driven Context
Another concept frequently mentioned alongside event-driven systems is CQRS (Command Query Responsibility Segregation). CQRS is a pattern that suggests separating the write model and read model of your data. In other words, the part of the system that handles commands (writes, state changes) is distinct from the part that handles queries (reads, presenting data). By segregating these, each side can be optimized and scaled independently for its specific workload. For example, the write side focuses on transactional integrity and business rules, while the read side can be tailored for efficient queries (perhaps even using different data schemas or technologies to serve queries faster).
In an event-driven architecture, CQRS often goes hand-in-hand with event propagation. Typically, the write side will publish events (using patterns like outbox + CDC) whenever it changes state, and the read side will listen to those events to update its own data store (which could be a read-optimized database, cache, search index, etc.). This way, the read store remains in sync with the latest changes, but it might have a different structure (e.g., denormalized for fast retrieval, precomputed views, etc.).
Using SQL databases in CQRS:
- The write model could be a traditional relational database (e.g., an orders service using PostgreSQL to store orders).
- The read model could also be one or more relational databases containing materialized views or denormalized tables optimized for queries – or it could be something like Elasticsearch, depending on needs. In the simplest CQRS, the read model might even be just a read replica of the primary database, but more advanced scenarios might use a different schema or DB technology entirely.
When both the command side and query side are using SQL databases, events are the logical way to keep them in sync. For instance, if the orders service (write side) commits a new order and puts an event “OrderCreated” in its outbox, a listener on the query side (could be a separate service or a database trigger in the read DB) would catch that event and insert a corresponding entry in a reporting table or update a pre-aggregated summary in the read database. Because the systems are decoupled by events, the read side could even be in a different physical database or a different type of database.
The challenge here is similar to earlier: you want to ensure that the event publishing is reliable, because if the read model misses an event, it gets out of sync. That’s why CQRS often leverages the outbox/CDC approach on the write side as well. The Microsoft Azure Architecture guidance notes that when using separate stores, a common pattern is for the write model to publish events that the read model uses to update itself. It also warns that since you can’t have a single transaction spanning the write DB and the message broker, you need to handle the possibility of inconsistency (hence patterns like Outbox to mitigate it) and use idempotent processing on the read side – again reflecting the importance of our earlier discussion.
Practical example: Suppose we have a customer service (with its own DB) and a customer reporting service (with a separate DB optimized for analytics queries). Under CQRS:
- The customer service (write side) handles updates to customer info. On each change, it writes to its PostgreSQL and also inserts an outbox event like “CustomerUpdated” with the new details.
- A CDC process (Debezium or similar) picks the “CustomerUpdated” event and publishes it to, say, a Kafka topic.
- The reporting service (read side) subscribes to that topic. When it receives “CustomerUpdated”, it updates its own read database (which might have a customer profile table, or multiple tables combining customer data with other stats) accordingly.
This way, queries to the reporting service hit a database that is designed for fast reads (maybe an OLAP schema or simply indexes tailored for frequent access patterns), without bogging down the primary write database. And thanks to the event-driven update, the read store is updated almost in real-time after each write on the primary side.
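A sketch of what that read-side listener might look like, assuming Kafka as the event bus, the kafka-python client, and hypothetical topic, event, and table names. The upsert keeps the handler idempotent, and the Kafka offset is committed only after the read model has been updated.

```python
import json
import psycopg2
from kafka import KafkaConsumer   # kafka-python

consumer = KafkaConsumer(
    "customer-events",                 # topic fed by the write side's outbox/CDC pipeline
    bootstrap_servers="kafka:9092",
    group_id="customer-reporting",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    enable_auto_commit=False,
)
read_db = psycopg2.connect("dbname=reporting user=report_svc password=secret host=report-db")

for msg in consumer:
    event = msg.value
    if event.get("type") == "CustomerUpdated":
        with read_db, read_db.cursor() as cur:
            # Upsert into the denormalized read table; replaying the event is harmless.
            cur.execute(
                """
                INSERT INTO customer_profile (customer_id, name, email, updated_at)
                VALUES (%s, %s, %s, now())
                ON CONFLICT (customer_id) DO UPDATE
                   SET name = EXCLUDED.name, email = EXCLUDED.email, updated_at = now()
                """,
                (event["customerId"], event["name"], event["email"]),
            )
    consumer.commit()   # acknowledge the offset only after the read model is in sync
```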
For architects and CTOs, the key appeal of CQRS in an event-driven system is scalability and performance: you can scale out your read replicas or even choose different database tech for the read side (for example, use a SQL database for writes but push events to NoSQL or a search index for reads). It also helps in security and data segregation – you can expose a read database to many querying clients (or even to external partners) without giving them access to the write database.
However, CQRS does introduce eventual consistency – the read model will lag slightly behind the write model. In practice, if the lag is small (say, sub-second or a few seconds), many applications find this acceptable. The pattern works best when the domain can tolerate that slight delay. Materialized views and cache refreshes were older ways to address the read/write split, but CQRS with event-driven updates is often more flexible and scalable.
It’s important to mention that CQRS does not require event sourcing (where the primary source of truth is an event log). Here, we are still using a traditional database for primary records. We are simply using events as a means to replicate state changes to other interested parties (like the read store). In a sense, it’s like one-way sync via events. Because we ensure reliability through patterns like outbox, we can trust that the read store will eventually get all the events in order.
In summary, CQRS and event-driven patterns complement each other. SQL databases on the write side ensure strong consistency and transactional safety for critical data, while a decoupled read side (which might also use SQL or other stores) can serve queries efficiently. The glue between them is the event stream that carries changes from one to the other. Using relational databases doesn’t preclude CQRS – in fact, many CQRS implementations use relational DBs for the command side and sometimes for the query side too. The main thing is to design the event flow carefully and ensure that no events are lost (again, the Outbox + CDC combo is a proven way to do that in a SQL-based system).
Managed Cloud Services Supporting Event-Driven SQL
All major cloud providers have recognized the importance of integrating databases with streaming and event-driven architectures. Whether you’re on Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure, there are services and features to help get data out of relational databases and into event streams with minimal fuss. Let’s look at each in turn:
AWS: CDC and Events with RDS/Aurora
AWS offers managed relational databases via Amazon RDS (for MySQL, PostgreSQL, SQL Server, etc.) and Amazon Aurora (a cloud-optimized version of MySQL/PostgreSQL). In an AWS environment, you typically have a few options to capture and stream database changes:
- AWS Database Migration Service (DMS): Despite its name, AWS DMS isn’t just for one-time migrations – it can be configured for ongoing replication using CDC. It supports sources including RDS MySQL, RDS PostgreSQL, Aurora MySQL/Postgres, and more. DMS can read the binary log or WAL from these databases (with minimal setup – e.g., enabling binlog and creating a replication user) and then stream the changes to various targets. Notably, AWS DMS can stream directly to Amazon Kinesis Data Streams. This means you can set up a pipeline where, say, every insert/update in your Aurora MySQL is captured by DMS and pushed to a Kinesis stream in real time. Once in Kinesis, a whole world of AWS services can consume the data: AWS Lambda functions can be triggered by stream events, Kinesis Data Firehose can batch them to S3 or Redshift, Kinesis Data Analytics (now Amazon Managed Service for Apache Flink) can process them, and so on. Essentially, DMS acting as a CDC capture tool decouples your database from the consumers. An example use case: an e-commerce site on AWS can use DMS to stream new orders from its RDS database into Kinesis, then Lambdas pick up those events to update caches, send notifications, or trigger other workflows – all without polling the DB (see the Lambda sketch after this list).
- Amazon MSK (Managed Streaming for Kafka) and MSK Connect: AWS offers Amazon MSK to run Apache Kafka as a managed service. A common pattern is to use Debezium connectors on MSK Connect to capture database changes. For instance, AWS has published examples of using the Debezium MySQL connector with Amazon Aurora MySQL. You would enable binlog on Aurora (through the parameter group, as needed) and then MSK Connect (which is Kafka Connect in AWS) can run Debezium and pipe data into Kafka topics. Once in Kafka, you handle events as you would in any Kafka-based architecture (with consumers, stream processors, etc.). The advantage is you’re using open-source connectors but in a managed environment. This approach might be chosen if you already are invested in Kafka as your event bus.
- Aurora Database Activity Streams (advanced): Aurora (especially Aurora PostgreSQL) has a feature called Database Activity Streams that can push a stream of database activity to services like Kinesis. However, this is more oriented toward security monitoring and auditing (it streams every statement for auditing) rather than data integration. For CDC purposes, DMS or Debezium are more targeted solutions.
- Triggers to EventBridge or Lambda: AWS doesn’t natively support triggers that call Lambda from RDS (unlike, say, DynamoDB, which has streams that trigger Lambda). A few creative solutions exist – such as polling from a Lambda function, or Aurora MySQL’s ability to invoke a Lambda from a stored procedure or trigger – but these are somewhat limited and synchronous. In general, people lean on the log-based methods above for reliability.
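As a concrete example of the consumer end of the DMS-to-Kinesis pipeline described above, here is a minimal Lambda handler sketch. The data/metadata layout shown reflects DMS’s default JSON message format for Kinesis targets as commonly described; verify the exact field names against your DMS task settings, and note that the table name used here is hypothetical.

```python
import base64
import json

def lambda_handler(event, context):
    """Triggered by the Kinesis stream that AWS DMS writes CDC records into."""
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        metadata = payload.get("metadata", {})
        operation = metadata.get("operation")       # e.g. insert / update / delete
        table = metadata.get("table-name")
        row = payload.get("data", {})
        if table == "orders" and operation == "insert":
            # React to the change: refresh a cache, send a notification, call a service...
            print(f"New order event: {row}")
    return {"processed": len(event["Records"])}
```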
AWS’s strength is that it has a robust ecosystem for the consumers of the events. For example, after using DMS to put changes in Kinesis, you can fan out those changes to numerous real-time consumers (serverless or otherwise) and analytics destinations. This means an architecture can use a classic relational database as the source-of-truth and still enjoy the benefits of stream processing and serverless reactions. AWS even has whitepapers and blog posts emphasizing decoupling applications by streaming changes instead of direct DB calls.
From a cost/management perspective, AWS DMS is a managed service, but one must provision replication instances for it, which adds cost. If you have modest throughput, a small DMS instance suffices; for very high volume, you scale it up. Alternatively, MSK is a bigger investment if you’re not already using Kafka. The good news is that enabling CDC on AWS relational databases is fairly straightforward (e.g., just enabling binlog on RDS MySQL as shown earlier and ensuring your user has replication privileges). This lowers the barrier for organizations to adopt an event-driven approach without switching databases.
Google Cloud: Datastream and Pub/Sub for Database Changes
On Google Cloud, the managed relational service is Cloud SQL (for MySQL, PostgreSQL, etc.), and Google also offers AlloyDB (a Postgres-compatible, high-performance database). GCP provides a service called Google Cloud Datastream for CDC. Datastream is a serverless change data capture service that supports sources such as MySQL and PostgreSQL (including Cloud SQL, AlloyDB, and even RDS/Aurora instances). It was initially launched for streaming data into BigQuery and Cloud Storage, and its output can also feed Dataflow pipelines (and from there Cloud Pub/Sub or other sinks) for custom processing.
With Cloud SQL for PostgreSQL, to use Datastream or any logical decoding, you set the cloudsql.logical_decoding flag on your instance. Under the hood, Datastream will create a logical replication slot and a publication on the database. You, as the user, configure what tables you want to replicate (it can be all tables or a selection). Datastream takes care of reliability – it uses the WAL to read changes (without locking), and if network issues occur, it resumes from the last known WAL position using the replication slot. This ensures ordered, at-least-once delivery into the Datastream pipeline.
Google Cloud also emphasizes integration with its messaging/streaming systems:
- You can forward the change stream to Cloud Pub/Sub (Google’s global event bus), for example via a Dataflow pipeline or a Debezium-based connector. This allows any number of subscribers to get the events. For example, a Pub/Sub topic “customer-changes” could receive all changes from a Cloud SQL Postgres customer table; Cloud Functions or microservices can then subscribe to that topic to handle events (a subscriber sketch follows this list).
- Or you can use Dataflow (Apache Beam) to process the stream in flight – perhaps to filter, transform, or join with other streams – and then write out to storage or Pub/Sub or another sink.
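For completeness, here is what a Pub/Sub subscriber for such change events might look like, using the google-cloud-pubsub client. The project, subscription, and message shape are assumptions – the actual payload depends on whatever produced the events into the topic.

```python
import json
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "customer-changes-sub")

def callback(message):
    change = json.loads(message.data.decode("utf-8"))
    print("Change event:", change)
    # ... update a cache, search index, or read model here ...
    message.ack()                      # acknowledge only after successful processing

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
try:
    streaming_pull.result()            # block and dispatch messages until interrupted
except KeyboardInterrupt:
    streaming_pull.cancel()
```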
There is also a newer service, Cloud Spanner Change Streams, but Spanner is a different type of database (NewSQL). For Cloud SQL, Datastream is the go-to solution for CDC.
Additionally, Debezium is well supported on Google Cloud. One can run Kafka Connect with Debezium on GKE (Kubernetes) or use Confluent Cloud, which is available on GCP. Some organizations use Cloud Pub/Sub as a Kafka alternative – interestingly, there are open-source options (such as Debezium Server sinks) that allow Debezium to publish events to Pub/Sub instead of Kafka if needed.
In short, GCP’s approach is to provide Datastream as an easy on-ramp. If a CTO asks “How do I get data changes from my Cloud SQL into my data warehouse or into my event system?”, GCP’s answer is “use Datastream with minimal configuration – we’ll handle the heavy lifting of CDC.” The service is serverless and scales under the hood. This lowers maintenance overhead compared to self-managing Debezium.
For MySQL on Cloud SQL, the concept is similar: enable binlog, and Datastream will tail it. Google even supports Oracle CDC via Datastream. So they intend it as a universal change capture for heterogeneous databases.
Azure: Change Data Capture in Azure Databases and Event Grid
Microsoft Azure, with its heritage in enterprise databases, also offers solutions for CDC:
- Azure SQL Server/Azure SQL Database: If using Azure’s managed SQL Server, Microsoft provides a built-in CDC feature (similar to SQL Server CDC on-prem). However, since our focus is MySQL/Postgres, we’ll skip SQL Server details.
- Azure Database for MySQL and PostgreSQL: Azure’s flexible server offerings support logical replication (for Postgres) and binary logging (for MySQL). As we saw, you can read MySQL binlogs on Azure to feed Debezium or other tools. Azure has even published a tutorial using Debezium and Kafka Connect with Azure Event Hubs to capture MySQL changes. Azure Event Hubs is Azure’s big-data streaming platform (akin to Kafka as a service – in fact it exposes a Kafka-compatible interface). So an architecture on Azure could be: Azure Database for MySQL -> Debezium connector -> Azure Event Hubs (Kafka topic) -> consumers (perhaps Azure Functions or other microservices); see the consumer sketch after this list.
- Azure Data Factory/Synapse: For more data integration and ETL scenarios, Azure Data Factory supports change data capture mappings. But if we focus on event-driven microservices rather than BI ETL, one would more likely use Event Hubs or Service Bus to distribute events.
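To show how little changes for consumers when Event Hubs stands in for Kafka, here is a sketch of a standard Kafka client reading Debezium change events from an Event Hubs namespace over its Kafka-compatible endpoint. The namespace, topic, and connection string are placeholders, and kafka-python is an assumed client choice.

```python
import json
from kafka import KafkaConsumer   # kafka-python; any Kafka client works the same way

consumer = KafkaConsumer(
    "mysql.inventory.orders",      # hypothetical topic written by the Debezium connector
    bootstrap_servers="my-namespace.servicebus.windows.net:9093",
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="$ConnectionString",      # literal string, per Event Hubs' Kafka auth
    sasl_plain_password="Endpoint=sb://my-namespace.servicebus.windows.net/;...",  # namespace connection string
    group_id="order-consumers",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")) if b else None,
)

for msg in consumer:
    print("Change event from Azure Database for MySQL:", msg.value)
```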
One interesting Azure service is Azure Event Grid. Event Grid is a fully managed event routing service, often used for events from Azure resources (like blob created, etc.), but it also can handle custom events. Azure doesn’t automatically put database changes on Event Grid – you’d still need something to raise those events. However, one could imagine using an Azure Function that listens to Debezium output and then pushes to Event Grid or Service Bus if needed.
Azure’s guidance for Postgres (Flexible Server) suggests enabling wal_level=logical and using pglogical or native logical replication. Pglogical (an extension) is supported on Azure Postgres and can replicate data to another Postgres instance or other subscribers, but for event-driven integration one would typically not choose pglogical (which is geared toward DB-to-DB replication). Instead, capturing the WAL via Debezium or a similar connector is common.
Azure Event Hubs also exposes a Kafka-compatible endpoint, so a Kafka Connect cluster running Debezium can produce directly into it. Alternatively, Microsoft Fabric’s real-time capabilities (in preview at the time of writing) are documented as being able to ingest CDC from databases, with references to PostgreSQL CDC as a source – though that is more on the analytics side.
From a business perspective, Azure caters a lot to enterprise use cases, where you might have a mixture of legacy and new systems. The ability to use familiar SQL databases and still get real-time integration is valuable. For example, a financial application might keep using Azure Database for PostgreSQL for core transactions, but use CDC to publish events to a .NET microservices ecosystem via Event Hubs. This avoids dramatic re-architecture or adoption of new databases just to get event streams. It also means leveraging existing skills (T-SQL or PL/pgSQL knowledge, etc.) while modernizing the integration approach.
In summary, all cloud providers offer managed pathways to incorporate SQL databases into event-driven pipelines:
- AWS emphasizes DMS and MSK (Kafka) for CDC into streaming services.
- GCP provides Datastream and Pub/Sub for serverless change capture and distribution.
- Azure leverages its database features and Event Hubs to accomplish similar goals.
Each of these reduces operational burden: instead of running your own change capture infrastructure 24/7, you can offload some of it to the cloud service. That said, many advanced users still opt for open-source Debezium on Kubernetes for full control or use Kafka Connect for complex transformations. The good news is, whether you choose cloud-native services or open source, the fundamental capability is there: SQL changes can be turned into event streams in the cloud.
Real-World Case Studies: SQL in Event-Driven Action
To solidify our understanding, let’s look at a couple of real-world scenarios and how companies applied these patterns:
Case Study 1: Streaming Giant – Netflix
Netflix is often cited as a pioneer in using data streams. They have a complex microservices ecosystem and need to propagate data changes instantly for personalized recommendations, search indexing, and operational visibility. Netflix uses MySQL and Postgres in various parts of their system. To avoid slow nightly batch updates, Netflix engineers built a change capture system named DBLog.
What DBLog does: It taps into the database transaction logs (similar to Debezium’s approach) and captures every data change (CDC) in MySQL and Postgres, then streams those changes into their real-time data pipeline. For example, if a new show is added to the catalog (insert in a SQL DB), DBLog streams that event so the search index service can immediately index it, and the recommendation service can adjust what it suggests to users. Netflix’s requirements were stringent – they wanted sub-second data integration and zero impact on the source DB performance. By using log-based CDC, they achieved real-time sync without locking the databases or slowing transactions.
Netflix also pushes database changes to Elasticsearch (for their user-facing search) and to various analytics systems for monitoring playback quality and more. The success of this approach is evident: despite handling massive volumes of data and millions of users, Netflix can keep their systems tightly in sync in real time, all while continuing to use SQL databases as a core storage layer. This shows the viability of combining SQL and EDA at extreme scale.
Case Study 2: Marketplace Platform – Airbnb
Airbnb operates a global home-rental marketplace with a need for strong consistency (no double-booking a room) and real-time updates (listing availability, pricing adjustments, etc.). They primarily used MySQL as their system of record in early days. To evolve into a real-time data company, Airbnb developed SpinalTap, a CDC pipeline.
SpinalTap’s role: It captures data mutations from MySQL (and other stores like DynamoDB) and feeds them into Apache Kafka. Once in Kafka, those events drive multiple consumers: a search service updates availability caches, a pricing service recalculates trends, etc. For example, when a host updates the price of a listing (a row update in MySQL), SpinalTap ensures an “UpdateListing” event is emitted so that all interested services get the new price. This avoids each service querying the database repeatedly; instead they react to the event.
Airbnb’s CDC platform had to be reliable and ordered (they guarantee that events from a MySQL binlog come out in commit order to Kafka). They also designed it to be general-purpose, handling schema changes and failures gracefully. The reason Airbnb invested in this was because they recognized that customer experience (CX) depends on timely, consistent data. A guest shouldn’t see a room as available if it was booked an instant ago – thus the booking service’s DB change triggers an event that the availability service uses to update its store near-instantly.
Airbnb’s story demonstrates how an event-driven approach with SQL at the core can scale to a globally distributed system. By using SQL + CDC + Kafka, they achieved sub-second integration between services, something not possible with traditional nightly ETLs or synchronous RPC calls at that scale. It also allowed them to isolate services – e.g., the search index doesn’t directly query the bookings database (which could be a performance bottleneck); it simply updates via events. This isolation improves reliability and decouples deploy cycles of services.
Case Study 3: Ride Sharing – Uber
Uber’s business – rides and food delivery – is extremely time-sensitive. They use a mix of datastores, including relational ones. Uber engineers created their own change capture tooling (described mostly in tech talks rather than under a single public name; related data-pipeline components Uber has built include uReplicator). Uber’s CDC system was designed for sub-second latency and fault tolerance with zero data loss.
For instance, if a driver’s status changes to unavailable (a SQL update), that event needs to propagate to the dispatch system immediately to avoid sending new ride requests to that driver. By capturing such changes from their databases and streaming them, Uber ensures all services have a consistent view of the world in real time. They mention the need to not bog down the source DB – their CDC reads from logs asynchronously.
Uber’s platform is an example of home-grown solutions proving the concept, which has since been made easier by open-source tools. Not every company will build from scratch like Netflix or Uber did, because now projects like Debezium exist. But these cases proved that even at hyper-scale, using SQL as a source of truth and streaming changes out of it is a robust architectural choice.
Case Study 4: Storyblocks (Small Company Modernization)
On a smaller scale, consider Storyblocks, a media service (as documented in a Confluent case study). They started with a monolithic MySQL database and moved towards microservices. Instead of abandoning MySQL, they embraced an event-driven approach to synchronize data. They set up Apache Kafka with Kafka Connect and Debezium to capture changes from their MySQL monolith and feed them to microservices via Kafka topics. This allowed them to gradually carve out microservices (each with its own database or cache) fed by the stream of changes from the central MySQL. The outcome was reduced direct coupling – services didn’t call the monolith’s DB directly anymore, they subscribed to events. Over time, this pattern significantly reduced load on the primary DB and improved the system’s scalability and fault isolation (according to their blog, they reduced REST API call volume and converted many interactions to async events).
This example speaks to many mid-sized companies: you can “have your cake and eat it too” – keep the reliability of an ACID SQL database for core data, and still achieve the responsiveness and decoupling of EDA by layering CDC and messaging on top.
Open-Source and Community
Beyond companies, it’s worth noting the open-source ecosystem that has grown around these ideas. Debezium (affiliated with Red Hat) has become a popular CDC platform; it implements the patterns we discussed (including an outbox event router) and supports MySQL, Postgres, MongoDB, and more. Many community tutorials show how to use Debezium with Kafka Connect to stream outbox events. There’s also Maxwell’s Daemon (focused on MySQL binlog to JSON), used in some setups for simpler needs. These tools mean that even if you’re not on a cloud with a dedicated CDC service, you can implement a reliable pipeline with relatively little effort. The community has also put forth best practices on schema evolution (ensuring your event consumers handle DB schema changes), monitoring (making sure your CDC connector lag is low), and error handling (e.g., what if an event fails to send – how to alert and replay). Adopting SQL in EDA is no longer an exotic approach – it’s quite mainstream.
Best Practices and Considerations
Bringing SQL databases into an event-driven design does require mindful implementation. Here is a summary of best practices to ensure success:
- Enable and Tune Logging: Make sure to enable binary logging (MySQL) or logical WAL (Postgres) with appropriate settings (e.g., ROW mode for MySQL binlog). Monitor the log generation rate and retention. For Postgres, cap how much WAL a lagging replication slot can retain (e.g., via max_slot_wal_keep_size) so that a slow consumer doesn’t fill the disk. On MySQL, watch the binlog retention (purge) period – ensure it’s long enough that a brief outage of your CDC consumer won’t lose data.
- Use a Dedicated Replication User: Whether using Debezium or DMS, use a separate DB user with replication privileges. Follow least privilege – e.g., Debezium’s docs specify exactly what privileges are needed. This avoids any interference with application users.
- Design Idempotent Consumers: As noted, consumers of events should handle duplicates gracefully. A simple way is to include a unique event ID (e.g., primary key or a UUID from the outbox) that consumers track (see the sketch after this list). Many message processing frameworks (like Kafka Streams, Spark, etc.) have facilities for exactly-once or idempotent processing. This ensures if an event is re-delivered, it doesn’t cause a business error (like charging twice).
- Manage the Outbox Table: If using the outbox pattern, implement a routine to mark events as published (and perhaps move them to an archive table or delete after a certain period). You might keep, say, a week of events for debugging/auditing and then purge. Proper indexing on the outbox table (e.g., on “status” or “created_at”) will help the relay query efficiently for new events.
- Order and Partitioning: If using Kafka as the event bus, decide on partition keys wisely (e.g., maybe partition by aggregate ID to ensure ordering per entity). The outbox event should contain enough info to route events so that ordering semantics meet your needs.
- Monitor Lag: Treat your CDC pipeline as a critical piece of infra. Monitor the lag between the database commit and the event consumption. Both Debezium and cloud services expose metrics for this (e.g., Debezium exposes a “milliseconds behind source” metric). If lag starts increasing, you might need to scale up the CDC connector or investigate bottlenecks. Keeping the lag low ensures your data stays fresh in the downstream systems.
- Schema Evolution Strategy: When the database schema changes (like adding a column), think about how it affects events. Many CDC tools will include the new field in events once it appears. If a consumer service doesn’t yet know about the field, it might ignore it (which is usually fine). Just ensure you coordinate major changes – e.g., removing a column should be done with awareness that some events might still carry that field for a while. Using a format like Avro with schema registry for Kafka events can help manage schema evolution.
- Initial Snapshots vs. Ongoing Changes: Remember that CDC typically streams changes after an initial baseline. For new systems, you might snapshot existing data (Debezium can take a snapshot of existing rows). Ensure consumers handle the initial load appropriately (e.g., if they receive a bunch of “initial state” events). Some pipelines separate a one-time bulk sync from ongoing changes.
- Security and Compliance: Streaming sensitive data from a database means you should consider encryption in transit (TLS on the connectors), and fine-grained access control on who can subscribe to the events. Also, if data needs to be masked or filtered (e.g., don’t stream PII to all consumers), you may implement filtering logic either in the CDC tool (some allow column masking) or in the consumer code.
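Tying the idempotency advice above to something concrete, here is a small sketch of duplicate suppression on the consumer side: each processed event ID is recorded in the consumer’s own database in the same transaction as the event’s effect, so a redelivered event becomes a no-op. The table and function names are hypothetical.

```python
def handle_event(conn, event_id, apply_change):
    """Process an event at most once per consumer database.

    `conn` is a psycopg2 connection to the consumer's own database; `apply_change`
    is a callable that applies the event's business effect using the given cursor.
    """
    with conn:                                    # dedup check + effect in one transaction
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO processed_events (event_id) VALUES (%s) "
                "ON CONFLICT (event_id) DO NOTHING",
                (event_id,),
            )
            if cur.rowcount == 0:
                return False                      # duplicate delivery: ignore it
            apply_change(cur)
            return True
```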
By adhering to these practices, companies have run stable event-driven systems on top of SQL for years. The operational maturity of relational databases combined with these patterns gives a solid foundation.
Conclusion: SQL Databases Can Keep Up with Event-Driven Architectures
In conclusion, the notion that you must abandon relational databases to achieve an event-driven, real-time architecture is a myth. MySQL and PostgreSQL, when used with modern techniques like change data capture and the outbox pattern, are fully capable of acting as publishers of events and sources of truth simultaneously. They offer the best of both worlds: the strong consistency and familiarity of SQL for transactional operations, and seamless integration into asynchronous, scalable pipelines for distributing changes.
When best practices are applied – from enabling the right DB features to ensuring reliable event delivery – a SQL-based system can support high-throughput event streams. We’ve seen how industry leaders like Netflix, Airbnb, and Uber harnessed their existing databases for real-time streaming at massive scale. But even at smaller scales, the same patterns empower organizations to decouple services, improve responsiveness, and avoid dual-writes or data loss.
For architects and CTOs, this means you don’t necessarily need to introduce a completely new datastore (and the complexity that comes with it) to implement EDA. You can extend your current relational infrastructure with CDC and messaging components to meet many real-time requirements. The cloud providers are making it even easier, offering managed CDC and streaming services to reduce operational overhead.
To directly answer “Can SQL keep up?” – Yes, it can. A well-tuned MySQL or PostgreSQL, augmented with CDC or outbox mechanisms, can publish events within milliseconds of a transaction commit. The viability has been proven across industries: finance, e-commerce, media, you name it. The key is to respect the constraints (e.g., eventual consistency on the consumer side) and design the event flow carefully.
In an era of event-driven design, SQL databases are not a hindrance but rather a powerful asset when used wisely. By leveraging their robustness and combining it with the flexibility of event streams, organizations can achieve systems that are both reliable and reactive. Event-driven architectures and SQL databases are not mutually exclusive – with the right architecture, they form a compelling duo for building the next generation of scalable, real-time applications.