Microservices — Transaction Management Patterns

Photo by Denise Johnson on Unsplash
  • Redo log: allows a dbms to redo a transaction. consists of files that store all changes made to the database as they occur. Every instance of an Oracle Database has an associated redo log to protect the database in case of an instance failure.

Two Phase Commit (2PC) ❌

Assumptions:
- One process is “coordinator” (C), the rest are “cohort”s (H).
- Coordinator can be initiator (or not).
- Storage is stable, no machine crashes.
- Write-ahead log exists at each site.

- not appropriate for microservices ❌
- if C fails, it's a single point of failure. 'Availability' impacted.
- depends on slowest response time, locks everywhere
- not supported by NoSQL dbs & msg brokers

Saga Pattern ✔️

  • Single independent unit of work
  • Every microservice should expose 1. participate 2. rollback (compensation) methods. (for n transaction -> n compensation)
  • If one in the chain fails, all the previously occurred ones’ rollbacks should be called.
  • Compensation service must be idempotent (cannot abort) and must be retried until success.
  1. Choreography Based:
    Each local trxn publishes domain events that trigger local trxns in other services. Preferred over orchestration-based.
- SAGA -> ACD ✔️ I ❌ : does not guarantee Isolation
Countermeasures:
* Versioning data (check version before update)
* Re-read data (before modifying)
* Semantic locking (lock flag preventing other trxns from accessing it)

CQRS (Command and Query Responsibility Segregation) Pattern

  • Queries (read)
  • Commands (update)
CQRS Implementation — https://betterprogramming.pub/cqrs-software-architecture-pattern-the-good-the-bad-and-the-ugly-e9d6e7a34daf
- complex
- database dependent, requires expertise in a variety of database technologies.
- using a large number of databases means more points of failure.
- need to handle data consistency in terms of CAP

Event Sourcing

  • All changes to app state are stored on a sequence of events (journal of domain events) (easy to roll back). (e.g. Axon)
  • Commonly combined with the CQRS pattern
  • Defines an approach where all the changes that are made to an object or entity (also known as the write model), are stored as a sequence of immutable events to an event store, as opposed to storing just the current state.
- suitable for event-driven architecture 
- makes it possible to reliably publish events whenever state changes
- audit log, easy to get state at a given time
- difficult to query event store (inefficient complex queries)

Transactional Outbox Pattern

microservices -> database transaction + publish events => should be atomic.
What happens in case of system failure? => Solution: outbox table (since this is also a database table)
There may be a scanning batch job to check if any such record exists (e.g column “status=PROCESSED”), read from there and then publish related event.

Change Data Capture (CDC)

  • It captures change events from a database log (or equivalent) (allowing application state to be externalized), forwards to consumers (e.g. Debezium).
  • Types: Change Columns (such as DATE_MODIFIED) (not good at capturing deletes), Trigger-Based (performance issues), Log-Based (most preferred)

Log-Based Change Data Capture

  • Directly integrated with database
    MySQL, Postgres, etc. -> transaction log (write-ahead log) ->
    Debezium -> Kafka Queues
Comparison of Attributes — https://debezium.io/blog/2020/02/10/event-sourcing-vs-cdc/
  • All your events and database writes must be idempotent (as in every microservice regardless of the pattern you are using).
Listen to Yourself Pattern — https://medium.com/@odedia/listen-to-yourself-design-pattern-for-event-driven-microservices-16f97e3ed066
- The risk of consistency is more since db write occurs later. 

Dead Letter Queue Pattern

  • Data (in the form of a message from a queue) cannot be processed due to:
    - the lack of access to some component part of the system
    - the input data is invalid or corrupt.
  • Solution => move the message to a “dead-letter queue”
    design your system in a way that it not only limits the number of re-processing attempts, but also postpones the re-processing of failed messages (parking-lot queue) so as to allow other (potentially not accessible) parts of the system to become available.
Parking Lot and Dead Letter Queues — https://www.ibm.com/cloud/architecture/architectures/event-driven-deadletter-queue-pattern/

Distributed System Models

Capture assumptions in a system model consisting of:

  • Node behaviour (e.g. crashes)
  • Timing behaviour (e.g. latency)

The Two Generals Problem

https://www.cl.cam.ac.uk/teaching/2122/ConcDisSys/dist-sys-notes.pdf
  • Both parts cannot know if their msgs are received without receiving an answer from the other. => This results in an infinite chain of msgs to check certainity.
https://www.cl.cam.ac.uk/teaching/2122/ConcDisSys/dist-sys-notes.pdf
https://www.cl.cam.ac.uk/teaching/2122/ConcDisSys/dist-sys-notes.pdf
  • There are enough goods in the shop but the shop did not get any information about the payment’s outcome => ask payment service if it did actually charge that card.
  • With the steps above, online shopping avoids having two generals problem.

The Byzantine Generals Problem

https://www.cl.cam.ac.uk/teaching/2122/ConcDisSys/dist-sys-notes.pdf
https://www.cl.cam.ac.uk/teaching/2122/ConcDisSys/dist-sys-notes.pdf
  • Honest generals don’t know who the malicious ones are.
  • The malicious generals may collude.
  • Nevertheless, honest generals must agree on plan.
https://www.cl.cam.ac.uk/teaching/2122/ConcDisSys/dist-sys-notes.pdf

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Nil Seri

Nil Seri

I would love to change the world, but they won’t give me the source code | coding 👩🏼‍💻 | coffee ☕️ | jazz 🎷 | anime 🐲 | books 📚 | drawing 🎨