FPX, DuitNow, and why OAuth alone won't save your Malaysian fintech

OAuth gets you past the login screen. Idempotency, mandate state machines, and settlement-file reconciliation are what keep production from burning.

A subscription business in KL wires up a local payment gateway, adds OAuth-backed login on the customer side, and ships. For weeks, everything looks fine. Then a Sunday night AutoDebit charge hits insufficient funds. The gateway marks it failed. The retry job runs Monday morning before the failure webhook arrives. By the time the webhook lands, the retry has gone through, and the webhook handler, seeing only a failed charge, queues another attempt. The customer is charged twice. Support tickets open, refunds go out manually, and the team spends Tuesday tracing which transaction record is canonical. This is not an edge case. It is a scenario payments teams across KL have seen, and it surfaces reliably once a subscription base is large enough that some AutoDebit attempts will fail over the weekend. OAuth and a gateway SDK got you to production. They did not get you through that Monday morning.

Why the Stripe mental model breaks

The Malaysian payments stack is not Stripe wearing a local jersey. The architectural differences are significant enough that building on Stripe-tutorial patterns is actively dangerous here.

FPX is redirect-based. The customer authenticates at their bank portal, and the bank fires a webhook back to your endpoint with the result. The trust anchor is the bank, not your server. You never touch card data. The failure modes follow from this: network interruptions mid-redirect, browser-close events during authentication, duplicate webhook deliveries when the bank retries on timeout. Stripe-style charge-and-confirm logic does not map to this flow.

DuitNow AutoDebit has mandate semantics that Stripe Subscriptions hides. A mandate is a long-lived authorization object: the customer grants recurring debit rights, and that grant can be revoked, suspended, or expired. Each transition produces a webhook. Webhooks can arrive out of order. Treating AutoDebit like a Stripe Subscription misses the full state surface.

Bank Negara Malaysia sits in the regulatory loop in a way that has no clean US-market analogue for a startup of equivalent size. Transaction logging retention periods, PDPA obligations, and reporting thresholds apply from the first production transaction. Routing through a licensed gateway does not transfer those obligations if your system is still touching or storing the data.

Five patterns every Malaysian payments backend needs

Pattern 1: Idempotency everywhere, not just at intent creation

Adding an idempotency key when creating a payment intent is table stakes. It is also insufficient on its own, because the retry surface extends well past intent creation.

Consider the FPX redirect flow. Your callback endpoint processes the bank result and fires an event to an internal service that updates the order status. If that call times out, the callback handler retries. If the internal service had already processed the first delivery, or was still processing it when the retry arrived, you have two state-update events for the same transaction. Without idempotency guards on the consumer, both attempt the same state transition, and any downstream effects (fulfillment, receipt email, accounting entry) fire twice.

Every endpoint that processes a payment event should accept an idempotency key, persist the key and response before returning, and return the cached response on duplicate delivery without reprocessing. This applies to internal service-to-service calls, not just gateway-facing endpoints. The scope extends through reconciliation: a deduplication check during T+1 settlement-file import catches any transaction credited by webhook that appears again in the file.
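The accept, persist, and replay rule above can be sketched in a few lines. This is a minimal illustration, not a production design: `IdempotencyStore`, `handleOnce`, and the `OrderUpdate` shape are hypothetical names, and an in-memory Map stands in for a durable table.

```typescript
// Hypothetical response shape for an internal order-status update.
interface OrderUpdate {
  orderId: string;
  status: 'PAID' | 'FAILED';
}

// Stand-in for a durable table keyed by idempotency key.
// In production this would be a database table with a unique constraint.
class IdempotencyStore<R> {
  private responses = new Map<string, R>();

  get(key: string): R | undefined {
    return this.responses.get(key);
  }

  put(key: string, response: R): void {
    this.responses.set(key, response);
  }
}

// Runs `work` at most once per key; duplicates get the stored response.
function handleOnce<R>(store: IdempotencyStore<R>, key: string, work: () => R): R {
  const cached = store.get(key);
  if (cached !== undefined) {
    return cached; // duplicate delivery: no reprocessing
  }
  const response = work();
  store.put(key, response); // persist before returning to the caller
  return response;
}

// Usage: the second delivery of the same event does not fire side effects again.
const store = new IdempotencyStore<OrderUpdate>();
let fulfillmentCount = 0;

const markPaid = (): OrderUpdate => {
  fulfillmentCount += 1; // stands in for fulfillment, receipt email, accounting entry
  return { orderId: 'order-1', status: 'PAID' };
};

const first = handleOnce(store, 'pi_123:fpx.callback:1', markPaid);
const second = handleOnce(store, 'pi_123:fpx.callback:1', markPaid);
```

The check-then-put here is only safe because the sketch is single-threaded. Against a real database, the lookup and the write need to be one atomic step, for the same reason given under Pattern 5.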

Structure keys so you can trace them. A key encoding the payment intent ID, event type, and sequence counter is debuggable. A raw UUID is an obstacle at 2am.
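One possible key layout, as a sketch. The `intentId:eventType:sequence` format, the `:` separator, and both function names are assumptions for illustration; any layout works as long as it parses back into the fields you will want at 2am.

```typescript
// Hypothetical key layout: <intentId>:<eventType>:<sequence>.
interface IdempotencyKeyParts {
  intentId: string;
  eventType: string;
  sequence: number;
}

function buildIdempotencyKey(parts: IdempotencyKeyParts): string {
  // Reject the separator inside fields so the key parses back unambiguously.
  if (parts.intentId.includes(':') || parts.eventType.includes(':')) {
    throw new Error('key fields must not contain ":"');
  }
  if (!Number.isInteger(parts.sequence) || parts.sequence < 0) {
    throw new Error('sequence must be a non-negative integer');
  }
  return `${parts.intentId}:${parts.eventType}:${parts.sequence}`;
}

function parseIdempotencyKey(key: string): IdempotencyKeyParts {
  const fields = key.split(':');
  const [intentId, eventType, rawSequence] = fields;
  if (
    fields.length !== 3 ||
    intentId === undefined ||
    eventType === undefined ||
    rawSequence === undefined
  ) {
    throw new Error(`malformed idempotency key: ${key}`);
  }
  return { intentId, eventType, sequence: Number(rawSequence) };
}

const exampleKey = buildIdempotencyKey({
  intentId: 'pi_123',
  eventType: 'fpx.callback',
  sequence: 2,
});
const roundTrip = parseIdempotencyKey(exampleKey);
```

Here `exampleKey` is `pi_123:fpx.callback:2`, which tells you the intent, the boundary, and the attempt without a database lookup.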

Pattern 2: Mandate lifecycle as a state machine

An AutoDebit mandate is not a boolean. It is not active or inactive. It has states, and those states have legal and operational meaning.

The lifecycle: a mandate starts pending after the customer completes the authorization flow. It moves to active once the bank confirms the standing authorization. From active, it can move to suspended (bank-initiated, usually due to an account issue), revoked (customer-initiated cancellation), or expired (mandate term ended). A suspended mandate can be reinstated to active. A revoked mandate is terminal. Attempting to debit a revoked mandate is an authorization failure, and more importantly a compliance problem: the customer has exercised their cancellation right.

The implementation consequence: you need an explicit state machine with guarded transitions, not a flag field with update logic scattered across handlers.

// Illustrative mandate state machine (pseudo-TypeScript)
type MandateState = 'PENDING' | 'ACTIVE' | 'SUSPENDED' | 'REVOKED' | 'EXPIRED'

const VALID_TRANSITIONS: Record<MandateState, MandateState[]> = {
  PENDING:   ['ACTIVE', 'REVOKED'],
  ACTIVE:    ['SUSPENDED', 'REVOKED', 'EXPIRED'],
  SUSPENDED: ['ACTIVE', 'REVOKED'],
  REVOKED:   [],  // terminal
  EXPIRED:   [],  // terminal
}

function applyMandateEvent(current: MandateState, next: MandateState): MandateState {
  if (!VALID_TRANSITIONS[current].includes(next)) {
    // Out-of-order or replayed webhook: log it for later reconciliation, keep the current state
    console.warn(`Discarded mandate transition ${current} -> ${next}`)
    return current
  }
  return next
}

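If a mandate.suspended webhook arrives before mandate.activated, the transition from PENDING to SUSPENDED is not in the valid table, so you discard and log. When mandate.activated arrives, you apply it. When a replay of mandate.suspended arrives later, the ACTIVE to SUSPENDED transition is valid, so you apply that too and end in the correct state. Note that this only converges if the discarded event comes back, either through a gateway redelivery or through your own reconciliation of logged discards. SUSPENDED is not terminal, so a later reinstatement can still move the mandate back to ACTIVE.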

Without explicit transition guards, out-of-order delivery silently corrupts mandate state. You discover it when a charge attempt fails against what your database records as active.
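The delivery sequence described above can be replayed as a small self-contained sketch. It restates the transition table so it runs on its own; `replayDeliveries`, the `discarded` log, and `canDebit` are hypothetical additions, not part of any gateway SDK.

```typescript
type MandateState = 'PENDING' | 'ACTIVE' | 'SUSPENDED' | 'REVOKED' | 'EXPIRED';

const VALID_TRANSITIONS: Record<MandateState, MandateState[]> = {
  PENDING: ['ACTIVE', 'REVOKED'],
  ACTIVE: ['SUSPENDED', 'REVOKED', 'EXPIRED'],
  SUSPENDED: ['ACTIVE', 'REVOKED'],
  REVOKED: [], // terminal
  EXPIRED: [], // terminal
};

interface ReplayResult {
  finalState: MandateState;
  discarded: string[]; // rejected transitions, kept for later reconciliation
}

// Applies webhook-declared states in arrival order, skipping invalid transitions.
function replayDeliveries(start: MandateState, deliveries: MandateState[]): ReplayResult {
  let current = start;
  const discarded: string[] = [];
  for (const next of deliveries) {
    if (VALID_TRANSITIONS[current].includes(next)) {
      current = next;
    } else {
      discarded.push(`${current} -> ${next}`);
    }
  }
  return { finalState: current, discarded };
}

// Only an active mandate may be debited; revoked, suspended, and expired may not.
function canDebit(state: MandateState): boolean {
  return state === 'ACTIVE';
}

// mandate.suspended arrives first, then mandate.activated, then a replay of mandate.suspended.
const outcome = replayDeliveries('PENDING', ['SUSPENDED', 'ACTIVE', 'SUSPENDED']);
```

Here `outcome.finalState` is `SUSPENDED` and `outcome.discarded` holds the single rejected `PENDING -> SUSPENDED` attempt. Calling `canDebit` before every charge attempt keeps the revocation rule in one place instead of scattered across handlers.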

Pattern 3: Reconcile against the settlement file, not just the webhook stream

Webhooks are the fast path for user experience. They are not the source of truth for accounting.

PayNet publishes T+1 settlement files covering the previous business day's transactions: every settled transaction, every reversal, every chargeback, with amounts, timestamps, and reference numbers as they appear in the clearing ledger. This file is the canonical record of what money moved and when.

Webhooks can be lost, delayed by hours during gateway maintenance, or delivered multiple times. A webhook stream running several months in production will have gaps, and you will not know about them unless you have something to compare against.

The design separates the two jobs. The webhook consumer is the fast path: when a success webhook arrives, update the order and trigger fulfillment so the customer gets their confirmation quickly. The settlement reconciler is the correctness path: a batch job that runs after the T+1 file is available, importing every transaction and comparing against the webhook-derived ledger. Discrepancies fall into a few categories: lost webhooks (file has the transaction, ledger does not), amount mismatches, and reversals or chargebacks that arrived in the file without a prior webhook. Each has a remediation path.
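A sketch of the comparison step in the reconciler, covering the three discrepancy categories named above. The row shapes, field names, and discrepancy labels are hypothetical; a real settlement file has its own layout, and parsing it is out of scope here.

```typescript
// Hypothetical row shapes; a real settlement file has its own layout.
interface SettlementRow {
  reference: string;
  amountSen: number; // integer sen, to avoid floating-point drift
  kind: 'SETTLED' | 'REVERSAL' | 'CHARGEBACK';
}

interface LedgerEntry {
  reference: string;
  amountSen: number;
  reversed: boolean; // true if a reversal or chargeback webhook was already applied
}

type Discrepancy =
  | { type: 'LOST_WEBHOOK'; reference: string }
  | { type: 'AMOUNT_MISMATCH'; reference: string; fileSen: number; ledgerSen: number }
  | { type: 'UNANNOUNCED_REVERSAL'; reference: string };

function reconcile(file: SettlementRow[], ledger: LedgerEntry[]): Discrepancy[] {
  const byReference = new Map<string, LedgerEntry>();
  for (const entry of ledger) {
    byReference.set(entry.reference, entry);
  }

  const discrepancies: Discrepancy[] = [];
  for (const row of file) {
    const entry = byReference.get(row.reference);
    if (row.kind === 'SETTLED') {
      if (entry === undefined) {
        // The file has the transaction, the webhook-derived ledger does not.
        discrepancies.push({ type: 'LOST_WEBHOOK', reference: row.reference });
      } else if (entry.amountSen !== row.amountSen) {
        discrepancies.push({
          type: 'AMOUNT_MISMATCH',
          reference: row.reference,
          fileSen: row.amountSen,
          ledgerSen: entry.amountSen,
        });
      }
    } else if (entry === undefined || !entry.reversed) {
      // Reversal or chargeback in the file with no prior webhook recorded.
      discrepancies.push({ type: 'UNANNOUNCED_REVERSAL', reference: row.reference });
    }
  }
  return discrepancies;
}

const findings = reconcile(
  [
    { reference: 'A', amountSen: 5000, kind: 'SETTLED' },
    { reference: 'B', amountSen: 1200, kind: 'SETTLED' },
    { reference: 'C', amountSen: 9900, kind: 'SETTLED' },
    { reference: 'D', amountSen: 3000, kind: 'REVERSAL' },
  ],
  [
    { reference: 'A', amountSen: 5000, reversed: false },
    { reference: 'C', amountSen: 9000, reversed: false },
    { reference: 'D', amountSen: 3000, reversed: false },
  ],
);
```

In this sample, A matches, B is a lost webhook, C is an amount mismatch, and D is a reversal the webhook stream never announced. The sketch only walks the file; ledger entries that never show up in any file would need a separate pass.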

An accounting system built only on the webhook stream will drift from reality. The settlement file is what corrects that drift.

Pattern 4: The BNM compliance surface is not optional, even when you are not licensed

Routing payment processing through a licensed payment gateway transfers the licensing obligation. It does not transfer the compliance obligations that attach to data handling and transaction record-keeping on your side.

BNM's transaction record retention requirements apply to any party that processes, records, or stores transaction data, not only the licensed principal. The relevant periods are in BNM's published regulatory framework at bnm.gov.my. If your system logs payment events, order records, customer financial data, or mandate history, those logs are in scope.

The Personal Data Protection Act 2010 imposes consent, minimization, and purpose-limitation obligations on personal data including financial records. Collecting more than you need, retaining it beyond the stated purpose, or sharing it outside the original consent scope are violations regardless of licensing status.

Practical engineering implications: log what you need and nothing more. Do not store full bank account numbers when a reference token and last four digits serve support purposes. Build automated deletion or anonymization at the retention boundary, wired to a scheduler, not a manual cleanup that never runs. Merchant category codes should be accurate at onboarding and reviewed when your service changes, because MCC determines reporting threshold triggers. A first-pass read of the relevant BNM fintech guidance is something any engineer on the team can do. Do it before you hit production volume.
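Two of those implications, masking and the retention boundary, fit in a short sketch. All names here are hypothetical, and the retention period is deliberately a parameter: the 365 in the usage lines is a placeholder, not a real BNM or PDPA figure.

```typescript
// Keep only what support needs: the last four digits, nothing else.
function maskAccountNumber(accountNumber: string): string {
  const digits = accountNumber.replace(/\D/g, '');
  if (digits.length < 4) {
    throw new Error('account number too short to mask');
  }
  return `****${digits.slice(-4)}`;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// retentionDays must come from the applicable requirement, not from this sketch.
function isPastRetention(recordedAt: Date, retentionDays: number, now: Date): boolean {
  return now.getTime() - recordedAt.getTime() > retentionDays * MS_PER_DAY;
}

interface StoredRecord {
  id: string;
  recordedAt: Date;
}

// Returns the IDs a scheduled job would delete or anonymize.
function selectForDeletion(records: StoredRecord[], retentionDays: number, now: Date): string[] {
  return records
    .filter((record) => isPastRetention(record.recordedAt, retentionDays, now))
    .map((record) => record.id);
}

const masked = maskAccountNumber('1234 5678 9012');
const asOf = new Date(Date.UTC(2025, 0, 1));
const dueForDeletion = selectForDeletion(
  [
    { id: 'old', recordedAt: new Date(Date.UTC(2015, 0, 1)) },
    { id: 'recent', recordedAt: new Date(Date.UTC(2024, 5, 1)) },
  ],
  365, // placeholder only, not a real retention period
  asOf,
);
```

Passing `now` in rather than reading the clock inside the function keeps the scheduler job testable against fixed dates.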

Pattern 5: Webhook ordering and replay safety

A webhook is not a command. It is a state declaration, and state declarations can arrive in the wrong order.

The FPX authorization flow can produce an authorized event followed by an authorization_expired event if the settlement clock runs out before confirmation. If the two are delivered out of order and your handler treats them as commands, applying each as it arrives, you end up with a record showing authorized when the reality is expired. Treating them as state declarations with transition guards produces the correct terminal state regardless of delivery order.
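One way to express that guard is a forward-only rank, sketched below. The state names, the rank values, and the mapping from the `authorized` and `authorization_expired` event names to states are assumptions for illustration; a real flow has more states than this.

```typescript
type FpxAuthState = 'PENDING' | 'AUTHORIZED' | 'EXPIRED';

// Assumed ordering: a state declaration may only move the record forward.
const STATE_RANK: Record<FpxAuthState, number> = {
  PENDING: 0,
  AUTHORIZED: 1,
  EXPIRED: 2,
};

// Assumed mapping from webhook event names to declared states.
const EVENT_TO_STATE = new Map<string, FpxAuthState>([
  ['authorized', 'AUTHORIZED'],
  ['authorization_expired', 'EXPIRED'],
]);

function applyDeclaration(current: FpxAuthState, eventName: string): FpxAuthState {
  const declared = EVENT_TO_STATE.get(eventName);
  if (declared === undefined) {
    return current; // unknown event: leave the state untouched
  }
  // Accept the declaration only if it ranks later than what is already recorded.
  return STATE_RANK[declared] > STATE_RANK[current] ? declared : current;
}

function settleFrom(eventNames: string[]): FpxAuthState {
  let current: FpxAuthState = 'PENDING';
  for (const eventName of eventNames) {
    current = applyDeclaration(current, eventName);
  }
  return current;
}

const inOrder = settleFrom(['authorized', 'authorization_expired']);
const outOfOrder = settleFrom(['authorization_expired', 'authorized']);
```

Both `inOrder` and `outOfOrder` end at `EXPIRED`. A handler that simply overwrote the record with each arriving event would leave the second case at `AUTHORIZED`.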

Replay safety overlaps in implementation but is a separate concern. A webhook delivered twice should produce the same outcome as one. The mechanism: before processing any webhook, check whether the event ID has already been processed. If it has, return 200 and stop. If not, record the event ID and update state atomically in the same database transaction.

The atomic step is where implementations break under load. Recording the event ID after processing lets a concurrent request pass the duplicate check before the first response is written, so both process the event. The record has to be in the same transaction as the state change.
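A sketch of record-and-process in one transaction. `InMemoryDb` is a hypothetical stand-in that commits a draft copy only when the callback does not throw; it shows the rollback behavior, not concurrency. In a real database the equivalent is a unique constraint on the event ID, inserted in the same transaction as the state write.

```typescript
interface WebhookEvent {
  eventId: string;
  paymentId: string;
  declaredState: string;
}

interface TxView {
  processedEventIds: Set<string>;
  paymentStates: Map<string, string>;
}

type HandleResult = 'processed' | 'duplicate';

// In-memory stand-in for a transactional database: work happens on a draft
// copy, which replaces the committed data only if `fn` does not throw.
class InMemoryDb {
  private committed: TxView = {
    processedEventIds: new Set<string>(),
    paymentStates: new Map<string, string>(),
  };

  transaction<T>(fn: (tx: TxView) => T): T {
    const draft: TxView = {
      processedEventIds: new Set(this.committed.processedEventIds),
      paymentStates: new Map(this.committed.paymentStates),
    };
    const result = fn(draft); // a throw here leaves `committed` untouched
    this.committed = draft;
    return result;
  }

  stateOf(paymentId: string): string | undefined {
    return this.committed.paymentStates.get(paymentId);
  }

  hasProcessed(eventId: string): boolean {
    return this.committed.processedEventIds.has(eventId);
  }
}

function handleWebhook(db: InMemoryDb, incoming: WebhookEvent): HandleResult {
  return db.transaction<HandleResult>((tx) => {
    if (tx.processedEventIds.has(incoming.eventId)) {
      return 'duplicate'; // the HTTP layer still answers 200
    }
    tx.processedEventIds.add(incoming.eventId); // record the event ID first
    if (incoming.paymentId === '') {
      throw new Error('missing paymentId'); // rolls back the event ID record too
    }
    tx.paymentStates.set(incoming.paymentId, incoming.declaredState);
    return 'processed';
  });
}

const db = new InMemoryDb();
const delivery: WebhookEvent = { eventId: 'evt_1', paymentId: 'pay_1', declaredState: 'AUTHORIZED' };
const firstResult = handleWebhook(db, delivery);
const secondResult = handleWebhook(db, delivery);
```

The first call returns `processed` and the second returns `duplicate`. Because the event ID and the state change commit together, a failed state write does not leave behind an event ID that would cause the gateway's retry to be skipped.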

Gateway providers retry on timeout. Your handler should return 200 before the gateway's timeout elapses, and handle the retry cleanly when it does not.

Founder checklist: what your first payments engineer needs on day one

Hand this to whoever is building the payments backend before they write the first handler.

  • Idempotency keys at every boundary. Intent creation, callback processing, internal event publishing, and settlement import. Not just at the gateway call.
  • Mandate state machine with explicit transition guards. Valid transitions only. Log and discard invalid ones. Reconcile them later.
  • Settlement file reconciliation as a first-class job. Run it daily after T+1 files are available. Alert on discrepancies. Do not assume the webhook stream is complete.
  • BNM retention policy in the codebase. Not in a doc somewhere. An automated job that enforces it.
  • PDPA minimization from day one. Collect what you need. State why. Build the deletion path before you need it.
  • Webhook replay guard with atomic record-and-process. Event ID check inside the same transaction as the state write. No exceptions.
  • Exponential backoff on retries. A retry job that fires immediately on failure hammers the gateway during an outage and can trigger duplicate charges.
  • Observability on every state transition. If you cannot answer "when did this mandate move from pending to active" from your logs in two minutes, your observability is insufficient.
  • Runbook for stuck-pending payments. Some FPX transactions will land in pending and never receive a subsequent webhook. Define the manual resolution path before the first one appears on a Saturday night.
  • An onboarding doc covering the payment flow end-to-end. The engineer who built the system is not always available at 2am. Write it down.
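The backoff item in the checklist above is the easiest to get wrong in a hurry, so here is a minimal sketch. The function names, the base and cap values, and the use of full jitter are assumptions; tune them to your gateway's documented limits.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs: number, capMs: number): number {
  if (!Number.isInteger(attempt) || attempt < 0) {
    throw new Error('attempt must be a non-negative integer');
  }
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Full jitter: pick uniformly in [0, delay) so many jobs do not retry in lockstep.
// `random` is injectable so the function can be tested deterministically.
function jitteredDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number {
  return Math.floor(random() * backoffDelayMs(attempt, baseMs, capMs));
}

// Delays for attempts 0 through 5 with a 1 s base and a 20 s cap.
const schedule = [0, 1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt, 1000, 20000));
```

With these values `schedule` is 1000, 2000, 4000, 8000, 16000, 20000 milliseconds. Backoff spaces the attempts out; it does not make them safe. Every retry of the same charge still has to carry the same idempotency key.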

Closing

The first production weekend will tell you how much of this you missed, and it will tell you through support tickets you cannot answer until Monday morning. A double charge on a Sunday night, a mandate showing active in your database but rejected at the gateway, a reconciliation mismatch nobody catches until month-end. These arrive when a payment system that looked fine in testing meets the real delivery characteristics of Malaysian payment rails. The patterns above are the backlog you want already shipped before that weekend, not discovered by it.

Written by

Faiz Kasman

Software engineer in Kuala Lumpur. Payments, multi-tenant SaaS, and inventory infrastructure. Currently building the Shell Malaysia ParkEasy app.
