Field note · Android · iOS · Architecture

Offline-first mobile sync — the pattern that survived ship Wi-Fi.

April 22, 2026 · 9 min read · by Christopher Renshaw

At Carnival Cruise Line we shipped Android apps that processed thousands of passengers a day in port and onboard. The catch: ship networks disappear without warning. The rule was hard — boarding cannot stop because the Wi-Fi did. This is the offline-first architecture that delivered, and the same pattern I'd reach for in any healthcare or field-operations mobile app.

The principle that organizes everything else

Most mobile apps are network-first. The user taps something, you call an endpoint, you get a response, you render. That works fine in an office. It collapses on a cruise ship.

Offline-first inverts the relationship: the local database is the source of truth, the network is a sync sink. Every user action writes to local storage immediately and returns. A separate process — usually WorkManager on Android, BackgroundTasks on iOS — picks up unsynced rows and tries to push them when the network is up.

The UI never awaits the network. It binds to the database. That sentence is the whole architecture.
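
A minimal sketch of that write path, in plain Kotlin so it runs anywhere. LocalStore is an in-memory stand-in for Room or Core Data, and the names Boarding, BoardingRepository and checkIn are illustrative assumptions, not the production code. The point is what is missing: there is no network call between the tap and the re-render.

```kotlin
import java.util.UUID

enum class SyncState { PENDING, SYNCED }

data class Boarding(
    val id: String,           // generated on the device
    val passengerId: String,
    val syncState: SyncState
)

// In-memory stand-in for the local database (Room or Core Data in a real app).
class LocalStore {
    private val rows = LinkedHashMap<String, Boarding>()
    private val observers = mutableListOf<(List<Boarding>) -> Unit>()

    // The UI subscribes here and is notified after every local write.
    fun observe(observer: (List<Boarding>) -> Unit) {
        observers += observer
        observer(rows.values.toList())
    }

    fun upsert(row: Boarding) {
        rows[row.id] = row
        val snapshot = rows.values.toList()
        observers.forEach { it(snapshot) }
    }
}

class BoardingRepository(private val store: LocalStore) {
    // No network on this path: write locally, mark PENDING, return.
    fun checkIn(passengerId: String): Boarding {
        val row = Boarding(UUID.randomUUID().toString(), passengerId, SyncState.PENDING)
        store.upsert(row)
        return row
    }
}
```

In a real Android app the observe callback would be a Room Flow collected by the UI layer; the callback list here only imitates that behavior.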

The shape, in one diagram's worth of words

User action → Repository → Room (or Core Data on iOS): a local write with no network round-trip, typically a few milliseconds.
Repository writes the row with sync_state = PENDING.
UI re-renders from the Room Flow. The user sees their action immediately.
WorkManager job, scheduled with a network constraint, drains PENDING rows to the backend.
On success, mark sync_state = SYNCED. On failure, leave PENDING with a retry counter.

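// Sync worker: runs only when a network is available; WorkManager retries with backoff.
import android.content.Context
import androidx.work.CoroutineWorker
import androidx.work.WorkerParameters
import java.io.IOException

// `boardings` (the local DAO) and `api` (the HTTP client) are app-specific
// dependencies, assumed to be injected elsewhere (for example via a WorkerFactory).
class SyncWorker(ctx: Context, params: WorkerParameters) :
        CoroutineWorker(ctx, params) {

    override suspend fun doWork(): Result {
        // Bounded batch: never load the whole backlog in one pass.
        val pending = boardings.pendingSync(limit = 50)
        var failed = 0

        for (row in pending) {
            try {
                val resp = api.push(row.toDto())
                boardings.markSynced(row.id, resp.serverId)
            } catch (e: IOException) {
                // Transient network failure: the row stays PENDING, the attempt is counted.
                // Non-IO errors (such as a rejected payload) are not handled in this sketch.
                failed++
                boardings.incrementRetry(row.id)
            }
        }

        // Any failure asks WorkManager to run the job again after its backoff delay.
        return if (failed > 0) Result.retry() else Result.success()
    }
}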
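
Scheduling is the other half of the worker. The sketch below shows one way to enqueue it with a network constraint and exponential backoff using the androidx.work API. The helper name enqueueSync, the unique work name "boarding-sync", and the 30-second initial delay are assumptions for illustration; it needs the WorkManager dependency and the SyncWorker class above to compile.

```kotlin
import android.content.Context
import androidx.work.BackoffPolicy
import androidx.work.Constraints
import androidx.work.ExistingWorkPolicy
import androidx.work.NetworkType
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.WorkManager
import java.util.concurrent.TimeUnit

// Hypothetical helper: call after any local write that leaves PENDING rows.
fun enqueueSync(context: Context) {
    val request = OneTimeWorkRequestBuilder<SyncWorker>()
        // Hold the job until the device reports a network connection.
        .setConstraints(
            Constraints.Builder()
                .setRequiredNetworkType(NetworkType.CONNECTED)
                .build()
        )
        // Applies whenever doWork() returns Result.retry().
        .setBackoffCriteria(BackoffPolicy.EXPONENTIAL, 30L, TimeUnit.SECONDS)
        .build()

    // KEEP: if a sync is already queued or running, do not stack another one.
    WorkManager.getInstance(context)
        .enqueueUniqueWork("boarding-sync", ExistingWorkPolicy.KEEP, request)
}
```

KEEP is a judgment call. It avoids piling up duplicate jobs, but a row written while a sync is already running waits for the next enqueue, so some apps may prefer ExistingWorkPolicy.APPEND_OR_REPLACE instead.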

The five rules that matter more than the code

1. Generate IDs on the device

Server-assigned IDs are a network round-trip you don't have. Use UUIDs (or a ULID-style sortable variant) generated locally. The server stores yours and optionally returns its own; the client treats the device-generated ID as canonical for joining child records.
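
A small sketch of what that looks like, assuming a hypothetical BagTag child record. Both rows are created offline, and the child joins on the device-generated ID; the server's ID is stored alongside when it eventually arrives.

```kotlin
import java.util.UUID

data class Boarding(val id: String, val passengerId: String, val serverId: String? = null)
data class BagTag(val id: String, val boardingId: String, val tagNumber: String)

// Random (version 4) UUID; a ULID-style generator could be swapped in here.
fun newId(): String = UUID.randomUUID().toString()

// Parent and child are both created offline; the child joins on the device ID.
fun createBoardingWithBag(passengerId: String, tagNumber: String): Pair<Boarding, BagTag> {
    val boarding = Boarding(id = newId(), passengerId = passengerId)
    val bag = BagTag(id = newId(), boardingId = boarding.id, tagNumber = tagNumber)
    return boarding to bag
}

// When the server later returns its own ID, record it without touching the join key.
fun withServerId(row: Boarding, serverId: String): Boarding = row.copy(serverId = serverId)
```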

2. Soft state, not hard state

Every row has a sync_state column with values like PENDING, SYNCED, FAILED_PERMANENT. The UI can read this and show subtle indicators ("uploading…" / "saved"). Users on flaky networks tolerate latency much better when they can see what's happening.
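
One way to model it, as a sketch. The retry cap of 5 and the "needs attention" label are assumptions added for illustration; the state names and the "uploading…" / "saved" labels come from the text above.

```kotlin
enum class SyncState { PENDING, SYNCED, FAILED_PERMANENT }

data class Row(val id: String, val state: SyncState, val retries: Int = 0)

val MAX_RETRIES = 5 // assumed cap; tune per app

// Transient failure: stay PENDING until the retry budget runs out.
fun onPushFailed(row: Row): Row {
    val retries = row.retries + 1
    val state = if (retries >= MAX_RETRIES) SyncState.FAILED_PERMANENT else SyncState.PENDING
    return row.copy(state = state, retries = retries)
}

fun onPushSucceeded(row: Row): Row = row.copy(state = SyncState.SYNCED)

// What the UI shows next to the row.
fun statusLabel(state: SyncState): String = when (state) {
    SyncState.PENDING -> "uploading…"
    SyncState.SYNCED -> "saved"
    SyncState.FAILED_PERMANENT -> "needs attention"
}
```

Because the when expression is exhaustive over the enum, adding a new state later forces every label mapping to be updated at compile time.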

3. Conflict resolution before you need it

Decide up front: is the client always right, or the server, or do you merge? At Carnival, for boarding state the device was authoritative — once a passenger was checked in on a tablet, that fact stuck even if the server briefly disagreed. For passenger profile data, the server won. Pick per entity, document it, write the test.
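
"Pick per entity, document it" can be as literal as a lookup table. This is a sketch, not the Carnival implementation: the entity keys and the Winner enum are hypothetical, and merge strategies are left out. An entity with no documented policy fails loudly instead of silently picking a side.

```kotlin
enum class Winner { CLIENT, SERVER }

// One documented policy per entity type, mirroring the choices described above.
val conflictPolicy: Map<String, Winner> = mapOf(
    "boarding" to Winner.CLIENT,          // a check-in recorded on the tablet sticks
    "passenger_profile" to Winner.SERVER  // the server's copy of the profile wins
)

fun <T> resolve(entity: String, local: T, remote: T): T =
    when (conflictPolicy[entity] ?: error("No conflict policy documented for '$entity'")) {
        Winner.CLIENT -> local
        Winner.SERVER -> remote
    }
```

The test the rule asks for then becomes one assertion per entity against this table.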

4. Idempotent endpoints

If you submit the same row twice (because the network blinked between request and response), the server should return the same result and apply the change only once. That means sending a stable, client-generated request ID with every mutation and having the backend dedupe on it. Without this, retries create duplicates.
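
The server side of that contract, sketched with an in-memory stand-in for the backend. BoardingServer, PushRequest and the "srv-N" ID format are hypothetical; a real backend would keep the request IDs in durable storage, usually behind a unique constraint.

```kotlin
import java.util.UUID

data class PushRequest(val requestId: String, val boardingId: String, val passengerId: String)
data class PushResponse(val serverId: String)

// In-memory stand-in for the backend.
class BoardingServer {
    private val responsesByRequestId = HashMap<String, PushResponse>()
    val stored = mutableListOf<PushRequest>()

    fun push(req: PushRequest): PushResponse {
        // Seen before: replay the original response and apply nothing.
        responsesByRequestId[req.requestId]?.let { return it }

        stored += req
        val resp = PushResponse(serverId = "srv-${stored.size}")
        responsesByRequestId[req.requestId] = resp
        return resp
    }
}
```

The client generates requestId once, when the row is first written, and reuses it on every retry; generating a fresh ID per attempt would defeat the dedupe.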

5. Bounded queues

"Sync everything pending" is a trap when "everything" is two days of offline operations. Bound the batch (e.g. 50 rows per worker invocation) so the device doesn't lock up on the first sync after a long outage. Pagination on the client side is just as important as on the server.

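To make the bound concrete, here is a sketch of draining a backlog in fixed-size batches. PendingQueue and syncOnce are hypothetical names, and the queue is an in-memory list standing in for a query such as pendingSync(limit = 50). Each call to syncOnce represents one worker invocation.

```kotlin
val BATCH_SIZE = 50

// Hypothetical local queue of pending row IDs, oldest first.
class PendingQueue(ids: List<Int>) {
    private val pending = ids.toMutableList()
    val size: Int get() = pending.size

    // At most `limit` rows per call, regardless of how large the backlog is.
    fun nextBatch(limit: Int = BATCH_SIZE): List<Int> = pending.take(limit)

    fun markSynced(ids: List<Int>) {
        pending.removeAll(ids.toSet())
    }
}

// One worker invocation: push a single bounded batch, report how many rows went out.
fun syncOnce(queue: PendingQueue, push: (Int) -> Unit): Int {
    val batch = queue.nextBatch()
    batch.forEach(push)
    queue.markSynced(batch)
    return batch.size
}
```

With a backlog of 120 rows, three invocations send 50, 50 and 20 rows, so memory use and time per run stay flat no matter how long the outage lasted.
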
What this gets you

The UI is instantaneous. Tapping Submit feels native, not laggy.
The boarding line keeps moving when the gateway router reboots.
You can run a full QA pass with airplane mode toggled mid-flow and the app still behaves.
Battery is better — bursts of network rather than continuous polling.

Where I'd reach for it

Healthcare apps with PHI capture (the patient is in front of you, the chart can sync later). Field service apps for technicians on customer sites. Retail in-store tools where the corporate VPN is intermittent. Any time the user is offline more than 1% of the day, treat offline-first as the default and online-only as the exception.

Bottom line

Offline-first isn't a library. It's a small set of rules: the local DB is the source of truth, every write lands locally first and syncs in the background, every entity has a sync state, every endpoint is idempotent, every conflict has a documented resolution. Apply them and the network becomes a feature instead of a single point of failure.