Offline LLM

This page documents the mobile on-device LLM path and its integration with backend services in a hybrid online/offline architecture.

Why Offline Inference Exists

  • Disaster scenarios can have unstable or no connectivity.
  • Field users still need guidance and structured capture support.
  • Local inference provides partial service continuity until connectivity and sync are restored.

Mobile Runtime Architecture

```mermaid
graph TD
    subgraph Flutter
      UI[Chat/Help Screens]
      CH[LlmPlatformChannel]
      QUEUE[Local pending-action queue]
    end

    subgraph Android
      ACT[MainActivity MethodChannel handler]
      INF[InferenceModel MediaPipe session]
      PREFS[SharedPreferences model path]
      FILES[App filesDir model artifact]
    end

    subgraph Cloud
      API[Backend API]
      DB[(Firestore)]
      BOT[/chatbot/ask/]
    end

    UI --> CH
    CH --> ACT
    ACT --> INF
    ACT --> PREFS
    ACT --> FILES

    UI -->|online mode| API
    API --> DB
    API --> BOT
    UI -->|offline mode| INF
    UI --> QUEUE
    QUEUE -->|reconnect sync| API
```
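The Flutter-to-Android hop above crosses a MethodChannel. The real code is Dart and Kotlin; the TypeScript sketch below only models a plausible message contract for that boundary, and every name in it (`LlmRequest`, `handleChannelCall`, the `"generate"` method) is an assumption for illustration, not the actual API.

```typescript
// Hypothetical shape of a call over the LlmPlatformChannel boundary.
type LlmRequest = {
  method: "generate"; // assumed MethodChannel method name
  prompt: string;
};

type LlmResponse =
  | { ok: true; text: string }
  | { ok: false; error: string };

// Stub standing in for MainActivity's MethodChannel handler, which
// forwards the prompt to the on-device MediaPipe inference session.
function handleChannelCall(
  req: LlmRequest,
  infer: (prompt: string) => string,
): LlmResponse {
  try {
    return { ok: true, text: infer(req.prompt) };
  } catch (e) {
    // Inference failures surface as structured errors, not crashes.
    return { ok: false, error: String(e) };
  }
}
```

Returning a discriminated union keeps the Flutter side honest: callers must check `ok` before reading `text`, mirroring how MethodChannel results and errors are handled separately.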

Online/Offline Routing Logic

```mermaid
flowchart TD
    A[User submits prompt or help action] --> B{Network available?}
    B -- Yes --> C[Call backend APIs]
    C --> D[Use cloud chatbot + central persistence]
    B -- No --> E[Run on-device inference]
    E --> F[Return local guidance immediately]
    F --> G[Store unsynced action locally]
    G --> H{Connectivity restored?}
    H -- Yes --> I[Replay queued actions to backend]
    I --> J[Firestore becomes source of truth]
```
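The branch above can be sketched as a single routing function. This is an illustrative TypeScript sketch, not the app's Dart code; `callBackend`, `runLocalInference`, and the queue shape are all hypothetical stand-ins.

```typescript
type Reply = { source: "cloud" | "local"; text: string };
type PendingAction = { ts: number; prompt: string };

// Route a prompt based on connectivity: cloud when online,
// on-device inference plus local queueing when offline.
async function routePrompt(
  prompt: string,
  online: boolean,
  callBackend: (p: string) => Promise<string>,
  runLocalInference: (p: string) => string,
  queue: PendingAction[],
): Promise<Reply> {
  if (online) {
    // Online: cloud chatbot + central persistence.
    return { source: "cloud", text: await callBackend(prompt) };
  }
  // Offline: answer immediately from the on-device model...
  const text = runLocalInference(prompt);
  // ...and timestamp the action for replay once connectivity returns.
  queue.push({ ts: Date.now(), prompt });
  return { source: "local", text };
}
```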

Sync Semantics

  1. Local actions are timestamped and queued when offline.
  2. On reconnect, queue replay posts actions to backend in order.
  3. Backend validates and persists canonical records.
  4. Mobile marks queue items synced only after API confirmation.
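The four steps above can be sketched as a replay routine. This is a minimal TypeScript illustration under assumed names (`QueuedAction`, `post`); the real sync code and its API calls may differ.

```typescript
type QueuedAction = { ts: number; payload: string; synced: boolean };

// Replay unsynced actions in timestamp order (steps 1-2), posting each
// to the backend; an item is marked synced only after the API confirms
// it (step 4). On a failed post we stop and retry at the next reconnect.
async function replayQueue(
  queue: QueuedAction[],
  post: (payload: string) => Promise<boolean>, // true = API confirmed
): Promise<void> {
  const pending = queue
    .filter(a => !a.synced)
    .sort((a, b) => a.ts - b.ts);
  for (const action of pending) {
    const confirmed = await post(action.payload);
    if (!confirmed) break;  // preserve ordering: don't skip past a failure
    action.synced = true;   // only flip the flag on confirmation
  }
}
```

Stopping at the first failure keeps replay order intact, so the backend always sees actions in the sequence they were captured.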

Operational Notes

  • The current mobile docs already cover model download and the MethodChannel-based inference internals.
  • This overview describes the target production behavior, in which the mobile app also submits requests and chats to the backend when online.
  • Offline output is advisory; dispatch authority remains with backend-controlled operational workflows.