Offline AI: Deploying LLMs on Android Devices
One of the largest barriers to equitable education technology is reliable internet access. When deploying the PBot AI Tutor to rural Malaysian schools, we recognized that a purely cloud-based API architecture would leave many students behind.
To solve this, we brought the AI directly onto the device.
The Challenge
Running Large Language Models (LLMs) on mobile devices means working within tight compute and memory budgets: the model's weights, activations, and KV cache must all fit alongside the rest of the system in a few gigabytes of RAM, on hardware with no dedicated GPU guarantees.
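To make the constraint concrete, here is a quick back-of-the-envelope sketch (assuming a 1-billion-parameter model; weight storage only, ignoring activations and KV cache) showing why quantization is essential on-device:

```kotlin
// Rough weight-storage footprint for a model at different numeric precisions.
// Weights only: activations and the KV cache add further runtime memory.
fun weightBytes(params: Long, bitsPerParam: Int): Long = params * bitsPerParam / 8

fun main() {
    val params = 1_000_000_000L // ~1B parameters, as in Gemma 3 1B
    val gib = 1024.0 * 1024.0 * 1024.0
    for ((label, bits) in listOf("fp32" to 32, "fp16" to 16, "int8" to 8, "int4" to 4)) {
        println("%s: %.2f GiB".format(label, weightBytes(params, bits) / gib))
    }
}
```

At full fp32 precision the weights alone approach 4 GB, which already exceeds the free RAM on many budget Android phones; 4-bit quantization brings that down by 8x.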
The Solution
We selected Google's Gemma 3 1B as the base model. At roughly one billion parameters, it is small enough to run on commodity phones, especially once quantized. We used Unsloth with LoRA to fine-tune the model off-device on curated, multi-turn Malaysian-curriculum QA datasets, achieving strong performance in both Bahasa Melayu and English without bloating the model weights: LoRA trains only small low-rank adapter matrices, which merge back into the frozen base weights after training.
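The LoRA idea itself is simple to state: instead of updating a full weight matrix W, training learns a low-rank update B·A added to the frozen weights. A minimal Kotlin sketch of the forward pass (illustrative only; the actual fine-tuning ran in Unsloth, and all matrix shapes here are toy examples):

```kotlin
// Illustrative LoRA forward pass: y = W*x + (alpha / r) * B * (A * x).
// W is frozen (dOut x dIn); A (r x dIn) and B (dOut x r) are the small
// trainable matrices, so only r * (dIn + dOut) extra parameters are learned.
fun matVec(m: Array<DoubleArray>, v: DoubleArray): DoubleArray =
    DoubleArray(m.size) { i -> m[i].indices.sumOf { j -> m[i][j] * v[j] } }

fun loraForward(
    w: Array<DoubleArray>,  // frozen base weights
    a: Array<DoubleArray>,  // low-rank "down" projection, r rows
    b: Array<DoubleArray>,  // low-rank "up" projection, initialized to zero
    x: DoubleArray,
    alpha: Double,          // LoRA scaling hyperparameter
): DoubleArray {
    val r = a.size
    val base = matVec(w, x)               // frozen path
    val update = matVec(b, matVec(a, x))  // low-rank adapter path
    return DoubleArray(base.size) { i -> base[i] + (alpha / r) * update[i] }
}
```

Because B starts at zero, the adapted model is identical to the base model at step zero, and training only ever moves the tiny A and B matrices.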
The Code
We converted the model to TFLite format and integrated it using Kotlin Multiplatform. Here is a simplified snippet of our local inference wrapper:
import android.content.Context
import org.tensorflow.lite.Interpreter

class LocalAITutor {

    private var interpreter: Interpreter? = null

    fun initialize(context: Context, modelName: String) {
        val options = Interpreter.Options().apply {
            setNumThreads(4)
            setUseNNAPI(true) // Hardware acceleration via the NNAPI delegate
        }
        // loadModelFile memory-maps the .tflite file from assets (omitted for brevity)
        val modelBuffer = loadModelFile(context, modelName)
        interpreter = Interpreter(modelBuffer, options)
    }

    fun generateResponse(prompt: String): String {
        // tokenize/decode wrap the tokenizer (omitted for brevity)
        val inputIds = tokenize(prompt)
        val outputBuffer = FloatArray(MAX_TOKENS)
        interpreter?.run(inputIds, outputBuffer)
            ?: error("initialize() must be called before generateResponse()")
        return decode(outputBuffer)
    }

    companion object {
        private const val MAX_TOKENS = 512 // output buffer size, tuned per model
    }
}
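The `tokenize` and `decode` helpers are omitted above. As an illustration of the final step, a minimal greedy `decode` takes the model's per-position scores and picks the highest-scoring vocabulary entry at each position (the toy vocabulary below is a hypothetical stand-in; the real deployment uses Gemma's own tokenizer):

```kotlin
// Greedy decoding sketch: for each position, take the argmax over the
// vocabulary and map it back to a token string. Assumes `logits` is laid
// out as [position][vocabEntry]; a real tokenizer replaces the toy vocab.
fun greedyDecode(logits: Array<FloatArray>, vocab: List<String>): String =
    logits.joinToString("") { row ->
        val best = row.indices.maxByOrNull { row[it] } ?: 0
        vocab[best]
    }
```

Production decoding also needs sampling, stop tokens, and subword merging, but the argmax loop is the core of the simplest strategy.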
References and Resources
To replicate or learn more about this approach:
- Unsloth AI GitHub Repository - a fast, open-source fine-tuning library.
- Google Gemma Release Notes
- Kotlin Multiplatform Mobile (KMM)
It's clear that on-device AI is the next frontier, providing privacy, low latency, and offline equity.