Building an AI Tutor for the Masses
At Pandai, scaling PBot was a massive engineering challenge. To provide a reliable, low-latency AI tutoring experience for students, we needed an architecture capable of processing more than 1.2 billion tokens while keeping the error rate near zero.
The Challenge
Initially, latency and inference costs were our primary bottlenecks. Every token generated by the model needed to feel instantaneous to the student, yet we also required a robust guardrail system to ensure pedagogical accuracy.
The Solution
We implemented a custom architecture built around Gemma 3 1B, served efficiently on device and combined with a local vector retrieval stack (ObjectBox). Using Unsloth with LoRA adapters, we fine-tuned the model for the Malaysian curriculum.
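The appeal of LoRA for this kind of fine-tuning is that the frozen base weights stay untouched: training only updates two small low-rank matrices, whose product is added back to the base weight at inference. The sketch below illustrates that merge step in plain Python with made-up dimensions and values; it is a conceptual illustration, not Pandai's actual training code or the Unsloth API.

```python
# Minimal sketch of the LoRA idea: instead of updating a full weight
# matrix W (d x d), train two small matrices A (r x d) and B (d x r)
# with rank r << d, then use W' = W + (alpha / r) * (B @ A).
# All matrices here are illustrative toy values.

def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_merge(W, A, B, alpha, r):
    """Merge the low-rank update B @ A, scaled by alpha / r, into W."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# A 4x4 frozen base weight with a rank-1 adapter: only 8 trainable
# values (A and B) instead of 16 (all of W).
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
A = [[0.1, 0.2, 0.3, 0.4]]          # shape r x d
B = [[1.0], [0.0], [0.0], [0.0]]    # shape d x r

W_merged = lora_merge(W, A, B, alpha=2.0, r=1)
```

Because the adapter is tiny relative to the base model, the same approach keeps on-device storage and training costs low, which matters when the model ships inside a mobile app.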
Key improvements included:
- Offline Support: Embedding the model directly on Android devices (Kotlin Multiplatform) for low-connectivity rural areas.
- RAG Pipeline: Vector-based semantic search to provide accurate, reliable answers referencing approved curriculum materials.
- Observability: Integrating Langfuse to monitor hallucination rates in real-time, allowing our team to iterate safely and effectively.
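The retrieval step in a RAG pipeline like the one above boils down to embedding the student's question and ranking pre-embedded curriculum chunks by vector similarity. The toy sketch below shows that ranking with cosine similarity; the 3-dimensional vectors and sample chunks are invented for illustration, whereas in production the embeddings would come from the model and the search would run inside the on-device vector store.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy curriculum chunks with hand-written 3-d "embeddings".
CHUNKS = [
    ("Photosynthesis converts light energy into chemical energy.", [0.9, 0.1, 0.0]),
    ("Newton's second law states F = ma.", [0.1, 0.9, 0.1]),
    ("The water cycle includes evaporation and condensation.", [0.0, 0.2, 0.9]),
]

def retrieve(query_embedding, chunks, top_k=1):
    """Return the top_k chunk texts most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(query_embedding, c[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]

# A query embedding close to the photosynthesis chunk retrieves it first.
best = retrieve([0.85, 0.15, 0.05], CHUNKS)
```

Grounding each answer in the retrieved chunk is what lets the tutor cite approved curriculum materials instead of relying purely on the model's parametric memory.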
The result is a fast, specialized AI tutor accessible anywhere, transforming how students learn every day.