(25Qs) Swahili Audio to Text Benchmark | Gemini 3 Pro, GPT‑4o, Jacaranda & Omnilingual

(Updated Jan 2026)
This page benchmarks 12 Swahili (Kiswahili) speech-to-text workflows, some of which also include a Swahili → English translation step.
We use the same Swahili audio clips for every system. Then we compare each system’s text output to a reference answer and give it a score between 0 and 1.
A higher score means the system is closer to the reference text and usually more accurate.
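The page does not state which metric produces the 0 to 1 score. As a rough illustration only, the sketch below uses a word-level similarity ratio from Python's standard difflib module. The function names normalize and similarity_score are hypothetical, and the real benchmark may score outputs quite differently.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation (except apostrophes) with spaces, split into words."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() or ch == "'" else " "
        for ch in text.lower()
    )
    return cleaned.split()


def similarity_score(hypothesis: str, reference: str) -> float:
    """Return a score in [0, 1]; 1.0 means the word sequences match exactly.

    Assumption: a difflib word-level ratio stands in for the benchmark's
    unspecified comparison method.
    """
    hyp, ref = normalize(hypothesis), normalize(reference)
    if not hyp and not ref:
        return 1.0  # two empty transcripts are treated as identical
    return SequenceMatcher(None, hyp, ref, autojunk=False).ratio()


# One substituted word out of three gives a score of roughly 0.67.
score = similarity_score("Habari ya asubuhi", "Habari za asubuhi.")
```

A ratio like this rewards matching words in the right order, so a transcript with one wrong word in a short clip loses a large share of its score.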

No.  Workflow                            Accuracy (Mean)  Median Latency (s)
0    GPT-4o Audio                        0.50             5.49
1    GPT-Realtime                        0.45             5.13
2    Jacaranda + GPT-5.1                 0.94             4.05
3    Jacaranda + Gemini 3 Pro            0.96             8.84
4    Jacaranda + GPT-5.1 + Goog MT       0.91             4.13
5    Omni + GPT-5.1 + GoogMT             0.92             4.87
6    Omni + Gemini 3 Pro                 0.96             8.54
7    Omni + Gemini 3 Pro + GoogM         0.96             9.12
8    Gemini 3 Pro                        0.92             9.80
9    Jacaranda + Gemini 3 Flash          0.92             5.32
10   Jacaranda + GPT-4.1                 0.89             3.62
11   Gemini 3 Flash                      0.85             5.95
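To pick a workflow from the results above, one option is to filter by a latency budget and then rank by accuracy. The sketch below copies the table values into a list. The Result type, the best_under_budget helper, and the 6-second budget are hypothetical choices for illustration, not part of the benchmark.

```python
from typing import NamedTuple


class Result(NamedTuple):
    workflow: str
    accuracy: float   # mean score, 0 to 1
    latency_s: float  # median latency in seconds


# Values copied from the results table on this page.
RESULTS = [
    Result("GPT-4o Audio", 0.50, 5.49),
    Result("GPT-Realtime", 0.45, 5.13),
    Result("Jacaranda + GPT-5.1", 0.94, 4.05),
    Result("Jacaranda + Gemini 3 Pro", 0.96, 8.84),
    Result("Jacaranda + GPT-5.1 + Goog MT", 0.91, 4.13),
    Result("Omni + GPT-5.1 + GoogMT", 0.92, 4.87),
    Result("Omni + Gemini 3 Pro", 0.96, 8.54),
    Result("Omni + Gemini 3 Pro + GoogM", 0.96, 9.12),
    Result("Gemini 3 Pro", 0.92, 9.80),
    Result("Jacaranda + Gemini 3 Flash", 0.92, 5.32),
    Result("Jacaranda + GPT-4.1", 0.89, 3.62),
    Result("Gemini 3 Flash", 0.85, 5.95),
]


def best_under_budget(results: list[Result], max_latency_s: float) -> list[Result]:
    """Keep workflows within the latency budget, highest accuracy first.

    Ties on accuracy are broken by lower latency.
    """
    eligible = [r for r in results if r.latency_s <= max_latency_s]
    return sorted(eligible, key=lambda r: (-r.accuracy, r.latency_s))


# Hypothetical budget: a median response within 6 seconds.
ranked = best_under_budget(RESULTS, max_latency_s=6.0)
print(ranked[0].workflow)  # Jacaranda + GPT-5.1
```

With no latency limit, three workflows tie at 0.96 and the tie-break favors Omni + Gemini 3 Pro, the fastest of the three. Under the 6-second budget, Jacaranda + GPT-5.1 ranks first at 0.94.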

On this page you can:

  • See which Swahili system or pipeline gets the best score
  • Compare different Swahili ASR and Swahili→English models side by side
  • Choose the best system for your app, call center, research, or product
  • Download all results for deeper analysis and custom reporting

Chart: Compare Output Text (from input_audio), aggregate: Mean


Chart: Compare Run Time, aggregate: Median