(25Qs) Swahili Audio to Text Benchmark | Gemini 3 Pro, GPT‑4o, Jacaranda & Omnilingual

(Updated Jan 2026)
This page benchmarks several Swahili (Kiswahili) speech‑to‑text systems and, for some workflows, Swahili → English translation pipelines.
Every system receives the same Swahili audio clips. Each system's text output is compared against a reference answer and given a score between 0 and 1.
A higher score means the output is closer to the reference text and usually more accurate. The table reports the mean score and the median run time for each workflow.
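The page does not say how the 0 to 1 score is computed. As a minimal sketch only, assuming a character-level similarity from Python's `difflib.SequenceMatcher` as a stand-in (not necessarily the scorer used for this benchmark), per-clip scores and run times could be aggregated into a mean accuracy and a median latency as follows. The function names and the sample clips are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean, median


def similarity_score(reference: str, hypothesis: str) -> float:
    """Return a 0-1 similarity between the reference and a system's output.

    ASSUMPTION: difflib's ratio is an illustrative stand-in metric,
    not the scorer this benchmark actually uses.
    """
    # Normalize case and whitespace so trivial differences are not penalized.
    ref = " ".join(reference.lower().split())
    hyp = " ".join(hypothesis.lower().split())
    return SequenceMatcher(None, ref, hyp).ratio()


def summarize(results):
    """Aggregate (reference, hypothesis, latency_seconds) tuples for one workflow."""
    scores = [similarity_score(ref, hyp) for ref, hyp, _ in results]
    latencies = [latency for _, _, latency in results]
    return {
        "accuracy_mean": mean(scores),         # mean of per-clip scores
        "latency_median": median(latencies),   # median of per-clip run times
    }


# Hypothetical per-clip results for a single workflow.
sample = [
    ("habari ya asubuhi", "habari ya asubuhi", 4.0),
    ("asante sana", "asante", 5.0),
    ("karibu tena", "", 6.0),
]
summary = summarize(sample)
```

The median is used for latency because a single slow run would otherwise skew the average, while the mean keeps every clip's score in the accuracy figure.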

No.	Workflow	Accuracy (Mean, 0 to 1)	Median Latency (s)
0	GPT-4o Audio	0.50	5.49
1	GPT-Realtime	0.45	5.13
2	Jacaranda + GPT-5.1	0.94	4.05
3	Jacaranda + Gemini 3 Pro	0.96	8.84
4	Jacaranda + GPT-5.1 + GoogMT	0.91	4.13
5	Omnilingual + GPT-5.1 + GoogMT	0.92	4.87
6	Omnilingual + Gemini 3 Pro	0.96	8.54
7	Omnilingual + Gemini 3 Pro + GoogMT	0.96	9.12
8	Gemini 3 Pro	0.92	9.80
9	Jacaranda + Gemini 3 Flash	0.92	5.32
10	Jacaranda + GPT-4.1	0.89	3.62
11	Gemini 3 Flash	0.85	5.95
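The table trades accuracy against latency, so the best workflow depends on how much delay an application can tolerate. The sketch below copies the table values by row number and shows two ways to read them: the fastest workflow that meets a minimum accuracy, and the set of workflows that no other workflow beats on both accuracy and latency. The helper names are hypothetical and the thresholds are only examples.

```python
# (row number, workflow, mean accuracy, median latency in seconds), from the table.
ROWS = [
    (0, "GPT-4o Audio", 0.50, 5.49),
    (1, "GPT-Realtime", 0.45, 5.13),
    (2, "Jacaranda + GPT-5.1", 0.94, 4.05),
    (3, "Jacaranda + Gemini 3 Pro", 0.96, 8.84),
    (4, "Jacaranda + GPT-5.1 + GoogMT", 0.91, 4.13),
    (5, "Omnilingual + GPT-5.1 + GoogMT", 0.92, 4.87),
    (6, "Omnilingual + Gemini 3 Pro", 0.96, 8.54),
    (7, "Omnilingual + Gemini 3 Pro + GoogMT", 0.96, 9.12),
    (8, "Gemini 3 Pro", 0.92, 9.80),
    (9, "Jacaranda + Gemini 3 Flash", 0.92, 5.32),
    (10, "Jacaranda + GPT-4.1", 0.89, 3.62),
    (11, "Gemini 3 Flash", 0.85, 5.95),
]


def fastest_at_least(rows, min_accuracy):
    """Return the lowest-latency row whose accuracy meets the threshold, or None."""
    eligible = [row for row in rows if row[2] >= min_accuracy]
    return min(eligible, key=lambda row: row[3]) if eligible else None


def pareto_front(rows):
    """Return rows not dominated by any other row.

    A row is dominated when another row is at least as accurate and at least
    as fast, and strictly better on one of the two.
    """
    front = []
    for row in rows:
        dominated = any(
            other[2] >= row[2] and other[3] <= row[3]
            and (other[2] > row[2] or other[3] < row[3])
            for other in rows
        )
        if not dominated:
            front.append(row)
    return front


best_fast = fastest_at_least(ROWS, 0.90)
best_accurate = fastest_at_least(ROWS, 0.96)
front_numbers = sorted(row[0] for row in pareto_front(ROWS))
```

With these values, rows 10, 2, and 6 form the trade-off frontier: each step up in accuracy costs additional latency, and every other workflow is matched or beaten on both measures by one of them.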

On this page you can:

See which Swahili system or pipeline gets the best score
Compare different Swahili ASR and Swahili→English models side by side
Choose the best system for your app, call center, research, or product
Download all results for deeper analysis and custom reporting

