Compares Gemini 3, GPT-5.2, KissanAI, Llama 4 Maverick, Sarvam.AI, GPT-4o and AgriLLM (Qwen3) on how closely their responses match our golden QnA set of common smallholder farmer questions and answers (in English) from ClearGlobal and Opportunity Intl. What makes a good golden eval QnA?
Results

Gooey Workflows
Input Data Spreadsheet
Input Columns




Evaluation Workflows



API: Evaluator for Relevancy

Aggregate: Mean


API: Evaluator for Completeness

Aggregate: Mean


API: Evaluator for Accuracy

Aggregate: Mean


API: Evaluator for Conciseness

Aggregate: Mean
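
A minimal sketch of what the mean aggregation used by each evaluator could look like, assuming every evaluator returns one numeric score per question and per model. The row layout, the `aggregate_mean` helper, and the example values are hypothetical and are not taken from the Gooey.AI workflows or from the results of this comparison.

```python
from collections import defaultdict
from statistics import mean


def aggregate_mean(rows):
    """Average scores per (model, metric) pair.

    rows: iterable of dicts with keys 'model', 'metric' and 'score'
    (an assumed layout, one row per evaluated answer).
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row["model"], row["metric"])].append(float(row["score"]))
    return {key: mean(scores) for key, scores in buckets.items()}


# Placeholder scores for illustration only, not real evaluation results.
example_rows = [
    {"model": "model_a", "metric": "relevancy", "score": 4},
    {"model": "model_a", "metric": "relevancy", "score": 5},
    {"model": "model_b", "metric": "relevancy", "score": 3},
]

summary = aggregate_mean(example_rows)
for (model, metric), value in sorted(summary.items()):
    print(f"{model}\t{metric}\t{value:.2f}")
```

Grouping by both model and metric keeps the four evaluators (relevancy, completeness, accuracy, conciseness) separate, so a model that is concise but inaccurate is not hidden behind a single blended score.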