LLM-Assisted Farm Management Insights

Data Scientist · 2025 · 4 months · 5 people · doi.org

Built a human-in-the-loop LLM pipeline that converts agricultural research into actionable farm management recommendations; the work was published in Scientific Reports.

GPT-4.1-mini · Gemini 2.5 · DeepSeek · LLaMA 3.3 · Python

Overview

Developed a system that uses multiple large language models with expert oversight to systematically screen academic literature and generate evidence-based soybean farm management plans.

Problem

Increasing food production sustainably requires translating a growing body of agricultural research into practical guidance for farmers—a process that is slow, labor-intensive, and difficult to scale manually.

Approach

Designed a five-stage pipeline:
  • Systematic literature search on Web of Science, with queries structured by the PICO framework (Population, Intervention, Comparison, Outcome)
  • Parallel screening of studies by four LLMs, with expert arbitration
  • Extraction of evidence and relevance assessments by the two top-performing models
  • Inconsistency detection
  • Synthesis of the final management plan by an LLM
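The PICO-structured search step can be sketched as a small query builder. This is a minimal sketch, assuming hypothetical soybean terms (the actual search strings are not reproduced here) and Web of Science's `TS=` topic field tag: synonyms within one PICO element are OR-ed, and elements are AND-ed. The Comparison element is left out of the string, a common choice in systematic-review searches because comparators are often not named in titles and abstracts.

```python
# Hypothetical PICO terms for illustration only; not the project's search strings.
PICO = {
    "population": ["soybean", "Glycine max"],
    "intervention": ["cover crop", "no-till", "nitrogen fertilizer"],
    "comparison": ["conventional", "control"],
    "outcome": ["yield", "soil organic carbon"],
}


def or_block(terms: list[str]) -> str:
    """OR together synonyms, quoting multi-word phrases so they match exactly."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_wos_query(pico: dict[str, list[str]],
                    elements=("population", "intervention", "outcome")) -> str:
    """AND the selected PICO elements inside a Web of Science topic (TS=) search."""
    blocks = [or_block(pico[e]) for e in elements if pico.get(e)]
    return "TS=(" + " AND ".join(blocks) + ")"


query = build_wos_query(PICO)
print(query)
```

Adding a Comparison block is a one-argument change (`elements=("population", "intervention", "comparison", "outcome")`), at the cost of a narrower result set.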

Constraints

  • Recommendations must be grounded in peer-reviewed research, not LLM hallucinations
  • Screening decisions had to be cross-validated across multiple LLMs to ensure accuracy
  • Expert arbitration required at every stage to maintain scientific rigor
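The first constraint can be partly enforced mechanically before an expert looks at the output. A minimal sketch, assuming each generated recommendation carries the IDs of the studies it cites (the `Recommendation` class, the `ungrounded` function and all study IDs are hypothetical): any recommendation that cites no study, or cites one outside the screened-in set, is flagged for expert review.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    # Hypothetical structure: recommendation text plus the IDs of the studies it cites
    text: str
    cited_study_ids: list[str] = field(default_factory=list)


def ungrounded(recs: list[Recommendation], included_ids: set[str]) -> list[Recommendation]:
    """Return recommendations that cite no study, or cite a study outside the included set."""
    return [
        rec
        for rec in recs
        if not rec.cited_study_ids
        or any(sid not in included_ids for sid in rec.cited_study_ids)
    ]


included = {"S01", "S02", "S07"}  # hypothetical IDs of studies that passed screening
recs = [
    Recommendation("rec A", ["S02"]),
    Recommendation("rec B"),                  # cites nothing
    Recommendation("rec C", ["S07", "S99"]),  # S99 was never screened in
]
for rec in ungrounded(recs, included):
    print("needs expert review:", rec.text)
```

This only checks that citations exist and point at included studies; whether a cited study actually supports the recommendation still needs the expert.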

Key Decisions

Multi-LLM consensus screening instead of single-model extraction

No single LLM was reliable enough on its own. Having four models screen each study independently, with expert arbitration, caught errors that individual models missed.
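One way to combine the four independent screening votes is sketched below. The decision rule is an assumption, not necessarily the one used in the project: a unanimous vote is accepted as-is, and any disagreement is routed to the expert together with the majority label as a suggestion ("tie" for a 2-2 split). Model names `m1` to `m4` are placeholders.

```python
from collections import Counter


def screen(votes: dict[str, str]) -> tuple[str, bool]:
    """Combine independent include/exclude votes for one study.

    Returns (label, needs_expert). Assumed rule: a unanimous vote is accepted;
    any disagreement goes to an expert, with the majority label as a suggestion
    ("tie" when there is no majority).
    """
    ranked = Counter(votes.values()).most_common()
    top_label, top_count = ranked[0]
    if top_count == len(votes):
        return top_label, False
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return "tie", True
    return top_label, True


# Hypothetical votes from four placeholder models for three studies
print(screen({"m1": "include", "m2": "include", "m3": "include", "m4": "include"}))  # ('include', False)
print(screen({"m1": "include", "m2": "include", "m3": "include", "m4": "exclude"}))  # ('include', True)
print(screen({"m1": "include", "m2": "exclude", "m3": "include", "m4": "exclude"}))  # ('tie', True)
```

Sending every disagreement, not just ties, to the expert is the conservative choice; a 3-1 majority could instead be auto-accepted if expert time were the bottleneck.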

Alternatives: single LLM with manual spot-checking; fully manual systematic review

Human-in-the-loop architecture over fully automated pipeline

Farm management advice directly affects livelihoods. Expert validation at each stage was non-negotiable for producing trustworthy recommendations.
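Expert validation at each stage can be modeled as a gate between pipeline stages. A minimal sketch, assuming a hypothetical `review(stage_name, output)` callback that returns the (possibly corrected) output, or `None` to reject it and halt the run; the stage functions and study IDs are toy stand-ins, not the project's code.

```python
from typing import Any, Callable, Optional


class RejectedByExpert(Exception):
    """Raised when the expert rejects a stage's output, halting the pipeline."""


def run_pipeline(
    stages: list[tuple[str, Callable[[Any], Any]]],
    data: Any,
    review: Callable[[str, Any], Optional[Any]],
) -> Any:
    """Run stages in order, passing each output through an expert review step."""
    for name, stage in stages:
        output = stage(data)
        approved = review(name, output)  # expert may correct or reject
        if approved is None:
            raise RejectedByExpert(f"expert rejected output of stage '{name}'")
        data = approved  # only approved output reaches the next stage
    return data


# Toy stages standing in for screening and extraction (hypothetical)
stages = [
    ("screen", lambda ids: [i for i in ids if i != "S03"]),
    ("extract", lambda ids: {i: f"evidence from {i}" for i in ids}),
]


def expert(name: str, output: Any) -> Any:
    # A reviewer who removes one screened-in study before extraction runs
    if name == "screen":
        return [i for i in output if i != "S02"]
    return output


print(run_pipeline(stages, ["S01", "S02", "S03"], expert))  # {'S01': 'evidence from S01'}
```

Because corrections are applied before the next stage runs, an expert fix at screening never has to be repeated downstream.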

Result & Impact

Demonstrated that LLM-assisted systematic review can accelerate the translation of agricultural research into practical farm management guidance, with findings published in Scientific Reports.

Learnings

  • Multi-model consensus is more robust than relying on a single LLM for high-stakes information extraction.
  • Gemini's Deep Research tool produced stronger general recommendations, but structured extraction tasks favored GPT-4.1-mini.
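Selecting the two top-performing models for extraction can be framed as ranking each model by agreement with the expert's final screening decisions. This is a minimal sketch under that assumption; the page does not say which metric was used, and recall on included studies or Cohen's kappa would be reasonable alternatives. Model names and labels are hypothetical, not the project's data.

```python
def agreement(preds: dict[str, str], expert: dict[str, str]) -> float:
    """Fraction of studies where a model's label matches the expert's final label."""
    return sum(preds[s] == label for s, label in expert.items()) / len(expert)


def top_models(model_preds: dict[str, dict[str, str]],
               expert: dict[str, str], k: int = 2) -> list[str]:
    """Rank models by agreement with the expert and keep the best k."""
    scores = {m: agreement(p, expert) for m, p in model_preds.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical labels for four studies
expert = {"S1": "include", "S2": "exclude", "S3": "include", "S4": "exclude"}
model_preds = {
    "model_a": {"S1": "include", "S2": "exclude", "S3": "include", "S4": "exclude"},
    "model_b": {"S1": "include", "S2": "include", "S3": "include", "S4": "exclude"},
    "model_c": {"S1": "exclude", "S2": "include", "S3": "include", "S4": "exclude"},
    "model_d": {"S1": "exclude", "S2": "include", "S3": "exclude", "S4": "exclude"},
}
print(top_models(model_preds, expert))  # ['model_a', 'model_b']
```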