CocoaGraph: A Knowledge Graph Approach to Modeling the Cocoa Supply Chain

LLM-Agentic Data Acquisition and Supply Chain Modeling

Authors

Alejandro Soumah, Dr. Megumi Naoi

Affiliations: University of California San Diego

Cocoa, the key ingredient in chocolate, saw a 172% price surge in 2024, highlighting the opacity and information asymmetry in its supply chain. The value chain is complex, and stakeholder incentives are misaligned across legal and geographic contexts, which has made the supply chain difficult to model. This lack of transparency burdens manufacturers with higher costs, exposes traders to margin calls, and leaves even the International Cocoa Organization (ICCO) struggling to identify the drivers of volatility. Fragmented and underutilized data, ranging from geospatial information to financial transactions, further compound the issue.

We propose integrating knowledge graphs with large language models (LLMs) to address these challenges. Knowledge graphs provide structured, dynamic representations of entities and relationships, while LLMs synthesize unstructured data for analysis. Together, they can enhance transparency, enable data-driven decision-making, and equip stakeholders with actionable insights, offering a critical step toward addressing systemic inefficiencies in cocoa's global market.

01. Introduction

The early 2020s saw a surge in commodity prices driven by inflation, supply chain disruptions, and trade wars, underscoring the need for supply chain transparency. Cocoa, produced largely in Côte d'Ivoire and Ghana, faces challenges such as poor traceability, weather impacts, and deforestation, which have prompted regulations such as the EU Deforestation Regulation (EUDR).

While efforts by companies such as Ferrero and initiatives such as the Cocoa & Forests Initiative (CFI) aim to improve transparency, supply chains remain opaque because the available datasets are fragmented and static. Projects such as Trase offer insights but lack real-time updates and do not map smaller actors, which limits comprehensive traceability.

Closing these gaps calls for real-time forecasting and predictive models, supported by tools such as knowledge graphs (KGs) and large language models (LLMs). Integrating physical models with digital datasets can enhance transparency, and LLMs can automate source identification, validate data, and streamline research, enabling faster progress toward supply chain transparency and efficiency.

02. Validation

FIGURE 4: Cumulative Arrivals Per Port Over the Year
FIGURE 5: Location of ECOM Warehouses
FIGURE 6: Importer Countries & Cooperative Partnerships of ECOM

04. Methodology

This work focuses on automating source identification, enabling real-time forecasting, and managing rare supply chain events using two components:

  • Agent for Data Storage and Decision-Making:
    • Performs targeted internet searches using knowledge graph data
    • Stores data in AWS S3 with metadata tagging and redundancy filtering
    • Extracts entities and relationships to refine the knowledge graph
  • LLM-Based NER, RE, and Entity Disambiguation:
    • Identifies and categorizes entities (e.g., manufacturers, suppliers, farmers)
    • Maps supply chain relationships through explicit and inferred patterns
    • Resolves ambiguous entities using metadata and similarity scoring
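
The two components above can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names, the 0.85 similarity threshold, and the example entity names are all hypothetical, an in-memory set stands in for the S3 storage layer, and plain string similarity from Python's standard library stands in for whatever scoring the pipeline actually uses. It shows redundancy filtering by content hash and entity disambiguation that combines metadata (here, the entity type) with a similarity score.

```python
import hashlib
from difflib import SequenceMatcher


def content_key(text: str) -> str:
    """Hash of case- and whitespace-normalized text, used to skip duplicate documents."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


class KnowledgeGraph:
    """Toy in-memory graph; a real system would use a graph database."""

    def __init__(self, threshold: float = 0.85):  # threshold is an assumed value
        self.threshold = threshold
        self.entities = {}      # canonical name -> entity type (metadata)
        self.triples = set()    # (subject, relation, object)
        self.seen_docs = set()  # content hashes of documents already stored

    def is_new_document(self, text: str) -> bool:
        """Redundancy filter: True only the first time a document's content is seen."""
        key = content_key(text)
        if key in self.seen_docs:
            return False
        self.seen_docs.add(key)
        return True

    def add_entity(self, mention: str, entity_type: str) -> str:
        """Return the canonical name for a mention, creating a new entity if needed."""
        # Metadata check: only entities of the same type are candidate matches.
        candidates = [n for n, t in self.entities.items() if t == entity_type]
        best = max(candidates, key=lambda n: similarity(mention, n), default=None)
        if best is not None and similarity(mention, best) >= self.threshold:
            return best
        self.entities[mention] = entity_type
        return mention

    def add_relation(self, subject: str, relation: str, obj: str) -> None:
        """Store one extracted supply chain relationship as a triple."""
        self.triples.add((subject, relation, obj))


# Usage with made-up names: two spellings of one cooperative resolve to one node.
kg = KnowledgeGraph()
coop = kg.add_entity("Example Cocoa Cooperative", "cooperative")
same_coop = kg.add_entity("example cocoa cooperative", "cooperative")
trader = kg.add_entity("Hypothetical Trading Ltd", "trader")
kg.add_relation(coop, "SELLS_TO", trader)
```

Restricting candidates by entity type before scoring keeps a trader and a cooperative with similar names from being merged; the threshold then trades off missed merges against false merges and would need tuning on real data.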

05. Results

FIGURE 2: Cooperatives & Trader Groups - CocoaGraph Geospatial Representation
FIGURE 3: Cooperatives & Trader Groups - CocoaGraph Topological Representation

06. Conclusion

This study demonstrates the potential of integrating knowledge graphs and LLMs to address opacity in the cocoa supply chain.

Key findings include:

  • Improved Data Integration: Combining structured and unstructured data enables comprehensive supply chain modeling
  • Real-Time Analysis: LLMs enhance forecasting and adaptability to rare events
  • Scalable Automation: Automating source identification and relationship extraction streamlines data collection, reducing manual effort

07. References

  1. International Cocoa Organization (ICCO). Cocoa Statistics.
  2. Global Forest Watch. Forest Loss in Ivory Coast.
  3. European Commission. Regulation (EU) 2023/1115 on Deforestation-Free Products.
  4. Guye, V. (2024). Method Documentation for the Database: Ivorian Cocoa Cooperatives and Their Buyers (IC2B) — v1.0.
  5. Guye, V., et al. (2024). Côte d'Ivoire Cocoa Supply Chain v.1.1 (2020–2022).
  6. Ministry of Agriculture and Rural Development. (2017). Répertoire Des Sociétés Coopératives.
  7. Coffee-Cocoa Council (Ivory Coast). Registered Traders and Export Licenses for 2022–2023.
  8. Neo4j. (2023). The GraphRAG Manifesto.
  9. United Nations. UN Comtrade Database.
  10. Bill of Lading Records.
  11. Liu, Y., et al. (2023). A Knowledge Graph Perspective on Supply Chain Resilience.
  12. Mao, S., et al. (2023). Research on Supply Chain Knowledge Graph Inference Method Based on Quaternion Embedding.
