The Brief
Acumatica is a cloud ERP platform — the infrastructure layer for mid-market companies managing finance, distribution, and project accounting. The engineering team builds custom modules and integrations on top of the Acumatica framework.
The engagement was to accelerate this development work. The specific problem: requirements written in Jira were taking too long to translate into working code. The handoff between product and engineering was lossy, and the validation loop between code and tests was slow. The hypothesis was that LLM orchestration could compress this pipeline significantly.
The Approach
The architecture that emerged was a three-agent workflow with shared state:
- Requirements Agent: Reads a Jira issue, extracts structured requirements, identifies ambiguities, and requests clarification if needed. Produces a machine-readable specification.
- Code Generation Agent: Consumes the specification, has access to the Acumatica framework documentation and the existing codebase (via Azure AI Search), and generates implementation code.
- Test Validation Agent: Reviews the generated code against the original requirements, generates test cases, and produces a validation report.
The three agents share a stateful context stored in Cosmos DB. This is what makes the orchestration useful rather than just novel: the Test Validation Agent can reference the original requirements as parsed by the Requirements Agent, not just the code as generated. The loop is closed.
"The orchestration value is in the shared memory. Three isolated agents solve three separate problems. Three agents with shared state solve one problem together."
The Build
The stateful context schema was the foundational design decision. Every piece of information flowing through the pipeline needed a home in the schema:
- Original Jira issue (raw)
- Parsed requirements (structured)
- Clarification requests and responses
- Codebase context retrieved from search
- Generated code
- Test cases generated
- Validation results
- Human review flags
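One way the items above could map onto a typed state object is sketched below. Only the name `PipelineState` comes from the orchestration code; every field name and type here is an assumption for illustration, not the project's actual schema, and `open_clarifications` is a hypothetical helper.

```python
from typing import List, Optional, TypedDict


class Clarification(TypedDict):
    question: str
    response: Optional[str]  # None until someone answers


class PipelineState(TypedDict, total=False):
    # Illustrative field names, one per item in the list above.
    jira_issue_raw: str
    parsed_requirements: List[str]
    clarifications: List[Clarification]
    codebase_context: List[str]
    generated_code: str
    test_cases: List[str]
    validation_results: dict
    human_review_flags: List[str]


def open_clarifications(state: PipelineState) -> List[str]:
    """Return the questions that still have no response."""
    return [
        c["question"]
        for c in state.get("clarifications", [])
        if c.get("response") is None
    ]


state: PipelineState = {
    "jira_issue_raw": "Add a credit-limit check to sales orders",  # made-up ticket text
    "clarifications": [
        {"question": "Per customer or per order?", "response": None},
        {"question": "Which currency?", "response": "Base currency"},
    ],
}
pending = open_clarifications(state)
```

Declaring the state with `total=False` lets each agent fill in its own fields as the pipeline advances, while downstream agents can still read everything written before them.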
LangChain was the orchestration layer, with the workflow expressed as a LangGraph state graph. The agent routing logic was explicit rather than learned, because the routing decisions (requirements → code → test) are deterministic. LLM-based routing adds latency and failure modes without adding value when the sequence is known.
Azure AI Search indexed the Acumatica framework documentation and the team's existing codebase. Hybrid search (keyword + semantic) outperformed pure semantic search for code retrieval, which is expected: code search benefits heavily from exact identifier matching.
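The effect can be illustrated with reciprocal rank fusion (RRF), a common way to merge a keyword ranking with a semantic ranking. This is a standalone sketch rather than a call into the Azure AI Search SDK; the file names, the two rankings, and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document ids; each list contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for a query that mentions the identifier "PXGraph".
keyword_hits = ["graph_extension.cs", "order_entry.cs", "readme.md"]
semantic_hits = ["data_access_overview.md", "graph_extension.cs", "order_entry.cs"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

In this made-up example the file that matches the identifier exactly and is also semantically close ends up first, ahead of the document that only the semantic ranking favored.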
The Acumatica framework is .NET-based. The Code Generation Agent was given extensive context about the framework's specific patterns: BQL (Business Query Language), PXGraph for data access, PXCache for entity management. Without this domain context, the generated code would be idiomatic C# but non-functional Acumatica code.
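from langgraph.graph import StateGraph, END

# PipelineState, the three agent callables, and route_on_validation are defined elsewhere.
workflow = StateGraph(PipelineState)
workflow.add_node("parse_requirements", requirements_agent)
workflow.add_node("generate_code", code_agent)
workflow.add_node("validate_tests", test_agent)
# The "escalate" route needs a registered node; the handler name here is an assumption.
workflow.add_node("human_review", human_review)
workflow.set_entry_point("parse_requirements")
workflow.add_edge("parse_requirements", "generate_code")
workflow.add_edge("generate_code", "validate_tests")
# After validation: finish, loop back for a revision, or hand off to a human.
workflow.add_conditional_edges(
    "validate_tests",
    route_on_validation,
    {"pass": END, "revise": "generate_code", "escalate": "human_review"},
)
workflow.add_edge("human_review", END)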
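The `route_on_validation` function referenced above is not shown. A minimal sketch of what such a router might look like follows, assuming the validation step writes a `validation_results` dict with a `passed` flag and that the state carries a `revision_count`. Both field names and the retry cap are hypothetical.

```python
from typing import Any, Dict

MAX_REVISIONS = 3  # assumed cap on regeneration attempts before a human steps in


def route_on_validation(state: Dict[str, Any]) -> str:
    """Return the edge label to follow after the validation step."""
    results = state.get("validation_results", {})
    if results.get("passed"):
        return "pass"
    if state.get("revision_count", 0) >= MAX_REVISIONS:
        return "escalate"
    return "revise"
```

The router is plain Python with no model call, which keeps the routing decision deterministic and cheap. Capping revisions guarantees the revise loop ends at a human instead of cycling indefinitely.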
The Outcome
The pipeline shortened the time from Jira ticket to reviewed implementation by compressing two phases: requirements-to-code and code-to-tests. The clarification mechanism, in which the Requirements Agent surfaces ambiguities before code generation begins, reduced the rate of implementations that had to be substantially rewritten after the fact.
Test generation is not a replacement for human test authorship; it is a scaffold. Engineers review and extend the generated test cases. The value is in the starting point: a generated test that covers the happy path and the edge cases the Requirements Agent identified is faster to build on than a blank file.
Lessons
Multi-agent systems fail at the handoffs. The communication between agents — the schema that defines what one agent passes to the next — is where most of the implementation work is, and most of the debugging work too.
Invest in the schema before writing agent logic. A vague interface between agents means every ambiguity in requirements produces a different broken output. A precise interface means failures are predictable and fixable.
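As an illustration of a precise interface, the sketch below checks the specification handed from the Requirements Agent to the Code Generation Agent before any code is generated. The `RequirementSpec` fields and the `validate_handoff` helper are hypothetical, not the project's schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RequirementSpec:
    issue_key: str
    requirements: List[str]
    open_questions: List[str] = field(default_factory=list)


def validate_handoff(spec: RequirementSpec) -> List[str]:
    """Return a list of problems; an empty list means the handoff is safe."""
    problems = []
    if not spec.issue_key.strip():
        problems.append("missing issue key")
    if not spec.requirements:
        problems.append("no requirements extracted")
    if spec.open_questions:
        problems.append(f"{len(spec.open_questions)} unanswered clarification(s)")
    return problems


spec = RequirementSpec(
    issue_key="PROJ-1",  # made-up key
    requirements=["Block orders above the credit limit"],
    open_questions=["Per customer or per order?"],
)
problems = validate_handoff(spec)
```

Failing at the boundary with a named problem is what makes a bad handoff predictable: the downstream agent never sees a half-formed specification, and the fix is located in one place.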
The second lesson: human review gates are not optional. The pipeline produces code; engineers ship it. The automation compresses the loop; it does not eliminate judgment. Design the human review step as a first-class part of the system, not an afterthought.