The Brief
Fresh Earth operates at the intersection of agriculture, carbon markets, and ESG compliance. The company's IBIS platform tracks carbon sequestration and emissions across agricultural supply chains — a technically demanding problem involving satellite data, sensor networks, supply-chain provenance records, and the complex, sometimes ambiguous methodologies of voluntary carbon market standards.
IBIS-SIM extends this with simulation capabilities: given a planned land-use change or farming practice change, what is the projected carbon impact over a defined time horizon?
The Chief of AI mandate was twofold: improve the ML systems underpinning both platforms, and increase the velocity at which the team could ship new features and model updates.
The Approach
Carbon accounting at scale has two hard problems. The first is data heterogeneity: satellite imagery, IoT sensor readings, logistics records, manual survey data, and regulatory filings all need to feed into the same models under a consistent methodology. The second is auditability: carbon credits are financial instruments, so the models behind them must produce explainable outputs that auditors can review.
The architecture centered on a serverless ML pipeline — AWS Lambda for computation, S3 for data storage, SQS for job queuing — that could scale horizontally for batch processing of large agricultural datasets while remaining cost-efficient during low-volume periods.
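As a rough illustration of the queue-driven shape described above, the sketch below shows an SQS-triggered Lambda handler in Python. It is not the Fresh Earth implementation: `make_handler`, `process_job`, and the `{"bucket": ..., "key": ...}` job payload are assumptions made for the example, and the S3 read is left to the injected callable so the block runs without AWS credentials.

```python
import json
from typing import Any, Callable


def make_handler(process_job: Callable[[dict[str, Any]], None]):
    """Build a Lambda-style handler for an SQS-triggered function.

    `process_job` is a hypothetical callable that would fetch the referenced
    object from S3 and run the batch computation for one job.
    """

    def handler(event: dict[str, Any], context: Any = None) -> dict[str, Any]:
        failures = []
        for record in event.get("Records", []):
            try:
                # Assumed job payload: {"bucket": "...", "key": "..."}
                job = json.loads(record["body"])
                process_job(job)
            except Exception:
                # Report only this message as failed so the rest of the
                # batch is not retried along with it.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

The `batchItemFailures` return value follows Lambda's partial batch response format for SQS, which only takes effect when `ReportBatchItemFailures` is enabled on the event source mapping. Without it, a single bad message would cause the whole batch to be redelivered.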
"The auditability requirement changed every model architecture decision. If you cannot explain why the model produced a specific carbon estimate, the credit is not saleable."
The Build
The pipeline architecture treated each data source as a distinct ingestion stream with its own normalization layer. Satellite imagery went through a preprocessing stage (cloud masking, atmospheric correction, NDVI calculation) before entering the main pipeline. IoT sensor data went through anomaly detection and gap-filling before joining it. This separation of concerns made it straightforward to update individual ingestion streams without touching the core model.
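Two of the normalization steps named above are simple enough to sketch. The example below computes NDVI with the standard formula, (NIR - Red) / (NIR + Red), masks cloudy pixels, and fills interior gaps in a sensor series by linear interpolation. The function names and the flat-list data layout are assumptions for illustration, and linear interpolation is only one possible gap-filling strategy; the source does not say which method the platform used.

```python
from typing import Optional, Sequence


def ndvi(nir: float, red: float) -> Optional[float]:
    """NDVI = (NIR - Red) / (NIR + Red); None when the denominator is zero."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def masked_ndvi(
    nir_band: Sequence[float],
    red_band: Sequence[float],
    cloud_mask: Sequence[bool],
) -> list[Optional[float]]:
    """Per-pixel NDVI, with pixels flagged as cloudy set to None."""
    return [
        None if cloudy else ndvi(n, r)
        for n, r, cloudy in zip(nir_band, red_band, cloud_mask)
    ]


def fill_gaps(readings: Sequence[Optional[float]]) -> list[Optional[float]]:
    """Linearly interpolate interior gaps (None values).

    Leading and trailing gaps are left as None because there is no
    neighbor on one side to interpolate from.
    """
    filled = list(readings)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled
```

Keeping each step as a small pure function mirrors the separation of concerns in the text: a stream's normalization can be tested and replaced without touching the others.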
The ML models themselves — primarily gradient-boosted trees for tabular supply-chain data, and CNN-based models for the satellite imagery — were managed through a lightweight MLOps layer. Model versioning, A/B testing infrastructure, and rollback capabilities were built early and used often.
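To make the versioning and rollback idea concrete, here is a minimal in-memory sketch of a model registry. The `ModelRegistry` class and its method names are hypothetical, and a real registry would persist this state (for example alongside the model artifacts) instead of holding it in memory.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelRegistry:
    """Tracks registered model versions and the order they were promoted in."""

    versions: dict[str, str] = field(default_factory=dict)  # version -> artifact URI
    history: list[str] = field(default_factory=list)  # promoted versions, oldest first

    def register(self, version: str, artifact_uri: str) -> None:
        # Versions are immutable: re-registering the same version is an error.
        if version in self.versions:
            raise ValueError(f"version {version!r} is already registered")
        self.versions[version] = artifact_uri

    def promote(self, version: str) -> None:
        # Only registered versions can go live.
        if version not in self.versions:
            raise KeyError(version)
        self.history.append(version)

    def rollback(self) -> str:
        # Drop the current live version and return the one before it.
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.history[-1]

    @property
    def live(self) -> Optional[str]:
        return self.history[-1] if self.history else None
```

Because promotion only appends to a history list, rollback is a single pop, and the history itself doubles as a record of which version was live and in what order.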
The explainability layer used SHAP values to produce per-prediction feature attribution. For a given carbon estimate, the system can produce a ranked list of factors that drove the estimate and their relative contributions. This was not a nice-to-have — it was a requirement for the auditor-facing report generation.
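The ranking step can be sketched independently of the model. The example below assumes the per-feature attributions for one prediction have already been computed (for instance by a SHAP explainer) and only shows how they might be turned into a ranked list with relative contributions. The function name, the output shape, and the feature names in the usage example are hypothetical.

```python
def rank_attributions(attributions: dict[str, float], top_k: int = 5) -> list[dict]:
    """Rank features by absolute attribution and report each one's share.

    `attributions` maps feature name -> signed attribution for a single
    prediction. `share` is the feature's fraction of total absolute
    attribution, so shares across all features sum to 1.
    """
    total = sum(abs(value) for value in attributions.values())
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [
        {
            "feature": name,
            "contribution": value,  # signed: positive raises the estimate
            "share": abs(value) / total if total else 0.0,
        }
        for name, value in ranked[:top_k]
    ]


# Usage with made-up attribution values for one carbon estimate.
report = rank_attributions(
    {"soil_organic_carbon": 1.5, "tillage_intensity": -0.5, "rainfall": 0.0}
)
```

Keeping the sign on `contribution` while ranking by magnitude lets an auditor see both which factors mattered most and in which direction each one pushed the estimate.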
IBIS-SIM used a Monte Carlo simulation approach: run the carbon model with sampled parameter distributions to produce probability distributions over outcomes rather than point estimates. This is more honest about the inherent uncertainty in carbon projection, and more useful for decision-making.
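import numpy as np

# LandUseScenario, SimulationResult, and _run_single are not shown in this excerpt.
class CarbonSimulator:
    def simulate(
        self,
        baseline: LandUseScenario,
        intervention: LandUseScenario,
        n_samples: int = 10_000,
        horizon_years: int = 10,
    ) -> SimulationResult:
        # Each run samples one set of model parameters and returns one projected outcome.
        samples = [
            self._run_single(baseline, intervention, horizon_years)
            for _ in range(n_samples)
        ]
        # Summarize the sampled outcomes as percentiles instead of a point estimate.
        return SimulationResult(
            p10=np.percentile(samples, 10),
            p50=np.percentile(samples, 50),
            p90=np.percentile(samples, 90),
        )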
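The excerpt above leaves out what a single run does. The following self-contained sketch fills that in under loudly stated assumptions: each scenario is reduced to a normally distributed annual sequestration rate, and one run is the difference between a sampled intervention rate and a sampled baseline rate, scaled by area and horizon. `RateDistribution`, `run_single`, `percentile`, the `area_ha` parameter, and the units are all hypothetical; the real carbon model is certainly richer than this.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RateDistribution:
    """Hypothetical scenario: annual sequestration rate (tCO2e/ha/yr), assumed normal."""

    mean: float
    std: float


def run_single(baseline, intervention, area_ha, horizon_years, rng):
    """One Monte Carlo draw of the intervention's net carbon benefit."""
    base_rate = rng.gauss(baseline.mean, baseline.std)
    new_rate = rng.gauss(intervention.mean, intervention.std)
    return (new_rate - base_rate) * area_ha * horizon_years


def percentile(sorted_values, q):
    """Linear-interpolation percentile of pre-sorted values, with q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def simulate(baseline, intervention, area_ha, horizon_years=10, n_samples=10_000, seed=0):
    """Return p10/p50/p90 of the sampled net benefit; seeded for reproducibility."""
    rng = random.Random(seed)
    samples = sorted(
        run_single(baseline, intervention, area_ha, horizon_years, rng)
        for _ in range(n_samples)
    )
    return {q: percentile(samples, int(q[1:])) for q in ("p10", "p50", "p90")}
```

Seeding the random generator is a deliberate choice in this sketch: a simulation whose output feeds a reviewable report is easier to defend if the same inputs reproduce the same percentiles.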
The Outcome
Release velocity improved by 30%, driven by three changes: automated testing infrastructure that caught regressions before they reached production, a model registry that made it straightforward to promote a new model version without manual coordination, and refactored deployment pipelines that cut deploy time from 45 minutes to under 10.
The IBIS platform is processing millions of data points per month. The explainability reports are used in carbon credit audits. IBIS-SIM is in active use for land-use planning decisions.
Lessons
In ML systems that feed financial instruments, the model is not the product. The audit trail is the product. Every design decision should ask: can an auditor who knows nothing about our codebase understand why the model made this prediction?
Serverless ML has a cold-start problem that matters at the wrong times. For Fresh Earth's use case — batch processing with some latency tolerance — this was acceptable. For real-time applications where a user is waiting, provisioned concurrency or a different architecture is required.
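One common mitigation, sketched below, is to load the model once per execution environment and reuse it across warm invocations. This does not remove the cold start; it only ensures the loading cost is paid once and not on every request. `get_model` and its simulated delay are stand-ins invented for the example, not the platform's actual loader.

```python
import functools
import time


@functools.lru_cache(maxsize=1)
def get_model():
    """Load the model once per execution environment (hypothetical loader)."""
    time.sleep(0.05)  # stand-in for downloading and deserializing an artifact
    return {"loaded_at": time.monotonic()}


def handler(event, context=None):
    # Slow on the first (cold) invocation, served from cache afterwards.
    model = get_model()
    return {"model_loaded_at": model["loaded_at"]}
```

For batch workloads like the one described here, this is usually enough. When a user is waiting on the response, the first-invocation delay remains, which is where provisioned concurrency comes in.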
The simulation approach (probability distributions rather than point estimates) is almost always more honest and more useful. The additional complexity is worth it.