

Sustainability teams need defensible numbers for emissions that don't show up on any cloud sustainability dashboard, run on third-party infrastructure, and produce activity that's hard to attribute. AI coding agent emissions check all three boxes.
This walkthrough covers how we translate agent activity into kWh and kg CO2e, with every assumption made visible. It's one of three pieces on this topic. The audience-facing overview covers the why and what to do about it. The Claude Code launch analysis covers a related question about market dynamics. If you haven't read those yet, you can start with either.
Three things make precise per-developer numbers difficult.
Token consumption per task varies by an order of magnitude depending on model choice, task complexity, and how the agent is configured. A complementary study to the work cited below documented up to 30x run-to-run variance on identical tasks. Any single number you publish has wide error bars, and being honest about that is more credible than precision theater.
In the past year, the top of the leaderboard has shuffled more than once. Tools that led the category in mid-2025 have since dropped out of the top tier, and tools that didn't exist as agents a year ago now rank among the most active. A precise per-engineer number published today is a good way to be wrong by next quarter.
The Jegham et al. 2025 framework we use is the current best-available public methodology for LLM inference energy estimation. Newer frameworks can build on it, extending coverage to more recent models, time-of-day grid intensity, and, we hope, additional disclosure from model providers. What's defensible today won't be the final word.
The right response to all three is the same: capture raw inputs (tokens, model, provider, and time) that stay valid as methodology evolves, apply the current best framework at report time, and document your assumptions. That's what this walkthrough demonstrates.
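As an illustration, a minimal raw-input record might look like the sketch below. The field names are ours, not a published schema; the point is that energy and emissions are derived from these fields at report time rather than baked into the stored data.

```python
# A minimal sketch of a durable raw-input record per agent request.
# Field names are illustrative, not Carbonlog's actual schema.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class AgentUsageRecord:
    timestamp: datetime             # when the request completed (UTC)
    model: str                      # e.g. "claude-sonnet-4-5"
    provider: str                   # e.g. "anthropic" -> AWS US-East assumptions
    input_tokens: int               # prompt/context tokens
    output_tokens: int              # generated tokens
    region_hint: str | None = None  # inference region, if ever disclosed

record = AgentUsageRecord(
    timestamp=datetime.now(timezone.utc),
    model="claude-sonnet-4-5",
    provider="anthropic",
    input_tokens=42_000,
    output_tokens=8_500,
)
```

Because no kWh or CO2e figure is stored, the same records can be re-priced whenever the framework improves.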
The full chain we model: push → commits → tokens → GPU energy (kWh) → facility energy (PUE) → emissions (kg CO2e via grid carbon intensity).
Each step has assumptions. We make them visible so they can be challenged or improved.
A push is what a developer does (git push); a commit is what the agent produces.
We empirically derive 1.29 commits per push from an analysis of GitHub Archive PushEvent data from September 2025, the last full month before GitHub Archive removed the relevant data field on October 7, 2025. We winsorized the data because a number of outlier pushes appeared to be bulk imports of entire codebases rather than the working commits we were measuring.
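A sketch of that derivation in Python, assuming a local GH Archive hourly dump and that `distinct_size` is the since-removed per-push commit-count field; the winsorization percentile shown is illustrative:

```python
# Sketch: derive mean commits-per-push from a GH Archive hourly file
# (e.g. "2025-09-15-12.json.gz"). In practice we aggregate the full month.
import gzip
import json
import numpy as np

def commits_per_push(archive_path: str, upper_pct: float = 99.0) -> float:
    sizes = []
    with gzip.open(archive_path, "rt", encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            if event.get("type") == "PushEvent":
                n = event["payload"].get("distinct_size")
                if n is not None and n > 0:
                    sizes.append(n)
    sizes = np.asarray(sizes, dtype=float)
    # Winsorize: cap bulk-import outliers (whole-codebase pushes)
    # at the chosen percentile instead of dropping them outright.
    cap = np.percentile(sizes, upper_pct)
    return float(np.clip(sizes, None, cap).mean())  # ~1.29 in our data
```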
This is the assumption that varies most by task type, and the one most worth scrutinizing. Xiao et al. (2025), Reducing Cost of LLM Agents with Trajectory Reduction, measured agent token consumption directly on SWE-bench Verified, a benchmark of real-world GitHub issues that's the closest public proxy for "produce one commit that fixes one real problem." They report ~1.0 million tokens per issue on average across baseline runs, with ~40 reasoning steps per trajectory.
A complementary study, How Do AI Agents Spend Your Money?, confirms the order of magnitude across eight frontier LLMs but documents up to 30x run-to-run variance on identical tasks, and notes that Claude Sonnet 4.5 consumes over 1.5 million more tokens per task than GPT-5. We use 1 million tokens per commit as a midpoint of a wide distribution.
Claude runs on AWS H100/H200 nodes. Anthropic's models default to extended-reasoning mode, which Jegham et al. (2025) prices at 7.3x the per-token energy of standard inference. Applied to Claude's hardware class with shared-node concurrency (~6 requests per GPU) and 90% node utilization, the effective per-token energy is ~1.3 Wh per 1,000 tokens.
For 1 million tokens, that's ~1.3 kWh of GPU energy per commit.
Apply the AWS data-center power usage effectiveness (PUE = 1.14, per Jegham et al. Table 3) for cooling and overhead: 1.3 kWh × 1.14 ≈ 1.48 kWh per commit.
Apply the AWS carbon intensity factor for the US-East regions where most Claude inference runs (CIF = 0.287 kg CO2e/kWh, per Jegham et al. Table 3): 1.48 kWh × 0.287 kg CO2e/kWh ≈ 0.43 kg CO2e per commit.
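Here is the whole per-commit chain as one worked calculation, a sketch using only the constants stated above; the per-token figure already folds in the 7.3x reasoning multiplier, shared-node concurrency, and 90% utilization:

```python
# Worked per-commit chain using the constants from the text
# (Jegham et al. 2025, Table 3 values for AWS US-East).
TOKENS_PER_COMMIT = 1_000_000  # midpoint of a wide distribution
WH_PER_1K_TOKENS = 1.3         # extended reasoning on H100/H200, shared node
PUE = 1.14                     # AWS data-center cooling/overhead multiplier
CIF = 0.287                    # kg CO2e per kWh, AWS US-East grid

gpu_kwh = TOKENS_PER_COMMIT / 1_000 * WH_PER_1K_TOKENS / 1_000  # ~1.30 kWh
facility_kwh = gpu_kwh * PUE                                    # ~1.48 kWh
kg_co2e = facility_kwh * CIF                                    # ~0.43 kg CO2e

print(f"{gpu_kwh:.2f} kWh GPU -> {facility_kwh:.2f} kWh facility "
      f"-> {kg_co2e:.2f} kg CO2e per commit")
```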
GitClear's analysis of GitHub history from 2020 to 2024 finds the median full-time developer produces 673 commits per year. That works out to about 56 commits per month, or 2.8 commits per active workday. It's a conservative starting point, since it doesn't factor in recent productivity gains attributable to AI tools.
Drawing on Stack Overflow's 2025 Developer Survey, Faros AI's 2026 Engineering Report, and DORA's 2025 State of AI-Assisted Software Development, we estimate a hypothetical AI coding agent adopter might make about 50 agent-attributed commits per month from roughly 39 pushes per month.
Using EPA-referenced aviation emission factors and standard per-passenger intensities, that usage works out to about 21 kg CO2e per developer per month, or roughly 258 kg per year, on the order of 2,000 passenger-miles of medium-haul air travel.
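The conversion behind that comparison, as a sketch: we assume the EPA GHG Emission Factors Hub medium-haul air travel factor of ~0.129 kg CO2 per passenger-mile (CO2 only, ignoring CH4/N2O for simplicity), so the flight equivalence is illustrative rather than exact.

```python
# Per-developer footprint and a hedged flight equivalence.
KG_CO2E_PER_COMMIT = 0.43        # from the per-commit chain above
AGENT_COMMITS_PER_MONTH = 50     # hypothetical adopter, per the estimate above
EPA_MEDIUM_HAUL = 0.129          # kg CO2 per passenger-mile (EPA factors hub)

monthly_kg = KG_CO2E_PER_COMMIT * AGENT_COMMITS_PER_MONTH  # ~21.5 kg
annual_kg = monthly_kg * 12                                # ~258 kg
flight_miles = annual_kg / EPA_MEDIUM_HAUL                 # ~2,000 miles

print(f"{monthly_kg:.1f} kg CO2e/month, {annual_kg:.0f} kg/year, "
      f"~{flight_miles:,.0f} passenger-miles of medium-haul air travel")
```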
These numbers are small relative to emissions in most industries, but material for a digital-first SaaS company. They are also growing rapidly.
Accounting requires being explicit about what's outside the boundary. The numbers above leave several things uncounted.
Per GitHub's Octoverse reporting, roughly 80% of developer contribution volume happens in private repos, which our pipeline doesn't see. The volume and emission numbers should be read as the observable trajectory. It is also likely that agent usage is higher in private repositories where employers pay for expensive subscriptions.
Our token estimate covers the tokens the agent generates. It doesn't include input tokens used to prime the task — context windows, retrieved documents, multi-step reasoning prompts. That underestimates real-world consumption, especially for long-running agent tasks.
Model training is energy-intensive (the Stanford AI Index estimates Grok 4's training emissions at 72,816 tCO2e), but it's a one-time cost that's already been incurred and is amortized across all subsequent inference. We model only the inference cost of agent-attributed commits, which is the marginal footprint each new piece of AI-written code adds.
GitHub reports more than 20 million all-time Copilot users; none of that activity is visible in commit attribution because the human is still the committer. Tab completion, editor-mode suggestions, and similar tools have their own energy footprint that this methodology doesn't capture.
The Jegham framework supports water-use estimation, but our pipeline doesn't implement it yet. Hardware manufacturing emissions for the GPUs running this inference (a meaningful contribution to total lifecycle footprint) aren't included either.
We use AWS US-East values for Claude inference because that's where most of it runs, but the actual region for any given workload may differ. Per-provider grid intensity values used in this analysis: 0.287 kg CO2e/kWh (AWS / Anthropic), 0.35 kg CO2e/kWh (Azure / Microsoft), 0.287 kg CO2e/kWh (Google).
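As a sketch, the report-time lookup this implies; the keys and structure here are illustrative, not our pipeline's actual configuration:

```python
# Per-provider grid carbon intensity applied at report time (kg CO2e/kWh),
# values as used in this analysis. Keyed by provider so stored raw records
# can be re-priced if a workload's actual region later becomes known.
GRID_CIF = {
    "anthropic": 0.287,  # AWS US-East
    "microsoft": 0.35,   # Azure
    "google": 0.287,
}
```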
Net effect: the published numbers are a floor, not a ceiling. Real total footprint is meaningfully larger.
The methodology in this piece tells you how to estimate AI coding emissions in principle. Carbonlog is how you do it in practice. It's our open-source Claude Code plugin that tracks CO2 and energy consumption per AI coding session in real time, using the same Jegham et al. framework walked through above. It produces the raw inputs your sustainability team will need for Scope 3 reporting.
The full detection pipeline, methodology, and historical data behind this analysis are also open source. We welcome contributions, especially from teams testing this against their own telemetry.