Accuracy at Scale: How Emmi’s Machine Learning Outperforms Traditional Models
- Skye Frugte
- May 7
As reporting requirements begin to ramp up globally from 2025, investors increasingly need to quantify their portfolio’s ‘financed emissions’. As we move ever closer to 2030 and beyond, the implications of carbon footprints for risk and investment management will only grow.
The extent of the data available for these purposes will depend on reporting requirements for each investee company or asset. For many, there will be very little to go on.

The Superior Accuracy of Emmi's Machine Learning Model
Emmi’s Machine Learning (ML) model generates emissions estimates with superior accuracy when compared to other products in the market. It uses the reported data universe more intelligently to inform emissions calculations, avoiding the linear averages and assumptions that limit traditional factor-based approaches. Emmi then translates these emissions into climate risk analysis across all major asset classes.
Emmi’s model is the result of pioneering research undertaken by its co-founder Dr Ben McNeil, and climate-finance researchers from the University of Otago and Griffith University. While machine learning has a long history of use in finance, its application in the field of carbon in this way is cutting edge.
The model identifies complex, non-linear patterns in the financial data and reported emissions of ~10,000 public companies. It learns the relationships between classification, geography, scale, financials, and historical emissions.
The ML model then uses this understanding, and available financial data, to ‘now-cast’ emissions for public companies (before reported data are available) and overcome carbon data gaps in other markets such as fixed income and private equity.
The Emmi model’s superior accuracy stems from its understanding that emissions do not scale proportionally with revenue across different sectors and countries (below is a worked example). It also employs a specialised intensity model for high-materiality sectors with systematic Scope 3 underreporting.
Emmi's approach also provides comprehensive breakdowns across all three scopes, applicable to companies of all sizes and industries.
Our latest research demonstrates that, when methods are compared across the reported universe, Emmi’s ML approach is up to seven times more accurate than industry factors when both use the same data and the same sparse set of input features.
So what are ‘factor approaches’ to emissions modelling?
As we have noted, the research behind Emmi’s ML model is ground-breaking in the industry. The most common approach to estimating company emissions is spend-based industry emission factors, which are relatively easy to apply and directly translate financial values into emissions using a CO2e-per-dollar rate (per dollar spent or earned).
These models, such as Environmentally Extended Input-Output (EEIO) models, have a number of limitations. First and foremost, they map broad sector trends and then apply the same emissions intensity factor (CO2e per dollar) to all companies in a sector. This completely disregards significant differences in processes, efficiency, technology, and scale. In addition, they cannot effectively separate Scope 1, 2, and 3 emissions, often underestimating or entirely overlooking crucial Scope 3 components such as ‘Product Use’ and ‘Product Processing’.
Factor models also depend on static, older datasets with limited coverage. Gaps in the factor methods’ data are filled with analysts' assumptions and uplift factors, at both sector and emissions levels, embedding biases that predefine risk trends.
The factor approaches lack the granularity necessary for complex assets held by sophisticated investors, limiting their coverage and the detail available. The black-box structure reduces portfolio sensitivity and the accuracy of materiality assessments.
Due to evolving policies, markets, and technologies, the static, multi-period sectoral models have become outdated.
So, in short – Emmi vs Factors...
Emmi ML Estimates | Traditional EEIO Factor Estimates |
Reported data from ~10,000 public companies, ensuring accurate company-level emissions estimates | Static sector-average data that fail to capture company-specific variations |
Learns non-linear relationships between classification, geography, scale, and emissions | Assumes emissions scale proportionally with revenue |
Tracks reported data more intelligently with less bias | Linear sector averages resulting in large biases and errors |
Uses Scope 1, 2 and 3 separately for independent estimates | No breakdown of Scope 1, 2 and 3 emissions |
Covers all categories of Scope 3 from the reported data | Downstream Scope 3 not covered, despite being the most material category |
Full understanding of uncertainties against the reported data | No understanding of uncertainties |
Adaptable to regulatory requirements and market conditions | Static and rigid models |
Covers all major public and private asset classes | Limited coverage and detail for complex asset classes |
Transparent emissions insights that connect emissions to financial implications | Black-box approach hinders a systematic portfolio sensitivity and materiality assessment |
Example: Emissions change non-linearly as a company's revenue grows
Taking the example of the United States steel manufacturing sector, let’s assume the model suggests a sector revenue of $100 billion which produces 500 million tonnes of CO2e. The resulting EEIO emission factor is therefore 5 tonnes CO2e per $1000 revenue.
To estimate emissions for a specific steel company (Company A) you multiply their revenue by the factor (e.g. $2 billion x 5 tonnes/$1000 = 10 million tonnes CO2e).
Let’s say Company B is in Australia and has the same revenue as Company A, but uses electric arc furnaces for more efficient production. Because the EEIO factor depends only on revenue, both companies will receive the same emissions estimate.
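The factor arithmetic above can be written out as a short Python sketch. The function names are ours, chosen for illustration only; the sector and company figures are the ones assumed in the example.

```python
def sector_emission_factor(sector_emissions_t, sector_revenue_usd):
    """Return the sector-average intensity in tonnes CO2e per dollar of revenue."""
    return sector_emissions_t / sector_revenue_usd


def factor_estimate(company_revenue_usd, factor_t_per_usd):
    """Factor-style estimate: revenue multiplied by one sector-wide factor."""
    return company_revenue_usd * factor_t_per_usd


# Sector figures assumed in the example: $100 billion revenue, 500 million tonnes CO2e.
factor = sector_emission_factor(500e6, 100e9)

# Companies A and B both have $2 billion revenue, so the factor method cannot
# tell them apart, whatever technology each one uses.
company_a = factor_estimate(2e9, factor)
company_b = factor_estimate(2e9, factor)

print(f"Factor: {factor * 1000:.1f} t CO2e per $1000 revenue")
print(f"Company A: {company_a:,.0f} t CO2e, Company B: {company_b:,.0f} t CO2e")
```

Revenue is the only company-specific input, which is why the two estimates are identical.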
On the other hand, the Emmi ML method uses actual emissions patterns from reported data. It learns that emissions may not always increase at the same rate as revenue within and across sectors and countries.
For instance, in the steel manufacturing industry, Emmi ML might learn that emissions per dollar of revenue increase slowly up to $500 million, then increase at a faster rate between $500 million and $2 billion, but see diminishing increases beyond $2 billion.
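To make the contrast concrete, here is a minimal sketch of a piecewise curve set against the single linear factor. It is not Emmi's model: the band edges come from the example above, but the marginal rates are invented for illustration, and we read the example as describing how fast emissions grow within each revenue band.

```python
# Hypothetical marginal rates in tonnes CO2e per $1000 of revenue, per band.
# Band edges follow the example in the text; the rates are illustrative only.
BANDS = [
    (0.0, 500e6, 2.0),           # slow growth up to $500 million
    (500e6, 2e9, 6.0),           # faster growth from $500 million to $2 billion
    (2e9, float("inf"), 1.0),    # diminishing increases beyond $2 billion
]


def piecewise_estimate(revenue_usd):
    """Sum the emissions contributed by each revenue band the company spans."""
    total = 0.0
    for low, high, rate_per_1000 in BANDS:
        if revenue_usd > low:
            revenue_in_band = min(revenue_usd, high) - low
            total += revenue_in_band / 1000 * rate_per_1000
    return total


def linear_estimate(revenue_usd, factor_per_1000=5.0):
    """Factor-style estimate using the 5 t per $1000 factor from the example."""
    return revenue_usd / 1000 * factor_per_1000


for revenue in (500e6, 2e9, 4e9):
    print(
        f"${revenue:,.0f}: piecewise {piecewise_estimate(revenue):,.0f} t, "
        f"linear {linear_estimate(revenue):,.0f} t"
    )
```

With these made-up rates the two curves agree at $2 billion but diverge on either side, which is the kind of gap a single sector-wide factor cannot represent.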
To demonstrate this concept (graph below), we present Scope 1 emissions as a function of revenue for the Utilities industry, which has the highest emissions (around 2.3 billion tonnes globally) of any industry.
This graph shows reported 'actual' emissions compared to both the Industry Factor Estimate and the Emmi ML estimate. The non-linear relationship between revenue and emissions is clear. While the ML estimates accurately follow this non-linearity, the average factor method consistently overestimates the carbon footprint of many companies. In particular, for companies with revenues ranging from $100 million to $1 billion, the industry factor method produces estimates that are 10-100 times higher than reported emissions.
(We should note one limitation of the ML method: there are not enough training data for the models to estimate the carbon emissions of companies with less than $100 million in revenue. As such, Emmi assumes these footprints to be a constant ~1,000 tonnes. However, at an investor level, these smaller footprints have minimal impact on overall financed emissions calculations.)
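The small-company rule described above can be sketched as a thin wrapper around any estimator. The wrapper and the stand-in model below are hypothetical and are not Emmi's implementation; only the $100 million threshold and the ~1,000 tonne constant come from the text.

```python
REVENUE_THRESHOLD_USD = 100e6       # below this, training data are too sparse
SMALL_COMPANY_FOOTPRINT_T = 1000.0  # constant footprint assumed for small companies


def estimate_with_small_company_rule(revenue_usd, model_estimate):
    """Return the constant footprint below the threshold, else the model's estimate.

    `model_estimate` is any callable mapping revenue (USD) to tonnes CO2e;
    it stands in for a trained model and is an assumption of this sketch.
    """
    if revenue_usd < REVENUE_THRESHOLD_USD:
        return SMALL_COMPANY_FOOTPRINT_T
    return model_estimate(revenue_usd)


def toy_model(revenue_usd):
    """Toy stand-in for a trained model (illustrative only)."""
    return revenue_usd * 0.001


print(estimate_with_small_company_rule(50e6, toy_model))   # below threshold
print(estimate_with_small_company_rule(500e6, toy_model))  # uses the model
```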
Electric Utilities (Total Revenue: $2,212,755,323,721; Total Emissions: 2,345,313,950 tCO2e)
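As a rough check on the totals in the caption, the sketch below derives a sector-average intensity and applies it across the $100 million to $1 billion revenue range. It assumes that the industry factor is simply total emissions divided by total revenue, which the article does not state explicitly.

```python
# Totals taken from the figure caption for Electric Utilities.
TOTAL_REVENUE_USD = 2_212_755_323_721
TOTAL_EMISSIONS_T = 2_345_313_950

# Assumed definition of the industry factor: sector emissions / sector revenue,
# expressed in tonnes CO2e per $1000 of revenue.
factor_per_1000 = TOTAL_EMISSIONS_T / TOTAL_REVENUE_USD * 1000


def factor_estimate(revenue_usd):
    """Apply the single sector-average factor to a company's revenue."""
    return revenue_usd / 1000 * factor_per_1000


print(f"Sector-average factor: {factor_per_1000:.2f} t CO2e per $1000 revenue")
for revenue in (100e6, 1e9):
    print(f"${revenue:,.0f} revenue -> {factor_estimate(revenue):,.0f} t CO2e")
```

Under that assumption the factor works out to about 1.06 tonnes per $1000 of revenue, and every company is placed on the same straight line regardless of what it actually reports.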
Emmi’s proprietary machine-learning model gives investors a more accurate and actionable view of financed emissions across all major asset classes. As carbon footprints become increasingly material to investment and risk management decisions, relying on static factor models is no longer enough.
Our research shows that the choice of modelling method can materially shift risk exposure and outcomes. For investors who want a clearer connection between emissions, financial value, and portfolio strategy, Emmi’s ML model offers a trusted and scalable solution.
Explore our latest research to see how better carbon data supports better investment management.
Emmi’s proprietary machine-learning model provides financed emissions data and climate risk analysis across all major public and private asset classes.
Built on objective, complete, accurate, and timely data principles, Emmi’s methodology ensures that every data point is transparent and reusable.
Our tools also translate emissions into financial implications, based on climate and pricing scenarios. This gives our clients actionable insights about their carbon exposure.