Correcting the Factor Mirage: A Research Protocol for Causal Factor Investing
By Marcos López de Prado & Vincent Zoonekynd, ADIA Lab (2024)
This paper lays the theoretical and methodological groundwork for a new approach to factor investing: one that replaces correlational techniques with causal inference. Building on earlier critiques of p-hacking and data mining, the authors focus on a subtler, more damaging problem: the widespread use of factor models that pass statistical tests but are causally misspecified.
What’s the Problem?
Traditional factor investing relies heavily on tools like linear regression, p-values, and maximising in-sample performance. While this can generate strong backtests, it often leads to poor out-of-sample results because it ignores the true cause-and-effect structure behind returns. The paper introduces the term "factor mirage" to describe this issue: models that appear sound but are based on flawed causal logic.
Two common issues are:
Confounder bias: arises when a hidden variable affects both a factor and asset returns, leading to false attribution.
Collider bias: occurs when a model controls for a variable that is caused by both a factor and returns, which induces a spurious association between the two.
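Both biases are easy to reproduce on synthetic data. The sketch below is illustrative only and is not taken from the paper: the data-generating processes, the sample size, and the helper `ols` are assumptions. In the first case the factor has no effect on returns, yet a regression that omits the confounder reports one. In the second case the factor and returns are independent, yet controlling for their common effect creates a spurious negative coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

def ols(y, *regressors):
    """Return OLS slope coefficients (the intercept is fitted but dropped)."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]

# Confounder: z drives both the factor x and the return y; x has NO effect on y.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)
naive_conf = ols(y, x)[0]        # biased: picks up the effect of z
adjusted_conf = ols(y, x, z)[0]  # close to zero once z is controlled for

# Collider: x2 and y2 are independent; c is caused by both of them.
x2 = rng.normal(size=n)
y2 = rng.normal(size=n)
c = x2 + y2 + rng.normal(size=n)
naive_coll = ols(y2, x2)[0]      # close to zero, which is correct
biased_coll = ols(y2, x2, c)[0]  # spurious: controlling for c opens a path

print(f"confounder: naive={naive_conf:.2f}, adjusted={adjusted_conf:.2f}")
print(f"collider:   naive={naive_coll:.2f}, with collider={biased_coll:.2f}")
```

Under these assumed processes, the population values are 0.5 for the confounded slope and -0.5 for the collider-induced slope, so the same regression toolkit gives the wrong answer in opposite ways depending on which variable is added or left out.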
Core Contribution
The authors present a practical seven-step research protocol for designing factor models grounded in causal reasoning rather than naive correlation.
1. Variable Selection
Use machine learning tools (e.g. SHAP values, mutual information) to identify candidate variables related to returns, without assuming linearity.
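As a rough illustration of why this step avoids assuming linearity, the sketch below compares Pearson correlation with a simple histogram-based mutual information estimate. The estimator, the bin count, and the synthetic variables (`factor`, `noise_feature`, `future_return`) are assumptions made for this example; a production workflow would more likely rely on a library estimator or SHAP values.

```python
import numpy as np

def mutual_information(x, y, bins=20):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = pxy > 0                        # skip empty cells to avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

rng = np.random.default_rng(0)
n = 20_000
factor = rng.normal(size=n)             # hypothetical candidate variable
noise_feature = rng.normal(size=n)      # hypothetical irrelevant variable
future_return = factor**2 + 0.1 * rng.normal(size=n)  # purely nonlinear link

linear_corr = float(np.corrcoef(factor, future_return)[0, 1])
mi_factor = mutual_information(factor, future_return)
mi_noise = mutual_information(noise_feature, future_return)

print(f"correlation={linear_corr:.3f}, MI(factor)={mi_factor:.3f}, MI(noise)={mi_noise:.3f}")
```

A correlation screen would discard `factor` here, while the mutual information score separates it clearly from the irrelevant variable. Histogram estimates carry a small positive bias, so scores should be compared against a noise baseline rather than against zero.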
2. Causal Discovery
Build a directed acyclic graph (DAG) using domain expertise and discovery algorithms (e.g. PC, LiNGAM) to map causal relationships between variables.
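Constraint-based algorithms such as PC are built on conditional independence tests. The sketch below shows one such test, partial correlation, on an assumed linear chain x -> z -> y. It is a single building block under linear-Gaussian assumptions, not an implementation of PC or LiNGAM, and the coefficients are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Assumed chain: x causes z, and z causes y. There is no direct x -> y edge.
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)
y = 0.8 * z + rng.normal(size=n)

def partial_corr(a, b, given):
    """Correlation between a and b after linearly regressing out `given`."""
    design = np.column_stack([np.ones(len(given)), given])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return float(np.corrcoef(res_a, res_b)[0, 1])

marginal = float(np.corrcoef(x, y)[0, 1])   # clearly non-zero
conditional = partial_corr(x, y, z)         # close to zero

print(f"corr(x, y)={marginal:.3f}, corr(x, y | z)={conditional:.3f}")
```

Because x and y become uncorrelated once z is held fixed, a constraint-based algorithm would drop the direct x-y edge from the candidate graph. Orienting the remaining edges still requires further rules or domain expertise.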
3. Causal Adjustment
Apply do-calculus to identify which variables should be controlled for in a regression (i.e. confounders), and which must be excluded (e.g. colliders).
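Once a DAG has been agreed, the role of each variable can be read off the graph. The sketch below is a simplified heuristic, not a full implementation of do-calculus or the back-door criterion: it labels a node as a confounder if it causes the treatment and also reaches the outcome without passing through the treatment, as a collider if it descends from both, and as a mediator if it lies on a path from treatment to outcome. The function `classify_controls` and all variable names in the example graph are hypothetical.

```python
def classify_controls(edges, treatment, outcome):
    """Label every other node by its role relative to treatment -> outcome."""
    nodes = {node for edge in edges for node in edge}
    children = {node: set() for node in nodes}
    for cause, effect in edges:
        children[cause].add(effect)

    def descendants(start, blocked=None):
        """All nodes reachable from `start` without passing through `blocked`."""
        seen, stack = set(), [start]
        while stack:
            for child in children[stack.pop()]:
                if child != blocked and child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    after_treatment = descendants(treatment)
    after_outcome = descendants(outcome)
    roles = {}
    for node in sorted(nodes - {treatment, outcome}):
        if treatment in descendants(node) and outcome in descendants(node, blocked=treatment):
            roles[node] = "confounder: control for it"
        elif node in after_treatment and node in after_outcome:
            roles[node] = "collider: do not control for it"
        elif node in after_treatment and outcome in descendants(node):
            roles[node] = "mediator: do not control for it (total effect)"
        else:
            roles[node] = "other"
    return roles

# Hypothetical graph: each pair is (cause, effect).
edges = [
    ("macro", "value"), ("macro", "returns"),            # macro confounds value and returns
    ("value", "returns"),
    ("value", "sentiment"), ("sentiment", "returns"),    # sentiment mediates
    ("value", "fund_flows"), ("returns", "fund_flows"),  # fund_flows is a common effect
]
roles = classify_controls(edges, treatment="value", outcome="returns")
for name, role in roles.items():
    print(f"{name}: {role}")
```

The resulting labels translate directly into a regression specification: include the confounders, and leave out colliders and, for a total-effect estimate, mediators.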
4. Estimating Causal Effects
Use techniques like double machine learning to estimate the magnitude of causal effects and understand whether they explain, predict, or rank asset returns.
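The core idea of double machine learning can be sketched with the partialling-out estimator and two-fold cross-fitting. The example below makes several assumptions that are not in the paper: a partially linear model with a single control, synthetic data with a known effect `theta_true`, and polynomial least squares standing in for the machine learning nuisance models.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta_true = 20_000, 0.5

# Assumed partially linear model: the control x affects both d and y nonlinearly.
x = rng.uniform(-2, 2, size=n)
d = np.sin(x) + 0.5 * x + rng.normal(size=n)                    # factor exposure
y = theta_true * d + x**2 + np.sin(x) + rng.normal(size=n)      # return

def fit_predict(x_train, target_train, x_test, degree=5):
    """Nuisance model: a polynomial fit standing in for an ML learner."""
    coefs = np.polyfit(x_train, target_train, degree)
    return np.polyval(coefs, x_test)

# Cross-fitting: nuisance models are fitted on one fold and applied to the other.
folds = np.array_split(rng.permutation(n), 2)
d_res = np.empty(n)
y_res = np.empty(n)
for k, test_idx in enumerate(folds):
    train_idx = folds[1 - k]
    d_res[test_idx] = d[test_idx] - fit_predict(x[train_idx], d[train_idx], x[test_idx])
    y_res[test_idx] = y[test_idx] - fit_predict(x[train_idx], y[train_idx], x[test_idx])

# Final stage: regress the return residuals on the exposure residuals.
theta_hat = float(d_res @ y_res / (d_res @ d_res))
naive = float(np.polyfit(d, y, 1)[0])   # slope of y on d, ignoring x

print(f"true={theta_true}, naive={naive:.3f}, double ML={theta_hat:.3f}")
```

The naive slope absorbs the influence of the control, while the residual-on-residual estimate recovers a value close to the assumed true effect. Fitting the nuisance models out of fold is what keeps their own fitting error from contaminating the effect estimate.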
5. Portfolio Construction
Translate causal effects into position sizes while minimising exposure to non-causal risks. Hedge those exposures, stress-test the portfolio, and track the distortions introduced by constraints and transaction costs.
6. Backtesting
Avoid relying solely on walk-forward backtests. Use resampling and Monte Carlo simulation to model uncertainty in the data-generating process, incorporating economic theory into simulations.
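A walk-forward backtest yields a single historical path and therefore a single Sharpe ratio. A minimal resampling sketch, shown below, turns that point estimate into a distribution. The return series is synthetic, and the i.i.d. bootstrap is an assumption that ignores serial dependence; block bootstraps or simulations from an economically motivated data-generating process would be closer to what the protocol describes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical daily strategy returns (about five years of trading days).
returns = rng.normal(loc=0.0005, scale=0.01, size=1250)

def annualised_sharpe(r, periods=252):
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    return float(np.mean(r) / np.std(r, ddof=1) * np.sqrt(periods))

point = annualised_sharpe(returns)

# i.i.d. bootstrap: resample days with replacement and recompute the statistic.
boot = np.array([
    annualised_sharpe(rng.choice(returns, size=returns.size, replace=True))
    for _ in range(2000)
])
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"Sharpe={point:.2f}, 95% bootstrap interval=({lo:.2f}, {hi:.2f})")
```

The width of the interval is often the more informative output: a strategy whose interval comfortably includes zero has not demonstrated much, whatever its single-path backtest shows.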
7. Multiple Testing Adjustments
Account for the multiple comparisons problem using p-value adjustments or tools like the Deflated Sharpe Ratio (DSR), especially when testing many factors.
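The sketch below shows how the Deflated Sharpe Ratio penalises a result for the number of trials behind it. It follows the formula as published by Bailey and López de Prado, reproduced here from memory, so it should be checked against the original before use. The inputs (per-period Sharpe ratio, sample length, cross-trial variance of Sharpe ratios, and trial counts) are hypothetical.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()

def expected_max_sharpe(sr_variance, n_trials):
    """Approximate expected maximum Sharpe ratio among n_trials (>= 2) skill-less trials."""
    z1 = _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
    z2 = _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * e))
    return sqrt(sr_variance) * ((1 - EULER_GAMMA) * z1 + EULER_GAMMA * z2)

def deflated_sharpe_ratio(sr, n_obs, sr_variance, n_trials, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe ratio exceeds the multiple-testing benchmark.

    `sr` is the per-period (non-annualised) Sharpe ratio; `kurt` is raw kurtosis,
    so 3.0 corresponds to normally distributed returns.
    """
    sr_benchmark = expected_max_sharpe(sr_variance, n_trials)
    denom = sqrt(1 - skew * sr + (kurt - 1) / 4 * sr**2)
    return _STD_NORMAL.cdf((sr - sr_benchmark) * sqrt(n_obs - 1) / denom)

# Same hypothetical backtest, evaluated as the best of 2 trials and of 100 trials.
dsr_few = deflated_sharpe_ratio(sr=0.1, n_obs=1250, sr_variance=0.0008, n_trials=2)
dsr_many = deflated_sharpe_ratio(sr=0.1, n_obs=1250, sr_variance=0.0008, n_trials=100)

print(f"DSR with 2 trials={dsr_few:.3f}, with 100 trials={dsr_many:.3f}")
```

The same backtest looks less convincing once it is disclosed that it was the best of 100 attempts rather than 2, which is why the number of trials has to be recorded during research rather than reconstructed afterwards.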
Practical Implications
This protocol is not just academic theory. It is designed to be used by:
Quantitative researchers building systematic strategies
Portfolio managers assessing new signals
Asset owners and consultants conducting due diligence
The paper includes a checklist in the appendix that can be used to evaluate whether a proposed strategy meets causal standards.
Why This Matters
The move from correlation to causation has already transformed fields like medicine and policy analysis. In finance, it promises more robust strategies, clearer attribution of performance, and fewer surprises when models move from research to production.
This paper provides both a critique of current practice and a roadmap for improving it. It is a call to replace the illusion of precision with disciplined, falsifiable reasoning. Factor investing can become more scientific, but only if it embraces causality.