1. What the engine computes
The engine takes a pharmaceutical asset profile and produces a risk-adjusted net present value (rNPV) plus an implied deal value decomposition (upfront, milestones, royalties). It is calibrated to Phase 2 / Phase 3 licensing and co-development deals — the segment where intrinsic-value modeling matches how real negotiations anchor. Early-stage strategic upfronts, acquisitions, and approved-asset commercialization handoffs are explicitly out of scope (see limitations).
2. Fourteen modeling dimensions
Modern biopharma deal valuation cannot be reduced to a single rNPV number. The Ambrosia engine composes fourteen independently calibrated dimensions, each a distinct correction to the simplistic “peak sales × PoS × discount rate” frame:
- rNPV core. Projected cash flows built from peak sales, shifted by the market-access delay, risk-weighted by phase-transition probabilities, and discounted to present value.
- Monte Carlo. 10,000-path simulation with phase-dependent correlation matrix, conditional floors, and fat-tailed peak sales distribution.
- Real options. CRR binomial-lattice overlay for early-stage assets where compound options dominate intrinsic NPV.
- Ensemble valuation. Inverse-variance blend of rNPV, comparable-transaction median, and real-options value — headline number robust to method-specific bias.
- Deal structure optimizer. Runs all five deal types (licensing, acquisition, codev, option, collaboration), ranks by total value to licensor, and surfaces a recommendation only when the best alternative is ≥20% better than the current structure.
- Counterparty premiums. Per-buyer historical premium vs. peer medians across 39 large buyers (quarterly refresh from disclosed deals).
- RWE auto-tuning. Weekly backtest against 164 approved-drug trajectories produces bounded (±15%) delta proposals for key calibration parameters.
- Risk decomposition. 4-bucket attribution (clinical, commercial, manufacturing, regulatory) via counterfactual re-runs with each risk source neutralized.
- Macro factors. Live 10Y Treasury yield, biotech beta, equity risk premium (daily FRED refresh), clamped so final WACC stays in [5%, 25%].
- Subpopulation modeling. Mutation status, demographic, severity — 31 evidence-cited entries (FDA labels, IQVIA 2024-2025).
- Patent cliff timing. Per-indication market-leader LOE + biosimilar dates, mapped to the asset’s projected launch year.
- Combination therapy. Effective revenue multiplier when the asset is used in a combo regimen (comboFraction × revenueShare, floored at 0.25).
- Geographic decomposition. US / EU5 / Japan / China / RoW revenue curves with per-geo launch delay, ramp, and pricing multiplier.
- Time-windowed PoS. Rolling cohort phase-transition rates (2014-2024, 2019-2024, 2021-2024) for indications where PoS has shifted materially (Alzheimer’s, NASH, obesity).
Dimensions 1-4 always run. Dimensions 5-14 are opt-in via calculator inputs or feature flags — each defaults to off until the backtest validates it individually.
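To make the rNPV core (dimension 1) concrete, here is a minimal sketch in TypeScript. Everything in it — the interface, the 4-year linear ramp, and all parameter values — is illustrative and hypothetical, not the engine's calibrated implementation; margins, costs, and royalty splits are omitted.

```typescript
// Hypothetical sketch of the rNPV core: phase-transition PoS applied to a
// peak-sales-driven revenue curve, discounted at the WACC. All numbers
// are illustrative, not the engine's calibrated values.

interface AssetProfile {
  phasePoS: number[];   // per-phase transition probabilities, e.g. Ph2→Ph3, Ph3→approval
  peakSalesUSD: number; // projected peak annual revenue
  yearsToLaunch: number; // market-access delay from today
  patentYears: number;  // revenue years before LOE erosion
  wacc: number;         // discount rate
}

// Cumulative probability of reaching market.
const cumulativePoS = (phasePoS: number[]): number =>
  phasePoS.reduce((acc, p) => acc * p, 1);

// Simple linear ramp to peak over 4 years, flat until LOE (an assumption).
function revenueInYear(peak: number, yearPostLaunch: number, patentYears: number): number {
  if (yearPostLaunch >= patentYears) return 0;
  return peak * Math.min(1, (yearPostLaunch + 1) / 4);
}

function rNPV(asset: AssetProfile): number {
  const pos = cumulativePoS(asset.phasePoS);
  let npv = 0;
  for (let t = 0; t < asset.patentYears; t++) {
    const year = asset.yearsToLaunch + t; // discount from today, not from launch
    const cashFlow = revenueInYear(asset.peakSalesUSD, t, asset.patentYears);
    npv += (pos * cashFlow) / Math.pow(1 + asset.wacc, year);
  }
  return npv;
}

const example: AssetProfile = {
  phasePoS: [0.5, 0.6], // hypothetical Ph2 and Ph3 transition rates
  peakSalesUSD: 800e6,
  yearsToLaunch: 4,
  patentYears: 10,
  wacc: 0.12,
};
```

Note how the risk adjustment enters multiplicatively, which is why the risk-adjusted value can never exceed the unadjusted NPV — the Layer-1 invariant described in the architecture section.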
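The ensemble headline (dimension 4) is an inverse-variance blend: each method's estimate is weighted by the reciprocal of its variance, so tighter estimates dominate. A minimal sketch, with method names and all numbers purely illustrative:

```typescript
// Hypothetical sketch of the inverse-variance ensemble blend: weights
// proportional to 1/variance. Values and variances below are made up.

interface MethodEstimate {
  method: string;
  value: number;    // point estimate, USD
  variance: number; // method-specific uncertainty (relative scale)
}

function inverseVarianceBlend(estimates: MethodEstimate[]): number {
  const weights = estimates.map((e) => 1 / e.variance);
  const totalWeight = weights.reduce((a, b) => a + b, 0);
  const weightedSum = estimates.reduce((sum, e, i) => sum + weights[i] * e.value, 0);
  return weightedSum / totalWeight;
}

const headline = inverseVarianceBlend([
  { method: "rNPV", value: 420e6, variance: 1.0 },
  { method: "comparables-median", value: 500e6, variance: 4.0 },
  { method: "real-options", value: 610e6, variance: 9.0 },
]);
```

Because the blend always lands between the smallest and largest input, no single method's bias can drag the headline outside the range spanned by the others.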
3. Calibration framework (Option B rigor)
Most engines publish source citations and call it rigor. Ours does that and also measures accuracy empirically. The discipline we follow:
- Empirical filter first. A change ships only if it improves measured accuracy against real disclosed deals. Source citations justify the direction; the backtest validates the magnitude.
- One-sided corrections beat symmetric scaling. Our failed rounds taught us that per-indication scaling that can go both up and down tends to increase dispersion. Floor-only and ceiling-only corrections, applied to specific failure modes, work.
- Held-out validation split. 80% of the corpus is used to tune; 20% is held out as a test set the engine never sees during calibration. Deterministic hash on deal id means the split is stable across runs.
- Every change cited. FDA CDER approval reports, the Wong/Siah/Lo 2019 Biostatistics paper, Nature Reviews Drug Discovery 2024-2025, the DealForma deal database, company 10-K filings. When a calibration value can’t be sourced externally, we cite the backtest itself as the empirical source.
- Regression discipline. The full golden master test suite (110 baselines across therapeutic areas × phases × modalities) must stay green or be intentionally re-baselined with documentation. Silent drift is treated as a bug.
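The deterministic hash split above can be sketched as follows. The FNV-1a hash and the modulo-100 bucketing here are assumptions for illustration, not necessarily what the engine uses; the point is that the same deal id always lands in the same bucket, run after run.

```typescript
// Hypothetical sketch of a deterministic 80/20 split keyed on deal id.
// FNV-1a is an assumption; any stable string hash gives the same property.

function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // 32-bit multiply, keep unsigned
  }
  return h >>> 0;
}

// 80% of hash space → tuning set, 20% → held-out test set.
const isHeldOut = (dealId: string): boolean => fnv1a(dealId) % 100 >= 80;

// Hypothetical deal ids.
const dealIds = ["roche-2021-001", "novo-2023-014", "merck-2019-007"];
const tuneSet = dealIds.filter((id) => !isHeldOut(id));
const testSet = dealIds.filter((id) => isHeldOut(id));
```

Unlike a random split with a seed stored somewhere, a content hash needs no state: adding new deals to the corpus never reshuffles existing deals between tune and test.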
4. Accuracy measurement
We score the engine’s predicted implied deal value against real disclosed terms from 251 biopharma licensing, co-development, acquisition, and collaboration deals (2017–2026). For each deal, the engine is fed the asset profile known at deal date and computes an implied upfront. The hit rate is the share of deals where the absolute error falls within a tolerance band.
Core scope (Phase 2 / 3 licensing + codev, n=69) is the primary calibration target — the segment where intrinsic-value modeling maps onto market clearing price. Full scope (all 251) is reported for transparency but includes segments (early-stage option value, approved commercialization) where single-asset rNPV is structurally the wrong frame.
Accuracy is measured on every calibration round against the held-out test set. We track failed rounds alongside wins in the internal iteration log — the calibration journey itself is part of the model’s methodology.
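The hit-rate metric described above can be sketched in a few lines. The interface, the deal values, and the ±50% relative band below are all illustrative assumptions, not the engine's actual tolerance:

```typescript
// Hypothetical sketch of the hit-rate metric: the share of deals whose
// predicted upfront falls within a relative tolerance of the disclosed
// upfront. The 50% band is illustrative.

interface BacktestDeal {
  dealId: string;
  disclosedUpfrontUSD: number;
  predictedUpfrontUSD: number;
}

function hitRate(deals: BacktestDeal[], relTolerance: number): number {
  const hits = deals.filter((d) => {
    const relError =
      Math.abs(d.predictedUpfrontUSD - d.disclosedUpfrontUSD) / d.disclosedUpfrontUSD;
    return relError <= relTolerance;
  });
  return hits.length / deals.length;
}

// Hypothetical sample: one hit (30% error), one miss (55% error).
const sample: BacktestDeal[] = [
  { dealId: "a", disclosedUpfrontUSD: 100e6, predictedUpfrontUSD: 130e6 },
  { dealId: "b", disclosedUpfrontUSD: 200e6, predictedUpfrontUSD: 90e6 },
];
```

A relative band (rather than an absolute dollar band) keeps $20M and $2B deals comparable in one score.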
5. Honest limitations
The model is not ready for:
- Early-stage strategic upfronts. Phase 1 / preclinical licensing prices on option value, not expected NPV. The engine’s intrinsic-value output will be too low; we apply empirical floors at the backtest level, but the underlying rNPV math isn’t the right frame. Use for anchoring only, not as a primary number.
- Acquisitions in competitive bidding. M&A premiums are auction-driven. Carmot/Roche, Inversago/Novo, Pandion/Merck priced at 10–30× NPV. The engine can’t predict auction dynamics.
- Post-approval commercialization handoffs. Approved-asset licenses to territorial partners (Pharming→CSPC China, Epizyme→Ipsen ex-US) load most value into royalties + milestones, not upfront. The backtest applies a territorial dampener but the engine’s upfront formula is structurally biased for this segment.
- Platform / multi-asset deals. Single-asset rNPV cannot price a bundle with a platform premium. An upcoming portfolio-rNPV extension will address this.
- Non-cash consideration. Equity, warrants, and future contingent consideration (CVRs) are not modeled. These appear as disclosed deal value in the backtest but the engine’s predicted value is cash-upfront equivalent.
- Unusual therapeutic areas. Rare diseases with fewer than 3 comparable deals in our corpus, or deals in emerging modalities (ADCs pre-2020, PROTACs through 2023), have thin priors. Confidence bands widen accordingly.
6. Software architecture
The engine has a five-layer bug-detection system designed to prevent silent drift:
- Layer 1 — Mathematical invariants. Assertions that NPV cannot exceed unadjusted NPV, cumulative PoS stays in [0, 1], etc. Violations log to Sentry but never throw (production safe).
- Layer 2 — Cross-engine consistency. rNPV median vs. Monte Carlo p50 vs. ensemble headline must agree within documented tolerances. Divergence generates a surfaced warning.
- Layer 3 — Indication regression tests. 36 curated test cases anchor each therapeutic area × phase × modality combination to an expected output range.
- Layer 4 — Data-quality guardrails. Peak sales exceeding 80% of total addressable market triggers a hard cap; competitive density exceeding market-leader peak triggers a warning.
- Layer 5 — Golden masters. 110 canonical reference calculations with tolerance-based assertions. Any intentional change requires explicit re-baselining with a documented rationale.
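A Layer-1 style invariant check can be sketched as below. The reporter callback stands in for the Sentry call; the function names and signatures are assumptions for illustration. The key design choice from the text is real: violations are reported, never thrown, so a bad intermediate value degrades one output rather than crashing production.

```typescript
// Hypothetical sketch of a Layer-1 invariant check: report violations to a
// logger (Sentry in production) but never throw. Names are illustrative.

type Reporter = (message: string, context: Record<string, number>) => void;

function checkInvariants(
  riskAdjustedNPV: number,
  unadjustedNPV: number,
  cumulativePoS: number,
  report: Reporter,
): void {
  // Risk adjustment is multiplicative by a probability, so rNPV can never
  // legitimately exceed the unadjusted NPV.
  if (riskAdjustedNPV > unadjustedNPV) {
    report("invariant violated: rNPV exceeds unadjusted NPV", {
      riskAdjustedNPV,
      unadjustedNPV,
    });
  }
  // A cumulative probability outside [0, 1] means an upstream math bug.
  if (cumulativePoS < 0 || cumulativePoS > 1) {
    report("invariant violated: cumulative PoS outside [0, 1]", { cumulativePoS });
  }
}

// Collect violations instead of throwing — the calculation continues.
const violations: string[] = [];
checkInvariants(120e6, 100e6, 1.2, (msg) => {
  violations.push(msg);
});
```

Here both invariants fire and both messages are captured, while the caller keeps running — exactly the "log, don't throw" posture Layer 1 describes.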
7. Reproducing the backtest
The full backtest is reproducible in the open-source repo:
```bash
# Clone and install
git clone https://github.com/ikildani/ambrosia-benchmarker
cd ambrosia-benchmarker && npm install

# Run baseline
npx tsx scripts/run-deal-backtest.ts

# A/B test individual flags
TIER4_MACRO=on npx tsx scripts/run-deal-backtest.ts
TIER4_SUBPOP=on npx tsx scripts/run-deal-backtest.ts

# Per-round calibration diff
git diff __tests__/backtest/baseline-errors.json
```

Every commit to __tests__/backtest/baseline-errors.json is the live accuracy state read by the internal accuracy tracking system.
8. Model governance
For regulated users (banks, asset managers subject to SR 11-7), we treat the model like a model:
- Every calibration round is a discrete, auditable commit in the git history with before/after numbers in the commit message.
- Failed rounds are preserved in the git history and internal iteration log rather than rewritten out.
- The iteration log at docs/calibration-iteration-log.md captures the rationale, source citations, and deltas for every change.
- Golden master regressions require explicit re-baselining with documentation — silent drift is caught by the test suite.
Questions, critique, or your own backtest?
We welcome rigorous challenge. Email issa@ambrosiaventures.co with a deal you think we got wrong, and we’ll include it in the next calibration round with the reason documented in the iteration log.