The fund scoring accuracy benchmark for AI-powered LP intelligence.
NUVC publishes this benchmark because trust in AI fund scoring requires measurable proof — not marketing claims. LP Bench tests both quantitative accuracy and qualitative narrative quality, the Science + Art balance made verifiable.
Six independent dimensions covering both the quantitative signal layer and the qualitative narrative layer, because sound capital allocation requires both.
Same fund deck, 5 independent runs. Standard deviation across runs measures LLM scoring stability.
Marcus Aurelius (Batch Screener) produces consistent 6-dimension scores across repeated runs. Cross-check layer eliminates outlier LLM responses.
NUVC vintage benchmarks vs Cambridge Associates global VC return data. Tests whether NUVC benchmark priors are calibrated to real market outcomes.
AU/NZ market-specific priors from State of Funding 2022–2025 + Cambridge Associates global quartile alignment. Recalibrated quarterly.
Among funds scoring ≥ 7.0 (high conviction), what % outperform the median TVPI at the 3-year mark? Tests whether the score has predictive validity.
Measured on committed funds in the NUVC LP network with 3+ year track records. Score ≥ 7.0 = top quartile conviction. Validated against actual fund performance data.
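The predictive validity metric reduces to a simple hit rate. The sketch below is illustrative only: the fund records, scores, and TVPI figures are hypothetical, not NUVC data.

```python
from statistics import median

# Hypothetical records: (nuvc_score, tvpi_at_3y) for committed funds
# with 3+ year track records. All numbers are made up for illustration.
funds = [
    (7.8, 1.9), (7.2, 1.6), (7.5, 1.2), (6.1, 1.1),
    (5.4, 0.9), (7.1, 1.7), (4.9, 1.0), (6.8, 1.4),
]

def outperformance_rate(funds, threshold=7.0):
    """% of high-conviction funds (score >= threshold) beating the median TVPI."""
    median_tvpi = median(tvpi for _, tvpi in funds)
    high = [tvpi for score, tvpi in funds if score >= threshold]
    if not high:
        return 0.0
    return 100 * sum(tvpi > median_tvpi for tvpi in high) / len(high)

rate = outperformance_rate(funds)
```

The median is taken over the whole cohort, not just the high-conviction subset, so the metric asks whether the score separates winners from the overall population.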
For FOs that committed to funds recommended by NUVC matching, what % matched within their stated thesis alignment threshold (≥ 80%)?
Bidirectional mandate matching (FO ↔ GP). Tested on Tier 1 FO cohort with explicit mandate configs. Includes sector, fund size, vintage, and geography gates.
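The hard gates (sector, fund size, vintage, geography) can be sketched as a simple predicate that must pass before thesis-alignment scoring applies. Field names and mandate shapes below are illustrative assumptions, not the actual NUVC mandate config schema.

```python
# Hypothetical sketch of hard mandate gates, assuming dict-shaped
# FO mandates and GP fund profiles (field names are illustrative).
def passes_gates(mandate, fund):
    """All four gates must pass before thesis-alignment scoring applies."""
    return (
        fund["sector"] in mandate["sectors"]
        and mandate["min_size_m"] <= fund["size_m"] <= mandate["max_size_m"]
        and fund["vintage"] >= mandate["earliest_vintage"]
        and fund["geo"] in mandate["geos"]
    )

mandate = {"sectors": {"b2b_saas", "fintech"}, "min_size_m": 20,
           "max_size_m": 150, "earliest_vintage": 2021, "geos": {"AU", "NZ"}}
fund = {"sector": "fintech", "size_m": 60, "vintage": 2023, "geo": "AU"}
ok = passes_gates(mandate, fund)
```

Gates are binary filters; the ≥ 80% thesis-alignment threshold is then measured only on funds that clear all four.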
% of factual claims in AI-generated DD reports that could not be grounded in source documents (fund deck, DDQ, or track record). Measured by Hypatia (claim verifier).
Hypatia cross-references every DD claim against extracted source document data. Ungrounded claims are flagged with amber confidence badges in the live product, not silently accepted.
Inter-rater agreement between NUVC qualitative fund narrative and independent LP analyst panel (n=12) across 50 fund assessments.
Science + Art principle applied to measurement: quantitative score accuracy is necessary but not sufficient. Qualitative narrative must also agree with expert LP judgment. Tested on team dynamics, thesis authenticity, and GP communication quality dimensions.
Most AI fund scoring tools publish only quantitative accuracy metrics — consistency and calibration. These are necessary, but not sufficient.
LP decisions rely equally on qualitative judgment: is this GP's thesis authentic? Does the team dynamic suggest longevity? Is the communication style right for a long-term LP relationship?
LP Bench is the only fund scoring benchmark that measures qualitative narrative agreement alongside quantitative accuracy — because capital allocation that ignores either dimension misses what actually matters.
Open methodology. LP Bench is recalibrated quarterly as new committed fund outcome data becomes available.
2,285 funds from the NUVC fund library: AU/NZ emerging managers, global VC, PE, and private credit. A subset of 312 funds with verified 3-year outcome data is used for predictive validity testing.
Each fund document submitted 5 times with identical inputs. Score std deviation measured across runs. Cross-check layer validated for LLM outlier suppression.
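The consistency protocol is a per-dimension standard deviation across repeated runs. A minimal Python sketch, assuming six named dimensions (the names and scores below are illustrative, not actual NUVC output):

```python
from statistics import pstdev

# Hypothetical scores: 5 independent runs of the same fund deck,
# each producing 6 dimension scores (dimension names are illustrative).
runs = [
    {"team": 7.2, "thesis": 6.8, "track_record": 7.5, "portfolio": 6.9, "terms": 7.0, "market": 6.5},
    {"team": 7.1, "thesis": 6.9, "track_record": 7.4, "portfolio": 7.0, "terms": 7.0, "market": 6.6},
    {"team": 7.3, "thesis": 6.7, "track_record": 7.5, "portfolio": 6.8, "terms": 7.1, "market": 6.4},
    {"team": 7.2, "thesis": 6.8, "track_record": 7.6, "portfolio": 6.9, "terms": 7.0, "market": 6.5},
    {"team": 7.1, "thesis": 6.9, "track_record": 7.4, "portfolio": 7.0, "terms": 6.9, "market": 6.6},
]

def run_stability(runs):
    """Per-dimension population std deviation across repeated runs."""
    return {d: pstdev(r[d] for r in runs) for d in runs[0]}

stability = run_stability(runs)
# Lower std deviation per dimension = more consistent scoring across runs.
```

A low per-dimension std deviation is what "consistent scores across repeated runs" means operationally.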
NUVC benchmark priors compared to Cambridge Associates global VC quartile data (Q1/Q2/Q3 TVPI and IRR by vintage year) and AU State of Funding 2022–2025 market data.
Hypatia claim verifier run across 200 DD reports (sampled across fund types). Every factual claim cross-referenced against fund deck, DDQ, and track record source documents. Ungrounded claims counted.
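Conceptually, the hallucination metric is the share of claims with no supporting evidence in any source document. The sketch below uses a naive substring match as a stand-in; the actual Hypatia verifier's matching logic is not described here and is certainly more sophisticated.

```python
# Hypothetical sketch: count DD-report claims that cannot be grounded
# in any source document. The substring check is a deliberate
# simplification standing in for real claim-evidence matching.
def ungrounded_rate(claims, sources):
    """Return (% of ungrounded claims, list of flagged claims)."""
    flagged = [c for c in claims
               if not any(c.lower() in s.lower() for s in sources)]
    return 100 * len(flagged) / len(claims), flagged

claims = [
    "fund i returned 2.1x tvpi",
    "the gp has 12 years of operating experience",
]
sources = [
    "Track record: Fund I returned 2.1x TVPI as of Q4.",
    "DDQ: the partners bring complementary backgrounds.",
]
rate, flagged = ungrounded_rate(claims, sources)
```

Flagged claims correspond to the amber confidence badges in the live product: surfaced, not silently accepted.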
12 experienced LPs independently assessed 50 fund profiles across 4 qualitative dimensions. NUVC narrative assessments scored against panel consensus using Cohen's kappa.
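Cohen's kappa measures agreement between two raters beyond what chance alone would produce. A self-contained sketch with hypothetical labels (the verdict categories and values below are illustrative):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two raters over the same categorical items."""
    assert len(a) == len(b) and a
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n         # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)  # chance agreement
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

# Hypothetical labels: NUVC narrative verdict vs LP panel consensus
# on one qualitative dimension (categories are made up for illustration).
nuvc  = ["strong", "strong", "mixed", "weak", "strong", "mixed"]
panel = ["strong", "mixed",  "mixed", "weak", "strong", "mixed"]
kappa = cohens_kappa(nuvc, panel)
```

Kappa of 1.0 is perfect agreement, 0.0 is chance-level; correcting for chance is what makes it stricter than raw percent agreement.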
Scores are not frozen: as new committed fund outcome data becomes available, LP Bench is recalibrated each quarter, so the benchmark improves as the platform learns.
Legal AI tools like Harvey benchmark document drafting speed and citation accuracy. Capital allocation intelligence requires a different standard.
| Dimension | Legal AI | NUVC LP Bench |
|---|---|---|
| Quantitative accuracy | Citation accuracy | Score consistency + vintage calibration |
| Qualitative accuracy | Not measured | LP panel inter-rater agreement (Cohen's κ) |
| Predictive validity | N/A (legal drafts) | Score ≥ 7.0 → TVPI outperformance rate |
| Hallucination testing | Legal citation verification | Fund claim grounding in source documents |
| Network effects | None | Benchmark improves as committed fund outcomes accrue |
| Update cadence | Ad hoc | Quarterly recalibration |
LP Bench is how we hold ourselves accountable. Every score NUVC produces — for every fund, every FO, every capital allocation decision — is measured against this standard.