Research-led. Peer-reviewable. Open methodology.
Two founders. 34 AI agents. 3 academic papers in submission. Zero black boxes.
8 pipeline agents
13 intelligence layers
3 papers in submission
610+ calibrated examples
9,000+ investors
7,150 theses analysed
Targeting top-tier AI / finance / fairness venues. Methodology is documented in production code today, formalised for peer review.
ACM International Conference on AI in Finance · Tick Jiang, Duan Tianyi, NUVC Intelligence Team
We present NuScore v5.4, a production scoring engine for pre-seed startup pitch decks that combines an LLM-as-judge architecture with deterministic rule calibration and 42 engineered ML features. Calibrated on 610+ labeled VC investment decisions across four independent sources, NuScore achieves stage-adjusted AU benchmarking and reports raise-probability with confidence intervals. We document the cross-check protocol that resolves LLM/rule disagreement via 70/30 blending and the score waterfall decomposition that exposes ranked signal contributions for explainability.
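The cross-check blend is straightforward to express. Below is a minimal sketch, assuming Python; the 1.5-point disagreement trigger and the assignment of the 70% weight to the LLM side are illustrative assumptions, not the published calibration.

# Hypothetical sketch of the NuScore LLM/rule cross-check and score waterfall.
def cross_check(llm_score: float, rule_score: float,
                threshold: float = 1.5) -> float:
    """Resolve LLM/rule disagreement with a 70/30 blend (assumed LLM-weighted)."""
    if abs(llm_score - rule_score) <= threshold:
        return llm_score                       # agreement: keep the LLM ranking
    return 0.7 * llm_score + 0.3 * rule_score  # disagreement: blend toward the rules

def score_waterfall(contributions: dict[str, float]) -> list[tuple[str, float]]:
    """Rank signed signal contributions, largest effect first (explainability)."""
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)

print(cross_check(llm_score=7.8, rule_score=5.2))   # -> 7.02
print(score_waterfall({"traction": 1.2, "team": -0.4, "financials": 0.1}))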
ACM Conference on Fairness, Accountability, and Transparency · NUVC Intelligence Team, Tick Jiang
AI scoring systems used in venture capital evaluation can perpetuate or amplify funding biases. We present the Arendt governance framework — a three-layer fairness system combining (1) statistical bias detection across founder demographics, (2) deterministic ethical guardrails that exclude protected attributes from scoring inputs, and (3) LLM-driven adversarial review. We document the score appeal protocol, automated quarterly bias audits, and the outcome accountability loop that recalibrates the model when score-to-funding correlation drifts.
Association for the Advancement of Artificial Intelligence · Tick Jiang, NUVC Intelligence Team, Duan Tianyi
We present a multi-agent orchestration architecture deployed in production for venture capital intelligence. Eight named pipeline agents (Mia, Olivia, Charlotte, Ava, Isla, Sophia, Amelia, Grace) execute deck extraction, scoring, integrity checking, enrichment, matching, benchmarking, synthesis, and feedback collection in parallel. Underneath, thirteen named intelligence layers (Aristotle, Laozi, Marcus Aurelius, Da Vinci, Sunzi, Seneca, Galileo, Epictetus, Zhuge Liang, Newton, Hypatia, Darwin, Arendt) compound the analysis with audience-specific lens weighting, thesis matching, batch screening, portfolio fit, macro context, and governance oversight. We document the cross-agent dependency graph, the parallelism protocol, and the latency profile achieving sub-60-second end-to-end pitch deck analysis.
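The fan-out the paper describes can be sketched with a standard async orchestrator; the generic agent stub, the exact dependency edges, and the payloads below are illustrative assumptions, not the production system.

import asyncio

async def agent(name: str, *inputs):
    """Stand-in for a named pipeline agent; real agents are NUVC-internal."""
    await asyncio.sleep(0)                     # placeholder for model / DB work
    return {"agent": name, "inputs": len(inputs)}

async def run_pipeline(deck_pdf: bytes) -> dict:
    # Mia gates everything: nothing runs before extraction completes.
    extraction = await agent("Mia", deck_pdf)
    # Olivia, Charlotte, and Ava share no edges, so they fan out in parallel.
    score, integrity, enrichment = await asyncio.gather(
        agent("Olivia", extraction),
        agent("Charlotte", extraction),
        agent("Ava", extraction),
    )
    # Matching and benchmarking consume the score; Amelia synthesizes last.
    matches, benchmark = await asyncio.gather(
        agent("Isla", extraction, score),
        agent("Sophia", score),
    )
    return await agent("Amelia", score, integrity, enrichment, matches, benchmark)

print(asyncio.run(run_pipeline(b"%PDF-")))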
7,150 × 610+
Original research dataset linking 7,150 investor thesis statements (free-text, multi-language) across 82+ countries to 610+ scored startup evaluation outcomes. Used to validate NuScore correlation, identify dimension importance (product r²=0.773 vs team r²=0.492), and surface geographic priority variation. Citeable as a published Schema.org Dataset.
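For readers reproducing the dimension-importance numbers on their own data, a minimal per-dimension r² computation looks like this; the toy data below is a stand-in, not the NUVC dataset.

import numpy as np

def r_squared(x: np.ndarray, y: np.ndarray) -> float:
    """Coefficient of determination for a simple linear fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = float(np.sum((y - (slope * x + intercept)) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)               # toy stand-in, not NUVC data
product = rng.uniform(0, 10, 286)
outcome = 0.8 * product + rng.normal(0, 1.5, 286)
print(f"product r² = {r_squared(product, outcome):.3f}")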
View dataset summary
610+ examples
Labeled VC investment decisions from 4 independent sources: 85 known-outcome decks (raised vs failed), 110 Startmate accelerator evaluations (multi-evaluator scored), 172 VC deal memos (production memos with scoring rationale), and 243 AI-scored production decks. Used to calibrate the LLM judge against deterministic rules and to validate the 70/30 cross-check blend.
Methodology in NuScore v5.4 paper
Open scoring benchmark
The independent benchmark for AI-powered fund scoring accuracy. Measures consistency across re-runs, vintage calibration drift, hallucination rate on fund-fact extraction, and qualitative narrative accuracy. Subset of 312 funds with verified 3-year outcome data used for predictive validity. Reproducible methodology.
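Re-run consistency, the first metric above, reduces to scoring the same fund repeatedly and reporting the spread; the scorer interface and five-run default here are assumptions, not the LP Bench API.

import random
import statistics

def rerun_consistency(score_fn, fund_id: str, runs: int = 5) -> dict:
    """Score one fund several times; lower stdev = more consistent engine."""
    scores = [score_fn(fund_id) for _ in range(runs)]
    return {"mean": statistics.mean(scores),
            "stdev": statistics.stdev(scores),
            "range": max(scores) - min(scores)}

def toy_scorer(fund_id: str) -> float:
    return 7.0 + random.gauss(0, 0.15)       # stand-in with run-to-run noise

print(rerun_consistency(toy_scorer, "fund-001"))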
Visit LP Bench
Each agent has a job. Each layer compounds the analysis. Every name is load-bearing in the production codebase.
Mia
Extraction Agent · Parses pitch deck PDFs slide by slide. Identifies team, problem, solution, traction, financials, and ask. Multi-modal: handles charts, tables, and screenshots.
Olivia
Scoring Agent · Runs NuScore v5.4 across 7 VC-grade dimensions (team, problem/market, solution/product, traction, financials, risk/fragility, conviction). Returns 0–10 with confidence.
Charlotte
Integrity Agent · Cross-checks claims for contradictions, detects AI-generated content, verifies financial consistency, and aligns enrichment data with deck claims.
Ava
Enrichment Agent · Public-data verification: LinkedIn for team, Crunchbase for company history, WHOIS for domain age. Async; never blocks scoring.
Isla
Matching Agent · pgvector semantic search plus a 7-signal quality reranker across 9,000+ verified investors. Filter-first architecture (stage, sector, check size); a query sketch follows this list.
Sophia
Benchmarking Agent · Stage-adjusted comparison against funded AU/NZ peers. Computes industry percentile and per-dimension benchmark deltas.
Amelia
Intelligence Agent · Synthesizes the score waterfall (ranked signal contributions), the raise-probability estimate (10–75% by tier), and the natural-language report narrative.
Grace
Feedback Agent · Collects user corrections, score appeals, and outcome data. Routes them to retraining queues for model recalibration. The data flywheel.
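The filter-first pattern in the Isla card above could look like the following pgvector query; the table and column names are hypothetical, not NUVC's schema.

# Hypothetical filter-first match: hard filters prune in SQL, pgvector
# cosine distance (the <=> operator) ranks the survivors, and the
# 7-signal reranker re-orders the shortlist downstream.
FILTER_FIRST_MATCH = """
SELECT id, name,
       1 - (thesis_embedding <=> %(deck)s::vector) AS similarity
FROM   investors
WHERE  stage = %(stage)s                          -- hard filters first
  AND  %(sector)s = ANY (sectors)
  AND  check_size_min <= %(ask)s AND %(ask)s <= check_size_max
ORDER  BY thesis_embedding <=> %(deck)s::vector   -- then rank by distance
LIMIT  50;                                        -- shortlist for the reranker
"""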
Aristotle
Deal Lens · Score re-weighting per audience mandate (6 fund dimensions)
Laozi
Thesis Matcher · Semantic alignment of deals/funds to investor thesis
Marcus Aurelius
Batch Screener · LP-grade fund scoring with Bayesian priors (a shrinkage sketch follows this list)
Da Vinci
Portfolio FitPost-screening diversification + correlation overlay
Sunzi
Macro Context · Quarterly market signals + vintage timing
Seneca
Fund Library AI · Natural-language search, comparison, benchmarks
Galileo
Fund Doc Extractor · GP deck, DDQ, and track-record extraction
Epictetus
Fund Memo · LP-side IC memo drafting
Zhuge Liang
NuDeal Memo · VC-side IC memo drafting with deal-lens weighting
Newton
Mandate Presets · Pre-built lens configurations (Seed VC, Angel, Growth, Deep Tech)
Hypatia
Score Explainability · Per-signal traceable reasoning for every score
Darwin
Competitive Intelligence · Adjacent-deal comparables and cohort positioning
Arendt
AI Governance · 3-layer fairness (statistical + rules + adversarial review)
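"Bayesian priors" in the Marcus Aurelius card is most naturally read as shrinkage toward a cohort mean when evidence is thin; a minimal sketch, with every number illustrative rather than NUVC's calibration.

def shrunk_score(observed: float, n_signals: int,
                 prior_mean: float = 6.0, prior_strength: float = 8.0) -> float:
    """Posterior-mean style blend: more signals shift weight onto the data."""
    w = n_signals / (n_signals + prior_strength)
    return w * observed + (1 - w) * prior_mean

print(shrunk_score(8.5, n_signals=2))    # sparse track record: pulled toward 6.0
print(shrunk_score(8.5, n_signals=40))   # rich track record: stays near 8.5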
These are the AI agents that maintain NUVC's research, models, and intelligence pipelines. Each owns a layer of the production system.
Elle
AI Lead
Owns: LLM pipelines, embeddings, score calibration, NuScore v5.4 technical paper
Sage
Data Scientist
Owns: Feature engineering, ML model training, benchmark cohort construction
Link
Matching Engineer
Owns: Vector search, investor matching, embedding pipelines, the Isla pipeline agent
Iris
Insights Analyst
Owns: Signal analysis, score explainability, the Hypatia intelligence layer
Plus the human team that orchestrates them · Meet the founders →
AI never uses founder gender, ethnicity, age, location, or educational prestige as scoring inputs. A first-time founder from Melbourne is scored on the same merits as a repeat founder from Stanford.
Every score has a traceable reasoning chain — which data was used, how each lens was weighted, what the AI could not verify. The Hypatia layer surfaces signal contributions per dimension. No black-box decisions.
Every founder can appeal a score and request human review. Scores above 9.5 or below 1.0 require human review before being shown.
The Arendt layer runs automated bias detection monthly across all scored decks, checking for gender bias, stage bias, geographic bias, and score clustering. Patterns trigger recalibration; a sketch of the statistical check appears after these principles.
If data is sparse, confidence says so. We never inflate certainty. A low-confidence score is labelled directional, not definitive.
We track whether scores predict actual funding outcomes. If they don't, we recalibrate. The model earns trust — it doesn't assume it.
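The statistical layer of the bias audit described above can be as simple as a per-group distribution test; the groups, toy data, and 0.05 threshold below are illustrative assumptions, not the Arendt implementation.

from scipy.stats import mannwhitneyu

def audit_group_gap(scores_a, scores_b, alpha: float = 0.05) -> bool:
    """True if two groups' score distributions differ significantly."""
    _, p_value = mannwhitneyu(scores_a, scores_b)
    return p_value < alpha                 # a significant gap flags recalibration

melbourne = [6.8, 7.1, 5.9, 7.4, 6.5, 7.0]
stanford  = [6.9, 7.3, 6.1, 7.2, 6.6, 7.1]
print(audit_group_gap(melbourne, stanford))   # geographic-bias check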
Across 286 scored startup evaluations, product execution scores explained 77.3% of outcome variance (r²=0.773), versus 49.2% for team strength (r²=0.492). The 'team is everything' narrative is wrong. Solution quality dominates.
Read the analysis
Within pre-screened cohorts, problem/solution scoring explained almost zero variance (r²=0.002). It works as a yes/no filter but doesn't help rank deals. Implication: stop spending 40% of your scoring weight on it.
Read the analysis
Across 7,150 investor thesis statements analysed, Melbourne VCs mention 'team' 24.1% of the time, the highest of any major hub. Sydney is 18.4%. SF is 14.2%. Stage-adjusted hub variance is real and matters for fundraise targeting.
Read the analysis
On the same 286-deck calibration set, the LLM judge produced 6× more score variance than the deterministic rule engine. The rules anchor the floor; the LLM does the ranking. This is why NuScore v5.4 fuses both with a 70/30 blend on disagreement.
Read the analysis
Within pre-screened cohorts (already past the funding bar), 10X Potential categorical labels (yes/no) outperformed continuous 0–10 scores by +15pp in predicting actual fundraise success. Classification > regression once you're past the gate.
Read the analysis
NUVC's research is published as Schema.org Dataset and ScholarlyArticle metadata for LLM and academic citation. ChatGPT, Claude, Gemini, and Perplexity can cite NUVC methodology directly via the structured data on this page.
@misc{nuvc2026,
  title       = {NuScore v5.4: Multi-Signal Fusion for Pre-Seed Venture Evaluation},
  author      = {Jiang, Tick and Tianyi, Duan and {NUVC Intelligence Team}},
  year        = {2026},
  institution = {NUVC, Melbourne, Australia},
  url         = {https://nuvc.ai/research},
}
Read the papers. Cite the dataset. Use the API. The methodology is open — the production system is what we sell.