How to Spot AI Hype in a Pitch Deck: 25 Red Flags Investors Should Know
Every startup in 2026 claims to be "AI-powered." The pitch decks are full of impressive-sounding terms: agents, knowledge graphs, proprietary models, intelligence layers. But how do you tell the difference between a company that has genuinely built something new and one that is dressing up basic software in AI language?
After analysing hundreds of pitch decks, we have identified 25 red flags across seven categories. None of them are automatic deal-breakers — but each one is a question worth asking. The more flags you spot in a single deck, the deeper you should dig.
Category 1: Borrowed Technology Dressed as Proprietary
These red flags appear when a startup presents someone else's technology as their own.
Red Flag 1: The AI model's name is in the architecture diagram
When a startup's system diagram has a box that says "Claude API" or "GPT-4o" or "Gemini" — that box IS the intelligence. The entire AI layer is a call to someone else's service.
Think of it this way. If a restaurant's kitchen diagram included a box that said "Uber Eats delivery," you would wonder what cooking actually happens in the kitchen. The same principle applies. The AI model is the ingredient supplier. The startup needs to show what they cook with it.
Real AI companies abstract the model layer. They can swap between different AI providers without the product changing. If a specific model name is featured in the architecture slide, the startup's value is likely thin — the intelligence belongs to the AI provider, not to them.
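That abstraction is cheap to verify in code. A minimal sketch of a provider-neutral model layer, assuming hypothetical vendor stubs in place of real SDKs (no actual API is called):

```python
from typing import Protocol

# VendorA and VendorB are hypothetical stand-ins for real model SDKs.
class LLMProvider(Protocol):
    def complete(self, prompt: str) -> str: ...

class VendorA:
    def complete(self, prompt: str) -> str:
        return f"[vendor-a] {prompt}"

class VendorB:
    def complete(self, prompt: str) -> str:
        return f"[vendor-b] {prompt}"

def analyse_deck(deck_text: str, llm: LLMProvider) -> str:
    """Product logic: identical regardless of which provider is plugged in."""
    return llm.complete(f"Extract the key claims from: {deck_text}")

# Swapping vendors changes one constructor call, not the product.
result_a = analyse_deck("We are AI-native.", VendorA())
result_b = analyse_deck("We are AI-native.", VendorB())
```

If the startup's codebase looks like this, the vendor box belongs in an appendix, not the architecture slide.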
What to ask: "If Claude/GPT disappeared tomorrow, would your product still work? How long would it take to switch to a different AI provider?"
Red Flag 2: "MCP integration" listed as a feature
MCP stands for Model Context Protocol. It is an open-source standard that lets AI assistants connect to external tools and data sources. Think of it like a USB port — a universal way to plug things in.
It is useful infrastructure. But it is free, it is open-source, and it takes about a day to set up. Listing "MCP integration" as a competitive advantage is like listing "has an internet connection" as a feature.
The question is not whether you have MCP. It is: what tools have you built that are worth connecting to? The port does not matter. What you plug into it does.
What to ask: "What specific tools or data sources does your MCP server expose? How long did it take to build them?"
Red Flag 3: "Built on LangChain / CrewAI / AutoGen"
These are open-source software frameworks that help developers build AI applications. They are useful starting points — like building a house using standard bricks.
But listing an open-source framework as core technology is like a web company listing "built on React" as a competitive advantage. Thousands of companies use React. The framework is not the moat. What you build on top of it is.
What to ask: "What did you build that goes beyond what the framework provides out of the box?"
Red Flag 4: "RAG pipeline" as a differentiator
RAG stands for retrieval-augmented generation. In simple terms: before asking the AI a question, you first search a database for relevant information, then give that information to the AI along with the question. It is like giving a student the relevant textbook chapter before asking them an exam question.
Every serious AI application uses some form of this pattern. It is table stakes, not a differentiator. Claiming RAG as a competitive advantage is like claiming "we use a database" as a feature.
The real questions are: What is in the database? How good is the search? How do you decide what is relevant? That is where the value lives — not in the pattern itself.
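To see why the pattern itself is table stakes, here is the entire RAG loop as a toy sketch. Naive keyword overlap stands in for vector search, the documents are hypothetical, and no model is actually called:

```python
# A toy RAG loop: retrieve relevant text, then prepend it to the question.
documents = {
    "fund_terms": "The fund charges 2% management and 20% carry.",
    "team_bios": "The CTO previously built ML pipelines at a fintech.",
}

def retrieve(question: str) -> str:
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(documents.values(),
               key=lambda doc: len(q_words & set(doc.lower().split())))

def build_prompt(question: str) -> str:
    """Assemble the prompt the AI model would actually receive."""
    return f"Context: {retrieve(question)}\n\nQuestion: {question}"

prompt = build_prompt("What carry does the fund charge?")
```

Everything defensible lives in the two pieces this sketch fakes: what is in `documents`, and how good `retrieve` is.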
What to ask: "What is in your retrieval database that nobody else has? How do you ensure the AI gets the right information?"
Category 2: Number Inflation
These red flags appear when a startup uses big numbers to imply capabilities that do not match reality.
Red Flag 5: "100+ AI agents"
This is one of the most common claims in AI pitch decks. When someone says they have dozens or hundreds of AI agents, ask a simpler question: how many different types of agent did they actually build?
Most "multi-agent" systems are really one AI model running the same basic task with different instructions. Imagine a call centre where every operator reads from a different script — that is not 100 different employees with 100 different skills. That is one job role with 100 scripts.
There is nothing wrong with this approach. But calling it "100+ agents" is like calling a mail merge "100 personalised letters."
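In code, the pattern often reduces to a sketch like this, with one shared model call and many instruction strings (everything here is hypothetical):

```python
# What "100+ agents" frequently means in practice: one shared model call,
# one hundred instruction strings.
AGENT_PROMPTS = {
    f"agent_{i}": f"You are analyst #{i}. Review the deck for issue #{i}."
    for i in range(100)
}

def call_model(system_prompt: str, document: str) -> str:
    """Stand-in for the single vendor LLM call shared by every 'agent'."""
    return f"analysed {document} with instructions: {system_prompt[:20]}"

def run_agent(name: str, document: str) -> str:
    # Same model, same code path; only the script differs per "agent".
    return call_model(AGENT_PROMPTS[name], document)

print(len(AGENT_PROMPTS))  # 100 "agents", one job role
```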
What to ask: "How many distinct agent types did you build? How do they coordinate? What happens when one agent's output contradicts another's?"
Red Flag 6: "Trained on X million data points"
This sounds impressive, but the number is meaningless without context.
"Trained on 10 million data points" could mean two very different things:
- They carefully labelled 10 million domain-specific examples from their particular industry — like writing a detailed textbook
- They scraped 10 million web pages and fed them into a model — like photocopying a library
The difference between these two is enormous. A carefully labelled dataset of 50,000 examples is often more valuable than a noisy dataset of 50 million.
What to ask: "What kind of data? Who labelled it? How was quality controlled? What specific improvement did the training produce?"
Red Flag 7: "Processes X documents per second"
Processing speed is usually the AI provider's feature, not the startup's. If you are sending documents to Claude or GPT for analysis, the speed is determined by the AI provider's servers, not by your system.
The startup's contribution is what happens before the AI call (preparing the data, deciding what to analyse) and after (interpreting results, combining with other information, presenting to users). The middle part — the actual AI processing — belongs to someone else.
What to ask: "What processing do YOU do before and after the AI call? How much of the total time is spent on your systems vs the AI provider?"
Red Flag 8: "X% accuracy"
"95% accuracy" sounds great. But accuracy at what? Measured against what benchmark? Evaluated by whom?
A test set that the startup created and curated will naturally produce high accuracy numbers. Real-world data is messy, ambiguous, and full of edge cases that carefully constructed test sets do not include.
What to ask: "What benchmark did you measure against? Who created the test data? What is the accuracy on data you have never seen before — data from a customer who did not know they were being tested?"
Category 3: Vague AI Claims
These red flags appear when a startup uses AI terminology without substance behind it.
Red Flag 9: "AI-native" / "AI-first" / "AI-powered"
These terms are used so broadly that they have lost most of their meaning. Here is a simple test: remove the AI from the product description. Does the product still make sense?
- AI-native: The product literally cannot exist without AI. Remove the AI and there is nothing left. A self-driving car is AI-native — without the AI, it is just a car that cannot drive itself.
- AI-enhanced: A traditional product that uses AI to make certain tasks faster or better. Most enterprise software is here. A word processor with AI writing suggestions is AI-enhanced.
- AI-branded: A traditional product with "AI" in the marketing. The AI does very little that matters.
Most decks claiming "AI-native" are actually AI-enhanced products. Nothing wrong with that — but the competitive moat is completely different. AI-enhanced products win on design and distribution. AI-native products win on data quality and learning loops.
What to ask: "What would this product look like without AI? Which specific features are impossible without it?"
Red Flag 10: "Knowledge graph" / "Context lake" / "Intelligence layer"
These terms appear in nearly half of AI pitch decks. They all describe a data layer — a place where the startup stores and organises information. But the sophistication varies enormously.
At the basic end: a search database with some AI-generated summaries. Any competent engineer could set this up in a weekend using off-the-shelf tools. It is useful, but it is not a competitive advantage.
At the sophisticated end: a database that maps relationships between entities (people, companies, deals, funds), tracks how information changes over time, records where each piece of data came from (so you can verify it), and learns from how users interact with it. This takes a team of engineers 6-12 months to build properly. And it only becomes truly valuable once enough data flows through it to create network effects.
Most startups claiming a "knowledge graph" have something much closer to the basic end.
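The sophisticated end can be recognised by its shape: typed entities, explicit relationships, and provenance on every fact. A sketch of that structure, using a hypothetical schema and example data:

```python
from dataclasses import dataclass, field
from datetime import date

# What separates a knowledge graph from a search index: every fact carries
# its source and its date, and entities link to each other explicitly.
@dataclass
class Fact:
    value: str
    source: str   # where this came from, so it can be verified
    as_of: date   # when it was true, so change can be tracked

@dataclass
class Entity:
    name: str
    kind: str                         # person, company, deal, fund
    facts: dict[str, Fact] = field(default_factory=dict)
    links: list[tuple[str, str]] = field(default_factory=list)  # (relation, entity)

acme = Entity("Acme AI", "company")
acme.facts["stage"] = Fact("Series A", "pitch deck v3", date(2026, 1, 10))
acme.links.append(("founded_by", "Jane Doe"))
```

A search database with AI summaries has none of this: no relations, no provenance, no temporal tracking.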
What to ask: "What is actually in your data layer? How is it structured? Where does the data come from? How do you track what has changed over time?"
Red Flag 11: "Proprietary model"
There is a 100x difference in investment between writing a set of instructions for someone else's AI model and training your own custom model.
A system prompt is a set of instructions that tells an AI model how to behave — like a recipe card. "You are a financial analyst. When you see a pitch deck, extract these 15 fields." Anyone can write one. It costs nothing. It takes hours to create.
A trained model is a custom neural network that has been specifically taught to recognise patterns in your domain. It costs thousands to millions of dollars. It requires a team of ML engineers. It takes months to build.
When a startup says "proprietary model," ask which one they have. Most of the time, it is a system prompt — which is fine, but it is not a moat.
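The two artefacts look nothing alike, which is why the question is so quick to settle. A side-by-side illustration (both objects are hypothetical, not real artefacts):

```python
# Most of the time, the "proprietary model" is a system prompt: one string,
# written in an afternoon, sent to someone else's model.
SYSTEM_PROMPT = (
    "You are a financial analyst. When you see a pitch deck, "
    "extract these 15 fields."
)

# A genuinely trained model starts from months of work that looks more
# like this job description than like a string. All figures illustrative.
training_job = {
    "base_model": "an-open-weights-model",  # assumption: illustrative name
    "training_examples": 50_000,            # labelled, domain-specific
    "ml_engineers": 3,
    "months": 6,
    "eval_set": "held-out documents the model has never seen",
}
```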
What to ask: "Did you train a model or write a prompt? What is the base model? How many training examples? What improved vs using the base model without training?"
Red Flag 12: "AI co-pilot"
This is the single most overused term in AI pitch decks. Every chatbot is now a "co-pilot." The word has become meaningless.
What to ask: "What specific decisions does the co-pilot make? What happens when the co-pilot is wrong? Who is responsible for the co-pilot's output? How does the co-pilot learn from mistakes?"
Red Flag 13: "Agentic workflows" without explaining the workflow
"Agentic" means the AI can take actions, not just answer questions. A chatbot is not agentic — it just responds. An agent can go search a database, make an API call, update a record, or trigger a process.
But "agentic" has become a buzzword. When a deck says "agentic workflows," ask:
- What triggers the agent?
- What decisions does it make?
- What happens when it makes a wrong decision?
- Who oversees the agent?
- What is the fallback when it fails?
If the answers are vague, the "agentic workflow" is probably just a chatbot that can call an API.
What to ask: "Walk me through a specific workflow end to end. Where does the agent make a decision? What are the failure modes?"
Category 4: Demo and Product Red Flags
These red flags appear in how the product is shown, not just how it is described.
Red Flag 14: The demo looks like ChatGPT with a logo
If the product demo is a chat window with the startup's branding on it, the entire product might be a thin wrapper around an AI API.
Look for things that ChatGPT cannot do on its own:
- Custom data visualisations and charts
- Structured analysis outputs (not just text responses)
- Domain-specific user interfaces
- Information pulled from proprietary data sources
- Multi-step workflows with decision points
If you could recreate the demo by pasting a prompt into ChatGPT, it is not a product — it is a prompt.
What to ask: "Can you show me a feature that requires your system specifically? Something I could not do by pasting a prompt into ChatGPT?"
Red Flag 15: Only one demo example, and it is perfect
Real products handle messy, incomplete, contradictory data. Demos only handle clean examples.
The best test: hand them a real document from your world — something they have never seen. Watch what happens. How long does it take? What does it get wrong? How does it handle ambiguity?
The difference between a demo and a product is the same as the difference between a rehearsed speech and a real conversation.
What to ask: "Can I give you a document right now that you have never seen? Let me watch what happens."
Red Flag 16: No error states or edge case handling shown
Every real product handles errors. Users upload corrupt files. Data is missing. The AI produces nonsense. Network connections fail.
If a demo only shows the happy path — everything working perfectly — ask: what does the product do when things go wrong? How does it tell the user that the AI is not confident? What happens when the AI does not have enough information to give a good answer?
Products that handle failure gracefully are products that have been used by real people. Products that only show success are products that have not been tested.
What to ask: "What happens when the AI gets it wrong? How does the user know the AI is not confident? Can you show me an error state?"
Category 5: Team Gaps
These red flags appear when the team does not match the technical claims.
Red Flag 17: No AI/ML engineer on the founding team
A founding team of business people and a "full-stack developer" claiming "AI-native" is a red flag. Building real AI systems requires people who understand data pipelines, model evaluation, embedding spaces, and cost optimisation — not just API integration.
Check the team's public profiles. Are there code repositories that show real AI infrastructure work? Or just code that calls someone else's API?
Using AI is like driving a car. Building AI systems is like designing the engine.
What to ask: "Who on the team has built AI systems before — not used them, but built them? Show me something they have shipped."
Red Flag 18: AI team hired after the deck was written
If the AI engineering team joined in the last 3 months but the deck claims 18 months of AI development, something does not add up. Check hiring dates against the deck's timeline claims.
What to ask: "When did your AI team start building? Who was the first AI engineer? What was the first AI feature shipped?"
Red Flag 19: Founder's AI experience is "used ChatGPT extensively"
Many founders genuinely believe that because they have used AI tools productively, they can build AI products. The gap is enormous. It is the difference between being a good driver and designing a car engine.
Look for founders who have: built AI products at previous companies, contributed to open-source AI projects, published research or technical writing about AI systems, or have engineering teams with deep AI infrastructure experience.
What to ask: "What is the most complex AI system your team has built before this company? What was the hardest technical challenge?"
Category 6: Economics That Do Not Add Up
These red flags appear when the AI cost structure does not match the business model.
Red Flag 20: Cannot answer "what does each analysis cost?"
This is the single most revealing question you can ask an AI startup.
Running a large AI model on a 50-page document costs roughly $0.15-0.50 per call. Now multiply that by the number of "agents" they claim to run and the number of documents they claim to process. The maths gets expensive fast.
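The back-of-envelope maths, made explicit. The per-call price is a midpoint of the range above, and the call volumes are illustrative assumptions:

```python
# Cost per analysis at claimed volumes, with illustrative inputs.
cost_per_call = 0.30         # dollars per long-document model call
agents_per_analysis = 10     # what the deck claims to run per document
analyses_per_customer = 100  # documents processed per customer per year

cost_per_analysis = cost_per_call * agents_per_analysis
annual_ai_cost = cost_per_analysis * analyses_per_customer

print(f"${cost_per_analysis:.2f} per analysis")        # $3.00 per analysis
print(f"${annual_ai_cost:.2f} per customer per year")  # $300.00 per customer per year
```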
Companies that have truly built AI at scale are obsessed with cost. They build smart shortcuts:
- Simple rule-based filters that throw out obvious non-matches before the expensive AI ever touches the data
- Caching systems that remember previous results instead of re-running the AI
- Routing systems that send simple tasks to cheaper AI models and only use expensive models for hard problems
- Batching systems that combine multiple requests into one to reduce overhead
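The first three shortcuts fit in a few lines. A sketch of the pipeline, where the prices, filter keywords, and routing threshold are illustrative assumptions:

```python
from functools import lru_cache

# Pre-filter, cache, and route: the expensive model only runs when needed.
CHEAP_CALL, EXPENSIVE_CALL = 0.01, 0.30
calls = {"count": 0}  # how many times a model is actually invoked

def pre_filter(doc: str) -> bool:
    """Cheap rule-based gate: drop documents that are clearly not decks."""
    return any(w in doc.lower() for w in ("funding", "traction", "team"))

@lru_cache(maxsize=1024)  # cache: identical documents are never re-analysed
def analyse(doc: str) -> float:
    """Route by difficulty (here, crudely, by length); record the real call."""
    calls["count"] += 1
    return CHEAP_CALL if len(doc) < 200 else EXPENSIVE_CALL

docs = ["traction update Q3", "holiday rota", "traction update Q3"]
for doc in docs:
    if pre_filter(doc):
        analyse(doc)

# One document filtered out, one served from cache: a single model call.
print(calls["count"])  # 1
```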
If a founder cannot tell you their cost per analysis within 30 seconds, they probably have not built it at scale yet.
What to ask: "What is your cost per analysis? Break it down for me. What do you spend on AI per customer per month?"
Red Flag 21: Pricing too low for claimed AI volume
If a startup charges $2,000/year and claims to run hundreds of AI analyses per customer, do the maths. At $0.20 per AI call with 10 agents per analysis and 100 analyses per customer, that is $200/customer just in AI costs. Add server costs, engineering salaries, and everything else — does the pricing actually work?
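The paragraph's example, as arithmetic. All inputs are the illustrative figures from the text, not real pricing data:

```python
# Does the pricing survive the claimed AI volume?
price_per_year = 2000        # what the startup charges per customer
cost_per_call = 0.20
agents_per_analysis = 10
analyses_per_customer = 100

ai_cost = cost_per_call * agents_per_analysis * analyses_per_customer
margin_before_everything_else = (price_per_year - ai_cost) / price_per_year

print(f"AI spend per customer: ${ai_cost:.0f}")  # AI spend per customer: $200
print(f"Margin before servers and salaries: {margin_before_everything_else:.0%}")
```

A 90% margin before any other cost may still work; the point is that the founder should have run this calculation before you did.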
Sometimes the answer is "we are subsidising to grow" — which is a valid strategy. But if the founder has not done this calculation, the AI claims probably exceed reality.
What to ask: "Walk me through your unit economics. How much does each customer cost you in AI inference? At what scale does this become profitable?"
Red Flag 22: No discussion of cost optimisation
Cost optimisation is one of the most important disciplines in building AI products. Real AI companies talk about:
- Pre-filtering: Using cheap, fast rules to eliminate data that does not need expensive AI analysis
- Caching: Remembering results so you do not re-run the same analysis twice
- Model routing: Using cheaper AI models for simple tasks and expensive ones only for hard problems
- Batching: Combining multiple small requests into fewer large ones
If none of this comes up in a conversation about the product, the AI cost structure has not been thought through — which usually means the AI is not doing as much as claimed.
What to ask: "How do you keep AI costs down? What percentage of requests actually need your most expensive AI processing?"
Category 7: Structure and Process Red Flags
These red flags appear in how the AI system is described structurally.
Red Flag 23: No evaluation or improvement framework
How does the startup know whether their AI is getting better?
"We look at the outputs" is a fine answer when you have 10 customers. It is not fine at 1,000. At scale, you need systematic measurement — test sets that track accuracy over time, feedback loops that capture user corrections, and processes that use those corrections to improve the AI.
If the answer to "how do you know your AI is improving?" is "we look at it" — the AI is probably not improving.
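The minimum viable version of systematic measurement is a fixed, labelled test set and an accuracy number you can track release over release. A sketch, where the examples, labels, and `extract_stage` function are hypothetical stand-ins for the product's AI step:

```python
# A minimal evaluation harness: fixed test set, repeatable accuracy score.
TEST_SET = [
    ("Raising our seed round of $2M", "seed"),
    ("Series A, $10M at $50M post", "series_a"),
    ("Pre-seed angel round", "pre_seed"),
]

def extract_stage(text: str) -> str:
    """Stand-in for the AI extraction step being evaluated."""
    t = text.lower()
    if "series a" in t:
        return "series_a"
    if "pre-seed" in t:
        return "pre_seed"
    return "seed"

def accuracy(predict) -> float:
    """Fraction of the test set the predictor labels correctly."""
    correct = sum(predict(x) == label for x, label in TEST_SET)
    return correct / len(TEST_SET)

print(accuracy(extract_stage))  # 1.0 on this toy set
```

Rerun the same harness after every change and the "are we improving?" question stops being a matter of opinion.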
What to ask: "How do you measure AI quality? What is your accuracy now vs six months ago? How do user corrections feed back into the system?"
Red Flag 24: Context window or token counts as features
"Supports 1 million token context" or "processes documents up to 500 pages" — these are the AI provider's capabilities, not the startup's. If the deck mentions token counts or context windows as competitive advantages, the value is being attributed to the underlying model, not to what was built on top.
It is like a taxi company advertising "top speed of 200mph" because that is what the car can do. The car's capability is not the taxi company's feature.
What to ask: "What is YOUR contribution beyond what the underlying AI model provides?"
Red Flag 25: "Fine-tuned model" without specifics
Fine-tuning is the process of teaching an existing AI model to be better at a specific task using your own data. It can produce real improvements — but the value depends entirely on the details.
Key questions:
- What model did you start with?
- How much training data did you use? (100 examples is a toy. 10,000 is serious. 100,000+ is rare.)
- What specifically improved? By how much?
- How do you prevent the model from getting worse at things it used to be good at?
"We fine-tuned a model" without answers to these questions is like saying "we customised a car" — it could mean a paint job or a new engine.
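These questions have a mechanical check behind them: target-task accuracy should rise, and a regression suite should not fall. A sketch of that before/after comparison, using hypothetical lookup tables in place of real predictors:

```python
# Before/after check for a fine-tune claim: gain on the target task,
# no loss on a regression suite. Data and models are illustrative.
target_task = [("doc1", "a"), ("doc2", "b"), ("doc3", "a")]
regression_suite = [("doc4", "c"), ("doc5", "d")]

def score(predict, dataset):
    """Fraction of examples the predictor labels correctly."""
    return sum(predict(x) == y for x, y in dataset) / len(dataset)

base_model = {"doc1": "a", "doc2": "a", "doc3": "a", "doc4": "c", "doc5": "d"}
tuned_model = {"doc1": "a", "doc2": "b", "doc3": "a", "doc4": "c", "doc5": "d"}

gain = score(tuned_model.get, target_task) - score(base_model.get, target_task)
regression = score(base_model.get, regression_suite) - score(tuned_model.get, regression_suite)
print(f"target gain: {gain:.2f}, regression: {regression:.2f}")
```

A founder who fine-tuned for real will have these two numbers to hand; a founder who customised the paint job will not.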
What to ask: "What is the base model? How many training examples? What was accuracy before and after fine-tuning? Can you show me the evaluation results?"
The Scoring Approach
None of these red flags are automatic deal-breakers. A startup might have three or four of them and still be building something genuinely valuable. The question is always: does the reality match the claims?
We think about this on a spectrum: the more flags a deck accumulates, and the weaker the answers to the follow-up questions, the deeper the diligence should go.
How We Think About This
We are building intelligence that assesses technical credibility the same way we assess founder quality or market opportunity: six structured dimensions, each with weighted scoring, confidence levels, and evidence requirements.
Because in 2026, every startup is "AI-powered."
The question is: powered by what, exactly?
This post was produced by the NUVC Intelligence Team. NUVC is an AI-native venture capital intelligence platform. We score startups, match them with investors, and screen funds for family offices.
Frequently Asked Questions
How do I evaluate AI claims in a pitch deck?
Look for 25 red flags across seven categories: borrowed technology (listing AI model names or open-source frameworks as proprietary), number inflation (vague "100+ agents" or "millions of data points" claims), vague AI terminology ("AI-native" without substance), demo gaps (chat windows with a logo), team gaps (no ML engineers), economics that do not add up (cannot explain cost per analysis), and missing process frameworks (no evaluation or improvement system). The more flags you spot, the deeper you should dig.
What is the difference between AI-native and AI-enhanced?
AI-native means the product literally cannot exist without AI — remove the intelligence and there is nothing left. AI-enhanced means a traditional product uses AI to make specific tasks faster or better. Most startups claiming "AI-native" are actually AI-enhanced. The distinction matters because the competitive moats are completely different: AI-native products win on data quality and learning loops, while AI-enhanced products win on design and distribution.
What should I ask an AI startup about their costs?
Ask "what does each analysis cost?" If the founder cannot answer within 30 seconds, they probably have not built at scale yet. Then ask about cost optimisation techniques: pre-filtering (cheap rules before expensive AI), caching (not re-running the same analysis), model routing (cheaper models for simple tasks), and batching. Companies that have genuinely built AI at scale are obsessed with these economics.
How can I tell if a startup has a real AI model vs just a prompt?
Ask whether they trained a model or wrote a prompt. A system prompt is a set of instructions for someone else's AI — it costs nothing and takes hours. A trained model is a custom neural network that costs thousands to millions and takes months. Most "proprietary models" are system prompts. Neither is inherently bad, but the competitive moat is 100x different.
