The R&D tax credit for AI startups
The short answer
Model training, fine-tuning, and evaluation work usually qualifies for the R&D tax credit. The sticking point is separating genuine experimentation from prompt tweaks against a vendor's API.
What qualifies, and what fights you
AI companies that train or fine-tune their own models are running textbook research and development. Architecture choices, hyperparameter sweeps, and evaluation design all involve real uncertainty about whether an approach will work, which is exactly what the credit rewards.
The harder question is companies built as a thin layer on top of a foundation model API. Calling GPT or Claude with a well-crafted prompt and shipping the result is a product decision, not research, unless your team is running structured experiments to measure and improve model behavior. The line is whether you are measuring and iterating against a hypothesis, or just trying prompts until one feels right.
Retrieval-augmented generation systems sit in between. Building the retrieval pipeline, testing chunking strategies, and tuning ranking algorithms against a real evaluation set qualifies. Wiring a vector database into a chat UI with default settings usually does not.
The four-part test, applied to AI startups
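Qualified purpose is straightforward for AI companies: the work aims to improve the function, performance, reliability, or quality of a model or a product built around one. The technological-in-nature requirement is met because the work rests on computer science and engineering principles: machine learning, statistics, and software engineering.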
Elimination of uncertainty shows up clearly in training runs. Nobody knows in advance whether a new architecture, a different training data mix, or a fine-tuning approach will improve accuracy. The process of experimentation is the training and evaluation loop itself: hypothesize, train, measure against a benchmark, adjust, and repeat.
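The loop can be sketched in Python with a toy model. Everything here is a hypothetical stand-in: a one-feature linear model instead of a real network, a synthetic dataset, and a learning-rate sweep in place of an architecture or data-mix experiment. What carries over is the shape of the loop. Each candidate config is a hypothesis, it gets trained and scored on held-out data, and the best result seeds the next round.

```python
import random

# Hypothetical toy setup: a one-feature linear model stands in for a real model,
# and a learning-rate sweep stands in for a real architecture or data-mix experiment.

def make_data(n, seed):
    """Toy dataset: y = 3x + 2 plus Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 3 * x + 2 + rng.gauss(0, 0.1)))
    return data

def train(data, lr, epochs=50):
    """Fit y = w*x + b by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def evaluate(model, data):
    """Mean squared error on held-out data: the benchmark in this sketch."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def run_sweep(learning_rates):
    train_set = make_data(200, seed=0)
    held_out = make_data(50, seed=1)
    results = {}
    for lr in learning_rates:                    # hypothesis: this config will do best
        model = train(train_set, lr)             # train
        results[lr] = evaluate(model, held_out)  # measure against the benchmark
    best = min(results, key=results.get)         # adjust: carry the winner forward
    return best, results

best_lr, scores = run_sweep([0.001, 0.01, 0.1, 0.5])
print(f"best learning rate: {best_lr}")
```

In a real project the `train` call would be a GPU job and `evaluate` would be the team's benchmark. The structure stays the same.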
New to the test itself? Read what software work qualifies as R&D first.
Work that usually qualifies
Training runs and architecture experiments
Comparing model architectures or training configurations to see which one generalizes better on your data qualifies as core experimentation.
Evaluation harness development
Building a system that scores model outputs against a benchmark or a labeled test set, so you can measure whether a change actually helped, is qualifying technical work.
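A minimal evaluation harness might look like the sketch below. It assumes exact-match scoring on a labeled set. The `accuracy` and `compare` helpers, the test set, and the two stand-in models are hypothetical placeholders for real model calls and real labels. Because both versions are scored on the same data, the delta is a measurement rather than an impression.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A labeled example is (input, expected_output). Exact-match scoring is an
# assumption for illustration; real harnesses often need fuzzier metrics.
Example = Tuple[str, str]

def accuracy(model: Callable[[str], str], test_set: Sequence[Example]) -> float:
    """Fraction of examples where the normalized model output equals the label."""
    hits = sum(
        1 for prompt, label in test_set
        if model(prompt).strip().lower() == label.strip().lower()
    )
    return hits / len(test_set)

def compare(baseline, candidate, test_set) -> Dict[str, float]:
    """Score both versions on the same test set so the change can be measured."""
    base = accuracy(baseline, test_set)
    cand = accuracy(candidate, test_set)
    return {"baseline": base, "candidate": cand, "delta": cand - base}

# Hypothetical labeled test set and two toy "models" standing in for real ones.
TEST_SET: List[Example] = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("3*3", "9"),
    ("opposite of hot", "cold"),
]
CANDIDATE_ANSWERS = {"2+2": "4", "capital of France": "paris ", "3*3": "9", "opposite of hot": "warm"}

def baseline_model(prompt: str) -> str:
    return "4"

def candidate_model(prompt: str) -> str:
    return CANDIDATE_ANSWERS.get(prompt, "")

report = compare(baseline_model, candidate_model, TEST_SET)
print(report)  # {'baseline': 0.25, 'candidate': 0.75, 'delta': 0.5}
```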
Fine-tuning experiments with uncertain outcomes
Testing whether fine-tuning on domain-specific data improves accuracy over a base model, and measuring the result, qualifies. Running one fine-tune and shipping it without measurement is a weaker claim.
Retrieval pipeline design
Testing chunking strategies, embedding models, and ranking approaches against a real evaluation set to improve retrieval quality qualifies.
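Retrieval quality is often measured with recall@k: the share of known-relevant items that show up in the top k results. The sketch below assumes each retrieved chunk is mapped back to the source document it came from, so that two chunking strategies can be scored against the same labels. The query IDs, document IDs, and rankings are hypothetical.

```python
from typing import Dict, List, Set

def recall_at_k(retrieved: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int) -> float:
    """Mean over queries of the share of relevant document IDs found in the top-k results."""
    per_query = []
    for query, gold in relevant.items():
        top_k = set(retrieved.get(query, [])[:k])
        per_query.append(len(top_k & gold) / len(gold))
    return sum(per_query) / len(per_query)

# Hypothetical evaluation set: for each query, the source documents a reviewer marked relevant.
RELEVANT = {"q1": {"a", "b"}, "q2": {"c"}}

# Hypothetical ranked results (as source document IDs) from two chunking strategies.
small_chunks = {"q1": ["a", "x", "b"], "q2": ["y", "c", "z"]}
large_chunks = {"q1": ["a", "b", "x"], "q2": ["c", "y", "z"]}

for name, results in [("small chunks", small_chunks), ("large chunks", large_chunks)]:
    print(name, recall_at_k(results, RELEVANT, k=2))
```

Running the same metric over each candidate configuration is the kind of structured comparison that separates pipeline design from default settings.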
Inference optimization
Quantizing a model or redesigning a serving stack to cut latency while holding accuracy steady involves genuine engineering trade-offs and testing.
Work that usually does not
Prompt tweaks against a stable API
Rewording a prompt sent to a vendor's model until the output looks better, with no structured measurement, does not meet the process of experimentation requirement.
Wrapping a vendor API in a product feature
Calling a foundation model's API as documented and displaying the result does not involve technological uncertainty, even if the product built around it is new.
Which expenses count
GPU compute is the expense that sets AI companies apart. Cloud compute used for training runs, fine-tuning jobs, and evaluation sweeps counts as a qualifying expense. Inference compute serving live production traffic generally does not, since it is not development work.
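Below is a minimal Python sketch of that split. It assumes each compute job carries a purpose tag. The `ComputeCharge` class, the tag names, and the dollar amounts are hypothetical and do not reflect any cloud provider's billing format.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Purposes treated as development work in this sketch. The tags are a
# hypothetical labeling scheme, e.g. applied to each job when it is launched.
QUALIFYING_PURPOSES = {"training", "fine_tuning", "evaluation"}

@dataclass
class ComputeCharge:
    job: str
    purpose: str      # e.g. "training", "evaluation", "inference"
    cost_usd: float

def split_compute(charges: Iterable[ComputeCharge]) -> Tuple[float, float]:
    """Return (qualifying, excluded) spend; production inference lands in excluded."""
    qualifying = excluded = 0.0
    for charge in charges:
        if charge.purpose in QUALIFYING_PURPOSES:
            qualifying += charge.cost_usd
        else:
            excluded += charge.cost_usd
    return qualifying, excluded

# Hypothetical monthly bill.
bill = [
    ComputeCharge("llm-v2-pretrain", "training", 12_000.0),
    ComputeCharge("eval-sweep-03", "evaluation", 1_500.0),
    ComputeCharge("prod-api", "inference", 8_000.0),
]
print(split_compute(bill))  # (13500.0, 8000.0)
```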
Wages for ML engineers and researchers count, prorated to time spent on qualifying experimentation. That includes the engineers designing eval harnesses and running training jobs, not just the ones writing model code.
US-based contractors, including specialized ML consultants, count at 65% of what you pay them. Data labeling work can also count if it directly supports a qualifying experiment, such as building a new evaluation set.
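The sketch below combines the wage, contractor, and compute rules into a total QRE figure. It is simplified. The `total_qre` function and every input are hypothetical, and the function applies only the rules described here: wages prorated by time share, US contractor payments at 65%, and qualifying compute counted as spent.

```python
from typing import Iterable, Tuple

CONTRACT_RESEARCH_RATE = 0.65  # US contractor payments count at 65%

def total_qre(
    employees: Iterable[Tuple[float, float]],
    contractor_payments: Iterable[float],
    qualifying_compute: float,
) -> float:
    """employees holds (annual_wage, share_of_time_on_qualifying_work) pairs."""
    wage_qre = sum(wage * share for wage, share in employees)
    contractor_qre = CONTRACT_RESEARCH_RATE * sum(contractor_payments)
    return wage_qre + contractor_qre + qualifying_compute

# Hypothetical inputs: two ML engineers, one ML consultant, and qualifying cloud compute.
qre = total_qre(
    employees=[(200_000, 0.6), (150_000, 0.5)],
    contractor_payments=[40_000],
    qualifying_compute=13_500,
)
print(f"${qre:,.0f}")  # $234,500
```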
A worked example
Hypothetical example. An AI startup has 5 ML engineers earning a blended average of $180,000, spending about 75% of their time on training, fine-tuning, and evaluation work.
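That puts qualified research expenses (QRE) at about $675,000 in wages alone (5 × $180,000 × 75%). At 6 to 10% of total QRE, the federal credit lands between about $40,500 and $67,500. A pre-revenue or early-revenue company with under $5 million in gross receipts for the year, and no gross receipts before the five-year period ending with that year, can elect to apply up to $500,000 of the credit against payroll taxes each year.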
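Here is a quick sketch of the arithmetic. It uses only the stated inputs (5 engineers, a $180,000 blended salary, 75% qualifying time) and the 6 to 10% rate range. It counts wages only, so any qualifying compute or contractor spend would raise the result.

```python
# Inputs from the hypothetical example above; wages are the only QRE counted here.
ENGINEERS = 5
BLENDED_SALARY = 180_000
QUALIFYING_SHARE = 0.75
RATE_LOW, RATE_HIGH = 0.06, 0.10     # 6 to 10% of total QRE
PAYROLL_OFFSET_CAP = 500_000         # annual cap on the payroll tax election

wage_qre = ENGINEERS * BLENDED_SALARY * QUALIFYING_SHARE
credit_low = RATE_LOW * wage_qre
credit_high = RATE_HIGH * wage_qre

print(f"wage QRE: ${wage_qre:,.0f}")                                      # wage QRE: $675,000
print(f"credit: ${credit_low:,.0f} to ${credit_high:,.0f}")              # credit: $40,500 to $67,500
print("within payroll offset cap:", credit_high <= PAYROLL_OFFSET_CAP)  # within payroll offset cap: True
```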