The most common reason ML projects fail is not a bad model. It is bad scoping: attempting an ML solution to a problem that does not need ML, underestimating the cost of labels, ignoring data quality issues, building for a success metric that does not match the business goal, or discovering halfway through that the data needed to solve the problem does not exist.
Good ML project scoping takes one to three days and prevents months of wasted work. The five questions below are the minimum set you must answer before committing engineering resources to an ML project.
Question 1: Is This Actually an ML Problem?
Machine learning is the right tool when: the task is too complex to define explicit rules for, you have sufficient labeled data or can obtain it, the patterns in historical data predict future outcomes, and the value of improved accuracy exceeds the cost of building and maintaining an ML system.
ML is the wrong tool when: a set of deterministic rules would solve the problem adequately, the data you have does not predict the outcome you care about, the problem changes so frequently that a trained model would be outdated constantly, or the cost of errors is high enough to require human judgment in all cases.
A concrete test: can you write down the rules that a human expert uses to make this decision? If yes, implement those rules first. If the rule-based system is accurate enough (and it often is), you are done. ML projects carry substantial ongoing maintenance costs, while a rule-based system can usually be read, debugged, and changed by engineers with no ML background. Only add ML complexity when you can quantify a performance gap that rules cannot close.
Common problems that do not need ML: calculating a shipping estimate from business rules, routing support tickets by keyword matching, flagging accounts over a certain spending threshold, validating form inputs, generating rule-based reports.
Common problems that benefit from ML: predicting which users will churn next month, classifying the sentiment of open-ended customer feedback, detecting anomalies in server logs, matching job candidates to job descriptions, forecasting demand for products with seasonal patterns.
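As a minimal sketch of the "rules first" test, the snippet below routes support tickets by keyword matching and measures how often the rules agree with a small hand-labeled sample. The queue names, keywords, and sample tickets are all hypothetical; the point is only that a rule baseline gives you a number that any later ML model has to beat.

```python
# Hypothetical keyword rules; queue names and keywords are illustrative only.
RULES = {
    "billing": ("invoice", "refund", "charge"),
    "access": ("password", "login", "locked"),
}
DEFAULT_QUEUE = "general"


def route_ticket(text: str) -> str:
    """Return the first queue whose keywords appear in the ticket text."""
    lowered = text.lower()
    for queue, keywords in RULES.items():
        if any(keyword in lowered for keyword in keywords):
            return queue
    return DEFAULT_QUEUE


def baseline_accuracy(labeled_tickets: list[tuple[str, str]]) -> float:
    """Fraction of (text, expected_queue) pairs the rules route correctly."""
    if not labeled_tickets:
        raise ValueError("need at least one labeled ticket")
    correct = sum(route_ticket(text) == expected for text, expected in labeled_tickets)
    return correct / len(labeled_tickets)


# Made-up labeled sample, including one ticket the rules get wrong.
sample = [
    ("I was charged twice, please refund me", "billing"),
    ("Cannot login after password reset", "access"),
    ("How do I export my data?", "general"),
    ("My card was declined", "billing"),  # no rule keyword matches
]
print(f"rule baseline accuracy: {baseline_accuracy(sample):.2f}")
```

If a baseline like this already meets the target, the project can stop here. If it does not, the misrouted tickets show exactly where the performance gap is.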
Question 2: Do You Have the Data You Need?
This question kills more ML projects than any other. Be ruthless.
The data questions to answer:
Does the data exist? If you want to predict customer lifetime value, do you have customer history going back far enough to observe meaningful lifetime outcomes? If you want to predict equipment failure, do you have historical sensor readings aligned with failure events? Many ML projects assume data exists that was never collected.
Is the data labeled? Supervised ML requires labeled examples. If your data is unlabeled, you need a labeling strategy (human annotation, programmatic labeling, weak supervision, or a shift to unsupervised methods).
How much labeled data do you have? As rough minimums: simple binary classification requires hundreds to low thousands of examples. Fine-tuning a pretrained model requires thousands. Training from scratch requires tens of thousands to millions. If you have 100 labeled examples and need a custom model, your first month should be spent labeling data, not training models.
Is the data representative of what you will see in production? Training data collected in 2022 may not represent 2025 behavior. Data from one geographic region may not generalize to another. A model trained on data from your largest customers may fail on SMB customers. State these assumptions explicitly and verify them.
Is there feature leakage? Every feature you plan to use must be computable at prediction time, using only information that is available at that moment. Proposed feature sets often fail this test on first examination. For example, invoice payment status cannot be used to predict invoice default if the status only becomes known after the default occurs.
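One way to make the leakage question concrete is to record, for every proposed feature, how long after the prediction moment its value first becomes known. The catalog below is a hypothetical sketch built around the invoice example; the feature names and the 45-day lag are assumptions, not values from a real system.

```python
from datetime import timedelta

# Hypothetical feature catalog: the delay between the prediction moment and
# the moment each feature's value is first known. Zero means the value is
# already known when the prediction is made.
FEATURE_AVAILABILITY = {
    "customer_tenure_days": timedelta(days=0),
    "prior_late_payments": timedelta(days=0),
    "invoice_amount": timedelta(days=0),
    "invoice_payment_status": timedelta(days=45),  # assumed: known only after the due date
}


def leaky_features(availability: dict[str, timedelta]) -> list[str]:
    """Return features whose values are not yet known at prediction time."""
    return sorted(name for name, lag in availability.items() if lag > timedelta(0))


print(leaky_features(FEATURE_AVAILABILITY))
```

Filling in a table like this tends to be the useful part of the exercise: it forces someone to find out when each field is actually written in the source system.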
Question 3: What Is the Precise Success Metric?
"Improve recommendations" is not a success metric. "Reduce customer churn" is not a success metric. These are goals. The success metric is the specific, measurable number you will use to determine whether the ML system is working.
The metric must be:
- Measurable: You can compute it from data you have
- Tied to the business goal: Improving the metric should causally improve the business outcome
- Achievable: There exists a model that could achieve a meaningful improvement given your data
Common disconnects between ML metrics and business metrics:
A recommendation system optimized for click-through rate maximizes clicks, which may not correlate with purchase conversion or customer satisfaction. An email classifier optimized for accuracy can score well overall while performing poorly on the rare emails that matter most, because the majority class dominates the metric. A churn model optimized for AUC ranks customers well relative to each other, but its absolute probabilities may be miscalibrated.
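The email classifier case is easy to reproduce with made-up numbers. In the sketch below, 10 of 1,000 emails are the urgent ones, and a "model" that labels everything as routine still reaches 99% accuracy while finding none of them. The data and the 1% base rate are illustrative assumptions.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Fraction of truly positive examples that were predicted positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predictions_for_positives:
        raise ValueError("no positive examples in y_true")
    return sum(p == positive for p in predictions_for_positives) / len(predictions_for_positives)


# Made-up data: 1,000 emails, 10 of which are urgent (label 1).
y_true = [1] * 10 + [0] * 990
always_routine = [0] * 1000  # a degenerate model that never flags anything

print(f"accuracy: {accuracy(y_true, always_routine):.2f}")
print(f"recall on urgent emails: {recall(y_true, always_routine):.2f}")
```

If the business goal is "never miss an urgent email", recall on the urgent class (usually paired with a precision floor) is a closer match than overall accuracy.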
Define: what does "good enough" look like? What improvement over the current baseline (which may be a rule-based system or human decision-making) justifies the investment? If the current system is 85% accurate and ML gets you to 87%, is that worth six months of engineering work?
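The 85% versus 87% question can be turned into rough arithmetic. The sketch below is a back-of-the-envelope model, not a forecast: the decision volume, the value of each additional correct decision, and both cost figures are hypothetical inputs you would replace with your own estimates.

```python
from typing import Optional


def annual_value_of_gain(
    decisions_per_year: int,
    baseline_accuracy: float,
    model_accuracy: float,
    value_per_corrected_decision: float,
) -> float:
    """Value of the extra correct decisions the model makes per year."""
    extra_correct = decisions_per_year * (model_accuracy - baseline_accuracy)
    return extra_correct * value_per_corrected_decision


def payback_years(
    build_cost: float, annual_maintenance_cost: float, annual_value: float
) -> Optional[float]:
    """Years to recover the build cost, or None if the project never pays back."""
    net_per_year = annual_value - annual_maintenance_cost
    if net_per_year <= 0:
        return None
    return build_cost / net_per_year


# Hypothetical inputs: 500,000 decisions a year, each fixed error worth 12.
value = annual_value_of_gain(500_000, 0.85, 0.87, 12.0)
years = payback_years(build_cost=300_000, annual_maintenance_cost=60_000, annual_value=value)

print(f"annual value of the accuracy gain: {value:,.0f}")
print("never pays back" if years is None else f"payback period: {years:.1f} years")
```

With these assumed inputs the two-point gain takes years to pay back, which is the kind of result worth seeing before the engineering work starts rather than after.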
Question 4: What Is the Human-in-the-Loop Plan?
Very few production ML systems operate with zero human involvement. The responsible scoping question is not "can we automate this?" but "how will humans interact with this system's outputs?"
The human-in-the-loop considerations:
Confidence thresholds: For predictions below some confidence level, route to human review rather than automated action. High-confidence predictions are automated; low-confidence predictions get human eyes.
Appeals and corrections: When the model makes a decision (auto-reject a loan application, auto-flag content), what is the process for a human to override it? How do those corrections feed back into model improvement?
Monitoring and escalation: Who monitors model performance in production? What triggers a human review of model behavior? Gradual distribution shift and sudden data quality issues both require human response.
Regulatory requirements: In regulated industries (financial services, healthcare, hiring), automated decisions may require explainability, audit trails, and appeal mechanisms regardless of model accuracy. Build these requirements into scope from day one.
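The confidence-threshold idea above can be sketched in a few lines. The 0.90 threshold, the route names, and the sample predictions are assumptions for illustration; in practice the threshold would be chosen from validation data, and it is only meaningful if the model's confidence scores are reasonably well calibrated.

```python
from dataclasses import dataclass

AUTO = "auto"
HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class Prediction:
    item_id: str
    label: str
    confidence: float  # model's probability for `label`, expected in [0, 1]


def route(prediction: Prediction, threshold: float = 0.90) -> str:
    """Automate confident predictions; send the rest to a human reviewer."""
    if not 0.0 <= prediction.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {prediction.confidence}")
    return AUTO if prediction.confidence >= threshold else HUMAN_REVIEW


def review_rate(predictions: list[Prediction], threshold: float = 0.90) -> float:
    """Fraction of predictions that would land in the human review queue."""
    if not predictions:
        raise ValueError("need at least one prediction")
    reviewed = sum(route(p, threshold) == HUMAN_REVIEW for p in predictions)
    return reviewed / len(predictions)


# Made-up batch of predictions.
batch = [
    Prediction("a1", "approve", 0.97),
    Prediction("a2", "reject", 0.62),
    Prediction("a3", "approve", 0.91),
    Prediction("a4", "reject", 0.88),
]
for p in batch:
    print(p.item_id, route(p))
print(f"share sent to human review: {review_rate(batch):.2f}")
```

The review rate matters for scoping because it sets the staffing cost: a threshold that routes half of all predictions to humans automates much less than the headline accuracy suggests.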
Question 5: What Is the Total Cost of Ownership?
ML systems are not built once and forgotten. They require ongoing maintenance that many projects underestimate:
Retraining: Models degrade as the world changes. How frequently will you retrain? Who owns the retraining pipeline? What triggers a retrain (scheduled? performance-based? data drift detection)?
Monitoring: You need monitoring for model performance (are predictions still accurate?), data quality (is incoming data formatted correctly?), and distribution shift (has the input distribution changed?). Building a model without monitoring is building a system that fails silently.
Infrastructure: Serving an ML model at production scale requires infrastructure: a model serving endpoint, versioning, A/B testing capability, rollback capability. This is engineering work separate from model training.
Labeling pipeline: If your model needs ongoing labeled data for retraining, who provides the labels? What is the labeling cost per example? If each quarterly retrain needs, say, 10,000 freshly labeled examples, that is 40,000 labels a year, and you need a sustainable process and budget to produce them.
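For the distribution-shift part of monitoring, one commonly used statistic is the population stability index (PSI), which compares the binned distribution of a feature in a reference sample (for example, the training data) against a recent production sample. The sketch below is a minimal pure-Python version under simplifying assumptions: equal-width bins taken from the reference range, out-of-range values clamped into the edge bins, and a small epsilon to avoid taking the log of zero. The 0.25 alert level is a widely quoted rule of thumb, not a guarantee.

```python
import math


def psi(reference: list[float], current: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index between a reference and a current sample."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if lo == hi:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp into the edge bins
        # Floor each proportion at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Made-up samples: the "production" values are the training values shifted up.
training = [float(v) for v in range(100)]
production = [v + 50.0 for v in training]

score = psi(training, production)
print(f"PSI: {score:.2f}")
if score > 0.25:  # rule-of-thumb alert level, an assumption to tune
    print("input distribution has shifted: trigger a human review")
```

A check like this covers only one feature's input distribution. Monitoring prediction accuracy still requires fresh labels, which is where the labeling pipeline cost comes back in.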
The Minimum Viable ML Project Checklist
Before starting ML work, verify:
- The problem cannot be solved adequately with deterministic rules
- Historical data exists that would have predicted the outcome you care about
- You have sufficient labeled examples, or a concrete plan and budget to obtain them
- There is no obvious feature leakage in your proposed feature set
- You have a specific, measurable success metric with a defined "good enough" threshold
- You have a baseline to beat (rule-based system, human accuracy, or prior model)
- You know who maintains the system and the retraining pipeline post-launch
- You have a plan for handling low-confidence predictions
- Regulatory and explainability requirements are understood
- Estimated total cost of ownership is justified by the projected business value
Build vs. Buy: ML Components Decision
Should you build a custom model or use a vendor ML API?
Use a vendor API (OpenAI, Anthropic, Google, AWS AI services) when: the task is generic (sentiment analysis, image classification, speech-to-text), volume is low enough that API costs are manageable, time to production matters more than cost optimization, or you need capabilities (large language model reasoning, multimodal processing) that would require massive resources to replicate.
Build a custom model when: the task is domain-specific enough that general models perform poorly, data privacy prevents sending data to external APIs, volume is high enough that API costs exceed the cost of a custom solution, or you need precise control over model behavior and updates.
The default for most organizations: use APIs to validate the concept, measure performance and cost at your actual volume, then decide whether custom models are worth the investment.
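The "measure at your actual volume" step comes down to a break-even calculation. The sketch below compares a per-request API price against a custom model whose build cost is spread over an amortization period plus a fixed monthly running cost. Every number in it is a hypothetical placeholder, and it ignores factors such as quality differences and price changes.

```python
def monthly_api_cost(requests_per_month: float, cost_per_request: float) -> float:
    """Monthly spend when every request goes to a vendor API."""
    return requests_per_month * cost_per_request


def monthly_custom_cost(build_cost: float, amortization_months: int, monthly_run_cost: float) -> float:
    """Monthly cost of a custom model: amortized build cost plus running cost."""
    return build_cost / amortization_months + monthly_run_cost


def break_even_requests(
    cost_per_request: float, build_cost: float, amortization_months: int, monthly_run_cost: float
) -> float:
    """Monthly request volume above which the custom model becomes cheaper."""
    return monthly_custom_cost(build_cost, amortization_months, monthly_run_cost) / cost_per_request


# Hypothetical inputs: 0.002 per API request, 240,000 to build a custom model
# (amortized over 24 months), and 8,000 a month to serve and maintain it.
threshold = break_even_requests(0.002, 240_000, 24, 8_000)
print(f"custom model is cheaper above {threshold:,.0f} requests per month")

for volume in (1_000_000, 20_000_000):
    api = monthly_api_cost(volume, 0.002)
    custom = monthly_custom_cost(240_000, 24, 8_000)
    print(f"{volume:>12,} requests: API {api:>9,.0f} vs custom {custom:>9,.0f}")
```

Below the break-even volume the API is the cheaper option even before counting the maintenance burden; above it, the remaining criteria (privacy, domain fit, control over updates) decide whether the switch is worth making.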
Good scoping does not guarantee ML project success, but bad scoping almost guarantees failure. Spending one to three days answering these five questions rigorously is the highest-return investment you can make in any ML project.