The Problem with Manual RICE Scoring
RICE scoring is one of the most popular prioritization frameworks in product management, and for good reason. It forces you to evaluate features across four dimensions: Reach, Impact, Confidence, and Effort. But anyone who has run a RICE exercise with a team knows the dirty secret: most of the numbers are educated guesses.
Reach estimates often come from gut feel rather than data. Impact scores collapse into "high, medium, low" buckets that everyone interprets differently. Confidence is the score people adjust last to make their preferred feature win. And effort estimates from engineering are notoriously optimistic.
AI does not fix all of these problems. But it can ground your RICE scores in data rather than intuition for at least two of the four dimensions. Here is how to do it practically, with prompts you can use this week.
If you are new to the framework, start with the RICE framework overview before diving into AI-assisted scoring.
Using AI to Estimate Reach
Reach is the dimension where AI adds the most value, because estimating it is fundamentally a data analysis problem.
Feed your analytics to an LLM
Export your product analytics for the relevant user segment: monthly active users, feature usage data, funnel conversion rates. Paste this into Claude or ChatGPT with a prompt like:
Here is our product analytics for the past 90 days:
[paste data]
We are considering a feature that [description].
This feature would be available to users who [eligibility criteria].
Based on this data, estimate how many users per quarter
would encounter this feature. Show your reasoning step by step.
Provide a range (low/mid/high) with confidence level for each.
What this looks like in practice
Say you are building a bulk export feature. Your analytics show 12,000 monthly active users, 3,400 of whom use the export function at least once per month, and 800 who export more than five times per month. AI can analyze these segments, apply reasonable adoption curve assumptions, and estimate that 1,500 to 2,200 users per quarter would use bulk export within the first six months.
That is a much better starting point than "I think a lot of people want this."
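It is worth reproducing the model's arithmetic yourself before trusting the range. Here is a minimal sketch of that calculation; the per-segment adoption rates are illustrative assumptions, not figures from the article's data:

```python
# Sketch: reproduce a low/mid/high quarterly reach estimate for the
# bulk export example. Adoption rates below are hypothetical assumptions.

monthly_active_users = 12_000
monthly_exporters = 3_400   # users who export at least once per month
heavy_exporters = 800       # users who export more than five times per month
casual_exporters = monthly_exporters - heavy_exporters  # 2,600

# Assumed quarterly adoption rates per segment (hypothetical):
# heavy exporters adopt readily, casual exporters at a lower rate.
scenarios = {
    "low":  {"heavy": 0.80, "casual": 0.35},
    "mid":  {"heavy": 0.90, "casual": 0.45},
    "high": {"heavy": 0.95, "casual": 0.55},
}

for name, rates in scenarios.items():
    reach = heavy_exporters * rates["heavy"] + casual_exporters * rates["casual"]
    print(f"{name}: ~{reach:,.0f} users/quarter")
```

Under these assumptions the scenarios land at roughly 1,550, 1,890, and 2,190 users per quarter, which is the kind of low/mid/high range the prompt asks the model to show its work for.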
Limitations to watch for
AI will confidently produce a number even when the data is insufficient. Always check whether the model is extrapolating beyond what the data supports. If your analytics only cover 30 days, say so in the prompt and ask the model to flag assumptions it is making about seasonal patterns or growth trends.
Using AI to Assess Impact
Impact is harder to quantify with AI, but you can improve your estimates by using AI to synthesize qualitative signals into structured scores.
Combining multiple data sources
Impact estimates improve when you force the model to weigh several signals at once rather than react to the feature description alone. Use a prompt like:
Feature: [description]
Customer feedback data:
- [X] support tickets mentioning this pain point in last 90 days
- [Y] feature requests in our feedback portal
- Average NPS comment sentiment for this topic: [score]
- Competitor [name] launched similar feature [date]
Our impact scale:
3 = Major (moves primary KPI by 5%+)
2 = High (moves primary KPI by 2-5%)
1 = Medium (moves primary KPI by 0.5-2%)
0.5 = Low (minimal KPI impact)
Score this feature's impact and explain your reasoning.
Compare against [2-3 recently shipped features] as calibration points.
The calibration step is critical. Without reference points, AI will default to "High impact" for everything because most feature descriptions sound important when you are pitching them.
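The impact scale in the prompt is just a step function over expected KPI movement, which means you can encode it directly and apply it consistently across features. A small sketch, assuming the thresholds above:

```python
# Sketch: map an expected primary-KPI lift (in percent) onto the
# impact scale from the prompt above.
def impact_score(kpi_lift_pct: float) -> float:
    """Return the RICE impact score for an expected KPI lift percentage."""
    if kpi_lift_pct >= 5.0:
        return 3.0   # Major
    if kpi_lift_pct >= 2.0:
        return 2.0   # High
    if kpi_lift_pct >= 0.5:
        return 1.0   # Medium
    return 0.5       # Low
```

For example, an expected 3% lift in the primary KPI maps to a High (2.0) under this scale. Encoding the scale once keeps different PMs from interpreting "High" differently across scoring sessions.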
When AI impact scoring fails
AI cannot assess strategic impact. A feature that positions your product for an emerging market segment or blocks a competitive threat may score low on direct KPI impact but high on strategic value. Use AI for the data-driven impact estimate, then apply your strategic judgment as an adjustment.
Using AI to Generate Confidence Scores
Confidence is the dimension PMs most often abuse. It is supposed to reflect how certain you are about your Reach and Impact estimates, but in practice it becomes a fudge factor.
AI can make Confidence more honest by forcing a structured assessment.
For each of the following, rate our evidence level as
Strong (direct data), Moderate (indirect/analogous data),
or Weak (assumption only):
1. Reach estimate of [X] users/quarter
Evidence: [what you have]
2. Impact estimate of [score]
Evidence: [what you have]
3. Technical feasibility
Evidence: [what you have]
Based on these evidence ratings, assign a RICE Confidence
percentage using this scale:
100% = All three have Strong evidence
80% = Two Strong, one Moderate
60% = One Strong, two Moderate
40% = All Moderate or mixed
20% = Any dimension has only Weak evidence
Explain your scoring.
This approach is valuable because it surfaces gaps in your evidence before you commit to a priority ranking. If the AI flags that your reach estimate is based on weak evidence, that is a signal to gather more data before prioritizing.
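The confidence scale in the prompt is deterministic enough to encode outright, so you can apply it yourself and reserve the LLM for the harder part: rating the evidence. A sketch following the scale above:

```python
# Sketch: turn three evidence ratings into a RICE Confidence percentage,
# following the scale in the prompt above.
def confidence(reach: str, impact: str, feasibility: str) -> int:
    """Each rating is "strong", "moderate", or "weak"."""
    ratings = [reach, impact, feasibility]
    if "weak" in ratings:
        return 20    # any Weak evidence caps confidence at 20%
    strong = ratings.count("strong")
    if strong == 3:
        return 100
    if strong == 2:
        return 80
    if strong == 1:
        return 60
    return 40        # all Moderate
```

Note how asymmetric the scale is: a single Weak rating drops confidence to 20% regardless of the other two, which is exactly the "gather more data" signal described above.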
Using AI to Estimate Effort
Effort estimation is where PMs should be most cautious with AI. Engineering effort depends on your specific codebase, technical debt, team composition, and architecture. No LLM knows these details.
What AI can do
AI can help you structure the effort breakdown so you ask engineering better questions.
Feature: [description]
Break this into implementation components:
- Frontend changes
- Backend/API changes
- Database changes
- Third-party integrations
- Testing requirements
- Documentation
For each component, list the key technical decisions
that would affect effort estimates.
What AI cannot do
Do not ask AI to estimate person-weeks or story points. It will give you a number, but that number is based on generic software projects, not your team building on your stack. Instead, use the structured breakdown above as an input to your engineering lead's estimate. It saves time by ensuring you have considered all the components before the estimation conversation.
Putting It All Together: The AI-Assisted RICE Workflow
Here is a step-by-step process for running AI-assisted RICE scoring across your backlog.
Step 1: Prepare your data packet. For each feature candidate, gather analytics data, customer feedback counts, support ticket volumes, and any competitive intelligence. This takes 15 to 20 minutes per feature.
Step 2: Run reach and impact prompts. Feed your data to the LLM using the prompt templates above. Review each output for reasonableness. This takes 5 to 10 minutes per feature.
Step 3: Assess confidence honestly. Use the structured confidence prompt. Accept low confidence scores. They are telling you something useful.
Step 4: Get human effort estimates. Use the AI-generated component breakdown as a starting point for engineering conversations.
Step 5: Calculate and compare. Plug your scores into the RICE calculator and rank your features. Look for surprises. If a feature you expected to rank high scores low, dig into which dimension is dragging it down.
Step 6: Apply strategic judgment. RICE gives you a data-informed starting point, not a final answer. Adjust for strategic considerations, dependencies, and sequencing that the framework does not capture. Use Compass to map how your top-ranked features align with your product direction.
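The calculation in Step 5 is the standard RICE formula: (Reach × Impact × Confidence) / Effort. A minimal sketch of the scoring and ranking, using made-up feature data for illustration:

```python
# Sketch: compute and rank RICE scores for a small backlog.
# The feature data below is illustrative, not real.
features = [
    # (name, reach per quarter, impact score, confidence, effort in person-weeks)
    ("Bulk export",     1900, 2.0, 0.80, 4),
    ("Onboarding tour", 5000, 1.0, 0.60, 2),
    ("SSO integration",  600, 3.0, 0.40, 8),
]

def rice(reach: float, impact: float, confidence: float, effort: float) -> float:
    # RICE = (Reach x Impact x Confidence) / Effort
    return reach * impact * confidence / effort

ranked = sorted(features, key=lambda f: rice(*f[1:]), reverse=True)
for name, *dims in ranked:
    print(f"{name}: {rice(*dims):.0f}")
```

In this made-up backlog the onboarding tour (score ~1500) outranks bulk export (~760) despite lower impact, because its reach is large and its effort small. That is the kind of surprise Step 5 tells you to dig into.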
What AI-Assisted RICE Scoring Actually Changes
Teams that adopt this approach report three consistent improvements.
Faster prioritization cycles. The data gathering and initial scoring that used to take a full sprint planning session now takes a few hours of async prep. The team meeting focuses on discussing trade-offs rather than debating numbers.
More honest confidence scores. When AI surfaces evidence gaps, teams are more willing to admit uncertainty. This leads to better prioritization because low-confidence, high-impact features get the "gather more data" treatment instead of being ranked on optimism.
Better calibration over time. By recording AI-estimated scores alongside actual outcomes, you build a feedback loop. After three to four quarters, you learn where AI overestimates and where it underestimates for your specific product context.
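The feedback loop can be as simple as a log of estimated versus actual numbers per shipped feature. A minimal sketch with hypothetical data, computing one calibration signal (the mean actual-to-estimate ratio for reach):

```python
# Sketch: a minimal calibration log comparing estimated vs. actual reach.
# The entries are hypothetical examples.
import statistics

log = [
    # (feature, estimated reach/quarter, actual reach/quarter)
    ("Bulk export",     1900, 1400),
    ("Onboarding tour", 5000, 5600),
    ("Dark mode",       3000, 2100),
]

ratios = [actual / estimated for _, estimated, actual in log]
bias = statistics.mean(ratios)
print(f"Mean actual/estimate ratio: {bias:.2f}")  # below 1.0 = estimates run high
```

A ratio consistently below 1.0 tells you the AI-assisted reach estimates for your product run optimistic, and by roughly how much to discount the next round.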
Limitations and Honest Caveats
AI-assisted RICE scoring is not a silver bullet. Here is where it breaks down.
- Garbage in, garbage out. If your analytics data is unreliable or your feedback is not systematically captured, AI just adds a veneer of precision to bad data.
- Strategic bets do not score well. RICE inherently favors incremental improvements with clear data support. If you are making a strategic bet on a new market or capability, RICE scores will undervalue it. Use the framework for feature prioritization within a strategy, not to set the strategy itself.
- Team buy-in matters. If your engineering team does not trust the AI-generated estimates, the exercise creates friction instead of clarity. Introduce AI scoring as a starting point for discussion, not a replacement for team judgment.
For generating strategy documents that frame your prioritization decisions for stakeholders, try Forge. For a broader look at how AI is changing PM workflows, see the AI product management guide.