Concept testing that gets to real signal — not survey averages

Hundreds of real conversations with real customers, in their language, at the depth of a senior moderator. Test any concept — sentence, slide, image, or video — in days, not weeks.

[ the problem ]

Why most concept tests don't actually de-risk launches

Most concept tests ask respondents to rate something they haven't fully understood. The stimulus is often a static PDF designed for a product manager, not for a Tier-2 shopper encountering the category for the first time. The comprehension gap is invisible to a 1–5 scale.

A score of 3.8 on a concept nobody understood is noise dressed as research. Launch decisions made on that number aren't de-risked — they're randomised.

Survey averages also hide the variance that matters. A mean purchase-intent of 4.1 could be a tight cluster or a bimodal split between enthusiasts and rejecters. The only way to see it is to have the conversation.
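To make the point about hidden variance concrete, here is a minimal sketch using made-up ratings (the two samples below are illustrative assumptions, not Alchemic data). Both samples average exactly 4.1 on a 1–5 scale, yet one is a tight cluster and the other is a split between enthusiasts and rejecters.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical purchase-intent ratings on a 1-5 scale (illustrative data only).
tight_cluster = [4] * 180 + [5] * 20   # nearly everyone answers 4
bimodal_split = [5] * 155 + [1] * 45   # enthusiasts versus rejecters


def summarise(ratings):
    """Return the mean, population standard deviation, and per-score counts."""
    return {
        "mean": round(mean(ratings), 2),
        "std_dev": round(pstdev(ratings), 2),
        "counts": dict(sorted(Counter(ratings).items())),
    }


tight_summary = summarise(tight_cluster)
split_summary = summarise(bimodal_split)

print("tight cluster:", tight_summary)
print("bimodal split:", split_summary)
```

The means match, but the spread and the per-score counts differ sharply. A scorecard that reports only the mean cannot tell these two situations apart.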

Concept A — Brand survey
Appeal · 3.8/5 · Why?
Purchase intent · 4.1/5 · Why?
Brand fit · 3.6/5 · Why?

Scores recorded. No follow-up questions asked.

[ how alchemic solves it ]

Three things that work together.

01

Test with stimuli they actually understand

The comprehension gap is the most common failure mode in concept testing, and the least measured.

Alchemic turns your brief into an interactive concept card: structured, expandable, tappable. Key claims surface individually. The AI tracks engagement before any question is asked.

Text, slide, image, or video — the format is a parameter, not a constraint.

Before: static PDF
New Product Brief — Draft v3

A compact, low-sugar snack bar with 12g protein, targeting urban professionals aged 25–40. Positioned as a guilt-free mid-morning alternative to biscuits.

How appealing is this concept? (1–5)
1 2 3 4 5
No follow-up. No why.

After: Alchemic concept card
Concept A · Protein Snack Bar
12g protein. No guilt. Built for your 10am slump.
+ Tap to see more details
Respondent expanded spec block at 0:14 — engaged.
AI interviewer

You tapped the protein detail — what caught your attention there?

02

Probe in real conversation. Get the why behind every rating.

The AI listens for the hesitation, the partial agreement, the vague 'it's okay' hiding a real objection. When someone rates appeal at 3, it asks why 3 and not 4.

Adaptive in real time. Consistent across every respondent. In 57 languages with native probing — not a translation layer applying English research logic to other languages.

Survey averages hide the variance that matters. Every theme links to respondents, every respondent to the verbatim, every verbatim to the voice note moment.

See how Alchemic AI moderation works →
Interview · Respondent 12 · ● Live
Respondent

“It’s good I guess, packaging is fine.”

AI noticed: hedged answer — probing the hesitation
AI

You said “I guess” — what would make you say “definitely”?

“The claim about 12g protein sounds right for the gym crowd. For me it’s more about snacking without the guilt — and I’m not sure this one gives me that.”
 Positioning gap surfaced

03

Run it end-to-end — every stage, every channel, every language, with live insights

Concept testing isn't a single event. Early screening cheaply kills six concepts so you can develop two. Mid-stage fixes the messaging. Late validation confirms the bar before the production commit. Post-launch tells you why reality didn't match the prediction.

WhatsApp, voice call, web link — 57 languages natively moderated. No app install, no redirect friction. The channel meets the respondent where they are.

The report is live before fieldwork closes. Themes build as interviews come in. Theme reels surface the best evidence in a format a CMO can watch in three minutes.

Concept Test: Protein Snack Bar · ● Live · 143 / 200
All segments ▾
Comprehension gap on 'guilt-free' · 47 mentions
Protein claim is credible · 38 mentions
Price-premium concern · 31 mentions
Packaging dissonance · 24 mentions

Guilt-free means different things on different days — less sugar, no maida, small portion. You should be clearer.

Urban professional, F, 28 · Bangalore

[ vs. ]

How Alchemic compares

Survey-only concept tests give you scores without understanding. Focus groups give depth but not scale. Alchemic gives both — at the speed and sample size a real launch decision needs. If you need normative benchmarks from a syndicated database, the established survey vendors serve that well. If you need to understand why your concept scores what it scores, and what you need to change, that is where Alchemic works best.

Criterion | Survey-only concept test | Focus group | Alchemic
Interviews | Hundreds (fixed questions) | 6–12 respondents | 200+ adaptive interviews
Stimulus formats | PDF or image | Printed or screened | Text, slide, image, video, prototype
Comprehension check | Rarely included | Moderator-dependent | Built in: AI checks before rating
Turnaround | 1–2 weeks | 3–5 weeks | 5 days
Languages | Available, translated post-hoc | One per session | 57 natively
The why behind scores | Open-end field only | Partial, moderator-led | AI probes every rating
Sample reach | Panel-dependent | Metro and accessible only | Tier 1–3, WhatsApp, voice
Report format | Scorecard + open-end dump | Topline + notes | Live dashboard + theme reels + verbatim

Voice-first study or compliance-heavy audience? AI Phone Research →
WhatsApp-native respondents in low-bandwidth zones? WhatsApp Interviews →

[ use cases ]

Where concept testing with Alchemic works

FMCG packaging and claims

Test packaging designs, front-of-pack claims, and new product positioning across Tier 1–3 markets. Understand which claim resonates, which creates confusion, and which triggers competitive comparison before committing to print runs.

Tech feature prototypes

Show a Figma prototype inline during the interview. The AI probes what's intuitive, what creates hesitation, and whether the value proposition lands with the actual user. Qualitative signal before an engineering sprint.

DTC brand positioning

Test two or three positioning options with your exact customer profile. Which creates emotional connection? Which sounds like every other brand in the space? AI-moderated interviews surface the gap with verbatim evidence.

Retail and assortment decisions

Which SKU to launch first? Which pack size or format? Test with real shoppers in the relevant channel — kiranas, modern trade, QSR, or D2C — before making range decisions that are expensive to reverse.

Service propositions

Test a new insurance plan, fintech feature, or healthcare service package before building operations around it. Catch the objections before a call centre hears them at scale.

B2B and SaaS concepts

Test product positioning and feature names with decision-makers and end users separately — they almost never agree. The CIO and the analyst care about entirely different things. Alchemic runs both conversations, in parallel.

Trusted by brand and insights teams at

Razorpay
Urban Company
CaratLane
Unilever
Mars
Dr. Reddy's
Sleepwell
Blackberrys

[ FAQ ]

Concept testing, frequently asked.

What is concept testing?

Concept testing evaluates a new product idea, feature, positioning, or brand concept with target consumers before launch. It identifies whether the concept is understood, desirable, and likely to drive purchase — at a stage when changes are cheap. Traditional concept tests use surveys; Alchemic runs AI-moderated interviews that get the why behind every rating, not just the numbers.

How does AI concept testing work?

Alchemic's AI turns your brief into an interactive concept card respondents can explore. An AI moderator then interviews each respondent — probing comprehension, appeal, purchase intent, and the reasoning behind every answer. Hundreds of conversations run simultaneously, in the respondent's language, with real-time theme coding.

Monadic vs sequential monadic — which is better?

Monadic design (each respondent sees one concept) avoids order effects and is the gold standard for clean purchase-intent scores. Sequential monadic (each respondent sees two or more concepts one at a time, in randomised order) is efficient for ranking but risks carry-over bias. Use monadic when a score's absolute value matters; use sequential monadic when you need to rank concepts head-to-head. Alchemic supports both designs.
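As a rough illustration of the two designs, the sketch below assigns respondents to concepts both ways. The function names, respondent IDs, and concept labels are hypothetical and say nothing about how Alchemic implements assignment. For simplicity, the sequential version shows every concept to every respondent.

```python
import random


def assign_monadic(respondent_ids, concepts, seed=0):
    """Give each respondent exactly one concept, keeping cell sizes balanced."""
    rng = random.Random(seed)
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)  # randomise who lands in which cell
    return {
        rid: [concepts[i % len(concepts)]]
        for i, rid in enumerate(shuffled)
    }


def assign_sequential_monadic(respondent_ids, concepts, seed=0):
    """Give each respondent every concept, in an independently shuffled order."""
    rng = random.Random(seed)
    return {
        rid: rng.sample(concepts, k=len(concepts))
        for rid in respondent_ids
    }


# Hypothetical study: 12 respondents, 3 concepts.
respondents = [f"R{i:03d}" for i in range(1, 13)]
concepts = ["Concept A", "Concept B", "Concept C"]

monadic_plan = assign_monadic(respondents, concepts)
sequential_plan = assign_sequential_monadic(respondents, concepts)

print("monadic, R001 sees:", monadic_plan["R001"])
print("sequential monadic, R001 sees:", sequential_plan["R001"])
```

Shuffling the order per respondent spreads carry-over effects across concepts rather than removing them, which is why the monadic design remains the cleaner choice when absolute scores matter.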

How many respondents do I need for concept testing?

For a single-concept monadic test with a broad target, 100–200 interviews typically produce stable themes and reliable quant scores. Multi-concept studies need 200–400 total across cells. Studies requiring robust sub-group cuts need at least 50–80 respondents per sub-group. Alchemic fields 200+ interviews in 5 days as standard.
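One way to sanity-check these sample sizes is the standard margin-of-error formula for a proportion, such as the share of respondents giving a top-two-box purchase-intent rating. The sketch below assumes simple random sampling and a 95% confidence level (z = 1.96); real panels rarely meet those assumptions exactly, so treat the output as a rough guide only.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p observed in n interviews.

    p=0.5 is the worst case (widest interval); z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (50, 80, 200, 400):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, a sub-group of 50 carries a margin of roughly 14 percentage points, while a cell of 200 brings it down to roughly 7. This is why sub-group cuts need their own minimum sample rather than a share of the total.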

How fast can concept testing be done?

Brief on Day 1, builder live within 48 hours, full theme-coded report within 5 days — for a standard 200-interview single-market study. Multi-market or large-sample studies typically add 2–3 days. Fieldwork is never the bottleneck — 200 conversations happen in parallel, not in a queue.

What is the difference between concept testing and product testing?

Concept testing evaluates an idea before a product is built or launched. Product testing, typically an in-home usage test (IHUT), puts a physical product in consumers' hands to evaluate actual experience. Concept testing is earlier, cheaper, and faster; product testing validates post-production. Alchemic covers both.

How do you test a concept before launch?

Define your target audience and the decision you need to make. Prepare your stimulus — sentence, deck slide, mock packaging, image, or video. Field AI-moderated interviews: comprehension first, then appeal and intent, then open probing on the why. Analyse themes across 200+ conversations. Make the go/no-go call with evidence.

How do you test concepts for FMCG, SaaS, or DTC brands?

FMCG: test packaging and claims with household decision-makers in Tier 1–3 cities, in regional languages. SaaS: test feature names and pricing frames with decision-makers; Alchemic supports Figma prototype stimulus. DTC: test proposition and creative messaging with your exact customer profile. Same platform, different briefing and stimulus.