Concept testing that gets to real signal — not survey averages

Q: What is concept testing?

Concept testing is a research method that evaluates a new product idea, feature, positioning, or brand concept with target consumers before launch. It identifies whether the concept is understood, desirable, and likely to drive purchase — at a stage when changes are cheap. Traditional concept tests use surveys; Alchemic runs AI-moderated interviews that get the why behind every rating, not just the numbers.

Q: How many respondents do I need for concept testing?

For a single-concept test with a broad target, 100–200 interviews typically produce stable themes and reliable quantitative scores. Monadic multi-concept tests, or studies that require robust demographic cuts, need proportionally more, often 200–400. Alchemic fields 200+ interviews in 5 days as standard.

Q: How fast can concept testing be done?

Brief on Day 1, live within 48 hours, full report within 5 days — for a standard 200-interview study in one market. Multi-market or very large samples take longer. Alchemic’s AI runs hundreds of interviews simultaneously so fieldwork doesn’t create a bottleneck.

Q: How do you test concepts for FMCG, SaaS, or DTC brands?

FMCG: test packaging, positioning claims, and new SKU propositions with household decision-makers in Tier 1–3 cities, in regional languages. SaaS: test feature names, onboarding flows, and pricing frames with decision-makers; Alchemic supports Figma prototype stimulus. DTC: test proposition and creative messaging with your exact customer profile — segment by channel, cohort, or geography. Same platform, different briefing and stimulus.

Hundreds of real conversations with real customers, in their language, at the depth of a senior moderator. Test any concept — sentence, slide, image, or video — in days, not weeks.

Let's Talk!

[ the problem ]

Why most concept tests don't actually de-risk launches

Most concept tests ask respondents to rate something they haven't fully understood. The stimulus is often a static PDF designed for a product manager, not for a Tier-2 shopper encountering the category for the first time. The comprehension gap is invisible to a 1–5 scale.

A score of 3.8 on a concept nobody understood is noise dressed as research. Launch decisions made on that number aren't de-risked — they're randomised.

Survey averages also hide the variance that matters. A mean purchase-intent of 4.1 could be a tight cluster or a bimodal split between enthusiasts and rejecters. The only way to see it is to have the conversation.
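To make the point concrete, here is a small illustrative sketch in Python. The two rating lists are invented for illustration and are not Alchemic data: both average 4.1 on a 1–5 scale, yet one is a tight cluster and the other is a split between enthusiasts and rejecters.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical purchase-intent ratings from 100 respondents each (made-up data).
tight_cluster = [4] * 90 + [5] * 10   # almost everyone answers 4
bimodal_split = [5] * 70 + [2] * 30   # enthusiasts vs. rejecters


def summarize(ratings):
    """Return the mean, population standard deviation, and count per rating."""
    return {
        "mean": round(mean(ratings), 2),
        "stdev": round(pstdev(ratings), 2),
        "counts": dict(sorted(Counter(ratings).items())),
    }


tight_summary = summarize(tight_cluster)
split_summary = summarize(bimodal_split)

print("tight cluster:", tight_summary)
print("bimodal split:", split_summary)
```

Both summaries report the same mean; only the spread and the per-rating counts reveal the split. Even then, the numbers show that a split exists, not why it exists.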

Concept A — Brand survey

Appeal · 3.8/5 · Why?

Purchase intent · 4.1/5 · Why?

Brand fit · 3.6/5 · Why?

Scores recorded. No follow-up questions asked.

[ how alchemic solves it ]

Three things that work together.

Test with stimuli they actually understand

The comprehension gap is the most common failure mode in concept testing, and the least measured.

Alchemic turns your brief into an interactive concept card: structured, expandable, tappable. Key claims surface individually. The AI tracks engagement before any question is asked.

Text, slide, image, or video — the format is a parameter, not a constraint.

Before: static PDF

New Product Brief — Draft v3

A compact, low-sugar snack bar with 12g protein, targeting urban professionals aged 25–40. Positioned as a guilt-free mid-morning alternative to biscuits.

How appealing is this concept? (1–5)

1 2 3 4 5

No follow-up. No why.

After: Alchemic concept card

Concept A · Protein Snack Bar

12g protein. No guilt. Built for your 10am slump.

+ Tap to see more details

Respondent expanded spec block at 0:14 — engaged.

AI interviewer

You tapped the protein detail — what caught your attention there?

Probe in real conversation. Get the why behind every rating.

The AI listens for the hesitation, the partial agreement, the vague 'it's okay' hiding a real objection. When someone rates appeal at 3, it asks why 3 and not 4.

Adaptive in real time. Consistent across every respondent. In 57 languages with native probing — not a translation layer applying English research logic to other languages.

Every theme links to the respondents behind it, every respondent to their verbatim, and every verbatim to the exact moment in the voice note.

See how Alchemic AI moderation works →

Interview · Respondent 12 · ● Live

Respondent

“It’s good I guess, packaging is fine.”

AI noticed: hedged answer — probing the hesitation

You said “I guess” — what would make you say “definitely”?

“The claim about 12g protein sounds right for the gym crowd. For me it’s more about snacking without the guilt — and I’m not sure this one gives me that.”

→ Positioning gap surfaced

Run it end-to-end — every stage, every channel, every language, with live insights

Concept testing isn't a single event. Early screening cheaply kills six concepts so that two go into development. Mid-stage testing fixes the messaging. Late validation confirms the concept clears the bar before you commit to production. Post-launch research tells you why reality didn't match the prediction.

WhatsApp, voice call, web link — 57 languages natively moderated. No app install, no redirect friction. The channel meets the respondent where they are.

The report is live before fieldwork closes. Themes build as interviews come in. Theme reels surface the best evidence in a format a CMO can watch in three minutes.

Concept Test: Protein Snack Bar · ● Live · 143 / 200

Top themes · All segments

Comprehension gap on 'guilt-free' · 47 mentions

Protein claim is credible · 38 mentions

Price-premium concern · 31 mentions

Packaging dissonance · 24 mentions

“Guilt-free means different things on different days: less sugar, no maida, small portion. You should be clearer.”

Urban professional, F, 28 · Bangalore

[ vs. ]

How Alchemic compares

Survey-only concept tests give you scores without understanding. Focus groups give depth but not scale. Alchemic gives both — at the speed and sample size a real launch decision needs. If you need normative benchmarks from a syndicated database, the established survey vendors serve that well. If you need to understand why your concept scores what it scores, and what you need to change, that is where Alchemic works best.

	Survey-only concept test	Focus group	Alchemic
Interviews	Hundreds (fixed questions)	6–12 respondents	200+ adaptive interviews
Stimulus formats	PDF or image	Printed or screened	Text, slide, image, video, prototype
Comprehension check	Rarely included	Moderator-dependent	Built in — AI checks before rating
Turnaround	1–2 weeks	3–5 weeks	5 days
Languages	Available, translated post-hoc	One per session	57 natively
The why behind scores	Open-end field only	Partial — moderator-led	AI probes every rating
Sample reach	Panel-dependent	Metro and accessible only	Tier 1–3, WhatsApp, voice
Report format	Scorecard + open-end dump	Topline + notes	Live dashboard + theme reels + verbatim

Voice-first study or compliance-heavy audience? AI Phone Research →

WhatsApp-native respondents in low-bandwidth zones? WhatsApp Interviews →

[ use cases ]

Where concept testing with Alchemic works

FMCG packaging and claims

Test packaging designs, front-of-pack claims, and new product positioning across Tier 1–3 markets. Understand which claim resonates, which creates confusion, and which triggers competitive comparison before committing to print runs.

Tech feature prototypes

Show a Figma prototype inline during the interview. The AI probes what's intuitive, what creates hesitation, and whether the value proposition lands with the actual user. Qualitative signal before an engineering sprint.

DTC brand positioning

Test two or three positioning options with your exact customer profile. Which creates emotional connection? Which sounds like every other brand in the space? AI-moderated interviews surface the gap with verbatim evidence.

Retail and assortment decisions

Which SKU to launch first? Which pack size or format? Test with real shoppers in the relevant channel — kiranas, modern trade, QSR, or D2C — before making range decisions that are expensive to reverse.

Service propositions

Test a new insurance plan, fintech feature, or healthcare service package before building operations around it. Catch the objections before a call centre hears them at scale.

B2B and SaaS concepts

Test product positioning and feature names with decision-makers and end users separately — they almost never agree. The CIO and the analyst care about entirely different things. Alchemic runs both conversations, in parallel.


[ FAQ ]

Concept testing, frequently asked.

What is concept testing?

Concept testing evaluates a new product idea, feature, positioning, or brand concept with target consumers before launch. It identifies whether the concept is understood, desirable, and likely to drive purchase — at a stage when changes are cheap. Traditional concept tests use surveys; Alchemic runs AI-moderated interviews that get the why behind every rating, not just the numbers.

How does AI concept testing work?

Alchemic's AI turns your brief into an interactive concept card respondents can explore. An AI moderator then interviews each respondent — probing comprehension, appeal, purchase intent, and the reasoning behind every answer. Hundreds of conversations run simultaneously, in the respondent's language, with real-time theme coding.

Monadic vs sequential monadic — which is better?

Monadic design (each respondent sees one concept) avoids order effects and is the gold standard for clean purchase-intent scores. Sequential monadic (each respondent sees several or all concepts, one at a time, in randomized order) is efficient for ranking but risks carry-over bias. Use monadic when a score's absolute value matters, and sequential monadic when you need to rank concepts head-to-head. Alchemic supports both designs.
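The difference between the two designs is easiest to see as an assignment plan. The sketch below is a generic illustration in Python, not Alchemic's implementation; the function names, respondent IDs, and concept labels are all hypothetical, and the sequential version assumes every respondent sees every concept.

```python
import random


def assign_monadic(respondent_ids, concepts, seed=0):
    """Monadic: each respondent sees exactly one concept, cells kept balanced."""
    rng = random.Random(seed)
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled respondents into concept cells round-robin.
    return {rid: [concepts[i % len(concepts)]] for i, rid in enumerate(shuffled)}


def assign_sequential_monadic(respondent_ids, concepts, seed=0):
    """Sequential monadic: each respondent sees every concept in a random order."""
    rng = random.Random(seed)
    plan = {}
    for rid in respondent_ids:
        order = list(concepts)
        rng.shuffle(order)  # independent order per respondent spreads order effects
        plan[rid] = order
    return plan


respondents = [f"R{n:03d}" for n in range(1, 13)]   # hypothetical respondent IDs
concepts = ["Concept A", "Concept B", "Concept C"]  # hypothetical concept labels

monadic_plan = assign_monadic(respondents, concepts)
sequential_plan = assign_sequential_monadic(respondents, concepts)
```

In the monadic plan, each concept's score comes from a separate cell of respondents, which is why it needs a larger total sample. In the sequential plan, randomizing the order spreads carry-over effects across concepts but does not remove them.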

How many respondents do I need for concept testing?

For a single-concept monadic test with a broad target, 100–200 interviews typically produce stable themes and reliable quant scores. Multi-concept studies need 200–400 total across cells. Studies requiring robust sub-group cuts need at least 50–80 respondents per sub-group. Alchemic fields 200+ interviews in 5 days as standard.
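One way to reason about these sample sizes for the quantitative scores is the textbook margin of error for a proportion, such as the share of respondents giving a top-two-box rating. The sketch below is a rough guide under stated assumptions (simple random sampling, worst-case proportion of 0.5, 95% confidence); it is not how Alchemic sizes studies, and it says nothing about when qualitative themes stabilize.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (50, 80, 200, 400):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions the margin is roughly 14 points at n=50, 11 points at n=80, 7 points at n=200, and 5 points at n=400, which is why a sub-group of fewer than about 50 respondents supports only directional reads.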

How fast can concept testing be done?

Brief on Day 1, builder live within 48 hours, full theme-coded report within 5 days — for a standard 200-interview single-market study. Multi-market or large-sample studies typically add 2–3 days. Fieldwork is never the bottleneck — 200 conversations happen in parallel, not in a queue.

What is the difference between concept testing and product testing?

Concept testing evaluates an idea before a product is built or launched. Product testing, such as an in-home usage test (IHUT), puts a physical product in consumers' hands to evaluate the actual experience. Concept testing is earlier, cheaper, and faster; product testing validates the product once it exists. Alchemic covers both.

How do you test a concept before launch?

Define your target audience and the decision you need to make. Prepare your stimulus — sentence, deck slide, mock packaging, image, or video. Field AI-moderated interviews: comprehension first, then appeal and intent, then open probing on the why. Analyse themes across 200+ conversations. Make the go/no-go call with evidence.

How do you test concepts for FMCG, SaaS, or DTC brands?

FMCG: test packaging and claims with household decision-makers in Tier 1–3 cities, in regional languages. SaaS: test feature names and pricing frames with decision-makers; Alchemic supports Figma prototype stimulus. DTC: test proposition and creative messaging with your exact customer profile. Same platform, different briefing and stimulus.