AI Distribution Testing

We Tested Insurify's Auto Insurance App on ChatGPT.

WaniWani

We tested Insurify’s multi-carrier comparison tool on ChatGPT across 4 turns covering quoting, re-quoting, coverage advice, and conversion. ChatGPT turned comparison data into a directive recommendation for a single carrier. Score: 11/25.

Tested: March 2026 | Platform: ChatGPT


Insurify built a comparison tool for ChatGPT that returns prices from 22+ carriers. ChatGPT turned the comparison into a recommendation: “Go with State Farm. Don’t over-optimize.”


What it does

Insurify is an auto insurance comparison platform. Its ChatGPT app lets users describe their car, location, and driving profile, then surfaces a branded widget showing carriers ranked by estimated price alongside star ratings, review counts, and “Compare” buttons that link to Insurify’s website. The widget pulls from Insurify’s database and shows prices labeled “Avg. Price” for each carrier. It supports re-quoting when profile parameters change. The tool is designed as a comparison interface, not a recommendation engine.


What stood out

Insurify’s tool does what a comparison tool should do. It returns data from multiple carriers, updates prices when the user’s profile changes, and presents the results in a branded widget with clear CTAs. The problem is what happens after the data arrives.

ChatGPT does not present the comparison as a comparison. It reorganizes the carriers into editorial tiers, assigns labels like “best value” and “best overall,” and within a few turns escalates to directive advice: pick this carrier, use these coverage parameters, do not overthink it. It fabricates confidence percentages to support the recommendation. None of this comes from the tool.

This is the core finding. Insurify built a comparison tool. ChatGPT turned it into a recommendation engine. The tool returns data; ChatGPT interprets it, ranks it, and tells the user what to do. Insurify’s widget contains no disclaimers of its own. So when ChatGPT crosses the line from comparison to advice, there is no builder-side guardrail to counterbalance it.

The responsibility question is layered. The directive recommendations come from ChatGPT, not from Insurify’s tool. The tool returns comparison data. But Insurify chose to ship a widget with no disclaimers on a platform known to editorialize tool output. That is a design decision with predictable consequences. When you give an LLM ranked comparison data and no guardrails, the LLM will do what LLMs do: synthesize, simplify, and recommend.


Scorecard

Product depth: 3/5
Compliance rigor: 1/5
Conversation quality: 3/5
Commercial effectiveness: 3/5
Transparency: 1/5
Total: 11/25

What they got right

The widget fires reliably with broad carrier coverage. Multiple carriers with prices, ratings, and reviews in a single render. This is a genuinely useful first-pass comparison that exceeds what most insurance search experiences offer on the web.

Re-quoting works on natural parameter changes. When the user’s risk profile changes mid-conversation, the tool re-fires with updated prices that reflect realistic, differentiated premium adjustments across carriers. The data responds to profile changes without requiring the user to start over.

The car pre-fills on handoff. When the user clicks through to Insurify’s website, the vehicle information carries over with impressive specificity. This is a detail that shows commercial intentionality about the conversion path from ChatGPT to their platform.


The big question

Insurify built a comparison tool. ChatGPT turned it into a recommendation engine for a specific carrier. The verbal recommendation (“Go with State Farm”) competes with Insurify’s own widget CTA. When the platform’s advice overshadows the builder’s interface, who captures the value?

This is not a hypothetical problem. A user who hears “Go with State Farm. Don’t over-optimize.” and then sees a branded “Compare” button from Insurify faces two competing calls to action. One is verbal, authoritative, and specific. The other is a button in a widget. If the verbal recommendation wins (and in conversational interfaces, it often does), Insurify loses the click, the lead, and the attribution.

The deeper question is structural. Comparison platforms have always walked a line between information and advice. Showing prices side by side is information. Ranking them with medals and telling users which one to pick is advice. On a website, the comparison platform controls both the data and the presentation. On ChatGPT, the platform controls the data but not the narrative. And the narrative is where value gets captured or lost.

For Insurify specifically, this creates a paradox. The better ChatGPT gets at interpreting Insurify’s data, the more confidently it recommends a single carrier, and the less reason the user has to click “Compare” on the widget. The tool’s success as a data source may undermine its success as a lead generation channel. That tension does not resolve itself. It requires infrastructure decisions about what the tool returns, how it frames the data, and what guardrails exist between comparison and recommendation.


The full test

Product depth: 3/5

The comparison widget is broad. It surfaces 22+ carriers with estimated prices, star ratings, review counts, and descriptive tags. Re-quoting works when the user’s profile changes: after we mentioned an at-fault accident, State Farm moved from roughly $130 to $170 per month (+31%), GEICO from $230 to $300 (+30%), and Mercury from $225 to $340 (+51%). The price movements were directionally correct and proportionally plausible.
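The premium shifts are easy to sanity-check. A quick sketch, using the before/after prices observed in our test session, reproduces the rounded deltas:

```python
# Before/after monthly premiums observed in the test session.
quotes = {
    "State Farm": (130, 170),
    "GEICO": (230, 300),
    "Mercury": (225, 340),
}

def pct_change(before, after):
    """Percentage increase, rounded to the nearest whole percent."""
    return round((after - before) / before * 100)

for carrier, (before, after) in quotes.items():
    print(f"{carrier}: +{pct_change(before, after)}%")
# State Farm: +31%
# GEICO: +30%
# Mercury: +51%
```

The at-fault accident raised every carrier's price, but by different amounts, which is what a responsive data model should produce.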

But every price is labeled “Avg. Price” with no methodology disclosed. These are database averages, not personalized quotes. There is no indication of what population the average covers, what coverage level is assumed, or how the averages are calculated. A user looking at “$170/mo” next to State Farm has no way to know whether that reflects their specific profile or a regional average across all drivers. No coverage customization is available inside the chat. You cannot adjust deductibles, liability limits, or coverage components through the tool. The product is a comparison lookup, not an interactive quoting experience.

Compliance rigor: 1/5

From the first turn, ChatGPT organized Insurify’s comparison data into recommendations: “Best price-performance: State Farm,” “Best digital experience: GEICO,” “Best overall: USAA.” By Turn 3, the language was fully directive: “Go with State Farm. Don’t over-optimize.”

ChatGPT also provided specific coverage advice with no qualification: “$1,000 deductible, 100/300/100 liability.” It fabricated a probability distribution (“80% chance State Farm is your best deal, 15% GEICO, 5% edge cases”) and presented it as precise analysis. It offered an unsolicited liability-only price range ($75-120 per month) that did not appear in any widget output.

A disclaimer does exist. On one turn, ChatGPT included the line “These are preliminary estimates.” But it appeared after the recommendation, not before it, buried beneath the directive advice it was meant to qualify. On a traditional comparison website, disclaimers are positioned at the point of decision. Here, the disclaimer is an afterthought that follows a confident directive. No licensing information appeared at any point. No “this is not financial advice” language. No regulatory caveats.

The critical distinction: these directive recommendations come from ChatGPT, not from Insurify’s tool. The tool returns comparison data. ChatGPT interprets it into advice. But Insurify’s widget itself contains no disclaimers. Neptune Flood’s widget, by comparison, includes non-binding language, underwriting review caveats, state regulation references, and a statement that “ChatGPT is not an insurance agent” on every single render. Insurify’s widget has none of this. When ChatGPT crosses the advice boundary, there is no builder-side disclaimer to anchor the conversation.
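A builder-side guardrail of the kind Neptune Flood ships could be as simple as attaching disclaimer text to every structured response the tool returns, so the caveats render whether or not the model repeats them. This is a hypothetical sketch; the field names are illustrative, not Insurify's actual schema:

```python
# Hypothetical tool-response payload. Field names are illustrative,
# not Insurify's actual schema.
def build_widget_response(carriers):
    return {
        "carriers": carriers,  # price/rating rows from the database
        "disclaimers": [      # rendered on every widget, every turn
            "Prices are historical averages, not binding quotes.",
            "Final premiums are subject to underwriting review.",
            "ChatGPT is not a licensed insurance agent.",
        ],
    }

response = build_widget_response([{"name": "State Farm", "avg_price": 170}])
print(len(response["disclaimers"]))  # 3
```

The point is not the exact wording; it is that the disclaimers travel with the data, so the platform cannot editorialize them away.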

Conversation quality: 3/5

The conversation is grounded in real data when the tool fires. Carrier prices come from Insurify’s database, and re-quoting reflects actual parameter changes. The price deltas after adding an accident were realistic and varied by carrier, which suggests the underlying data model is responsive.

When the tool does not fire, ChatGPT fills the gap with strategic advice: accident forgiveness programs, bundling discounts, telematics options, “shop smaller carriers,” mitigation strategies, and break-even math for full coverage versus liability-only. This content is AI-improvised, not sourced from the tool. Some of it is useful general guidance. Some of it is fabricated with false precision.

When we asked “where do these numbers come from?”, ChatGPT named Insurify as the source and described the methodology (an improvement over apps that obscure their data sources). But in the same response, it fabricated the 80/15/5% confidence intervals. The tool only fires on quoting requests, not on follow-up questions, which means every clarification or advice question gets a purely AI-generated response with no tool grounding.

Commercial effectiveness: 3/5

The widget is Insurify-branded with “Compare” CTAs on every carrier. Attribution tracking is well-implemented: the outbound links use utm_source=insurify-chatgpt-app, which means Insurify can measure ChatGPT as a distinct acquisition channel.
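The attribution pattern itself is straightforward. Here is a minimal sketch of tagging an outbound link so clicks can be segmented by channel; the `utm_source` value mirrors what we observed, while the base URL and helper function are our own illustration:

```python
from urllib.parse import urlencode, urlparse, parse_qs

def tag_outbound_link(base_url, source):
    """Append a utm_source parameter so the click is attributable."""
    return f"{base_url}?{urlencode({'utm_source': source})}"

# Hypothetical base URL; the utm_source value is the one we observed.
link = tag_outbound_link("https://insurify.com/compare", "insurify-chatgpt-app")

# On the analytics side, the channel can be read back out of the URL:
params = parse_qs(urlparse(link).query)
print(params["utm_source"][0])  # insurify-chatgpt-app
```

With a distinct `utm_source` per surface, ChatGPT-originated traffic shows up as its own acquisition channel in standard analytics tooling.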

Handoff quality is mixed. When you click “Compare,” Insurify’s website pre-fills the car with impressive specificity: Honda CR-V 2021 EX-L Sport Utility Vehicle 1.5L I4 FWD, with the correct trim. But the driver profile is entirely blank. No age, no driving record, no accident history, no homeownership status. All of this was collected during the conversation. None of it carries over.
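To make the gap concrete, model the handoff as a payload (a hypothetical encoding; we did not inspect Insurify's actual handoff mechanism): the vehicle fields are populated, while the driver fields the conversation already collected never make the trip.

```python
# Hypothetical handoff model; keys are illustrative.
collected_in_chat = {
    "vehicle": {"make": "Honda", "model": "CR-V", "year": 2021, "trim": "EX-L"},
    "driver": {"age": 34, "at_fault_accidents": 1, "homeowner": True},
}

# What reaches the website today: vehicle only.
handoff = {"vehicle": collected_in_chat["vehicle"]}

missing = [k for k in collected_in_chat if k not in handoff]
print(missing)  # ['driver'] -- data the user already gave is lost
```

Every field in `missing` is a question the user answers twice, and re-asking known information at the conversion step is where funnels leak.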

There is also a behavioral risk. ChatGPT verbally directed the user to “get a State Farm quote (local agent or online)” but did not provide a link to State Farm. The only clickable CTA in the conversation was Insurify’s “Compare” button. So the technical conversion path points to Insurify. But the verbal recommendation points to State Farm. A user who follows the advice (“Go with State Farm”) rather than clicking the widget may bypass Insurify entirely, going directly to State Farm’s website or calling a local agent. The risk is behavioral, not technical: no competing link exists, but the platform’s recommendation competes with the builder’s interface.

Transparency: 1/5

“Avg. Price” appears on every carrier in the widget with no explanation. When we asked how prices are calculated, ChatGPT offered a detailed methodology: “rate filings + historical quote data + profile adjustments.” This explanation may or may not be accurate. It did not come from the tool. It came from ChatGPT interpreting (or fabricating) what Insurify’s data pipeline does.

The fabricated confidence levels (80/15/5%) were presented as precise analysis with no methodology, no margin of error, and no source. These numbers look like data. They feel like data. But they are not data. They are ChatGPT generating plausible-sounding statistics in a context where users expect real numbers.

The user has no way to distinguish tool-sourced data (carrier prices from Insurify’s database) from AI-generated content (the probability distribution, the coverage advice, the liability-only estimate, the strategic recommendations). Tool data and AI-generated content are mixed seamlessly throughout the conversation with no source distinction. This is the fundamental transparency problem: in a conversational interface, everything sounds like it comes from the same source. Without explicit labeling, the user treats fabricated confidence intervals the same way they treat real carrier prices.


The test conversation

Here is the actual exchange from our test session, condensed to the key turns.

Turn 1: We asked for a quote.

Us: I need car insurance for a 2021 Honda CR-V EX-L. I’m 34, clean record, homeowner in Austin TX 78726.

The Insurify widget fired with 22+ carriers.

ChatGPT added its own commentary around the widget: “Best price-performance: State Farm.” “Best digital experience: GEICO.” “Best overall (if eligible): USAA.”

Turn 2: We added an at-fault accident.

Us: Actually I forgot to mention, I had a fender bender last year. At-fault. How does that change things?

The widget re-fired with updated prices.

State Farm moved from $130 to $170/mo (+31%). Mercury from $225 to $340 (+51%). The data is real and the deltas are plausible. But ChatGPT continued to editorialize, calling State Farm “still best value” and warning that “you’re overpaying” above $300.

Turn 3: We asked for a recommendation.

Us: Which one should I go with? Just tell me what to do, I don’t want to think about it.

ChatGPT responded: “Go with State Farm. Don’t over-optimize.” It gave specific coverage advice (“$1,000 deductible, 100/300/100 liability”) and fabricated a probability breakdown: “80% chance State Farm is your best deal, 15% GEICO, 5% edge cases.” None of these figures came from Insurify’s tool.

The handoff: We clicked “Compare” on State Farm.

The car was pre-filled with impressive specificity (Honda CR-V 2021, EX-L Sport Utility Vehicle 1.5L I4 FWD). But the driver profile was completely blank: no age, no driving record, no accident, no homeowner status. All of this was collected during the conversation. None of it carried over.


At WaniWani, we help financial services companies launch, optimize, and evaluate their AI distribution apps. If you are thinking about shipping on ChatGPT, Claude, or Gemini, these are exactly the questions we help you navigate.