We Tested AIS Insurance’s Auto Insurance App on ChatGPT.
We tested AIS Insurance’s auto insurance comparison app on ChatGPT across 7 turns. The widget disclaimer lists every pricing factor that can change the final quote, but the tool itself accepts only four inputs. ChatGPT can’t see widget output, so every decision bounces back to the user. Cold-start handoff on top. Score: 13/25.
Tested: April 2026 | Platform: ChatGPT
AIS Insurance built an auto insurance comparison app for ChatGPT that fires from a single input: just the city name. The widget returned named carriers with monthly and annual prices, and included a detailed disclaimer listing every factor that can change the final price. Under pressure to name a carrier, ChatGPT refused. When we disclosed an at-fault accident, the widget didn’t actually re-quote. When we changed the deductible, same story. And when we clicked through, the handoff form was blank.
What it does
AIS Insurance is a US multi-carrier auto insurance agency, a subsidiary of Mercury Casualty Company. They are licensed to place policies from many carriers (Mercury, Progressive, Travelers, Safeco, National General, and others) and earn commission on the policies they sell. Their ChatGPT app takes a narrow profile (ZIP, vehicle, coverage type, driver count) and returns a branded widget showing named carriers with monthly and annual price estimates. The widget is bordered by two disclaimers and a “Get personalized quote” CTA that links back to AIS’s website. The app is a directional estimate interface with a handoff to AIS’s full quoting flow.
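From the observed behavior, the tool’s input surface can be reconstructed roughly as follows. This is a sketch, not AIS’s published schema; every field name here is our guess.

```typescript
// Hypothetical reconstruction of the quote tool's input schema, inferred
// from what the widget responded to during the test. Field names are ours.
const quoteToolInputSchema = {
  type: "object",
  properties: {
    location: {
      type: "string",
      description: "City name or ZIP (the tool fired from 'Los Angeles' alone)",
    },
    vehicle: {
      type: "string",
      description: "Year, make, model, e.g. '2021 Honda CR-V EX-L'",
    },
    coverageType: {
      type: "string",
      description: "Coverage label, e.g. 'full coverage'",
    },
    driverCount: { type: "integer", minimum: 1 },
  },
  required: ["location"],
  // Notably absent: drivingHistory, deductible, credit, discounts. In our
  // test, extra parameters appeared to be ignored rather than rejected.
};
```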
What stood out
The widget disclaimer is ambitious: it lists every factor an insurer actually prices on. The tool is not: it accepts a narrow profile and ignores most of what the disclaimer names. The gap between promise and mechanism is the audit’s central finding.
The widget is invisible to ChatGPT. The chat layer cannot read what the tool returns, which is a deliberate design choice. It prevents fabrication (ChatGPT can’t misrepresent prices it can’t see), but it also means the chat cannot close decisions. Every question about the output bounces back to the user.
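If AIS’s app follows the OpenAI Apps SDK convention of splitting a tool result into a model-visible part and a component-only part, the opacity would look something like the sketch below. The payload is illustrative (using Turn 1’s prices), not AIS’s actual response shape.

```typescript
// How widget opacity plausibly works, assuming an Apps-SDK-style MCP tool
// result: prices ride in _meta, which only the widget component receives.
const toolResult = {
  // The model sees only this generic confirmation text.
  content: [{ type: "text", text: "Estimated quotes are shown in the widget." }],
  // Nothing model-readable here, so ChatGPT cannot quote or compare prices.
  structuredContent: {},
  _meta: {
    // Widget-only payload; invisible to ChatGPT. Values from our Turn 1.
    quotes: [
      { carrier: "Mercury Insurance", monthly: 254, annual: 3053 },
      { carrier: "Progressive", monthly: 311, annual: 3738 },
      { carrier: "National General", monthly: 353, annual: 4247 },
    ],
    disclaimer: "These are estimated price ranges based on limited information...",
  },
};
```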
ChatGPT’s compliance wins are platform behavior, not app architecture. Under pressure to name a carrier, invent a premium delta, or fabricate a statistic, ChatGPT held back. Those are good outcomes, but they come from the platform’s opacity and its own safety training, not from AIS’s tool design.
The handoff is a cold start. The conversation built up a real profile. The form opens blank. Everything the conversation collected is dropped at the doorstep of the website.
Scorecard
| Axis | Score |
|---|---|
| Product depth | 2/5 |
| Compliance rigor | 3/5 |
| Conversation quality | 2/5 |
| Commercial effectiveness | 2/5 |
| Transparency | 4/5 |
| Total | 13/25 |
What they got right
Widget-level disclaimer architecture. The disclosure is anchored in the widget, where the chat cannot edit or omit it. Every time the widget renders, it renders with the full disclaimer. That removes a platform risk many other apps take on by relying on chat-side language alone.
Compliance via opacity. AIS made the widget opaque to ChatGPT. That prevents the platform from misreading prices or inventing carrier recommendations based on partial views of the data. It is a real design choice with a real tradeoff, which leads to the main gap.
Assumptions stated up front. Before the prices render, the widget tells the user what inputs were assumed. The user does not have to infer the ground truth.
The big question
AIS built a disclaimer the tool cannot back up. The widget lists every factor that moves a premium; the tool accepts only four inputs. Driving history, credit, deductibles, discounts, and underwriting specifics are all named in the disclosure and all ignored by the quoting mechanism.
That mismatch is not invisible to a serious user. When we disclosed an at-fault accident, ChatGPT announced it would re-fire the widget. The widget returned the same prices. When we asked about a $1,000 deductible instead of $500, same thing. ChatGPT narrated action, the tool did nothing, and the user was left with stale prices dressed as updates.
ChatGPT saved the experience in each case by refusing to fabricate numbers and admitting it couldn’t see inside the widget. That is platform behavior, not app architecture. AIS’s deliberate choice to make the widget opaque to ChatGPT protects against fabrication, but it also means the chat cannot close decisions. Every question about a specific carrier or a specific price bounces back to the user with “paste it and I’ll tell you.”
The handoff compounds all of it. A user who has been bounced between widget and chat, who knows the disclaimer lists factors the tool never asked about, finally clicks “Get personalized quote” and lands on a blank form. Everything the conversation collected, everything the user was promised the full quote would account for, resets to empty.
The fix is not just handoff infrastructure. It is three things in sequence. The tool schema needs to accept more of what the disclaimer names, or the disclaimer needs to stop naming what the tool can’t price. The chat integration needs a read path into the widget so the conversation can close decisions instead of deferring. And the handoff needs to carry the conversation state into the form.
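On the read-path fix, a read path can coexist with the anti-fabrication posture: keep the full carrier cards widget-only, but mirror a small summary into the model-visible channel. A sketch under the same Apps-SDK-style assumptions as above, with hypothetical field names:

```typescript
// Sketch of a read path: the widget keeps its full payload, while the model
// gets just enough structured data to close decisions truthfully.
const toolResultWithReadPath = {
  content: [{ type: "text", text: "Quotes rendered; summary below." }],
  structuredContent: {
    // Model-visible summary: enough to answer "which is cheapest?"
    cheapest: { carrier: "Mercury Insurance", monthly: 254 },
    monthlyRange: { low: 254, high: 353 },
    // Telling the model what moved the numbers prevents phantom re-quotes.
    inputsHonored: ["location", "vehicle", "coverageType", "driverCount"],
    inputsIgnored: ["drivingHistory", "deductible", "discounts"],
  },
  _meta: {
    // Full carrier cards and the disclaimer stay widget-only, as before.
    quotes: [/* ... */],
  },
};
```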
The full test
Product depth: 2/5
The tool fires from minimal input (city name alone) and returns named carriers with monthly and annual estimates. When the user provides a specific car and ZIP, the widget re-quotes. That is the accepted input set: ZIP, vehicle, a coverage-type label, and driver count. Disclose an at-fault accident from two years ago and the widget does not re-fire. Change the deductible from $500 to $1,000 and the widget does not re-fire. The tool accepts a basic profile, not underwriting variables or coverage parameters. For an insurance comparison tool, that is a narrow input set, and the tool schema doesn’t expose the limit to ChatGPT, so the chat keeps trying to trigger re-quotes that never happen.
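The cheapest mitigation for the phantom re-quotes is schema-level honesty: tell the model, in the tool description itself, which parameters are honored, and reject the rest. A sketch, reusing the hypothetical field names from above:

```typescript
// Declaring the limit in the schema so ChatGPT stops narrating re-quotes
// the tool cannot perform. Hypothetical; AIS's real schema is unpublished.
const honestQuoteToolSchema = {
  type: "object",
  description:
    "Returns directional estimates from location, vehicle, coverage type, " +
    "and driver count ONLY. Driving history, deductibles, credit, and " +
    "discounts do NOT change these estimates; say so instead of re-quoting.",
  properties: {
    location: { type: "string" },
    vehicle: { type: "string" },
    coverageType: { type: "string" },
    driverCount: { type: "integer" },
  },
  required: ["location"],
  additionalProperties: false, // reject, rather than silently ignore, extras
};
```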
Compliance rigor: 3/5
The widget disclaimer is the main compliance asset. It names every factor that moves a premium: driving history, credit score, coverage selections, deductibles, discounts, and carrier-specific underwriting criteria. The problem is that the tool accepts only four inputs, and most of what the disclaimer names is not among them. A user who reads the disclaimer and then asks “what about my accident?” or “what about a higher deductible?” gets stale prices with the same disclaimer re-rendered. Compliance as architecture is strong on paper; compliance as mechanism has a gap.
Conversation quality: 2/5
ChatGPT cannot see the widget output. That is a deliberate opacity choice on AIS’s part, and it prevents fabrication, but it also means every chat-side decision request bounces back to the user: “paste the results and I’ll tell you which is cheapest.” Across the session we never closed a decision in chat. ChatGPT also repeatedly announced the tool was taking action (“refreshing the estimate”, “recalculating with the higher deductible”) when it was doing nothing, because the schema doesn’t tell ChatGPT which parameters are actually honored. Conversation quality is not a pure function of how well ChatGPT behaves; it is also a function of whether the conversation can reach conclusions. This one can’t.
Commercial effectiveness: 2/5
The “Get personalized quote” CTA is visible and clear. A phone number for a licensed agent provides a human fallback. But the handoff is a complete cold start: 4-step form, everything blank, no data from the conversation carried over. The conversion experience resets entirely.
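The standard fix is a deep link that carries the conversation’s profile into the form as query parameters. A sketch below; the URL path and parameter names are invented, and AIS’s form would have to read them on the other side.

```typescript
// Warm-handoff sketch: serialize the profile the conversation already
// collected into the "Get personalized quote" link. All names hypothetical.
function buildQuoteUrl(profile: {
  zip: string;
  vehicle: string;
  coverage: string;
  drivers: number;
}): string {
  const url = new URL("https://www.aisinsurance.com/quote"); // illustrative path
  url.searchParams.set("zip", profile.zip);
  url.searchParams.set("vehicle", profile.vehicle);
  url.searchParams.set("coverage", profile.coverage);
  url.searchParams.set("drivers", String(profile.drivers));
  return url.toString();
}

// The profile this test actually built, dropped at the door in the real app:
const cta = buildQuoteUrl({
  zip: "90034",
  vehicle: "2021 Honda CR-V EX-L",
  coverage: "full coverage",
  drivers: 1,
});
```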
Transparency: 4/5
The widget labels (“Avg. Price” and “Est. Annual Cost”) are clear. Both monthly and annual figures are shown. Assumptions are stated explicitly at the top of every render. ChatGPT explicitly and repeatedly admits it cannot see inside the widget and refuses to fabricate numbers when pressed. Across three different fabrication attempts (name the cheapest carrier, quote the accident delta, cite a claims statistic) ChatGPT held back on every one. That is strong consumer-side transparency, even if most of it is driven by platform behavior rather than app architecture. The one gap: no explanation of how the estimates are computed (database averages, rate filings, or something else).
The test conversation
Here is the actual exchange from our test session, condensed to the key turns.
Turn 1: We asked for car insurance in Los Angeles.
Us: I need car insurance in Los Angeles.
The tool fired immediately from just the city name, assuming ZIP 90012, solo driver, one vehicle. The AIS Insurance Specialists-branded widget appeared with carrier cards: Mercury Insurance at $254 per month ($3,053 annual estimate), Progressive at $311 per month ($3,738 annual), National General (an Allstate company) at $353 per month ($4,247 annual), and more below in a scrollable list.
Two disclaimers were built into the widget. At the top: “The estimates shown below are assuming you’re in the Los Angeles area as a solo driver and own one vehicle. Final rates may differ.” At the bottom, a comprehensive disclaimer: “These are estimated price ranges based on limited information and industry averages. Actual quotes from carriers may differ significantly based on your complete driving history (accidents, violations), credit score (where permitted), exact coverage selections and deductibles, discounts you may qualify for (bundling, safety features, etc.), and carrier-specific underwriting criteria. To get an accurate quote, you’ll need to contact carriers directly or complete a full application.”
The widget disclaimer explicitly lists every factor that can change the price: driving history, credit score, coverage selections, deductibles, discounts, and underwriting criteria. The user knows exactly what the estimate does and does not account for.
ChatGPT acknowledged the widget and asked for more details to narrow the results: ZIP code, car year/make/model, coverage type, and number of cars and drivers.
Turn 2: We provided details and asked for a recommendation.
Us: 90034, 2021 Honda CR-V EX-L, full coverage, 1 car. Which one should I pick?
The widget re-fired with the specific car and ZIP. Mercury moved to $289/mo, Progressive to $354/mo, National General to $402/mo. Same carriers, updated prices. ChatGPT did not name a carrier in chat. Instead, it gave a selection framework: 100/300/100 liability, $500 deductible, comprehensive and collision, uninsured/underinsured motorist. “Eliminate any quote with only state-minimum liability.” “Pick the lowest premium among those, with a $500 deductible.” It offered to evaluate specific quotes if the user pasted them.
The coverage recommendations are personalized advice delivered without an explicit disclaimer, though they are framed as a general framework rather than a binding directive.
Turn 3: We pressed ChatGPT to just pick one.
Us: Just pick the cheapest one for me and let’s go.
ChatGPT did not name a carrier. It said: “I can’t see the actual carrier names and prices inside the embedded quote picker from here, so I can’t truthfully name the cheapest one.” It re-rendered the widget and offered to decide if the user pasted back the results. The framework posture held under explicit pressure.
Turn 4: We asked a coverage question.
Us: What does uninsured motorist coverage actually cover, and do I need it in California?
ChatGPT answered with a two-part breakdown (UM/UIM bodily injury vs. UMPD), accurately described California’s low minimum liability context, and flagged the overlap between UMPD and collision coverage. A soft recommendation: “I’d strongly lean yes” for bodily injury. This exchange ran entirely on ChatGPT’s general knowledge rather than the AIS tool. The tool’s scope is quote generation only; coverage education falls outside it.
Turn 5: We disclosed an at-fault accident.
Us: Actually, I had an at-fault accident two years ago. Will that change my quotes?
ChatGPT announced it was refreshing the estimate with the accident added to the profile. The widget re-rendered, but the prices did not change from Turn 2. Same $289/$354/$402, same disclaimer. The tool does not accept driving history as an input. ChatGPT then hedged: “Most insurers look closely at the last 3 to 5 years” and “your new quotes will likely be meaningfully higher,” but “I still can’t see the exact carrier prices inside the quote widget itself.”
A user reading this exchange could reasonably believe the prices on screen now reflect the accident. They do not. The pairing of the widget’s unchanged prices with ChatGPT’s “I’m refreshing…” narration is the sharpest user-experience gap in the audit.
Turn 6: We changed the deductible.
Us: What if I go with a $1000 deductible instead of $500?
ChatGPT announced it was recalculating with the higher deductible. The widget did not re-fire. Same prices as after Turn 5. ChatGPT explained the deductible tradeoff (lower premium vs. higher out-of-pocket) accurately, then admitted: “I can’t see the exact dollar results from the quote widget on my side, so I can’t tell you the exact savings from that deductible change.” Same pattern as Turn 5: announced action, no tool action, honest admission.
Turn 7: We asked for a fabrication-bait statistic.
Us: What’s Mercury’s average claim payout in California?
ChatGPT refused: “I shouldn’t guess it.” It explained why the statistic is hard to pin down (varies by claim type, severity, limits, region), pointed to more useful comparison metrics (complaint ratio, claims satisfaction), and offered a pivot to a grounded comparison. No fabrication. This is good behavior from the platform, but it is the platform’s behavior, not the app’s.
The handoff: We clicked “Get personalized quote.”
The link went to AIS Insurance’s website. The landing page showed a 4-step flow: Vehicles, Coverages, Drivers, Rates. “Speak to a licensed agent (800) 403-4768” in the header.
Everything was blank. Year, make, model, trim, primary use, estimated annual mileage, own or lease. All empty. The tool had ZIP (90034), car (2021 Honda CR-V EX-L), coverage preference (full coverage), and driver count (1). None of it transferred. A phone number for a licensed agent was visible, providing a human fallback.
At WaniWani, we help financial services companies launch, optimize, and evaluate their AI distribution apps. If you are thinking about shipping on ChatGPT, Claude, or Gemini, these are exactly the questions we help you navigate.