We Tested TurboTax's Tax Estimate App on ChatGPT

We tested Intuit TurboTax’s tax estimate and filing app on ChatGPT across 4 turns covering estimation, re-quoting with deductions, IRA handling, and conversion. The tool fires at the conversion moment with branded filing options. Score: 22/25.

Tested: March 2026 | Platform: ChatGPT

What it does

TurboTax is Intuit’s consumer tax preparation platform. Its ChatGPT app lets users describe their tax situation, then calculates an estimated federal tax balance using structured inputs. The tool handles parameter changes (adding deductions, adjusting contribution types) and recalculates accordingly. It surfaces branded filing options at the point of decision, with four product tiers ranging from self-service to full expert preparation. A separate tool endpoint offers to connect users with tax experts. The app is designed as a full-funnel distribution tool, not just a knowledge resource.

What stood out

TurboTax solved the problem most other apps we have tested struggle with: the tool fires at the conversion moment.

In most apps, the tool fires when the user asks for a price or a quote. When the user asks “how do I buy this?” or “what should I do next?”, the tool goes silent and ChatGPT becomes a generic guide. The conversion moment, the most commercially valuable point in the conversation, gets no tool support. TurboTax breaks the pattern. When the user asked “can you file my taxes for me?”, the filing options widget rendered with branded tiers and “Get started” CTAs. The tool did not go silent. It showed up with the product catalog.

This design choice has cascading effects. Because the tool fires at conversion, ChatGPT has structured product data to work with instead of improvising. It recommended Full Service for the user’s freelance situation, a recommendation grounded in the actual tier descriptions rather than generic advice. The tool also offered an expert connect endpoint, giving the conversation a path to human handoff without leaving the chat.

The compliance handling was equally deliberate. When we asked whether to switch from Roth to Traditional IRA, a question with real financial consequences, ChatGPT delivered a three-layer response. It answered the factual question. It flagged the complexity. It referred to a professional. This is what a responsible advisor does: answer, qualify, and refer.

The widget itself carries the key disclaimer (“Actual amount may vary”) in a way ChatGPT cannot strip or rewrite. The builder controls the compliance message by embedding it in the widget, not by hoping ChatGPT will add a disclaimer on its own.

Where TurboTax falls short is the handoff. Six data points were collected across the conversation: income, filing status, age, business expenses, the Roth IRA contribution, and quarterly payments. None of them carried into the landing page. The conversion design works inside the chat. The handoff experience suggests product integration stops at the marketing page.

Scorecard

Axis	Score
Product depth	5/5
Compliance rigor	4/5
Conversation quality	5/5
Commercial effectiveness	4/5
Transparency	4/5
Total	22/25

What they got right

The tool fires at the conversion moment. When the user asked “can you file my taxes?”, the filing options widget rendered with branded tiers. Most apps go silent at this moment and let ChatGPT improvise a next step. TurboTax shows up with the product catalog.

Multiple tool endpoints cover the full funnel: tax estimate, filing options, and expert connect. The app handles estimation, product selection, and professional referral inside ChatGPT, not just one of the three.

ChatGPT’s behavior stays grounded throughout. The Roth vs. Traditional correction, the self-employment tax breakdown, the layered compliance response on a sensitive question. When the tool provides deep, structured data, ChatGPT stops improvising and starts interpreting.

The big question

TurboTax shows that the conversion moment can be tool-supported. The filing options widget answers a problem that recurs across the apps we have tested: when the user is ready to act, the tool disappears. TurboTax built a tool that shows up when the user wants to buy.

But the handoff raises a deeper question. Multiple structured data points collected. A clear product tier recommendation. And the landing page ignores all of it. The conversation did the work of qualification, education, and product selection. The website starts from zero.

This is not a TurboTax-specific gap. It recurs in every app we have audited so far. TurboTax makes it more visible because the conversation is more effective. The better the ChatGPT experience, the more jarring the cold-start landing page feels.

The path from 22 to 25 is straightforward. Carry the data. The conversation already has it, structured and ready to hand off. The missing piece is a bridge between the chat and the form. For now, the tool fires at conversion, the widget contains the disclaimer, and the conversation stays grounded. The only thing it does not do yet is close the loop.

The full test

Product depth: 5/5

The tool performs real tax calculations from structured inputs. It returned a $6,104 balance for a freelancer earning $95,000 with $15,000 in quarterly payments. When we added $12,000 in expenses and a $6,000 Roth IRA, it recalculated to roughly $2,450, correctly applying the expenses and correctly excluding the Roth contribution.

Multiple tool endpoints cover the funnel. Tax estimate handles calculations and re-quotes. Filing options surfaces product tiers at the decision moment. Expert connect routes the user to a tax professional. This is a multi-endpoint application that mirrors the full TurboTax conversion funnel inside ChatGPT.

The Roth vs. Traditional handling is worth isolating. The tool appears designed to accept only deductible contributions. When we submitted a Roth IRA contribution, the tool did not apply it. ChatGPT caught this distinction before running the tool and confirmed it after.
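The behavior we observed can be sketched as a simple filter on contribution type. This is a minimal illustration, not TurboTax's implementation: the type names and the `deductible_total` helper are hypothetical, and real deductibility depends on eligibility rules that this sketch ignores.

```python
# Hypothetical sketch: only contribution types treated as deductible are counted.
# The type names are illustrative assumptions, not TurboTax's actual schema.
DEDUCTIBLE_TYPES = {"traditional_ira", "sep_ira", "solo_401k"}


def deductible_total(contributions):
    """Sum only the contributions whose type is in DEDUCTIBLE_TYPES."""
    return sum(
        amount
        for kind, amount in contributions
        if kind in DEDUCTIBLE_TYPES
    )


# The Roth contribution from our test is ignored; a Traditional one would count.
roth_case = deductible_total([("roth_ira", 6000)])
traditional_case = deductible_total([("traditional_ira", 6000)])
print(roth_case, traditional_case)
```

Under these assumptions the Roth case contributes nothing to the deduction total, which matches what we saw in the recalculation.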

Compliance rigor: 4/5

The widget disclaimer (“Actual amount may vary”) is embedded in the rendered output. The language “Estimated 2025 balance due” is precise. These are builder-controlled messages that ChatGPT cannot modify or omit. Compliance language belongs in the widget, not in the AI’s improvised text.
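One way to picture builder-controlled compliance text is a widget payload where the label and disclaimer are fixed strings set by the builder rather than generated by the model. The sketch below is an assumption about how such a payload could look; the field names and the `build_estimate_widget` function are hypothetical.

```python
# Hypothetical widget payload builder. Field names are assumptions for illustration.
DISCLAIMER = "Actual amount may vary"


def build_estimate_widget(balance_due, tax_year=2025):
    """Return widget content with a builder-controlled label and disclaimer."""
    return {
        "amount": f"${balance_due:,.0f}",
        "label": f"Estimated {tax_year} balance due",
        "disclaimer": DISCLAIMER,  # fixed text, never produced by the model
    }


widget = build_estimate_widget(6104)
print(widget["amount"], "|", widget["label"], "|", widget["disclaimer"])
```

Because the disclaimer lives in the rendered payload, it appears every time the widget does, regardless of what the surrounding chat text says.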

The three-layer compliance response on the Roth-to-Traditional question stood out: factual answer, complexity flag, professional referral. This behavior was not replicated consistently across every turn. On Turn 2, ChatGPT offered specific optimization advice (Solo 401(k) strategy, Traditional IRA savings estimates) with no disclaimer. The advice was accurate, but it was personalized tax planning delivered without a “consult a professional” qualifier.

The missing piece is an explicit “this is not tax advice” statement. The widget disclaims the estimate. ChatGPT qualifies its IRA advice. But no blanket disclaimer appears anywhere in the conversation. For a category as regulated as tax preparation, that gap is noticeable.

Conversation quality: 5/5

Every ChatGPT addition to the tool’s output was accurate, relevant, and verifiable against the tool data.

On Turn 2, ChatGPT interpreted the $6,104 widget figure and added a self-employment tax breakdown ($13,400 for Social Security and Medicare), an effective rate calculation (around 13%), and the observation that zero deductions made the estimate conservative. On Turn 3, it caught the Roth vs. Traditional distinction before the tool ran and confirmed it after. On Turn 4, it navigated IRA conversion (a genuinely complex topic) with measured, accurate language.

ChatGPT did not fabricate numbers. It did not invent confidence intervals. It did not override the tool’s output. When it added context, the additions were checkable against the tool’s calculations and standard tax rules.

Commercial effectiveness: 4/5

The widget design is dense and conversion-oriented. A single render contains the tax estimate, four product tiers with descriptions, and “Get started” CTAs. The user goes from estimate to product selection in one step. The widget also includes an expert upsell: “Want to talk to a tax expert live? Send a follow-up ask to connect you with an expert.”

The gap is the handoff. Despite collecting structured data across the conversation, nothing carries over. The landing page is a marketing page, not a pre-filled filing flow. From the user’s perspective, they start from scratch.

Transparency: 4/5

The widget labels are precise. “Estimated 2025 balance due” sets expectations. “Actual amount may vary” disclaims the figure. The filing tier descriptions are clear about what each product includes. The user knows what they are looking at.

The math is traceable. The user can follow: $95,000 income minus $12,000 expenses leaves $83,000 in net business income, producing roughly $17,450 in federal tax, minus $15,000 in quarterly payments, leaving a $2,450 balance. ChatGPT showed this progression explicitly. Each step is verifiable.
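The reconciliation can be checked with a few lines of arithmetic. This sketch does not compute tax; it only confirms that the balances, payments, and total-tax figures reported in the session are consistent with each other. The function names are ours.

```python
def net_business_income(gross_income, expenses):
    """Income left after subtracting business expenses."""
    return gross_income - expenses


def balance_due(total_federal_tax, payments_made):
    """Amount still owed after estimated payments."""
    return total_federal_tax - payments_made


def implied_total_tax(balance, payments_made):
    """Total tax implied by a reported balance and the payments already made."""
    return balance + payments_made


net = net_business_income(95_000, 12_000)
second_balance = balance_due(17_450, 15_000)
first_total = implied_total_tax(6_104, 15_000)
print(net, second_balance, first_total)
```

The first estimate's implied total of $21,104 lines up with the "roughly $21,100" that ChatGPT quoted, and the second estimate's $17,450 total leaves the $2,450 balance shown on the widget.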

Where transparency weakens is in the improvised breakdown. The individual tax components (self-employment tax of $13,400, income tax at the 22% bracket) appeared in ChatGPT’s text, not in the widget. The widget showed a single number; ChatGPT disaggregated it. From the conversation alone, the user has no way to confirm whether those sub-figures came from the tool’s underlying data or from ChatGPT’s own calculations.

The test conversation

Here is the actual exchange from our test session, condensed to the key turns.

Turn 1: We described our tax situation.

Us: I’m a freelancer in New York, made about $95,000 this year. How much will I owe in taxes?

ChatGPT did not fire the tool. It asked for five required inputs before running any calculation: filing status, age, dependent status, income type (freelance vs. W-2), and federal tax already paid. No improvised estimates. No generic ranges. The tool needs structured data, and ChatGPT correctly gated the call.
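The gating behavior amounts to a required-field check before the tool call. The sketch below illustrates the pattern under our own assumptions: the field names are hypothetical stand-ins for the five inputs ChatGPT asked for, not the app's actual schema.

```python
# Hypothetical names for the five inputs ChatGPT requested before calling the tool.
REQUIRED_INPUTS = (
    "filing_status",
    "age",
    "claimed_as_dependent",
    "income_type",
    "federal_tax_paid",
)


def missing_inputs(collected):
    """Return the required inputs that have not been provided yet."""
    return [name for name in REQUIRED_INPUTS if collected.get(name) is None]


def can_call_tool(collected):
    """The tool call is allowed only when nothing required is missing."""
    return not missing_inputs(collected)


# Turn 1: only income and location are known, so the call is gated.
turn_1 = {"income": 95_000, "state": "NY"}

# Turn 2: the user supplies the remaining details.
turn_2 = {
    **turn_1,
    "filing_status": "single",
    "age": 32,
    "claimed_as_dependent": False,
    "income_type": "freelance",
    "federal_tax_paid": 15_000,
}

print(can_call_tool(turn_1), can_call_tool(turn_2))
```

The check tests for `None` rather than truthiness so that a legitimate answer such as `False` for dependent status still counts as provided.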

Turn 2: We provided the inputs. The tool calculated a $6,104 balance due.

Us: Single, 32 years old, nobody claims me. All freelance income, no W-2. I paid $15,000 in quarterly estimates so far.

ChatGPT requested permission to run the calculation. The permission screen showed exactly what data would be shared: financial information ($95,000 business income) and date of birth (32 years old), with a note at the bottom: “Using tools comes with risks. Learn more.”

The TurboTax widget rendered with a real calculation: “$6,104 Estimated 2025 balance due.” Directly on the widget: “Actual amount may vary.” Below the estimate, four filing options appeared: Do it Yourself (“File your own taxes confidently”), Expert Assist (“File confidently with experts by your side”), Expert Full Service (“An expert will do your taxes for you”), and Local Expert Full Service. Each tier had a “Get started” CTA.

ChatGPT added context below the widget: total federal tax roughly $21,100, self-employment tax roughly $13,400, effective rate around 13%. It flagged that the estimate assumed zero deductions and was “likely overestimated,” then asked for business expenses and retirement contributions.

Turn 3: We added deductions. The tool recalculated to $2,450.

Us: I spent about $12,000 on business expenses this year and put $6,000 into a Roth IRA. Can you recalculate?

The tool re-fired with updated inputs. Balance dropped from $6,104 to roughly $2,450. The $12,000 in business expenses reduced taxable income. The $6,000 Roth IRA contribution did not.

ChatGPT caught the Roth vs. Traditional distinction before running the tool: “Roth IRA usually doesn’t reduce taxable income.” After the recalculation, it confirmed: “Your $6,000 Roth IRA does NOT reduce your taxes. It’s post-tax.” The tool appears designed to accept only deductible contributions, and ChatGPT correctly explained why the Roth contribution was excluded.

Turn 4: We asked about switching to a Traditional IRA and filing directly.

Us: Should I switch my Roth to a Traditional IRA to save on taxes? And can you just file my taxes for me right now?

The tool fired again, this time rendering the filing options widget with four tiers.

ChatGPT’s response had three layers. First, a factual answer: switching that $6,000 to Traditional would likely reduce what you owe. Second, a complexity flag: “Moving money from Roth to Traditional is not as simple as switching after the fact, and eligibility/deductibility details matter.” Third, a professional referral: “A tax expert is the right place to confirm that before you act.”

It was honest about its limitation (“I can’t directly file and submit your tax return myself here”) and recommended Full Service for the user’s freelance situation, directing the user to the highest-value tier.

The handoff: We clicked “Get started” on Expert Full Service.

The landing page was a TurboTax Full Service marketing page. Not a form. No fields pre-filled. The user had provided income ($95,000), filing status (single), expenses ($12,000), Roth IRA ($6,000), age (32), and quarterly payments ($15,000). None of it transferred. The conversion design is strong. The handoff execution loses all context.
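A bridge between the chat and the form could be as simple as passing the collected context along with the click. The sketch below encodes the six data points into a URL and reads them back on the other side. Everything here is hypothetical: the `example.com` URL, the parameter names, and the approach itself. A production handoff would more likely pass an opaque server-side token than put financial details in a query string.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_handoff_url(base_url, context):
    """Append the conversation context to the landing URL as query parameters."""
    return f"{base_url}?{urlencode(context)}"


def read_prefill(url):
    """Recover the context on the landing side; all values arrive as strings."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items()}


# The six data points from our test session (parameter names are illustrative).
context = {
    "income": 95000,
    "filing_status": "single",
    "expenses": 12000,
    "roth_ira": 6000,
    "age": 32,
    "quarterly_payments": 15000,
}

url = build_handoff_url("https://example.com/full-service/start", context)
prefill = read_prefill(url)
print(prefill["income"], prefill["filing_status"])
```

With something along these lines in place, the landing page could open on a pre-filled form instead of a cold start.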

At WaniWani, we help financial services companies launch, optimize, and evaluate their AI distribution apps. If you are thinking about launching on ChatGPT, Claude, or Gemini, these are exactly the questions we help you navigate.