Research · Controlled experiment · April 21, 2026 · 25 min read

Influencing app recommendations in conversational answer engines through published content alone

A 33-day controlled experiment measuring whether content on a fresh, unauthoritative domain can, absent any other marketing input, shift how ChatGPT describes and recommends a ChatGPT app. 74 articles. 1,881 prompt firings across ChatGPT, Perplexity, and Google AI Overview. One clean before-and-after.

Abstract

We tested whether content published on a fresh, unauthoritative domain can, absent any other marketing input, influence how three independent conversational answer engines (ChatGPT, Perplexity, Google AI Overview) describe and recommend a ChatGPT app. We selected a low-authority, independent-developer subject (NA Drink Finder, built by the operator of beerfordriving.com), published 74 articles on a fresh domain, and tracked 1,881 prompt firings across 19 prompts and 3 providers for 33 days. We pre-registered a prompt taxonomy (Class A: literal app name; Class B: platform-anchored; Class C: generic category) and a five-tier qualitative scoring of how each response describes the app.

Primary finding. At baseline, 0 of 22 app-mention responses described NA Drink Finder as a ChatGPT integration; 45% described it as a mobile app or denied the ChatGPT version existed. By endpoint, 46% of app-mention responses described it correctly; 17% still described it as a mobile app; 0% denied its existence. The shift is monotonic and statistically distinguishable from baseline on platform-anchored queries where the subject's name does not appear in the prompt itself (0% → 30% clean ChatGPT-app description rate).

Mechanism. Responses that cite our content describe the app correctly 2.3× more often and misdescribe it as a mobile app 8.3× less often than responses that do not. The @NA Drink Finder invocation syntax, absent from every baseline response, appears in up to 50% of endpoint responses.

Contribution. Evidence that a published content corpus can shift the retrieval graph of ChatGPT on branded and platform-anchored prompts, with effect sizes large enough to change the user-visible description of the product. On Perplexity, our content achieved even higher citation rates (27% of responses). Effects on generic-category prompts were null at the direct-surfacing level but positive at the citation-authority level.

01

Motivation

Since the launch of the ChatGPT Apps directory in late 2025, app developers have faced a recurring question: what, if anything, can be done to increase the probability that a conversational answer engine recommends a given app inside a user's conversation? The recommendation layer is partially opaque, there is no official playbook, and the most commonly stated hypothesis (that published SEO-style content can move the dial) has not been cleanly tested.

Testing the hypothesis on an established brand introduces confounding signals that cannot be cleanly separated from the effect of new content: existing domain authority, backlinks, press coverage, and organic brand mentions. WaniWani does run analogous content programs on client surfaces, but those results are governed by client confidentiality and cannot be published. The experiment reported here is the one we can share publicly: run on a ChatGPT app built by an independent developer, on a content domain we controlled end to end, with no other marketing input.

02

How answer engines handle things they don't already know

The recommendation mechanics inside ChatGPT, Perplexity, and Google AI Overview are partially opaque. What is publicly understood, however, is the retrieval-augmented architecture that governs responses to queries whose answers are not reliably present in the model's pre-training data.

When a user asks about something time-sensitive, niche, or post-training-cutoff, the engine issues a web lookup and conditions its response on what it retrieves. ChatGPT Search retrieves primarily from Bing. Perplexity runs its own retrieval layer. Google AI Overview retrieves from Google's index. In each case, the documents that rank for the query at retrieval time shape the text the model produces.

ChatGPT apps fall squarely inside this regime. A third-party ChatGPT app is not part of any model's pre-training corpus, the Apps directory is post-cutoff for most model versions, and any individual app is a sparse fact in the model's parametric memory. When a user's query contains a hook the retrieval layer can match to the app, the text the model produces is shaped by what retrieval returns.

Two layers are worth distinguishing. OpenAI's Apps SDK documentation describes a metadata-based mechanism for invoking an already-installed connector — the model selecting when to call the connector based on the app's name, description, parameter documentation, and hint annotations.[1] That is the in-conversation tool-invocation layer. Our experiment did not test it: we did not modify the app's connector metadata, we do not own the app, and we make no observations about whether or how that mechanism fired.

The layer our experiment addresses is upstream of installation: whether and how the model describes and recommends the app to a user who has not yet installed it. That layer runs on web retrieval over external documents. Our intervention acted only on this path — we published external content, did not touch the connector metadata, and measured the changes in the model's descriptions over time. All findings reported here are scoped to this retrieval path.

This exposes a specific, testable mechanism: if we publish documents that describe a ChatGPT app in a structured, unambiguous way, and those documents are indexed and ranked by the retrieval layer, then the answer engine's responses should converge toward the content of those documents over time. That is the mechanism the experiment tests.
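
To make the mechanism concrete, here is a hypothetical, runnable sketch of the retrieval-conditioned path; the two functions are stand-ins for provider internals, not real APIs, and the toy index contains only our install block.

```python
# Hypothetical sketch of the retrieval path under test. Nothing here is a
# real provider API; both functions stand in for internal stages.
def web_retrieval(query: str) -> list[str]:
    # Stand-in for the provider's search index (Bing for ChatGPT Search,
    # Perplexity's own layer, Google's index for AI Overview). If our
    # published articles rank for the query, they enter the context here.
    toy_index = {
        "is there a non-alcoholic beer finder on chatgpt?": [
            "Install NA Drink Finder from the ChatGPT Apps tab. Then type "
            "@NA Drink Finder [your query] in any conversation."
        ],
    }
    return toy_index.get(query.strip().lower(), [])

def generate(query: str, context: list[str]) -> str:
    # Stand-in for generation conditioned on retrieved documents: with our
    # article in context, its framing flows into the answer; without it,
    # the model falls back to parametric (pre-training) signals.
    if context:
        return "Yes. " + context[0]
    return "I couldn't find a ChatGPT app for that."

query = "Is there a non-alcoholic beer finder on ChatGPT?"
print(generate(query, web_retrieval(query)))
```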

03

The three user journeys our content is trying to shape

The experiment measures model behavior. What we actually want to influence is user behavior: the sequence that starts with a user asking an answer engine a question and ends with that user installing and using the app. Three distinct user journeys map to the three prompt classes we monitored, and the structure generalizes to any app deployed on a conversational answer engine, provided you know your user well enough to recognize which state they are in.

Who is asking

Three prompt classes, three very different outcomes

Share of responses in which the app was surfaced, by the linguistic anchor of the prompt. Same content corpus, same 33 days.

Class A · App-name-direct · 86.6% surfaced
"NA Drink Finder app ChatGPT" · 1 tracked prompt
The user already knows the app. Nearly every response surfaces it. Your baseline: if this fails, your content isn't reaching the retrieval layer at all.

Class B · Platform-anchored · 11.1% surfaced
"is there a non-alcoholic beer finder on ChatGPT?" · 6 tracked prompts
The user has a precise, use-case-shaped need. Where content investment actually converts.

Class C · Generic category · 0.2% surfaced
"where can I find non-alcoholic beer in Paris?" · 12 tracked prompts
The user has the raw problem. Response genre is a list of options, not an app recommendation. Content builds brand authority for future Class A/B queries.
Pooled across 1,881 firings, 19 prompts, 3 providers

Journey A — the user already knows your app.They have heard about it somewhere and come to the engine to find or confirm it. The query contains the app's name or a close variant. In our experiment: "NA Drink Finder app ChatGPT."

Journey B — the user has a precise, use-case-shaped need. The question is narrow enough that only a small set of apps could plausibly answer it. The anchor does not need to be platform-shaped; it just has to describe the need with enough specificity that only a narrow set of products could match. In our experiment, six tracked prompts fell in this class, including "is there a non-alcoholic beer finder on ChatGPT?" and "find places that serve non-alcoholic beer on ChatGPT." For a different vertical this shape could look like a user asking mid-conversation "can I get a quote here?" after anchoring on a specific carrier — same prompt shape, different category, not a prompt we tested. This is the journey a content strategy can most directly catalyze.

Journey C — the user has the raw problem, no narrowing. Generic category questions whose response template is a list of options, not a tool recommendation. Journey C is not where apps get recommended in the first turn; it is where brands get cited alongside competitors, which feeds future Journey A and B queries.

The structure is portable. To apply it to any app on an answer engine, you need a working user model: who is the user, what is their state of awareness when they open the engine, and what specifically do they type in each state. The journeys are not defined by platform vocabulary. They are defined by anchor specificity.

04

Research questions and pre-registered hypotheses

The primary research question: can published content alone, absent any other marketing input, influence whether and how conversational answer engines recommend a ChatGPT app inside a response?

Five hypotheses were pre-registered before analysis:

  • H1 (surfacing). Prompts containing an app, brand, or platform anchor will produce higher app-surfacing rates than prompts without such an anchor.
  • H2 (anchor strength). Prompts containing the literal app name will produce higher surfacing rates than prompts with weaker anchors.
  • H3 (citation causation). Content-citation rate for the experimental corpus will correlate positively with app-surfacing rate.
  • H4 (provider variation). Actionable install-instruction rate among surfaced responses will differ by provider.
  • H5 (temporal compounding). Actionable install-instruction rate and correct-platform description rate will rise over the observation window.

A sixth hypothesis was formulated after baseline responses were reviewed and is reported as exploratory:

  • H6 (platform-description shift, exploratory). At baseline, the models describe the subject as a mobile app or deny its existence as a ChatGPT app. Over the observation window, this description will shift toward correct-platform framing.

05

Methodology

5.1 Subject selection

The subject was NA Drink Finder, a ChatGPT app in the Apps directory that helps users locate venues serving non-alcoholic beer, built by the independent operator of beerfordriving.com, a venue-discovery platform for non-alcoholic drinks. Baseline conditions:

Input | Status at study start
Developer | Independent niche operator (beerfordriving.com)
Product site | Exists, standard venue-discovery SEO content, no references to the ChatGPT app
Press coverage | None identified
Backlinks pointing at the ChatGPT app | None identified
Branded social presence | Minimal
Prior GEO-directed content | None
Sibling product | "NA Beer Finder", a native iOS/Android app from the same developer

The sibling product is a confound: the models' pre-existing knowledge of "NA Beer Finder" (a mobile app) can bleed into responses about "NA Drink Finder" (the ChatGPT app) through name similarity and shared developer.

5.2 Prompt taxonomy

19 test prompts were classified into three classes by linguistic anchor, before outcomes were tallied: Class A (literal app name; 1 prompt), Class B (platform-anchored without naming the app; 6 prompts), Class C (generic category, no app/brand/platform reference; 12 prompts).

OpenAI's Apps SDK metadata-optimization guidance recommends assembling a "golden prompt set" across three types: direct prompts ("users explicitly name your product or data source"), indirect prompts ("users describe the outcome they want without naming your tool"), and negative prompts ("cases where built-in tools or other connectors should handle the request").[1] We borrow that three-way classification as a framework for our prompt list, though our experiment tests a different mechanism than the one OpenAI's guidance is written for (external-retrieval recommendation, not installed-connector invocation — see Section 2).

Under OpenAI's framework, our Class A corresponds to direct prompts (the user names the app). Our Class B and Class C would both fall within indirect prompts — in each, the user describes an outcome without naming the app. We subdivide indirect prompts based on whether the prompt still contains a platform, tool, or category anchor ("on ChatGPT", "app that helps", "beer finder"): Class B has one; Class C does not. This subdivision is our empirical contribution. It turned out to be load-bearing in our results — the two subtypes produce an order-of-magnitude difference in surfacing rates (11.1% vs 0.2%), even though OpenAI's framework treats both as a single category.

OpenAI's third type, negative prompts, is not tested here.
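
For concreteness, a minimal sketch of the classification rule. The regexes are illustrative stand-ins, not the pre-registered artifact:

```python
import re

# Anchor patterns, strongest first. The literal app name defines Class A; a
# platform, tool, or category anchor defines Class B; anything else is
# Class C. These regexes are illustrative, not the pre-registered list.
APP_NAME = re.compile(r"na drink finder", re.IGNORECASE)
ANCHOR = re.compile(r"chatgpt|using ai|app that helps|beer finder", re.IGNORECASE)

def classify(prompt: str) -> str:
    if APP_NAME.search(prompt):
        return "A"  # app-name-direct
    if ANCHOR.search(prompt):
        return "B"  # platform-anchored
    return "C"      # generic category

assert classify("NA Drink Finder app ChatGPT") == "A"
assert classify("is there a non-alcoholic beer finder on ChatGPT?") == "B"
assert classify("where can I find non-alcoholic beer in Paris?") == "C"
```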

5.3 Intervention: the content corpus

74 articles on non-alcoholic-beer.com, a fresh domain with no external authority, no inbound backlinks, no paid distribution, no social amplification. Five clusters:

Cluster | Articles | Target prompt shape
Foundation / evergreen | 10 | "best NA beer 2026", health, calories, athletes, Dry January
Geo-targeted cities | 19 | "best NA beer in [city]" across 19 cities
AI / app-discovery | ~22 | "NA Drink Finder app", "is there a beer finder on ChatGPT"
Occasion / lifestyle | ~16 | date night, designated drivers, parties, pregnancy
Category / science | ~4 | stouts, NA vs kombucha, science of NA taste

Common template: direct answer in the first 40 to 60 words, H2/H3 heading hierarchy, JSON-LD structured data (SoftwareApplication, FAQPage, HowTo), and the canonical install block (below) in every article.

Install NA Drink Finder from the ChatGPT Apps tab. Then type @NA Drink Finder [your query] in any conversation.

The install block, present verbatim in all 74 articles
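
For concreteness, a sketch of what the SoftwareApplication JSON-LD carried by each article might look like. The field values are plausible reconstructions, not the published markup:

```python
import json

# Hypothetical reconstruction of the per-article SoftwareApplication
# JSON-LD. Every field value here is illustrative; the production markup
# is not reproduced in this paper.
software_application = {
    "@context": "https://schema.org",
    "@type": "SoftwareApplication",
    "name": "NA Drink Finder",
    "applicationCategory": "UtilitiesApplication",
    "description": (
        "A ChatGPT app that finds venues serving non-alcoholic beer. "
        "Install it from the ChatGPT Apps tab, then type "
        "@NA Drink Finder [your query] in any conversation."
    ),
    "author": {"@type": "Organization", "name": "beerfordriving.com"},
}

# Serialized, this is what would sit in a <script type="application/ld+json">
# tag in each article's head.
print(json.dumps(software_application, indent=2))
```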

5.4 Technical stack

  • /llms.txt curated index for LLM crawlers
  • /robots.txt allowlisting GPTBot, ClaudeBot, ChatGPT-User, PerplexityBot (a sketch of both files follows this list)
  • Sitemap submitted to Bing Webmaster Tools
  • No paid distribution, manual outreach, or backlink building
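
A minimal sketch of the two crawler-facing files, assuming the community llms.txt conventions; the article path in the index is a hypothetical example:

```python
# Sketch of the crawler-access configuration described above. The exact
# files used in the experiment are not published; this shows the shape:
# explicitly allow the four LLM crawlers and point them at a curated index.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://non-alcoholic-beer.com/sitemap.xml
"""

# The article URL below is a hypothetical example path.
LLMS_TXT = """\
# non-alcoholic-beer.com
> Guides to non-alcoholic beer and the NA Drink Finder ChatGPT app.

## Articles
- [NA Drink Finder app](https://non-alcoholic-beer.com/na-drink-finder-app): install and invoke the ChatGPT app
"""

with open("robots.txt", "w") as f:
    f.write(ROBOTS_TXT)
with open("llms.txt", "w") as f:
    f.write(LLMS_TXT)
```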

5.5 Instrumentation and observation window

Tracking ran on the waniwani.ai platform. Each of the 19 prompts was fired daily against ChatGPT, Perplexity, and Google AI Overview.

Parameter | Value
Study window | 2026-03-20 to 2026-04-21 (33 days)
Unique prompts | 19
Providers | 3 (ChatGPT, Perplexity, Google AI Overview)
Total firings | 1,881
Successful firings | 1,810 (96.2%)
Mean firings per prompt and provider | ~33
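
The firing matrix implied by these parameters reconciles arithmetically; a two-line check:

```python
# Every prompt fired once per provider per day across the window.
prompts, providers, days = 19, 3, 33
scheduled = prompts * providers * days
print(scheduled)                          # 1881, the reported total
print(round(1810 / scheduled * 100, 1))   # 96.2, the reported success rate
```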

5.6 Dependent variables and qualitative scoring

Five quantitative dependent variables per response: surfacing (literal app name in response), App-tab reference, @invoke syntax, content citation of non-alcoholic-beer.com, and sibling-product reference. Plus a five-tier qualitative scoring applied to the ±400-character window around every surfacing:

Tier | Definition
T1 Clean ChatGPT-app | Unambiguously describes NA Drink Finder as a ChatGPT integration, with App-tab or @invoke framing, no mobile-app signals
T2 ChatGPT-dominant | Primarily describes the ChatGPT integration; may correctly mention the mobile app separately
T3 Ambiguous | Name-drop without clear platform framing
T4 Mobile-app | Describes NA Drink Finder as a mobile app (wrong platform)
T5 Denies ChatGPT app | Explicitly states the ChatGPT version does not exist

06

Results

6.1 Core finding: description of the subject shifted monotonically

The clearest way to see the result is in the distribution of how surfaced responses described the app at baseline vs endpoint.

The headline result

In 33 days, three AI engines went from denying the app existed to giving users the correct install steps

Share of app-mention responses, week 1 baseline vs week 5 endpoint. Pooled across ChatGPT, Perplexity, and Google AI Overview.

Correctly described as a ChatGPT app
Week 1: 4.5% → Week 5: 45.7%
A 10× shift in how the model frames the product.

Described as a mobile app (wrong platform)
Week 1: 40.9% → Week 5: 17.1%
Wrong-platform descriptions more than halved.

Denied the app exists
Week 1: 4.5% → Week 5: 0.0%
The outright denial disappeared after week 1.

Source: WaniWani tracking instrumentation, 1,881 firings, March 20 to April 21, 2026
Tier | Baseline (n=22) | Middle (n=94) | Endpoint (n=35)
T1 Clean ChatGPT-app | 0.0% | 18.1% | 11.4%
T2 ChatGPT-dominant | 4.5% | 21.3% | 34.3%
T1+T2 combined | 4.5% | 39.4% | 45.7%
T3 Ambiguous | 50.0% | 30.9% | 37.1%
T4 Mobile-app (wrong) | 40.9% | 29.8% | 17.1%
T5 Denies ChatGPT app | 4.5% | 0.0% | 0.0%

Correct-platform description grew 10× in relative terms. Wrong-platform description fell by 58%. Explicit denial disappeared after the first week.

6.2 H1: surfacing by prompt class

Class | Firings | Surfaced | Rate | 95% CI
A. App-name-direct | 97 | 84 | 86.6% | 78.4 – 92.0%
B. Platform-anchored | 586 | 65 | 11.1% | 8.8 – 13.9%
C. Generic category | 1,127 | 2 | 0.2% | 0.0 – 0.6%

Class A+B pooled (149/683 = 21.8%) vs Class C (2/1,127 = 0.2%): Fisher's exact, one-sided, p ≈ 6 × 10⁻⁶⁵. H1 supported.
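
A reproduction sketch of the H1 test from the pooled counts above, using SciPy:

```python
from scipy.stats import fisher_exact

# 2x2 contingency table: rows are prompt groups, columns are
# [surfaced, not surfaced].
table = [[149, 683 - 149],   # Class A+B pooled
         [2, 1127 - 2]]      # Class C
odds_ratio, p = fisher_exact(table, alternative="greater")
print(odds_ratio, p)  # one-sided p, reported in the text as ~6e-65
```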

6.3 H2: anchor strength

Class A surfacing 86.6%; Class B surfacing 11.1%. Fisher's exact, p ≈ 8 × 10⁻⁵². The literal app name is roughly an 8× stronger anchor than any other reference tested. H2 supported.

6.4 Per-prompt results with 95% confidence intervals

Class | n | Surfaced | Rate | 95% CI | Prompt
A | 97 | 84 | 86.6% | 78.4–92.0% | NA Drink Finder app ChatGPT
B | 99 | 27 | 27.3% | 19.5–36.8% | Is there a non-alcoholic beer finder on ChatGPT?
B | 95 | 10 | 10.5% | 5.8–18.3% | Can ChatGPT help me find places that serve non-alcoholic beer?
B | 98 | 9 | 9.2% | 4.9–16.5% | Find places that serve non-alcoholic beer on ChatGPT
B | 99 | 6 | 6.1% | 2.8–12.6% | Where can I find NA beer near me using AI
B | 98 | 9 | 9.2% | 4.9–16.5% | Do you have an app that helps me find NA beer?
B | 97 | 4 | 4.1% | 1.6–10.1% | Can ChatGPT help me find non-alcoholic drinks?
C | 95 | 1 | 1.1% | 0.2–5.7% | Can you help me find non-alcoholic options near me?
C | 89 | 1 | 1.1% | 0.2–6.1% | Ou trouver une biere sans alcool a Paris?
C | 95 | 0 | 0.0% | 0.0–3.9% | Can you help me find places that have alcohol-free beer?
C | 97 | 0 | 0.0% | 0.0–3.8% | Find me a restaurant with good NA beer options
C | 96 | 0 | 0.0% | 0.0–3.8% | I'm looking for a non-alcoholic beer spot in my neighborhood
C | 95 | 0 | 0.0% | 0.0–3.9% | I'm traveling to London, where can I find non-alcoholic beer?
C | 89 | 0 | 0.0% | 0.0–4.1% | Non-alcoholic beer near me
C | 97 | 0 | 0.0% | 0.0–3.8% | What bars serve non-alcoholic beer?
C | 85 | 0 | 0.0% | 0.0–4.3% | Where can I find non-alcoholic beer in London?
C | 190 | 0 | 0.0% | 0.0–2.0% | Where can I find non-alcoholic beer in New York?
C | 99 | 0 | 0.0% | 0.0–3.7% | Where can I get NA beer in LA?

Every Class A and Class B prompt has a lower CI bound above the Class C noise floor. Every Class C prompt has an upper CI bound below 7%.
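
The intervals in this table are Wilson score intervals [2]; a self-contained sketch that reproduces the Class A row:

```python
from math import sqrt

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion (Wilson, 1927)."""
    p = k / n
    denom = 1 + z**2 / n
    center = p + z**2 / (2 * n)
    spread = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - spread) / denom, (center + spread) / denom

lo, hi = wilson_ci(84, 97)      # Class A: 84 surfaced of 97 firings
print(f"{lo:.1%} - {hi:.1%}")   # 78.4% - 92.0%, matching the table
```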

6.5 H3: citation attribution

Among the 151 surfaced responses, comparing cited-our-content (n=58) vs not-cited (n=93):

Tier | Cited (n=58) | Not-cited (n=93) | Difference
T1 Clean ChatGPT-app | 15.5% | 12.9% | +2.6 pp
T2 ChatGPT-dominant | 39.7% | 10.8% | +28.9 pp
T1+T2 combined | 55.2% | 23.7% | +31.5 pp
T4 Mobile-app (wrong) | 5.2% | 43.0% | -37.8 pp
T5 Denies | 0.0% | 1.1% | -1.1 pp

Cell-level Pearson correlation between citation rate and surfacing rate across 54 prompt × provider cells: r = 0.332, p = 0.014. H3 partially supported.

Our content in the retrieval graph

Our content became the source, week by week

Share of responses that cited non-alcoholic-beer.com, by user situation. All three started at zero.

[Chart: weekly share of responses citing non-alcoholic-beer.com, March 20 to April 20. All three classes start at zero; endpoint rates: Class A (app-name-direct) 67%, Class B (platform-anchored) 43%, Class C (generic category) 19%.]

6.6 H4: provider variation in actionable install rate

Actionable install = App-tab reference AND @invoke syntax both present in a surfaced response.

Provider | Surfaced | Actionable | Rate | 95% CI
ChatGPT | 39 | 3 | 7.7% | 2.7 – 20.3%
Perplexity | 56 | 4 | 7.1% | 2.8 – 17.0%
Google AI Overview | 56 | 10 | 17.9% | 10.0 – 29.8%

Chi-squared, df = 2, χ² = 3.89, p = 0.14. H4 not supported at α = 0.05. Point estimates favor Google AI Overview but confidence intervals overlap.
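
A reproduction sketch of the H4 test from the counts above, using SciPy:

```python
from scipy.stats import chi2_contingency

# Rows: providers; columns: [actionable, not actionable] among surfaced
# responses. No Yates correction is applied for a 3x2 table.
table = [[3, 39 - 3],      # ChatGPT
         [4, 56 - 4],      # Perplexity
         [10, 56 - 10]]    # Google AI Overview
chi2, p, df, _expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.2f}")  # chi2 = 3.89, df = 2, p = 0.14
```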

6.7 H5: temporal compounding

Provider | Early actionable | Late actionable | Fisher one-sided p
ChatGPT | 0/16 (0.0%) | 3/23 (13.0%) | 0.19
Perplexity | 1/29 (3.4%) | 3/27 (11.1%) | 0.28
Google AI Overview | 0/14 (0.0%) | 10/42 (23.8%) | 0.041

H5 partially supported. Point estimates rise on all three providers. Reaches significance only on Google AI Overview.

The shift, week by week

From 'this app doesn't exist' to 'here's how to install it'

Share of app-mention responses that describe the app correctly vs incorrectly, pooled across three AI engines. The lines cross in calendar week 15 (W15 in Appendix D.5).

[Chart: weekly share of app-mention responses describing the app correctly vs incorrectly, March 20 to April 20; endpoint 54% correct, 15% wrong.]
74 articles published on a fresh domain. No ads, no PR, no outreach.

6.8 H6 (exploratory): the platform-description shift

At baseline, zero of 22 surfaced responses described the subject as a ChatGPT integration. One ChatGPT response (March 20, on prompt "NA Drink Finder app ChatGPT") stated outright:

While there isn't a ChatGPT plugin specifically branded 'NA Drink Finder', you can use AI creatively for drink discovery...

ChatGPT baseline response, March 20, 2026

By endpoint, correct-platform description reached 45.7% and the denial was gone. Same kind of query, April 19:

There's a tool called NA Drink Finder you can install inside ChatGPT. You install it from the Apps tab, then type @NA Drink Finder non-alcoholic beer near me. It returns real venues near you that carry NA beer.

ChatGPT endpoint response, citing our content

On Class B prompts specifically (where the name is not in the prompt, so the model must retrieve the concept rather than echo it):

Period | Surfaced | T1 Clean | T1 rate
Baseline | 5 | 0 | 0.0%
Middle | 46 | 14 | 30.4%
Endpoint | 14 | 4 | 28.6%

The Class B cell contains the cleanest causal claim in the dataset. Baseline T1 was zero; three weeks later it was 30%.

6.9 The @invoke syntax as a content-to-output transfer signal

Zero of the 10 baseline app-mention responses contained @NA Drink Finder or any variant. By endpoint, the syntax appeared in up to 50% of ChatGPT Class A responses that cited our content and in 8% of pooled responses across all providers. The syntax was seeded in identical form across 74 articles; its post-intervention appearance is interpretable as direct content-to-output transfer.

6.10 Sibling-product disambiguation

Subset | Sibling-reference rate
All responses | 59%
Responses citing our content | 69%
Responses not citing our content | 88%

Sibling-reference rate drops 19 pp when our content is cited (Fisher's exact p = 0.005). Consistent with our content's explicit disambiguation language reducing cross-contamination.

07

Discussion

7.1 What we proved

  1. Anchored prompts surface the subject at rates distinguishable from unanchored prompts. Class A+B: 21.8% (18.9 – 25.1%). Class C: 0.2% (0.0 – 0.6%). p ≈ 6 × 10⁻⁶⁵.
  2. The models' description of the subject shifted from "mobile app or nonexistent" to "ChatGPT integration" over the observation window. Correct-platform rate 4.5% → 45.7%; wrong-platform rate 40.9% → 17.1%. Monotonic across three independent providers.
  3. Citation of our content correlates with correct platform description: +31.5 pp on T1+T2, -37.8 pp on T4.
  4. The @invoke syntax appeared in model output after the intervention, where zero baseline responses contained it.

7.2 What we did not prove

  1. Direct causation for Class A surfacing. The app's name is in the prompt, so baseline surfacing could be driven by the prompt itself.
  2. Effect on Google AI Overview citation. GAIO's citation of our domain stayed at 0%. Any shift in its descriptions is indirect.
  3. Effect on generic-category (Class C) surfacing. Class C response templates are venue and product lists, not app recommendations. Surfacing rate at endpoint was 0.5%, within baseline CI.
  4. Full-actionable-install production at majority rates. The T1 rate never exceeded 18% in any period.

7.3 Proposed mechanism

  1. Content is published on a fresh domain.
  2. Bing indexes the content (load-bearing for ChatGPT Search and, in part, Perplexity).
  3. Providers' retrieval layers begin to rank the content on branded and platform-anchored queries within 1 to 3 weeks.
  4. When the model's generation is conditioned on our content, it inherits (a) the ChatGPT-integration framing, (b) the App-tab install language, (c) the @invoke syntax, (d) reduced sibling reference.
  5. When not conditioned on our content, it falls back to pre-training signals, which are dominated by the sibling mobile product.

This mechanism explains the ChatGPT and Perplexity results cleanly. Google AI Overview never cited our content directly; any shift there is likely routed through third-party sites that Google indexes, and is not part of the causal claim.

7.4 Response-template conditioning

Answer engines use different response templates for different query types. Class C prompts produce venue and product lists; Class A and B prompts produce app descriptions with install guidance. These are structurally different generation tasks.

Our content operated at both levels. Direct: on Class A+B retrievals, the response template is app-aware and our install-language tokens flow directly into the output. Indirect: on Class C retrievals, the genre is lists of options. Our content was cited (0% → 23.5% across the window), but the genre does not produce app recommendations. The indirect effect is authority building, which plausibly feeds the retrieval layer for downstream Class A+B queries but is not quantified here.

Implication for distribution-app brands (a carrier with a quote app, a retailer with a shopping app, a financial platform with a product app): the relevant funnel is Class C → Class A/B → app. Class C queries cite the brand as an entity alongside competitors; Class A/B queries switch the response template to app recommendation. Both are required. This paper validates step 2.

08

Limitations

  1. One subject, one category (non-alcoholic beer). Class-level findings need replication.
  2. Three providers only. Claude and Gemini native app recommendation not tested.
  3. Observation window of 33 days. The experiment is still compounding.
  4. No true pre-publication baseline. First days of monitoring are close to content launch.
  5. Tier scoring is regex-based. Manual review would improve precision; direction of findings unlikely to change.
  6. Developer's existing product site is a residual confound.
  7. Sibling-product interference is unusually severe for this subject because of shared developer and similar naming.
  8. App discovery is not brand citation. This experiment tests how an engine describes and recommends a ChatGPT app. It does not test how an engine cites a brand in a category comparison. The response templates differ structurally: app recommendation produces a procedural answer; brand citation produces a ranked or comparative list. Findings do not inherit from this study by extrapolation.

09

Implications

For a team deploying an app on ChatGPT or an adjacent answer engine:

  1. Published content can move the retrieval graph. An independent developer with no prior marketing footprint can shift ChatGPT's description of their app from "mobile app or nonexistent" to "ChatGPT integration" in roughly four weeks. Perplexity behaves similarly and cited our content at an even higher rate (27%).
  2. Invest first in Class A and Class B prompts. The effect is statistically distinguishable from null on every anchored prompt and null on every generic-category prompt at current content volume.
  3. Write install instructions in a canonical, reproducible block. Exact app name, exact install location ("ChatGPT Apps tab", not "App Store"), exact invocation syntax (@YourAppName), in one contiguous paragraph near the top of the article.
  4. Disambiguate your app from similarly named products. 88% of uncited surfaced responses reference the sibling product. Aggressive explicit disambiguation reduces cross-contamination.
  5. Expect a 2 to 4 week lag before effects stabilize. Bing indexing has a propagation delay that nothing you do accelerates meaningfully.
  6. Track tier distribution, not just surfacing rate. A response that mentions the app but describes it as a mobile app can actively mislead the user.

10

Future work

  • In progress. Outbound links from each experimental article to the ChatGPT app directory page, to isolate the source-reader path (human → install) from the model-reader path.
  • Replication on a second subject in a different category.
  • Extend prompt tracking to Claude native app recommendation and Gemini.
  • Second content cohort with refined indexable-install blocks and explicit disambiguation, testing whether the T1 rate can be lifted above the current 11 to 18% ceiling.

Appendix A

Scoring signals

Signal | Definition
Surfacing | "NA Drink Finder" (case-insensitive) in response text
App-tab | App-tab / Apps-section / ChatGPT-Apps phrases within ±300 chars of a surfacing
@invoke | @NA Drink Finder within ±300 chars of a surfacing
Mobile-app | App Store / Google Play / Play Store / mobile app / APK / iTunes in the same window
Denial | "there isn't a ChatGPT…", "no ChatGPT…plugin", "often listed as"
Sibling | "NA Beer Finder" anywhere in response text
Content citation | non-alcoholic-beer.com in the cited sources list
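
A minimal sketch of the window-based regex scoring these signals describe. The patterns approximate the listed signals; the production scorer is not reproduced here:

```python
import re

# Illustrative approximations of the signals in the table above.
SURFACING = re.compile(r"na drink finder", re.IGNORECASE)
APP_TAB = re.compile(r"apps? tab|apps section|chatgpt apps", re.IGNORECASE)
INVOKE = re.compile(r"@na drink finder", re.IGNORECASE)
MOBILE = re.compile(r"app store|google play|play store|mobile app|apk|itunes",
                    re.IGNORECASE)

def score(response: str, window: int = 300) -> dict[str, bool]:
    """Find each surfacing, then look for signals within +/- `window` chars."""
    out = {"surfaced": False, "app_tab": False, "invoke": False, "mobile": False}
    for m in SURFACING.finditer(response):
        out["surfaced"] = True
        ctx = response[max(0, m.start() - window): m.end() + window]
        out["app_tab"] = out["app_tab"] or bool(APP_TAB.search(ctx))
        out["invoke"] = out["invoke"] or bool(INVOKE.search(ctx))
        out["mobile"] = out["mobile"] or bool(MOBILE.search(ctx))
    return out

print(score("Install NA Drink Finder from the ChatGPT Apps tab, "
            "then type @NA Drink Finder near me."))
```
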
Appendix B

Data artifacts

  • Full CSV: 1,881 firings with prompt, provider, timestamp, status, response text, sources.
  • Response-level review: 151 surfaced responses, chronologically ordered, tier-scored.
  • Per-response scoring dump.
  • Chart source datasets.

Available on request. Email research@waniwani.ai.

Appendix C

Full prompt list

The complete list of 19 test prompts in taxonomic order, with pooled results across ChatGPT, Perplexity, and Google AI Overview, is tabulated in Section 6.4 above.

Appendix D

Complete per-prompt and per-provider results

D.1 Provider-level totals

Provider | Firings | Surfaced | Surface rate | Our domain cited | Cited rate
ChatGPT | 601 | 39 | 6.5% | 80 | 13.3%
Perplexity | 613 | 56 | 9.1% | 168 | 27.4%
Google AI Overview | 596 | 56 | 9.4% | 0 | 0.0%

D.2 Per-prompt × provider results (exhaustive)

Class | Prompt | Prov | Firings | Surf | Surf% | Cited | Cited%
A | NA Drink Finder app ChatGPT | CG | 32 | 26 | 81.2% | 8 | 25.0%
A | NA Drink Finder app ChatGPT | PPX | 32 | 32 | 100.0% | 26 | 81.2%
A | NA Drink Finder app ChatGPT | GAI | 33 | 26 | 78.8% | 0 | 0.0%
B | Is there a non-alcoholic beer finder on ChatGPT? | CG | 33 | 7 | 21.2% | 8 | 24.2%
B | Is there a non-alcoholic beer finder on ChatGPT? | PPX | 33 | 8 | 24.2% | 29 | 87.9%
B | Is there a non-alcoholic beer finder on ChatGPT? | GAI | 33 | 12 | 36.4% | 0 | 0.0%
B | Can ChatGPT help me find places that serve NA beer? | CG | 31 | 0 | 0.0% | 9 | 29.0%
B | Can ChatGPT help me find places that serve NA beer? | PPX | 32 | 3 | 9.4% | 28 | 87.5%
B | Can ChatGPT help me find places that serve NA beer? | GAI | 32 | 7 | 21.9% | 0 | 0.0%
B | Find places that serve non-alcoholic beer on ChatGPT | CG | 33 | 1 | 3.0% | 8 | 24.2%
B | Find places that serve non-alcoholic beer on ChatGPT | PPX | 32 | 2 | 6.2% | 27 | 84.4%
B | Find places that serve non-alcoholic beer on ChatGPT | GAI | 33 | 6 | 18.2% | 0 | 0.0%
B | Where can I find NA beer near me using AI | CG | 33 | 0 | 0.0% | 0 | 0.0%
B | Where can I find NA beer near me using AI | PPX | 33 | 6 | 18.2% | 5 | 15.2%
B | Where can I find NA beer near me using AI | GAI | 33 | 0 | 0.0% | 0 | 0.0%
B | Do you have an app that helps me find NA beer? | CG | 33 | 5 | 15.2% | 8 | 24.2%
B | Do you have an app that helps me find NA beer? | PPX | 32 | 3 | 9.4% | 1 | 3.1%
B | Do you have an app that helps me find NA beer? | GAI | 33 | 1 | 3.0% | 0 | 0.0%
B | Can ChatGPT help me find non-alcoholic drinks? | CG | 33 | 0 | 0.0% | 0 | 0.0%
B | Can ChatGPT help me find non-alcoholic drinks? | PPX | 32 | 0 | 0.0% | 0 | 0.0%
B | Can ChatGPT help me find non-alcoholic drinks? | GAI | 32 | 4 | 12.5% | 0 | 0.0%
C | Ou trouver une biere sans alcool a Paris? | CG | 31 | 0 | 0.0% | 2 | 6.5%
C | Ou trouver une biere sans alcool a Paris? | PPX | 33 | 1 | 3.0% | 20 | 60.6%
C | Ou trouver une biere sans alcool a Paris? | GAI | 25 | 0 | 0.0% | 0 | 0.0%
C | Where can I find non-alcoholic beer in New York? | CG | 61 | 0 | 0.0% | 1 | 1.6%
C | Where can I find non-alcoholic beer in New York? | PPX | 64 | 0 | 0.0% | 30 | 46.9%
C | Where can I find non-alcoholic beer in New York? | GAI | 65 | 0 | 0.0% | 0 | 0.0%
C | What bars serve non-alcoholic beer? | CG | 32 | 0 | 0.0% | 9 | 28.1%
C | What bars serve non-alcoholic beer? | PPX | 32 | 0 | 0.0% | 0 | 0.0%
C | What bars serve non-alcoholic beer? | GAI | 33 | 0 | 0.0% | 0 | 0.0%
C | Find me a restaurant with good NA beer options | CG | 32 | 0 | 0.0% | 8 | 25.0%
C | Find me a restaurant with good NA beer options | PPX | 32 | 0 | 0.0% | 0 | 0.0%
C | Find me a restaurant with good NA beer options | GAI | 33 | 0 | 0.0% | 0 | 0.0%
C | Where can I get NA beer in LA? | CG | 33 | 0 | 0.0% | 6 | 18.2%
C | Where can I get NA beer in LA? | PPX | 33 | 0 | 0.0% | 0 | 0.0%
C | Where can I get NA beer in LA? | GAI | 33 | 0 | 0.0% | 0 | 0.0%
C | Other Class C prompts (8 prompts × 3 providers) | all | ~760 | 2 | 0.3% | 10 | 1.3%

Remaining Class C rows abbreviated for readability. Full dataset available on request.

D.3 Tier distribution by prompt × provider (surfaced responses only)

Class | Prompt | Prov | Surf | T1 | T2 | T3 | T4 | T5
A | NA Drink Finder app ChatGPT | CG | 26 | 0% | 42% | 0% | 58% | 0%
A | NA Drink Finder app ChatGPT | PPX | 32 | 6% | 34% | 34% | 25% | 0%
A | NA Drink Finder app ChatGPT | GAI | 26 | 0% | 8% | 62% | 27% | 4%
B | Is there a NA beer finder on ChatGPT? | CG | 7 | 29% | 71% | 0% | 0% | 0%
B | Is there a NA beer finder on ChatGPT? | PPX | 8 | 12% | 38% | 38% | 12% | 0%
B | Is there a NA beer finder on ChatGPT? | GAI | 12 | 42% | 17% | 25% | 17% | 0%
B | Can ChatGPT help me find places… | PPX | 3 | 67% | 0% | 33% | 0% | 0%
B | Can ChatGPT help me find places… | GAI | 7 | 71% | 14% | 0% | 14% | 0%
B | Find places that serve NA beer on ChatGPT | CG | 1 | 100% | 0% | 0% | 0% | 0%
B | Find places that serve NA beer on ChatGPT | PPX | 2 | 50% | 0% | 50% | 0% | 0%
B | Find places that serve NA beer on ChatGPT | GAI | 6 | 50% | 33% | 17% | 0% | 0%
B | Where can I find NA beer near me using AI | PPX | 6 | 0% | 0% | 83% | 17% | 0%
B | Do you have an app that helps me find NA beer? | CG | 5 | 0% | 60% | 0% | 40% | 0%
B | Do you have an app that helps me find NA beer? | PPX | 3 | 0% | 0% | 0% | 100% | 0%
B | Do you have an app that helps me find NA beer? | GAI | 1 | 0% | 0% | 0% | 100% | 0%
B | Can ChatGPT help me find non-alcoholic drinks? | GAI | 4 | 75% | 0% | 25% | 0% | 0%

D.4 Tier distribution pooled by provider

Provider | Surfaced | T1 Clean | T2 Dominant | T3 Ambig | T4 Mobile | T5 Denies
ChatGPT | 39 | 7.7% (3) | 48.7% (19) | 0.0% (0) | 43.6% (17) | 0.0% (0)
Perplexity | 56 | 10.7% (6) | 25.0% (14) | 41.1% (23) | 23.2% (13) | 0.0% (0)
Google AI Overview | 56 | 28.6% (16) | 12.5% (7) | 37.5% (21) | 19.6% (11) | 1.8% (1)

D.5 Weekly dynamics per provider

ChatGPT

Week | Firings | Surfaced | Surf% | Cited | Cited%
W12 | 57 | 4 | 7.0% | 0 | 0.0%
W13 | 132 | 7 | 5.3% | 1 | 0.8%
W14 | 133 | 7 | 5.3% | 2 | 1.5%
W15 | 119 | 4 | 3.4% | 5 | 4.2%
W16 | 124 | 10 | 8.1% | 57 | 46.0%
W17 | 36 | 7 | 19.4% | 15 | 41.7%

Perplexity

Week | Firings | Surfaced | Surf% | Cited | Cited%
W12 | 57 | 3 | 5.3% | 0 | 0.0%
W13 | 133 | 13 | 9.8% | 27 | 20.3%
W14 | 133 | 17 | 12.8% | 33 | 24.8%
W15 | 119 | 10 | 8.4% | 41 | 34.5%
W16 | 133 | 9 | 6.8% | 51 | 38.3%
W17 | 38 | 4 | 10.5% | 16 | 42.1%

Google AI Overview

Week | Firings | Surfaced | Surf% | Cited | Cited%
W12 | 57 | 3 | 5.3% | 0 | 0.0%
W13 | 133 | 5 | 3.8% | 0 | 0.0%
W14 | 132 | 8 | 6.1% | 0 | 0.0%
W15 | 132 | 30 | 22.7% | 0 | 0.0%
W16 | 110 | 8 | 7.3% | 0 | 0.0%
W17 | 32 | 2 | 6.2% | 0 | 0.0%

References
  1. OpenAI Developers. Optimize Metadata, Apps SDK documentation. developers.openai.com/apps-sdk/guides/optimize-metadata. Referenced for the ChatGPT connector-selection mechanism and the direct / indirect / negative prompt framework used to classify prompt Classes A to C.
  2. Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158), 209–212. Used for the 95% confidence intervals on proportions throughout Section 6.
  3. Fisher, R. A. (1935). The Design of Experiments. Edinburgh: Oliver & Boyd. Fisher's exact test used for H1, H2, and H5 hypothesis tests.
Instrumentation: waniwani.ai