Influencing app recommendations in conversational answer engines through published content alone
A 33-day controlled experiment measuring whether content on a fresh, unauthoritative domain can, absent any other marketing input, shift how ChatGPT describes and recommends a ChatGPT app. 74 articles. 1,881 prompt firings across ChatGPT, Perplexity, and Google AI Overview. One clean before-and-after.
We tested whether content published on a fresh, unauthoritative domain can, absent any other marketing input, influence how three independent conversational answer engines (ChatGPT, Perplexity, Google AI Overview) describe and recommend a ChatGPT app. We selected a low-authority, independent-developer subject (NA Drink Finder, built by the operator of beerfordriving.com), published 74 articles on a fresh domain, and tracked 1,881 prompt firings across 19 prompts and 3 providers for 33 days. We pre-registered a prompt taxonomy (Class A: literal app name; Class B: platform-anchored; Class C: generic category) and a five-tier qualitative scoring of how each response describes the app.
Primary finding. At baseline, 0 of 22 app-mention responses cleanly described NA Drink Finder as a ChatGPT integration; 45% described it as a mobile app or denied the ChatGPT version existed. By endpoint, 46% of app-mention responses described it correctly; 17% still described it as a mobile app; 0% denied its existence. The shift is monotonic and statistically distinguishable from baseline on platform-anchored queries, where the subject's name does not appear in the prompt itself (0% → 30% clean ChatGPT-app description rate).
Mechanism. Responses that cite our content describe the app correctly 2.3× more often and misdescribe it as a mobile app 8.3× less often than responses that do not. The @NA Drink Finder invocation syntax, absent from every baseline response, appears in up to 50% of endpoint Class A ChatGPT responses that cite our content.
Contribution. Evidence that a published content corpus can shift the retrieval graph of ChatGPT on branded and platform-anchored prompts, with effect sizes large enough to change the user-visible description of the product. On Perplexity, our content achieved even higher citation rates (27% of responses). Effects on generic-category prompts were null at the direct-surfacing level but positive at the citation-authority level.
Motivation
Since the launch of the ChatGPT Apps directory in late 2025, app developers have faced a recurring question: what, if anything, can be done to increase the probability that a conversational answer engine recommends a given app inside a user's conversation? The recommendation layer is partially opaque, there is no official playbook, and the most commonly stated hypothesis (that published SEO-style content can move the dial) has not been cleanly tested.
Testing the hypothesis on an established brand introduces confounding signals that cannot be cleanly separated from the effect of new content: existing domain authority, backlinks, press coverage, and organic brand mentions. Waniwani does run analogous content programs on client surfaces, but those results are governed by client confidentiality and cannot be published. The experiment reported here is the one we can share publicly: run on a ChatGPT app built by an independent developer, on a content domain we controlled end to end, with no other marketing input.
How answer engines handle things they don't already know
The recommendation mechanics inside ChatGPT, Perplexity, and Google AI Overview are partially opaque. What is publicly understood, however, is the retrieval-augmented architecture that governs responses to queries whose answers are not reliably present in the model's pre-training data.
When a user asks about something time-sensitive, niche, or post-training-cutoff, the engine issues a web lookup and conditions its response on what it retrieves. ChatGPT Search retrieves primarily from Bing. Perplexity runs its own retrieval layer. Google AI Overview retrieves from Google's index. In each case, the documents that rank for the query at retrieval time shape the text the model produces.
ChatGPT apps fall squarely inside this regime. A third-party ChatGPT app is not part of any model's pre-training corpus, the Apps directory is post-cutoff for most model versions, and any individual app is a sparse fact in the model's parametric memory. When a user's query contains a hook the retrieval layer can match to the app, the text the model produces is shaped by what retrieval returns.
Two layers are worth distinguishing. OpenAI's Apps SDK documentation describes a metadata-based mechanism for invoking an already-installed connector — the model selecting when to call the connector based on the app's name, description, parameter documentation, and hint annotations [1]. That is the in-conversation tool-invocation layer. Our experiment did not test it: we did not modify the app's connector metadata, we do not own the app, and we make no observations about whether or how that mechanism fired.
The layer our experiment addresses is upstream of installation: whether and how the model describes and recommends the app to a user who has not yet installed it. That layer runs on web retrieval over external documents. Our intervention acted only on this path — we published external content, did not touch the connector metadata, and measured the changes in the model's descriptions over time. All findings reported here are scoped to this retrieval path.
This exposes a specific, testable mechanism: if we publish documents that describe a ChatGPT app in a structured, unambiguous way, and those documents are indexed and ranked by the retrieval layer, then the answer engine's responses should converge toward the content of those documents over time. That is the mechanism the experiment tests.
The three user journeys our content is trying to shape
The experiment measures model behavior. What we actually want to influence is user behavior: the sequence that starts with a user asking an answer engine a question and ends with that user installing and using the app. Three distinct user journeys map to the three prompt classes we monitored, and the structure generalizes to any app deployed on a conversational answer engine, provided you know your user well enough to recognize which state they are in.
Three prompt classes, three very different outcomes: share of responses in which the app was surfaced, by the linguistic anchor of the prompt. Same content corpus, same 33 days.
Journey A — the user already knows your app. They have heard about it somewhere and come to the engine to find or confirm it. The query contains the app's name or a close variant. In our experiment: "NA Drink Finder app ChatGPT."
Journey B — the user has a precise, use-case-shaped need. The anchor does not need to be platform-shaped; it just has to describe the need with enough specificity that only a small set of products could plausibly match. In our experiment, six tracked prompts fell in this class, including "is there a non-alcoholic beer finder on ChatGPT?" and "find places that serve non-alcoholic beer on ChatGPT." For a different vertical this shape could look like a user asking mid-conversation "can I get a quote here?" after anchoring on a specific carrier — same prompt shape, different category, not a prompt we tested. This is the journey a content strategy can most directly catalyze.
Journey C — the user has the raw problem, no narrowing. Generic category questions whose response template is a list of options, not a tool recommendation. Journey C is not where apps get recommended in the first turn; it is where brands get cited alongside competitors, which feeds future Journey A and B queries.
The structure is portable. To apply it to any app on an answer engine, you need a working user model: who is the user, what is their state of awareness when they open the engine, and what specifically do they type in each state. The journeys are not defined by platform vocabulary. They are defined by anchor specificity.
Research questions and pre-registered hypotheses
The primary research question: can published content alone, absent any other marketing input, influence whether and how conversational answer engines recommend a ChatGPT app inside a response?
Five hypotheses were pre-registered before analysis:
- H1 (surfacing). Prompts containing an app, brand, or platform anchor will produce higher app-surfacing rates than prompts without such an anchor.
- H2 (anchor strength). Prompts containing the literal app name will produce higher surfacing rates than prompts with weaker anchors.
- H3 (citation causation). Content-citation rate for the experimental corpus will correlate positively with app-surfacing rate.
- H4 (provider variation). Actionable install-instruction rate among surfaced responses will differ by provider.
- H5 (temporal compounding). Actionable install-instruction rate and correct-platform description rate will rise over the observation window.
A sixth hypothesis was formulated after baseline responses were reviewed and is reported as exploratory:
- H6 (platform-description shift, exploratory). At baseline, the models describe the subject as a mobile app or deny its existence as a ChatGPT app. Over the observation window, this description will shift toward correct-platform framing.
Methodology
5.1 Subject selection
The subject was NA Drink Finder, a ChatGPT app in the Apps directory that helps users locate venues serving non-alcoholic beer, built by the independent operator of beerfordriving.com, a venue-discovery platform for non-alcoholic drinks. Baseline conditions:
| Input | Status at study start |
|---|---|
| Developer | Independent niche operator (beerfordriving.com) |
| Product site | Exists, standard venue-discovery SEO content, no references to the ChatGPT app |
| Press coverage | None identified |
| Backlinks pointing at the ChatGPT app | None identified |
| Branded social presence | Minimal |
| Prior GEO-directed content | None |
| Sibling product | "NA Beer Finder", a native iOS/Android app from the same developer |
The sibling product is a confound: the models' pre-existing knowledge of "NA Beer Finder" (a mobile app) can bleed into responses about "NA Drink Finder" (the ChatGPT app) through name similarity and shared developer.
5.2 Prompt taxonomy
19 test prompts were classified into three classes by linguistic anchor, before outcomes were tallied: Class A (literal app name; 1 prompt), Class B (platform-anchored without naming the app; 6 prompts), Class C (generic category, no app/brand/platform reference; 12 prompts).
OpenAI's Apps SDK metadata-optimization guidance recommends assembling a "golden prompt set" across three types: direct prompts ("users explicitly name your product or data source"), indirect prompts ("users describe the outcome they want without naming your tool"), and negative prompts ("cases where built-in tools or other connectors should handle the request") [1]. We borrow that three-way classification as a framework for our prompt list, though our experiment tests a different mechanism than the one OpenAI's guidance is written for (external-retrieval recommendation, not installed-connector invocation — see Section 2).
Under OpenAI's framework, our Class A corresponds to direct prompts (the user names the app). Our Class B and Class C would both fall within indirect prompts — in each, the user describes an outcome without naming the app. We subdivide indirect prompts based on whether the prompt still contains a platform, tool, or category anchor ("on ChatGPT", "app that helps", "beer finder"): Class B has one; Class C does not. This subdivision is our empirical contribution. It turned out to be load-bearing in our results — the two subtypes produce an order-of-magnitude difference in surfacing rates (11.1% vs 0.2%), even though OpenAI's framework treats both as a single category.
OpenAI's third type, negative prompts, is not tested here.
5.3 Intervention: the content corpus
74 articles on non-alcoholic-beer.com, a fresh domain with no external authority, no inbound backlinks, no paid distribution, no social amplification. Five clusters:
| Cluster | Articles | Target prompt shape |
|---|---|---|
| Foundation / evergreen | 10 | "best NA beer 2026", health, calories, athletes, Dry January |
| Geo-targeted cities | 19 | "best NA beer in [city]" across 19 cities |
| AI / app-discovery | ~22 | "NA Drink Finder app", "is there a beer finder on ChatGPT" |
| Occasion / lifestyle | ~16 | date night, designated drivers, parties, pregnancy |
| Category / science | ~4 | stouts, NA vs kombucha, science of NA taste |
Common template: direct answer in the first 40 to 60 words, H2/H3 heading hierarchy, JSON-LD structured data (SoftwareApplication, FAQPage, HowTo), and the canonical install block (below) in every article.
“Install NA Drink Finder from the ChatGPT Apps tab. Then type @NA Drink Finder [your query] in any conversation.”
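As a concrete illustration of the structured-data element of the template, here is a minimal JSON-LD sketch of the SoftwareApplication block an article could carry. Field values are illustrative reconstructions from the template description above, not the production markup:

```json
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "NA Drink Finder",
  "applicationCategory": "ChatGPT app",
  "operatingSystem": "ChatGPT",
  "description": "Finds venues serving non-alcoholic beer. Install from the ChatGPT Apps tab, then type @NA Drink Finder [your query] in any conversation.",
  "url": "https://non-alcoholic-beer.com/"
}
```

Each article also carried FAQPage and HowTo blocks in the same style; the key design choice is that the install language inside the markup matches the canonical install block token for token.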
5.4 Technical stack
- /llms.txt curated index for LLM crawlers
- /robots.txt allowlisting GPTBot, ClaudeBot, ChatGPT-User, PerplexityBot (sketched below)
- Sitemap submitted to Bing Webmaster Tools
- No paid distribution, manual outreach, or backlink building
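A minimal sketch of the crawler-facing configuration, assuming the four bots named above; directives are illustrative, as the production file is not published:

```
# robots.txt — allowlist the LLM crawlers, point at the sitemap
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://non-alcoholic-beer.com/sitemap.xml
```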
5.5 Instrumentation and observation window
Tracking ran on the waniwani.ai platform. Each of the 19 prompts was fired daily against ChatGPT, Perplexity, and Google AI Overview.
| Parameter | Value |
|---|---|
| Study window | 2026-03-20 to 2026-04-21 (33 days) |
| Unique prompts | 19 |
| Providers | 3 (ChatGPT, Perplexity, Google AI Overview) |
| Total firings | 1,881 |
| Successful firings | 1,810 (96.2%) |
| Mean firings per prompt and provider | ~33 |
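The tracking platform's internals are not public. Purely as a shape-of-the-loop illustration, a daily firing against one provider might look like the sketch below; the bare OpenAI Chat Completions API is a stand-in, since the product surfaces actually tested include web retrieval that a plain API call does not exercise:

```python
# Hypothetical daily firing loop — illustrative only; the actual tracking
# ran on the waniwani.ai platform, and the bare Chat Completions API is a
# stand-in for the retrieval-enabled product surfaces that were tested.
import csv
import datetime

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPTS = [
    ("A", "NA Drink Finder app ChatGPT"),
    ("B", "Is there a non-alcoholic beer finder on ChatGPT?"),
    ("C", "Non-alcoholic beer near me"),
    # ...remaining tracked prompts
]

def fire(prompt: str) -> str:
    """Fire one prompt and return the response text for downstream scoring."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""

with open("firings.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for prompt_class, prompt in PROMPTS:
        writer.writerow(
            [datetime.date.today().isoformat(), prompt_class, prompt, fire(prompt)]
        )
```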
5.6 Dependent variables and qualitative scoring
Five quantitative dependent variables per response: surfacing (literal app name in response), App-tab reference, @invoke syntax, content citation of non-alcoholic-beer.com, and sibling-product reference. Plus a five-tier qualitative scoring applied to the ±400-character window around every surfacing:
| Tier | Definition |
|---|---|
| T1 Clean ChatGPT-app | Unambiguously describes NA Drink Finder as a ChatGPT integration, with App-tab or @invoke framing, no mobile-app signals |
| T2 ChatGPT-dominant | Primarily describes the ChatGPT integration; may correctly mention the mobile app separately |
| T3 Ambiguous | Name-drop without clear platform framing |
| T4 Mobile-app | Describes NA Drink Finder as a mobile app (wrong platform) |
| T5 Denies ChatGPT app | Explicitly states the ChatGPT version does not exist |
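To make the scoring concrete, a simplified sketch of how the five tiers could be assigned from boolean signals. The decision order is our reading of the tier definitions above, not the production scorer (which is regex-based; see the Scoring signals appendix):

```python
def assign_tier(chatgpt_framing: bool, mobile_signals: bool, denial: bool) -> str:
    """Map signals detected around a surfacing to a tier.

    chatgpt_framing: App-tab or @invoke phrasing near the surfacing.
    mobile_signals:  App Store / Google Play / mobile-app phrasing nearby.
    denial:          explicit statement that the ChatGPT version does not exist.
    This decision order is one plausible reading of the tier definitions,
    not the published scorer.
    """
    if denial:
        return "T5"  # explicitly denies the ChatGPT app exists
    if chatgpt_framing and not mobile_signals:
        return "T1"  # clean ChatGPT-app description
    if chatgpt_framing and mobile_signals:
        return "T2"  # ChatGPT-dominant, mobile app mentioned alongside
    if mobile_signals:
        return "T4"  # wrong platform: described as a mobile app
    return "T3"      # name-drop without clear platform framing
```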
Results
6.1 Core finding: description of the subject shifted monotonically
The clearest way to see the result is in the distribution of how surfaced responses described the app at baseline vs endpoint.
In 33 days, three AI engines went from denying the app existed to giving users the correct install steps. Share of app-mention responses, week 1 baseline vs week 5 endpoint, pooled across ChatGPT, Perplexity, and Google AI Overview.
| Tier | Baseline (n=22) | Middle (n=94) | Endpoint (n=35) |
|---|---|---|---|
| T1 Clean ChatGPT-app | 0.0% | 18.1% | 11.4% |
| T2 ChatGPT-dominant | 4.5% | 21.3% | 34.3% |
| T1+T2 combined | 4.5% | 39.4% | 45.7% |
| T3 Ambiguous | 50.0% | 30.9% | 37.1% |
| T4 Mobile-app (wrong) | 40.9% | 29.8% | 17.1% |
| T5 Denies ChatGPT app | 4.5% | 0.0% | 0.0% |
Correct-platform description grew 10× in relative terms. Wrong-platform description fell by 58%. Explicit denial disappeared after the first week.
6.2 H1: surfacing by prompt class
| Class | Firings | Surfaced | Rate | 95% CI |
|---|---|---|---|---|
| A. App-name-direct | 97 | 84 | 86.6% | 78.4 – 92.0% |
| B. Platform-anchored | 586 | 65 | 11.1% | 8.8 – 13.9% |
| C. Generic category | 1,127 | 2 | 0.2% | 0.0 – 0.6% |
Class A+B pooled (149/683 = 21.8%) vs Class C (2/1,127 = 0.2%): Fisher's exact, one-sided, p ≈ 6 × 10⁻⁶⁵. H1 supported.
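Both the Wilson intervals (see references) and the Fisher test reproduce directly from the pooled counts above; a sketch in Python:

```python
# Reproduce the H1 test and the 95% Wilson CIs from the pooled counts.
from scipy.stats import fisher_exact
from statsmodels.stats.proportion import proportion_confint

# 2x2 table: rows = Class A+B vs Class C, cols = surfaced vs not surfaced
table = [[149, 683 - 149],
         [2, 1127 - 2]]
odds_ratio, p = fisher_exact(table, alternative="greater")
print(f"Fisher's exact (one-sided): p = {p:.1e}")  # reported p ≈ 6e-65

for label, surfaced, n in [("A", 84, 97), ("B", 65, 586), ("C", 2, 1127)]:
    lo, hi = proportion_confint(surfaced, n, alpha=0.05, method="wilson")
    # Class A prints 78.4% – 92.0%, matching the table above
    print(f"Class {label}: {surfaced / n:.1%} (95% CI {lo:.1%} – {hi:.1%})")
```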
6.3 H2: anchor strength
Class A surfacing 86.6%; Class B surfacing 11.1%. Fisher's exact, p ≈ 8 × 10⁻⁵². The literal app name is roughly an 8× stronger anchor than any other reference tested. H2 supported.
6.4 Per-prompt results with 95% confidence intervals
| Class | n | Surfaced | Rate | 95% CI | Prompt |
|---|---|---|---|---|---|
| A | 97 | 84 | 86.6% | 78.4–92.0% | NA Drink Finder app ChatGPT |
| B | 99 | 27 | 27.3% | 19.5–36.8% | Is there a non-alcoholic beer finder on ChatGPT? |
| B | 95 | 10 | 10.5% | 5.8–18.3% | Can ChatGPT help me find places that serve non-alcoholic beer? |
| B | 98 | 9 | 9.2% | 4.9–16.5% | Find places that serve non-alcoholic beer on ChatGPT |
| B | 99 | 6 | 6.1% | 2.8–12.6% | Where can I find NA beer near me using AI |
| B | 98 | 9 | 9.2% | 4.9–16.5% | Do you have an app that helps me find NA beer? |
| B | 97 | 4 | 4.1% | 1.6–10.1% | Can ChatGPT help me find non-alcoholic drinks? |
| C | 95 | 1 | 1.1% | 0.2–5.7% | Can you help me find non-alcoholic options near me? |
| C | 89 | 1 | 1.1% | 0.2–6.1% | Ou trouver une biere sans alcool a Paris? |
| C | 95 | 0 | 0.0% | 0.0–3.9% | Can you help me find places that have alcohol-free beer? |
| C | 97 | 0 | 0.0% | 0.0–3.8% | Find me a restaurant with good NA beer options |
| C | 96 | 0 | 0.0% | 0.0–3.8% | I'm looking for a non-alcoholic beer spot in my neighborhood |
| C | 95 | 0 | 0.0% | 0.0–3.9% | I'm traveling to London, where can I find non-alcoholic beer? |
| C | 89 | 0 | 0.0% | 0.0–4.1% | Non-alcoholic beer near me |
| C | 97 | 0 | 0.0% | 0.0–3.8% | What bars serve non-alcoholic beer? |
| C | 85 | 0 | 0.0% | 0.0–4.3% | Where can I find non-alcoholic beer in London? |
| C | 190 | 0 | 0.0% | 0.0–2.0% | Where can I find non-alcoholic beer in New York? |
| C | 99 | 0 | 0.0% | 0.0–3.7% | Where can I get NA beer in LA? |
Every Class A and Class B prompt has a lower CI bound above the Class C noise floor. Every Class C prompt has an upper CI bound below 7%.
6.5 H3: citation attribution
Among the 151 surfaced responses, comparing cited-our-content (n=58) vs not-cited (n=93):
| Tier | Cited (n=58) | Not-cited (n=93) | Difference |
|---|---|---|---|
| T1 Clean ChatGPT-app | 15.5% | 12.9% | +2.6 pp |
| T2 ChatGPT-dominant | 39.7% | 10.8% | +28.9 pp |
| T1+T2 combined | 55.2% | 23.7% | +31.5 pp |
| T4 Mobile-app (wrong) | 5.2% | 43.0% | -37.8 pp |
| T5 Denies | 0.0% | 1.1% | -1.1 pp |
Cell-level Pearson correlation between citation rate and surfacing rate across 54 prompt × provider cells: r = 0.332, p = 0.014. H3 partially supported.
Our content became the source, week by week: share of responses that cited non-alcoholic-beer.com, by prompt class. All three classes started at zero.
6.6 H4: provider variation in actionable install rate
Actionable install = App-tab reference AND @invoke syntax both present in a surfaced response.
| Provider | Surfaced | Actionable | Rate | 95% CI |
|---|---|---|---|---|
| ChatGPT | 39 | 3 | 7.7% | 2.7 – 20.3% |
| Perplexity | 56 | 4 | 7.1% | 2.8 – 17.0% |
| Google AI Overview | 56 | 10 | 17.9% | 10.0 – 29.8% |
Chi-squared, df = 2, χ² = 3.89, p = 0.14. H4 not supported at α = 0.05. Point estimates favor Google AI Overview but confidence intervals overlap.
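The H4 statistic reproduces from the counts in the table above; a sketch:

```python
# Reproduce the H4 chi-squared test from the actionable/surfaced counts.
from scipy.stats import chi2_contingency

# rows = provider, cols = (actionable, not actionable) among surfaced responses
observed = [[3, 36],    # ChatGPT
            [4, 52],    # Perplexity
            [10, 46]]   # Google AI Overview
chi2, p, df, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.2f}")  # ~3.89, 2, ~0.14
```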
6.7 H5: temporal compounding
| Provider | Early actionable | Late actionable | Fisher one-sided p |
|---|---|---|---|
| ChatGPT | 0/16 (0.0%) | 3/23 (13.0%) | 0.19 |
| Perplexity | 1/29 (3.4%) | 3/27 (11.1%) | 0.28 |
| Google AI Overview | 0/14 (0.0%) | 10/42 (23.8%) | 0.041 |
H5 partially supported. Point estimates rise on all three providers. Reaches significance only on Google AI Overview.
From 'this app doesn't exist' to 'here's how to install it': share of app-mention responses that describe the app correctly vs incorrectly, pooled across three AI engines. The correct and incorrect lines cross in week 15.
6.8 H6 (exploratory): the platform-description shift
At baseline, zero of 22 surfaced responses cleanly described the subject as a ChatGPT integration. One ChatGPT response (March 20, on prompt "NA Drink Finder app ChatGPT") stated outright:
“While there isn't a ChatGPT plugin specifically branded 'NA Drink Finder', you can use AI creatively for drink discovery...”
By endpoint, correct-platform description reached 45.7% and the denial was gone. Same kind of query, April 19:
“There's a tool called NA Drink Finder you can install inside ChatGPT. You install it from the Apps tab, then type @NA Drink Finder non-alcoholic beer near me. It returns real venues near you that carry NA beer.”
On Class B prompts specifically (where the name is not in the prompt, so the model must retrieve the concept rather than echo it):
| Period | Surfaced | T1 Clean | T1 rate |
|---|---|---|---|
| Baseline | 5 | 0 | 0.0% |
| Middle | 46 | 14 | 30.4% |
| Endpoint | 14 | 4 | 28.6% |
The Class B cell contains the cleanest causal claim in the dataset. Baseline T1 was zero; three weeks later it was 30%.
6.9 The @invoke syntax as a content-to-output transfer signal
None of the baseline app-mention responses contained @NA Drink Finder or any variant. By endpoint, the syntax appeared in up to 50% of ChatGPT Class A responses that cited our content and in 8% of pooled responses across all providers. The syntax was seeded in identical form across all 74 articles; its post-intervention appearance is interpretable as direct content-to-output transfer.
6.10 Sibling-product disambiguation
| Subset | Sibling-reference rate |
|---|---|
| All responses (surfaced or not) | 59% |
| Surfaced responses citing our content | 69% |
| Surfaced responses not citing our content | 88% |
Sibling-reference rate drops 19 pp when our content is cited (Fisher's exact p = 0.005). Consistent with our content's explicit disambiguation language reducing cross-contamination.
Discussion
7.1 What we proved
- Anchored prompts surface the subject at rates distinguishable from unanchored prompts. Class A+B: 21.8% (18.9 – 25.1%). Class C: 0.2% (0.0 – 0.6%). p ≈ 6 × 10⁻⁶⁵.
- The models' description of the subject shifted from "mobile app or nonexistent" to "ChatGPT integration" over the observation window. Correct-platform rate 4.5% → 45.7%; wrong-platform rate 40.9% → 17.1%. Monotonic across three independent providers.
- Citation of our content correlates with correct platform description: +31.5 pp on T1+T2, -37.8 pp on T4.
- The @invoke syntax appeared in model output after the intervention, where zero baseline responses contained it.
7.2 What we did not prove
- Direct causation for Class A surfacing. The app's name is in the prompt, so baseline surfacing could be driven by the prompt itself.
- Effect on Google AI Overview citation. GAIO's citation of our domain stayed at 0%. Any shift in its descriptions is indirect.
- Effect on generic-category (Class C) surfacing. Class C response templates are venue and product lists, not app recommendations. Surfacing rate at endpoint was 0.5%, within baseline CI.
- Full-actionable-install production at majority rates. The T1 rate never exceeded 18% in any period.
7.3 Proposed mechanism
- Content is published on a fresh domain.
- Bing indexes the content (load-bearing for ChatGPT Search and, in part, Perplexity).
- Providers' retrieval layers begin to rank the content on branded and platform-anchored queries within 1 to 3 weeks.
- When the model's generation is conditioned on our content, it inherits (a) the ChatGPT-integration framing, (b) the App-tab install language, (c) the @invoke syntax, and (d) reduced sibling reference.
- When not conditioned on our content, it falls back to pre-training signals, which are dominated by the sibling mobile product.
This mechanism explains the ChatGPT and Perplexity results cleanly. Google AI Overview never cited our content directly; any shift there is likely routed through third-party sites that Google indexes, and is not part of the causal claim.
7.4 Response-template conditioning
Answer engines use different response templates for different query types. Class C prompts produce venue and product lists; Class A and B prompts produce app descriptions with install guidance. These are structurally different generation tasks.
Our content operated at both levels. Direct: on Class A+B retrievals, the response template is app-aware and our install-language tokens flow directly into the output. Indirect: on Class C retrievals, the genre is lists of options. Our content was cited (0% → 23.5% across the window), but the genre does not produce app recommendations. The indirect effect is authority building, which plausibly feeds the retrieval layer for downstream Class A+B queries but is not quantified here.
Implication for distribution-app brands (a carrier with a quote app, a retailer with a shopping app, a financial platform with a product app): the relevant funnel is Class C → Class A/B → app. Class C queries cite the brand as an entity alongside competitors; Class A/B queries switch the response template to app recommendation. Both are required. This paper validates step 2.
Limitations
- One subject, one category (non-alcoholic beer). Class-level findings need replication.
- Three providers only. Claude and Gemini native app recommendation not tested.
- Observation window of 33 days. The experiment is still compounding.
- No true pre-publication baseline. First days of monitoring are close to content launch.
- Tier scoring is regex-based. Manual review would improve precision; direction of findings unlikely to change.
- Developer's existing product site is a residual confound.
- Sibling-product interference is unusually severe for this subject because of shared developer and similar naming.
- App discovery is not brand citation. This experiment tests how an engine describes and recommends a ChatGPT app. It does not test how an engine cites a brand in a category comparison. The response templates differ structurally: app recommendation produces a procedural answer; brand citation produces a ranked or comparative list. Findings do not inherit from this study by extrapolation.
Implications
For a team deploying an app on ChatGPT or an adjacent answer engine:
- Published content can move the retrieval graph. An independent developer with no prior marketing footprint can shift ChatGPT's description of their app from "mobile app or nonexistent" to "ChatGPT integration" in roughly four weeks. Perplexity behaves similarly and cited our content at an even higher rate (27%).
- Invest first in Journey A and B prompts. The effect is statistically distinguishable from null on every anchored prompt and null on every generic-category prompt at current content volume.
- Write install instructions in a canonical, reproducible block. Exact app name, exact install location ("ChatGPT Apps tab", not "App Store"), exact invocation syntax (@YourAppName), in one contiguous paragraph near the top of the article.
- Disambiguate your app from similarly named products. 88% of uncited surfaced responses reference the sibling product. Aggressive explicit disambiguation reduces cross-contamination.
- Expect a 2 to 4 week lag before effects stabilize. Bing indexing has a propagation delay that nothing you do accelerates meaningfully.
- Track tier distribution, not just surfacing rate. A response that mentions the app but describes it as a mobile app can actively mislead the user.
Future work
- In progress. Outbound links from each experimental article to the ChatGPT app directory page, to isolate the source-reader path (human → install) from the model-reader path.
- Replication on a second subject in a different category.
- Extend prompt tracking to Claude native app recommendation and Gemini.
- Second content cohort with refined indexable-install blocks and explicit disambiguation, testing whether the T1 rate can be lifted above the current 11 to 18% ceiling.
Scoring signals
| Signal | Definition |
|---|---|
| Surfacing | "NA Drink Finder" (case-insensitive) in response text |
| App-tab | App-tab / Apps-section / ChatGPT-Apps phrases within ±300 chars of a surfacing |
| @invoke | @NA Drink Finder within ±300 chars of a surfacing |
| Mobile-app | App Store / Google Play / Play Store / mobile app / APK / iTunes in the same window |
| Denial | "there isn't a ChatGPT…", "no ChatGPT…plugin", "often listed as" |
| Sibling | "NA Beer Finder" anywhere in response text |
| Content citation | non-alcoholic-beer.com in the cited sources list |
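A reconstruction of these signals as Python regexes; the patterns are our reading of the definitions above, not the production scorer:

```python
# Reconstruction of the scoring-signals table as Python regexes.
# Patterns are our reading of the definitions, not the production scorer.
import re

SURFACING = re.compile(r"NA Drink Finder", re.IGNORECASE)
APP_TAB   = re.compile(r"Apps?[ -]tab|Apps?[ -]section|ChatGPT[ -]Apps", re.IGNORECASE)
AT_INVOKE = re.compile(r"@NA Drink Finder", re.IGNORECASE)
MOBILE    = re.compile(r"App Store|Google Play|Play Store|mobile app|APK|iTunes", re.IGNORECASE)
SIBLING   = re.compile(r"NA Beer Finder", re.IGNORECASE)

def window(text: str, match: re.Match, radius: int = 300) -> str:
    """Return the ±radius-character window around a surfacing match."""
    return text[max(0, match.start() - radius): match.end() + radius]

def score(response: str) -> dict:
    """Score one response for the per-surfacing signals."""
    m = SURFACING.search(response)
    if m is None:
        return {"surfaced": False}
    w = window(response, m)
    return {
        "surfaced": True,
        "app_tab": bool(APP_TAB.search(w)),
        "at_invoke": bool(AT_INVOKE.search(w)),
        "mobile": bool(MOBILE.search(w)),
        "sibling": bool(SIBLING.search(response)),  # anywhere in the text
    }
```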
Data artifacts
- Full CSV: 1,881 firings with prompt, provider, timestamp, status, response text, sources.
- Response-level review: 151 surfaced responses, chronologically ordered, tier-scored.
- Per-response scoring dump.
- Chart source datasets.
Available on request. Email research@waniwani.ai.
Full prompt list
The complete list of 19 test prompts in taxonomic order, with pooled results across ChatGPT, Perplexity, and Google AI Overview, is tabulated in Section 6.4 above.
Complete per-prompt and per-provider results
D.1 Provider-level totals
| Provider | Firings | Surfaced | Surface rate | Our domain cited | Cited rate |
|---|---|---|---|---|---|
| ChatGPT | 601 | 39 | 6.5% | 80 | 13.3% |
| Perplexity | 613 | 56 | 9.1% | 168 | 27.4% |
| Google AI Overview | 596 | 56 | 9.4% | 0 | 0.0% |
D.2 Per-prompt × provider results (exhaustive)
| Class | Prompt | Prov | Firings | Surf | Surf% | Cited | Cited% |
|---|---|---|---|---|---|---|---|
| A | NA Drink Finder app ChatGPT | CG | 32 | 26 | 81.2% | 8 | 25.0% |
| A | NA Drink Finder app ChatGPT | PPX | 32 | 32 | 100.0% | 26 | 81.2% |
| A | NA Drink Finder app ChatGPT | GAI | 33 | 26 | 78.8% | 0 | 0.0% |
| B | Is there a non-alcoholic beer finder on ChatGPT? | CG | 33 | 7 | 21.2% | 8 | 24.2% |
| B | Is there a non-alcoholic beer finder on ChatGPT? | PPX | 33 | 8 | 24.2% | 29 | 87.9% |
| B | Is there a non-alcoholic beer finder on ChatGPT? | GAI | 33 | 12 | 36.4% | 0 | 0.0% |
| B | Can ChatGPT help me find places that serve NA beer? | CG | 31 | 0 | 0.0% | 9 | 29.0% |
| B | Can ChatGPT help me find places that serve NA beer? | PPX | 32 | 3 | 9.4% | 28 | 87.5% |
| B | Can ChatGPT help me find places that serve NA beer? | GAI | 32 | 7 | 21.9% | 0 | 0.0% |
| B | Find places that serve non-alcoholic beer on ChatGPT | CG | 33 | 1 | 3.0% | 8 | 24.2% |
| B | Find places that serve non-alcoholic beer on ChatGPT | PPX | 32 | 2 | 6.2% | 27 | 84.4% |
| B | Find places that serve non-alcoholic beer on ChatGPT | GAI | 33 | 6 | 18.2% | 0 | 0.0% |
| B | Where can I find NA beer near me using AI | CG | 33 | 0 | 0.0% | 0 | 0.0% |
| B | Where can I find NA beer near me using AI | PPX | 33 | 6 | 18.2% | 5 | 15.2% |
| B | Where can I find NA beer near me using AI | GAI | 33 | 0 | 0.0% | 0 | 0.0% |
| B | Do you have an app that helps me find NA beer? | CG | 33 | 5 | 15.2% | 8 | 24.2% |
| B | Do you have an app that helps me find NA beer? | PPX | 32 | 3 | 9.4% | 1 | 3.1% |
| B | Do you have an app that helps me find NA beer? | GAI | 33 | 1 | 3.0% | 0 | 0.0% |
| B | Can ChatGPT help me find non-alcoholic drinks? | CG | 33 | 0 | 0.0% | 0 | 0.0% |
| B | Can ChatGPT help me find non-alcoholic drinks? | PPX | 32 | 0 | 0.0% | 0 | 0.0% |
| B | Can ChatGPT help me find non-alcoholic drinks? | GAI | 32 | 4 | 12.5% | 0 | 0.0% |
| C | Ou trouver une biere sans alcool a Paris? | CG | 31 | 0 | 0.0% | 2 | 6.5% |
| C | Ou trouver une biere sans alcool a Paris? | PPX | 33 | 1 | 3.0% | 20 | 60.6% |
| C | Ou trouver une biere sans alcool a Paris? | GAI | 25 | 0 | 0.0% | 0 | 0.0% |
| C | Where can I find non-alcoholic beer in New York? | CG | 61 | 0 | 0.0% | 1 | 1.6% |
| C | Where can I find non-alcoholic beer in New York? | PPX | 64 | 0 | 0.0% | 30 | 46.9% |
| C | Where can I find non-alcoholic beer in New York? | GAI | 65 | 0 | 0.0% | 0 | 0.0% |
| C | What bars serve non-alcoholic beer? | CG | 32 | 0 | 0.0% | 9 | 28.1% |
| C | What bars serve non-alcoholic beer? | PPX | 32 | 0 | 0.0% | 0 | 0.0% |
| C | What bars serve non-alcoholic beer? | GAI | 33 | 0 | 0.0% | 0 | 0.0% |
| C | Find me a restaurant with good NA beer options | CG | 32 | 0 | 0.0% | 8 | 25.0% |
| C | Find me a restaurant with good NA beer options | PPX | 32 | 0 | 0.0% | 0 | 0.0% |
| C | Find me a restaurant with good NA beer options | GAI | 33 | 0 | 0.0% | 0 | 0.0% |
| C | Where can I get NA beer in LA? | CG | 33 | 0 | 0.0% | 6 | 18.2% |
| C | Where can I get NA beer in LA? | PPX | 33 | 0 | 0.0% | 0 | 0.0% |
| C | Where can I get NA beer in LA? | GAI | 33 | 0 | 0.0% | 0 | 0.0% |
| C | Other Class C prompts (8 prompts × 3 providers) | — | ~760 | 2 | 0.3% | 10 | 1.3% |
Remaining Class C rows abbreviated for readability. Full dataset available on request.
D.3 Tier distribution by prompt × provider (surfaced responses only)
| Class | Prompt | Prov | Surf | T1 | T2 | T3 | T4 | T5 |
|---|---|---|---|---|---|---|---|---|
| A | NA Drink Finder app ChatGPT | CG | 26 | 0% | 42% | 0% | 58% | 0% |
| A | NA Drink Finder app ChatGPT | PPX | 32 | 6% | 34% | 34% | 25% | 0% |
| A | NA Drink Finder app ChatGPT | GAI | 26 | 0% | 8% | 62% | 27% | 4% |
| B | Is there a NA beer finder on ChatGPT? | CG | 7 | 29% | 71% | 0% | 0% | 0% |
| B | Is there a NA beer finder on ChatGPT? | PPX | 8 | 12% | 38% | 38% | 12% | 0% |
| B | Is there a NA beer finder on ChatGPT? | GAI | 12 | 42% | 17% | 25% | 17% | 0% |
| B | Can ChatGPT help me find places… | PPX | 3 | 67% | 0% | 33% | 0% | 0% |
| B | Can ChatGPT help me find places… | GAI | 7 | 71% | 14% | 0% | 14% | 0% |
| B | Find places that serve NA beer on ChatGPT | CG | 1 | 100% | 0% | 0% | 0% | 0% |
| B | Find places that serve NA beer on ChatGPT | PPX | 2 | 50% | 0% | 50% | 0% | 0% |
| B | Find places that serve NA beer on ChatGPT | GAI | 6 | 50% | 33% | 17% | 0% | 0% |
| B | Where can I find NA beer near me using AI | PPX | 6 | 0% | 0% | 83% | 17% | 0% |
| B | Do you have an app that helps me find NA beer? | CG | 5 | 0% | 60% | 0% | 40% | 0% |
| B | Do you have an app that helps me find NA beer? | PPX | 3 | 0% | 0% | 0% | 100% | 0% |
| B | Do you have an app that helps me find NA beer? | GAI | 1 | 0% | 0% | 0% | 100% | 0% |
| B | Can ChatGPT help me find non-alcoholic drinks? | GAI | 4 | 75% | 0% | 25% | 0% | 0% |
D.4 Tier distribution pooled by provider
| Provider | Surfaced | T1 Clean | T2 Dominant | T3 Ambig | T4 Mobile | T5 Denies |
|---|---|---|---|---|---|---|
| ChatGPT | 39 | 7.7% (3) | 48.7% (19) | 0.0% (0) | 43.6% (17) | 0.0% (0) |
| Perplexity | 56 | 10.7% (6) | 25.0% (14) | 41.1% (23) | 23.2% (13) | 0.0% (0) |
| Google AI Overview | 56 | 28.6% (16) | 12.5% (7) | 37.5% (21) | 19.6% (11) | 1.8% (1) |
D.5 Weekly dynamics per provider
ChatGPT
| Week | Firings | Surfaced | Surf% | Cited | Cited% |
|---|---|---|---|---|---|
| W12 | 57 | 4 | 7.0% | 0 | 0.0% |
| W13 | 132 | 7 | 5.3% | 1 | 0.8% |
| W14 | 133 | 7 | 5.3% | 2 | 1.5% |
| W15 | 119 | 4 | 3.4% | 5 | 4.2% |
| W16 | 124 | 10 | 8.1% | 57 | 46.0% |
| W17 | 36 | 7 | 19.4% | 15 | 41.7% |
Perplexity
| Week | Firings | Surfaced | Surf% | Cited | Cited% |
|---|---|---|---|---|---|
| W12 | 57 | 3 | 5.3% | 0 | 0.0% |
| W13 | 133 | 13 | 9.8% | 27 | 20.3% |
| W14 | 133 | 17 | 12.8% | 33 | 24.8% |
| W15 | 119 | 10 | 8.4% | 41 | 34.5% |
| W16 | 133 | 9 | 6.8% | 51 | 38.3% |
| W17 | 38 | 4 | 10.5% | 16 | 42.1% |
Google AI Overview
| Week | Firings | Surfaced | Surf% | Cited | Cited% |
|---|---|---|---|---|---|
| W12 | 57 | 3 | 5.3% | 0 | 0.0% |
| W13 | 133 | 5 | 3.8% | 0 | 0.0% |
| W14 | 132 | 8 | 6.1% | 0 | 0.0% |
| W15 | 132 | 30 | 22.7% | 0 | 0.0% |
| W16 | 110 | 8 | 7.3% | 0 | 0.0% |
| W17 | 32 | 2 | 6.2% | 0 | 0.0% |
References
1. OpenAI Developers. Optimize Metadata, Apps SDK documentation. developers.openai.com/apps-sdk/guides/optimize-metadata. Referenced for the ChatGPT connector-selection mechanism and the direct / indirect / negative prompt framework used to classify prompt Classes A to C.
2. Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158), 209–212. Used for the 95% confidence intervals on proportions throughout Section 6.
3. Fisher, R. A. (1935). The Design of Experiments. Edinburgh: Oliver & Boyd. Fisher's exact test used for the H1, H2, and H5 hypothesis tests.