Influencing app recommendations in conversational answer engines through published content alone
A 33-day controlled experiment measuring whether content on a fresh, unauthoritative domain can, absent any other marketing input, shift how ChatGPT describes and recommends a ChatGPT app. 74 articles. 1,881 prompt firings across ChatGPT, Perplexity, and Google AI Overview. One clean before-and-after.
We tested whether content published on a fresh, unauthoritative domain can, absent any other marketing input, influence how three independent conversational answer engines (ChatGPT, Perplexity, Google AI Overview) describe and recommend a ChatGPT app. We selected a low-authority, independent-developer subject (NA Drink Finder, built by the operator of beerfordriving.com), published 74 articles on a fresh domain, and tracked 1,881 prompt firings across 19 prompts and 3 providers for 33 days. We pre-registered a prompt taxonomy (Class A: literal app name; Class B: platform-anchored; Class C: generic category) and a five-tier qualitative scoring of how each response describes the app.
Primary finding. At baseline, 0 of 22 app-mention responses cleanly described NA Drink Finder as a ChatGPT integration; 45% described it as a mobile app or denied the ChatGPT version existed. By endpoint, 46% of app-mention responses described it correctly; 17% still described it as a mobile app; 0% denied its existence. The shift is monotonic and statistically distinguishable from baseline on platform-anchored queries, where the subject's name does not appear in the prompt itself (0% → 30% clean ChatGPT-app description rate).
Mechanism. Responses that cite our content describe the app correctly 2.3× more often and misdescribe it as a mobile app 8.3× less often than responses that do not. The @NA Drink Finder invocation syntax, absent from every baseline response, appears in up to 50% of endpoint Class A ChatGPT responses that cite our content.
Contribution. Evidence that a published content corpus can shift the retrieval graph of ChatGPT on branded and platform-anchored prompts, with effect sizes large enough to change the user-visible description of the product. On Perplexity, our content achieved even higher citation rates (27% of responses). Effects on generic-category prompts were null at the direct-surfacing level but positive at the citation-authority level.
Motivation
Since the launch of the ChatGPT Apps directory in late 2025, app developers have faced a recurring question: what, if anything, can be done to increase the probability that a conversational answer engine recommends a given app inside a user's conversation? The recommendation layer is partially opaque, there is no official playbook, and the most commonly stated hypothesis (that published SEO-style content can move the dial) has not been cleanly tested.
Testing the hypothesis on an established brand introduces confounding signals that cannot be cleanly separated from the effect of new content: existing domain authority, backlinks, press coverage, and organic brand mentions. Waniwani does run analogous content programs on client surfaces, but those results are governed by client confidentiality and cannot be published. The experiment reported here is the one we can share publicly: run on a ChatGPT app built by an independent developer, on a content domain we controlled end to end, with no other marketing input.
How answer engines handle things they don't already know
The recommendation mechanics inside ChatGPT, Perplexity, and Google AI Overview are partially opaque. What is publicly understood, however, is the retrieval-augmented architecture that governs responses to queries whose answers are not reliably present in the model's pre-training data.
When a user asks about something time-sensitive, niche, or post-training-cutoff, the engine issues a web lookup and conditions its response on what it retrieves. ChatGPT Search retrieves primarily from Bing. Perplexity runs its own retrieval layer. Google AI Overview retrieves from Google's index. In each case, the documents that rank for the query at retrieval time shape the text the model produces.
ChatGPT apps fall squarely inside this regime. A third-party ChatGPT app is not part of any model's pre-training corpus, the Apps directory is post-cutoff for most model versions, and any individual app is a sparse fact in the model's parametric memory. When a user's query contains a hook the retrieval layer can match to the app, the text the model produces is shaped by what retrieval returns.
Two layers are worth distinguishing. OpenAI's Apps SDK documentation describes a metadata-based mechanism for invoking an already-installed connector — the model selecting when to call the connector based on the app's name, description, parameter documentation, and hint annotations [1]. That is the in-conversation tool-invocation layer. Our experiment did not test it: we did not modify the app's connector metadata, we do not own the app, and we make no observations about whether or how that mechanism fired.
The layer our experiment addresses is upstream of installation: whether and how the model describes and recommends the app to a user who has not yet installed it. That layer runs on web retrieval over external documents. Our intervention acted only on this path — we published external content, did not touch the connector metadata, and measured the changes in the model's descriptions over time. All findings reported here are scoped to this retrieval path.
This exposes a specific, testable mechanism: if we publish documents that describe a ChatGPT app in a structured, unambiguous way, and those documents are indexed and ranked by the retrieval layer, then the answer engine's responses should converge toward the content of those documents over time. That is the mechanism the experiment tests.
The three user journeys our content is trying to shape
The experiment measures model behavior. What we actually want to influence is user behavior: the sequence that starts with a user asking an answer engine a question and ends with that user installing and using the app. Three distinct user journeys map to the three prompt classes we monitored, and the structure generalizes to any app deployed on a conversational answer engine, provided you know your user well enough to recognize which state they are in.
Three prompt classes, three very different outcomes: share of responses in which the app was surfaced, by the linguistic anchor of the prompt. Same content corpus, same 33 days.
Journey A — the user already knows your app. They have heard about it somewhere and come to the engine to find or confirm it. The query contains the app's name or a close variant. In our experiment: "NA Drink Finder app ChatGPT."
Journey B — the user has a precise, use-case-shaped need. The anchor does not need to be platform-shaped; it just has to describe the need with enough specificity that only a small set of products could plausibly match. In our experiment, six tracked prompts fell in this class, including "is there a non-alcoholic beer finder on ChatGPT?" and "find places that serve non-alcoholic beer on ChatGPT." For a different vertical this shape could look like a user asking mid-conversation "can I get a quote here?" after anchoring on a specific carrier — same prompt shape, different category, not a prompt we tested. This is the journey a content strategy can most directly catalyze.
Journey C — the user has the raw problem, no narrowing. Generic category questions whose response template is a list of options, not a tool recommendation. Journey C is not where apps get recommended in the first turn; it is where brands get cited alongside competitors, which feeds future Journey A and B queries.
The structure is portable. To apply it to any app on an answer engine, you need a working user model: who is the user, what is their state of awareness when they open the engine, and what specifically do they type in each state. The journeys are not defined by platform vocabulary. They are defined by anchor specificity.
Research questions and pre-registered hypotheses
The primary research question: can published content alone, absent any other marketing input, influence whether and how conversational answer engines recommend a ChatGPT app inside a response?
Five hypotheses were pre-registered before analysis:
- H1 (surfacing). Prompts containing an app, brand, or platform anchor will produce higher app-surfacing rates than prompts without such an anchor.
- H2 (anchor strength). Prompts containing the literal app name will produce higher surfacing rates than prompts with weaker anchors.
- H3 (citation causation). Content-citation rate for the experimental corpus will correlate positively with app-surfacing rate.
- H4 (provider variation). Actionable install-instruction rate among surfaced responses will differ by provider.
- H5 (temporal compounding). Actionable install-instruction rate and correct-platform description rate will rise over the observation window.
A sixth hypothesis was formulated after baseline responses were reviewed and is reported as exploratory:
- H6 (platform-description shift, exploratory). At baseline, the models describe the subject as a mobile app or deny its existence as a ChatGPT app. Over the observation window, this description will shift toward correct-platform framing.
Methodology
5.1 Subject selection
The subject was NA Drink Finder, a ChatGPT app in the Apps directory that helps users locate venues serving non-alcoholic beer, built by the independent operator of beerfordriving.com, a venue-discovery platform for non-alcoholic drinks. Baseline conditions:
| Input | Status at study start |
|---|---|
| Developer | Independent niche operator (beerfordriving.com) |
| Product site | Exists, standard venue-discovery SEO content, no references to the ChatGPT app |
| Press coverage | None identified |
| Backlinks pointing at the ChatGPT app | None identified |
| Branded social presence | Minimal |
| Prior GEO-directed content | None |
| Sibling product | "NA Beer Finder", a native iOS/Android app from the same developer |
The sibling product is a confound: the models' pre-existing knowledge of "NA Beer Finder" (a mobile app) can bleed into responses about "NA Drink Finder" (the ChatGPT app) through name similarity and shared developer.
5.2 Prompt taxonomy
19 test prompts were classified into three classes by linguistic anchor, before outcomes were tallied: Class A (literal app name; 1 prompt), Class B (platform-anchored without naming the app; 6 prompts), Class C (generic category, no app/brand/platform reference; 12 prompts).
OpenAI's Apps SDK metadata-optimization guidance recommends assembling a "golden prompt set" across three types: direct prompts ("users explicitly name your product or data source"), indirect prompts ("users describe the outcome they want without naming your tool"), and negative prompts ("cases where built-in tools or other connectors should handle the request") [1]. We borrow that three-way classification as a framework for our prompt list, though our experiment tests a different mechanism than the one OpenAI's guidance is written for (external-retrieval recommendation, not installed-connector invocation — see Section 2).
Under OpenAI's framework, our Class A corresponds to direct prompts (the user names the app). Our Class B and Class C would both fall within indirect prompts — in each, the user describes an outcome without naming the app. We subdivide indirect prompts based on whether the prompt still contains a platform, tool, or category anchor ("on ChatGPT", "app that helps", "beer finder"): Class B has one; Class C does not. This subdivision is our empirical contribution. It turned out to be load-bearing in our results — the two subtypes produce an order-of-magnitude difference in surfacing rates (11.1% vs 0.2%), even though OpenAI's framework treats both as a single category.
OpenAI's third type, negative prompts, is not tested here.
5.3 Intervention: the content corpus
74 articles on non-alcoholic-beer.com, a fresh domain with no external authority, no inbound backlinks, no paid distribution, no social amplification. Five clusters:
| Cluster | Articles | Target prompt shape |
|---|---|---|
| Foundation / evergreen | 10 | "best NA beer 2026", health, calories, athletes, Dry January |
| Geo-targeted cities | 19 | "best NA beer in [city]" across 19 cities |
| AI / app-discovery | ~22 | "NA Drink Finder app", "is there a beer finder on ChatGPT" |
| Occasion / lifestyle | ~16 | date night, designated drivers, parties, pregnancy |
| Category / science | ~4 | stouts, NA vs kombucha, science of NA taste |
Common template: direct answer in the first 40 to 60 words, H2/H3 heading hierarchy, JSON-LD structured data (SoftwareApplication, FAQPage, HowTo), and the canonical install block (below) in every article.
“Install NA Drink Finder from the ChatGPT Apps tab. Then type @NA Drink Finder [your query] in any conversation.”
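As a concrete illustration of the structured-data element of the template, here is a minimal JSON-LD sketch of the SoftwareApplication block an article could carry. Field values are illustrative reconstructions from the template description above, not the production markup:

```json
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "NA Drink Finder",
  "applicationCategory": "ChatGPT app",
  "operatingSystem": "ChatGPT",
  "description": "Finds venues serving non-alcoholic beer. Install from the ChatGPT Apps tab, then type @NA Drink Finder [your query] in any conversation.",
  "url": "https://non-alcoholic-beer.com/"
}
```

Each article also carried FAQPage and HowTo blocks in the same style; the key design choice is that the install language inside the markup matches the canonical install block token for token.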
5.4 Technical stack
- /llms.txt curated index for LLM crawlers
- /robots.txt allowlisting GPTBot, ClaudeBot, ChatGPT-User, PerplexityBot (sketched below)
- Sitemap submitted to Bing Webmaster Tools
- No paid distribution, manual outreach, or backlink building
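A minimal sketch of the crawler-facing configuration, assuming the four bots named above; directives are illustrative, as the production file is not published:

```
# robots.txt — allowlist the LLM crawlers, point at the sitemap
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://non-alcoholic-beer.com/sitemap.xml
```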
5.5 Instrumentation and observation window
Tracking ran on the waniwani.ai platform. Each of the 19 prompts was fired daily against ChatGPT, Perplexity, and Google AI Overview.
| Parameter | Value |
|---|---|
| Study window | 2026-03-20 to 2026-04-21 (33 days) |
| Unique prompts | 19 |
| Providers | 3 (ChatGPT, Perplexity, Google AI Overview) |
| Total firings | 1,881 |
| Successful firings | 1,810 (96.2%) |
| Mean firings per prompt and provider | ~33 |
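The tracking platform's internals are not public. Purely as a shape-of-the-loop illustration, a daily firing against one provider might look like the sketch below; the bare OpenAI Chat Completions API is a stand-in, since the product surfaces actually tested include web retrieval that a plain API call does not exercise:

```python
# Hypothetical daily firing loop — illustrative only; the actual tracking
# ran on the waniwani.ai platform, and the bare Chat Completions API is a
# stand-in for the retrieval-enabled product surfaces that were tested.
import csv
import datetime

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPTS = [
    ("A", "NA Drink Finder app ChatGPT"),
    ("B", "Is there a non-alcoholic beer finder on ChatGPT?"),
    ("C", "Non-alcoholic beer near me"),
    # ...remaining tracked prompts
]

def fire(prompt: str) -> str:
    """Fire one prompt and return the response text for downstream scoring."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""

with open("firings.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for prompt_class, prompt in PROMPTS:
        writer.writerow(
            [datetime.date.today().isoformat(), prompt_class, prompt, fire(prompt)]
        )
```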
5.6 Dependent variables and qualitative scoring
Five quantitative dependent variables per response: surfacing (literal app name in response), App-tab reference, @invoke syntax, content citation of non-alcoholic-beer.com, and sibling-product reference. Plus a five-tier qualitative scoring applied to the ±400-character window around every surfacing:
| Tier | Definition |
|---|---|
| T1 Clean ChatGPT-app | Unambiguously describes NA Drink Finder as a ChatGPT integration, with App-tab or @invoke framing, no mobile-app signals |
| T2 ChatGPT-dominant | Primarily describes the ChatGPT integration; may correctly mention the mobile app separately |
| T3 Ambiguous | Name-drop without clear platform framing |
| T4 Mobile-app | Describes NA Drink Finder as a mobile app (wrong platform) |
| T5 Denies ChatGPT app | Explicitly states the ChatGPT version does not exist |
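To make the scoring concrete, a simplified sketch of how the five tiers could be assigned from boolean signals. The decision order is our reading of the tier definitions above, not the production scorer (which is regex-based; see the Scoring signals appendix):

```python
def assign_tier(chatgpt_framing: bool, mobile_signals: bool, denial: bool) -> str:
    """Map signals detected around a surfacing to a tier.

    chatgpt_framing: App-tab or @invoke phrasing near the surfacing.
    mobile_signals:  App Store / Google Play / mobile-app phrasing nearby.
    denial:          explicit statement that the ChatGPT version does not exist.
    This decision order is one plausible reading of the tier definitions,
    not the published scorer.
    """
    if denial:
        return "T5"  # explicitly denies the ChatGPT app exists
    if chatgpt_framing and not mobile_signals:
        return "T1"  # clean ChatGPT-app description
    if chatgpt_framing and mobile_signals:
        return "T2"  # ChatGPT-dominant, mobile app mentioned alongside
    if mobile_signals:
        return "T4"  # wrong platform: described as a mobile app
    return "T3"      # name-drop without clear platform framing
```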
Results
6.1 Core finding: description of the subject shifted monotonically
The clearest way to see the result is in the distribution of how surfaced responses described the app at baseline vs endpoint.
In 33 days, three AI engines went from denying the app existed to giving users the correct install steps. Share of app-mention responses, week 1 baseline vs week 5 endpoint, pooled across ChatGPT, Perplexity, and Google AI Overview.
| Tier | Baseline (n=22) | Middle (n=94) | Endpoint (n=35) |
|---|---|---|---|
| T1 Clean ChatGPT-app | 0.0% | 18.1% | 11.4% |
| T2 ChatGPT-dominant | 4.5% | 21.3% | 34.3% |
| T1+T2 combined | 4.5% | 39.4% | 45.7% |
| T3 Ambiguous | 50.0% | 30.9% | 37.1% |
| T4 Mobile-app (wrong) | 40.9% | 29.8% | 17.1% |
| T5 Denies ChatGPT app | 4.5% | 0.0% | 0.0% |
Correct-platform description grew 10× in relative terms. Wrong-platform description fell by 58%. Explicit denial disappeared after the first week.
6.2 H1: surfacing by prompt class
| Class | Firings | Surfaced | Rate | 95% CI |
|---|---|---|---|---|
| A. App-name-direct | 97 | 84 | 86.6% | 78.4 – 92.0% |
| B. Platform-anchored | 586 | 65 | 11.1% | 8.8 – 13.9% |
| C. Generic category | 1,127 | 2 | 0.2% | 0.0 – 0.6% |
Class A+B pooled (149/683 = 21.8%) vs Class C (2/1,127 = 0.2%): Fisher's exact, one-sided, p ≈ 6 × 10⁻⁶⁵. H1 supported.
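Both the Wilson intervals (see references) and the Fisher test reproduce directly from the pooled counts above; a sketch in Python:

```python
# Reproduce the H1 test and the 95% Wilson CIs from the pooled counts.
from scipy.stats import fisher_exact
from statsmodels.stats.proportion import proportion_confint

# 2x2 table: rows = Class A+B vs Class C, cols = surfaced vs not surfaced
table = [[149, 683 - 149],
         [2, 1127 - 2]]
odds_ratio, p = fisher_exact(table, alternative="greater")
print(f"Fisher's exact (one-sided): p = {p:.1e}")  # reported p ≈ 6e-65

for label, surfaced, n in [("A", 84, 97), ("B", 65, 586), ("C", 2, 1127)]:
    lo, hi = proportion_confint(surfaced, n, alpha=0.05, method="wilson")
    # Class A prints 78.4% – 92.0%, matching the table above
    print(f"Class {label}: {surfaced / n:.1%} (95% CI {lo:.1%} – {hi:.1%})")
```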
6.3 H2: anchor strength
Class A surfacing 86.6%; Class B surfacing 11.1%. Fisher's exact, p ≈ 8 × 10⁻⁵². The literal app name is roughly an 8× stronger anchor than any other reference tested. H2 supported.
6.4 Per-prompt results with 95% confidence intervals
| Class | n | Surfaced | Rate | 95% CI | Prompt |
|---|---|---|---|---|---|
| A | 97 | 84 | 86.6% | 78.4–92.0% | NA Drink Finder app ChatGPT |
| B | 99 | 27 | 27.3% | 19.5–36.8% | Is there a non-alcoholic beer finder on ChatGPT? |
| B | 95 | 10 | 10.5% | 5.8–18.3% | Can ChatGPT help me find places that serve non-alcoholic beer? |
| B | 98 | 9 | 9.2% | 4.9–16.5% | Find places that serve non-alcoholic beer on ChatGPT |
| B | 99 | 6 | 6.1% | 2.8–12.6% | Where can I find NA beer near me using AI |
| B | 98 | 9 | 9.2% | 4.9–16.5% | Do you have an app that helps me find NA beer? |
| B | 97 | 4 | 4.1% | 1.6–10.1% | Can ChatGPT help me find non-alcoholic drinks? |
| C | 95 | 1 | 1.1% | 0.2–5.7% | Can you help me find non-alcoholic options near me? |
| C | 89 | 1 | 1.1% | 0.2–6.1% | Ou trouver une biere sans alcool a Paris? |
| C | 95 | 0 | 0.0% | 0.0–3.9% | Can you help me find places that have alcohol-free beer? |
| C | 97 | 0 | 0.0% | 0.0–3.8% | Find me a restaurant with good NA beer options |
| C | 96 | 0 | 0.0% | 0.0–3.8% | I'm looking for a non-alcoholic beer spot in my neighborhood |
| C | 95 | 0 | 0.0% | 0.0–3.9% | I'm traveling to London, where can I find non-alcoholic beer? |
| C | 89 | 0 | 0.0% | 0.0–4.1% | Non-alcoholic beer near me |
| C | 97 | 0 | 0.0% | 0.0–3.8% | What bars serve non-alcoholic beer? |
| C | 85 | 0 | 0.0% | 0.0–4.3% | Where can I find non-alcoholic beer in London? |
| C | 190 | 0 | 0.0% | 0.0–2.0% | Where can I find non-alcoholic beer in New York? |
| C | 99 | 0 | 0.0% | 0.0–3.7% | Where can I get NA beer in LA? |
Every Class A and Class B prompt has a lower CI bound above the Class C noise floor. Every Class C prompt has an upper CI bound below 7%.
6.5 H3: citation attribution
Among the 151 surfaced responses, comparing cited-our-content (n=58) vs not-cited (n=93):
| Tier | Cited (n=58) | Not-cited (n=93) | Difference |
|---|---|---|---|
| T1 Clean ChatGPT-app | 15.5% | 12.9% | +2.6 pp |
| T2 ChatGPT-dominant | 39.7% | 10.8% | +28.9 pp |
| T1+T2 combined | 55.2% | 23.7% | +31.5 pp |
| T4 Mobile-app (wrong) | 5.2% | 43.0% | -37.8 pp |
| T5 Denies | 0.0% | 1.1% | -1.1 pp |
Cell-level Pearson correlation between citation rate and surfacing rate across 54 prompt × provider cells: r = 0.332, p = 0.014. H3 partially supported.
Our content became the source, week by week: share of responses that cited non-alcoholic-beer.com, by prompt class. All three classes started at zero.
6.6 H4: provider variation in actionable install rate
Actionable install = App-tab reference AND @invoke syntax both present in a surfaced response.
| Provider | Surfaced | Actionable | Rate | 95% CI |
|---|---|---|---|---|
| ChatGPT | 39 | 3 | 7.7% | 2.7 – 20.3% |
| Perplexity | 56 | 4 | 7.1% | 2.8 – 17.0% |
| Google AI Overview | 56 | 10 | 17.9% | 10.0 – 29.8% |
Chi-squared, df = 2, χ² = 3.89, p = 0.14. H4 not supported at α = 0.05. Point estimates favor Google AI Overview but confidence intervals overlap.
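The H4 statistic reproduces from the counts in the table above; a sketch:

```python
# Reproduce the H4 chi-squared test from the actionable/surfaced counts.
from scipy.stats import chi2_contingency

# rows = provider, cols = (actionable, not actionable) among surfaced responses
observed = [[3, 36],    # ChatGPT
            [4, 52],    # Perplexity
            [10, 46]]   # Google AI Overview
chi2, p, df, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.2f}")  # ~3.89, 2, ~0.14
```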
6.7 H5: temporal compounding
| Provider | Early actionable | Late actionable | Fisher one-sided p |
|---|---|---|---|
| ChatGPT | 0/16 (0.0%) | 3/23 (13.0%) | 0.19 |
| Perplexity | 1/29 (3.4%) | 3/27 (11.1%) | 0.28 |
| Google AI Overview | 0/14 (0.0%) | 10/42 (23.8%) | 0.041 |
H5 partially supported. Point estimates rise on all three providers. Reaches significance only on Google AI Overview.
From 'this app doesn't exist' to 'here's how to install it': share of app-mention responses that describe the app correctly vs incorrectly, pooled across three AI engines. The correct and incorrect lines cross in week 15.
6.8 H6 (exploratory): the platform-description shift
At baseline, zero of 22 surfaced responses cleanly described the subject as a ChatGPT integration. One ChatGPT response (March 20, on prompt "NA Drink Finder app ChatGPT") stated outright:
“While there isn't a ChatGPT plugin specifically branded 'NA Drink Finder', you can use AI creatively for drink discovery...”
By endpoint, correct-platform description reached 45.7% and the denial was gone. Same kind of query, April 19:
“There's a tool called NA Drink Finder you can install inside ChatGPT. You install it from the Apps tab, then type @NA Drink Finder non-alcoholic beer near me. It returns real venues near you that carry NA beer.”
On Class B prompts specifically (where the name is not in the prompt, so the model must retrieve the concept rather than echo it):
| Period | Surfaced | T1 Clean | T1 rate |
|---|---|---|---|
| Baseline | 5 | 0 | 0.0% |
| Middle | 46 | 14 | 30.4% |
| Endpoint | 14 | 4 | 28.6% |
The Class B cell contains the cleanest causal claim in the dataset. Baseline T1 was zero; three weeks later it was 30%.
6.9 The @invoke syntax as a content-to-output transfer signal
None of the baseline app-mention responses contained @NA Drink Finder or any variant. By endpoint, the syntax appeared in up to 50% of ChatGPT Class A responses that cited our content and in 8% of pooled responses across all providers. The syntax was seeded in identical form across all 74 articles; its post-intervention appearance is interpretable as direct content-to-output transfer.
6.10 Sibling-product disambiguation
| Subset | Sibling-reference rate |
|---|---|
| All responses (surfaced or not) | 59% |
| Surfaced responses citing our content | 69% |
| Surfaced responses not citing our content | 88% |
Sibling-reference rate drops 19 pp when our content is cited (Fisher's exact p = 0.005). Consistent with our content's explicit disambiguation language reducing cross-contamination.
Discussion
7.1 What we proved
- Anchored prompts surface the subject at rates distinguishable from unanchored prompts. Class A+B: 21.8% (18.9 – 25.1%). Class C: 0.2% (0.0 – 0.6%). p ≈ 6 × 10⁻⁶⁵.
- The models' description of the subject shifted from "mobile app or nonexistent" to "ChatGPT integration" over the observation window. Correct-platform rate 4.5% → 45.7%; wrong-platform rate 40.9% → 17.1%. Monotonic across three independent providers.
- Citation of our content correlates with correct platform description: +31.5 pp on T1+T2, -37.8 pp on T4.
- The @invoke syntax appeared in model output after the intervention, where zero baseline responses contained it.
7.2 What we did not prove
- Direct causation for Class A surfacing. The app's name is in the prompt, so baseline surfacing could be driven by the prompt itself.
- Effect on Google AI Overview citation. GAIO's citation of our domain stayed at 0%. Any shift in its descriptions is indirect.
- Effect on generic-category (Class C) surfacing. Class C response templates are venue and product lists, not app recommendations. Surfacing rate at endpoint was 0.5%, within baseline CI.
- Full-actionable-install production at majority rates. The T1 rate never exceeded 18% in any period.
7.3 Proposed mechanism
- Content is published on a fresh domain.
- Bing indexes the content (load-bearing for ChatGPT Search and, in part, Perplexity).
- Providers' retrieval layers begin to rank the content on branded and platform-anchored queries within 1 to 3 weeks.
- When the model's generation is conditioned on our content, it inherits (a) the ChatGPT-integration framing, (b) the App-tab install language, (c) the @invoke syntax, and (d) reduced sibling reference.
- When not conditioned on our content, it falls back to pre-training signals, which are dominated by the sibling mobile product.
This mechanism explains the ChatGPT and Perplexity results cleanly. Google AI Overview never cited our content directly; any shift there is likely routed through third-party sites that Google indexes, and is not part of the causal claim.
7.4 Response-template conditioning
Answer engines use different response templates for different query types. Class C prompts produce venue and product lists; Class A and B prompts produce app descriptions with install guidance. These are structurally different generation tasks.
Our content operated at both levels. Direct: on Class A+B retrievals, the response template is app-aware and our install-language tokens flow directly into the output. Indirect: on Class C retrievals, the genre is lists of options. Our content was cited (0% → 23.5% across the window), but the genre does not produce app recommendations. The indirect effect is authority building, which plausibly feeds the retrieval layer for downstream Class A+B queries but is not quantified here.
Implication for distribution-app brands (a carrier with a quote app, a retailer with a shopping app, a financial platform with a product app): the relevant funnel is Class C → Class A/B → app. Class C queries cite the brand as an entity alongside competitors; Class A/B queries switch the response template to app recommendation. Both are required. This paper validates step 2.
Limitations
- One subject, one category (non-alcoholic beer). Class-level findings need replication.
- Three providers only. Claude and Gemini native app recommendation not tested.
- Observation window of 33 days. The experiment is still compounding.
- No true pre-publication baseline. First days of monitoring are close to content launch.
- Tier scoring is regex-based. Manual review would improve precision; direction of findings unlikely to change.
- Developer's existing product site is a residual confound.
- Sibling-product interference is unusually severe for this subject because of shared developer and similar naming.
- App discovery is not brand citation. This experiment tests how an engine describes and recommends a ChatGPT app. It does not test how an engine cites a brand in a category comparison. The response templates differ structurally: app recommendation produces a procedural answer; brand citation produces a ranked or comparative list. Findings do not inherit from this study by extrapolation.
Implications
For a team deploying an app on ChatGPT or an adjacent answer engine:
- Published content can move the retrieval graph. An independent developer with no prior marketing footprint can shift ChatGPT's description of their app from "mobile app or nonexistent" to "ChatGPT integration" in roughly four weeks. Perplexity behaves similarly and cited our content at an even higher rate (27%).
- Invest first in Journey A and B prompts. The effect is statistically distinguishable from null on every anchored prompt and null on every generic-category prompt at current content volume.
- Write install instructions in a canonical, reproducible block. Exact app name, exact install location ("ChatGPT Apps tab", not "App Store"), exact invocation syntax (@YourAppName), in one contiguous paragraph near the top of the article.
- Disambiguate your app from similarly named products. 88% of uncited surfaced responses reference the sibling product. Aggressive explicit disambiguation reduces cross-contamination.
- Expect a 2 to 4 week lag before effects stabilize. Bing indexing has a propagation delay that nothing you do accelerates meaningfully.
- Track tier distribution, not just surfacing rate. A response that mentions the app but describes it as a mobile app can actively mislead the user.
Future work
- In progress. Outbound links from each experimental article to the ChatGPT app directory page, to isolate the source-reader path (human → install) from the model-reader path.
- Replication on a second subject in a different category.
- Extend prompt tracking to Claude native app recommendation and Gemini.
- Second content cohort with refined indexable-install blocks and explicit disambiguation, testing whether the T1 rate can be lifted above the current 11 to 18% ceiling.
Scoring signals
| Signal | Definition |
|---|---|
| Surfacing | "NA Drink Finder" (case-insensitive) in response text |
| App-tab | App-tab / Apps-section / ChatGPT-Apps phrases within ±300 chars of a surfacing |
| @invoke | @NA Drink Finder within ±300 chars of a surfacing |
| Mobile-app | App Store / Google Play / Play Store / mobile app / APK / iTunes in the same window |
| Denial | "there isn't a ChatGPT…", "no ChatGPT…plugin", "often listed as" |
| Sibling | "NA Beer Finder" anywhere in response text |
| Content citation | non-alcoholic-beer.com in the cited sources list |
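A reconstruction of these signals as Python regexes; the patterns are our reading of the definitions above, not the production scorer:

```python
# Reconstruction of the scoring-signals table as Python regexes.
# Patterns are our reading of the definitions, not the production scorer.
import re

SURFACING = re.compile(r"NA Drink Finder", re.IGNORECASE)
APP_TAB   = re.compile(r"Apps?[ -]tab|Apps?[ -]section|ChatGPT[ -]Apps", re.IGNORECASE)
AT_INVOKE = re.compile(r"@NA Drink Finder", re.IGNORECASE)
MOBILE    = re.compile(r"App Store|Google Play|Play Store|mobile app|APK|iTunes", re.IGNORECASE)
SIBLING   = re.compile(r"NA Beer Finder", re.IGNORECASE)

def window(text: str, match: re.Match, radius: int = 300) -> str:
    """Return the ±radius-character window around a surfacing match."""
    return text[max(0, match.start() - radius): match.end() + radius]

def score(response: str) -> dict:
    """Score one response for the per-surfacing signals."""
    m = SURFACING.search(response)
    if m is None:
        return {"surfaced": False}
    w = window(response, m)
    return {
        "surfaced": True,
        "app_tab": bool(APP_TAB.search(w)),
        "at_invoke": bool(AT_INVOKE.search(w)),
        "mobile": bool(MOBILE.search(w)),
        "sibling": bool(SIBLING.search(response)),  # anywhere in the text
    }
```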
Data artifacts
- Full CSV: 1,881 firings with prompt, provider, timestamp, status, response text, sources.
- Response-level review: 151 surfaced responses, chronologically ordered, tier-scored.
- Per-response scoring dump.
- Chart source datasets.
Available on request. Email research@waniwani.ai.
Full prompt list
The complete list of 19 test prompts in taxonomic order, with pooled results across ChatGPT, Perplexity, and Google AI Overview, is tabulated in Section 6.4 above.
Complete per-prompt and per-provider results
D.1 Provider-level totals
| Provider | Firings | Surfaced | Surface rate | Our domain cited | Cited rate |
|---|---|---|---|---|---|
| ChatGPT | 601 | 39 | 6.5% | 80 | 13.3% |
| Perplexity | 613 | 56 | 9.1% | 168 | 27.4% |
| Google AI Overview | 596 | 56 | 9.4% | 0 | 0.0% |
D.2 Per-prompt × provider results (exhaustive)
| Class | Prompt | Prov | Firings | Surf | Surf% | Cited | Cited% |
|---|---|---|---|---|---|---|---|
| A | NA Drink Finder app ChatGPT | CG | 32 | 26 | 81.2% | 8 | 25.0% |
| A | NA Drink Finder app ChatGPT | PPX | 32 | 32 | 100.0% | 26 | 81.2% |
| A | NA Drink Finder app ChatGPT | GAI | 33 | 26 | 78.8% | 0 | 0.0% |
| B | Is there a non-alcoholic beer finder on ChatGPT? | CG | 33 | 7 | 21.2% | 8 | 24.2% |
| B | Is there a non-alcoholic beer finder on ChatGPT? | PPX | 33 | 8 | 24.2% | 29 | 87.9% |
| B | Is there a non-alcoholic beer finder on ChatGPT? | GAI | 33 | 12 | 36.4% | 0 | 0.0% |
| B | Can ChatGPT help me find places that serve NA beer? | CG | 31 | 0 | 0.0% | 9 | 29.0% |
| B | Can ChatGPT help me find places that serve NA beer? | PPX | 32 | 3 | 9.4% | 28 | 87.5% |
| B | Can ChatGPT help me find places that serve NA beer? | GAI | 32 | 7 | 21.9% | 0 | 0.0% |
| B | Find places that serve non-alcoholic beer on ChatGPT | CG | 33 | 1 | 3.0% | 8 | 24.2% |
| B | Find places that serve non-alcoholic beer on ChatGPT | PPX | 32 | 2 | 6.2% | 27 | 84.4% |
| B | Find places that serve non-alcoholic beer on ChatGPT | GAI | 33 | 6 | 18.2% | 0 | 0.0% |
| B | Where can I find NA beer near me using AI | CG | 33 | 0 | 0.0% | 0 | 0.0% |
| B | Where can I find NA beer near me using AI | PPX | 33 | 6 | 18.2% | 5 | 15.2% |
| B | Where can I find NA beer near me using AI | GAI | 33 | 0 | 0.0% | 0 | 0.0% |
| B | Do you have an app that helps me find NA beer? | CG | 33 | 5 | 15.2% | 8 | 24.2% |
| B | Do you have an app that helps me find NA beer? | PPX | 32 | 3 | 9.4% | 1 | 3.1% |
| B | Do you have an app that helps me find NA beer? | GAI | 33 | 1 | 3.0% | 0 | 0.0% |
| B | Can ChatGPT help me find non-alcoholic drinks? | CG | 33 | 0 | 0.0% | 0 | 0.0% |
| B | Can ChatGPT help me find non-alcoholic drinks? | PPX | 32 | 0 | 0.0% | 0 | 0.0% |
| B | Can ChatGPT help me find non-alcoholic drinks? | GAI | 32 | 4 | 12.5% | 0 | 0.0% |
| C | Ou trouver une biere sans alcool a Paris? | CG | 31 | 0 | 0.0% | 2 | 6.5% |
| C | Ou trouver une biere sans alcool a Paris? | PPX | 33 | 1 | 3.0% | 20 | 60.6% |
| C | Ou trouver une biere sans alcool a Paris? | GAI | 25 | 0 | 0.0% | 0 | 0.0% |
| C | Where can I find non-alcoholic beer in New York? | CG | 61 | 0 | 0.0% | 1 | 1.6% |
| C | Where can I find non-alcoholic beer in New York? | PPX | 64 | 0 | 0.0% | 30 | 46.9% |
| C | Where can I find non-alcoholic beer in New York? | GAI | 65 | 0 | 0.0% | 0 | 0.0% |
| C | What bars serve non-alcoholic beer? | CG | 32 | 0 | 0.0% | 9 | 28.1% |
| C | What bars serve non-alcoholic beer? | PPX | 32 | 0 | 0.0% | 0 | 0.0% |
| C | What bars serve non-alcoholic beer? | GAI | 33 | 0 | 0.0% | 0 | 0.0% |
| C | Find me a restaurant with good NA beer options | CG | 32 | 0 | 0.0% | 8 | 25.0% |
| C | Find me a restaurant with good NA beer options | PPX | 32 | 0 | 0.0% | 0 | 0.0% |
| C | Find me a restaurant with good NA beer options | GAI | 33 | 0 | 0.0% | 0 | 0.0% |
| C | Where can I get NA beer in LA? | CG | 33 | 0 | 0.0% | 6 | 18.2% |
| C | Where can I get NA beer in LA? | PPX | 33 | 0 | 0.0% | 0 | 0.0% |
| C | Where can I get NA beer in LA? | GAI | 33 | 0 | 0.0% | 0 | 0.0% |
| C | Other Class C prompts (8 prompts × 3 providers) | — | ~760 | 2 | 0.3% | 10 | 1.3% |
Remaining Class C rows abbreviated for readability. Full dataset available on request.
D.3 Tier distribution by prompt × provider (surfaced responses only)
| Class | Prompt | Prov | Surf | T1 | T2 | T3 | T4 | T5 |
|---|---|---|---|---|---|---|---|---|
| A | NA Drink Finder app ChatGPT | CG | 26 | 0% | 42% | 0% | 58% | 0% |
| A | NA Drink Finder app ChatGPT | PPX | 32 | 6% | 34% | 34% | 25% | 0% |
| A | NA Drink Finder app ChatGPT | GAI | 26 | 0% | 8% | 62% | 27% | 4% |
| B | Is there a NA beer finder on ChatGPT? | CG | 7 | 29% | 71% | 0% | 0% | 0% |
| B | Is there a NA beer finder on ChatGPT? | PPX | 8 | 12% | 38% | 38% | 12% | 0% |
| B | Is there a NA beer finder on ChatGPT? | GAI | 12 | 42% | 17% | 25% | 17% | 0% |
| B | Can ChatGPT help me find places… | PPX | 3 | 67% | 0% | 33% | 0% | 0% |
| B | Can ChatGPT help me find places… | GAI | 7 | 71% | 14% | 0% | 14% | 0% |
| B | Find places that serve NA beer on ChatGPT | CG | 1 | 100% | 0% | 0% | 0% | 0% |
| B | Find places that serve NA beer on ChatGPT | PPX | 2 | 50% | 0% | 50% | 0% | 0% |
| B | Find places that serve NA beer on ChatGPT | GAI | 6 | 50% | 33% | 17% | 0% | 0% |
| B | Where can I find NA beer near me using AI | PPX | 6 | 0% | 0% | 83% | 17% | 0% |
| B | Do you have an app that helps me find NA beer? | CG | 5 | 0% | 60% | 0% | 40% | 0% |
| B | Do you have an app that helps me find NA beer? | PPX | 3 | 0% | 0% | 0% | 100% | 0% |
| B | Do you have an app that helps me find NA beer? | GAI | 1 | 0% | 0% | 0% | 100% | 0% |
| B | Can ChatGPT help me find non-alcoholic drinks? | GAI | 4 | 75% | 0% | 25% | 0% | 0% |
D.4 Tier distribution pooled by provider
| Provider | Surfaced | T1 Clean | T2 Dominant | T3 Ambig | T4 Mobile | T5 Denies |
|---|---|---|---|---|---|---|
| ChatGPT | 39 | 7.7% (3) | 48.7% (19) | 0.0% (0) | 43.6% (17) | 0.0% (0) |
| Perplexity | 56 | 10.7% (6) | 25.0% (14) | 41.1% (23) | 23.2% (13) | 0.0% (0) |
| Google AI Overview | 56 | 28.6% (16) | 12.5% (7) | 37.5% (21) | 19.6% (11) | 1.8% (1) |
D.5 Weekly dynamics per provider
ChatGPT
| Week | Firings | Surfaced | Surf% | Cited | Cited% |
|---|---|---|---|---|---|
| W12 | 57 | 4 | 7.0% | 0 | 0.0% |
| W13 | 132 | 7 | 5.3% | 1 | 0.8% |
| W14 | 133 | 7 | 5.3% | 2 | 1.5% |
| W15 | 119 | 4 | 3.4% | 5 | 4.2% |
| W16 | 124 | 10 | 8.1% | 57 | 46.0% |
| W17 | 36 | 7 | 19.4% | 15 | 41.7% |
Perplexity
| Week | Firings | Surfaced | Surf% | Cited | Cited% |
|---|---|---|---|---|---|
| W12 | 57 | 3 | 5.3% | 0 | 0.0% |
| W13 | 133 | 13 | 9.8% | 27 | 20.3% |
| W14 | 133 | 17 | 12.8% | 33 | 24.8% |
| W15 | 119 | 10 | 8.4% | 41 | 34.5% |
| W16 | 133 | 9 | 6.8% | 51 | 38.3% |
| W17 | 38 | 4 | 10.5% | 16 | 42.1% |
Google AI Overview
| Week | Firings | Surfaced | Surf% | Cited | Cited% |
|---|---|---|---|---|---|
| W12 | 57 | 3 | 5.3% | 0 | 0.0% |
| W13 | 133 | 5 | 3.8% | 0 | 0.0% |
| W14 | 132 | 8 | 6.1% | 0 | 0.0% |
| W15 | 132 | 30 | 22.7% | 0 | 0.0% |
| W16 | 110 | 8 | 7.3% | 0 | 0.0% |
| W17 | 32 | 2 | 6.2% | 0 | 0.0% |
References
1. OpenAI Developers. Optimize Metadata, Apps SDK documentation. developers.openai.com/apps-sdk/guides/optimize-metadata. Referenced for the ChatGPT connector-selection mechanism and the direct / indirect / negative prompt framework used to classify prompt Classes A to C.
2. Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158), 209–212. Used for the 95% confidence intervals on proportions throughout Section 6.
3. Fisher, R. A. (1935). The Design of Experiments. Edinburgh: Oliver & Boyd. Fisher's exact test used for the H1, H2, and H5 hypothesis tests.