Methodology — How CreatorDB Builds Its Public Creator Pages

What's Measured vs. What's AI-Assisted

Every public creator-stats page on CreatorDB is built from two distinct kinds of content. We separate them deliberately because the trust signals are different.

Measured (Live From Public Platforms)

Follower / subscriber counts — pulled from each platform's public API at generation time
Engagement rate — computed from the most recent ~30 days of public posts
Audience demographics — gender breakdown, age skew, and top countries from platform-reported analytics
Growth history — historical follower counts over the last 365 days, sampled daily
Posts-per-week — derived from the public content stream
Sponsorship history — extracted from creator content tagged or detected as sponsored

These fields are observed or computed directly from public platform data and refreshed on the schedule described under Refresh Cadence. They aren't generated or predicted by a language model.
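To illustrate how a figure like engagement rate can be computed from observed posts, here is a minimal sketch. The formula (mean likes plus comments per post, divided by followers), the `Post` type, and all field names are assumptions for illustration; this page does not state the exact formula used.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Post:
    # Hypothetical shape of one public post; field names are assumptions.
    published_at: datetime
    likes: int
    comments: int


def engagement_rate(posts, followers, now, window_days=30):
    """Mean (likes + comments) per recent post, as a fraction of followers.

    Returns None when there is nothing to measure, so a caller can omit
    the field instead of showing a misleading 0%.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [p for p in posts if cutoff <= p.published_at <= now]
    if not recent or followers <= 0:
        return None
    interactions = sum(p.likes + p.comments for p in recent)
    return interactions / len(recent) / followers
```

Posts older than the window are ignored, so one viral post from months ago does not inflate the current figure.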

AI-Assisted (Clearly Disclosed on Every Page)

Creator bio — a 2-paragraph synthesis that draws on the live data sheet plus the model's training knowledge of the creator. Speculative claims about private relationships, controversies, or unverified facts are explicitly forbidden in the system prompt.
FAQs — 10 long-tail SEO questions and answers targeted at what people search about that specific creator. Generated from the same data sheet.
Niche classification — short label inferred from content signals.
Recent news — see "News sourcing" below; never relies on model memory.

News Sourcing — Grounded, Not Recalled

The "Recent News" section on every page is generated through a web-search-grounded call: the model is given access to a live search tool and is required to cite a verifiable URL for every news item it returns. Items without a real source URL are silently dropped — they never reach the page.
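The drop rule can be sketched as a filter over the model's output. This is a minimal illustration, not the production pipeline: the `source_url` key is a hypothetical field name, and the check is only structural (an absolute http or https URL with a hostname). Confirming that the URL resolves and supports the claim would be a separate step.

```python
from urllib.parse import urlparse


def has_verifiable_url(item):
    """True when the item carries an absolute http(s) URL with a hostname."""
    url = (item.get("source_url") or "").strip()
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)


def keep_cited_items(items):
    """Drop every news item that lacks a usable source URL."""
    return [item for item in items if has_verifiable_url(item)]
```

Because the filter runs after generation, an uncited item is dropped regardless of how plausible its text looks.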

Why this matters: a stats page with fabricated news damages trust more than no news at all. Every news item shows a "via [Source] →" link so readers can verify the claim at the source.

Source Allowlist

The model is instructed to prefer recognized publications, including but not limited to:

Variety · The Verge · TechCrunch · Tubefilter · Wired · The Hollywood Reporter · Insider / Business Insider · Forbes · Bloomberg · Reuters · Associated Press · BBC · CNN · NBC News · Polygon · Kotaku · IGN · Dexerto · Wikipedia · the creator's own official channels.

Tabloids, rumor mills, and drama channels are excluded. If no credible coverage exists for a creator, the news section is omitted entirely.
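The source preference is described above as an instruction to the model. As a hedged sketch, the same policy could also be checked in code after generation: excluded domains are dropped, preferred domains are listed first, and the section is omitted when nothing remains. The domain sets are short illustrative samples, `tabloid.example` is a placeholder, and the `source_url` key is a hypothetical field name.

```python
from urllib.parse import urlparse

# Illustrative samples only; the real lists are not published on this page.
PREFERRED_DOMAINS = {"variety.com", "theverge.com", "techcrunch.com", "reuters.com"}
EXCLUDED_DOMAINS = {"tabloid.example"}  # placeholder for tabloid / rumor sites


def host_of(url):
    """Lower-cased hostname of a URL, or an empty string if there is none."""
    return (urlparse(url).hostname or "").lower()


def matches(host, domains):
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def credible_news(items):
    """Drop excluded sources and put preferred sources first.

    Returns None when nothing is left, meaning: omit the news section.
    """
    kept = [i for i in items if not matches(host_of(i["source_url"]), EXCLUDED_DOMAINS)]
    # sorted() is stable, so the original order is kept within each group.
    kept = sorted(kept, key=lambda i: not matches(host_of(i["source_url"]), PREFERRED_DOMAINS))
    return kept or None
```

Sources that are neither preferred nor excluded are kept, which matches the "including but not limited to" wording.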

Refresh Cadence

Content type	Refresh trigger
Live stats (followers, engagement, demographics)	Nightly batch refresh on popular pages; on-demand via the "Refresh profile" action on individual pages
Bio & FAQs	Manually re-generated when a creator's situation materially changes; not refreshed nightly to avoid drift
News items	Re-generated on any "Refresh profile" action; cited URLs are static once stored
Sponsorship data	Refreshed in lockstep with live stats

Every page footer displays the actual last-refreshed timestamp, taken from the time the data was last written, not a generic "today" string.
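A minimal sketch of that footer rule, assuming the stored record keeps its write time as a Unix timestamp under a hypothetical `last_written_at` key. The label text and date format are also assumptions.

```python
from datetime import datetime, timezone


def footer_timestamp(profile):
    """Format the stored write time; never fall back to the current time."""
    written = profile.get("last_written_at")
    if written is None:
        return None  # no write time recorded, so show nothing rather than "today"
    moment = datetime.fromtimestamp(written, tz=timezone.utc)
    return "Last refreshed: " + moment.strftime("%Y-%m-%d %H:%M UTC")
```

The function never reads the clock, so a page rendered today from week-old data still reports the week-old date.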

Quality Threshold & Sparse Profiles

Not every creator-handle hit produces a public page. Profiles that fall below our data-quality floor are flagged sparse and rendered with noindex,follow so they don't pollute search results. Conditions that put a profile below the floor include:

Under 10,000 followers and no public engagement signal
Profile data older than 90 days with no recent activity
Missing required fields (handle, country, primary platform)
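The checks above can be sketched as a single predicate. This sketch assumes that meeting any one condition is enough to flag a profile as sparse, and that non-sparse pages use index,follow. Neither is stated on this page, and the dictionary keys are hypothetical.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = ("handle", "country", "primary_platform")


def is_sparse(profile, now):
    """True if the profile trips any of the three data-quality checks."""
    missing = any(not profile.get(field) for field in REQUIRED_FIELDS)
    low_signal = (
        profile.get("followers", 0) < 10_000
        and not profile.get("has_engagement_signal", False)
    )
    updated = profile.get("data_updated_at")
    stale = updated is None or (
        now - updated > timedelta(days=90)
        and not profile.get("has_recent_activity", False)
    )
    return missing or low_signal or stale


def robots_meta(profile, now):
    """Value for the page's robots meta tag."""
    return "noindex,follow" if is_sparse(profile, now) else "index,follow"
```

A small account that still shows public engagement passes the follower check, because that condition requires both a low count and no engagement signal.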

Sections within a page also render conditionally — pages with no documented sponsorships skip the Brand Partnerships section, pages with fewer than three cited news items skip Recent News, and so on. Pages naturally vary in length based on the depth of public data available, rather than every page following the same template.
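A minimal sketch of conditional rendering under the two rules named above. The `Stats` section name and the `sponsorships` and `news` keys are hypothetical.

```python
def visible_sections(profile):
    """List the sections to render, skipping those without enough data."""
    sections = ["Stats"]  # assumed to be present on every page
    if profile.get("sponsorships"):
        sections.append("Brand Partnerships")
    if len(profile.get("news", [])) >= 3:
        sections.append("Recent News")
    return sections
```

A profile with two cited news items and no sponsorships would render only the base section, which is why page length varies.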

Accuracy & Corrections

We take accuracy seriously because errors on a stats page compound: they affect business decisions on the brand side and reputation on the creator side. If something on a CreatorDB profile is wrong, we want to fix it within five business days.

Schema markup on every page reflects only the measured fields, not AI-generated content.
Sourced claims in Recent News link out to the citation — verify directly at the source.
Bio claims stick to information present in the live data sheet plus widely-reported public facts. Speculative material is filtered by the system prompt and removed in review.
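To show what "schema markup reflects only the measured fields" can look like in practice, here is a hedged sketch that builds JSON-LD from measured values and never reads the AI-assisted ones. The choice of schema.org's ProfilePage, Person, and InteractionCounter types is an assumption, as are the input keys. The page does not say which types are used.

```python
import json


def profile_json_ld(profile):
    """Build JSON-LD from measured fields only; bio and FAQs are never read."""
    data = {
        "@context": "https://schema.org",
        "@type": "ProfilePage",
        "dateModified": profile["last_refreshed"],
        "mainEntity": {
            "@type": "Person",
            "alternateName": profile["handle"],
            "interactionStatistic": {
                "@type": "InteractionCounter",
                "interactionType": "https://schema.org/FollowAction",
                "userInteractionCount": profile["followers"],
            },
        },
    }
    return json.dumps(data, indent=2)
```

The builder copies named fields rather than serializing the whole profile, so a newly added AI-generated field cannot leak into the markup by accident.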

To flag an inaccuracy, email hello@creatordb.app with the URL and the specific field. We'll respond within five business days.

Removal / Opt-Out

If you're the creator and you'd like your profile removed from the public CreatorDB site, we'll honor that request:

Email hello@creatordb.app from the email address associated with your verified social presence, or have a representative who can confirm they act on your behalf send the request
Include the profile URL or your handle
We'll take the page down and add a permanent tombstone so it won't be regenerated
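The tombstone idea can be sketched with an in-memory store: removal records the handle, and every later write checks that record first. The class and method names are hypothetical, and a production version would persist tombstones alongside the cached profiles rather than in process memory.

```python
class ProfileStore:
    """In-memory stand-in for a profile cache with permanent tombstones."""

    def __init__(self):
        self._profiles = {}
        self._tombstones = set()

    @staticmethod
    def _key(handle):
        # Normalize so "@Creator" and "@creator" refer to the same profile.
        return handle.strip().lower()

    def remove(self, handle):
        """Delete the profile and remember that it must not come back."""
        key = self._key(handle)
        self._profiles.pop(key, None)
        self._tombstones.add(key)

    def upsert(self, handle, data):
        """Write a profile unless it is tombstoned; return True if written."""
        key = self._key(handle)
        if key in self._tombstones:
            return False
        self._profiles[key] = data
        return True

    def get(self, handle):
        return self._profiles.get(self._key(handle))
```

Putting the check inside `upsert` means a nightly batch or a manual refresh cannot recreate a removed page without first clearing the tombstone.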

If your concern is that a specific section (e.g., AI-generated bio) is wrong but you don't want the whole page removed, say so — we can edit the relevant fields without deleting the page.

EU/UK residents covered by GDPR are entitled to the same process and will receive priority handling.

Why We Publish These Pages

CreatorDB operates an influencer marketing agency and a creator data API. The public creator-stats pages serve two audiences: brands evaluating whether to work with a creator (the data answers questions they'd otherwise need a paid tool for), and creators themselves, who can use the page as a public stat sheet during outreach.

We publish them free because the underlying data is public to begin with — what we add is the normalization, comparison against tier baselines, and the editorial layer that turns raw API responses into something readable. We don't paywall demographics or sponsorship history the way most competitors do.

Technical Disclosures

AI provider: Anthropic (Claude Opus 4.7). Listed for transparency.
Data infrastructure: profiles cached in Cloudflare Workers KV; rendered through Cloudflare Pages Functions.
Crawl access: our robots.txt explicitly allows GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and other LLM crawlers. We don't block AI training crawlers.
llms.txt: we publish an llms.txt at the root so AI assistants citing our pages have a clean summary.
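For readers who want to see what an explicit allow policy looks like, the sketch below generates a robots.txt with one group per named crawler. It is an illustration of the format, not a copy of the live file: the real file's other rules are not shown on this page, and the `build_robots_txt` helper is hypothetical.

```python
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def build_robots_txt(agents=AI_CRAWLERS):
    """Return robots.txt text that explicitly allows each listed crawler."""
    groups = [f"User-agent: {agent}\nAllow: /" for agent in agents]
    groups.append("User-agent: *\nAllow: /")  # assumed default for everyone else
    return "\n\n".join(groups) + "\n"
```

The output can be checked with the standard library: feed `build_robots_txt().splitlines()` to `urllib.robotparser.RobotFileParser.parse` and call `can_fetch` for each crawler name.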