Open-Source Analytics for AI Apps: Why It Matters
Traditional analytics tools weren't built for AI apps. Here's why the analytics layer for MCP servers and ChatGPT Apps should be open source — and what that looks like in practice.
The analytics industry was built for websites and mobile apps. Google Analytics tracks pageviews. Mixpanel tracks button clicks. Amplitude tracks user journeys through screens. These tools assume a browser or a native app, a visual UI the user navigates, and client-side JavaScript collecting events.
AI apps break all of these assumptions. When your product is an MCP server or a ChatGPT App, the user interacts through a conversation — not a screen. There are no pageviews. No click events. No DOM to instrument. The AI model sits between your product and your user, mediating every interaction.
This architectural shift creates a new analytics problem. And the solution, we believe, should be open source.
Why Existing Analytics Tools Don't Work for AI Apps
If you've tried plugging an MCP server into a traditional analytics platform, you've already hit the wall.
No client-side environment. Tools like Google Analytics, Hotjar, and FullStory rely on JavaScript running in the user's browser. MCP servers are backend processes. There's no browser, no DOM, no window object. The entire execution model is different.
Wrong event model. Traditional analytics track events like "page_viewed," "button_clicked," "form_submitted." AI app events are "tool_called," "resource_read," "prompt_executed," "conversion_completed." The taxonomy is fundamentally different, and shoehorning AI events into web analytics categories loses the meaning.
No protocol awareness. When an AI assistant calls your MCP tool, the interaction follows a specific protocol with tool names, typed parameters, structured responses, and error states. Generic analytics tools see an HTTP request. They don't understand that it was a tool call, which tool was called, what parameters were passed, or whether the response was useful.
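To make this concrete, here is roughly what a protocol-aware layer can extract from a single tool call. The request and result field names follow the MCP `tools/call` JSON-RPC shape; the analytics event type is an illustrative sketch, not any vendor's actual schema.

```typescript
// What a protocol-aware analytics layer sees in one MCP tool call.
// Request/result shapes follow the MCP "tools/call" JSON-RPC method;
// the ToolCallEvent type is an illustrative sketch, not a real schema.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface ToolCallResult {
  content: { type: string; text?: string }[];
  isError?: boolean;
}

interface ToolCallEvent {
  event: "tool_called";
  tool: string;                    // which tool was called
  params: Record<string, unknown>; // typed parameters, not an opaque body
  success: boolean;                // protocol-level error state
  durationMs: number;
}

function toEvent(
  req: ToolCallRequest,
  res: ToolCallResult,
  durationMs: number
): ToolCallEvent {
  return {
    event: "tool_called",
    tool: req.params.name,
    params: req.params.arguments,
    success: !res.isError,
    durationMs,
  };
}
```

A generic analytics pipeline would see only the HTTP envelope around this; every field above would be opaque body bytes.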
No multi-turn session model. Web analytics define sessions by browser activity — pageviews within a time window. AI app sessions are conversations, often spanning multiple tool calls across minutes or hours. The session model needs to understand conversation context, not cookie-based visit windows.
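A minimal sketch of that session model: cut a new session whenever the gap between a user's consecutive events exceeds an inactivity threshold. The 30-minute default here is illustrative, and a real pipeline would first partition events per user.

```typescript
// Group one user's events into conversation sessions by inactivity gap,
// rather than by cookie-based visit windows. The threshold is illustrative.
interface TrackedEvent {
  timestamp: number; // epoch milliseconds
}

function sessionize(events: TrackedEvent[], gapMs = 30 * 60 * 1000): TrackedEvent[][] {
  const sorted = [...events].sort((a, b) => a.timestamp - b.timestamp);
  const sessions: TrackedEvent[][] = [];
  for (const ev of sorted) {
    const current = sessions[sessions.length - 1];
    const last = current ? current[current.length - 1] : undefined;
    if (last && ev.timestamp - last.timestamp <= gapMs) {
      current.push(ev); // still the same conversation
    } else {
      sessions.push([ev]); // long silence: a new session begins
    }
  }
  return sessions;
}
```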
You can hack around some of these limitations by manually sending events to Mixpanel or Amplitude from your MCP server. But you end up with a fragile custom integration that doesn't understand the domain, requires manual instrumentation of every tool, and produces dashboards that don't map to the questions you actually need answered.
The Case for Open Source
Analytics platforms hold sensitive data — every user interaction, every conversion, every error. For AI apps, this data is even more sensitive: it includes the content of tool calls, user identifiers, and the parameters users pass through AI conversations.
Open source addresses this in three ways.
Transparency. You can read the code that processes your data. You know exactly what gets captured, how PII is handled, and where data is stored. There's no black box between your users and your analytics.
Self-hosting. If your data needs to stay on your infrastructure — for compliance, for privacy, or for principle — you need self-hosted analytics. Open source makes this possible without licensing fees or vendor negotiations. Deploy with Docker, run it on your servers, own your data.
No vendor lock-in. Closed-source analytics platforms can change pricing, deprecate features, or shut down. If your analytics runs on code you can fork and maintain, you're not at the mercy of a vendor's roadmap. Your data and your dashboards are yours permanently.
For AI apps specifically, there's an additional argument: the ecosystem is moving fast. MCP is evolving. ChatGPT Apps are evolving. The analytics layer needs to keep up with protocol changes, new event types, and shifting interaction patterns. Open source lets the community contribute to this evolution instead of waiting for a single vendor to ship updates.
What Open-Source AI App Analytics Looks Like
An analytics platform built for AI apps needs to handle a different set of primitives than web analytics:
Tool call tracking — automatically capturing every MCP tool invocation with full metadata: tool name, input parameters, output content, duration, success/failure state. This is the equivalent of pageview tracking for web, but for AI interactions.
User identification — tying events to known users so you can calculate retention, build cohorts, and debug per-user issues. This is especially important in AI apps where the same tool might be called by an anonymous user via ChatGPT or by an authenticated user via your own integration.
Funnel analysis — tracking multi-step workflows across tool calls. Search → select → configure → purchase. The funnel might span multiple conversation turns and multiple tool invocations. The analytics layer needs to stitch these together into a coherent path.
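As a sketch, funnel counts over an ordered list of step names might be computed like this. The event names and the search-to-purchase sequence are the example from above, not a fixed schema.

```typescript
// Count, per funnel step, how many users reached that step in order.
// eventsByUser maps a user ID to that user's event names in time order.
function funnelCounts(steps: string[], eventsByUser: Map<string, string[]>): number[] {
  const counts = steps.map(() => 0);
  eventsByUser.forEach((events) => {
    let cursor = 0; // the next step this user still has to hit
    for (const name of events) {
      if (cursor < steps.length && name === steps[cursor]) {
        counts[cursor] += 1;
        cursor += 1;
      }
    }
  });
  return counts; // drop-off per step follows from adjacent ratios
}
```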
Error analysis — grouping and counting errors by tool, error type, and user segment. In AI apps, errors are particularly costly: the AI model may stop recommending your tool if it fails too often. Error monitoring isn't just about reliability; it's about whether the model keeps surfacing your tool at all.
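The grouping itself is simple; the value is in choosing the keys. A sketch, assuming each error event carries a tool name and an error type:

```typescript
// Aggregate error events by (tool, error type) and sort the noisiest first.
interface ErrorEvent {
  tool: string;
  errorType: string; // e.g. "timeout", "validation", "upstream_5xx"
}

function topErrors(events: ErrorEvent[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const e of events) {
    const key = `${e.tool}:${e.errorType}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```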
Retention curves — measuring whether users come back after day 1, day 7, day 30. This is the metric that separates products from demos. For AI apps, retention is tracked across tool call sessions, not browser visits.
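Day-N retention reduces to one question per user: were they active again exactly N days after their first session? A sketch, with activity reduced upstream to integer day indices:

```typescript
// Fraction of users active again exactly n days after their first activity.
// `day` is an integer day index (e.g. days since epoch), derived from
// tool-call session timestamps rather than browser visits.
interface DailyActivity {
  userId: string;
  day: number;
}

function dayNRetention(activities: DailyActivity[], n: number): number {
  const firstDay = new Map<string, number>();
  for (const a of activities) {
    const seen = firstDay.get(a.userId);
    if (seen === undefined || a.day < seen) firstDay.set(a.userId, a.day);
  }
  const retained = new Set<string>();
  for (const a of activities) {
    if (a.day === (firstDay.get(a.userId) as number) + n) retained.add(a.userId);
  }
  return firstDay.size === 0 ? 0 : retained.size / firstDay.size;
}
```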
PII stripping — automatically removing personally identifiable information before storage. AI app events can contain sensitive data passed through tool parameters. The analytics layer should handle this by default, not as an afterthought.
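A default-on scrubber can start with pattern-based redaction. The sketch below covers only emails and phone-like digit runs; production PII stripping needs far more (names, addresses, tokens, per-field rules), so treat this as the shape of the idea, not a complete implementation.

```typescript
// Redact common PII patterns from a string before it is stored.
// Only emails and phone-like digit runs are covered; a real scrubber
// needs many more patterns and ideally structural, per-field rules.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.-]+/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

function stripPII(value: string): string {
  return value.replace(EMAIL, "[email]").replace(PHONE, "[phone]");
}
```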
Yavio: Open-Source Analytics for MCP Apps and ChatGPT Apps
Yavio is our answer to this problem. It's the first open-source analytics platform built specifically for AI-native applications.
The architecture is simple: an SDK wraps your MCP server (or ChatGPT App widget), capturing events and sending them to an ingestion API. The ingestion API strips PII, validates schemas, and writes to ClickHouse. A Next.js dashboard reads from ClickHouse and PostgreSQL to present analytics views.
The entire stack is open source under the MIT license. One GitHub repository, one codebase, no feature gating between self-hosted and cloud.
For MCP servers, integration is a single function call:
```typescript
import { withYavio } from "@yavio/sdk";

const server = withYavio(yourMcpServer);
```
Every tool call, resource read, and prompt execution is captured automatically. No manual event tracking, no code changes to your tool handlers.
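The wrapping pattern itself is easy to picture. The sketch below is generic and is not Yavio's actual implementation: each registered tool handler is swapped for a timed version that reports an event and re-throws failures, while the handler code stays untouched.

```typescript
// Generic sketch of wrap-based auto-instrumentation (not the real SDK):
// every tool handler is replaced by a version that times the call and
// reports success/failure. Handlers stay unmodified.
type ToolHandler = (args: Record<string, unknown>) => unknown;

function instrument(
  tools: Map<string, ToolHandler>,
  report: (event: object) => void
): Map<string, ToolHandler> {
  const wrapped = new Map<string, ToolHandler>();
  tools.forEach((handler, name) => {
    wrapped.set(name, (args) => {
      const start = Date.now();
      try {
        const result = handler(args);
        report({ event: "tool_called", tool: name, success: true, durationMs: Date.now() - start });
        return result;
      } catch (err) {
        report({ event: "tool_called", tool: name, success: false, durationMs: Date.now() - start });
        throw err; // analytics must never swallow the original error
      }
    });
  });
  return wrapped;
}
```

Real MCP handlers are typically async; the same pattern applies with an `await` around the handler call.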
For ChatGPT App widgets, a React hook provides the same automatic instrumentation:
```typescript
import { useYavio } from "@yavio/sdk";
```
On the dashboard, you get an overview of all activity, per-tool breakdowns, funnels with drop-off percentages, retention curves, a live event feed, and error analysis. Everything you need to understand how your AI app is actually being used.
Self-Hosted in Five Minutes
If you want analytics on your own infrastructure:
```shell
git clone https://github.com/yavio-ai/yavio-analytics.git
cd yavio-analytics
cp .env.example .env
docker compose up -d
```
The dashboard runs at localhost:3000, the ingestion API at localhost:3001. ClickHouse handles the analytics event storage (MergeTree engine, designed for high-throughput append-heavy workloads). PostgreSQL handles application data (users, workspaces, projects, API keys).
Minimum requirements are 2 CPU cores, 4 GB RAM, and 20 GB disk. For production workloads, 4+ cores and 8+ GB RAM are recommended.
Cloud If You Don't Want to Manage Infrastructure
Yavio Cloud runs the same codebase as the self-hosted version. No feature gating — everything you get self-hosted, you get in Cloud, and vice versa.
The free tier includes 1 million events per month, which covers most early-stage AI apps comfortably. Setup takes two commands:
```shell
npm install @yavio/sdk
npx @yavio/cli init
```
Your analytics dashboard is at app.yavio.ai.
The Bigger Picture
The AI app ecosystem is at an inflection point. MCP adoption is accelerating. ChatGPT Apps are maturing. More businesses are shipping AI integrations every week. But the tooling hasn't caught up.
Web development had a decade to build its analytics stack — from Google Analytics to Mixpanel to PostHog. AI apps need their own analytics stack, designed for the unique interaction patterns of tool calls, AI-mediated sessions, and conversational workflows.
We think this stack should be open source. The data is too sensitive, the ecosystem is too young, and the pace of change is too fast for a single closed-source vendor to own the category. Open source lets the community build the analytics layer that AI apps deserve.
Yavio is our contribution to that effort. The code is on GitHub, the license is MIT, and the product is free to start with. If you're building AI apps, you need analytics. We'd rather you have good, open-source analytics than no analytics at all.
Star us on GitHub or try Yavio Cloud free.
