Observability

llm.port helps you understand what is happening in your AI system day to day.

You can quickly answer questions like:

Are requests healthy and fast?
Which models are being used most?
Where is cost going up?
Who changed what, and when?

What you can observe

Request activity and outcome trends
Latency and throughput indicators
Cost estimates per request, model, provider, and user
Pricing catalog management with version history
System health and service behavior
Administrative action trails

Cost Observability Dashboard

The Cost Observability dashboard (Admin → Observability → Cost Dashboard) provides self-contained cost tracking without external dependencies.

Overview Tab

Widget	Description
Estimated Spend	Total estimated cost for the selected period
Total Requests	Number of gateway requests
Total Tokens	Prompt + completion tokens consumed
Avg Latency	Average request latency
Error Rate	Percentage of requests with 4xx/5xx status
Cost Over Time	Line chart of cost per day/hour
Request Throughput	Line chart of requests per day/hour
Latency Percentiles	Bar chart of p50/p95/p99 latency
Spend by Model	Bar chart of cost per model
Top Users	Table of users ranked by request count

Use the date range selector (7d / 14d / 30d / 90d) to focus on the time window you care about. Use CSV Export when you want to share or analyze data outside the UI.
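To make the Error Rate and Latency Percentiles widgets concrete, here is a minimal sketch of how such figures could be recomputed from exported request data. The function names are hypothetical, and the nearest-rank percentile method is an assumption: the dashboard's exact percentile method is not documented here.

```python
import math
from typing import List


def error_rate(statuses: List[int]) -> float:
    """Percentage of requests whose HTTP status is 4xx or 5xx."""
    if not statuses:
        return 0.0
    errors = sum(1 for s in statuses if 400 <= s <= 599)
    return 100.0 * errors / len(statuses)


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile (assumed method, for illustration only)."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    # Rank is 1-based; clamp to 1 so very small p still returns the minimum.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    statuses = [200, 200, 404, 500]
    latencies_ms = list(range(1, 101))  # made-up sample data
    print(error_rate(statuses))
    print([percentile(latencies_ms, p) for p in (50, 95, 99)])
```

Comparing figures like these against the widgets can be a quick sanity check after a CSV export, keeping in mind that a different percentile method gives slightly different values on small samples.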

Requests Tab

Paginated log of all gateway requests with:

Filters: model alias, user ID
Expandable detail: request ID, trace ID, provider, tokens, TTFT, cost breakdown, estimate status
Status badges: color-coded HTTP status

This is usually the fastest place to investigate a user-reported issue.

Pricing Tab

Manage the cost catalog used for request-level estimates:

Add new provider/model pricing entries
Edit prices (creates a new version, deactivates the old one)
Delete (deactivate) entries
View history per model to audit price changes
Seed data is included for common models (GPT-4.1, Claude Sonnet 4, Gemini 2.5 Pro, etc.)

Note: Prices are estimates for visibility and planning. They do not change provider billing.
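The versioning behavior described above (an edit creates a new version and deactivates the old one, and a delete only deactivates) can be pictured with a small in-memory sketch. The class and method names are hypothetical and do not reflect the actual llm.port data model or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PriceVersion:
    version: int
    input_price: float   # assumed to be per 1,000 prompt tokens
    output_price: float  # assumed to be per 1,000 completion tokens
    active: bool = True


class PriceCatalog:
    """Hypothetical append-only catalog keyed by (provider, model)."""

    def __init__(self) -> None:
        self._history: Dict[Tuple[str, str], List[PriceVersion]] = {}

    def set_price(self, provider: str, model: str,
                  input_price: float, output_price: float) -> PriceVersion:
        """Add or edit: deactivate existing versions, then append a new one."""
        versions = self._history.setdefault((provider, model), [])
        for v in versions:
            v.active = False
        new = PriceVersion(len(versions) + 1, input_price, output_price)
        versions.append(new)
        return new

    def delete(self, provider: str, model: str) -> None:
        """Delete means deactivate; history is kept for auditing."""
        for v in self._history.get((provider, model), []):
            v.active = False

    def active(self, provider: str, model: str) -> Optional[PriceVersion]:
        for v in reversed(self._history.get((provider, model), [])):
            if v.active:
                return v
        return None

    def history(self, provider: str, model: str) -> List[PriceVersion]:
        return list(self._history.get((provider, model), []))


if __name__ == "__main__":
    catalog = PriceCatalog()
    catalog.set_price("example-provider", "example-model", 0.002, 0.008)
    catalog.set_price("example-provider", "example-model", 0.003, 0.009)
    print(catalog.active("example-provider", "example-model"))
    print(len(catalog.history("example-provider", "example-model")))
```

Keeping old versions instead of overwriting them is what makes the per-model history view possible.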

How cost estimates work

Each gateway request is matched against the price catalog by provider + model
Cost is calculated as (prompt_tokens / 1000) × input_price + (completion_tokens / 1000) × output_price, where both prices are per 1,000 tokens
Cached tokens are deducted from the prompt token count before pricing
Each request gets a cost_estimate_status: complete, partial (missing tokens or price), or unavailable
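The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the gateway's implementation: the `Price` type and `estimate_cost` function are hypothetical, the prices are made-up numbers and not real provider rates, and the rule used to choose between partial and unavailable (one side priceable versus neither) is one possible reading of the statuses.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Price:
    # Hypothetical catalog entry; prices are per 1,000 tokens.
    input_price: Optional[float]
    output_price: Optional[float]


def estimate_cost(prompt_tokens: Optional[int],
                  completion_tokens: Optional[int],
                  cached_tokens: int = 0,
                  price: Optional[Price] = None) -> Tuple[Optional[float], str]:
    """Return (estimated_cost, cost_estimate_status)."""
    input_cost = None
    output_cost = None
    if price is not None:
        if prompt_tokens is not None and price.input_price is not None:
            # Cached tokens are deducted from the prompt count before pricing.
            billable = max(prompt_tokens - cached_tokens, 0)
            input_cost = billable / 1000 * price.input_price
        if completion_tokens is not None and price.output_price is not None:
            output_cost = completion_tokens / 1000 * price.output_price

    parts = [c for c in (input_cost, output_cost) if c is not None]
    if not parts:
        return None, "unavailable"  # assumption: nothing could be priced
    status = "complete" if len(parts) == 2 else "partial"
    return sum(parts), status


if __name__ == "__main__":
    example = Price(input_price=0.002, output_price=0.008)  # illustrative only
    print(estimate_cost(1200, 300, cached_tokens=200, price=example))
    print(estimate_cost(1200, None, price=example))
    print(estimate_cost(1200, 300, price=None))
```

In the first call, 200 cached tokens leave 1,000 billable prompt tokens, so the estimate is 1.0 × 0.002 + 0.3 × 0.008, roughly 0.0044.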

Enterprise Edition extensions

The following features require the Enterprise Edition license:

Budget management — set monthly spend limits per tenant
Cost forecasting — projected spend based on historical trends
Alerting — threshold-based alerts for cost, error rate, latency
Chargeback reports — per-tenant cost allocation

Why this matters

Faster incident detection and troubleshooting
Better governance and compliance reporting
Data for capacity planning and optimization
Cost visibility without external observability platforms

For many teams, these views can serve as the single place to check request health, usage, and cost for AI operations.

Recommended operating practice

Define alert thresholds for key service indicators
Review usage and access trends regularly
Keep retention policies aligned with compliance requirements
Review and update model pricing when providers change rates

Public docs focus on observable outcomes and operating guidance, not internal telemetry plumbing.

Screenshots

Dashboard

Trace Viewer

Request cost trend graph

Logging
