How We Measure Generosity
Every provider in the Volksdroid catalog carries a generosity score from 1 to 10. The score answers one question a builder actually has: how much can I get done for free here, and what will it cost me in friction? This document defines exactly how that number is produced so the ranking is transparent, reproducible, and honest.
The score is not a single opinion. It is a weighted blend of five measured dimensions, normalized within each category, then calibrated so the results spread across the full range instead of clumping at the top.
The five dimensions
Each provider is scored 0–100 on five dimensions. The dimensions are deliberately orthogonal — they capture different ways a free tier can be good or bad.
| Dimension | Weight | What it captures |
|---|---|---|
| Quantity | 35% | How much you actually get for free, normalized within the category — tokens/requests, storage GB, compute hours, credit dollars, scheduled jobs, seats, domains. |
| Renewability | 20% | Whether the allowance refills. A monthly/daily renewable tier is worth far more over time than a one-time finite credit, which still beats a short trial. |
| Accessibility | 20% | How low the barrier to start is: no card, no approval, no eligibility gate, instant signup — versus a card on file, a minimum balance, student/startup status, a region lock, or manual review. |
| Usability | 15% | How production-viable the free tier is: latest/frontier capability, always-on, no crippling feature cuts — versus sleeps-when-idle, no high availability, throttled, or evaluation-only. |
| Longevity | 10% | How durable the offer is: a stable, long-standing free tier versus a promo, beta, or time-boxed program that may disappear. |
Quantity carries the most weight because “how much is free” is the first thing a builder cares about. But it is intentionally not the whole story: a small renewable, no-strings tier routinely outranks a large one-time credit behind a card, because over the life of a real project the renewable tier delivers more and demands less.
How each dimension is scored (no subjective judgment)
Every dimension is computed by an explicit formula over the provider’s typed
data, then percentile-ranked within its category so each indicator is evenly
distributed and two providers in the same category land on clearly different
numbers. The formulas live in scripts/score-generosity.py (the single source of
truth for the computation) and are described below. Ranking is what turns raw facts into comparable
points: for each indicator we sort every provider in a category by the formula’s
output and assign a 0–100 percentile (ties share the average rank).
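The ranking step can be sketched as follows. This is a minimal illustration, not the contents of scripts/score-generosity.py: the function name percentile_ranks, the (rank − 1) / (n − 1) scaling, and the midpoint used for a single-provider category are assumptions. Only the rule that ties share the average rank comes from the text above.

```python
def percentile_ranks(values: dict[str, float]) -> dict[str, float]:
    """Map each provider's formula output to a 0-100 percentile within one category."""
    n = len(values)
    if n == 1:
        # Assumption: a lone provider in a category sits at the midpoint.
        return {name: 50.0 for name in values}
    ordered = sorted(values.items(), key=lambda kv: kv[1])
    ranks: dict[str, float] = {}
    i = 0
    while i < n:
        # Find the run of providers tied with position i.
        j = i
        while j + 1 < n and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = avg_rank
        i = j + 1
    # Assumption: lowest rank maps to 0, highest to 100.
    return {name: (r - 1) / (n - 1) * 100 for name, r in ranks.items()}


# Hypothetical providers "a".."d": "b" and "c" tie and share rank 2.5.
print(percentile_ranks({"a": 1, "b": 5, "c": 5, "d": 10}))
```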
Quantity — the real, web-verified value of the category’s primary key metric (table below), ranked within the category. Unlimited/unmetered ranks at the top, an absent allowance at the bottom. The primary metric and its supporting metrics are shown on every card and listed in full in the detail panel.
| Category | Primary metric | Supporting metrics |
|---|---|---|
| LLM providers | Free tokens (M tokens/mo) | Requests/day, Tokens/min, Trial credit, Commercial use |
| Databases | Storage (GB) | Egress, Compute, Row reads, Databases |
| App platforms | Bandwidth (GB/mo) | Build minutes, Function calls, Custom domains |
| Serverless functions | Invocations (M/mo) | Compute (GB-s), CPU/call, Bandwidth |
| Schedulers | Cron jobs (count) | Min interval, Executions, Run history, Job timeout |
| Storage | Storage (GB) | Free egress, Write ops, Read ops |
| Workspaces | Capacity (records/seats) | Storage, RAM, Workspaces |
| Domains | Free domains (count) | DNS records, Dynamic-DNS refresh, Ad-free |
Requests → tokens. Many LLM free tiers publish only a request cap (“50
requests/day”, “30 RPM”). To normalize everything to the primary unit
(M tokens/month) we convert using a researched average of ~1,000 tokens per
request (prompt + completion for a typical single-turn chat/agent call):
tokens/month ≈ requests/day × 30 × 1,000. Heavy RAG/long-context traffic runs
higher, but ~1,000 is the right reference class for normalizing consumer-grade
request limits.
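As a worked example of the conversion above, here is a small sketch. The function and constant names are illustrative, and it covers only daily request caps; the text does not say how per-minute caps such as "30 RPM" are converted, so that case is left out.

```python
TOKENS_PER_REQUEST = 1_000  # reference average stated in the text
DAYS_PER_MONTH = 30


def requests_per_day_to_m_tokens_per_month(requests_per_day: float) -> float:
    """Convert a daily request cap to the primary unit, millions of tokens per month."""
    tokens_per_month = requests_per_day * DAYS_PER_MONTH * TOKENS_PER_REQUEST
    return tokens_per_month / 1_000_000


# A "50 requests/day" tier: 50 x 30 x 1,000 = 1,500,000 tokens, i.e. 1.5 M tokens/month.
print(requests_per_day_to_m_tokens_per_month(50))
```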
Renewability — an ordinal from renewal + pricingModel + limitations, then
ranked:
- 5: daily/monthly renewable, or a perpetual free_forever/open_source_selfhost tier with nothing to deplete
- 4: annual renewal
- 2: one-time finite pool (total_volume_cap, finite_credit, or renewal: one_time)
- 1: time-limited (time_limited) or trial
- 3: otherwise
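One way to read the renewability ordinal as code is sketched below. Several details are assumptions rather than documented behavior: that the first matching rule wins in the order listed, that renewal takes the string values "daily", "monthly", "annual", and "one_time", and that total_volume_cap and time_limited appear as entries in the limitations list. The function name is hypothetical.

```python
def renewability_ordinal(renewal: str, pricing_model: str, limitations: set[str]) -> int:
    """Return the 1-5 renewability ordinal before within-category ranking."""
    # A finite pool is anything that depletes and does not refill.
    finite_pool = (
        renewal == "one_time"
        or pricing_model == "finite_credit"
        or "total_volume_cap" in limitations
    )
    if renewal in {"daily", "monthly"}:
        return 5
    if pricing_model in {"free_forever", "open_source_selfhost"} and not finite_pool:
        return 5  # perpetual tier with nothing to deplete
    if renewal == "annual":
        return 4
    if finite_pool:
        return 2
    if "time_limited" in limitations or pricing_model == "trial":
        return 1
    return 3  # otherwise


print(renewability_ordinal("monthly", "freemium", set()))
```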
Accessibility — start at 100, subtract documented barrier penalties, then
rank (floored at 5):
−25 card required · −30 minimum balance held · −40 restricted to students/startups/businesses/region/OSS · −35 manual approval · −50 waitlist
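The penalty arithmetic can be sketched as below. The barrier keys are hypothetical labels for the penalties listed above, each barrier is assumed to apply at most once, and the floor of 5 is assumed to apply to the pre-rank score rather than to the percentile.

```python
# Penalty values from the list above; the keys are illustrative names.
ACCESS_PENALTIES = {
    "card_required": 25,
    "minimum_balance": 30,
    "restricted_eligibility": 40,
    "manual_approval": 35,
    "waitlist": 50,
}


def accessibility_raw(barriers: set[str]) -> int:
    """Start at 100, subtract each documented barrier once, floor at 5."""
    score = 100 - sum(ACCESS_PENALTIES[b] for b in barriers)
    return max(score, 5)


# Card on file plus manual approval: 100 - 25 - 35 = 40.
print(accessibility_raw({"card_required", "manual_approval"}))
```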
Usability — start at 100, subtract quality-cut penalties read from the typed
limitations, then rank (floored at 5):
−40 non-commercial · −25 sleeps when idle · −15 throttled · −8 single-instance · −6 per feature cut (max 4) · −5 attribution required
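The usability deduction follows the same pattern, with one extra rule for feature cuts. In this sketch the flag names are hypothetical, "max 4" is assumed to mean that at most four feature cuts are counted (a ceiling of −24), and the floor of 5 is assumed to apply before ranking.

```python
# Penalty values from the list above; the keys are illustrative names.
USABILITY_PENALTIES = {
    "non_commercial": 40,
    "sleeps_when_idle": 25,
    "throttled": 15,
    "single_instance": 8,
    "attribution_required": 5,
}
PER_FEATURE_CUT = 6
MAX_COUNTED_CUTS = 4  # assumption: cuts beyond the fourth are not penalized further


def usability_raw(flags: set[str], feature_cuts: int = 0) -> int:
    """Start at 100, subtract quality-cut penalties, floor at 5."""
    penalty = sum(USABILITY_PENALTIES[f] for f in flags)
    penalty += PER_FEATURE_CUT * min(feature_cuts, MAX_COUNTED_CUTS)
    return max(100 - penalty, 5)


# Sleeps when idle with two feature cuts: 100 - 25 - 12 = 63.
print(usability_raw({"sleeps_when_idle"}, feature_cuts=2))
```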
Longevity — an ordinal from pricingModel, capped at 2 when time_limited,
then ranked:
5 free_forever/freemium/open_source_selfhost · 4 byok · 3 finite_credit/student · 2 startup · 1 trial
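A lookup table is enough to illustrate the longevity ordinal. The sketch assumes time_limited is read from the provider's limitations list; the function and constant names are hypothetical.

```python
# Ordinal per pricingModel, as listed above.
LONGEVITY_BY_MODEL = {
    "free_forever": 5,
    "freemium": 5,
    "open_source_selfhost": 5,
    "byok": 4,
    "finite_credit": 3,
    "student": 3,
    "startup": 2,
    "trial": 1,
}


def longevity_ordinal(pricing_model: str, limitations: set[str]) -> int:
    """Return the 1-5 longevity ordinal, capped at 2 for time-limited offers."""
    base = LONGEVITY_BY_MODEL[pricing_model]
    if "time_limited" in limitations:
        return min(base, 2)
    return base


# A freemium tier flagged as time-limited drops from 5 to 2.
print(longevity_ordinal("freemium", {"time_limited"}))
```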
The composite (0–100)
Each dimension sub-score is its 0–100 within-category percentile rank. The composite is their weighted sum:
raw = 0.35·quantity + 0.20·renewability + 0.20·accessibility
+ 0.15·usability + 0.10·longevity
The result is stored on every provider as generosityRaw. Because the inputs are
ranks, the composite is naturally spread and does not cluster at the top.
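The weighted sum can be checked by hand with a short sketch. The weights are those from the dimensions table; the sub-scores in the example are made-up values for illustration, not data for any real provider.

```python
WEIGHTS = {
    "quantity": 0.35,
    "renewability": 0.20,
    "accessibility": 0.20,
    "usability": 0.15,
    "longevity": 0.10,
}


def composite(components: dict[str, float]) -> float:
    """Weighted sum of the five 0-100 percentile sub-scores."""
    return sum(WEIGHTS[dim] * components[dim] for dim in WEIGHTS)


# Illustrative sub-scores:
# 0.35*80 + 0.20*100 + 0.20*60 + 0.15*40 + 0.10*100 = 76 (up to float rounding)
example = {
    "quantity": 80,
    "renewability": 100,
    "accessibility": 60,
    "usability": 40,
    "longevity": 100,
}
print(round(composite(example), 2))
```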
Normalizing to the 1–10 score
The composite is mapped to the displayed 1–10 score with a robust min–max stretch. We take the 2nd and 98th percentiles of all composites as the low/high anchors and linearly rescale, clamping to [1, 10]:
score = clamp( 1 + (raw − P2) / (P98 − P2) · 9 , 1, 10 )
Using the 2nd/98th percentiles instead of the absolute min/max means a single outlier can’t compress everyone else, while the full 1–10 range is still used. The result is a well-spread distribution (mean ≈ 5.9, every tier populated).
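The stretch-and-clamp step might look like the sketch below. It uses the standard library's statistics.quantiles with the inclusive method to obtain P2 and P98; the interpolation method the real script uses is not stated in this document, so treat that choice, the function name, and the error raised for a zero spread as assumptions.

```python
from statistics import quantiles


def calibrate(raw: float, all_raws: list[float]) -> float:
    """Rescale a 0-100 composite to the displayed 1-10 score."""
    # n=100 yields 99 cut points; index 1 is the 2nd percentile, index 97 the 98th.
    cuts = quantiles(all_raws, n=100, method="inclusive")
    p2, p98 = cuts[1], cuts[97]
    if p98 == p2:
        raise ValueError("composites have no spread to rescale")
    stretched = 1 + (raw - p2) / (p98 - p2) * 9
    return min(max(stretched, 1.0), 10.0)  # clamp to [1, 10]


# With composites 0..100, P2 = 2 and P98 = 98, so raw = 50 gives 1 + 48/96 * 9 = 5.5.
composites = [float(x) for x in range(101)]
print(calibrate(50.0, composites))
```

Values below P2 or above P98 land exactly on 1 or 10, which is how the clamp keeps outliers from stretching the scale.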
Because every dimension is ranked within a category, sub-scores are most meaningful when comparing providers in the same category (LLM vs. LLM, database vs. database). The composite is comparable across categories — all inputs are percentiles — and the catalog is organized by category because that is the comparison that matters to a builder.
Tier bands
The 1–10 score maps to a named tier for quick scanning:
| Tier | Score | Meaning |
|---|---|---|
| Exceptional | 9.0–10 | Best-in-class free capacity with minimal strings attached. |
| Generous | 7.0–8.9 | A strong, genuinely useful free tier with manageable limits. |
| Solid | 5.0–6.9 | A workable free tier with real constraints to plan around. |
| Limited | 3.0–4.9 | Narrow free allowance or notable barriers; fine for trials. |
| Minimal | 1.0–2.9 | A nominal free tier: short trial, tiny pool, or heavy gating. |
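The band lookup can be sketched as a threshold table. The table above lists bands to one decimal place; this sketch assumes each band starts at its lower bound, so a score such as 8.95 would fall in Generous. The function name is hypothetical.

```python
# (lower bound, tier name), checked from the top band down.
TIER_BANDS = [
    (9.0, "Exceptional"),
    (7.0, "Generous"),
    (5.0, "Solid"),
    (3.0, "Limited"),
    (1.0, "Minimal"),
]


def tier_for(score: float) -> str:
    """Return the named tier for a calibrated 1-10 score."""
    for lower_bound, name in TIER_BANDS:
        if score >= lower_bound:
            return name
    raise ValueError("score must be at least 1.0")


print(tier_for(6.9))
```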
What the score is not
- It is not a quality rating of the product. A provider can have an excellent paid product and a stingy free tier (low score), or a mediocre product and a remarkably generous free tier (high score). We measure the free offer, not the company.
- It is not a substitute for reading the limitations. Two providers can share a score for very different reasons. The typed limitations attached to each provider tell you which constraints apply to your use case.
- It is not static. Free tiers change. Each provider records whether its numbers were web-verified, and the score is recomputed when the underlying data is refreshed.
Where the numbers live
Every provider’s markdown frontmatter carries the full, inspectable assessment:
- generosityScore: the calibrated 1–10 score
- generosityRaw: the 0–100 composite
- generosityTier: the named tier
- generosityComponents: the five dimension sub-scores (0–100 each)
- limitations: the typed, exhaustive limitation list (see Free-Tier Conditions & Limitations)
The rubric itself — dimensions, weights, anchors, tier bands, and the limitation
taxonomy — is defined in code at src/data/generosity.ts as the single source of
truth shared by the catalog UI, the content schema, and this document.