
Bulk Data API Reference

Parquet-first edge gateway for bulk financial data. All endpoints stream ZSTD-compressed Parquet files — load directly into DuckDB, Pandas, or Polars.

Base URL: https://data.valuein.biz (Live)

Authentication

Authenticated endpoints require a Bearer token in the Authorization header. Tokens are provisioned automatically when you subscribe via Stripe. The /v1/sample/* endpoints are always public — no token required.

Unauthenticated (sample tier)

$ curl https://data.valuein.biz/v1/sample/entity \
    --output entity.parquet

Authenticated (sp500 / full tier)

$ curl https://data.valuein.biz/v1/sp500/fact \
    -H "Authorization: Bearer YOUR_TOKEN" \
    --output fact.parquet

Check your token plan

$ curl https://data.valuein.biz/v1/me \
    -H "Authorization: Bearer YOUR_TOKEN"
{
  "plan":   "sp500",
  "status": "active",
  "email":  "[email protected]"
}
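In Python, the same check is a single authenticated GET once you have a helper that builds the Authorization header. The helper below is a client-side sketch, not part of the API:

```python
def auth_headers(token: str) -> dict[str, str]:
    """Build the Authorization header that authenticated endpoints expect."""
    return {"Authorization": f"Bearer {token}"}

# e.g. requests.get("https://data.valuein.biz/v1/me", headers=auth_headers(token))
```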

Response Format

Data endpoints return raw Parquet bytes. The Content-Type is application/octet-stream. Files are ZSTD-compressed — DuckDB, Pandas, and Polars decompress automatically via read_parquet(). Non-data endpoints return application/json.

| Response Type  | Content-Type             | Endpoints                                 |
|----------------|--------------------------|-------------------------------------------|
| Parquet stream | application/octet-stream | /v1/sample/*, /v1/sp500/*, /v1/full/*     |
| JSON           | application/json         | /health, /v1/me, /v1/manifest, /v1/usage  |
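A client can predict the content type from the path alone. The sketch below encodes the table above, with one special case the table glosses over: /v1/sample/manifest returns JSON even though it lives under /v1/sample/*. The function name is illustrative:

```python
def expected_content_type(path: str) -> str:
    """Predict the Content-Type of a gateway response from its path.

    Manifest endpoints are JSON even under a data prefix; everything
    else under a data prefix streams raw Parquet bytes.
    """
    if path.endswith("/manifest"):
        return "application/json"
    if path.startswith(("/v1/sample/", "/v1/sp500/", "/v1/full/")):
        return "application/octet-stream"
    return "application/json"
```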

Plans

Your token's plan determines which bucket you can access. A higher plan grants access to all lower tiers as well.

| Plan   | Auth Required | Bucket    | Coverage                                                                                                  |
|--------|---------------|-----------|-----------------------------------------------------------------------------------------------------------|
| sample | No            | R2_SAMPLE | Public 5-year S&P500 slice                                                                                 |
| sp500  | Yes           | R2_SP500  | Full S&P500 history, 1994–present                                                                          |
| pro    | Yes           | R2_PRO    | Active + delisted US universe (~18,000 entities), 30-year history (1995–present)                           |
| full   | Yes           | R2_FULL   | Institutional tier: US + foreign issuers, 1990–present, intraday accepted_at, webhooks, redistribution license |
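The tier hierarchy ("a higher plan grants access to all lower tiers") can be sketched as a small client-side helper. The function and the ordering list are illustrative, not part of the API:

```python
# Plan names mirror the table above, lowest tier first.
PLAN_ORDER = ["sample", "sp500", "pro", "full"]

def accessible_tiers(plan: str) -> list[str]:
    """Return every tier a token of the given plan may access.

    A higher plan grants access to all lower tiers as well.
    """
    if plan not in PLAN_ORDER:
        raise ValueError(f"unknown plan: {plan!r}")
    return PLAN_ORDER[: PLAN_ORDER.index(plan) + 1]
```

For example, `accessible_tiers("pro")` yields `["sample", "sp500", "pro"]`.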

Endpoints

GET /health
Public
System

Liveness check. Returns 200 OK with service status.

Example Response
{ "status": "ok", "ts": "2026-04-11T00:00:00.000Z" }
GET /v1/me
Bearer token
Auth

Returns token metadata: plan, status, email, and token prefix.

Example Response
{ "plan": "sp500", "status": "active", "email": "[email protected]" }
GET /v1/manifest
Bearer token
Discovery

Returns available tables and last snapshot timestamp for your plan tier.

Example Response
{ "snapshot": "snapshot_20260411", "last_updated": "2026-04-11T00:00:00Z", "tables": [...] }
GET /v1/sample/manifest
Public
Sample (No Auth)

Public sample tier manifest — no token required. Includes upgrade CTA.

Example Response
{ "snapshot": "snapshot_20260411", "tables": [...], "upgrade_url": "/pricing" }
GET /v1/sample/{table}
Public
Sample (No Auth)

Parquet stream from the public sample bucket (5-year S&P500 slice). No token required. Valid tables: entity, security, filing, fact, valuation, taxonomy_guide, index_membership, references, ratio, factor_scores, earnings_signals.

Example Response
application/octet-stream  raw Parquet bytes
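Validating the table name client-side avoids a round trip that would end in a 400. A minimal sketch; the set mirrors the valid-tables list above, and the helper itself is not part of the API:

```python
VALID_TABLES = frozenset({
    "entity", "security", "filing", "fact", "valuation",
    "taxonomy_guide", "index_membership", "references",
    "ratio", "factor_scores", "earnings_signals",
})

def sample_url(table: str) -> str:
    """Build a /v1/sample/{table} URL, rejecting unknown table names
    before the gateway rejects them with a 400."""
    if table not in VALID_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    return f"https://data.valuein.biz/v1/sample/{table}"
```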
GET /v1/sp500/{table}
Bearer token
Data

Parquet stream from the S&P500 bucket. Requires an sp500 or higher plan token. Full history, 500+ tickers.

Example Response
application/octet-stream  raw Parquet bytes
GET /v1/full/{table}
Bearer token
Data

Parquet stream from the Institutional bucket. Requires a full plan token. ~18,000 US-listed entities plus foreign issuers, active + delisted, 1990→present.

Example Response
application/octet-stream  raw Parquet bytes
GET /v1/usage
Bearer token
Analytics

Returns daily API call counts, error rates, and per-table breakdowns for the last N days (default 7, max 30).

Query Parameters

days (integer): Number of days to return (1–30). Defaults to 7.
Example Response
{ "period_days": 7, "total_calls": 1420, "error_rate": 0.012, "daily": [...], "table_breakdown": {...} }
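Building the /v1/usage URL with a validated days parameter is a one-liner. Clamping on the client side is an assumption; the server documents a 1–30 range with a default of 7:

```python
from urllib.parse import urlencode

BASE = "https://data.valuein.biz"

def usage_url(days: int = 7) -> str:
    """Build a /v1/usage URL, clamping days to the documented 1-30 range."""
    days = max(1, min(30, days))
    return f"{BASE}/v1/usage?{urlencode({'days': days})}"
```

Fetch it with a Bearer Authorization header, as in the curl examples above.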

Python Example

Download a Parquet table and query it locally with DuckDB in roughly a dozen lines.

import duckdb
import requests

token = "YOUR_TOKEN"
url   = "https://data.valuein.biz/v1/sp500/fact"

# Stream the response to disk; the timeout guards against a stalled connection.
r = requests.get(url, headers={"Authorization": f"Bearer {token}"},
                 stream=True, timeout=60)
r.raise_for_status()

with open("fact.parquet", "wb") as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)

# DuckDB decompresses the ZSTD-compressed Parquet transparently.
conn = duckdb.connect()
df   = conn.execute(
    "SELECT * FROM read_parquet('fact.parquet') LIMIT 5"
).df()
print(df)

Available Tables

Eleven tables cover the full schema. Pass any table name as the {table} path segment. See Parquet Schema Reference for full field definitions.

| Table            | Description                                                                                                                                                     |
|------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| entity           | Company profiles: name, sector, SIC code, location, CEO, founding year, description. One row per CIK.                                                            |
| security         | Exchange listings: ticker, exchange, FIGI, valid date range (SCD Type 2). Multiple rows per company.                                                             |
| filing           | SEC EDGAR filing index: accession ID, form type, filing date, acceptance timestamp. Links entity to facts.                                                       |
| fact             | 105M+ financial data points: XBRL concept values with accepted_at timestamps for PIT accuracy.                                                                   |
| valuation        | Pipeline-computed DCF and DDM intrinsic values with WACC and growth rate assumptions.                                                                            |
| taxonomy_guide   | Mapping of ~150 standard_concept labels to raw XBRL tags and human-readable descriptions.                                                                        |
| index_membership | Historical index constituents (SP500, NASDAQ100, RUSSELL3000, WILSHIRE5000) with effective_date / removal_date for PIT universe construction. Keys on cik (since migration 0015). |
| references       | Derived flat join of entity + security. One row per security. Start here for cross-company queries; join index_membership on cik for membership filters.         |
| ratio            | Pipeline-computed financial ratios per entity per fiscal period (recomputed on every pipeline run; not PIT). Filter by category for grouped screens.             |
| factor_scores    | Cross-sectional factor scores and percentile ranks from the latest two 10-K filings. 10 factors + composite_rank.                                                |
| earnings_signals | Trend-based earnings expectations and surprise metrics: eps_actual vs. trailing 4-quarter eps_trend_est.                                                         |
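As a point-in-time illustration of the index_membership semantics: a constituent is in the index on a given date when effective_date has passed and the date precedes removal_date (null for current members). The half-open interval and the null-means-current convention are assumptions; see the Parquet Schema Reference for the authoritative field definitions.

```python
from datetime import date
from typing import Optional

def in_index_on(effective_date: date,
                removal_date: Optional[date],
                as_of: date) -> bool:
    """Point-in-time membership test for an index_membership row.

    Assumes a half-open interval [effective_date, removal_date);
    removal_date is None for current constituents.
    """
    if as_of < effective_date:
        return False
    return removal_date is None or as_of < removal_date
```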

Manifest Response

Call GET /v1/manifest to discover available tables and the current snapshot timestamp for your plan. Check this before downloading tables to detect updates.

{
  "snapshot":     "snapshot_20260411",
  "last_updated": "2026-04-11T00:00:00Z",
  "tables": ["entity", "security", "filing", "fact",
             "valuation", "taxonomy_guide",
             "index_membership", "references",
             "ratio", "factor_scores", "earnings_signals"]
}
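The snapshot field makes cache invalidation a string comparison. A minimal sketch, assuming you persist the last snapshot you downloaded; the state-file path is illustrative:

```python
from pathlib import Path

STATE = Path("last_snapshot.txt")  # illustrative local state file

def needs_download(manifest: dict) -> bool:
    """True when the manifest snapshot differs from the last one we saw."""
    previous = STATE.read_text().strip() if STATE.exists() else None
    return manifest["snapshot"] != previous

def mark_downloaded(manifest: dict) -> None:
    """Record the snapshot we just downloaded."""
    STATE.write_text(manifest["snapshot"])
```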

Rate Limits & Retries

Limits are per Bearer token and enforced at the Cloudflare edge. Every response includes the standard rate-limit headers — read them before retrying so your client never busy-waits.

Response headers on every request

X-RateLimit-Limit:     120
X-RateLimit-Remaining: 117
X-RateLimit-Reset:     1735680000
Retry-After:           42

Retry-After appears only on 429 responses (seconds). X-RateLimit-Reset is the Unix epoch when the window rolls over.

Recommended retry policy

  • 429 — sleep for Retry-After seconds, then retry. Never retry sooner.
  • 503 — exponential backoff: 1s, 2s, 4s. Cap at 3 retries.
  • 5xx other — retry once after 1s, then surface the error.
  • 4xx other — never retry. Fix the request.
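The policy above can be written as a pure function mapping (status, headers, attempt) to a sleep in seconds, or None for "do not retry". This is a client-side sketch, not part of the API:

```python
from typing import Optional

def retry_delay(status: int, headers: dict, attempt: int) -> Optional[float]:
    """Seconds to sleep before retry number `attempt` (0-based), or None to give up.

    Implements the documented policy:
      429       -> honor Retry-After exactly
      503       -> exponential backoff 1s, 2s, 4s, then give up
      other 5xx -> one retry after 1s
      other 4xx -> never retry
    """
    if status == 429:
        return float(headers.get("Retry-After", 1))
    if status == 503:
        return float(2 ** attempt) if attempt < 3 else None
    if 500 <= status < 600:
        return 1.0 if attempt == 0 else None
    return None
```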

curl with retry-aware streaming

# --retry honors Retry-After on 429 and exponentially backs off on 5xx.
# --retry-max-time bounds total elapsed time; --retry-connrefused covers
# the cold-start case after a cache eviction.
$ curl https://data.valuein.biz/v1/sp500/fact \
    -H "Authorization: Bearer $VALUEIN_TOKEN" \
    --retry 5 --retry-max-time 120 \
    --retry-connrefused \
    --output fact.parquet

Parquet responses are a single binary stream — no pagination, no cursor tokens. The gateway sets Content-Length on every data response so clients can show progress and pre-allocate buffers. For incremental reads, query the latest snapshot from /v1/manifest and download only when the timestamp changes.

Bulk Data API vs MCP Server vs Python SDK

Three channels, one Bearer token, same warehouse. Pick by access pattern.

| Channel       | Best for                                                              | Returns                                                         | Latency                |
|---------------|-----------------------------------------------------------------------|-----------------------------------------------------------------|------------------------|
| Bulk Data API | Loading full tables into DuckDB, Spark, or a warehouse. Periodic syncs. | ZSTD Parquet (full table)                                       | Edge stream · MB-scale |
| MCP Server    | Single-fact lookups from AI agents. Conversational queries.           | JSON tool responses                                             | Sub-100ms typical      |
| Python SDK    | DataFrame-shaped queries from notebooks and scripts.                  | DuckDB over R2 Parquet · as-filed vs latest restatement columns | Local DuckDB · ms      |

Use the SDK first if you're writing Python — it wraps this API and the MCP server with sensible defaults. Use the raw Bulk Data API when you're in a non-Python stack or building a partner integration.

Error Codes

| Status                    | Meaning                  | Common Cause                                                                                              |
|---------------------------|--------------------------|-----------------------------------------------------------------------------------------------------------|
| 200 OK                    | Success                  | Request succeeded. Parquet bytes or JSON body in the response.                                             |
| 400 Bad Request           | Invalid table            | The table name in the path is not in the valid tables list. Check spelling and trailing slashes.           |
| 401 Unauthorized          | Missing or invalid token | No Authorization header, malformed Bearer token, or token not found in the KV store.                       |
| 403 Forbidden             | Plan too low             | The token exists but its plan does not grant access to this bucket (e.g. a sample token accessing /v1/sp500/). |
| 429 Too Many Requests     | Rate limit exceeded      | You have exceeded your daily request quota. Resets at UTC midnight. Upgrade to a higher plan for higher limits. |
| 503 Service Unavailable   | Snapshot loading         | The R2 snapshot is being refreshed. Retry after 30–60 seconds. Rare and brief.                             |

Get your API token

Subscribe to the S&P500 or Full plan to receive a Bearer token instantly. The sample tier is always free — no credit card required.