
Bulk Data API Reference

Parquet-first edge gateway for bulk financial data. All endpoints stream ZSTD-compressed Parquet files — load directly into DuckDB, Pandas, or Polars.

Base URL: https://data.valuein.biz (Live)

Authentication

Authenticated endpoints require a Bearer token in the Authorization header. Tokens are provisioned automatically when you subscribe via Stripe. The /v1/sample/* endpoints are always public — no token required.

Unauthenticated (sample tier)

$ curl https://data.valuein.biz/v1/sample/entity \
    --output entity.parquet

Authenticated (sp500 / full tier)

$ curl https://data.valuein.biz/v1/sp500/fact \
    -H "Authorization: Bearer YOUR_TOKEN" \
    --output fact.parquet

Check your token plan

$ curl https://data.valuein.biz/v1/me \
    -H "Authorization: Bearer YOUR_TOKEN"
{
  "plan":   "sp500",
  "status": "active",
  "email":  "[email protected]"
}
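In Python, the same check is a single authenticated GET once you have a helper that builds the Authorization header. The helper below is a client-side sketch, not part of the API:

```python
def auth_headers(token: str) -> dict[str, str]:
    """Build the Authorization header that authenticated endpoints expect."""
    return {"Authorization": f"Bearer {token}"}

# e.g. requests.get("https://data.valuein.biz/v1/me", headers=auth_headers(token))
```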

Response Format

Data endpoints return raw Parquet bytes. The Content-Type is application/octet-stream. Files are ZSTD-compressed — DuckDB, Pandas, and Polars decompress automatically via read_parquet(). Non-data endpoints return application/json.

| Response Type  | Content-Type             | Endpoints                                 |
|----------------|--------------------------|-------------------------------------------|
| Parquet stream | application/octet-stream | /v1/sample/*, /v1/sp500/*, /v1/full/*     |
| JSON           | application/json         | /health, /v1/me, /v1/manifest, /v1/usage  |
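A client can predict the content type from the path alone. The sketch below encodes the table above, with one special case the table glosses over: /v1/sample/manifest returns JSON even though it lives under /v1/sample/*. The function name is illustrative:

```python
def expected_content_type(path: str) -> str:
    """Predict the Content-Type of a gateway response from its path.

    Manifest endpoints are JSON even under a data prefix; everything
    else under a data prefix streams raw Parquet bytes.
    """
    if path.endswith("/manifest"):
        return "application/json"
    if path.startswith(("/v1/sample/", "/v1/sp500/", "/v1/full/")):
        return "application/octet-stream"
    return "application/json"
```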

Plans

Your token's plan determines which bucket you can access. A higher plan grants access to all lower tiers as well.

| Plan   | Auth Required | Bucket    | Coverage                                                                                                  |
|--------|---------------|-----------|-----------------------------------------------------------------------------------------------------------|
| sample | No            | R2_SAMPLE | Public 5-year S&P500 slice                                                                                 |
| sp500  | Yes           | R2_SP500  | Full S&P500 history, 1994–present                                                                          |
| pro    | Yes           | R2_PRO    | Active + delisted US universe (~18,000 entities), 30-year history (1995–present)                           |
| full   | Yes           | R2_FULL   | Institutional tier: US + foreign issuers, 1990–present, intraday accepted_at, webhooks, redistribution license |
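The tier hierarchy ("a higher plan grants access to all lower tiers") can be sketched as a small client-side helper. The function and the ordering list are illustrative, not part of the API:

```python
# Plan names mirror the table above, lowest tier first.
PLAN_ORDER = ["sample", "sp500", "pro", "full"]

def accessible_tiers(plan: str) -> list[str]:
    """Return every tier a token of the given plan may access.

    A higher plan grants access to all lower tiers as well.
    """
    if plan not in PLAN_ORDER:
        raise ValueError(f"unknown plan: {plan!r}")
    return PLAN_ORDER[: PLAN_ORDER.index(plan) + 1]
```

For example, `accessible_tiers("pro")` yields `["sample", "sp500", "pro"]`.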

Endpoints

GET /health
Public
System

Liveness check. Returns 200 OK with service status.

Example Response
{ "status": "ok", "ts": "2026-04-11T00:00:00.000Z" }
GET /v1/me
Bearer token
Auth

Returns token metadata: plan, status, email, and token prefix.

Example Response
{ "plan": "sp500", "status": "active", "email": "[email protected]" }
GET /v1/manifest
Bearer token
Discovery

Returns available tables and last snapshot timestamp for your plan tier.

Example Response
{ "snapshot": "snapshot_20260411", "last_updated": "2026-04-11T00:00:00Z", "tables": [...] }
GET /v1/sample/manifest
Public
Sample (No Auth)

Public sample tier manifest — no token required. Includes upgrade CTA.

Example Response
{ "snapshot": "snapshot_20260411", "tables": [...], "upgrade_url": "/pricing" }
GET /v1/sample/{table}
Public
Sample (No Auth)

Parquet stream from the public sample bucket (5-year S&P500 slice). No token required. Valid tables: entity, security, filing, fact, valuation, taxonomy_guide, index_membership, references, ratio, factor_scores, earnings_signals.

Example Response
application/octet-stream  raw Parquet bytes
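Validating the table name client-side avoids a round trip that would end in a 400. A minimal sketch; the set mirrors the valid-tables list above, and the helper itself is not part of the API:

```python
VALID_TABLES = frozenset({
    "entity", "security", "filing", "fact", "valuation",
    "taxonomy_guide", "index_membership", "references",
    "ratio", "factor_scores", "earnings_signals",
})

def sample_url(table: str) -> str:
    """Build a /v1/sample/{table} URL, rejecting unknown table names
    before the gateway rejects them with a 400."""
    if table not in VALID_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    return f"https://data.valuein.biz/v1/sample/{table}"
```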
GET /v1/sp500/{table}
Bearer token
Data

Parquet stream from the S&P500 bucket. Requires an sp500 or higher plan token. Full history, 500+ tickers.

Example Response
application/octet-stream  raw Parquet bytes
GET /v1/full/{table}
Bearer token
Data

Parquet stream from the Institutional bucket. Requires a full plan token. ~18,000 US-listed entities plus foreign issuers, active + delisted, 1990→present.

Example Response
application/octet-stream  raw Parquet bytes
GET /v1/usage
Bearer token
Analytics

Returns daily API call counts, error rates, and per-table breakdowns for the last N days (default 7, max 30).

Query Parameters

days (integer): Number of days to return (1–30). Defaults to 7.
Example Response
{ "period_days": 7, "total_calls": 1420, "error_rate": 0.012, "daily": [...], "table_breakdown": {...} }
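Building the /v1/usage URL with a validated days parameter is a one-liner. Clamping on the client side is an assumption; the server documents a 1–30 range with a default of 7:

```python
from urllib.parse import urlencode

BASE = "https://data.valuein.biz"

def usage_url(days: int = 7) -> str:
    """Build a /v1/usage URL, clamping days to the documented 1-30 range."""
    days = max(1, min(30, days))
    return f"{BASE}/v1/usage?{urlencode({'days': days})}"
```

Fetch it with a Bearer Authorization header, as in the curl examples above.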

Python Example

Download a Parquet table and query it locally with DuckDB in roughly a dozen lines.

import duckdb
import requests

token = "YOUR_TOKEN"
url   = "https://data.valuein.biz/v1/sp500/fact"

# Stream the response to disk; the timeout guards against a stalled connection.
r = requests.get(url, headers={"Authorization": f"Bearer {token}"},
                 stream=True, timeout=60)
r.raise_for_status()

with open("fact.parquet", "wb") as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)

# DuckDB decompresses the ZSTD-compressed Parquet transparently.
conn = duckdb.connect()
df   = conn.execute(
    "SELECT * FROM read_parquet('fact.parquet') LIMIT 5"
).df()
print(df)

Available Tables

Eleven tables cover the full schema. Pass any table name as the {table} path segment. See Parquet Schema Reference for full field definitions.

| Table            | Description                                                                                                                                                     |
|------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| entity           | Company profiles: name, sector, SIC code, location, CEO, founding year, description. One row per CIK.                                                            |
| security         | Exchange listings: ticker, exchange, FIGI, valid date range (SCD Type 2). Multiple rows per company.                                                             |
| filing           | SEC EDGAR filing index: accession ID, form type, filing date, acceptance timestamp. Links entity to facts.                                                       |
| fact             | 105M+ financial data points: XBRL concept values with accepted_at timestamps for PIT accuracy.                                                                   |
| valuation        | Pipeline-computed DCF and DDM intrinsic values with WACC and growth rate assumptions.                                                                            |
| taxonomy_guide   | Mapping of ~150 standard_concept labels to raw XBRL tags and human-readable descriptions.                                                                        |
| index_membership | Historical index constituents (SP500, NASDAQ100, RUSSELL3000, WILSHIRE5000) with effective_date / removal_date for PIT universe construction. Keys on cik (since migration 0015). |
| references       | Derived flat join of entity + security. One row per security. Start here for cross-company queries; join index_membership on cik for membership filters.         |
| ratio            | Pipeline-computed financial ratios per entity per fiscal period (recomputed on every pipeline run; not PIT). Filter by category for grouped screens.             |
| factor_scores    | Cross-sectional factor scores and percentile ranks from the latest two 10-K filings. 10 factors + composite_rank.                                                |
| earnings_signals | Trend-based earnings expectations and surprise metrics: eps_actual vs. trailing 4-quarter eps_trend_est.                                                         |
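As a point-in-time illustration of the index_membership semantics: a constituent is in the index on a given date when effective_date has passed and the date precedes removal_date (null for current members). The half-open interval and the null-means-current convention are assumptions; see the Parquet Schema Reference for the authoritative field definitions.

```python
from datetime import date
from typing import Optional

def in_index_on(effective_date: date,
                removal_date: Optional[date],
                as_of: date) -> bool:
    """Point-in-time membership test for an index_membership row.

    Assumes a half-open interval [effective_date, removal_date);
    removal_date is None for current constituents.
    """
    if as_of < effective_date:
        return False
    return removal_date is None or as_of < removal_date
```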

Manifest Response

Call GET /v1/manifest to discover available tables and the current snapshot timestamp for your plan. Check this before downloading tables to detect updates.

{
  "snapshot":     "snapshot_20260411",
  "last_updated": "2026-04-11T00:00:00Z",
  "tables": ["entity", "security", "filing", "fact",
             "valuation", "taxonomy_guide",
             "index_membership", "references",
             "ratio", "factor_scores", "earnings_signals"]
}
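The snapshot field makes cache invalidation a string comparison. A minimal sketch, assuming you persist the last snapshot you downloaded; the state-file path is illustrative:

```python
from pathlib import Path

STATE = Path("last_snapshot.txt")  # illustrative local state file

def needs_download(manifest: dict) -> bool:
    """True when the manifest snapshot differs from the last one we saw."""
    previous = STATE.read_text().strip() if STATE.exists() else None
    return manifest["snapshot"] != previous

def mark_downloaded(manifest: dict) -> None:
    """Record the snapshot we just downloaded."""
    STATE.write_text(manifest["snapshot"])
```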

Rate Limits & Retries

Limits are per Bearer token and enforced at the Cloudflare edge. Every response includes the standard rate-limit headers — read them before retrying so your client never busy-waits.

Response headers on every request

X-RateLimit-Limit:     120
X-RateLimit-Remaining: 117
X-RateLimit-Reset:     1735680000
Retry-After:           42

Retry-After appears only on 429 responses (seconds). X-RateLimit-Reset is the Unix epoch when the window rolls over.

Recommended retry policy

  • 429 — sleep for Retry-After seconds, then retry. Never retry sooner.
  • 503 — exponential backoff: 1s, 2s, 4s. Cap at 3 retries.
  • 5xx other — retry once after 1s, then surface the error.
  • 4xx other — never retry. Fix the request.
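The policy above can be written as a pure function mapping (status, headers, attempt) to a sleep in seconds, or None for "do not retry". This is a client-side sketch, not part of the API:

```python
from typing import Optional

def retry_delay(status: int, headers: dict, attempt: int) -> Optional[float]:
    """Seconds to sleep before retry number `attempt` (0-based), or None to give up.

    Implements the documented policy:
      429       -> honor Retry-After exactly
      503       -> exponential backoff 1s, 2s, 4s, then give up
      other 5xx -> one retry after 1s
      other 4xx -> never retry
    """
    if status == 429:
        return float(headers.get("Retry-After", 1))
    if status == 503:
        return float(2 ** attempt) if attempt < 3 else None
    if 500 <= status < 600:
        return 1.0 if attempt == 0 else None
    return None
```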

curl with retry-aware streaming

# --retry honors Retry-After on 429 and exponentially backs off on 5xx.
# --retry-max-time bounds total elapsed time; --retry-connrefused covers
# the cold-start case after a cache eviction.
$ curl https://data.valuein.biz/v1/sp500/fact \
    -H "Authorization: Bearer $VALUEIN_TOKEN" \
    --retry 5 --retry-max-time 120 \
    --retry-connrefused \
    --output fact.parquet

Parquet responses are a single binary stream — no pagination, no cursor tokens. The gateway sets Content-Length on every data response so clients can show progress and pre-allocate buffers. For incremental reads, query the latest snapshot from /v1/manifest and download only when the timestamp changes.

Bulk Data API vs MCP Server vs Python SDK

Three channels, one Bearer token, same warehouse. Pick by access pattern.

| Channel       | Best for                                                              | Returns                                                         | Latency                |
|---------------|-----------------------------------------------------------------------|-----------------------------------------------------------------|------------------------|
| Bulk Data API | Loading full tables into DuckDB, Spark, or a warehouse. Periodic syncs. | ZSTD Parquet (full table)                                       | Edge stream · MB-scale |
| MCP Server    | Single-fact lookups from AI agents. Conversational queries.           | JSON tool responses                                             | Sub-100ms typical      |
| Python SDK    | DataFrame-shaped queries from notebooks and scripts.                  | DuckDB over R2 Parquet · as-filed vs latest restatement columns | Local DuckDB · ms      |

Use the SDK first if you're writing Python — it wraps this API and the MCP server with sensible defaults. Use the raw Bulk Data API when you're in a non-Python stack or building a partner integration.

Error Codes

| Status                    | Meaning                  | Common Cause                                                                                              |
|---------------------------|--------------------------|-----------------------------------------------------------------------------------------------------------|
| 200 OK                    | Success                  | Request succeeded. Parquet bytes or JSON body in the response.                                             |
| 400 Bad Request           | Invalid table            | The table name in the path is not in the valid tables list. Check spelling and trailing slashes.           |
| 401 Unauthorized          | Missing or invalid token | No Authorization header, malformed Bearer token, or token not found in the KV store.                       |
| 403 Forbidden             | Plan too low             | The token exists but its plan does not grant access to this bucket (e.g. a sample token accessing /v1/sp500/). |
| 429 Too Many Requests     | Rate limit exceeded      | You have exceeded your daily request quota. Resets at UTC midnight. Upgrade to a higher plan for higher limits. |
| 503 Service Unavailable   | Snapshot loading         | The R2 snapshot is being refreshed. Retry after 30–60 seconds. Rare and brief.                             |

Get your API token

Subscribe to the S&P500 or Full plan to receive a Bearer token instantly. The sample tier is always free — no credit card required.