MCP Server live — AI agents can now query 105M+ SEC facts. Connect your agent →
ValueinValuein
Schema

Parquet Schema Reference

Eleven ZSTD-compressed Parquet tables (8 core + 3 derived). 105M+ facts, 16,000+ entities, 1994–present. All tables use consistent join keys — cik for companies, accession_id for filings.

11
Tables
105M+
Facts
16,000+
Entities
ZSTD
Compression

Entity-Relationship Diagram

All tables link via two universal keys: cik (company) and accession_id (filing). The references table is a derived flat join — use it as your query starting point to avoid manual joins.

┌─────────────┐     ┌────────────────┐     ┌──────────────────────┐
│   entity    │     │    security    │     │   index_membership   │
│─────────────│     │────────────────│     │──────────────────────│
│ cik (PK)    │──┐  │ id (PK)        │  ┌──│ cik (FK → entity)    │
│ name        │  └──│ entity_id (FK) │  │  │ index_name           │
│ sector      │     │ symbol         │  │  │ effective_date       │
│ sic_code    │     │ exchange       │  │  │ removal_date         │
│ status      │     │ is_active      │  │  │ removal_reason       │
└─────────────┘     └────────────────┘  │  │ successor_cik        │
       │                                │  └──────────────────────┘
       │            ┌─────────────────┐ │           ▲
       └────────────│     filing      │ │           │
                    │─────────────────│ │           │
                    │ accession_id(PK)│ │           │
                    │ entity_id (FK)  │ │           │
                    │ form_type       │ │           │
                    │ filing_date     │ │           │
                    │ accepted_at     │ │           │
                    └─────────────────┘ │           │
                            │           │           │
                            └────────────┌──────────────────────┐
                                         │         fact         │
                                         │──────────────────────│
                                         │ fact_id (deterministic)│
                                         │ entity_id (FK)       │
                                         │ accession_id (FK)    │
                                         │ standard_concept     │
                                         │ numeric_value        │
                                         │ value_current        │
                                         │ value_as_filed       │
                                         │ accepted_at (PIT)    │
                                         └──────────────────────┘

 ┌──────────────────────────────────────────────────────────────┐
 │  references  (derived — start every cross-co query here)     │
 │──────────────────────────────────────────────────────────────│
 │  cik, name, sector, industry, sic_code, status, entity_type  │
 │  security_id, symbol, exchange, mic, is_active, valid_*      │
 │  figi, composite_figi, share_class_figi                      │
 │                                                              │
 │  Index membership? JOIN index_membership ON cik = cik        │
 │  (same column name both sides — migration 0015).             │
 └──────────────────────────────────────────────────────────────┘

 ┌──────────────────────────────────────────────────────────────┐
 │  Derived analytic tables (recomputed each pipeline run):     │
 │  ratio · factor_scores · earnings_signals                    │
 │  All key on entity_id (= references.cik) + period_end.       │
 └──────────────────────────────────────────────────────────────┘

Tables

entity

~16,000+ rowsPK: cik

One row per SEC-registered company. Contains company profile data: name, sector, SIC code, location, CEO, description, and status. The root table — all other tables link back via entity_id = entity.cik.

Tip: Always filter status = 'ACTIVE' unless you specifically need delisted or inactive companies.
32 fieldsBrowse in catalog

security

~18,000+ rowsPK: id

Exchange listings with SCD Type 2 history. Multiple rows per company — one per ticker/exchange combination. FIGI identifiers included for cross-system mapping.

Tip: Filter is_active = TRUE for current tickers. Use valid_from / valid_to for historical ticker reconstruction.
18 fieldsBrowse in catalog

filing

~800,000+ rowsPK: accession_id

SEC EDGAR filing index. One row per accepted filing. The bridge between companies and their financial facts. Every fact row carries this filing's accepted_at forward as its own PIT anchor.

Tip: Use form_type = '10-K' for annual data, '10-Q' for quarterly. Watch is_amendment for restated filings.
23 fieldsBrowse in catalog

fact

~105M+ rowsPK: entity_id + accession_id + standard_concept

The core financial data table. 105M+ XBRL-sourced data points normalized to ~150 standard_concept labels. Each row is a single financial value for a specific company, period, and concept.

Tip: Start with standard_concept (not raw concept) for cross-company queries. Use accepted_at for PIT filtering. Use COALESCE(derived_quarterly_value, numeric_value) for cash flow metrics.
32 fieldsBrowse in catalog

valuation

~50,000+ rowsPK: id

Pipeline-computed DCF and DDM intrinsic values. One row per company per valuation run. Includes WACC, growth rates, and margin of safety vs. market price.

Tip: Always check data_quality field. model_type = 'dcf' is the primary model. DDM only applies to dividend-paying companies.
26 fieldsBrowse in catalog

taxonomy_guide

~~150 rowsPK: standard_concept

Lookup table for standard_concept labels used in the fact table. Maps canonical names to human-readable labels, definitions, and GAAP taxonomy references.

Tip: Join fact on standard_concept to get human_name and definition for any concept.
6 fieldsBrowse in catalog

index_membership

~50,000+ rowsPK: id

Historical index constituent records — SP500, NASDAQ100, RUSSELL3000, WILSHIRE5000. One row per (cik, index, membership-period). Essential for point-in-time universe construction — prevents survivorship bias.

Tip: Filter removal_date IS NULL for current members. Use effective_date / removal_date for the half-open membership window. Keys on cik (since migration 0015) — JOIN with references on cik = cik.
14 fieldsBrowse in catalog

references

Start here
~18,000+ rowsPK: symbol + cik

Derived flat join of entity + security. One row per security. Eliminates the need for 2-table joins for company-level metadata. Always start here for cross-company queries.

Tip: is_active = TRUE for current tickers. For index membership (current OR historical), JOIN with index_membership on cik = cik — there is no is_sp500 flag (dropped 2026-05-02 because it was snapshot-only and single-index).
17 fieldsBrowse in catalog

ratio

~~1.2M rowsPK: entity_id + ratio_name + period_end

Pipeline-computed financial ratios per entity per fiscal period. Recomputed on each pipeline run (not PIT — use computed_at for freshness).

Tip: Filter by category (profitability | per_share | liquidity | leverage | efficiency | owner_earnings | valuation) for grouped screens. is_ttm = TRUE returns trailing-twelve-months rows.
10 fieldsBrowse in catalog

factor_scores

~16,000+ rowsPK: entity_id + period_end

Cross-sectional factor scores and percentile ranks per entity, computed from the latest two 10-K filings. 10 factors matching the SDK AlphaEngine built-in set.

Tip: _rank columns are PERCENT_RANK (0.0–1.0, 1.0 = best). composite_rank is the equal-weight average of all non-null factor ranks — the default sort key for screen_universe.
25 fieldsBrowse in catalog

earnings_signals

~~250K rowsPK: entity_id + period_end

Trend-based earnings expectations and surprise metrics per entity. eps_trend_est is a trailing 4-quarter average of EPSDiluted; eps_surprise_pct measures actual vs. trend.

Tip: Filter eps_surprise_pct IS NOT NULL for momentum screens — NULL when |trend_est| < 0.001. Use accepted_at for PIT filtering.
10 fieldsBrowse in catalog

Key Columns Guide

The most important columns across tables. Master these and you can write any query.

ColumnTableTypeNullableDescription
accepted_at
PIT
factTIMESTAMPTZNoPIT timestamp — when this fact became publicly known via SEC EDGAR acceptance. Use accepted_at <= trade_date for look-ahead-free backtests.
standard_concept
factVARCHARYesCanonical normalized concept name (e.g. Revenues, NetIncomeLoss). Use this — not raw concept — for cross-company queries.
numeric_value
factFLOATYesParsed financial value. NULL for non-numeric XBRL facts. Always use ABS() on capex; divide by NULLIF(denom, 0) on ratios.
derived_quarterly_value
factFLOATYesQ2/Q3 10-Q cash flows adjusted from YTD to single-quarter. Use COALESCE(derived_quarterly_value, numeric_value) for all cash flow metrics.
cik
index_membership / referencesVARCHARNoIndex membership join key. Same column name on both tables (since migration 0015) — JOIN index_membership ON references.cik = index_membership.cik. Filter index_name = 'SP500' AND removal_date IS NULL for the current S&P500.
is_active
references / securityBOOLEANNoTRUE if this ticker is currently active (valid_to IS NULL). Filter to avoid duplicate rows from ticker changes.
accession_id
filingVARCHARNoSEC EDGAR accession number — the unique identifier for a filing. Format: XXXXXXXXXX-YY-ZZZZZZ. The join key between filing and fact.
cik
entity / referencesVARCHARNoSEC Central Index Key — 10-digit company identifier. The universal join key across all tables. entity.cik = security.entity_id = filing.entity_id = fact.entity_id.

All Fields by Table

Quick-reference field list. For full descriptions and SEC XBRL tag mappings, use the Data Catalog.

entity

32 fields
FieldTypeDescription
cikVARCHARSEC Central Index Key — 10-digit unique company identifier
nameVARCHARLegal registered company name
leiVARCHARLegal Entity Identifier (ISO 17442 20-character code)
industryVARCHARIndustry classification description
sectorVARCHARBroad market sector (e.g. Technology, Healthcare, Financials)
sic_codeVARCHARStandard Industrial Classification code (4-digit)
sic_descriptionVARCHARFull SEC SIC industry label (e.g. 'Electronic Computers'). Companion to sic_code — use this when a human-readable string is more useful than the raw 4-digit code.
fiscal_year_endVARCHARFiscal year end as MMDD (e.g. 1231 = December 31)
locationVARCHARCompany headquarters location (city, state, country)
websiteVARCHARCompany primary website URL
ceoVARCHARChief Executive Officer name at most recent filing
founded_yearSMALLINTYear the company was founded
descriptionVARCHARCompany business description extracted from SEC filings
statusVARCHAREntity status: ACTIVE, INACTIVE, or DELISTED. Filter status != 'ACTIVE' to include all non-active companies
entity_typeVARCHARSEC-defined filer category (e.g. Non-accelerated Filer, Large Accelerated Filer)
einVARCHAREmployer Identification Number (IRS tax ID)
owner_orgVARCHARParent or controlling organization name
investor_websiteVARCHARInvestor relations website URL
categoryVARCHARSEC registrant category
state_of_incorporationVARCHARUS state or country of incorporation (2-letter code)
state_of_incorporation_descriptionVARCHARFull SEC label for the state-of-incorporation code (e.g. 'Delaware')
phoneVARCHARCompany contact phone number
business_addressVARCHARPrincipal registered business (HQ) address — single concatenated line. Renamed from 'address' in schema v2.1.0.
mailing_addressVARCHARMailing address (separate from HQ) — registered agent / correspondence target. Often differs from business_address for foreign issuers and trusts.
country_codeVARCHARISO country code of the issuer's primary location
is_foreignBOOLEANTRUE if the issuer is a non-US registrant (typically a 20-F / 40-F filer)
flagsVARCHARSEC-flagged status string (e.g. 'Shell', 'Delinquent'). NULL for clean issuers.
has_insider_transactionsBOOLEANSEC declares Form 4/5 insider activity exists for this issuer
is_insider_ownerBOOLEANSEC declares this CIK itself owns insider holdings
former_namesJSONSEC formerNames[] history — JSON array of {name, from, to} objects. Useful for ticker-history queries and merger lineage (e.g. FB → META). Empty array when the issuer has never renamed.
last_seen_atDATEDate this entity last appeared in any SEC filing
ingested_atTIMESTAMPTZTimestamp when Valuein first ingested this entity record

security

18 fields
FieldTypeDescription
idINTEGERSurrogate primary key for the security record
entity_idVARCHARForeign key to entity.cik — the issuing company
symbolVARCHARExchange ticker symbol
exchangeVARCHARStock exchange name (e.g. NASDAQ, NYSE)
micVARCHARMarket Identification Code (ISO 10383)
valid_fromDATESCD Type 2 start date — when this ticker became active on this exchange
valid_toDATESCD Type 2 end date — when this ticker was retired. NULL means currently active
is_activeBOOLEANDerived: TRUE when valid_to IS NULL. Use to filter current tickers only
figiVARCHARFinancial Instrument Global Identifier at share-class level (OpenFIGI)
composite_figiVARCHARComposite FIGI at the exchange level
share_class_figiVARCHARShare class FIGI
security_typeVARCHARInstrument type (e.g. Common Stock, ADR, ETF)
market_sectorVARCHARFIGI market sector classification
figi_tickerVARCHARTicker symbol as recorded in the OpenFIGI registry
figi_nameVARCHARSecurity name from the OpenFIGI registry
figi_exch_codeVARCHARExchange code from the OpenFIGI registry
is_primary_tickerBOOLEANTRUE for the ticker SEC's company_tickers.json marks as the entity's primary listing. Multi-share-class issuers (BRK-A/BRK-B, GOOG/GOOGL) emit multiple is_active=TRUE rows; default consumer queries filter is_primary_ticker=TRUE for one row per CIK.
ingested_atTIMESTAMPTZTimestamp when Valuein first ingested this security record

filing

23 fields
FieldTypeDescription
accession_idVARCHARSEC EDGAR accession number — unique filing identifier (format: XXXXXXXXXX-YY-ZZZZZZ)
entity_idVARCHARForeign key to entity.cik — the filing company
filing_dateDATEDate the SEC received and accepted the filing. Use this for PIT backtest filters — not report_date
form_typeVARCHARFiling type: 10-K (annual), 10-Q (quarterly), 20-F (foreign annual), 8-K (current report), etc.
filing_urlVARCHARFull URL to the filing document on SEC EDGAR
is_amendmentBOOLEANTRUE if this is an amended filing (e.g. 10-K/A, 10-Q/A)
amendment_noSMALLINTAmendment sequence number — 0 for the original filing, 1 for the first amendment, etc.
superseded_byVARCHARAccession-id of the filing that supersedes this one in the amendment chain (NULL when this is the latest version). Populated post-ingest.
report_dateDATEFiscal period end date covered by this filing. Using this as a PIT filter introduces look-ahead bias — use filing_date or accepted_at instead.
itemsVARCHAR[]Array of 8-K item numbers reported (e.g. ['1.01', '9.01']). NULL for non-8-K filings
sizeINTEGERFiling size in bytes (per SEC submissions feed)
is_xbrlBOOLEANSEC-declared: filing carries an XBRL exhibit
is_inline_xbrlBOOLEANSEC-declared: filing uses Inline XBRL (iXBRL)
is_xbrl_numericBOOLEANDistinct from is_xbrl/is_inline_xbrl — TRUE when the XBRL exhibit carries machine-readable numeric facts (the source of every fact-table row).
primary_documentVARCHARFilename of the lead document in the filing package
primary_doc_descriptionVARCHARSEC-provided label describing the primary document
core_typeVARCHARBase form_type stripped of the /A amendment suffix (e.g. 10-K/A → 10-K). Use this for form-family filtering when you don't care whether a filing is the original or an amendment.
file_numberVARCHARSEC EDGAR file number (e.g. 001-36743)
film_numberVARCHARSEC EDGAR microfiche / film number assigned at acceptance
actVARCHARSecurities-Act designator: '33' (1933 Act), '34' (1934 Act), or empty. Helps distinguish registration-statement vs. exchange-act filings.
is_auditedBOOLEANFiling is an annual (audited) report — TRUE for 10-K / 20-F and their amendments, FALSE for 10-Q / 8-K / etc.
accepted_atTIMESTAMPTZExact timestamp the SEC accepted the filing (millisecond precision). Every fact disclosed in this filing inherits the same accepted_at as its PIT anchor.
ingested_atTIMESTAMPTZTimestamp when Valuein ingested this filing record

fact

32 fields
FieldTypeDescription
fact_idVARCHARDeterministic identity hash: SHA-256(entity_id|accession_id|concept|period_end|unit). 64-character lowercase hex string. Use with verify_fact_lineage to trace any number to its source filing.
entity_idVARCHARForeign key to entity.cik
accession_idVARCHARForeign key to filing.accession_id
conceptVARCHARRaw XBRL tag as filed by the company (e.g. us-gaap:Revenues). Use standard_concept for cross-company comparisons.
standard_conceptVARCHARCanonical normalized concept label (e.g. Revenues, NetIncomeLoss). Maps 11,966 raw XBRL tags to ~150 standard concepts (95% coverage).
prioritySMALLINTStandardization priority — lower value = higher confidence in the concept mapping
labelVARCHARHuman-readable label from the company's XBRL filing context
numeric_valueDOUBLEThe value as reported in this specific filing (accession_id). NULL for non-numeric facts.
derived_quarterly_valueDOUBLEQuarterly value derived by subtracting prior YTD periods (Q2/Q3 10-Q cash flows are filed as YTD). Use COALESCE(derived_quarterly_value, numeric_value) for single-quarter cash flow metrics.
unitVARCHARUnit of measure: USD, shares, pure (dimensionless ratio), etc.
period_startDATEReporting period start date. NULL for instant/point-in-time facts (e.g. balance sheet items)
period_endDATEReporting period end date. Table is range-partitioned on this column by year.
fiscal_yearINTEGERFiscal year number (e.g. 2024). INT16 in the parquet — no cast needed for cross-table joins.
fiscal_periodVARCHARFiscal period label: FY (annual), Q1, Q2, Q3, Q4
frameVARCHARXBRL frame identifier linking this fact to a specific reporting period context
statement_typeVARCHARFinancial statement category: income, balance_sheet, cash_flow, dei, notes
confidence_scoreDOUBLEComposite data quality confidence metric (0.0–1.0). Use >= 0.70 for analytics.
is_negativeBOOLEANTRUE if the originally reported value was negative
is_auditedBOOLEANTRUE if this fact comes from an audited annual filing (10-K or 20-F)
is_estimatedBOOLEANTRUE if the value is estimated rather than directly reported
data_qualityVARCHARData quality flag: reported, estimated, or provisional
accepted_atTIMESTAMPTZPoint-in-time timestamp — when this fact became publicly known via SEC EDGAR. Filter accepted_at <= trade_date for look-ahead-free backtests.
ingested_atTIMESTAMPTZTimestamp when Valuein ingested this fact
reporting_currencyVARCHARISO-4217 currency code when the unit is monetary (e.g. USD, EUR, JPY). NULL for non-monetary facts (shares, ratios, days). Required for 20-F filers and any non-USD reporter.
period_span_daysINTEGERNumber of calendar days spanned by this reporting period
is_cumulativeBOOLEANTRUE if the value is cumulative year-to-date rather than for the period in isolation (common in 10-Q cash flow statements)
derivation_typeVARCHARMethod used to derive the quarterly value: ytd_subtraction, direct, or NULL if not derived
derivation_uncertainBOOLEANTRUE if the quarterly derivation has reduced confidence due to fiscal-period ambiguity
value_currentDOUBLEBloomberg Option-C: the latest-known value for this (entity_id, standard_concept, period_end, unit) — i.e. the value any analyst watching the tape today would see. Computed at export time via DuckDB window functions.
value_as_filedDOUBLEBloomberg Option-C: the originally-reported value (first-ever accepted_at in the partition). Compare with value_current to detect restatements.
first_filed_atTIMESTAMPTZTimestamp of first disclosure (min accepted_at in the partition)
restatedBOOLEANTRUE if any version of this fact in the partition differs from the others — signals that a later filing restated this number. Computed against the GLOBAL warehouse state, NOT PIT — for PIT reasoning use value_current vs. value_as_filed.

valuation

26 fields
FieldTypeDescription
idBIGINTSurrogate primary key for the valuation record
entity_idVARCHARForeign key to entity.cik — the company being valued
security_idINTEGERForeign key to security.id — the specific security valued (NULL if company-level only)
valuation_dateDATEThe fiscal period_end the valuation was computed against
model_typeVARCHARValuation model identifier. Live values: 'dcf' (earnings-based two-stage), 'dcf_fcf' (free-cash-flow-based two-stage), 'ddm' (dividend discount model). Multiple rows per (entity_id, valuation_date) are expected.
model_versionVARCHARVersion string of the valuation model algorithm (e.g. 1.0)
intrinsic_valueDOUBLETotal enterprise / equity value implied by the model (currency)
per_share_valueDOUBLEintrinsic_value / shares_outstanding — the model's fair-value estimate per share
margin_of_safetyDOUBLE(per_share_value - current_price) / per_share_value. >0 = undervalued.
valuation_labelVARCHARHuman-readable bucket — 'undervalued' / 'fairly_valued' / 'overvalued'
base_earningsDOUBLEDCF: trailing FCF base. DDM: trailing dividend base.
growth_rateDOUBLEStage-1 growth rate
terminal_rateDOUBLETerminal-stage growth rate
discount_rateDOUBLEWACC for DCF; cost of equity for DDM
growth_yearsINTEGERStage-1 horizon in years
terminal_yearsINTEGERYears modeled in the terminal phase before the perpetuity
shares_outstandingBIGINTDiluted shares outstanding used to convert total equity value to a per-share figure
current_priceDOUBLEMarket price used to compute margin_of_safety. NULL when the pipeline could not source a quote — margin_of_safety is then NULL too.
base_dividendDOUBLEDDM only: trailing dividend base
dividend_growth_rateDOUBLEDDM only: expected long-term dividend growth rate
payout_ratioDOUBLEDividend payout ratio (dividends ÷ earnings)
data_qualityVARCHARPipeline-assigned label — 'reported' | 'estimated' | 'low_confidence'
source_periodVARCHARWhich fiscal period (e.g. FY2024, TTM) supplied base_earnings
assumptionsJSONFull input bundle for the model — base_earnings, discount_rate, growth_rate, terminal_growth_rate, etc. Use to reproduce the calc.
created_atTIMESTAMPTZTimestamp when this valuation record was first computed
updated_atTIMESTAMPTZTimestamp when this valuation record was last recomputed (ON CONFLICT DO UPDATE)

taxonomy_guide

6 fields
FieldTypeDescription
standard_conceptVARCHARCanonical concept name (primary key) — matches fact.standard_concept for joins
human_nameVARCHARHuman-readable display name for the concept (e.g. Total Revenue)
definitionVARCHARPlain-English definition of what this financial concept measures
unit_typeVARCHARExpected unit of measure: monetary, shares, ratio, pure, etc.
balance_typeVARCHARAccounting balance type: debit or credit
source_referenceVARCHARUS-GAAP taxonomy reference identifier for this concept

index_membership

14 fields
FieldTypeDescription
idINTEGERSurrogate primary key
cikVARCHARForeign key to entity.cik — the index constituent. JOIN with references on cik = cik (same column name on both sides since migration 0015).
index_nameVARCHARCanonical short code: SP500 | NASDAQ100 | RUSSELL3000 | WILSHIRE5000
effective_dateDATEFirst trading day the company traded as a member
announcement_dateDATEDate S&P Dow Jones Indices publicly announced the addition. Typically 5–10 trading days before effective_date. NULL on rows from Wikipedia/GitHub seed/IVV daily layers — only the S&P press release scraper populates this.
removal_dateDATEFirst trading day the company was NOT a member. NULL = current. [) interval semantics — a company removed on 2017-06-19 is NOT a member on 2017-06-19.
removal_announcement_dateDATEDate the removal was publicly announced; symmetric with announcement_date. NULL when only Wikipedia/IVV layers wrote the row.
removal_reasonVARCHARmerger | acquisition | bankruptcy | criteria | spinoff | other. NULL when removal_date IS NULL.
successor_cikVARCHARAcquirer CIK when removal_reason in {merger, acquisition}. Essential for backtests crossing M&A — without it, a stock disappearing looks indistinguishable from a delisting.
sourceVARCHARProvenance: wikipedia | ivv | spglobal_press | github_seed | manual
confidenceVARCHARhigh | medium | low. Used by get_pit_universe to set pit_safe automatically — pre-2000 dates default to 'low'.
notesVARCHARFree-text annotation
ingested_atTIMESTAMPTZTimestamp when Valuein first ingested this membership row
updated_atTIMESTAMPTZTimestamp when this membership row was last updated

references

17 fields
FieldTypeDescription
cikVARCHARSEC Central Index Key — 10-digit unique company identifier (from entity)
nameVARCHARLegal registered company name (from entity)
sectorVARCHARBroad market sector (from entity)
industryVARCHARIndustry classification description (from entity)
sic_codeVARCHARStandard Industrial Classification code 4-digit (from entity)
statusVARCHAREntity status: ACTIVE, INACTIVE, or DELISTED (from entity)
entity_typeVARCHARSEC-defined filer category (from entity)
security_idINTEGERSurrogate PK of the security record (from security)
symbolVARCHARExchange ticker symbol (from security)
exchangeVARCHARStock exchange name e.g. NASDAQ, NYSE (from security)
micVARCHARMarket Identification Code ISO 10383 (from security)
valid_fromDATESCD Type 2 start date — when this ticker became active (from security)
valid_toDATESCD Type 2 end date — NULL means currently active (from security)
is_activeBOOLEANTRUE when valid_to IS NULL — use to filter current tickers only (from security). For index membership, JOIN index_membership ON references.cik = index_membership.cik — there is no is_sp500 flag (dropped 2026-05-02 because it was snapshot-only and single-index).
figiVARCHARFinancial Instrument Global Identifier share-class level (from security)
composite_figiVARCHARComposite FIGI at the exchange level (from security)
share_class_figiVARCHARShare class FIGI (from security)

ratio

10 fields
FieldTypeDescription
entity_idVARCHARForeign key to entity.cik
ratio_nameVARCHARRatio identifier (e.g. roe, gross_margin, debt_to_equity)
categoryVARCHARGroups ratios for filtering: profitability | per_share | liquidity | leverage | efficiency | owner_earnings | valuation
valueDOUBLENumeric value of the ratio. Inspect `unit` for the scale (percent vs decimal vs multiple).
unitVARCHARUnit: percent | per_share | pure | x
period_endDATEReporting period end date the ratio was computed for
fiscal_yearINTEGERFiscal year of the underlying period
fiscal_periodVARCHARFY | Q1 | Q2 | Q3 | Q4 | TTM
is_ttmBOOLEANTRUE for trailing-twelve-months rows
computed_atTIMESTAMPTZPipeline-run timestamp the ratio was last computed (NOT a PIT field — recomputed on every pipeline run via ON CONFLICT DO UPDATE).

factor_scores

25 fields
FieldTypeDescription
entity_idVARCHARForeign key to entity.cik
period_endDATEReporting period end date — derived from the latest two 10-K filings
fiscal_yearINTEGERFiscal year of the underlying period
accepted_atTIMESTAMPTZSource filing's SEC acceptance timestamp — use for PIT filtering
roeDOUBLEReturn on equity
gross_marginDOUBLEGross margin
operating_marginDOUBLEOperating margin
net_profit_marginDOUBLENet profit margin
revenue_growth_yoyDOUBLEYear-over-year revenue growth
fcf_to_assetsDOUBLEFree cash flow / total assets
debt_to_equityDOUBLEDebt-to-equity ratio (raw value — _rank column carries the cross-sectional rank)
asset_turnoverDOUBLEAsset turnover (revenue / total assets)
current_ratioDOUBLECurrent assets / current liabilities
piotroski_f_scoreINTEGERPiotroski F-Score (0–9) — financial-strength composite
roe_rankDOUBLECross-sectional PERCENT_RANK of roe (0.0–1.0; 1.0 = best)
gross_margin_rankDOUBLECross-sectional PERCENT_RANK of gross_margin
operating_margin_rankDOUBLECross-sectional PERCENT_RANK of operating_margin
net_profit_margin_rankDOUBLECross-sectional PERCENT_RANK of net_profit_margin
revenue_growth_yoy_rankDOUBLECross-sectional PERCENT_RANK of revenue_growth_yoy
fcf_to_assets_rankDOUBLECross-sectional PERCENT_RANK of fcf_to_assets
debt_to_equity_rankDOUBLECross-sectional PERCENT_RANK of debt_to_equity (inverted — lower leverage ranks higher)
asset_turnover_rankDOUBLECross-sectional PERCENT_RANK of asset_turnover
current_ratio_rankDOUBLECross-sectional PERCENT_RANK of current_ratio
piotroski_f_score_rankDOUBLECross-sectional PERCENT_RANK of piotroski_f_score
composite_rankDOUBLEEqual-weight average of all non-null factor _rank columns. The default sort key for screen_universe.

earnings_signals

10 fields
FieldTypeDescription
entity_idVARCHARForeign key to entity.cik
period_endDATEReporting period end date
fiscal_yearINTEGERFiscal year of the underlying period
fiscal_periodVARCHARFY | Q1 | Q2 | Q3 | Q4
accepted_atTIMESTAMPTZSource filing's SEC acceptance timestamp — use for PIT filtering
eps_actualDOUBLEReported diluted EPS for the period
eps_trend_estDOUBLETrailing 4-quarter average of EPSDiluted. NULL when fewer than 4 prior quarters exist.
eps_surprise_pctDOUBLE(eps_actual - eps_trend_est) / |eps_trend_est|. NULL when |eps_trend_est| < 0.001 (avoids divide-by-zero blow-ups). Filter IS NOT NULL for momentum screens.
revenue_actualDOUBLEReported revenue for the period
revenue_yoy_pctDOUBLELatest quarter revenue vs. 4 quarters prior

DuckDB Usage

All Parquet files load natively with DuckDB's read_parquet(). DuckDB handles ZSTD decompression automatically and supports lazy projection — only the columns you SELECT are read from disk.

-- Load a table directly from disk
SELECT *
FROM read_parquet('fact.parquet')
WHERE standard_concept = 'Revenues'
  AND accepted_at <= '2023-01-01'
LIMIT 10;

-- Query directly from the API URL (presigned or streamed).
-- Membership goes through index_membership — references no longer
-- carries an is_sp500 flag (dropped 2026-05-02).
SELECT r.symbol, fa.numeric_value / 1e9 AS revenue_billions
FROM read_parquet('https://data.valuein.biz/v1/sp500/references') r
JOIN read_parquet('https://data.valuein.biz/v1/sp500/index_membership') im
  ON im.cik = r.cik
JOIN read_parquet('https://data.valuein.biz/v1/sp500/fact') fa
  ON fa.entity_id = r.cik
WHERE im.index_name = 'SP500'
  AND im.removal_date IS NULL
  AND fa.standard_concept = 'Revenues'
  AND fa.accepted_at <= '2024-01-01'
ORDER BY revenue_billions DESC
LIMIT 20;

ZSTD compression

All files are ZSTD-compressed. Typical compression ratio is 6–10x vs uncompressed. DuckDB decompresses transparently.

Column pruning

SELECT only the columns you need — DuckDB reads only those column chunks from the Parquet file, not the full table.

Predicate pushdown

DuckDB pushes WHERE clauses into Parquet row group statistics, skipping groups that cannot contain matching rows.

Explore the full field catalog

Search and filter all fields with SEC XBRL tag mappings, types, and example values in the interactive Data Catalog.