Parquet Schema Reference
Eleven ZSTD-compressed Parquet tables (8 core + 3 derived). 105M+ facts, 16,000+ entities, 1994–present. All tables use consistent join keys — cik for companies, accession_id for filings.
Entity-Relationship Diagram
All tables link via two universal keys: cik (company) and accession_id (filing). The references table is a derived flat join — use it as your query starting point to avoid manual joins.
┌─────────────┐ ┌────────────────┐ ┌──────────────────────┐
│ entity │ │ security │ │ index_membership │
│─────────────│ │────────────────│ │──────────────────────│
│ cik (PK) │──┐ │ id (PK) │ ┌──│ cik (FK → entity) │
│ name │ └──│ entity_id (FK) │ │ │ index_name │
│ sector │ │ symbol │ │ │ effective_date │
│ sic_code │ │ exchange │ │ │ removal_date │
│ status │ │ is_active │ │ │ removal_reason │
└─────────────┘ └────────────────┘ │ │ successor_cik │
│ │ └──────────────────────┘
│ ┌─────────────────┐ │ ▲
└────────────│ filing │ │ │
│─────────────────│ │ │
│ accession_id(PK)│ │ │
│ entity_id (FK) │ │ │
│ form_type │ │ │
│ filing_date │ │ │
│ accepted_at │ │ │
└─────────────────┘ │ │
│ │ │
└────────────┌──────────────────────┐
│ fact │
│──────────────────────│
│ fact_id (deterministic)│
│ entity_id (FK) │
│ accession_id (FK) │
│ standard_concept │
│ numeric_value │
│ value_current │
│ value_as_filed │
│ accepted_at (PIT) │
└──────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ references (derived — start every cross-co query here) │
│──────────────────────────────────────────────────────────────│
│ cik, name, sector, industry, sic_code, status, entity_type │
│ security_id, symbol, exchange, mic, is_active, valid_* │
│ figi, composite_figi, share_class_figi │
│ │
│ Index membership? JOIN index_membership ON cik = cik │
│ (same column name both sides — migration 0015). │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Derived analytic tables (recomputed each pipeline run): │
│ ratio · factor_scores · earnings_signals │
│ All key on entity_id (= references.cik) + period_end. │
└──────────────────────────────────────────────────────────────┘Tables
entity
cikOne row per SEC-registered company. Contains company profile data: name, sector, SIC code, location, CEO, description, and status. The root table — all other tables link back via entity_id = entity.cik.
security
idExchange listings with SCD Type 2 history. Multiple rows per company — one per ticker/exchange combination. FIGI identifiers included for cross-system mapping.
filing
accession_idSEC EDGAR filing index. One row per accepted filing. The bridge between companies and their financial facts. Every fact row carries this filing's accepted_at forward as its own PIT anchor.
fact
entity_id + accession_id + standard_conceptThe core financial data table. 105M+ XBRL-sourced data points normalized to ~150 standard_concept labels. Each row is a single financial value for a specific company, period, and concept.
valuation
idPipeline-computed DCF and DDM intrinsic values. One row per company per valuation run. Includes WACC, growth rates, and margin of safety vs. market price.
taxonomy_guide
standard_conceptLookup table for standard_concept labels used in the fact table. Maps canonical names to human-readable labels, definitions, and GAAP taxonomy references.
index_membership
idHistorical index constituent records — SP500, NASDAQ100, RUSSELL3000, WILSHIRE5000. One row per (cik, index, membership-period). Essential for point-in-time universe construction — prevents survivorship bias.
references
symbol + cikDerived flat join of entity + security. One row per security. Eliminates the need for 2-table joins for company-level metadata. Always start here for cross-company queries.
ratio
entity_id + ratio_name + period_endPipeline-computed financial ratios per entity per fiscal period. Recomputed on each pipeline run (not PIT — use computed_at for freshness).
factor_scores
entity_id + period_endCross-sectional factor scores and percentile ranks per entity, computed from the latest two 10-K filings. 10 factors matching the SDK AlphaEngine built-in set.
earnings_signals
entity_id + period_endTrend-based earnings expectations and surprise metrics per entity. eps_trend_est is a trailing 4-quarter average of EPSDiluted; eps_surprise_pct measures actual vs. trend.
Key Columns Guide
The most important columns across tables. Master these and you can write any query.
| Column | Table | Type | Nullable | Description |
|---|---|---|---|---|
accepted_atPIT | fact | TIMESTAMPTZ | No | PIT timestamp — when this fact became publicly known via SEC EDGAR acceptance. Use accepted_at <= trade_date for look-ahead-free backtests. |
standard_concept | fact | VARCHAR | Yes | Canonical normalized concept name (e.g. Revenues, NetIncomeLoss). Use this — not raw concept — for cross-company queries. |
numeric_value | fact | FLOAT | Yes | Parsed financial value. NULL for non-numeric XBRL facts. Always use ABS() on capex; divide by NULLIF(denom, 0) on ratios. |
derived_quarterly_value | fact | FLOAT | Yes | Q2/Q3 10-Q cash flows adjusted from YTD to single-quarter. Use COALESCE(derived_quarterly_value, numeric_value) for all cash flow metrics. |
cik | index_membership / references | VARCHAR | No | Index membership join key. Same column name on both tables (since migration 0015) — JOIN index_membership ON references.cik = index_membership.cik. Filter index_name = 'SP500' AND removal_date IS NULL for the current S&P500. |
is_active | references / security | BOOLEAN | No | TRUE if this ticker is currently active (valid_to IS NULL). Filter to avoid duplicate rows from ticker changes. |
accession_id | filing | VARCHAR | No | SEC EDGAR accession number — the unique identifier for a filing. Format: XXXXXXXXXX-YY-ZZZZZZ. The join key between filing and fact. |
cik | entity / references | VARCHAR | No | SEC Central Index Key — 10-digit company identifier. The universal join key across all tables. entity.cik = security.entity_id = filing.entity_id = fact.entity_id. |
All Fields by Table
Quick-reference field list. For full descriptions and SEC XBRL tag mappings, use the Data Catalog.
entity
32 fields| Field | Type | Description |
|---|---|---|
cik | VARCHAR | SEC Central Index Key — 10-digit unique company identifier |
name | VARCHAR | Legal registered company name |
lei | VARCHAR | Legal Entity Identifier (ISO 17442 20-character code) |
industry | VARCHAR | Industry classification description |
sector | VARCHAR | Broad market sector (e.g. Technology, Healthcare, Financials) |
sic_code | VARCHAR | Standard Industrial Classification code (4-digit) |
sic_description | VARCHAR | Full SEC SIC industry label (e.g. 'Electronic Computers'). Companion to sic_code — use this when a human-readable string is more useful than the raw 4-digit code. |
fiscal_year_end | VARCHAR | Fiscal year end as MMDD (e.g. 1231 = December 31) |
location | VARCHAR | Company headquarters location (city, state, country) |
website | VARCHAR | Company primary website URL |
ceo | VARCHAR | Chief Executive Officer name at most recent filing |
founded_year | SMALLINT | Year the company was founded |
description | VARCHAR | Company business description extracted from SEC filings |
status | VARCHAR | Entity status: ACTIVE, INACTIVE, or DELISTED. Filter status != 'ACTIVE' to include all non-active companies |
entity_type | VARCHAR | SEC-defined filer category (e.g. Non-accelerated Filer, Large Accelerated Filer) |
ein | VARCHAR | Employer Identification Number (IRS tax ID) |
owner_org | VARCHAR | Parent or controlling organization name |
investor_website | VARCHAR | Investor relations website URL |
category | VARCHAR | SEC registrant category |
state_of_incorporation | VARCHAR | US state or country of incorporation (2-letter code) |
state_of_incorporation_description | VARCHAR | Full SEC label for the state-of-incorporation code (e.g. 'Delaware') |
phone | VARCHAR | Company contact phone number |
business_address | VARCHAR | Principal registered business (HQ) address — single concatenated line. Renamed from 'address' in schema v2.1.0. |
mailing_address | VARCHAR | Mailing address (separate from HQ) — registered agent / correspondence target. Often differs from business_address for foreign issuers and trusts. |
country_code | VARCHAR | ISO country code of the issuer's primary location |
is_foreign | BOOLEAN | TRUE if the issuer is a non-US registrant (typically a 20-F / 40-F filer) |
flags | VARCHAR | SEC-flagged status string (e.g. 'Shell', 'Delinquent'). NULL for clean issuers. |
has_insider_transactions | BOOLEAN | SEC declares Form 4/5 insider activity exists for this issuer |
is_insider_owner | BOOLEAN | SEC declares this CIK itself owns insider holdings |
former_names | JSON | SEC formerNames[] history — JSON array of {name, from, to} objects. Useful for ticker-history queries and merger lineage (e.g. FB → META). Empty array when the issuer has never renamed. |
last_seen_at | DATE | Date this entity last appeared in any SEC filing |
ingested_at | TIMESTAMPTZ | Timestamp when Valuein first ingested this entity record |
security
18 fields| Field | Type | Description |
|---|---|---|
id | INTEGER | Surrogate primary key for the security record |
entity_id | VARCHAR | Foreign key to entity.cik — the issuing company |
symbol | VARCHAR | Exchange ticker symbol |
exchange | VARCHAR | Stock exchange name (e.g. NASDAQ, NYSE) |
mic | VARCHAR | Market Identification Code (ISO 10383) |
valid_from | DATE | SCD Type 2 start date — when this ticker became active on this exchange |
valid_to | DATE | SCD Type 2 end date — when this ticker was retired. NULL means currently active |
is_active | BOOLEAN | Derived: TRUE when valid_to IS NULL. Use to filter current tickers only |
figi | VARCHAR | Financial Instrument Global Identifier at share-class level (OpenFIGI) |
composite_figi | VARCHAR | Composite FIGI at the exchange level |
share_class_figi | VARCHAR | Share class FIGI |
security_type | VARCHAR | Instrument type (e.g. Common Stock, ADR, ETF) |
market_sector | VARCHAR | FIGI market sector classification |
figi_ticker | VARCHAR | Ticker symbol as recorded in the OpenFIGI registry |
figi_name | VARCHAR | Security name from the OpenFIGI registry |
figi_exch_code | VARCHAR | Exchange code from the OpenFIGI registry |
is_primary_ticker | BOOLEAN | TRUE for the ticker SEC's company_tickers.json marks as the entity's primary listing. Multi-share-class issuers (BRK-A/BRK-B, GOOG/GOOGL) emit multiple is_active=TRUE rows; default consumer queries filter is_primary_ticker=TRUE for one row per CIK. |
ingested_at | TIMESTAMPTZ | Timestamp when Valuein first ingested this security record |
filing
23 fields| Field | Type | Description |
|---|---|---|
accession_id | VARCHAR | SEC EDGAR accession number — unique filing identifier (format: XXXXXXXXXX-YY-ZZZZZZ) |
entity_id | VARCHAR | Foreign key to entity.cik — the filing company |
filing_date | DATE | Date the SEC received and accepted the filing. Use this for PIT backtest filters — not report_date |
form_type | VARCHAR | Filing type: 10-K (annual), 10-Q (quarterly), 20-F (foreign annual), 8-K (current report), etc. |
filing_url | VARCHAR | Full URL to the filing document on SEC EDGAR |
is_amendment | BOOLEAN | TRUE if this is an amended filing (e.g. 10-K/A, 10-Q/A) |
amendment_no | SMALLINT | Amendment sequence number — 0 for the original filing, 1 for the first amendment, etc. |
superseded_by | VARCHAR | Accession-id of the filing that supersedes this one in the amendment chain (NULL when this is the latest version). Populated post-ingest. |
report_date | DATE | Fiscal period end date covered by this filing. Using this as a PIT filter introduces look-ahead bias — use filing_date or accepted_at instead. |
items | VARCHAR[] | Array of 8-K item numbers reported (e.g. ['1.01', '9.01']). NULL for non-8-K filings |
size | INTEGER | Filing size in bytes (per SEC submissions feed) |
is_xbrl | BOOLEAN | SEC-declared: filing carries an XBRL exhibit |
is_inline_xbrl | BOOLEAN | SEC-declared: filing uses Inline XBRL (iXBRL) |
is_xbrl_numeric | BOOLEAN | Distinct from is_xbrl/is_inline_xbrl — TRUE when the XBRL exhibit carries machine-readable numeric facts (the source of every fact-table row). |
primary_document | VARCHAR | Filename of the lead document in the filing package |
primary_doc_description | VARCHAR | SEC-provided label describing the primary document |
core_type | VARCHAR | Base form_type stripped of the /A amendment suffix (e.g. 10-K/A → 10-K). Use this for form-family filtering when you don't care whether a filing is the original or an amendment. |
file_number | VARCHAR | SEC EDGAR file number (e.g. 001-36743) |
film_number | VARCHAR | SEC EDGAR microfiche / film number assigned at acceptance |
act | VARCHAR | Securities-Act designator: '33' (1933 Act), '34' (1934 Act), or empty. Helps distinguish registration-statement vs. exchange-act filings. |
is_audited | BOOLEAN | Filing is an annual (audited) report — TRUE for 10-K / 20-F and their amendments, FALSE for 10-Q / 8-K / etc. |
accepted_at | TIMESTAMPTZ | Exact timestamp the SEC accepted the filing (millisecond precision). Every fact disclosed in this filing inherits the same accepted_at as its PIT anchor. |
ingested_at | TIMESTAMPTZ | Timestamp when Valuein ingested this filing record |
fact
32 fields| Field | Type | Description |
|---|---|---|
fact_id | VARCHAR | Deterministic identity hash: SHA-256(entity_id|accession_id|concept|period_end|unit). 64-character lowercase hex string. Use with verify_fact_lineage to trace any number to its source filing. |
entity_id | VARCHAR | Foreign key to entity.cik |
accession_id | VARCHAR | Foreign key to filing.accession_id |
concept | VARCHAR | Raw XBRL tag as filed by the company (e.g. us-gaap:Revenues). Use standard_concept for cross-company comparisons. |
standard_concept | VARCHAR | Canonical normalized concept label (e.g. Revenues, NetIncomeLoss). Maps 11,966 raw XBRL tags to ~150 standard concepts (95% coverage). |
priority | SMALLINT | Standardization priority — lower value = higher confidence in the concept mapping |
label | VARCHAR | Human-readable label from the company's XBRL filing context |
numeric_value | DOUBLE | The value as reported in this specific filing (accession_id). NULL for non-numeric facts. |
derived_quarterly_value | DOUBLE | Quarterly value derived by subtracting prior YTD periods (Q2/Q3 10-Q cash flows are filed as YTD). Use COALESCE(derived_quarterly_value, numeric_value) for single-quarter cash flow metrics. |
unit | VARCHAR | Unit of measure: USD, shares, pure (dimensionless ratio), etc. |
period_start | DATE | Reporting period start date. NULL for instant/point-in-time facts (e.g. balance sheet items) |
period_end | DATE | Reporting period end date. Table is range-partitioned on this column by year. |
fiscal_year | INTEGER | Fiscal year number (e.g. 2024). INT16 in the parquet — no cast needed for cross-table joins. |
fiscal_period | VARCHAR | Fiscal period label: FY (annual), Q1, Q2, Q3, Q4 |
frame | VARCHAR | XBRL frame identifier linking this fact to a specific reporting period context |
statement_type | VARCHAR | Financial statement category: income, balance_sheet, cash_flow, dei, notes |
confidence_score | DOUBLE | Composite data quality confidence metric (0.0–1.0). Use >= 0.70 for analytics. |
is_negative | BOOLEAN | TRUE if the originally reported value was negative |
is_audited | BOOLEAN | TRUE if this fact comes from an audited annual filing (10-K or 20-F) |
is_estimated | BOOLEAN | TRUE if the value is estimated rather than directly reported |
data_quality | VARCHAR | Data quality flag: reported, estimated, or provisional |
accepted_at | TIMESTAMPTZ | Point-in-time timestamp — when this fact became publicly known via SEC EDGAR. Filter accepted_at <= trade_date for look-ahead-free backtests. |
ingested_at | TIMESTAMPTZ | Timestamp when Valuein ingested this fact |
reporting_currency | VARCHAR | ISO-4217 currency code when the unit is monetary (e.g. USD, EUR, JPY). NULL for non-monetary facts (shares, ratios, days). Required for 20-F filers and any non-USD reporter. |
period_span_days | INTEGER | Number of calendar days spanned by this reporting period |
is_cumulative | BOOLEAN | TRUE if the value is cumulative year-to-date rather than for the period in isolation (common in 10-Q cash flow statements) |
derivation_type | VARCHAR | Method used to derive the quarterly value: ytd_subtraction, direct, or NULL if not derived |
derivation_uncertain | BOOLEAN | TRUE if the quarterly derivation has reduced confidence due to fiscal-period ambiguity |
value_current | DOUBLE | Bloomberg Option-C: the latest-known value for this (entity_id, standard_concept, period_end, unit) — i.e. the value any analyst watching the tape today would see. Computed at export time via DuckDB window functions. |
value_as_filed | DOUBLE | Bloomberg Option-C: the originally-reported value (first-ever accepted_at in the partition). Compare with value_current to detect restatements. |
first_filed_at | TIMESTAMPTZ | Timestamp of first disclosure (min accepted_at in the partition) |
restated | BOOLEAN | TRUE if any version of this fact in the partition differs from the others — signals that a later filing restated this number. Computed against the GLOBAL warehouse state, NOT PIT — for PIT reasoning use value_current vs. value_as_filed. |
valuation
26 fields| Field | Type | Description |
|---|---|---|
id | BIGINT | Surrogate primary key for the valuation record |
entity_id | VARCHAR | Foreign key to entity.cik — the company being valued |
security_id | INTEGER | Foreign key to security.id — the specific security valued (NULL if company-level only) |
valuation_date | DATE | The fiscal period_end the valuation was computed against |
model_type | VARCHAR | Valuation model identifier. Live values: 'dcf' (earnings-based two-stage), 'dcf_fcf' (free-cash-flow-based two-stage), 'ddm' (dividend discount model). Multiple rows per (entity_id, valuation_date) are expected. |
model_version | VARCHAR | Version string of the valuation model algorithm (e.g. 1.0) |
intrinsic_value | DOUBLE | Total enterprise / equity value implied by the model (currency) |
per_share_value | DOUBLE | intrinsic_value / shares_outstanding — the model's fair-value estimate per share |
margin_of_safety | DOUBLE | (per_share_value - current_price) / per_share_value. >0 = undervalued. |
valuation_label | VARCHAR | Human-readable bucket — 'undervalued' / 'fairly_valued' / 'overvalued' |
base_earnings | DOUBLE | DCF: trailing FCF base. DDM: trailing dividend base. |
growth_rate | DOUBLE | Stage-1 growth rate |
terminal_rate | DOUBLE | Terminal-stage growth rate |
discount_rate | DOUBLE | WACC for DCF; cost of equity for DDM |
growth_years | INTEGER | Stage-1 horizon in years |
terminal_years | INTEGER | Years modeled in the terminal phase before the perpetuity |
shares_outstanding | BIGINT | Diluted shares outstanding used to convert total equity value to a per-share figure |
current_price | DOUBLE | Market price used to compute margin_of_safety. NULL when the pipeline could not source a quote — margin_of_safety is then NULL too. |
base_dividend | DOUBLE | DDM only: trailing dividend base |
dividend_growth_rate | DOUBLE | DDM only: expected long-term dividend growth rate |
payout_ratio | DOUBLE | Dividend payout ratio (dividends ÷ earnings) |
data_quality | VARCHAR | Pipeline-assigned label — 'reported' | 'estimated' | 'low_confidence' |
source_period | VARCHAR | Which fiscal period (e.g. FY2024, TTM) supplied base_earnings |
assumptions | JSON | Full input bundle for the model — base_earnings, discount_rate, growth_rate, terminal_growth_rate, etc. Use to reproduce the calc. |
created_at | TIMESTAMPTZ | Timestamp when this valuation record was first computed |
updated_at | TIMESTAMPTZ | Timestamp when this valuation record was last recomputed (ON CONFLICT DO UPDATE) |
taxonomy_guide
6 fields| Field | Type | Description |
|---|---|---|
standard_concept | VARCHAR | Canonical concept name (primary key) — matches fact.standard_concept for joins |
human_name | VARCHAR | Human-readable display name for the concept (e.g. Total Revenue) |
definition | VARCHAR | Plain-English definition of what this financial concept measures |
unit_type | VARCHAR | Expected unit of measure: monetary, shares, ratio, pure, etc. |
balance_type | VARCHAR | Accounting balance type: debit or credit |
source_reference | VARCHAR | US-GAAP taxonomy reference identifier for this concept |
index_membership
14 fields| Field | Type | Description |
|---|---|---|
id | INTEGER | Surrogate primary key |
cik | VARCHAR | Foreign key to entity.cik — the index constituent. JOIN with references on cik = cik (same column name on both sides since migration 0015). |
index_name | VARCHAR | Canonical short code: SP500 | NASDAQ100 | RUSSELL3000 | WILSHIRE5000 |
effective_date | DATE | First trading day the company traded as a member |
announcement_date | DATE | Date S&P Dow Jones Indices publicly announced the addition. Typically 5–10 trading days before effective_date. NULL on rows from Wikipedia/GitHub seed/IVV daily layers — only the S&P press release scraper populates this. |
removal_date | DATE | First trading day the company was NOT a member. NULL = current. [) interval semantics — a company removed on 2017-06-19 is NOT a member on 2017-06-19. |
removal_announcement_date | DATE | Date the removal was publicly announced; symmetric with announcement_date. NULL when only Wikipedia/IVV layers wrote the row. |
removal_reason | VARCHAR | merger | acquisition | bankruptcy | criteria | spinoff | other. NULL when removal_date IS NULL. |
successor_cik | VARCHAR | Acquirer CIK when removal_reason in {merger, acquisition}. Essential for backtests crossing M&A — without it, a stock disappearing looks indistinguishable from a delisting. |
source | VARCHAR | Provenance: wikipedia | ivv | spglobal_press | github_seed | manual |
confidence | VARCHAR | high | medium | low. Used by get_pit_universe to set pit_safe automatically — pre-2000 dates default to 'low'. |
notes | VARCHAR | Free-text annotation |
ingested_at | TIMESTAMPTZ | Timestamp when Valuein first ingested this membership row |
updated_at | TIMESTAMPTZ | Timestamp when this membership row was last updated |
references
17 fields| Field | Type | Description |
|---|---|---|
cik | VARCHAR | SEC Central Index Key — 10-digit unique company identifier (from entity) |
name | VARCHAR | Legal registered company name (from entity) |
sector | VARCHAR | Broad market sector (from entity) |
industry | VARCHAR | Industry classification description (from entity) |
sic_code | VARCHAR | Standard Industrial Classification code 4-digit (from entity) |
status | VARCHAR | Entity status: ACTIVE, INACTIVE, or DELISTED (from entity) |
entity_type | VARCHAR | SEC-defined filer category (from entity) |
security_id | INTEGER | Surrogate PK of the security record (from security) |
symbol | VARCHAR | Exchange ticker symbol (from security) |
exchange | VARCHAR | Stock exchange name e.g. NASDAQ, NYSE (from security) |
mic | VARCHAR | Market Identification Code ISO 10383 (from security) |
valid_from | DATE | SCD Type 2 start date — when this ticker became active (from security) |
valid_to | DATE | SCD Type 2 end date — NULL means currently active (from security) |
is_active | BOOLEAN | TRUE when valid_to IS NULL — use to filter current tickers only (from security). For index membership, JOIN index_membership ON references.cik = index_membership.cik — there is no is_sp500 flag (dropped 2026-05-02 because it was snapshot-only and single-index). |
figi | VARCHAR | Financial Instrument Global Identifier share-class level (from security) |
composite_figi | VARCHAR | Composite FIGI at the exchange level (from security) |
share_class_figi | VARCHAR | Share class FIGI (from security) |
ratio
10 fields| Field | Type | Description |
|---|---|---|
entity_id | VARCHAR | Foreign key to entity.cik |
ratio_name | VARCHAR | Ratio identifier (e.g. roe, gross_margin, debt_to_equity) |
category | VARCHAR | Groups ratios for filtering: profitability | per_share | liquidity | leverage | efficiency | owner_earnings | valuation |
value | DOUBLE | Numeric value of the ratio. Inspect `unit` for the scale (percent vs decimal vs multiple). |
unit | VARCHAR | Unit: percent | per_share | pure | x |
period_end | DATE | Reporting period end date the ratio was computed for |
fiscal_year | INTEGER | Fiscal year of the underlying period |
fiscal_period | VARCHAR | FY | Q1 | Q2 | Q3 | Q4 | TTM |
is_ttm | BOOLEAN | TRUE for trailing-twelve-months rows |
computed_at | TIMESTAMPTZ | Pipeline-run timestamp the ratio was last computed (NOT a PIT field — recomputed on every pipeline run via ON CONFLICT DO UPDATE). |
factor_scores
25 fields| Field | Type | Description |
|---|---|---|
entity_id | VARCHAR | Foreign key to entity.cik |
period_end | DATE | Reporting period end date — derived from the latest two 10-K filings |
fiscal_year | INTEGER | Fiscal year of the underlying period |
accepted_at | TIMESTAMPTZ | Source filing's SEC acceptance timestamp — use for PIT filtering |
roe | DOUBLE | Return on equity |
gross_margin | DOUBLE | Gross margin |
operating_margin | DOUBLE | Operating margin |
net_profit_margin | DOUBLE | Net profit margin |
revenue_growth_yoy | DOUBLE | Year-over-year revenue growth |
fcf_to_assets | DOUBLE | Free cash flow / total assets |
debt_to_equity | DOUBLE | Debt-to-equity ratio (raw value — _rank column carries the cross-sectional rank) |
asset_turnover | DOUBLE | Asset turnover (revenue / total assets) |
current_ratio | DOUBLE | Current assets / current liabilities |
piotroski_f_score | INTEGER | Piotroski F-Score (0–9) — financial-strength composite |
roe_rank | DOUBLE | Cross-sectional PERCENT_RANK of roe (0.0–1.0; 1.0 = best) |
gross_margin_rank | DOUBLE | Cross-sectional PERCENT_RANK of gross_margin |
operating_margin_rank | DOUBLE | Cross-sectional PERCENT_RANK of operating_margin |
net_profit_margin_rank | DOUBLE | Cross-sectional PERCENT_RANK of net_profit_margin |
revenue_growth_yoy_rank | DOUBLE | Cross-sectional PERCENT_RANK of revenue_growth_yoy |
fcf_to_assets_rank | DOUBLE | Cross-sectional PERCENT_RANK of fcf_to_assets |
debt_to_equity_rank | DOUBLE | Cross-sectional PERCENT_RANK of debt_to_equity (inverted — lower leverage ranks higher) |
asset_turnover_rank | DOUBLE | Cross-sectional PERCENT_RANK of asset_turnover |
current_ratio_rank | DOUBLE | Cross-sectional PERCENT_RANK of current_ratio |
piotroski_f_score_rank | DOUBLE | Cross-sectional PERCENT_RANK of piotroski_f_score |
composite_rank | DOUBLE | Equal-weight average of all non-null factor _rank columns. The default sort key for screen_universe. |
earnings_signals
10 fields| Field | Type | Description |
|---|---|---|
entity_id | VARCHAR | Foreign key to entity.cik |
period_end | DATE | Reporting period end date |
fiscal_year | INTEGER | Fiscal year of the underlying period |
fiscal_period | VARCHAR | FY | Q1 | Q2 | Q3 | Q4 |
accepted_at | TIMESTAMPTZ | Source filing's SEC acceptance timestamp — use for PIT filtering |
eps_actual | DOUBLE | Reported diluted EPS for the period |
eps_trend_est | DOUBLE | Trailing 4-quarter average of EPSDiluted. NULL when fewer than 4 prior quarters exist. |
eps_surprise_pct | DOUBLE | (eps_actual - eps_trend_est) / |eps_trend_est|. NULL when |eps_trend_est| < 0.001 (avoids divide-by-zero blow-ups). Filter IS NOT NULL for momentum screens. |
revenue_actual | DOUBLE | Reported revenue for the period |
revenue_yoy_pct | DOUBLE | Latest quarter revenue vs. 4 quarters prior |
DuckDB Usage
All Parquet files load natively with DuckDB's read_parquet(). DuckDB handles ZSTD decompression automatically and supports lazy projection — only the columns you SELECT are read from disk.
-- Load a table directly from disk
SELECT *
FROM read_parquet('fact.parquet')
WHERE standard_concept = 'Revenues'
AND accepted_at <= '2023-01-01'
LIMIT 10;
-- Query directly from the API URL (presigned or streamed).
-- Membership goes through index_membership — references no longer
-- carries an is_sp500 flag (dropped 2026-05-02).
SELECT r.symbol, fa.numeric_value / 1e9 AS revenue_billions
FROM read_parquet('https://data.valuein.biz/v1/sp500/references') r
JOIN read_parquet('https://data.valuein.biz/v1/sp500/index_membership') im
ON im.cik = r.cik
JOIN read_parquet('https://data.valuein.biz/v1/sp500/fact') fa
ON fa.entity_id = r.cik
WHERE im.index_name = 'SP500'
AND im.removal_date IS NULL
AND fa.standard_concept = 'Revenues'
AND fa.accepted_at <= '2024-01-01'
ORDER BY revenue_billions DESC
LIMIT 20;ZSTD compression
All files are ZSTD-compressed. Typical compression ratio is 6–10x vs uncompressed. DuckDB decompresses transparently.
Column pruning
SELECT only the columns you need — DuckDB reads only those column chunks from the Parquet file, not the full table.
Predicate pushdown
DuckDB pushes WHERE clauses into Parquet row group statistics, skipping groups that cannot contain matching rows.
Explore the full field catalog
Search and filter all fields with SEC XBRL tag mappings, types, and example values in the interactive Data Catalog.