Valuein
advanced
10 min

Point-in-Time Universe Construction

Build a survivorship-bias-free historical S&P500 universe for any rebalance date.

PIT universe backtesting DuckDB
## The Survivorship Bias Problem

If you backtest a 2018 strategy against today's S&P500 list, you include companies that were not in the index in 2018 and exclude companies that were in it then but have since been removed or delisted. This is survivorship bias, and it typically inflates backtest returns.

`index_membership` keys on `cik` (not `security_id`) since migration 0015, and uses `effective_date` / `removal_date` as a half-open membership window `[effective_date, removal_date)`: a company counts as a member from its `effective_date` up to, but not including, its `removal_date`. A `NULL` `removal_date` means it is still a member.
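To make the half-open window concrete, here is a minimal pure-Python sketch of the same membership test. The `is_member` helper and the sample rows (including their CIK values and dates) are hypothetical and exist only for illustration; they are not part of the Valuein SDK or its data.

```python
from datetime import date
from typing import Optional


def is_member(as_of: date, effective_date: date, removal_date: Optional[date]) -> bool:
    """Half-open window check: effective_date <= as_of < removal_date.

    A removal_date of None means the membership is still open.
    """
    return effective_date <= as_of and (removal_date is None or as_of < removal_date)


# Hypothetical membership rows: (cik, effective_date, removal_date)
rows = [
    ("0000000001", date(2015, 3, 20), None),             # still a member
    ("0000000002", date(2010, 1, 4), date(2019, 6, 3)),  # removed before 2020
    ("0000000003", date(2020, 12, 21), None),            # added after 2020-01-01
]

as_of = date(2020, 1, 1)
members = [cik for cik, start, end in rows if is_member(as_of, start, end)]
print(members)
```

Because the window is half-open, a company is a member on its `effective_date` but no longer a member on its `removal_date`, so a removal and its replacement on the same day do not double-count.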
```python
from valuein_sdk import ValueinClient, ValueinError

try:
    with ValueinClient() as client:

        def get_pit_universe(as_of_date: str, index: str = "SP500"):
            """Returns the exact index constituents as of as_of_date."""
            return client.run_query(f"""
                SELECT r.cik, r.symbol, r.name, r.sector
                FROM references r
                JOIN index_membership im ON im.cik = r.cik
                WHERE im.index_name = '{index}'
                  AND im.effective_date <= DATE '{as_of_date}'
                  AND (im.removal_date IS NULL OR im.removal_date > DATE '{as_of_date}')
            """)

        # S&P500 as of Jan 1, 2020 (before COVID additions)
        universe_2020 = get_pit_universe("2020-01-01")
        print(f"S&P500 size on 2020-01-01: {len(universe_2020)} companies")

        # Compare to current S&P500
        universe_now = get_pit_universe("2026-01-01")
        print(f"S&P500 size on 2026-01-01: {len(universe_now)} companies")

        # Companies that are in today's S&P500 but weren't in 2020
        new_additions = set(universe_now.symbol) - set(universe_2020.symbol)
        print(f"Added since 2020: {len(new_additions)} companies")
        print(sorted(new_additions)[:10])

except ValueinError as e:
    print(f"Error: {e}")
```
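The comparison above reports additions only, but survivorship bias comes from the removals as well. The following is a minimal sketch of computing both sides of the turnover between two snapshots. The `universe_turnover` helper and the sample CIK lists are hypothetical. It compares on `cik` rather than `symbol` on the assumption that tickers can change over time while the CIK stays stable.

```python
from typing import Iterable, Set, Tuple


def universe_turnover(before: Iterable[str], after: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) identifiers between two universe snapshots."""
    before_set, after_set = set(before), set(after)
    added = after_set - before_set    # in the later snapshot only
    removed = before_set - after_set  # in the earlier snapshot only
    return added, removed


# Hypothetical CIK snapshots standing in for two query results
ciks_2020 = ["0000000001", "0000000002", "0000000003"]
ciks_now = ["0000000001", "0000000003", "0000000004"]

added, removed = universe_turnover(ciks_2020, ciks_now)
print(f"Added: {sorted(added)}")
print(f"Removed: {sorted(removed)}")
```

With the query results from the example above, the same helper could be called as `universe_turnover(universe_2020.cik, universe_now.cik)`, assuming the result exposes `cik` the same way it exposes `symbol`.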

Try it yourself

Get your API token and run this notebook against 111M+ real SEC EDGAR facts.