What is synthetic
Effectively everything that carries financial meaning is generated:
- Prices and OHLCV bars
- Income statements, balance sheets, cash-flow statements, and metrics
- Earnings results, surprises, guidance, and analyst estimates
- News articles, analyst notes, and SEC-style filings
- Insider trades, institutional holdings, and corporate actions
- The macro tape and central-bank policy path
- Events and their effects
What is real
Two things are grounded in real-world reference data:
- Ticker symbols. David uses real exchange-style symbol aliases (e.g. AAPL, NVDA, SPY) so your tooling and ticker lists work unchanged. The bundled universe contains 16,143 unique public US symbols from SEC/Nasdaq snapshots.
- Company reference identity. Each company’s display metadata (name, sector, industry, business description, market-cap tier, instrument type) reflects the real-world company, so AAPL reads as Apple, Consumer Electronics and SPY reads as an S&P 500 ETF.
Determinism
David’s defining property is determinism. A scenario is built from a seed plus configuration, and the same inputs always produce the same world, down to individual price bars and headline text. Generation also follows a fixed dependency order, so the world is internally consistent rather than a pile of independent random series:
Company identities + sector archetypes
Each issuer’s sector behavior and business profile are established.
Financial statements + metrics
Income statement, balance sheet, cash flow, and ratios are generated.
Consensus estimates + guidance
Sell-side expectations and management guidance are set against those fundamentals.
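The idea of a seed plus configuration feeding an ordered pipeline can be illustrated with a minimal Python sketch. It is not David's implementation: `derive_rng`, `build_world`, the `base_revenue` config key, and the numeric ranges are all hypothetical, chosen only to show how per-stage randomness derived from the same inputs reproduces the same world.

```python
import hashlib
import json
import random


def derive_rng(seed: int, config: dict, stage: str) -> random.Random:
    """Derive a per-stage RNG from the scenario seed, config, and stage name.

    Hypothetical helper: hashing the inputs gives every stage its own
    reproducible random stream.
    """
    payload = json.dumps(
        {"seed": seed, "config": config, "stage": stage}, sort_keys=True
    )
    digest = hashlib.sha256(payload.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def build_world(seed: int, config: dict) -> dict:
    """Generate each layer in a fixed order; later layers read earlier ones."""
    # Stage 1: company identity + sector archetype (illustrative growth rate).
    rng = derive_rng(seed, config, "identity")
    growth = rng.uniform(0.02, 0.15)

    # Stage 2: fundamentals derived from the archetype.
    rng = derive_rng(seed, config, "fundamentals")
    revenue = round(config["base_revenue"] * (1 + growth + rng.gauss(0, 0.01)), 2)

    # Stage 3: consensus estimate set against those fundamentals.
    rng = derive_rng(seed, config, "estimates")
    consensus = round(revenue * (1 + rng.gauss(0, 0.03)), 2)

    return {"growth": growth, "revenue": revenue, "consensus_revenue": consensus}


config = {"base_revenue": 1_000.0}
world_a = build_world(42, config)
world_b = build_world(42, config)  # same seed + config
world_c = build_world(43, config)  # different seed
```

Here `world_a == world_b` holds because both calls see identical inputs, while `world_c` differs. Because the estimate is computed from the revenue, the layers stay tied together instead of being drawn independently.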
Internal consistency
Because the world is derived from a single hidden truth, the data ties together:
- Accounting identities hold (balance sheet balances, cash reconciles, EPS ties out).
- OHLC and volume invariants hold (low ≤ open, close ≤ high; event days spike volume).
- Prices react to earnings: significant beats/misses move the stock in the right direction.
- News and filings repeat the numbers in the structured records instead of drifting from them.
- Filings carry the metadata (accession, CIK, item codes) and link back to the events that triggered them via source_event_id.
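Two of the invariants above are easy to spot-check on the consumer side. The sketch below assumes records arrive as plain dictionaries. The field names (`open`, `high`, `low`, `close`, `volume`, `total_assets`, `total_liabilities`, `total_equity`) and the rounding tolerance are assumptions for illustration, not David's documented schema.

```python
def check_bar(bar: dict) -> bool:
    """OHLC invariant: low <= open, close <= high, with non-negative volume."""
    body_low = min(bar["open"], bar["close"])
    body_high = max(bar["open"], bar["close"])
    return bar["low"] <= body_low and body_high <= bar["high"] and bar["volume"] >= 0


def check_balance_sheet(bs: dict, tol: float = 0.01) -> bool:
    """Accounting identity: assets = liabilities + equity, within a tolerance."""
    gap = bs["total_assets"] - (bs["total_liabilities"] + bs["total_equity"])
    return abs(gap) <= tol


# Hypothetical sample records.
good_bar = {"open": 10.0, "high": 10.8, "low": 9.7, "close": 10.5, "volume": 1_200_000}
bad_bar = {"open": 10.0, "high": 10.2, "low": 9.7, "close": 10.5, "volume": 1_200_000}
balance = {"total_assets": 500.0, "total_liabilities": 320.0, "total_equity": 180.0}

print(check_bar(good_bar), check_bar(bad_bar), check_balance_sheet(balance))
```

`bad_bar` fails because its close sits above its high. A check like this is useful as a regression guard in a pipeline that consumes the data.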
Calibration
David’s generator is tuned against stylized facts of real markets (return distributions, volatility clustering, drawdown frequency, volume/reaction coupling, tail behavior, and cross-sector correlation) measured from market panels. The /metadata/empirical-calibration endpoint reports the current calibration manifest and gate margins.
Calibration means the synthetic data resembles real markets statistically. It does not mean David copies real time series. David is a research and evaluation tool, not a licensed feed of real market data.
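To make "resembles real markets statistically" concrete, here is a minimal Python sketch of two common stylized-fact measurements: excess kurtosis of returns (tail behavior) and lag-1 autocorrelation of absolute returns (volatility clustering). The exact statistics and thresholds David's calibration gates use are not stated here, so these functions are illustrative assumptions and not the calibration code itself.

```python
import math


def log_returns(closes):
    """Log returns between consecutive closing prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def excess_kurtosis(xs):
    """Fourth standardized moment minus 3; positive values indicate fat tails."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2) - 3.0


def autocorr(xs, lag=1):
    """Sample autocorrelation at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var


# Hypothetical closing prices; in practice these would come from generated bars.
closes = [100.0, 101.0, 100.5, 103.0, 99.0, 99.5, 99.4, 102.0, 101.0, 101.2]
rets = log_returns(closes)
tail_stat = excess_kurtosis(rets)
clustering_stat = autocorr([abs(r) for r in rets], lag=1)
```

On a long real or well-calibrated synthetic series, daily returns typically show positive excess kurtosis and positive autocorrelation in absolute returns. A ten-point sample like the one above is far too short to draw any conclusion.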
Next steps
Ticker universe
Real symbols, instrument types, and ETF suppression rules.
Market coverage
Universe size, horizons, and endpoint coverage.