Technical Architecture · For Collaborators

Momentum Candle Backtester
Internals & Decision Record

Every module, every constant, every design decision. This is the file to read before touching code. High-level summary first; deep technical detail behind expandable sections so you can pick your depth.

Core file · 8,264 LOC
Backtest methods · 72 per signal
Features · 22 causal, audited
WFO windows · 5 rolling + purged
Universe · 300+ USDT pairs
01 · SYSTEM AT A GLANCE

A three-gate validation pipeline for altcoin breakout candles

The scanner finds momentum breakout candles across 300+ Binance USDT pairs. Each candidate signal is pushed through three sequential gates before any execution decision is made:

GATE 1 · STATISTICAL
Backtest + WFO
72 method combos tested on historical analogs. Rolling 5-window purged walk-forward validates generalization.
→ 2 candidates, PF, WR, EVw
GATE 2 · MACHINE LEARNING
Adaptive Classifier
LR / RF / GB picked by sample count. Isotonic-calibrated, time-decay + regime weighted, purged CV.
→ p(WIN) calibrated %
GATE 3 · REASONING
AI Dual-Verdict
Groq analyzes both candidates with canonical prices, picks a winner or rejects both.
→ TRADE / SKIP + rationale
NOTE
Each gate is stricter than the last. Backtest passes → ML still rejects? Skip. ML passes → AI flags structural issues? Skip. The system's core bias is false-negatives over false-positives — missed trades beat bad trades.
02 · STACK & FILE LAYOUT

Three files. One brain.

File · LOC · Responsibility
app.py · 8,264 · Scanner + Manual + all pipeline stages (Step 1/2/3), UI rendering, Groq AI verdict
pulse_intel.py · 1,682 · On-chain intel module: DefiLlama TVL, Etherscan / Solscan flow, LunarCrush social, macro
lookahead_audit.py · 290 · Causality audit — proves all 22 features are forward-looking-leak-free

Dependencies

# requirements.txt
streamlit>=1.32.0
plotly>=5.19.0
pandas>=2.0.0
numpy>=1.26.0
scipy>=1.12.0
scikit-learn>=1.4.0
requests>=2.31.0

Three tabs, three concerns

🔭 SCANNER · Auto · Scans 300+ USDT pairs for momentum breakout candles. Last 3 closed candles per coin, per timeframe.
🔍 MANUAL · On-demand · Analyze any coin + any date + direction. No body/vol filter — user picks the candle. Same 3-step pipeline.
🫀 PULSE · Free intel · On-chain Nansen-lite: Flow 40% · TVL 35% · Social 25%. Composite ±15 score.
03 · DATA FLOW (END-TO-END)

One signal, three steps, one verdict

┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│   Binance    │──▶│  _clean_df   │──▶│  _scanner_   │──▶│   sig dict   │
│   /klines    │   │ 22 features  │   │ score_signal │   │ (candidate)  │
└──────────────┘   └──────────────┘   └──────────────┘   └──────┬───────┘
                                                                │
        ┌───────────────────────────────────────────────────────┘
        ▼
┌────────────────────────────────────────────────────────────────────────┐
│ STEP 1 · _scanner_quick_backtest(sig)  → 72 methods, 2 candidates      │
│        · _scanner_mini_wfo(sig, bt)    → 5 rolling purged windows      │
└────────────────────────────────────────────────────────────────────────┘
        ▼
┌────────────────────────────────────────────────────────────────────────┐
│ STEP 2 · _scanner_train_ml(sig, method) → ml_a, ml_b (adaptive)        │
│        · If Cand A == Cand B → unanimous, single fit                   │
└────────────────────────────────────────────────────────────────────────┘
        ▼
┌────────────────────────────────────────────────────────────────────────┐
│ STEP 3 · _scanner_ai_verdict(sig, ml_a, ml_b, bt, wfo, cand_a, b)      │
│        → {candidate_a, candidate_b, winner, winner_rationale}          │
└────────────────────────────────────────────────────────────────────────┘
        ▼
  EXECUTION DECISION : TRADE (A|B) / SKIP + trade plan
Why split into three buttons and not one pipeline run DESIGN

Originally two steps. Split into three on Apr 13 because ML training needs to know which method it's labeling by — which requires the user to see backtest results and make a judgment first. Friction here is a feature, not a bug: it forces the human to acknowledge the backtest output before the ML and AI layers get involved.

Also: Step 1 is expensive (72 methods × ratchet loop). Running it blindly for every scan would waste Binance API calls and sklearn CPU.

04 · MARKET DATA LAYER

Binance klines, 1000 bars deep

All backtest/WFO/ML training pulls from _scanner_fetch_candles(symbol, interval, limit). Binance's /api/v3/klines hard-caps at 1000 bars — we always fetch the max.

_BINANCE_INTERVAL = {"1D":"1d", "4H":"4h", "1H":"1h", "1W":"1w"}
_DEEP_FETCH_LIMITS = {"1h":1000, "2h":1000, "4h":1000,
                      "6h":1000, "12h":1000, "1d":1000}

What 1000 bars means per timeframe

Interval · Historical window · Bars
1h · ~42 days · 1000
4h · ~167 days (5.5mo) · 1000
1d · ~2.7 years · 1000
1w · ~19 years (way past listing date on most alts) · 1000
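The table follows directly from bar count × interval length; a minimal sketch (helper name and interval map are illustrative, not from the codebase):

```python
# Hypothetical helper: historical coverage of a 1000-bar fetch per interval.
_INTERVAL_HOURS = {"1h": 1, "4h": 4, "1d": 24, "1w": 168}

def coverage_days(interval: str, limit: int = 1000) -> float:
    """Days of history covered by fetching `limit` bars of `interval`."""
    return limit * _INTERVAL_HOURS[interval] / 24

# coverage_days("1h") ≈ 41.7 days; coverage_days("1w") = 7000 days ≈ 19.2 years
```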
Fallback & alternate sources RESILIENCE

_binance_klines is the primary path. _gateio_klines exists as a fallback for delisted Binance pairs where the ticker moved to Gate.io. fetch_live handles real-time current-bar polling for the Pulse tab.

Exchange rate limits are handled via adaptive retry with 500ms backoff on 429 / 418 status codes. No persistent rate limiter — the thin wrapper assumes Streamlit's request cadence is low enough.
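The retry idea can be sketched like this (a simplification, not the real wrapper; the `get` callable is injected here so the logic is testable without the network):

```python
import time

def get_with_backoff(get, url, retries=3, backoff_s=0.5):
    """Sketch of the adaptive retry: back off 500 ms when Binance answers
    429 (rate limit) or 418 (IP ban). `get` is any callable returning
    (status_code, payload)."""
    for _ in range(retries):
        status, payload = get(url)
        if status in (429, 418):
            time.sleep(backoff_s)  # fixed 500 ms pause, then retry
            continue
        return payload
    raise RuntimeError(f"rate-limited after {retries} attempts: {url}")
```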

05 · 22 CAUSAL FEATURES

Every feature is strictly past-looking

Built by _clean_df(df). Audited by lookahead_audit.py — 20/20 CLEAN (synthetic test, 25 random bars, all features match causal recomputation). A deliberately-leaking test feature was correctly flagged — the audit works.

The 11 ML features (subset of 22)

body_pct · Candle body as % of range
vol_mult · Volume vs 7-bar avg
adx · ADX(14) trend strength
di_gap · DI+ − DI− directional spread
atr_ratio · ATR(14) / close — vol regime
ema_score · Alignment of 21/50/200 EMAs
regime_score · Macro regime composite 0–100
candle_rank · Body rank pct in 20-bar window
vol_rank · Volume rank pct in 20-bar window
body_vs_atr · Body / ATR — explosiveness
dist_from_ema21 · % distance from EMA21 (signed, flipped for SHORT)
Causality proofs per feature AUDIT
  • EMA uses .shift(1).ewm() — strictly past, no current bar.
  • vol_avg_7 uses .shift(1).rolling(7).mean() — past 7 bars, excluding current.
  • candle_rank_20, vol_rank_20 use .rolling(20).rank(pct=True) — causal in pandas (ranks within window ending at current bar, which is correct — the signal uses current candle's data to score itself).
  • body_vs_atr and dist_from_ema21_pct are derived from already-causal features.
  • regime_score composites ADX/F&G/funding/OI — each fetched at or before the bar's timestamp.

Full audit: python lookahead_audit.py BTCUSDT 1d
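The audit's core idea fits in a few lines — a feature is causal iff recomputing it on a frame truncated at bar t reproduces the value originally stored at bar t. A miniature on synthetic data (the real audit lives in lookahead_audit.py):

```python
import numpy as np
import pandas as pd

vol = pd.Series(np.arange(1.0, 31.0))        # 30 bars of synthetic volume
vol_avg_7 = vol.shift(1).rolling(7).mean()   # past 7 bars, excludes current

# Truncate all future bars at t and recompute — the value must not change.
t = 20
recomputed = vol.iloc[:t + 1].shift(1).rolling(7).mean().iloc[t]
assert recomputed == vol_avg_7.iloc[t]       # no future bar leaked in
```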

PROVEN
For SHORT signals, dist_from_ema21 is sign-flipped so "stretched the wrong way" is always a negative number, regardless of direction. This keeps the ML feature space directionally consistent.
06 · REGIME SCORING

A 0–100 score blending trend, sentiment, funding, OI

calculate_regime_score(df, bar_index, direction, adx_df, timeframe, ticker) composites four inputs into a single bar-level score. The score is cached per bar in _bar_regime_cache so the 72-method backtest doesn't recompute it 72× per candle.

TREND · ADX · DI gap direction
SENTIMENT · F&G · Fear & Greed Index
LEVERAGE · Funding · 8h funding rate
POSITIONING · OI · Open interest delta

Soft regime similarity

def _regime_similarity_weight(current, historical):
    return max(0.15, 1 - abs(current - historical) / 100)

Applied to:

  • Backtest EVw / WRw — analogs from similar regimes pull harder
  • ML sample_weight — same logic, at the classifier level

NOT applied to WFO — WFO tests generalization across regimes on purpose. Regime-filtering the WFO would defeat the point.

FLOOR
Similarity floor = 0.15 (never 0). Prevents sample-size cliffs on illiquid coins where very few historical analogs match the current regime closely. A far-regime analog still contributes 15% signal.
07 · SCANNER

Finds breakout candidates across 300+ USDT pairs

Universe built by _scanner_get_universe(min_volume_usdt) — Binance spot USDT pairs with min 24h volume. Each coin scans its last 3 closed candles (not the current bar — that's still forming) via _scan_one_symbol → _scanner_score_signal.

Scoring a candidate

_scanner_score_signal(df, adx_df, bar_idx, direction,
                       timeframe, symbol, min_body_pct, min_vol_mult)
#  → returns sig dict  OR  None if below threshold

Returns a sig dict containing: direction, entry/SL/TP prices, body_pct, vol_mult, ADX, DI gap, ATR ratio, EMA alignment score, regime score, bar_index, timeframe, symbol, 11-feature vector extracted for current bar.

Scanner threshold defaults CONFIG

Defaults are intentionally loose — better to surface a borderline signal and let the 3-gate pipeline reject it than miss one entirely.

  • min_body_pct ≈ 0.60 (body takes up ≥60% of range)
  • min_vol_mult ≈ 1.8× (volume ≥1.8× 7-bar avg)
  • ADX filter soft — used in scoring, not hard-cut

Signals below threshold return None and never enter the pipeline.

08 · STEP 1 — BACKTEST + WFO

72 method combos against historical analogs

When the user clicks "Run Step 1": _scanner_quick_backtest(sig) scans historical bars looking for analogs to the current signal, then simulates all 72 method combinations on those analogs. _scanner_mini_wfo(sig, bt) then runs a rolling purged walk-forward on the full df to test generalization.

Key constants

NEUTRAL_R_THRESHOLD = 0.30        # ±0.30R = NEUTRAL, excluded from ML
MAX_HOLD            = 20          # max bars per trade
FIXED_SL            = 1.5         # percent — fixed stop width (alt: ATR SL)
MAX_HOLD
Every trade times out after 20 bars. On 4H that's 3.3 days; on daily, 20 days. This is critical for the purge logic — labels span up to 20 bars, so training samples near fold boundaries must be dropped.
09 · RATCHET FILTER

Progressive relaxation when analogs are scarce

Problem: a strong 85%-body, 5× volume signal has very few exact historical analogs. Requiring 70% of its own body/vol would find 4 samples on ETH 4H. Too few to backtest, way too few to train ML on.

Solution: a ratchet that relaxes the threshold until enough samples are found, with hard floors to prevent pure-noise analogs.

70% STRICT · 55% TIGHT · 45% NORMAL · 35% RELAXED · 25% LOOSE · 20% FLOOR
# Backtest ratchet (target: 50 bars)
_BT_RATCHET_RATIOS = [0.70, 0.55, 0.45, 0.35, 0.25, 0.20]
_BT_MIN_BODY_FLOOR = 0.20
_BT_MIN_VOL_FLOOR  = 1.10

# ML ratchet (target: 80 samples)
_RATCHET_RATIOS = [0.70, 0.55, 0.45, 0.35, 0.25, 0.20]
Two-pass optimization PERF

The backtest does a cheap _count_passing() scan at each ratchet level first. Only when a level meets the 50-bar target does it run the full 72-method loop once. This avoids 6× redundant simulation.

Why hard floors matter GUARDRAIL

Without floors, a signal on an ultra-rare extreme candle could ratchet all the way down to matching any candle, producing garbage analogs. body_floor=0.20 and vol_floor=1.10 ensure every analog is at least a "real" candle, not dust.
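Putting the ratchet and floors together — a sketch of the two-pass loop (the cheap counting scan is injected as a callable here; the real loop lives in app.py):

```python
_BT_RATCHET_RATIOS = [0.70, 0.55, 0.45, 0.35, 0.25, 0.20]
_BT_MIN_BODY_FLOOR = 0.20
_BT_MIN_VOL_FLOOR  = 1.10

def ratchet(sig_body, sig_vol, count_passing, target=50):
    """Relax the analog thresholds level by level until `count_passing`
    reports enough samples, never dropping below the hard floors.
    Only then does the caller run the full 72-method simulation once."""
    for ratio in _BT_RATCHET_RATIOS:
        body_thr = max(sig_body * ratio, _BT_MIN_BODY_FLOOR)
        vol_thr = max(sig_vol * ratio, _BT_MIN_VOL_FLOOR)
        if count_passing(body_thr, vol_thr) >= target:
            return ratio, body_thr, vol_thr  # enough analogs at this level
    return _BT_RATCHET_RATIOS[-1], body_thr, vol_thr  # floor level reached
```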

UI badge

The result flows through to a user-facing badge:

  • STRICT 70% — analogs tightly matched; trust the ML probability numerically.
  • NORMAL 45% — broad analog set.
  • LOOSE 20% — very broad; treat the ML probability as directional only.

10 · 72-METHOD MATRIX

Every signal tested against 72 trade strategies

The cartesian product of entry × SL × mgmt × TP:

ENTRY_ZONES  = ["Aggressive", "Standard", "Sniper"]              # 3
SL_METHODS   = ["Fixed SL", "ATR SL"]                           # 2
MGMT_MODES   = ["Simple", "Partial", "Partial-NoBE", "Trailing"]   # 4
TP_MULTS     = [2.0, 2.5, 3.0]                                  # 3
#            3 × 2 × 4 × 3 = 72 combinations

Entry zones

AGGRESSIVE · 0% · Market entry at signal close. Best fill rate, worst R:R.
STANDARD · 38.2% · Fib retracement pullback. Middle ground.
SNIPER · 61.8% · Deep retracement. Best R:R, lowest fill rate.
INVARIANT
Zone validity rule: when the retrace entry price falls beyond the structural SL (below it for LONG, above it for SHORT — happens on big-body candles with large ATR), the zone is added to _invalid_zones and excluded from candidate selection. This is mechanically correct — do not "fix" it by widening the SL. Widening the SL changes the strategy's semantics.

Management modes

Mode · Behavior
Simple · Full size, hold to TP2 or original SL. No BE, no partials.
Partial · TP 50% at 1R, auto-move SL to BE on remaining half.
Partial-NoBE · TP 50% at 1R, keep original SL (real downside remains).
Trailing · Full size, BE at 1R, then trail 0.5×ATR from close.
Why Partial vs Partial-NoBE distinction matters LABEL BUG

Partial trades that hit TP1 then reverse to BE produce r_mult ≈ +0.498R. Mathematically this is positive PnL but the outcome is "basically flat." Labeling these as WIN caused:

  • ML seeing only wins on trending coins → single-class collapse → can't train
  • Backtest looking invincible: PF=∞, WR=100% on REZ-like trending alts

Fix: _classify_outcome(r_mult) → WIN / LOSS / NEUTRAL. |r_mult| ≤ 0.30R = NEUTRAL, still counted in PF/WR but excluded from ML labels.

Partial-NoBE was added on Apr 17 because the user believed they were trading that style. Partial was moving SL to BE automatically — a different strategy entirely. Both are now available for honest comparison.

All 72 methods (visual)

Aggro · Fixed · Simple · 2.0
Aggro · Fixed · Partial · 2.0
Aggro · Fixed · P-NoBE · 2.0
Aggro · Fixed · Trail · 2.0
... · 2.5
... · 2.5
... · 2.5
... · 2.5
... · 3.0
... · 3.0
... · 3.0
... · 3.0
Aggro · ATR · Simple · 2.0
Aggro · ATR · Partial · 2.0
Aggro · ATR · P-NoBE · 2.0
Aggro · ATR · Trail · 2.0
... Standard (×24) ...
... Sniper (×24) ...

72 combinations · each with n, WR, EV, EVw, PF, fill_rate, 4-bucket decay breakdown
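The grid above is just the cartesian product of the four axes; a one-liner reproduces it:

```python
from itertools import product

ENTRY_ZONES = ["Aggressive", "Standard", "Sniper"]              # 3
SL_METHODS = ["Fixed SL", "ATR SL"]                             # 2
MGMT_MODES = ["Simple", "Partial", "Partial-NoBE", "Trailing"]  # 4
TP_MULTS = [2.0, 2.5, 3.0]                                      # 3

# 3 × 2 × 4 × 3 = 72 method combinations, enumerated in grid order.
METHODS = list(product(ENTRY_ZONES, SL_METHODS, MGMT_MODES, TP_MULTS))
assert len(METHODS) == 72
```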

11 · TWO CANDIDATES

The dual-candidate decision system

Rather than picking a single "best" method from the 72, the system surfaces two:

🟢 CANDIDATE A — NEWEST
What's working now
Best method in the newest time-decay bucket. Captures current market behavior.
bt["candidate_newest"]
🔵 CANDIDATE B — WEIGHTED
What has worked historically
Best method by decay-weighted all-time EVw. Captures long-run robustness.
bt["candidate_weighted"]
UNANIMOUS
When A == B (same method wins both), the UI flags it as unanimous and ML is trained once instead of twice. Strong signal — the method dominates both recent and all-time.

Time-decay bucket scheme (adaptive)

Sample count · Buckets · Weights (oldest → newest)
n ≥ 400 · 4 · 0.40, 0.60, 0.80, 1.00
n ≥ 200 · 3 · 0.50, 0.75, 1.00
n ≥ 80 · 2 · 0.60, 1.00
n < 80 · 1 · 1.00
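The adaptive bucket scheme reduces to a simple threshold cascade — a sketch (function name assumed; the real logic lives in app.py):

```python
def decay_bucket_weights(n: int) -> list:
    """Time-decay bucket weights, oldest bucket first, chosen by sample
    count n: more history earns finer-grained decay."""
    if n >= 400:
        return [0.40, 0.60, 0.80, 1.00]
    if n >= 200:
        return [0.50, 0.75, 1.00]
    if n >= 80:
        return [0.60, 1.00]
    return [1.00]  # too few samples to decay meaningfully
```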
BUG FIX
best_key must use ev_weighted, not raw ev. Previous bug selected by raw EV, which could pick a method that crushed it in 2021 but has since gone dormant. Using EVw ensures newer trades pull harder on the selection. This is #2 on the do-not-break list.
12 · PURGED WALK-FORWARD

5 rolling windows with de Prado-style purge + embargo

A single in-sample/out-of-sample cut is one point estimate. Five rolling cuts give a distribution. "5/5 windows OOS PF ≥ 1.0" is real edge; "1 good cut" could be luck.

Rolling cuts at 50/60/70/80/90%

Win 1 · 50% IS/OOS  ·  Win 2 · 60% IS/OOS  ·  Win 3 · 70% IS/OOS  ·  Win 4 · 80% IS/OOS  ·  Win 5 · 90% IS/OOS
Each window: in-sample → embargo → out-of-sample.

Purge + embargo (de Prado Ch.7)

Every trade stores bar_index (entry bar) and label_end_bar (= j, resolution bar). PurgedTimeSeriesSplit drops training samples whose label period overlaps the test fold. Embargo = ceil(0.01 × n_total) after each fold.

CRITICAL
sklearn's default TimeSeriesSplit doesn't know labels span MAX_HOLD=20 bars. A training sample at the fold boundary has its label resolved inside the test fold → leak. On daily bars that's 20 days of leakage. Always PurgedTimeSeriesSplit.
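The purge rule itself is small — a sketch with assumed names (the real splitter is the custom PurgedTimeSeriesSplit; the embargo, ceil(0.01 × n_total) bars after each fold, is applied separately there):

```python
def purged_train_indices(train_idx, label_end_bar, test_start):
    """Keep a training sample only if its label fully resolves before the
    test fold begins. Otherwise its outcome (up to MAX_HOLD=20 bars ahead)
    peeks into test data — the leak sklearn's TimeSeriesSplit misses."""
    return [i for i in train_idx if label_end_bar[i] < test_start]
```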

WFO verdict thresholds

n_oos · Verdict
≥ 8 · PASS
5–7 · BORDERLINE
< 5 · INSUFFICIENT
WFO return dict shape SCHEMA
{
  "ok": bool,
  "verdict": "PASS" | "BORDERLINE" | "FAIL" | "INSUFFICIENT",
  "is_pf", "oos_pf", "oos_wr",
  "is_pf_clean", "oos_pf_clean",      # honest PF excl. NEUTRAL
  "oos_n": int,
  "purge_diag": {n_purged, n_embargoed, embargo_bars},
  "label_diag": {n_neutral, raw_pf, honest_pf, pf_inflation_pct},
  "oos_pf_ci": {"lo": float, "hi": float},  # 1000-resample block bootstrap
  "rolling_wfo": {"edge_hit_rate", "windows": [...]},
  "regime_breakdown": {"STRONG": ..., "MID": ..., "WEAK": ...},
  "tier_label": "PURGED IS/OOS split (70%/30%, embargo 1%)",
}
Bootstrap 95% CI on OOS PF UNCERTAINTY

1000-resample block bootstrap on OOS r_mult list. Reports {lo, hi}. Context: n=8 trades with PF=1.3 has CI roughly [0.7, 2.8] — wide, honestly acknowledged. This beats reporting a point estimate as if it were gospel.
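A sketch of such a block bootstrap (block length and resampling details are assumptions — the real version lives in the WFO code). Blocks preserve local autocorrelation that plain i.i.d. resampling would destroy:

```python
import numpy as np

def block_bootstrap_pf_ci(r_mults, n_boot=1000, block=5, seed=0):
    """95% CI on profit factor: resample contiguous blocks of r-multiples,
    recompute PF per resample, take the 2.5/97.5 percentiles."""
    rng = np.random.default_rng(seed)
    r = np.asarray(r_mults, dtype=float)
    pfs = []
    for _ in range(n_boot):
        starts = rng.integers(0, max(1, len(r) - block + 1),
                              size=len(r) // block + 1)
        sample = np.concatenate([r[s:s + block] for s in starts])[:len(r)]
        wins = sample[sample > 0].sum()
        losses = -sample[sample < 0].sum()
        pfs.append(wins / losses if losses > 0 else np.inf)
    lo, hi = np.percentile(pfs, [2.5, 97.5])
    return {"lo": float(lo), "hi": float(hi)}
```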

Regime-conditional OOS breakdown DIAGNOSTIC

Splits OOS trades by ATR-ratio proxy into STRONG/MID/WEAK regimes. Reports PF/WR per regime. An aggregate PF=1.4 can hide PF=2.8 in STRONG and PF=0.9 in WEAK — the breakdown makes this visible and actionable.

Fill-rate / survivor-bias diagnostic HONEST

Each method combo tracks:

  • n_qualifying — signals passing the filter
  • n_filled — trades that actually entered the zone
  • n_expired — never retraced to fill
  • fill_rate = n_filled / n_qualifying

Standard/Sniper zones on trending coins mostly don't fill (price never retraces). The backtest only sees the rare pullback-continuation subset → survivor bias inflates WR. fill_rate makes this visible. AI prompt warns when fill_rate < 40% on ≥20 qualifying signals.

13 · STEP 2 — ADAPTIVE ML

Classifier picked by sample count

Different sample sizes need different models. Too few samples and a heavy model overfits; too many and a simple model underutilizes the data.

n samples · Model · Why
n < 20 · Heuristic fallback · No training. Deterministic rule-based probability.
n < 50 · Logistic Regression · Least likely to overfit on small n. Pipeline with StandardScaler, class_weight=balanced.
50–149 · Random Forest · n_estimators=150, max_depth=5, min_samples_leaf=5. Robust to feature scale.
≥ 150 · Gradient Boosting · n_estimators=150, max_depth=3, lr=0.05, subsample=0.8. Best generalization at scale.
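The table translates into a simple picker — a sketch with the hyperparameters above (the function name is assumed; the real selection sits inside _scanner_train_ml):

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def pick_model(n: int):
    """Choose the classifier by sample count; None means the deterministic
    heuristic fallback (no fit at all)."""
    if n < 20:
        return None  # too few samples to train anything
    if n < 50:
        return make_pipeline(StandardScaler(),
                             LogisticRegression(class_weight="balanced"))
    if n < 150:
        return RandomForestClassifier(n_estimators=150, max_depth=5,
                                      min_samples_leaf=5)
    return GradientBoostingClassifier(n_estimators=150, max_depth=3,
                                      learning_rate=0.05, subsample=0.8)
```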

Calibration wrapper

if n_samples >= 60:
    model = CalibratedClassifierCV(model, method="isotonic", cv=3)
WHY
Uncalibrated RF/GB are systematically overconfident. Without isotonic calibration, a raw "68% probability" might really mean 53%. Calibration makes thresholds numerically meaningful. Only applied when n≥60 to avoid overfitting the calibration itself on tiny samples.

CV splitter

cv = PurgedTimeSeriesSplit(
    n_splits=min(5, n // 15),
    embargo_pct=0.01,
)
# Never sklearn's default TimeSeriesSplit — would leak labels at fold boundaries.
Training labeling: by method outcome LABELS

_scanner_train_ml labels historical candles by the chosen method's WIN/LOSS/NEUTRAL outcome — not by a fixed Aggressive/Simple baseline.

Reason: the ML should learn what works for the specific method you'll actually trade, not a generic proxy method. If Cand A is "Sniper · ATR · Partial-NoBE · 2.5", the ML learns to predict WIN for that exact configuration on historical analogs.

When Cand A ≠ Cand B, the ML trains twice — once per candidate.

14 · NEUTRAL LABELING

The |r_mult| ≤ 0.30R band excluded from ML

def _classify_outcome(r_mult):
    if r_mult >  NEUTRAL_R_THRESHOLD:  return "WIN"
    if r_mult < -NEUTRAL_R_THRESHOLD:  return "LOSS"
    return "NEUTRAL"
  • NEUTRAL trades still counted in PF/WR accounting (the money is real).
  • NEUTRAL trades excluded from ML labels (they'd cause single-class collapse on trending coins).
  • WFO reports honest_pf (excluding NEUTRAL) alongside raw PF.
SAVED
Before this fix, REZ-like trending alts on Partial+BE showed WR=100%, PF=∞ — unusable. After: the ±0.30R band is called what it is (flat), PF becomes realistic, ML can actually train.
15 · SAMPLE WEIGHTING

Every training sample gets a two-factor weight

sample_weight = time_decay_bucket_weight × regime_similarity_weight
TIME DECAY
Recent > old
Sample in newest bucket weighted 1.0, oldest bucket 0.40 (at n≥400). Aligns ML with backtest's EVw.
REGIME SIMILARITY
Floor 0.15
GREEN-regime signal shouldn't learn equally from RED-regime historical trades. Soft weighting, never zeroes.
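The two factors multiply into one scalar per training sample — a sketch combining the bucket weight with the similarity formula from section 06 (argument names are illustrative):

```python
def sample_weight(bucket_w: float, regime_now: float, regime_hist: float) -> float:
    """Two-factor training weight: time-decay bucket weight × soft regime
    similarity, with the 0.15 similarity floor (never fully zeroed)."""
    similarity = max(0.15, 1 - abs(regime_now - regime_hist) / 100)
    return bucket_w * similarity

# Newest bucket, identical regime       → 1.0 × 1.00 = 1.00
# Oldest bucket (0.40), far regime |Δ|=90 → 0.40 × 0.15 = 0.06
```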
Pipeline sample_weight error & fix BUGFIX

CalibratedClassifierCV wrapping a Pipeline raises various exception types (not just TypeError) when sample_weight is passed. Old code only caught TypeError and jumped to heuristic fallback.

Fix: broadened to except Exception on weighted fit. Falls back to unweighted fit before giving up and going to heuristic. Applied to both main fit and CV loop.

16 · STEP 3 — AI VERDICT

Groq reasoning with canonical prices

_scanner_ai_verdict(sig, ml_a, ml_b, bt, wfo, cand_a, cand_b)
#  → {candidate_a, candidate_b, winner, winner_rationale}

Model selection

Model · Use
openai/gpt-oss-120b · DEFAULT · Strongest free reasoning on Groq
openai/gpt-oss-20b · Faster, slightly weaker
qwen/qwen3-32b · Alt reasoning
llama-3.3-70b-versatile · Fallback
meta-llama/llama-4-scout-17b-16e-instruct · Fast fallback

reasoning_effort="medium" only for gpt-oss/qwen. max_tokens=2500. Timeout 60s.

Price hallucination prevention

_compute_candidate_prices(cand, sig) is the single source of truth for entry/SL/TP1/TP2 prices. Both the UI cards and the AI prompt read from it. The AI prompt contains an explicit EXECUTION PRICES block with a "copy verbatim" instruction.

HISTORY
Early AI verdicts generated plausible-sounding but wrong prices — mixing between candidates. Bug fixed Apr 15 by injecting canonical prices into the prompt with strict copy-verbatim instruction. Now UI and AI always agree.

Dual verdict rules

  • When A == B (unanimous): single analysis mirrored to both sides.
  • Both TRADE → AI picks stronger as winner.
  • Only one TRADE → that one wins.
  • Neither → winner = NONE.
17 · SETUP GRADING

Dual-candidate-aware A+ / A / B / C

_scanner_setup_grade(sig, ml, bt) grades by best-of the two candidates. A previous bug read from legacy aggregate bt["win_2r"], which made Cand A excellent + aggregate bad return "C — Backtest negative" (false negative).

Rescue rules

  • "B rescue" — if any candidate is tradeable AND ml_pct ≥ 60, grade won't drop below B.
  • Grade color follows the badge palette: A+ A B C.
18 · PULSE INTEL

On-chain Nansen-lite, free tier

get_pulse_intel(symbol,
                etherscan_api_key, lunarcrush_api_key, solscan_api_key)
#  → composite_score (-15 to +15)
#  → composite_label: STRONGLY BULLISH / BULLISH / NEUTRAL / BEARISH / STRONGLY BEARISH

Composite weights

FLOW · 40% · CEX net · Etherscan (ERC-20) or Solscan (SPL). Net inflow = bearish, outflow = bullish.
TVL · 35% · DefiLlama · TVL 24h + 7d delta. Score −10 to +10. No API key needed.
SOCIAL · 25% · LunarCrush · Galaxy Score + sentiment + alt rank. Free v4 API.

Per-token composite scaled ×1.2 → ±12. Macro modifier (±3) added on top → final ±15.

Verdict · Score band
STRONGLY BULLISH · ≥ +10
BULLISH · +4 to +9
NEUTRAL · −3 to +3
BEARISH · −4 to −9
STRONGLY BEARISH · ≤ −10
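The composite arithmetic can be sketched as follows (sub-scores assumed on a −10..+10 scale, macro on ±3; the clamp is defensive since the blend already caps at ±15):

```python
def pulse_composite(flow: float, tvl: float, social: float, macro: float) -> float:
    """40/35/25 blend of sub-scores, ×1.2 scale to ±12, macro modifier on
    top, final band ±15."""
    token = 0.40 * flow + 0.35 * tvl + 0.25 * social   # −10..+10
    return max(-15.0, min(15.0, token * 1.2 + macro))  # ±12 + ±3 → ±15

def pulse_label(score: float) -> str:
    """Verdict bands from the table above."""
    if score >= 10: return "STRONGLY BULLISH"
    if score >= 4:  return "BULLISH"
    if score >= -3: return "NEUTRAL"
    if score >= -9: return "BEARISH"
    return "STRONGLY BEARISH"
```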
Cache TTLs PERF
  • TVL: 3600s (updates slowly)
  • Flow: 900s
  • Social: 1800s
  • Macro: 14400s
19 · DO-NOT-BREAK LIST

Things that must stay exactly as they are

LAWS
Every item here was a real bug that cost real work to diagnose. Re-introducing any of them resets progress.
  1. Never use sklearn TimeSeriesSplit — always PurgedTimeSeriesSplit. Label-period overlap leaks.
  2. best_key uses ev_weighted, not raw ev. Fixed bug.
  3. Don't remove label_end_bar from trades_raw. Purge depends on it.
  4. Don't remove the ratchet. High-vol signals get 0–6 training samples without it.
  5. Don't remove NEUTRAL classification. Prevents single-class ML collapse on trending coins.
  6. Zone validity is correct. Do NOT widen SL to "fix" it.
  7. WFO is NOT regime-filtered by design. Keep it that way.
  8. _compute_candidate_prices is the single source of truth for AI prompt + UI.
  9. Dead code in render_auto_analyzer (manual_sig=None, _manual_render_signals block ~L4394-4430). Harmless. Do NOT activate without a dedicated session.
  10. def main(): must exist. A prior str_replace edit accidentally removed it, causing NameError on deploy. Always verify before shipping.

Intentionally NOT built

Wide-SL toggle for big-body candles

Would change strategy semantics. Revisit after 30+ days of journal data if "zone unavailable" costs meaningful edge.

Cross-coin feature pooling (master model)

Ratchet fix solved most sample starvation. Illiquid alts probably shouldn't be traded if no historical analog exists. Revisit after journal data.

Regime-conditional ML (separate models per regime)

Soft regime weighting in sample_weight already addresses this with less complexity. Needs ≥30 samples per regime to be reliable — most coins won't hit that.

ICT / S&R / trendline / volume-profile modules

Do not build additional strategies until momentum candle has 30+ days of live journal data proving edge. Building untested systems on top of an unvalidated primary is premature optimization.

Current state

✅ SHIPPED
  • Lookahead audit 20/20 CLEAN
  • Purge + embargo (PurgedTimeSeriesSplit)
  • Soft regime filtering
  • Rolling WFO + bootstrap CI + regime breakdown
  • Fill-rate survivor-bias diagnostic
  • 4 MGMT modes (incl. Partial-NoBE)
  • NEUTRAL label option A
  • Manual tab enrichment to match Scanner
🚧 BACKLOG
  • Dead-code cleanup in render_auto_analyzer
  • Zone-summary table in Manual
  • Confluence Grade breakdown in Manual
  • Meta-labeling (2nd classifier)
  • Pulse as ML feature
  • CPCV + PBO for QUANTFLOW
  • IDX BSJP port (yfinance .JK)