Momentum Candle Backtester
Internals & Decision Record
Every module, every constant, every design decision. This is the file to read before touching code. Higher-level summary first, deep technical detail behind expandable sections so you can pick your depth.
A three-gate validation pipeline for altcoin breakout candles
The scanner finds momentum breakout candles across 300+ Binance USDT pairs. Each candidate signal is pushed through three sequential gates before any execution decision is made:
Three files. One brain.
| File | LOC | Responsibility |
|---|---|---|
| app.py | 8,264 | Scanner + Manual + all pipeline stages (Step 1/2/3), UI rendering, Groq AI verdict |
| pulse_intel.py | 1,682 | On-chain intel module: DefiLlama TVL, Etherscan / Solscan flow, LunarCrush social, macro |
| lookahead_audit.py | 290 | Causality audit — proves all 22 features are forward-looking-leak-free |
Dependencies
# requirements.txt
streamlit>=1.32.0
plotly>=5.19.0
pandas>=2.0.0
numpy>=1.26.0
scipy>=1.12.0
scikit-learn>=1.4.0
requests>=2.31.0
Three tabs, three concerns
One signal, three steps, one verdict
Why split into three buttons and not one pipeline run DESIGN
Originally two steps. Split into three on Apr 13 because ML training needs to know which method it's labeling against — which requires the user to see backtest results and make a judgment first. Friction here is a feature, not a bug: it forces the human to acknowledge the backtest output before the ML and AI layers get involved.
Also: Step 1 is expensive (72 methods × ratchet loop). Running it blindly for every scan would waste Binance API calls and sklearn CPU.
Binance klines, 1000 bars deep
All backtest/WFO/ML training pulls from _scanner_fetch_candles(symbol, interval, limit). Binance's /api/v3/klines hard-caps at 1000 bars — we always fetch the max.
_BINANCE_INTERVAL = {"1D":"1d", "4H":"4h", "1H":"1h", "1W":"1w"}
_DEEP_FETCH_LIMITS = {"1h":1000, "2h":1000, "4h":1000,
"6h":1000, "12h":1000, "1d":1000}
What 1000 bars means per timeframe
| Interval | Historical window | Bars |
|---|---|---|
| 1h | ~42 days | 1000 |
| 4h | ~167 days (5.5mo) | 1000 |
| 1d | ~2.7 years | 1000 |
| 1w | ~19 years (way past listing date on most alts) | 1000 |
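The windows in the table follow directly from bar-count arithmetic; a quick sanity check (the helper and interval map are illustrative, not from the codebase):

```python
# Rough historical window per timeframe at Binance's 1000-bar cap.
INTERVAL_HOURS = {"1h": 1, "4h": 4, "1d": 24, "1w": 168}

def window_days(interval: str, bars: int = 1000) -> float:
    """Days of history covered by `bars` candles of `interval`."""
    return bars * INTERVAL_HOURS[interval] / 24

print(round(window_days("1h")))           # ~42 days
print(round(window_days("4h")))           # ~167 days
print(round(window_days("1d") / 365, 1))  # ~2.7 years
```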
Fallback & alternate sources RESILIENCE
_binance_klines is the primary path. _gateio_klines exists as a fallback for delisted Binance pairs where the ticker moved to Gate.io. fetch_live handles real-time current-bar polling for the Pulse tab.
Exchange API rate limits are handled via adaptive retry with 500 ms backoff on 429 / 418 status codes. There is no persistent rate limiter — the thin wrapper assumes Streamlit's request cadence is low enough.
Every feature is strictly past-looking
Built by _clean_df(df). Audited by lookahead_audit.py — 20/20 CLEAN (synthetic test, 25 random bars, all features match causal recomputation). A deliberately-leaking test feature was correctly flagged — the audit works.
The 11 ML features (subset of 22)
Causality proofs per feature AUDIT
- EMA uses `.shift(1).ewm()` — strictly past, no current bar.
- `vol_avg_7` uses `.shift(1).rolling(7).mean()` — past 7 bars, excluding current.
- `candle_rank_20`, `vol_rank_20` use `.rolling(20).rank(pct=True)` — causal in pandas (ranks within a window ending at the current bar, which is correct: the signal uses the current candle's data to score itself).
- `body_vs_atr` and `dist_from_ema21_pct` are derived from already-causal features.
- `regime_score` composites ADX/F&G/funding/OI — each fetched at or before the bar's timestamp.
Full audit: python lookahead_audit.py BTCUSDT 1d
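A minimal sketch of the shift-then-roll pattern the audit verifies — the column names (`close`, `volume`, `body`) are assumptions for illustration, not the app's actual schema:

```python
import pandas as pd

def causal_features(df: pd.DataFrame) -> pd.DataFrame:
    """Past-only features: every rolling/ewm input is shifted by one bar,
    so the current (possibly still-forming) bar never leaks in."""
    out = pd.DataFrame(index=df.index)
    out["ema21"] = df["close"].shift(1).ewm(span=21, adjust=False).mean()
    out["vol_avg_7"] = df["volume"].shift(1).rolling(7).mean()
    # rank(pct=True) over a trailing window is causal: the window ends at
    # the current bar, which is exactly the bar being scored.
    out["candle_rank_20"] = df["body"].rolling(20).rank(pct=True)
    return out
```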
dist_from_ema21 is sign-flipped so "stretched the wrong way" is always a negative number, regardless of direction. This keeps the ML feature space directionally consistent.

A 0–100 score blending trend, sentiment, funding, OI
calculate_regime_score(df, bar_index, direction, adx_df, timeframe, ticker) composites four inputs into a single bar-level score. The score is cached per bar in _bar_regime_cache so the 72-method backtest doesn't recompute it 72× per candle.
Soft regime similarity
def _regime_similarity_weight(current, historical):
return max(0.15, 1 - abs(current - historical) / 100)
Applied to:
- Backtest `EVw`/`WRw` — analogs from similar regimes pull harder
- ML `sample_weight` — same logic, at the classifier level
NOT applied to WFO — WFO tests generalization across regimes on purpose. Regime-filtering the WFO would defeat the point.
The floor is 0.15, never 0. It prevents sample-size cliffs on illiquid coins where very few historical analogs match the current regime closely; a far-regime analog still contributes 15% signal.

Finds breakout candidates across 300+ USDT pairs
Universe built by _scanner_get_universe(min_volume_usdt) — Binance spot USDT pairs with min 24h volume. Each coin scans its last 3 closed candles (not current bar — that's still forming) via _scan_one_symbol → _scanner_score_signal.
Scoring a candidate
_scanner_score_signal(df, adx_df, bar_idx, direction,
timeframe, symbol, min_body_pct, min_vol_mult)
# → returns sig dict OR None if below threshold
Returns a sig dict containing: direction, entry/SL/TP prices, body_pct, vol_mult, ADX, DI gap, ATR ratio, EMA alignment score, regime score, bar_index, timeframe, symbol, 11-feature vector extracted for current bar.
Scanner threshold defaults CONFIG
Defaults are intentionally loose — better to surface a borderline signal and let the 3-gate pipeline reject it than miss one entirely.
- `min_body_pct` ≈ 0.60 (body takes up ≥60% of range)
- `min_vol_mult` ≈ 1.8× (volume ≥1.8× 7-bar avg)
- ADX filter soft — used in scoring, not hard-cut
Signals below threshold return None and never enter the pipeline.
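A hedged sketch of the two hard gates — the real `_scanner_score_signal` computes many more fields; the helper name and signature here are illustrative:

```python
def passes_candle_gates(open_, high, low, close, volume, vol_avg_7,
                        min_body_pct=0.60, min_vol_mult=1.8):
    """Return (body_pct, vol_mult) if the candle clears both loose gates,
    else None — mirroring the scanner's 'return None below threshold'."""
    rng = high - low
    if rng <= 0 or vol_avg_7 <= 0:
        return None
    body_pct = abs(close - open_) / rng   # body as fraction of full range
    vol_mult = volume / vol_avg_7         # volume vs trailing 7-bar average
    if body_pct >= min_body_pct and vol_mult >= min_vol_mult:
        return body_pct, vol_mult
    return None

passes_candle_gates(100, 110, 99, 109, 5000, 2000)  # strong candle → (0.818…, 2.5)
```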
72 method combos against historical analogs
When the user clicks "Run Step 1": _scanner_quick_backtest(sig) scans historical bars looking for analogs to the current signal, then simulates all 72 method combinations on those analogs. _scanner_mini_wfo(sig, bt) then runs a rolling purged walk-forward on the full df to test generalization.
Key constants
NEUTRAL_R_THRESHOLD = 0.30 # ±0.30R = NEUTRAL, excluded from ML
MAX_HOLD = 20 # max bars per trade
FIXED_SL = 1.5             # % — fixed stop width (alt: ATR SL)
Progressive relaxation when analogs are scarce
Problem: a strong 85%-body, 5×-volume signal has very few exact historical analogs. Requiring 70% of its own body/vol might find only ~4 samples on ETH 4H — too few to backtest, and far too few to train ML on.
Solution: a ratchet that relaxes the threshold until enough samples are found, with hard floors to prevent pure-noise analogs.
# Backtest ratchet (target: 50 bars)
_BT_RATCHET_RATIOS = [0.70, 0.55, 0.45, 0.35, 0.25, 0.20]
_BT_MIN_BODY_FLOOR = 0.20
_BT_MIN_VOL_FLOOR = 1.10
# ML ratchet (target: 80 samples)
_RATCHET_RATIOS = [0.70, 0.55, 0.45, 0.35, 0.25, 0.20]
Two-pass optimization PERF
The backtest does a cheap _count_passing() scan at each ratchet level first. Only when a level meets the 50-bar target does it run the full 72-method loop once. This avoids 6× redundant simulation.
Why hard floors matter GUARDRAIL
Without floors, a signal on an ultra-rare extreme candle could ratchet all the way down to matching any candle, producing garbage analogs. body_floor=0.20 and vol_floor=1.10 ensure every analog is at least a "real" candle, not dust.
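The ratchet plus the cheap first pass can be sketched as follows — `count_passing` stands in for the real internal scan, and the function shape is an assumption:

```python
_BT_RATCHET_RATIOS = [0.70, 0.55, 0.45, 0.35, 0.25, 0.20]
_BT_MIN_BODY_FLOOR = 0.20
_BT_MIN_VOL_FLOOR = 1.10

def ratchet_thresholds(sig_body, sig_vol, count_passing, target=50):
    """Relax analog-matching thresholds until `target` bars pass, applying
    hard floors. Returns (body_thr, vol_thr, ratio) or None if exhausted.
    The expensive 72-method loop runs once, outside this function, at the
    first level that hits target — never at every level."""
    for ratio in _BT_RATCHET_RATIOS:
        body_thr = max(sig_body * ratio, _BT_MIN_BODY_FLOOR)
        vol_thr = max(sig_vol * ratio, _BT_MIN_VOL_FLOOR)
        if count_passing(body_thr, vol_thr) >= target:  # cheap count-only pass
            return body_thr, vol_thr, ratio
    return None
```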
UI badge
The result flows through to a user-facing badge:
- STRICT 70% — analogs tightly matched; trust the ML probability numerically.
- RELAXED 45% — broad analog set.
- LOOSE 20% — very broad; treat the ML probability as directional only.
Every signal tested against 72 trade strategies
The cartesian product of entry × SL × mgmt × TP:
ENTRY_ZONES = ["Aggressive", "Standard", "Sniper"] # 3
SL_METHODS = ["Fixed SL", "ATR SL"] # 2
MGMT_MODES = ["Simple", "Partial", "Partial-NoBE", "Trailing"] # 4
TP_MULTS = [2.0, 2.5, 3.0] # 3
# 3 × 2 × 4 × 3 = 72 combinations
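The grid is a plain cartesian product; a quick check that it really is 72 (the `METHODS` list comprehension is illustrative — only the four constant lists come from the code above):

```python
from itertools import product

ENTRY_ZONES = ["Aggressive", "Standard", "Sniper"]
SL_METHODS = ["Fixed SL", "ATR SL"]
MGMT_MODES = ["Simple", "Partial", "Partial-NoBE", "Trailing"]
TP_MULTS = [2.0, 2.5, 3.0]

# One dict per method combo, in deterministic order.
METHODS = [
    {"zone": z, "sl": s, "mgmt": m, "tp": t}
    for z, s, m, t in product(ENTRY_ZONES, SL_METHODS, MGMT_MODES, TP_MULTS)
]
assert len(METHODS) == 72  # 3 × 2 × 4 × 3
```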
Entry zones

Zones that are invalid for a given signal are flagged in `_invalid_zones` and excluded from candidate selection. This is mechanically correct — do not "fix" it by widening the SL; widening the SL changes the strategy's semantics.

Management modes
| Mode | Behavior |
|---|---|
| Simple | Full size, hold to TP2 or original SL. No BE, no partials. |
| Partial | TP 50% at 1R, auto-move SL to BE on remaining half. |
| Partial-NoBE | TP 50% at 1R, keep original SL (real downside remains). |
| Trailing | Full size, BE at 1R, then trail 0.5×ATR from close. |
Why Partial vs Partial-NoBE distinction matters LABEL BUG
Partial trades that hit TP1 then reverse to BE produce r_mult ≈ +0.498R. Mathematically this is positive PnL but the outcome is "basically flat." Labeling these as WIN caused:
- ML seeing only wins on trending coins → single-class collapse → can't train
- Backtest looking invincible: PF=∞, WR=100% on REZ-like trending alts
Fix: _classify_outcome(r_mult) → WIN / LOSS / NEUTRAL. |r_mult| ≤ 0.30R = NEUTRAL, still counted in PF/WR but excluded from ML labels.
Partial-NoBE was added on Apr 17 because the user believed they were trading that style. Partial was moving SL to BE automatically — a different strategy entirely. Both are now available for honest comparison.
All 72 methods (visual)
72 combinations · each with n, WR, EV, EVw, PF, fill_rate, 4-bucket decay breakdown
The dual-candidate decision system
Rather than picking a single "best" method from the 72, the system surfaces two:
bt["candidate_newest"]
bt["candidate_weighted"]
Time-decay bucket scheme (adaptive)
| Sample count | Buckets | Weights (oldest → newest) |
|---|---|---|
| n ≥ 400 | 4 | 0.40, 0.60, 0.80, 1.00 |
| n ≥ 200 | 3 | 0.50, 0.75, 1.00 |
| n ≥ 80 | 2 | 0.60, 1.00 |
| n < 80 | 1 | 1.00 |
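A sketch of the adaptive bucketing: trades are split chronologically and each trade inherits its bucket's weight. The helper name and the remainder-handling (extra trades go to the earliest buckets) are assumptions, not the code's exact split:

```python
def time_decay_weights(n: int) -> list[float]:
    """Per-trade weights, oldest → newest, using the adaptive scheme."""
    if n >= 400:
        bucket_w = [0.40, 0.60, 0.80, 1.00]
    elif n >= 200:
        bucket_w = [0.50, 0.75, 1.00]
    elif n >= 80:
        bucket_w = [0.60, 1.00]
    else:
        bucket_w = [1.00]
    # Chronological split into near-equal buckets; remainder trades are
    # assigned to the earliest buckets (an assumption for illustration).
    base, extra = divmod(n, len(bucket_w))
    weights: list[float] = []
    for i, w in enumerate(bucket_w):
        weights.extend([w] * (base + (1 if i < extra else 0)))
    return weights
```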
Selection uses `ev_weighted`, not raw `ev`. A previous bug selected by raw EV, which could pick a method that crushed it in 2021 but has since gone dormant. Using EVw ensures newer trades pull harder on the selection. This is #2 on the do-not-break list.

5 rolling windows with de Prado-style purge + embargo
A single in-sample/out-of-sample cut is one point estimate. Five rolling cuts give a distribution. "5/5 windows OOS PF ≥ 1.0" is real edge; "1 good cut" could be luck.
Rolling cuts at 50/60/70/80/90%
Purge + embargo (de Prado Ch.7)
Every trade stores bar_index (entry bar) and label_end_bar (= j, resolution bar). PurgedTimeSeriesSplit drops training samples whose label period overlaps the test fold. Embargo = ceil(0.01 × n_total) after each fold.
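A minimal sketch of the purge-plus-embargo rule for a single test fold (simplified from de Prado; the real `PurgedTimeSeriesSplit` handles multiple folds):

```python
import math

def purged_train_indices(bar_index, label_end_bar, test_start, test_end,
                         n_total, embargo_pct=0.01):
    """Keep only training samples whose label period cannot overlap the
    test fold, plus drop an embargo strip just after the fold."""
    embargo = math.ceil(embargo_pct * n_total)
    keep = []
    for i, (start, end) in enumerate(zip(bar_index, label_end_bar)):
        overlaps_test = not (end < test_start or start > test_end)  # purge
        in_embargo = test_end < start <= test_end + embargo
        if not overlaps_test and not in_embargo:
            keep.append(i)
    return keep
```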
TimeSeriesSplit doesn't know labels span MAX_HOLD=20 bars. A training sample at the fold boundary has its label resolved inside the test fold → leak. On daily bars that's 20 days of leakage. Always PurgedTimeSeriesSplit.

WFO verdict thresholds
| n_oos | Verdict |
|---|---|
| ≥ 8 | PASS |
| 5–7 | BORDERLINE |
| < 5 | INSUFFICIENT |
WFO return dict shape SCHEMA
{
"ok": bool,
"verdict": "PASS" | "BORDERLINE" | "FAIL" | "INSUFFICIENT",
"is_pf", "oos_pf", "oos_wr",
"is_pf_clean", "oos_pf_clean", # honest PF excl. NEUTRAL
"oos_n": int,
"purge_diag": {n_purged, n_embargoed, embargo_bars},
"label_diag": {n_neutral, raw_pf, honest_pf, pf_inflation_pct},
"oos_pf_ci": {"lo": float, "hi": float}, # 1000-resample block bootstrap
"rolling_wfo": {"edge_hit_rate", "windows": [...]},
"regime_breakdown": {"STRONG": ..., "MID": ..., "WEAK": ...},
"tier_label": "PURGED IS/OOS split (70%/30%, embargo 1%)",
}
Bootstrap 95% CI on OOS PF UNCERTAINTY
1000-resample block bootstrap on OOS r_mult list. Reports {lo, hi}. Context: n=8 trades with PF=1.3 has CI roughly [0.7, 2.8] — wide, honestly acknowledged. This beats reporting a point estimate as if it were gospel.
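A hedged sketch of the block bootstrap on the OOS r_mult list — the block size, seed handling, and PF edge cases are assumptions, not the app's exact parameters:

```python
import math
import random

def profit_factor(rs):
    gains = sum(r for r in rs if r > 0)
    losses = -sum(r for r in rs if r < 0)
    return gains / losses if losses > 0 else math.inf

def bootstrap_pf_ci(r_mults, n_resamples=1000, block=5, seed=42):
    """95% CI on OOS profit factor via block bootstrap: resampling
    contiguous blocks preserves short-range autocorrelation that plain
    i.i.d. resampling would destroy."""
    rng = random.Random(seed)
    n = len(r_mults)
    pfs = []
    for _ in range(n_resamples):
        sample = []
        while len(sample) < n:
            start = rng.randrange(n)
            sample.extend(r_mults[start:start + block])
        pfs.append(profit_factor(sample[:n]))
    pfs.sort()
    return {"lo": pfs[int(0.025 * n_resamples)],
            "hi": pfs[int(0.975 * n_resamples)]}
```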
Regime-conditional OOS breakdown DIAGNOSTIC
Splits OOS trades by ATR-ratio proxy into STRONG/MID/WEAK regimes. Reports PF/WR per regime. An aggregate PF=1.4 can hide PF=2.8 in STRONG and PF=0.9 in WEAK — the breakdown makes this visible and actionable.
Fill-rate / survivor-bias diagnostic HONEST
Each method combo tracks:
- `n_qualifying` — signals passing the filter
- `n_filled` — trades that actually entered the zone
- `n_expired` — never retraced to fill
- `fill_rate = n_filled / n_qualifying`
Standard/Sniper zones on trending coins mostly don't fill (price never retraces). The backtest only sees the rare pullback-continuation subset → survivor bias inflates WR. fill_rate makes this visible. AI prompt warns when fill_rate < 40% on ≥20 qualifying signals.
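The diagnostic reduces to one ratio plus a warning trigger (thresholds from the text above; the helper name is illustrative):

```python
def fill_rate_warning(n_qualifying: int, n_filled: int):
    """fill_rate plus the AI-prompt survivor-bias warning condition:
    warn only when the sample is big enough to mean something."""
    fill_rate = n_filled / n_qualifying if n_qualifying else 0.0
    warn = n_qualifying >= 20 and fill_rate < 0.40
    return fill_rate, warn

fill_rate_warning(25, 6)  # → (0.24, True): backtest only saw the 24% that filled
```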
Classifier picked by sample count
Different sample sizes need different models. Too few samples and a heavy model overfits; too many and a simple model underutilizes the data.
| n samples | Model | Why |
|---|---|---|
| n < 20 | Heuristic fallback | No training. Deterministic rule-based probability. |
| n < 50 | Logistic Regression | Least likely to overfit on small n. Pipeline with StandardScaler, class_weight=balanced. |
| 50–149 | Random Forest | n_estimators=150, max_depth=5, min_samples_leaf=5. Robust to feature scale. |
| ≥ 150 | Gradient Boosting | n_estimators=150, max_depth=3, lr=0.05, subsample=0.8. Best generalization at scale. |
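A sketch of the size-based picker using standard sklearn estimators, with hyperparameters copied from the table (the function name is illustrative, and the calibration wrapper described below is omitted here):

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def pick_model(n_samples: int):
    """Heavier models only when the sample count can support them;
    None signals the deterministic heuristic fallback (no training)."""
    if n_samples < 20:
        return None
    if n_samples < 50:
        return make_pipeline(StandardScaler(),
                             LogisticRegression(class_weight="balanced"))
    if n_samples < 150:
        return RandomForestClassifier(n_estimators=150, max_depth=5,
                                      min_samples_leaf=5)
    return GradientBoostingClassifier(n_estimators=150, max_depth=3,
                                      learning_rate=0.05, subsample=0.8)
```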
Calibration wrapper
if n_samples >= 60:
model = CalibratedClassifierCV(model, method="isotonic", cv=3)
CV splitter
cv = PurgedTimeSeriesSplit(
n_splits=min(5, n // 15),
embargo_pct=0.01,
)
# Never sklearn's default TimeSeriesSplit — would leak labels at fold boundaries.
Training labeling: by method outcome LABELS
_scanner_train_ml labels historical candles by the chosen method's WIN/LOSS/NEUTRAL outcome — not by a fixed Aggressive/Simple baseline.
Reason: the ML should learn what works for the specific method you'll actually trade, not a generic proxy method. If Cand A is "Sniper · ATR · Partial-NoBE · 2.5", the ML learns to predict WIN for that exact configuration on historical analogs.
When Cand A ≠ Cand B, the ML trains twice — once per candidate.
The |r_mult| ≤ 0.30R band excluded from ML
def _classify_outcome(r_mult):
if r_mult > NEUTRAL_R_THRESHOLD: return "WIN"
if r_mult < -NEUTRAL_R_THRESHOLD: return "LOSS"
return "NEUTRAL"
- NEUTRAL trades still counted in PF/WR accounting (the money is real).
- NEUTRAL trades excluded from ML labels (they'd cause single-class collapse on trending coins).
- WFO reports `honest_pf` (excluding NEUTRAL) alongside raw PF.
Every training sample gets a two-factor weight
sample_weight = time_decay_bucket_weight × regime_similarity_weight
Pipeline sample_weight error & fix BUGFIX
CalibratedClassifierCV wrapping a Pipeline raises various exception types (not just TypeError) when sample_weight is passed. Old code only caught TypeError and jumped to heuristic fallback.
Fix: broadened to except Exception on weighted fit. Falls back to unweighted fit before giving up and going to heuristic. Applied to both main fit and CV loop.
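The broadened fallback, sketched (a simplification of the fix; per the text, the real code applies the same pattern in both the main fit and the CV loop):

```python
def fit_with_weights(model, X, y, sample_weight):
    """Try a weighted fit first. Calibrated pipelines can raise assorted
    exception types (not just TypeError) when sample_weight isn't routed
    through, so catch broadly and fall back to an unweighted fit before
    ever giving up to the heuristic."""
    try:
        model.fit(X, y, sample_weight=sample_weight)
        return model, "weighted"
    except Exception:
        model.fit(X, y)
        return model, "unweighted"
```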
Groq reasoning with canonical prices
_scanner_ai_verdict(sig, ml_a, ml_b, bt, wfo, cand_a, cand_b)
# → {candidate_a, candidate_b, winner, winner_rationale}
Model selection
| Model | Use |
|---|---|
| openai/gpt-oss-120b | DEFAULT — strongest free reasoning on Groq |
| openai/gpt-oss-20b | Faster, slightly weaker |
| qwen/qwen3-32b | Alt reasoning |
| llama-3.3-70b-versatile | Fallback |
| meta-llama/llama-4-scout-17b-16e-instruct | Fast fallback |
reasoning_effort="medium" only for gpt-oss/qwen. max_tokens=2500. Timeout 60s.
Price hallucination prevention
_compute_candidate_prices(cand, sig) is the single source of truth for entry/SL/TP1/TP2 prices. Both the UI cards and the AI prompt read from it. The AI prompt contains an explicit EXECUTION PRICES block with a "copy verbatim" instruction.
Dual verdict rules
- When A == B (unanimous): single analysis mirrored to both sides.
- Both TRADE → AI picks the stronger as `winner`.
- Only one TRADE → that one wins.
- Neither → `winner = NONE`.
Dual-candidate-aware A+ / A / B / C
_scanner_setup_grade(sig, ml, bt) grades by the best of the two candidates. A previous bug read from the legacy aggregate bt["win_2r"], so "Cand A excellent + aggregate bad" returned "C — Backtest negative" (a false negative).
Rescue rules
- "B rescue" — if any candidate is tradeable AND
ml_pct ≥ 60, grade won't drop below B. - Grade color follows the badge palette: A+ A B C.
On-chain Nansen-lite, free tier
get_pulse_intel(symbol,
etherscan_api_key, lunarcrush_api_key, solscan_api_key)
# → composite_score (-15 to +15)
# → composite_label: STRONGLY BULLISH / BULLISH / NEUTRAL / BEARISH / STRONGLY BEARISH
Composite weights
Per-token composite scaled ×1.2 → ±12. Macro modifier (±3) added on top → final ±15.
| Verdict | Score band |
|---|---|
| STRONGLY BULLISH | ≥ +10 |
| BULLISH | +4 to +9 |
| NEUTRAL | −3 to +3 |
| BEARISH | −4 to −9 |
| STRONGLY BEARISH | ≤ −10 |
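The scaling and banding, sketched (the function name is illustrative; band edges come from the table, with continuous-score boundary handling an assumption since the table uses integer bands):

```python
def composite_verdict(per_token_score: float, macro_modifier: float):
    """Scale per-token composite to ±12, add the macro modifier (±3),
    then band the final ±15 score into a verdict label."""
    score = per_token_score * 1.2 + macro_modifier
    if score >= 10:
        label = "STRONGLY BULLISH"
    elif score >= 4:
        label = "BULLISH"
    elif score > -4:
        label = "NEUTRAL"
    elif score > -10:
        label = "BEARISH"
    else:
        label = "STRONGLY BEARISH"
    return score, label
```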
Cache TTLs PERF
- TVL: `3600s` (updates slowly)
- Flow: `900s`
- Social: `1800s`
- Macro: `14400s`
Things that must stay exactly as they are
- Never use sklearn `TimeSeriesSplit` — always `PurgedTimeSeriesSplit`. Label-period overlap leaks.
- `best_key` uses `ev_weighted`, not raw `ev`.
- Don't remove `label_end_bar` from `trades_raw`. Purge depends on it.
- Don't remove the ratchet. High-vol signals get 0–6 training samples without it.
- Don't remove NEUTRAL classification. Prevents single-class ML collapse on trending coins.
- Zone validity is correct. Do NOT widen SL to "fix" it.
- WFO is NOT regime-filtered by design. Keep it that way.
- `_compute_candidate_prices` is the single source of truth for AI prompt + UI.
- Dead code in `render_auto_analyzer` (`manual_sig=None`, `_manual_render_signals` block ~L4394-4430). Harmless. Do NOT activate without a dedicated session.
- `def main():` must exist. A prior str_replace edit accidentally removed it, causing a NameError on deploy. Always verify before shipping.
Intentionally NOT built
Wide-SL toggle for big-body candles
Would change strategy semantics. Revisit after 30+ days of journal data if "zone unavailable" costs meaningful edge.
Cross-coin feature pooling (master model)
Ratchet fix solved most sample starvation. Illiquid alts probably shouldn't be traded if no historical analog exists. Revisit after journal data.
Regime-conditional ML (separate models per regime)
Soft regime weighting in sample_weight already addresses this with less complexity. Needs ≥30 samples per regime to be reliable — most coins won't hit that.
ICT / S&R / trendline / volume-profile modules
Do not build additional strategies until momentum candle has 30+ days of live journal data proving edge. Building untested systems on top of an unvalidated primary is premature optimization.
Current state
- Lookahead audit 20/20 CLEAN
- Purge + embargo (PurgedTimeSeriesSplit)
- Soft regime filtering
- Rolling WFO + bootstrap CI + regime breakdown
- Fill-rate survivor-bias diagnostic
- 4 MGMT modes (incl. Partial-NoBE)
- NEUTRAL label (option A)
- Manual tab enrichment to match Scanner
- Dead-code cleanup in `render_auto_analyzer`
- Zone-summary table in Manual
- Confluence Grade breakdown in Manual
- Meta-labeling (2nd classifier)
- Pulse as ML feature
- CPCV + PBO for QUANTFLOW
- IDX BSJP port (yfinance .JK)