Strategy Leaderboard · BTC Trading AI

🟢 In production (deployed)

V66 (V2-V6 bots) · HybridV115Trader.py · canonical

PRODUCTION

GRU, 2 layers × 128 hidden units, 5-seed ensemble (42/123/456/789/1024). Always invested, with a 5-level exit cascade (init_sl → trail ratchet → DRSI > 78 → GRU danger vote → peak_drop).

Indicators

RSI(14) · MACD line/signal/hist · ATR(14) · Bollinger Bands · SMA daily (slope/distance) · Daily RSI · Volume z-score · Return percentiles

Metrics

Backtest compound: +70,576%

Min α (worst fold): +245%

Beats B&H: 4/4 folds

OOT 2026 PnL: -7.6%

OOT 2026 alpha: +11.4%

Trades/fold: ~75

Bootstrap CI: F1/F2 fragile (P(α<0) 7-11%)

How it works

V66 always enters with all of its capital. It exits on any of: a fixed 10% stop; a tight trailing stop (3 levels depending on unrealized gain: 22% → 18% → 5%); daily RSI (DRSI) > 78 (overbought); 3 of 5 GRUs voting danger (P(SL) > 0.70 or P(TP) < exit_th); or peak_drop (a fall in avg_tp_bar from its peak). Cooldown is 48 bars after a loss and 4 bars after any exit. The threshold is set per regime (bull/lateral/bear), detected from the daily slope.
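A minimal Python sketch of part of that exit cascade may help when reading the thresholds. It covers only the rules whose parameters are stated on this card (fixed stop, DRSI, GRU danger vote, cooldowns); the trail ratchet and peak_drop are left out because their exact mechanics are not given here. All names (`SeedOutput`, `exit_reason`, `cooldown_bars`) are hypothetical and are not taken from HybridV115Trader.py, and `exit_th` is passed in because its per-regime values are not listed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

FIXED_STOP = 0.10         # fixed stop: exit if price is 10% below entry (assumed measured from entry)
DRSI_OVERBOUGHT = 78.0    # daily RSI exit level
P_SL_DANGER = 0.70        # per-seed danger threshold on P(SL)
MIN_DANGER_VOTES = 3      # 3 of the 5 seeds must vote danger
COOLDOWN_AFTER_LOSS = 48  # bars to wait after a losing exit
COOLDOWN_AFTER_EXIT = 4   # bars to wait after any other exit


@dataclass
class SeedOutput:
    """Output of one GRU seed (hypothetical container)."""
    p_sl: float  # probability assigned to the stop-loss outcome
    p_tp: float  # probability assigned to the take-profit outcome


def danger_votes(seeds: Sequence[SeedOutput], exit_th: float) -> int:
    """Count seeds that vote danger: P(SL) > 0.70 or P(TP) < exit_th."""
    return sum(1 for s in seeds if s.p_sl > P_SL_DANGER or s.p_tp < exit_th)


def exit_reason(entry_price: float, price: float, daily_rsi: float,
                seeds: Sequence[SeedOutput], exit_th: float) -> Optional[str]:
    """Return the first exit rule that fires, in cascade order, or None."""
    if price <= entry_price * (1 - FIXED_STOP):
        return "init_sl"
    # The trail ratchet would be checked here; omitted (details not specified).
    if daily_rsi > DRSI_OVERBOUGHT:
        return "drsi"
    if danger_votes(seeds, exit_th) >= MIN_DANGER_VOTES:
        return "gru_danger"
    # peak_drop would be checked here; omitted (details not specified).
    return None


def cooldown_bars(trade_pnl: float) -> int:
    """Bars to stay out after an exit: 48 after a loss, otherwise 4."""
    return COOLDOWN_AFTER_LOSS if trade_pnl < 0 else COOLDOWN_AFTER_EXIT
```

For example, with three seeds at P(SL) = 0.8 and two calm seeds, `exit_reason(100, 105, 50, seeds, 0.3)` reports the GRU danger vote, since neither the stop nor the DRSI rule fires first.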

Recommendation

KEEP as the main sleeve. Hyperparameters are rigorously calibrated: every modification tried so far destroyed performance (tested 13 times in sprint rounds R141-R158, R162, R168-R169).

🏆 Deployment candidate (validated by audit + OOT)

⚠️ AUDIT POST-MORTEM (R173): R163/R170-B Pyramid and R166 Combined were 100% PHANTOM (leverage at no cost). Marked as REJECT. Only R151-A survives the bug-fixed engine. The numbers below are HONEST under audit.

R151-A Shorts ★ (ONLY survivor of sprint audit) · requires Binance perp · validated R174+R177

CANDIDATE

V66 canonical long sleeve (unchanged) + SHORT module: while V66 is in cash, it uses P(SL) from the SAME canonical GRU ensemble. If P(SL) > 0.65 (3/5 seeds), it opens a short. It closes on P(TP) > 0.35, an 8% stop, or a 40h timeout. Cash is properly tracked (no phantom leverage).

Indicators

Same as V66 (RSI, MACD, ATR, Bollinger, daily SMA/RSI)

Metrics

Backtest compound (bug-fixed): +113,981% (REAL)

Min α (worst fold): +194 (on par with V66 at +193)

Beats B&H: 4/4 folds

OOT 2026 PnL (bug-fixed): -3.7% (vs BH -19%)

OOT 2026 alpha: +15.3% (vs V66 +12.4%)

Edge vs V66 alone (honest): +167% compound, +2.9pp OOT

Shorts triggered (8y backtest): 94, WR 54%
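The OOT figures in these metrics are consistent with alpha being the strategy's PnL minus the buy-and-hold PnL over the same window. That definition is inferred from the numbers, not stated on the card. A small check in Python, where `alpha_pp` is a hypothetical helper working in percentage points:

```python
def alpha_pp(strategy_pnl_pct: float, buy_hold_pnl_pct: float) -> float:
    """Alpha in percentage points, assumed to be strategy PnL minus B&H PnL."""
    return strategy_pnl_pct - buy_hold_pnl_pct


# R151-A, OOT 2026: PnL -3.7% against buy-and-hold -19%
r151a_alpha = alpha_pp(-3.7, -19.0)
print(round(r151a_alpha, 1))        # 15.3

# Edge over V66 alone, using the V66 alpha of +12.4% quoted in the metrics
edge_vs_v66 = r151a_alpha - 12.4
print(round(edge_vs_v66, 1))        # 2.9
```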

How it works

It exploits the fact that the canonical GRU predicts DROPS as well as rallies. V66 uses that signal only to EXIT longs. R151-A also uses it to OPEN shorts during V66's cash gaps, capturing bear/correction alpha that V66 leaves on the table. The edge is modest but REAL: it survives the bug-fixed engine, the audit, and OOT 2026.
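The short sleeve's entry and exit rules can be sketched as follows, using the thresholds quoted for R151-A (P(SL) > 0.65 in 3/5 seeds, close on P(TP) > 0.35, 8% stop, 40h timeout). This is an illustration under assumptions, not the code of the planned R151ShortsTrader.py: the function names are hypothetical, the order in which exits are checked is a guess, and aggregating P(TP) as a mean across seeds is an assumption because the card does not say how the exit probability is combined.

```python
from typing import Optional, Sequence

SHORT_ENTRY_P_SL = 0.65   # per-seed P(SL) threshold to vote for a short
SHORT_MIN_VOTES = 3       # 3 of the 5 seeds
SHORT_EXIT_P_TP = 0.35    # close once P(TP) rises above this level
SHORT_STOP = 0.08         # 8% adverse move (price rising against the short)
SHORT_TIMEOUT_HOURS = 40  # maximum holding time


def should_open_short(v66_in_cash: bool, p_sl_by_seed: Sequence[float]) -> bool:
    """Open a short only while V66 is in cash and enough seeds see danger."""
    if not v66_in_cash:
        return False
    votes = sum(1 for p in p_sl_by_seed if p > SHORT_ENTRY_P_SL)
    return votes >= SHORT_MIN_VOTES


def short_exit_reason(entry_price: float, price: float,
                      mean_p_tp: float, hours_open: float) -> Optional[str]:
    """Return why an open short should close, or None to keep holding."""
    if price >= entry_price * (1 + SHORT_STOP):
        return "stop"
    if mean_p_tp > SHORT_EXIT_P_TP:   # assumption: mean of P(TP) across seeds
        return "p_tp"
    if hours_open >= SHORT_TIMEOUT_HOURS:
        return "timeout"
    return None
```

Gating the entry on `v66_in_cash` is what keeps the sleeve additive: the short can only use capital that the long sleeve is not using at that moment.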

Recommendation

★ DEPLOYMENT LEADER (the only one left after the R173 audit). Build R151ShortsTrader.py by extending HybridV115Trader. Requires Binance perp BTCUSDT. Add a realistic funding-cost overlay. Paper trade for 60 days.

❌ R170-B Pyramid (KILLED by R173 audit) · PHANTOM alpha · DO NOT DEPLOY

FAILED

The pyramid module added a leg sized 0.65 × notional once the position had +12% unrealized gain. PROBLEM: cap=0 after the V66 entry (always invested), yet the pyramid add spent $6,500 out of nothing, i.e. phantom money. Free leverage of 1.75× at no cost.

Indicators

—

Metrics

Backtest ORIGINAL: +257,754% (PHANTOM)

Backtest HONEST (R173/R174): +42,636%, IDENTICAL to V66 alone

Inflation factor: 6.05× (100% phantom)

Pyramid attempts blocked (cap=0): 517/517

How it works

In the backtest it appears to reach +258K compound, but the engine allowed leverage at no cost (Bug #4 in the audit). In the honest engine, every one of the 517 pyramid attempts is blocked because V66 is always invested (cap=0). On Binance spot there is no free leverage; on perp you pay funding. The R163 baseline has exactly the same problem.
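The blocking behaviour can be illustrated with a toy cash ledger. This is a sketch of the idea only, not the audited engine: `CashLedger` is a hypothetical class, and the $10,000 starting capital is inferred from the $6,500 add being 0.65 × notional.

```python
class CashLedger:
    """Toy ledger that refuses any purchase not fully covered by cash."""

    def __init__(self, cash: float) -> None:
        self.cash = cash
        self.invested = 0.0

    def buy(self, notional: float) -> bool:
        """Spend cash on a position; return False if it is not fully funded."""
        if notional <= 0 or notional > self.cash:
            return False
        self.cash -= notional
        self.invested += notional
        return True


ledger = CashLedger(cash=10_000.0)
entry_ok = ledger.buy(10_000.0)             # always-invested entry uses all cash
pyramid_ok = ledger.buy(0.65 * 10_000.0)    # pyramid add of 0.65 x notional
print(entry_ok, pyramid_ok, ledger.cash)    # True False 0.0
```

An engine without the funded check in `buy` would accept the second call and book $6,500 of exposure that was never paid for, which is the phantom leverage described above.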

Recommendation

Do NOT deploy. Pyramid could be re-explored with realistic modeling of margin trading plus a funding-cost overlay, but the cost-benefit is uncertain.

R151-A Shorts · requires Binance perp · funding risk

CANDIDATE

V66 long sleeve (canonical) + SHORT module: while V66 is in cash, it looks at P(SL) from the SAME GRU ensemble. If P(SL) > 0.65 (3/5 seeds) and P(TP) < 0.35, it opens a short. It closes on P(TP) > 0.35 (safe), an 8% adverse stop, or a 40h timeout.

Indicators

Same as V66

Metrics

Backtest compound: +212,591% (3× V66)

Min α (worst fold): +245%

Beats B&H: 4/4 folds

OOT 2026 PnL: -5.2% (vs BH -19%)

OOT 2026 alpha: +13.8%

Friction-adjusted compound: +89,759% (with funding 0.01%/8h + slippage + half-Kelly)

Bootstrap CI: F2 fragile (P(α<0)=17%)

How it works

It exploits the fact that the canonical GRU predicts DROPS (P(SL)) as well as rallies. V66 uses that signal only to EXIT longs. R151-A also uses it to OPEN shorts during V66's cash gaps, capturing bear/correction alpha that V66 leaves on the table.

Recommendation

A third option if you want symmetric directional exposure (longs + shorts). Requires Binance perp BTCUSDT, with a funding cost of ~0.01%/8h. Tail risk in F2 (recovery) is documented. Half-Kelly sizing reduces variance.
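To put the ~0.01%/8h funding figure in context, here is a rough cost estimate for a single short held up to the 40h timeout. It treats funding as a pure cost, as the card does; on a perpetual contract the direction of the payment depends on the sign of the funding rate, so this is a conservative simplification. `funding_cost` is a hypothetical helper, and counting whole 8h intervals ignores where the actual funding timestamps fall.

```python
import math


def funding_cost(notional: float, hours_held: float,
                 rate_per_interval: float = 0.0001,  # 0.01% per interval
                 interval_hours: float = 8.0) -> float:
    """Funding paid over the holding period, counting whole intervals only."""
    intervals = math.floor(hours_held / interval_hours)
    return notional * rate_per_interval * intervals


# A $10,000 short held for the full 40h timeout crosses 5 funding intervals,
# so the modeled cost is about 0.05% of notional.
cost = funding_cost(10_000.0, 40.0)
print(f"{cost:.2f}")
```

At this rate a single timed-out short costs far less than the 8% stop, so in a friction-adjusted backtest funding would matter mainly through accumulation across many trades.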

⚠️ Validated in backtest but NOT in OOT

R166 COMBINED (R163 + R151-A) · super-additive in-sample · interferes in OOT

EXPERIMENTAL

V66 long + pyramid on long positions + shorts during cash gaps. Combines the sprint's two breakthroughs.

Indicators

Same as V66

Metrics

Backtest compound: +454,068% (6.4× V66)

Min α (worst fold): +287%

Beats B&H: 4/4 folds

OOT 2026 PnL: -5.2% (no better than R151-A!)

OOT 2026 alpha: +13.8% (cannibalization)

How it works

In the backtest it appears super-additive: pyramid captures F2/F3 (recovery/lateral) while shorts capture F0/F1 (bear/bull). In OOT 2026, however, the shorts triggered in the same gap window where pyramid was about to fire, causing cannibalization.

Recommendation

Do NOT deploy as is. Re-explore whether pyramid and shorts can use SEPARATE (non-shared) capital, or whether pyramid can operate within existing positions while shorts run in a sub-account.

❌ Closed directions (13 fine-tuning attempts)

Round	Tried	Result
`R141`	Loose event filter (return_pct=80)	compound +4.6%, min α -220
`R142`	Tight event filter (return_pct=95)	compound +38%, min α -216
`R143`	Labeling tb3_vol_15_10	compound +6K, min α +15
`R144`	Bigger arch GRU 2x256	compound -38%, min α -203
`R146`	ATR-adaptive stops	all variants worse than canonical 10%
`R148`	R134 as veto filter	skip rate 92-97%, kills V66
`R150`	Max-DD circuit breaker	no DD/compound tradeoff worth it
`R155/R156`	Short-specific GRU (tb3_vol_7_10)	P(SL) noisier than canonical
`R157/R158`	Short-specific GRU (tb3_vol_15_5)	catastrophic, min α -290
`R162`	R134 as size multiplier	forces V66 to size down at wrong times
`R168/R169`	Multi-timeframe features (1H+4H)	val_loss better but backtest collapses

Consolidated lesson: 13 fine-tuning attempts against V66 failed. Two paradigms worked in the sprint (R163 pyramid, R151-A shorts), and both ADD sleeves to V66 without modifying it; of the two, only R151-A survived the later R173 audit. Golden rule: ADD orthogonal sleeves, NEVER modify V66 inputs/architecture.

📊 POST-AUDIT ranking (R173 bug-fixed engine, OOT 2026)

Rank	Strategy	OOT 2026 α	Backtest min α	Backtest compound	Status	Recommendation
1	R151-A Shorts (honest)	+15.3%	+194	+113,981%	CANDIDATE	★ DEPLOY V7
2	V66 canonical (current, honest)	+12.4%	+193	+42,636%	PRODUCTION	Keep on V2-V6
3	R151-A half-Kelly (honest)	+11.5%	+178	+76,873%	VARIANT	Conservative alt
—	R163 Pyramid baseline	+60.3% (PHANTOM)	+373 (PHANTOM)	+199K (PHANTOM)	FAILED	NOT REAL (R173 audit)
—	R170-B Pyramid	+71.9% (PHANTOM)	+415 (PHANTOM)	+257K (PHANTOM)	FAILED	NOT REAL (R173 audit)
—	R166 COMBINED	+13.8% (artifact)	+287 (PHANTOM)	+454K (PHANTOM)	FAILED	Cannibalization + audit
—	R175 Transformer	—	-224	+91.9%	FAILED	Calibration mismatch

Note: Original "RECORD" rankings of R163/R170-B/R166 from R163-R171 were inflated 1.66-6.05× by R173 audit-discovered bugs (intra-bar lookahead + unfunded leverage). Numbers above for those rows show ORIGINAL (PHANTOM) values for reference — these strategies cannot be deployed because they rely on impossible cash mechanics.