Think like a quant.
Trade like a machine.
From basic math transformations to advanced statistical validation, this handbook acts as a production-grade blueprint for automated strategy creation. Handcrafted for Python + Pandas. Return to this reference whenever you develop new models.
What Algo Trading Actually Is
At its core, algorithmic trading is simple: you write rules, the computer follows them exactly, and you measure whether those rules made money. No emotion. No gut feeling. No "I think this stock looks good." Just rules and math.
While a manual retail trader might think, "Reliance looks strong today, I'll buy some," an algorithmic trader writes an exact rule that can be tested on historical data: "Buy Reliance when its 10-day average price crosses above its 50-day average, with volume at least 20% above its 20-day average volume. Sell when the 10-day crosses back below the 50-day." That's a strategy.
Why Systematic Strategies Work
If prices were random, no strategy could work. But **human psychology creates repeatable patterns** — panic selling, momentum chasing, and overreactions to news events.
Large funds can't buy ₹10,000 crore overnight — they take weeks. This creates **persistent price trends** that systematic strategies can ride.
A mediocre strategy applied with **perfect discipline beats a great strategy** applied inconsistently. Algos never have bad days or FOMO.
A computer can **scan 500 stocks in seconds**, apply 20 conditions to each, and execute in milliseconds. Humans can't compete on breadth.
The 6-Stage Quant Workflow
Every algo trading operation, from a retail Python script to a multi-billion dollar hedge fund, follows this iterative loop.
Stage 1 — The Idea (Hypothesis)
Everything starts with a hypothesis about why a pattern should exist. "Stocks that cross above their 50-day average tend to keep rising." "When RSI drops below 30, the stock is oversold and usually bounces back." These conditions are called **signals**.
Good ideas come from: academic research (momentum and mean reversion are both extensively documented), market microstructure (how liquidity works), or just observing price patterns. Bad ideas come from random chart gazing without a structural reason why the pattern should exist.
Stage 2 — Get Data
With yfinance, one line gets you years of daily OHLCV data (Open, High, Low, Close, Volume) for any ticker. For NSE stocks, append .NS to the symbol: RELIANCE.NS, TCS.NS, HDFCBANK.NS.
Data quality matters: auto_adjust=True adjusts for splits and dividends automatically — always use this. Missing data and corporate actions (splits, mergers) are the most common source of silent bugs in backtests.
```python
import yfinance as yf

# Single stock
df = yf.download('RELIANCE.NS', start='2019-01-01', end='2024-12-31', auto_adjust=True)

# Multiple stocks at once (columns become a (field, ticker) MultiIndex,
# so data['Close'] is a DataFrame with one column per ticker)
tickers = ['RELIANCE.NS', 'TCS.NS', 'HDFCBANK.NS', 'INFY.NS']
data = yf.download(tickers, start='2019-01-01', auto_adjust=True)
```
Stage 3 — Build the Strategy
Turn your hypothesis into **precise, unambiguous rules**. "It looks bullish" is not a rule. "The 10-day SMA crosses above the 50-day SMA AND today's volume is > 1.2× the 20-day average volume" is a rule.
Every strategy needs: an **entry condition** (when to buy), an **exit condition** (when to sell), and a **position size rule** (how much to buy). Missing any one of these is not a complete strategy.
Stage 4 — Backtest
Run your rules on historical data and simulate what would have happened. The critical discipline: **never look at the OOS (out-of-sample) data while building**. Hold back 20–30% of your data as a clean validation set.
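A minimal sketch of that holdout discipline, assuming a DataFrame indexed by date. The synthetic prices stand in for a real yfinance download, and chronological_split is a hypothetical helper name. The point is that the split is by time, never random, so nothing in the holdout comes before the data you designed on.

```python
import numpy as np
import pandas as pd

def chronological_split(df, holdout_frac=0.3):
    """Split a date-indexed DataFrame into (design, holdout) by time order."""
    if not 0 < holdout_frac < 1:
        raise ValueError("holdout_frac must be between 0 and 1")
    df = df.sort_index()                            # never shuffle a time series
    cut = int(round(len(df) * (1 - holdout_frac)))
    return df.iloc[:cut], df.iloc[cut:]

# Synthetic daily closes as a stand-in for real OHLCV data
rng = np.random.default_rng(0)
idx = pd.bdate_range("2019-01-01", periods=1000)
prices = pd.DataFrame(
    {"Close": 100 * np.cumprod(1 + rng.normal(0, 0.01, 1000))}, index=idx)

is_data, oos_data = chronological_split(prices, holdout_frac=0.3)
print(len(is_data), len(oos_data))                  # 700 300
print(is_data.index.max() < oos_data.index.min())   # True
```

Build and tune only on is_data; run the finished rules on oos_data once.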
Vectorized backtesting (what we do with pandas) simulates all trades at once using array operations: fast and simple. Event-driven backtesting (backtesting.py, zipline) simulates bar by bar like a real system: slower but more realistic, and able to model effects such as order timing and, in some frameworks, partial fills.
Stage 5 — Analyse the Output
Most beginners either trust a good-looking return number blindly, or drown in metrics they don't understand. The Analytics section of this guide decodes every number.
The key questions: Did it beat buy-and-hold? Is the Sharpe Ratio > 1? Is the max drawdown something you could psychologically survive? Are there enough trades to be statistically meaningful (>30)?
Stage 6 — Deploy / Iterate
Paper trading first — most brokers offer paper trading (Zerodha Sensibull, Interactive Brokers paper, etc.). Run the strategy for at least 1–3 months on paper before risking real money. Real markets have slippage, partial fills, and data feed delays that backtests miss.
Usually you find problems and go back to Stage 1. This is normal. Even professional quants iterate dozens of times. The loop is the process.
Your First Strategy — SMA Crossover
The SMA crossover is the "Hello World" of algo trading. Simple enough to understand completely, complex enough to teach you all the core concepts. The idea: when the short moving average crosses above the long one, a new uptrend may be starting.
```python
import yfinance as yf
import pandas as pd
import numpy as np

# ── STEP 1: GET DATA ───────────────────────────────────
# .NS suffix = NSE (National Stock Exchange of India)
# auto_adjust=True → splits and dividends handled automatically
ticker = 'RELIANCE.NS'
start_date = '2020-01-01'
end_date = '2024-12-31'

data = yf.download(ticker, start=start_date, end=end_date, auto_adjust=True)
# Recent yfinance versions may return (field, ticker) MultiIndex columns even
# for a single ticker; flatten them so data['Close'] is a plain Series
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)
data = data[['Close']].copy()

# ── STEP 2: COMPUTE MOVING AVERAGES ───────────────────
# SMA(10) = average of last 10 days → reacts fast to price changes
# SMA(50) = average of last 50 days → shows the bigger, slower trend
# When fast > slow: uptrend. When fast < slow: downtrend.
short_window = 10
long_window = 50
data['SMA_short'] = data['Close'].rolling(window=short_window).mean()
data['SMA_long'] = data['Close'].rolling(window=long_window).mean()

# ── STEP 3: GENERATE SIGNALS ──────────────────────────
# Signal = 1 when we SHOULD be in the market, 0 when we shouldn't
# Short MA above long MA = bullish = hold the stock
data['Signal'] = 0
data.loc[data['SMA_short'] > data['SMA_long'], 'Signal'] = 1

# Position = CHANGE in signal
# +1 = just crossed up → BUY
# -1 = just crossed down → SELL
#  0 = no change, hold current position
data['Position'] = data['Signal'].diff()

# ── STEP 4: CALCULATE RETURNS ─────────────────────────
# Daily return = today's % price change
# Strategy return = daily return ONLY on days we're in the market
# .shift(1) is CRITICAL: today's return is earned only if YESTERDAY's signal was 1.
# Close-to-close returns like this assume you can enter at the close on which
# the crossover appears; if you can only act at the next day's open, live
# results will differ somewhat from this backtest.
data['Daily_Return'] = data['Close'].pct_change()
data['Strategy_Return'] = data['Daily_Return'] * data['Signal'].shift(1)

# Cumulative returns: how ₹1 grows over time
data['Cum_Market'] = (1 + data['Daily_Return']).cumprod()
data['Cum_Strategy'] = (1 + data['Strategy_Return']).cumprod()

# ── STEP 5: COMPUTE METRICS ───────────────────────────
years = len(data) / 252  # 252 = avg trading days in a year
total_ret = (data['Cum_Strategy'].iloc[-1] - 1) * 100
cagr = (data['Cum_Strategy'].iloc[-1] ** (1/years) - 1) * 100
mkt_cagr = (data['Cum_Market'].iloc[-1] ** (1/years) - 1) * 100

# Sharpe (risk-free rate taken as 0 here): annualized mean return / annualized volatility
daily_avg = data['Strategy_Return'].mean()
daily_std = data['Strategy_Return'].std()
sharpe = (daily_avg / daily_std) * (252 ** 0.5) if daily_std != 0 else 0

# Max drawdown: worst peak-to-trough loss
rolling_max = data['Cum_Strategy'].cummax()
drawdown = (data['Cum_Strategy'] - rolling_max) / rolling_max
max_dd = drawdown.min() * 100

# Win rate and profit factor, measured on in-market DAYS (not per trade)
in_market = data[data['Signal'].shift(1) == 1]['Strategy_Return']
win_rate = (in_market > 0).sum() / len(in_market) * 100 if len(in_market) > 0 else 0
gross_profit = in_market[in_market > 0].sum()
gross_loss = abs(in_market[in_market < 0].sum())
profit_factor = gross_profit / gross_loss if gross_loss != 0 else float('inf')

# Number of trades = number of entries (0 → 1 transitions)
num_trades = (data['Position'] == 1).sum()

print(f"CAGR: {cagr:+.1f}% (market: {mkt_cagr:+.1f}%)")
print(f"Total Return: {total_ret:+.1f}%")
print(f"Sharpe Ratio: {sharpe:.2f} (>1 = good)")
print(f"Max Drawdown: {max_dd:.1f}%")
print(f"Win Rate: {win_rate:.1f}%")
print(f"Profit Factor: {profit_factor:.2f}")
print(f"Num Trades: {num_trades}")
```
How to Try Variations
Try TCS.NS, INFY.NS, TATASTEEL.NS, NIFTYBEES.NS (Nifty ETF). Each has a different personality — tech stocks trend differently than cyclicals.
Try 5/20 (faster, more trades), 20/100 (slower, fewer trades), 50/200 (the "Golden Cross" — very well known). Notice how metrics shift.
Only enter when Volume > 1.2 × Volume.rolling(20).mean(). This filters low-conviction crossovers. Observe if win rate improves.
Replace .rolling(n).mean() with .ewm(span=n).mean(). EMA reacts faster — signals come earlier but may be noisier.
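The volume filter and the EMA switch can be combined in one function. This is a sketch under our own assumptions: crossover_signal is a hypothetical helper, the volume condition applies only to the entry crossover (exits are unconditional), and synthetic prices and volumes replace a real download.

```python
import numpy as np
import pandas as pd

def crossover_signal(close, volume, fast=10, slow=50, use_ema=False,
                     vol_mult=1.2, vol_window=20):
    """Return 1 while in the market, 0 otherwise.

    Entry: fast MA crosses above slow MA on a day whose volume exceeds
    vol_mult x its vol_window-day average. Exit: fast MA crosses back below.
    """
    if use_ema:
        ma_fast = close.ewm(span=fast, adjust=False).mean()
        ma_slow = close.ewm(span=slow, adjust=False).mean()
    else:
        ma_fast = close.rolling(fast).mean()
        ma_slow = close.rolling(slow).mean()

    above = ma_fast > ma_slow                       # False while the MAs are NaN
    prev_above = above.shift(1, fill_value=False)
    cross_up = above & ~prev_above
    cross_down = ~above & prev_above
    high_volume = volume > vol_mult * volume.rolling(vol_window).mean()

    position = pd.Series(np.nan, index=close.index)
    position[cross_up & high_volume] = 1            # confirmed entries only
    position[cross_down] = 0                        # exits need no confirmation
    return position.ffill().fillna(0).astype(int)   # hold state between events

rng = np.random.default_rng(42)
idx = pd.bdate_range("2020-01-01", periods=500)
close = pd.Series(100 * np.cumprod(1 + rng.normal(0.0005, 0.015, 500)), index=idx)
volume = pd.Series(rng.integers(1_000_000, 3_000_000, 500), index=idx).astype(float)

filtered = crossover_signal(close, volume)
unfiltered = crossover_signal(close, volume, vol_mult=0.0)   # volume filter effectively off
ema_version = crossover_signal(close, volume, use_ema=True)
print("entries with filter:", (filtered.diff() == 1).sum(),
      "without:", (unfiltered.diff() == 1).sum())
```

Feed the returned series into the script above in place of data['Signal'] (it still needs the .shift(1) before multiplying by returns) and compare the metrics.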
Reading Your Backtest Results
When you run that script, you get a block of numbers. Here's exactly how to interpret them — with a good example and a bad example that looks deceptively okay.
Good example:
```text
CAGR: +18.2% (market: +12.1%)
Total Return: +142%
Sharpe Ratio: 1.4 ← above 1.0
Max Drawdown: -18% ← survivable
Win Rate: 54%
Profit Factor: 1.8 ← gains > losses
Num Trades: 38 ← statistically ok
```
Bad example (looks deceptively okay):
```text
CAGR: +14% (market: +12%)
Total Return: +96%
Sharpe Ratio: 0.6 ← poor risk/reward
Max Drawdown: -55% ← could you hold?
Win Rate: 62% ← looks good...
Profit Factor: 0.9 ← losses > gains!
Num Trades: 4 ← just luck
```
The Questions to Ask Every Time
| Question | What to look for | Red Flag |
|---|---|---|
| Did it beat buy-and-hold? | Strategy CAGR > market CAGR by a meaningful margin | Barely beats market → not worth the execution complexity |
| Is risk-adjusted return good? | Sharpe Ratio > 1.0 | Sharpe < 0.7 → returns don't justify the trading volatility |
| Could you survive the drawdown? | Max drawdown < 20-25% | >40% drawdown → most people quit before recovery occurs |
| Enough trades to be real? | >30 trades minimum, >100 preferred | <10 trades → purely luck, proves nothing |
| Do gains outweigh losses? | Profit Factor > 1.35 | < 1.0 → gross losses exceed gross gains, so the strategy loses money even before transaction costs |
| Does it beat the market after costs? | Positive after ₹40/trade + 0.1% slippage | Strategy returns evaporate after modeling execution fees |
Strategy Types — What Edge Are You Exploiting?
Every strategy is a bet on a specific **market inefficiency**. Knowing which type you're running determines what data matters, how to judge results, and what structural failure looks like.
Trend Following
The oldest and most robust category. Assets that have been rising tend to continue rising — because **institutional fund flows are persistent**. Large funds can't buy ₹10,000 crore in a day, so they accumulate over weeks, creating trends.
The counterintuitive truth: Trend strategies lose more often than they win. A 40% win rate can still be very profitable if winning trades are 3× the size of losing trades. Never judge a trend strategy by win rate alone.
Signals: SMA/EMA crossovers, ADX > 25 (confirms trend), Ichimoku cloud position, Donchian channel breakout.
Mean Reversion
Bets that prices which have moved too far from their average will snap back. Works because **short-term returns are negatively autocorrelated** — statistically, extreme single-day moves tend to partially reverse.
The Trap: High win rate feels great until a trending move wipes out 10 wins at once. Always use a strict stop-loss. The classic blowup is "it'll come back" — sometimes it doesn't.
Signals: RSI < 30 / > 70, Bollinger Band lower/upper touch, Z-score of price deviation from MA.
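A minimal sketch of the Z-score version, assuming a long-only rule that buys when price sits more than 2 standard deviations below its 20-day mean and exits once it is back at the mean. The thresholds and the zscore_reversion name are illustrative choices, and the stop-loss the text insists on is omitted for brevity (see the Stop-Loss Placement Methods table for options).

```python
import numpy as np
import pandas as pd

def zscore_reversion(close, window=20, entry_z=-2.0, exit_z=0.0):
    """Long-only mean reversion on the z-score of price vs its rolling mean."""
    mean = close.rolling(window).mean()
    std = close.rolling(window).std()
    z = (close - mean) / std

    position = pd.Series(np.nan, index=close.index)
    position[z < entry_z] = 1       # stretched far below average: buy
    position[z >= exit_z] = 0       # back at (or above) the average: exit
    return z, position.ffill().fillna(0).astype(int)

# Synthetic range-bound series with one sharp 5-day drop
rng = np.random.default_rng(1)
close = pd.Series(100 + rng.normal(0, 0.5, 300))
close.iloc[150:155] -= 5
z, pos = zscore_reversion(close)
print(pos.sum(), "days in the market")
```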
Momentum (Cross-Sectional)
Based on the academically documented fact that **stocks outperforming in the last 3–12 months tend to keep outperforming for the next 1–3 months**. One of the most replicated anomalies in finance literature.
Cross-sectional: Rank a universe of stocks by their 12-1 month return (skip the last month — it tends to reverse). Go long the top decile, short the bottom. Market-neutral.
Time-series: Go long a stock only if its own recent return is positive.
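A sketch of the cross-sectional 12-1 ranking, assuming month-end closes in a DataFrame (rows = months, columns = tickers); with real data you would first resample daily closes to monthly. The tickers are hypothetical and the prices synthetic. Weights computed at month t would be held over the following month.

```python
import numpy as np
import pandas as pd

def momentum_12_1_weights(monthly_close, frac=0.1):
    """Equal-weight long top decile / short bottom decile by 12-1 month return."""
    # Return from 12 months ago to 1 month ago: the latest month is skipped
    mom = monthly_close.shift(1) / monthly_close.shift(12) - 1
    pct_rank = mom.rank(axis=1, pct=True)           # rows without history stay NaN

    long_leg = (pct_rank > 1 - frac).astype(float)
    short_leg = (pct_rank <= frac).astype(float)
    # Each leg sums to 1 in absolute value, so the book is dollar-neutral
    long_w = long_leg.div(long_leg.sum(axis=1), axis=0)
    short_w = short_leg.div(short_leg.sum(axis=1), axis=0)
    return (long_w - short_w).fillna(0)             # months with no signal: flat

rng = np.random.default_rng(3)
months = pd.date_range("2022-01-01", periods=24, freq="MS")
tickers = [f"STOCK_{i}" for i in range(20)]          # hypothetical universe
monthly_close = pd.DataFrame(
    100 * np.cumprod(1 + rng.normal(0.01, 0.08, (24, 20)), axis=0),
    index=months, columns=tickers)

weights = momentum_12_1_weights(monthly_close)
print(weights.iloc[-1][weights.iloc[-1] != 0])
```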
Breakout
Enters when price breaks through a resistance or support level. Theory: when a level is broken, trapped traders on the wrong side accelerate the move by covering positions.
Signals: Donchian channel breakout (N-day high), volume confirmation (>150% of 20-day avg — critical), ATR-based price targets.
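A minimal sketch of a Donchian breakout with the 150% volume confirmation, assuming a 20-day channel. Both the channel and the average volume are shifted by one bar so that today's bar is not part of the level it is breaking. donchian_breakout is a hypothetical name, and ATR-based targets are left out.

```python
import pandas as pd

def donchian_breakout(high, close, volume, n=20, vol_mult=1.5, vol_window=20):
    """True on days the close breaks above the prior n-day high on heavy volume."""
    prior_high = high.rolling(n).max().shift(1)             # channel from previous bars only
    prior_avg_vol = volume.rolling(vol_window).mean().shift(1)
    breakout = close > prior_high
    volume_ok = volume > vol_mult * prior_avg_vol           # > 150% of average volume
    return breakout & volume_ok

# Toy data: flat range, then one high-volume breakout on bar 30
idx = pd.bdate_range("2024-01-01", periods=40)
high = pd.Series(100.0, index=idx)
close = pd.Series(99.0, index=idx)
volume = pd.Series(1_000.0, index=idx)
high.iloc[30] = 106.0
close.iloc[30] = 105.0
volume.iloc[30] = 2_000.0

signal = donchian_breakout(high, close, volume)
print(signal[signal].index.tolist())   # only the breakout day
```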
Statistical Arbitrage (Pairs Trading)
Exploits mispricings between statistically related instruments. If HDFCBANK and ICICIBANK historically move together (cointegrated), when they diverge, short the outperformer and long the underperformer — betting on convergence.
Z = (Spread - mean(Spread)) / std(Spread)
Enter long spread when Z < -2, exit when Z → 0
Enter short spread when Z > +2, exit when Z → 0
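Test cointegration with statsmodels.tsa.stattools.coint(). If the p-value is below 0.05, you can reject the null hypothesis of no cointegration at the 5% level, which makes the pair a candidate for pairs trading (not a guarantee that the relationship will keep holding).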
ML-Based Strategies
Uses machine learning to find non-linear patterns in features. The dirty secret: simpler models generalize better in finance. XGBoost usually beats LSTM for tabular price data. LSTM sounds more impressive but overfits harder.
What works: ML on classification (will price be higher in 5 days? yes/no) or ranking (which of these 50 stocks will do best?). ML on exact price prediction almost never works — the signal-to-noise ratio in price data is too low.
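The hardest part of the classification setup is building the dataset without leaking the future. A sketch, assuming a 5-day horizon and a handful of illustrative features (the feature names are our own, not a standard set); training a model such as XGBoost on it, with a chronological train/test split, is left out to keep the example dependency-free.

```python
import numpy as np
import pandas as pd

def make_direction_dataset(close, horizon=5):
    """Features from past prices only; label = 1 if close is higher `horizon` days later."""
    ds = pd.DataFrame(index=close.index)
    daily_ret = close.pct_change()
    ds["ret_1d"] = daily_ret
    ds["ret_5d"] = close.pct_change(5)
    ds["vol_20d"] = daily_ret.rolling(20).std()
    ds["dist_sma50"] = close / close.rolling(50).mean() - 1

    # The label is the ONLY column allowed to look forward
    future_ret = close.shift(-horizon) / close - 1
    ds["label"] = (future_ret > 0).astype(int)

    # Last `horizon` rows have no known future; first rows lack warm-up history
    return ds.iloc[:-horizon].dropna()

rng = np.random.default_rng(5)
close = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, 500)),
                  index=pd.bdate_range("2021-01-01", periods=500))
ds = make_direction_dataset(close)
print(ds.shape, ds["label"].mean().round(2))
```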
Indicators — The Math Behind Them
Indicators are not magic. They are **mathematical transformations of price and volume**. Understanding their formulas tells you exactly what they measure — and exactly where they lie to you.
SMA vs EMA — The Foundation
SMA(n) = sum of the last n closes / n → every day gets the same weight, 1/n
EMA (Exponential Moving Average): EMA(today) = Price(today) * k + EMA(yesterday) * (1 - k)
k = 2 / (n + 1) → EMA(10): k = 0.1818 → 18.2% weight on today's price
vs SMA(10): only 10% weight on today's price
EMA reacts faster to recent moves (useful in fast markets, noisier). SMA is smoother (better for trend identification, less whipsawing). For short-term signals, use EMA. For long-term trend detection, SMA is fine.
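A quick check that the recursion above is exactly what pandas computes with ewm(span=n, adjust=False), since span=n sets the smoothing factor to 2 / (n + 1). The prices are made up for illustration.

```python
import numpy as np
import pandas as pd

n = 10
k = 2 / (n + 1)
prices = pd.Series([100, 102, 101, 105, 107, 106, 108, 110, 109, 111, 115], dtype=float)

# Manual recursion: seed with the first price, then EMA = P*k + EMA_prev*(1-k)
ema_manual = [prices.iloc[0]]
for p in prices.iloc[1:]:
    ema_manual.append(p * k + ema_manual[-1] * (1 - k))

# pandas equivalent: span=n → alpha = 2/(n+1); adjust=False → same recursion
ema_pandas = prices.ewm(span=n, adjust=False).mean()
sma_last = prices.iloc[-n:].mean()

print(round(k, 4))                                       # 0.1818
print(np.allclose(ema_manual, ema_pandas.to_numpy()))    # True
print(round(ema_pandas.iloc[-1], 2), round(sma_last, 2))
```

Some charting platforms seed the EMA with the SMA of the first n prices instead of the first price, so early values can differ slightly; the difference fades after a few dozen bars.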
RSI — What It Actually Measures
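RSI = 100 - (100 / (1 + RS)), where RS = average gain / average loss over the lookback (14 bars by default, Wilder-smoothed)
Extreme moves: RS=9 → RSI=90 (average gain is 9× the average loss). RS=0.11 → RSI≈10 (average loss is about 9× the average gain)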
| RSI Level | Traditional Read | Reality |
|---|---|---|
| > 70 | Overbought → sell | In strong uptrends, RSI stays >70 for weeks. Selling purely on this signal in a bull market is expensive. |
| 50–70 | Bullish momentum | RSI crossing 50 from below is a reliable trend-confirmation signal. Better use than the 70 level. |
| 30–50 | Bearish momentum | RSI crossing 50 from above signals building selling pressure. |
| < 30 | Oversold → buy | In downtrends, RSI stays <30 for days. "Catching falling knives" without a trend filter is dangerous. |
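A small numeric check of the RS-to-RSI mapping, including the edge cases. The full Wilder-smoothed RSI series is computed in compute_indicators() in the toolkit at the end of this guide; rsi_from_rs here is just an illustrative helper.

```python
def rsi_from_rs(rs):
    """RSI = 100 - 100 / (1 + RS), where RS = average gain / average loss."""
    return 100 - 100 / (1 + rs)

avg_gain, avg_loss = 18.0, 2.0                  # average gain is 9x the average loss
print(rsi_from_rs(avg_gain / avg_loss))         # 90.0
print(round(rsi_from_rs(0.11), 1))              # 9.9
print(rsi_from_rs(1.0))                         # 50.0 (gains and losses balanced)
print(rsi_from_rs(float("inf")))                # 100.0 (no losses at all)
```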
Bollinger Bands — Volatility Encoded
Upper Band = SMA(20) + 2 * StdDev(Close, 20)
Lower Band = SMA(20) - 2 * StdDev(Close, 20)
%B = (Close - Lower) / (Upper - Lower) → 0 = at lower, 1 = at upper, 0.5 = at middle
BB Width = (Upper - Lower) / Middle → measures current volatility relative to average
The bands encode **volatility** — they expand in high-vol periods and contract in low-vol periods. Tight bands (low BB Width) = volatility compression = potential breakout loading. Wide bands = high volatility = mean reversion more likely.
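A sketch that computes the bands together with %B and BB Width exactly as defined above. One assumption to be aware of: pandas' rolling std uses the sample standard deviation (ddof=1), and charting packages differ on this choice, so values may not match your platform to the last decimal. The squeeze rule at the end (width in its lowest 10% of the last 120 bars) is a hypothetical example, not a standard definition.

```python
import numpy as np
import pandas as pd

def bollinger(close, n=20, k=2.0):
    """Bollinger Bands plus %B and BB Width."""
    middle = close.rolling(n).mean()
    std = close.rolling(n).std()                 # sample std (ddof=1)
    upper = middle + k * std
    lower = middle - k * std
    return pd.DataFrame({
        "middle": middle,
        "upper": upper,
        "lower": lower,
        "pct_b": (close - lower) / (upper - lower),   # 0 = lower band, 1 = upper band
        "width": (upper - lower) / middle,            # volatility relative to price
    })

rng = np.random.default_rng(11)
close = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, 250)))
bb = bollinger(close)
# Hypothetical squeeze rule: width in the bottom 10% of its last 120 values
squeeze = bb["width"] < bb["width"].rolling(120).quantile(0.1)
print(bb.dropna().tail(3).round(3))
```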
MACD — Three Signals in One
MACD Line = EMA(12) - EMA(26) ← fast trend minus slow trend
Signal Line = EMA(9) of MACD Line ← smoothed version of MACD
Histogram = MACD Line - Signal Line ← momentum of momentum
The histogram is the most useful part. **Growing histogram = momentum building.** Shrinking histogram = momentum fading — often precedes a crossover signal by a few bars, giving earlier warning than the raw crossover.
ATR — For Stops and Sizing
True Range = max(High - Low, |High - Previous Close|, |Low - Previous Close|)
ATR(14) = Wilder-smoothed average of the True Range over 14 periods
If ATR = ₹50: the stock moves ~₹50 on an average day (normal noise)
ATR is critical for stop-loss placement. A stop-loss only ₹20 below entry on a stock with ATR = ₹50 will very likely be hit by ordinary daily noise. **Rule: place stops 1.5×–2× ATR from entry.**
ADX — Is There a Trend at All?
ADX < 20: Weak or no trend → avoid trend-following entries
ADX 20–25: Trend developing → caution
ADX 25–40: Strong trend → ideal for trend-following entry
ADX > 40: Very strong trend → consider partial profit-taking
Signal Engineering
A raw indicator reading is not a signal. A **signal is a precise, testable rule with a binary output: enter or exit**. The engineering between "RSI is low" and a real trading signal is where most of the actual edge comes from.
Signal Quality Hierarchy
**High Noise:** RSI < 30 → buy. Maximum false signals. Works occasionally by coincidence, not by design.
**Better:** RSI < 30 AND price > SMA(200). Filters out bear market trades. Better, but still incomplete.
**Stronger:** RSI < 30 AND SMA(50) > SMA(200) AND ADX > 20. Only trades in the right market environment.
**Professional Grade:** Daily trend up → zoom into 4H → RSI oversold there → enter. Higher TF sets direction, lower TF gives timing.
Regime-Aware Signal Logic
```python
# Compute regime indicators first
# (assumes ADX_14, RSI_14, BB_upper, BB_lower, SMA_20, SMA_short and SMA_long
# already exist, e.g. from compute_indicators() in the toolkit below plus the
# SMA columns from the crossover example)
data['SMA200'] = data['Close'].rolling(200).mean()
data['BB_width'] = (data['BB_upper'] - data['BB_lower']) / data['SMA_20']

# Drop warm-up rows where the regime inputs are still NaN
data = data.dropna(subset=['ADX_14', 'SMA200', 'BB_width']).copy()

def get_regime(row):
    if row['ADX_14'] > 25 and row['Close'] > row['SMA200']:
        return 'trending_up'
    elif row['ADX_14'] > 25 and row['Close'] < row['SMA200']:
        return 'trending_down'   # stay flat or short
    elif row['BB_width'] < 0.05:
        return 'squeeze'         # breakout strategy
    else:
        return 'ranging'         # mean reversion strategy

data['regime'] = data.apply(get_regime, axis=1)

# Apply different logic per regime
data['signal'] = 0
data.loc[(data['regime'] == 'trending_up') & (data['SMA_short'] > data['SMA_long']), 'signal'] = 1
data.loc[(data['regime'] == 'ranging') & (data['RSI_14'] < 30), 'signal'] = 1
```
Backtesting — The Right Way
Backtesting is the most **abused tool in retail trading**. Done naively, it tells you nothing useful. The difference between a useful backtest and a lie is methodology.
The Three-Way Data Split
**Design Zone:** Design and tune parameters here. You're allowed to look at this data. e.g. 2015–2020.
**Validation:** Never touch during design. Run the strategy here once to validate. e.g. 2021–2022.
**Final Test:** The only truly clean test is real future data. Paper trade for 1–3 months before allocating real money.
Realistic Cost Modelling
| Cost | What it is | Estimate | How to Model |
|---|---|---|---|
| Commission | Broker fee | ₹20–40/trade (Zerodha) | Subtract from each trade return |
| Slippage | Signal price vs fill price | 0.05–0.15% per trade | Adjust fill price by 0.1% against you |
| Bid-Ask Spread | Cost of crossing the spread | 0.02–0.05% (Nifty large-cap) | Add to slippage estimate |
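A minimal sketch of folding these costs into a vectorized backtest, assuming a 0/1 position that trades the whole account on every entry and exit, a flat ₹20 commission and 0.1% slippage per trade (fold the bid-ask estimate into slippage_pct). apply_costs is a hypothetical helper; real brokerage schedules and statutory charges vary.

```python
import pandas as pd

def apply_costs(gross_returns, position, capital=100_000.0,
                commission_per_trade=20.0, slippage_pct=0.001):
    """Net daily returns after a flat commission and % slippage on each entry/exit.

    position: the 0/1 series actually held each day (i.e. signal.shift(1)).
    Assumes the full account is traded each time, so the commission is
    charged as a fraction of capital.
    """
    position = position.fillna(0)
    trades = position.diff().abs().fillna(position.abs())   # 1 on each entry or exit
    cost = trades * (commission_per_trade / capital + slippage_pct)
    return gross_returns - cost

idx = pd.bdate_range("2024-01-01", periods=6)
position = pd.Series([0, 1, 1, 1, 0, 0], index=idx, dtype=float)
gross = pd.Series([0.0, 0.01, 0.02, -0.01, 0.0, 0.0], index=idx)
net = apply_costs(gross, position)
print(net.round(4).tolist())
```

Each trade here costs 0.12% of capital (0.02% commission plus 0.1% slippage), so a strategy that trades often needs a correspondingly larger edge per trade.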
```python
import pandas as pd

def walk_forward_test(data, is_months=18, oos_months=6):
    """
    Rolls a window: train on is_months, test on oos_months,
    advance by oos_months, repeat. Returns stitched OOS results.
    More rigorous than a single IS/OOS split.

    optimize_strategy(is_data) and run_strategy(oos_data, params) are
    placeholders for YOUR functions (not defined here): the first searches
    parameters on in-sample data, the second returns OOS results
    (e.g. a Series of daily strategy returns).
    """
    results = []
    dates = pd.to_datetime(data.index)   # don't mutate the caller's DataFrame
    window_start = dates[0]

    while True:
        is_end = window_start + pd.DateOffset(months=is_months)
        oos_end = is_end + pd.DateOffset(months=oos_months)
        if oos_end > dates[-1]:
            break
        is_data = data[(dates >= window_start) & (dates < is_end)]
        oos_data = data[(dates >= is_end) & (dates < oos_end)]
        best_params = optimize_strategy(is_data)          # your param search
        oos_result = run_strategy(oos_data, best_params)  # run once, never re-tuned
        results.append(oos_result)
        window_start += pd.DateOffset(months=oos_months)

    if not results:
        raise ValueError("Not enough data for a single IS + OOS window")
    return pd.concat(results)
```
Risk Management & Position Sizing
Position sizing is the **most underrated concept in retail trading**. You can have a profitable strategy and blow up your account with bad sizing. Conversely, a mediocre strategy with excellent sizing can survive for years.
Fixed Fractional Sizing
Risk Amount = Account Equity × Risk % per trade
Position Size = Risk Amount / (Entry Price - Stop Loss Price)
Example: ₹1,00,000 account, 1% risk, entry ₹500, stop at ₹480 (₹20 risk)
Position = ₹1,000 / ₹20 = 50 shares
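The same calculation as a small function, assuming a long position and no leverage (so the share count is also capped by available cash). fixed_fractional_size is a hypothetical helper name.

```python
import math

def fixed_fractional_size(equity, risk_pct, entry, stop):
    """Shares to buy so that hitting the stop loses about risk_pct of equity (long only)."""
    per_share_risk = entry - stop
    if per_share_risk <= 0:
        raise ValueError("For a long position the stop must be below entry")
    risk_amount = equity * risk_pct
    shares = math.floor(risk_amount / per_share_risk)   # round DOWN: never risk more
    max_affordable = math.floor(equity / entry)         # no leverage assumed
    return min(shares, max_affordable)

print(fixed_fractional_size(100_000, 0.01, 500, 480))   # 50
print(fixed_fractional_size(100_000, 0.05, 500, 499))   # 200 (capped by cash)
```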
The Kelly Criterion
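The Kelly Criterion calculates the fraction of capital to risk that maximizes long-term compounded growth, given your win rate and average win/loss ratio. Because full Kelly sizing produces very deep, volatile drawdowns (and is highly sensitive to errors in those estimates), most professional quants use **Half Kelly** sizing.
Kelly % = W - (1 - W) / R, where W = win rate and R = average win / average loss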
Half Kelly = Kelly % / 2
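A sketch of the Kelly calculation using the standard formula Kelly % = W - (1 - W) / R, with illustrative inputs of a 45% win rate, ₹2,000 average win and ₹800 average loss. Since W and R come from a backtest and are only estimates, it is safer to read the result as a ceiling than as a precise allocation; kelly_fraction is a hypothetical helper.

```python
def kelly_fraction(win_rate, win_loss_ratio):
    """Kelly % = W - (1 - W) / R, clipped at 0 (a negative value means: don't trade)."""
    return max(0.0, win_rate - (1 - win_rate) / win_loss_ratio)

full_kelly = kelly_fraction(0.45, 2000 / 800)    # R = 2.5
half_kelly = full_kelly / 2
print(f"Full Kelly: {full_kelly:.1%}, Half Kelly: {half_kelly:.1%}")   # 23.0%, 11.5%
```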
Stop-Loss Placement Methods
| Method | Formula | Best For |
|---|---|---|
| ATR-based | Entry - 1.5 * ATR(14) | Most strategies — adapts to current volatility regime |
| Structure-based | Below recent swing low | Trend and breakout strategies |
| Fixed % | Entry * (1 - 0.05) | Simple, but ignores volatility — can be too wide or too tight |
| Chandelier Exit | Highest High(22) - 3 * ATR(22) | Trailing stop for trend-following — adapts as price moves up |
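The two volatility-based stops from the table, sketched in pandas. The ATR uses the same Wilder smoothing (ewm with alpha = 1/n) as the toolkit at the end of this guide; the function names are hypothetical, and the toy data has a constant 2-point daily range so the results are easy to verify by hand.

```python
import pandas as pd

def atr(high, low, close, n=14):
    """Average True Range with Wilder smoothing (alpha = 1/n)."""
    prev_close = close.shift(1)
    true_range = pd.concat([high - low,
                            (high - prev_close).abs(),
                            (low - prev_close).abs()], axis=1).max(axis=1)
    return true_range.ewm(alpha=1 / n, adjust=False).mean()

def atr_stop(entry_price, atr_value, mult=1.5):
    """Fixed stop below entry, scaled to current volatility."""
    return entry_price - mult * atr_value

def chandelier_exit(high, low, close, n=22, mult=3.0):
    """Trailing stop: highest high of the last n bars minus mult x ATR(n)."""
    return high.rolling(n).max() - mult * atr(high, low, close, n)

close = pd.Series([100.0] * 30)
high, low = close + 1, close - 1                     # constant 2-point daily range
print(atr(high, low, close).iloc[-1])                # ~2.0
print(atr_stop(500.0, 50.0))                         # 425.0
print(chandelier_exit(high, low, close).iloc[-1])    # ~95.0
```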
Analytics — Every Metric Decoded
Return & Risk-Adjusted Metrics
**CAGR (Compound Annual Growth Rate):** the constant yearly growth rate that turns your starting capital into your ending capital. Target: >15% to beat benchmark indices.
**Sharpe Ratio:** annualized excess return (above the risk-free rate) per unit of annualized volatility. Target: >1.0.
**Sortino Ratio:** like Sharpe, but only penalizes downside volatility. Prefer it over Sharpe when returns are asymmetric. Target: >1.5.
**Calmar Ratio:** CAGR divided by the absolute maximum drawdown. Target: >1.5.
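For reference, the textbook forms of these four ratios, written for daily returns r_t, a daily risk-free rate r_f, N trading days of history, equity values V_0 and V_N, and the usual 252 trading days per year:

```latex
\begin{aligned}
\text{CAGR} &= \left(\frac{V_N}{V_0}\right)^{252/N} - 1 \\
\text{Sharpe} &= \frac{\operatorname{mean}(r_t - r_f)}{\operatorname{std}(r_t)}\,\sqrt{252} \\
\text{Sortino} &= \frac{\operatorname{mean}(r_t - r_f)}{\sqrt{\tfrac{1}{N}\sum_{t=1}^{N}\min(r_t - r_f,\,0)^2}}\,\sqrt{252} \\
\text{Calmar} &= \frac{\text{CAGR}}{\lvert \text{Max Drawdown} \rvert}
\end{aligned}
```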
Expectancy
Expectancy = (Win Rate × Avg Win) - (Loss Rate × Avg Loss)
Example: 45% win rate, avg win ₹2,000, avg loss ₹800:
Expectancy = (0.45 * 2000) - (0.55 * 800) = 900 - 440 = +₹460 expected per trade
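The same arithmetic in code, plus a check that expectancy is simply the average P&L per trade. The last line tests the trend-following claim from the Strategy Types section: 40% winners that are 3× the size of the losers still leave a positive edge. The trade list is illustrative.

```python
import numpy as np

def expectancy(win_rate, avg_win, avg_loss):
    """Expected P&L per trade; avg_loss is entered as a positive number."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss

print(expectancy(0.45, 2000, 800))          # ≈ 460 (rupees per trade)

# Same number from a raw trade list: 9 wins of ₹2,000 and 11 losses of ₹800
trades = np.array([2000.0] * 9 + [-800.0] * 11)
print(trades.mean())                        # 460.0

# Trend-following case: 40% winners, each 3x the size of the average loss
print(expectancy(0.40, 3.0, 1.0))           # ≈ 0.6 (in units of average loss)
```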
Drawdown Analysis
| Metric | Measures | Thresholds |
|---|---|---|
| Max Drawdown | Worst peak-to-trough loss | <15% = comfortable, 15-30% = survivable, >45% = painful |
| Avg Drawdown | Mean depth of all drawdowns | Should remain <50% of the max drawdown level |
| Max DD Duration | Longest time to recover to new high | <3 months is good. >12 months is very tough psychologically |
| Recovery Factor | Total return / Max Drawdown | Target >3.0 (meaning total gains exceed the worst drawdown by 3x) |
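A sketch that computes the four table metrics from an equity curve. Assumptions: each contiguous stretch below a prior peak counts as one drawdown episode, "Avg Drawdown" is the mean of the episodes' deepest points, and duration is measured in bars spent below the peak. drawdown_stats is a hypothetical helper.

```python
import pandas as pd

def drawdown_stats(equity):
    """Max/avg drawdown depth, longest underwater stretch and recovery factor."""
    drawdown = equity / equity.cummax() - 1                    # 0 at new highs, negative below
    underwater = drawdown < 0
    episode = (underwater != underwater.shift(fill_value=False)).cumsum()  # new id per state change
    grouped = drawdown[underwater].groupby(episode[underwater])
    depths = grouped.min()                                     # deepest point of each episode
    durations = grouped.size()                                 # bars spent below the prior peak

    max_dd = drawdown.min()
    total_return = equity.iloc[-1] / equity.iloc[0] - 1
    return {
        "max_dd": max_dd,
        "avg_dd": depths.mean() if len(depths) else 0.0,
        "max_dd_bars": int(durations.max()) if len(durations) else 0,
        "recovery_factor": total_return / abs(max_dd) if max_dd < 0 else float("inf"),
    }

equity = pd.Series([100, 110, 99, 105, 112, 106.4, 112, 120], dtype=float)
stats = drawdown_stats(equity)
print(stats)
```

In this toy curve there are two episodes (-10% lasting 2 bars, then -5% lasting 1 bar), so the average drawdown is -7.5% and the recovery factor is 20% / 10% = 2.0.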
Common Traps — How Backtests Lie
Lookahead Bias — The Silent Killer
Classic Example: Generating a signal using today's closing price, then "buying" at today's close. In reality, you only know the close after the market has closed.
Fix: Always shift your signal series by one bar:
df['signal'].shift(1). This ensures you execute tomorrow using today's market close signal.
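A quick way to see how much damage this does. The sketch assumes a synthetic random walk, which has no exploitable pattern by construction, and a deliberately "cheating" signal that is long on up days. Without the shift the strategy looks spectacular; with it, the edge disappears.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
close = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, 1000)))   # pure random walk
daily_ret = close.pct_change()

# "Cheating" signal: long on up days, which you only know AFTER the close
signal = (daily_ret > 0).astype(int)

biased = (daily_ret * signal).sum()             # same-bar signal and return: lookahead
honest = (daily_ret * signal.shift(1)).sum()    # yesterday's signal, today's return
print(f"with lookahead: {biased:+.2f}   shifted: {honest:+.2f}")
```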
Curve-Fitting / Overfitting
Signs: Strategy has 5+ parameters. Returns are suspiciously smooth. Tiny parameter changes dramatically alter results. Sharpe drops >40% on OOS data.
Fix: Keep strategies simple (2–3 parameters). If OOS Sharpe is <60% of IS Sharpe, the strategy is overfit.
Survivorship Bias
Testing on current index constituents (e.g. Nifty 50) ignores stocks that were delisted, merged, or went bankrupt. Your universe is biased toward survivors. Returns can be overstated by 1–3% CAGR. Just be aware and discount your returns slightly.
Other Traps
| Trap | What happens | Fix |
|---|---|---|
| Too few trades | 5 trades with good results = luck, not signal | Minimum 30 trades; 100+ preferred for statistical confidence |
| Ignoring costs | Daily-signal strategies look great; costs eat all returns | Model commissions + slippage explicitly before declaring success |
| Data snooping | Testing 100 variations — some look great by pure chance | Fewer variations, strict IS/OOS separation |
| Regime mismatch | Strategy trained in a bull market fails in a bear market | Include at least one bear market period in IS data |
| Ignoring liquidity | Backtesting on thinly traded small-caps with no fills | Filter: 20-day average daily traded value (price × volume) > ₹5 crore before testing |
Python Quant Toolkit — Copy & Use
This production-ready utility toolkit calculates technical indicators, computes risk-adjusted performance metrics, and runs cointegration checks for pairs trading. Copy and paste this directly into your Python files.
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint


def compute_indicators(df):
    """
    Computes moving averages, ATR, RSI, MACD, Bollinger Bands and ADX with standard definitions.
    Expects 'High', 'Low' and 'Close' columns.
    """
    h = df['High']; l = df['Low']; c = df['Close']
    data = df.copy()

    # Moving Averages
    for n in [10, 20, 50, 200]:
        data[f'SMA_{n}'] = c.rolling(n).mean()
        data[f'EMA_{n}'] = c.ewm(span=n, adjust=False).mean()

    # ATR (Average True Range, Wilder smoothing)
    prev_c = c.shift(1)
    tr = pd.concat([h - l, (h - prev_c).abs(), (l - prev_c).abs()], axis=1).max(axis=1)
    data['ATR_14'] = tr.ewm(alpha=1/14, adjust=False).mean()

    # RSI (Wilder smoothing via ewm)
    delta = c.diff()
    gain = delta.clip(lower=0).ewm(alpha=1/14, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1/14, adjust=False).mean()
    data['RSI_14'] = 100 - (100 / (1 + gain / loss))

    # MACD
    ema12 = c.ewm(span=12, adjust=False).mean()
    ema26 = c.ewm(span=26, adjust=False).mean()
    data['MACD'] = ema12 - ema26
    data['MACD_signal'] = data['MACD'].ewm(span=9, adjust=False).mean()
    data['MACD_hist'] = data['MACD'] - data['MACD_signal']

    # Bollinger Bands
    sma20 = c.rolling(20).mean()
    std20 = c.rolling(20).std()
    data['BB_upper'] = sma20 + 2 * std20
    data['BB_middle'] = sma20
    data['BB_lower'] = sma20 - 2 * std20
    data['BB_width'] = (data['BB_upper'] - data['BB_lower']) / sma20

    # ADX (directional movement, Wilder smoothing)
    up = h - h.shift(1)
    down = l.shift(1) - l
    pdm = up.where((up > down) & (up > 0), 0)
    ndm = down.where((down > up) & (down > 0), 0)
    pdi = 100 * pdm.ewm(alpha=1/14, adjust=False).mean() / data['ATR_14']
    ndi = 100 * ndm.ewm(alpha=1/14, adjust=False).mean() / data['ATR_14']
    dx = 100 * (pdi - ndi).abs() / (pdi + ndi)
    data['ADX_14'] = dx.ewm(alpha=1/14, adjust=False).mean()

    return data


def backtest_metrics(returns, risk_free=0.07):
    """
    Computes CAGR, Sharpe, Sortino, Calmar, Max Drawdown (depth and duration),
    win rate, Profit Factor and expectancy from a series of DAILY strategy returns.
    risk_free is an annual rate (0.07 = 7%).
    """
    r = returns.dropna()
    cum = (1 + r).cumprod()
    n_years = len(r) / 252
    daily_rf = risk_free / 252

    total_return = cum.iloc[-1] - 1
    cagr = cum.iloc[-1] ** (1 / n_years) - 1

    # Sharpe: annualized mean daily excess return / annualized volatility
    excess = r - daily_rf
    vol_ann = r.std() * np.sqrt(252)
    sharpe = excess.mean() * 252 / vol_ann if vol_ann != 0 else 0

    # Sortino: same numerator, but only returns below the risk-free rate count as risk
    downside_dev = np.sqrt((np.minimum(excess, 0) ** 2).mean()) * np.sqrt(252)
    sortino = excess.mean() * 252 / downside_dev if downside_dev != 0 else 0

    # Drawdown depth and the longest run of days spent below a prior peak
    rolling_max = cum.cummax()
    drawdown = (cum - rolling_max) / rolling_max
    max_dd = drawdown.min()
    in_dd = drawdown < 0
    dd_dur = in_dd.groupby((in_dd != in_dd.shift(fill_value=False)).cumsum()).sum().max()

    calmar = cagr / abs(max_dd) if max_dd != 0 else 0

    # Trade-quality stats on in-market DAYS (non-zero returns), not per trade
    in_mkt = r[r != 0]
    win_rate = (in_mkt > 0).mean()
    gross_profit = in_mkt[in_mkt > 0].sum()
    gross_loss = abs(in_mkt[in_mkt < 0].sum())
    pf = gross_profit / gross_loss if gross_loss != 0 else np.inf
    expectancy = win_rate * in_mkt[in_mkt > 0].mean() + (1 - win_rate) * in_mkt[in_mkt < 0].mean()

    return {
        'total_return_pct': round(total_return * 100, 2),
        'cagr_pct': round(cagr * 100, 2),
        'sharpe': round(sharpe, 3),
        'sortino': round(sortino, 3),
        'calmar': round(calmar, 3),
        'max_dd_pct': round(max_dd * 100, 2),
        'max_dd_days': int(dd_dur),
        'win_rate_pct': round(win_rate * 100, 2),
        'profit_factor': round(pf, 3),
        'expectancy_pct': round(expectancy * 100, 4)
    }


def pairs_analysis(price_a, price_b, lookback=60):
    """
    Runs the Engle-Granger cointegration test, estimates the hedge ratio by OLS,
    and builds a z-score signal on the spread. Returns None if not cointegrated.
    Note: the hedge ratio is fitted on the full sample, which leaks future
    information into a backtest; re-estimate it on in-sample data before trusting results.
    """
    score, pvalue, _ = coint(price_a, price_b)
    if pvalue > 0.05:
        print("⚠ Not cointegrated: do not trade this pair")
        return None

    result = sm.OLS(price_a, sm.add_constant(price_b)).fit()
    hedge_ratio = result.params.iloc[1]   # slope on price_b (position-based, works for unnamed Series)
    spread = price_a - hedge_ratio * price_b

    mu = spread.rolling(lookback).mean()
    sd = spread.rolling(lookback).std()   # standard deviation
    zscore = (spread - mu) / sd

    # NaN = "no new instruction today", so ffill carries the open position forward
    signal = pd.Series(np.nan, index=zscore.index)
    signal[zscore < -2.0] = 1             # long spread
    signal[zscore > 2.0] = -1             # short spread
    signal[zscore.abs() < 0.5] = 0        # exit zone
    signal = signal.ffill().fillna(0)

    return {'spread': spread, 'zscore': zscore, 'signal': signal, 'hedge_ratio': hedge_ratio}
```
Useful Libraries Reference
| Library | Install | Use For |
|---|---|---|
| backtesting.py | pip install backtesting | Clean event-driven backtesting with built-in interactive HTML charts |
| vectorbt | pip install vectorbt | Extremely fast vectorized backtesting and parameter optimization grids |
| statsmodels | pip install statsmodels | Cointegration tests, OLS regressions, ARIMA time series models |
| pyfolio-reloaded | pip install pyfolio-reloaded | Hedge-fund style tear sheets (rolling Sharpe, risk allocations, drawdown curves) |
| plotly | pip install plotly | Interactive, zoomable charts with hover tooltips |