Quant Reference / Systematic Speculation

Think like a quant.
Trade like a machine.

From basic math transformations to advanced statistical validation, this handbook acts as a production-grade blueprint for automated strategy creation. Handcrafted for Python + Pandas. Return to this reference whenever you develop new models.

Python + Pandas · Statistical Edge · NSE / NYSE Equities · v1.0 · Production Ready
00

What Algo Trading Actually Is

At its core, algorithmic trading is simple: you write rules, the computer follows them exactly, and you measure whether those rules made money. No emotion. No gut feeling. No "I think this stock looks good." Just rules and math.

While a manual retail trader might think, "Reliance looks strong today, I'll buy some," an algorithmic trader designs a system that evaluates historical data parameters: "Buy Reliance when its 10-day average price crosses above its 50-day average, with volume at least 20% above its 20-day average volume. Sell when the 10-day crosses back below the 50-day." That's a strategy.

Why Systematic Strategies Work

01 / Markets are not perfectly efficient

If prices were random, no strategy could work. But **human psychology creates repeatable patterns** — panic selling, momentum chasing, and overreactions to news events.

02 / Institutions move slowly

Large funds can't buy ₹10,000 crore overnight — they take weeks. This creates **persistent price trends** that systematic strategies can ride.

03 / Discipline beats intelligence

A mediocre strategy applied with **perfect discipline beats a great strategy** applied inconsistently. Algos never have bad days or FOMO.

04 / Speed and scale

A computer can **scan 500 stocks in seconds**, apply 20 conditions to each, and execute in milliseconds. Humans can't compete on breadth.

Minimum Viable Setup (What You Already Have): Python + yfinance + pandas. That's enough to build, test, and analyze real strategies on real data. You don't need a Bloomberg terminal, API access to broker data, or ₹10 lakh capital to start learning. Free TradingView is perfect for visual validation of your logic.
01

The 6-Stage Quant Workflow

Every algo trading operation, from a retail Python script to a multi-billion dollar hedge fund, follows this iterative loop.

01 Idea Hypothesis Edge
02 Get Data yfinance Pipeline
03 Build Strategy Rules → Signals
04 Backtest Historical Run
05 Analyse Risk Metrics
06 Deploy Live Execution

Stage 1 — The Idea (Hypothesis)

Everything starts with a hypothesis about why a pattern should exist. "Stocks that cross above their 50-day average tend to keep rising." "When RSI drops below 30, the stock is oversold and usually bounces back." These conditions are called **signals**.

Good ideas come from: academic research (momentum, mean-reversion are extensively documented), market microstructure (how liquidity works), or just observing price patterns. Bad ideas come from random chart gazing without a structural why.

Key Question: Why should this edge exist? If you can't answer that, the backtest result is probably statistical noise.

Stage 2 — Get Data

With yfinance, one line gets you years of daily OHLCV data (Open, High, Low, Close, Volume) for any ticker. For NSE stocks, append .NS to the symbol: RELIANCE.NS, TCS.NS, HDFCBANK.NS.

Data quality matters: auto_adjust=True adjusts for splits and dividends automatically — always use this. Missing data and corporate actions (splits, mergers) are the most common source of silent bugs in backtests.

data_fetch.py Python
import yfinance as yf

# Single stock
df = yf.download('RELIANCE.NS', start='2019-01-01', end='2024-12-31', auto_adjust=True)

# Multiple stocks at once
tickers = ['RELIANCE.NS', 'TCS.NS', 'HDFCBANK.NS', 'INFY.NS']
data = yf.download(tickers, start='2019-01-01', auto_adjust=True)
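
The data-quality warning above can be turned into a quick automated check. This is a minimal sketch, assuming a daily OHLCV frame shaped like the yfinance output; the `check_ohlcv` helper and the synthetic frame are illustrative and not part of yfinance.

```python
import numpy as np
import pandas as pd

def check_ohlcv(df: pd.DataFrame) -> dict:
    """Return simple data-quality counts for a daily OHLCV frame (hypothetical helper)."""
    return {
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_dates": int(df.index.duplicated().sum()),
        "non_positive_close": int((df["Close"] <= 0).sum()),
        "high_below_low": int((df["High"] < df["Low"]).sum()),
        "index_sorted": bool(df.index.is_monotonic_increasing),
    }

# Synthetic stand-in for downloaded data, with one deliberately missing close
idx = pd.bdate_range("2024-01-01", periods=5)
demo = pd.DataFrame(
    {
        "Open": [100, 101, 102, 103, 104],
        "High": [101, 102, 103, 104, 105],
        "Low": [99, 100, 101, 102, 103],
        "Close": [100.5, np.nan, 102.5, 103.5, 104.5],
        "Volume": [1000, 1100, 900, 1200, 1000],
    },
    index=idx,
)
report = check_ohlcv(demo)
print(report)
```

Running a check like this right after every download makes gaps visible before they silently distort rolling averages and returns.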

Stage 3 — Build the Strategy

Turn your hypothesis into **precise, unambiguous rules**. "It looks bullish" is not a rule. "The 10-day SMA crosses above the 50-day SMA AND today's volume is > 1.2× the 20-day average volume" is a rule.

Every strategy needs: an **entry condition** (when to buy), an **exit condition** (when to sell), and a **position size rule** (how much to buy). Missing any one of these is not a complete strategy.

Stage 4 — Backtest

Run your rules on historical data and simulate what would have happened. The critical discipline: **never look at the OOS (out-of-sample) data while building**. Hold back 20–30% of your data as a clean validation set.

Vectorized backtesting (what we do with pandas) simulates all trades at once using array operations — fast, simple. Event-driven backtesting (backtesting.py, zipline) simulates day-by-day like a real system — more realistic, captures things like partial fills.

Stage 5 — Analyse the Output

Most beginners either trust a good-looking return number blindly, or drown in metrics they don't understand. Section 09 of this guide decodes every number.

The key questions: Did it beat buy-and-hold? Is the Sharpe Ratio > 1? Is the max drawdown something you could psychologically survive? Are there enough trades to be statistically meaningful (>30)?

Stage 6 — Deploy / Iterate

Paper trading first — most brokers offer paper trading (Zerodha Sensibull, Interactive Brokers paper, etc.). Run the strategy for at least 1–3 months on paper before risking real money. Real markets have slippage, partial fills, and data feed delays that backtests miss.

Usually you find problems and go back to Stage 1. This is normal. Even professional quants iterate dozens of times. The loop is the process.

Roadmap after paper trading: For live execution with Python, look into Zerodha Kite Connect API (NSE) or Alpaca (US markets) — both have free tiers and Python SDKs.
02

Your First Strategy — SMA Crossover

The SMA crossover is the "Hello World" of algo trading. Simple enough to understand completely, complex enough to teach you all the core concepts. The idea: when the short moving average crosses above the long one, a new uptrend may be starting.

complete_crossover.py Python
import yfinance as yf
import pandas as pd
import numpy as np

# ── STEP 1: GET DATA ───────────────────────────────────
# .NS suffix = NSE (National Stock Exchange India)
# auto_adjust=True → splits and dividends handled automatically
ticker     = 'RELIANCE.NS'
start_date = '2020-01-01'
end_date   = '2024-12-31'

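data = yf.download(ticker, start=start_date, end=end_date, auto_adjust=True)
# Recent yfinance versions return MultiIndex columns (field, ticker) even for a
# single ticker; flatten them so data['Close'] is a plain Series
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)
data = data[['Close']].copy()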

# ── STEP 2: COMPUTE MOVING AVERAGES ───────────────────
# SMA(10) = average of last 10 days → reacts fast to price changes
# SMA(50) = average of last 50 days → shows the bigger, slower trend
# When fast > slow: uptrend. When fast < slow: downtrend.
short_window = 10
long_window  = 50

data['SMA_short'] = data['Close'].rolling(window=short_window).mean()
data['SMA_long']  = data['Close'].rolling(window=long_window).mean()

# ── STEP 3: GENERATE SIGNALS ──────────────────────────
# Signal = 1 when we SHOULD be in the market, 0 when we shouldn't
# Short MA above long MA = bullish = hold the stock
data['Signal'] = 0
data.loc[data['SMA_short'] > data['SMA_long'], 'Signal'] = 1

# Position = CHANGE in signal
# +1 = just crossed up → BUY today
# -1 = just crossed down → SELL today
#  0 = no change, hold current position
data['Position'] = data['Signal'].diff()

# ── STEP 4: CALCULATE RETURNS ─────────────────────────
# Daily return = today's % price change
# Strategy return = daily return ONLY on days we're in the market
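# .shift(1) is CRITICAL: use yesterday's signal for today's return
# (In real life, you see the crossover only after the market closes, so the
#  earliest fill is the NEXT day's open. Close-to-close returns approximate
#  that fill; the overnight gap is not modelled here.)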
data['Daily_Return']    = data['Close'].pct_change()
data['Strategy_Return'] = data['Daily_Return'] * data['Signal'].shift(1)

# Cumulative returns: how ₹1 grows over time
data['Cum_Market']   = (1 + data['Daily_Return']).cumprod()
data['Cum_Strategy'] = (1 + data['Strategy_Return']).cumprod()

# ── STEP 5: COMPUTE METRICS ───────────────────────────
years = len(data) / 252   # 252 = avg trading days in a year

total_ret  = (data['Cum_Strategy'].iloc[-1] - 1) * 100
cagr       = (data['Cum_Strategy'].iloc[-1] ** (1/years) - 1) * 100
mkt_cagr   = (data['Cum_Market'].iloc[-1]   ** (1/years) - 1) * 100

# Sharpe: annualized mean return / annualized volatility
# (the risk-free rate is assumed to be 0 in this simple version)
daily_avg    = data['Strategy_Return'].mean()
daily_std    = data['Strategy_Return'].std()
sharpe       = (daily_avg / daily_std) * (252 ** 0.5) if daily_std != 0 else 0

# Max drawdown: worst peak-to-trough loss
rolling_max  = data['Cum_Strategy'].cummax()
drawdown     = (data['Cum_Strategy'] - rolling_max) / rolling_max
max_dd       = drawdown.min() * 100

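# Win rate and profit factor, measured on DAILY returns while in the market
# (a per-trade version would group returns by entry/exit first)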
in_market    = data[data['Signal'].shift(1) == 1]['Strategy_Return']
win_rate     = (in_market > 0).sum() / len(in_market) * 100 if len(in_market) > 0 else 0
gross_profit = in_market[in_market > 0].sum()
gross_loss   = abs(in_market[in_market < 0].sum())
profit_factor = gross_profit / gross_loss if gross_loss != 0 else float('inf')
num_trades   = (data['Position'] == 1).sum()

print(f"CAGR:          {cagr:+.1f}%  (market: {mkt_cagr:+.1f}%)")
print(f"Total Return:  {total_ret:+.1f}%")
print(f"Sharpe Ratio:  {sharpe:.2f}  (>1 = good)")
print(f"Max Drawdown:  {max_dd:.1f}%")
print(f"Win Rate:      {win_rate:.1f}%")
print(f"Profit Factor: {profit_factor:.2f}")
print(f"Num Trades:    {num_trades}")

How to Try Variations

Different Tickers

Try TCS.NS, INFY.NS, TATASTEEL.NS, NIFTYBEES.NS (Nifty ETF). Each has a different personality — tech stocks trend differently than cyclicals.

Different Windows

Try 5/20 (faster, more trades), 20/100 (slower, fewer trades), 50/200 (the "Golden Cross" — very well known). Notice how metrics shift.

Add a Volume Filter

Only enter when Volume > 1.2 × Volume.rolling(20).mean(). This filters low-conviction crossovers. Observe if win rate improves.
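
A minimal sketch of the volume filter described above, assuming a frame with `Close` and `Volume` columns. The function name `crossover_with_volume` and the synthetic prices are illustrative. One design assumption: a crossover that happens on low volume is skipped entirely, and the strategy waits for the next crossover rather than entering late.

```python
import numpy as np
import pandas as pd

def crossover_with_volume(df, short=10, long=50, vol_window=20, vol_mult=1.2):
    """Long-only SMA crossover that only enters on a high-volume crossover day."""
    out = df.copy()
    out["SMA_short"] = out["Close"].rolling(short).mean()
    out["SMA_long"] = out["Close"].rolling(long).mean()

    trend_up = out["SMA_short"] > out["SMA_long"]
    cross_up = trend_up.astype(int).diff() == 1          # first day fast > slow
    high_vol = out["Volume"] > vol_mult * out["Volume"].rolling(vol_window).mean()

    # NaN = "keep previous state"; 1 = confirmed entry; 0 = trend is down, stay flat
    state = pd.Series(np.nan, index=out.index)
    state[~trend_up] = 0.0
    state[cross_up & high_vol] = 1.0
    out["Signal"] = state.ffill().fillna(0.0).astype(int)
    return out

# Synthetic data so the sketch runs without a network connection
rng = np.random.default_rng(0)
idx = pd.bdate_range("2022-01-03", periods=400)
demo = pd.DataFrame(
    {
        "Close": 100 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, len(idx)))),
        "Volume": rng.integers(500_000, 2_000_000, len(idx)),
    },
    index=idx,
)
demo_out = crossover_with_volume(demo)
unfiltered_days = int((demo_out["SMA_short"] > demo_out["SMA_long"]).sum())
print(int(demo_out["Signal"].sum()), "days in market vs", unfiltered_days, "without the filter")
```

As with the main script, lag `Signal` by one day with `.shift(1)` before multiplying it by daily returns.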

Use EMA Instead

Replace .rolling(n).mean() with .ewm(span=n, adjust=False).mean(), the recursive form used in the toolkit later in this guide. EMA reacts faster: signals come earlier but may be noisier.

03

Reading Your Backtest Results

When you run that script, you get a block of numbers. Here's exactly how to interpret them — with a good example and a bad example that looks deceptively okay.

✓ Good Result

CAGR: +18.2% (market: +12.1%)
Total Return: +142%
Sharpe Ratio: 1.4 ← above 1.0
Max Drawdown: -18% ← survivable
Win Rate: 54%
Profit Factor: 1.8 ← gains > losses
Num Trades: 38 ← statistically ok

✗ Deceptive / Poor Result

CAGR: +14% (market: +12%)
Total Return: +96%
Sharpe Ratio: 0.6 ← poor risk/reward
Max Drawdown: -55% ← could you hold?
Win Rate: 62% ← looks good...
Profit Factor: 1.05 ← gains barely cover losses
Num Trades: 4 ← just luck

The Questions to Ask Every Time

| Question | What to look for | Red Flag |
|---|---|---|
| Did it beat buy-and-hold? | Strategy CAGR > market CAGR by a meaningful margin | Barely beats market → not worth the execution complexity |
| Is risk-adjusted return good? | Sharpe Ratio > 1.0 | Sharpe < 0.7 → returns don't justify the trading volatility |
| Could you survive the drawdown? | Max drawdown < 20–25% | > 40% drawdown → most people quit before recovery occurs |
| Enough trades to be real? | > 30 trades minimum, > 100 preferred | < 10 trades → purely luck, proves nothing |
| Do gains outsize losses? | Profit Factor > 1.35 | < 1.0 → you're actually losing money overall after transaction costs |
| Does it beat the market after costs? | Positive after ₹40/trade + 0.1% slippage | Strategy returns evaporate after modeling execution fees |
04

Strategy Types — What Edge Are You Exploiting?

Every strategy is a bet on a specific **market inefficiency**. Knowing which type you're running determines what data matters, how to judge results, and what structural failure looks like.

Trend Following
Mean Reversion
Momentum
Breakout
Stat Arbitrage
ML-Based

Trend Following

The oldest and most robust category. Assets that have been rising tend to continue rising — because **institutional fund flows are persistent**. Large funds can't buy ₹10,000 crore in a day, so they accumulate over weeks, creating trends.

Period: Weeks to months · Win Rate: ~35–45% · High R:R ratio

The counterintuitive truth: Trend strategies lose more often than they win. A 40% win rate can still be very profitable if winning trades are 3× the size of losing trades. Never judge a trend strategy by win rate alone.

Signals: SMA/EMA crossovers, ADX > 25 (confirms trend), Ichimoku cloud position, Donchian channel breakout.

When it fails: Choppy sideways markets. ADX below 20 means no trend — trend strategies will get whipsawed repeatedly. Use ADX as a regime filter.

Mean Reversion

Bets that prices which have moved too far from their average will snap back. Works because **short-term returns are negatively autocorrelated** — statistically, extreme single-day moves tend to partially reverse.

Period: Hours to days · Win Rate: ~60–70% · Low R:R ratio

The Trap: High win rate feels great until a trending move wipes out 10 wins at once. Always use a strict stop-loss. The classic blowup is "it'll come back" — sometimes it doesn't.

Signals: RSI < 30 / > 70, Bollinger Band lower/upper touch, Z-score of price deviation from MA.

Key Rule: Never run a mean-reversion strategy without a stop-loss. One unhedged trend move can erase weeks of small wins.

Momentum (Cross-Sectional)

Based on the academically documented fact that **stocks outperforming in the last 3–12 months tend to keep outperforming for the next 1–3 months**. One of the most replicated anomalies in finance literature.

Period: Weeks to months · Win Rate: ~50–60% · Cross-sectional ranking

Cross-sectional: Rank a universe of stocks by their 12-1 month return (skip the last month — it tends to reverse). Go long the top decile, short the bottom. Market-neutral.
Time-series: Go long a stock only if its own recent return is positive.
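
The cross-sectional ranking above can be sketched as follows. This assumes a table of month-end closes with one column per stock; the function names, the 20 synthetic tickers, and the equal weighting inside each leg are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def momentum_12_1(monthly_close: pd.DataFrame) -> pd.DataFrame:
    """12-1 momentum: return from 12 months ago to 1 month ago (latest month skipped)."""
    return monthly_close.shift(1) / monthly_close.shift(12) - 1

def long_short_weights(scores: pd.Series, quantile: float = 0.1) -> pd.Series:
    """Equal-weight long the top quantile and short the bottom quantile for one date."""
    ranked = scores.dropna().sort_values()
    n = max(int(len(ranked) * quantile), 1)
    weights = pd.Series(0.0, index=scores.index)
    weights.loc[ranked.index[-n:]] = 1.0 / n    # winners
    weights.loc[ranked.index[:n]] = -1.0 / n    # losers
    return weights

# Synthetic month-end prices for 20 hypothetical tickers
rng = np.random.default_rng(1)
months = pd.period_range("2022-01", periods=24, freq="M")
tickers = [f"STOCK_{i:02d}" for i in range(20)]
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0.01, 0.08, (24, 20)), axis=0)),
    index=months,
    columns=tickers,
)
scores = momentum_12_1(prices).iloc[-1]
weights = long_short_weights(scores)
print(weights[weights != 0])
```

Because the long and short legs carry equal and opposite weight, the resulting book has zero net exposure, which is what "market-neutral" means here.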

Momentum crash risk: Momentum strategies can crash violently during sharp market reversals (e.g. March 2020). Size appropriately.

Breakout

Enters when price breaks through a resistance or support level. Theory: when a level is broken, trapped traders on the wrong side accelerate the move by covering positions.

Period: Days to weeks · Win Rate: ~35–45% · Very high R:R potential

Signals: Donchian channel breakout (N-day high), volume confirmation (>150% of 20-day avg — critical), ATR-based price targets.

False breakout problem: 60–70% of breakouts fail without volume confirmation. Always wait for a daily close above the level, not just an intraday pierce. Volume is not optional.

Statistical Arbitrage (Pairs Trading)

Exploits mispricings between statistically related instruments. If HDFCBANK and ICICIBANK historically move together (cointegrated), when they diverge, short the outperformer and long the underperformer — betting on convergence.

Market-neutral · Requires cointegration test · Medium complexity

Z-score of spread (the trading signal)
Spread = Price_A - (Hedge_Ratio * Price_B)
Z = (Spread - mean(Spread)) / std(Spread)
Enter long spread when Z < -2, exit when Z → 0
Enter short spread when Z > +2, exit when Z → 0

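Test cointegration with statsmodels.tsa.stattools.coint(), which runs the Engle-Granger test. Its null hypothesis is "no cointegration", so a p-value < 0.05 rejects that null and marks the pair as a candidate for pairs trading. Re-test periodically, because cointegration can break down.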

ML-Based Strategies

Uses machine learning to find non-linear patterns in features. The dirty secret: simpler models generalize better in finance. XGBoost usually beats LSTM for tabular price data. LSTM sounds more impressive but overfits harder.

High overfitting risk · Feature engineering critical · Non-linear patterns

What works: ML on classification (will price be higher in 5 days? yes/no) or ranking (which of these 50 stocks will do best?). ML on exact price prediction almost never works — the signal-to-noise ratio in price data is too low.

ML Pitfall — Lookahead Bias: Only use features available at time T to predict time T+1. Rolling window features must use `.shift(1)`. A model trained with future data looks perfect and fails completely in live trading.
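
A minimal sketch of lookahead-safe feature building, assuming a daily close series. The feature names, window lengths, and the 5-day classification label are illustrative choices, not a recommended feature set.

```python
import numpy as np
import pandas as pd

def make_features(close: pd.Series) -> pd.DataFrame:
    """Features on row T use data through T-1 only; the label looks 5 days ahead of T."""
    feats = pd.DataFrame(index=close.index)
    daily_ret = close.pct_change()
    feats["ret_5d"] = close.pct_change(5).shift(1)
    feats["vol_20d"] = daily_ret.rolling(20).std().shift(1)
    feats["sma_gap"] = (close / close.rolling(50).mean() - 1).shift(1)

    # Label: 1.0 if the close 5 days ahead is higher than today's close, else 0.0
    fwd = close.shift(-5) / close - 1
    feats["target_up_5d"] = (fwd > 0).astype(float).where(fwd.notna())
    return feats.dropna()

rng = np.random.default_rng(2)
idx = pd.bdate_range("2023-01-02", periods=300)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))), index=idx)
features = make_features(close)

# Sanity check: changing one price must not alter features on or before that date
bumped = close.copy()
bumped.iloc[100] *= 1.5
features_bumped = make_features(bumped)
cols = ["ret_5d", "vol_20d", "sma_gap"]
early = features.index[features.index <= close.index[100]]
print(np.allclose(features.loc[early, cols], features_bumped.loc[early, cols]))
```

The bump test is a cheap way to catch leaks: if a feature on or before day T changes when day T's price changes, that feature is using information it could not have had.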
05

Indicators — The Math Behind Them

Indicators are not magic. They are **mathematical transformations of price and volume**. Understanding their formulas tells you exactly what they measure — and exactly where they lie to you.

SMA vs EMA — The Foundation

SMA (Simple Moving Average)
SMA(n) = (P_1 + P_2 + ... + P_n) / n ← equal weight to all n days

EMA (Exponential Moving Average)
EMA(today) = Price(today) * k + EMA(yesterday) * (1 - k)
k = 2 / (n + 1) → EMA(10): k = 0.1818 → 18.2% weight on today's price
vs SMA(10): only 10% weight on today's price

EMA reacts faster to recent moves (useful in fast markets, noisier). SMA is smoother (better for trend identification, less whipsawing). For short-term signals, use EMA. For long-term trend detection, SMA is fine.
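
To connect the EMA formula to pandas, the sketch below applies the recursion by hand and compares it with `ewm(span=n, adjust=False)`. Seeding the recursion with the first price is an assumption; it matches what pandas does when `adjust=False`.

```python
import numpy as np
import pandas as pd

def ema_recursive(prices: pd.Series, n: int) -> pd.Series:
    """EMA(today) = Price(today) * k + EMA(yesterday) * (1 - k), with k = 2 / (n + 1)."""
    k = 2 / (n + 1)
    values = [prices.iloc[0]]          # seed with the first price
    for price in prices.iloc[1:]:
        values.append(price * k + values[-1] * (1 - k))
    return pd.Series(values, index=prices.index)

prices = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0])
manual = ema_recursive(prices, n=10)
pandas_ema = prices.ewm(span=10, adjust=False).mean()
print(np.allclose(manual, pandas_ema))
```

With the default `adjust=True`, pandas uses a weighted-average form instead, so early values differ slightly from the recursion before converging.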

RSI — What It Actually Measures

RSI (14-period default)
RS = Average Gain over 14 days / Average Loss over 14 days
RSI = 100 - (100 / (1 + RS))

Extreme moves: RS=9 → RSI=90 (average gain is 9× the average loss). RS=0.11 → RSI≈10 (average loss is 9× the average gain).
| RSI Level | Traditional Read | Reality |
|---|---|---|
| > 70 | Overbought → sell | In strong uptrends, RSI stays >70 for weeks. Selling purely on this signal in a bull market is expensive. |
| 50–70 | Bullish momentum | RSI crossing 50 from below is a reliable trend-confirmation signal. Better use than the 70 level. |
| 30–50 | Bearish momentum | RSI crossing 50 from above signals building selling pressure. |
| < 30 | Oversold → buy | In downtrends, RSI stays <30 for days. "Catching falling knives" without a trend filter is dangerous. |

RSI Divergence (Stronger Signal): Price makes a new high but RSI makes a lower high → bearish divergence, often precedes a reversal. Much stronger than the 70/30 levels alone because it shows momentum fading before the price reacts.

Bollinger Bands — Volatility Encoded

Bollinger Bands (20-period, 2σ)
Middle Band = SMA(20)
Upper Band = SMA(20) + 2 * StdDev(Close, 20)
Lower Band = SMA(20) - 2 * StdDev(Close, 20)

%B = (Close - Lower) / (Upper - Lower) → 0 = at lower, 1 = at upper, 0.5 = at middle
BB Width = (Upper - Lower) / Middle → measures current volatility relative to average

The bands encode **volatility** — they expand in high-vol periods and contract in low-vol periods. Tight bands (low BB Width) = volatility compression = potential breakout loading. Wide bands = high volatility = mean reversion more likely.
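
A minimal sketch of the band, %B, and BB Width formulas above, assuming a daily close series. Note that pandas' `rolling().std()` defaults to the sample standard deviation (`ddof=1`); some charting platforms use the population form, so values can differ slightly.

```python
import numpy as np
import pandas as pd

def bollinger(close: pd.Series, window: int = 20, n_std: float = 2.0) -> pd.DataFrame:
    """Bollinger Bands plus %B (position inside the bands) and relative band width."""
    mid = close.rolling(window).mean()
    sd = close.rolling(window).std()   # sample std (ddof=1) by default
    upper = mid + n_std * sd
    lower = mid - n_std * sd
    return pd.DataFrame(
        {
            "mid": mid,
            "upper": upper,
            "lower": lower,
            "pct_b": (close - lower) / (upper - lower),
            "width": (upper - lower) / mid,
        }
    )

rng = np.random.default_rng(5)
idx = pd.bdate_range("2023-01-02", periods=250)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.012, 250))), index=idx)
bb = bollinger(close).dropna()
print(bb[["pct_b", "width"]].describe())
```

A %B above 1 means the close is above the upper band, and a %B below 0 means it is below the lower band.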

MACD — Three Signals in One

MACD (12/26/9, the universal defaults)
MACD Line = EMA(12) - EMA(26) ← faster MA minus slower MA
Signal Line = EMA(9) of MACD Line ← smoothed version of MACD
Histogram = MACD Line - Signal Line ← momentum of momentum

The histogram is the most useful part. **Growing histogram = momentum building.** Shrinking histogram = momentum fading — often precedes a crossover signal by a few bars, giving earlier warning than the raw crossover.

ATR — For Stops and Sizing

ATR (Average True Range, 14-period)
True Range = max(High - Low, |High - Prev Close|, |Low - Prev Close|)
ATR(14) = Wilder-smoothed average of True Range over 14 periods

If ATR = ₹50: the stock's typical daily range, gaps included, is about ₹50 (normal noise)

ATR is critical for stop-loss placement. A stop-loss at ₹20 on a stock with ATR=₹50 guarantees you get stopped out by random daily noise. **Rule: stops at 1.5×–2× ATR from entry.**

ADX — Is There a Trend at All?

ADX Interpretation (Threshold-based)
ADX < 20: No meaningful trend → ranging market, avoid trend strategies
ADX 20–25: Trend developing → caution
ADX 25–40: Strong trend → ideal for trend-following entry
ADX > 40: Very strong trend → consider partial profit-taking
ADX as a Regime Filter: The most powerful use of ADX is not as a signal generator — it's as a regime switch. Only run trend strategies when ADX > 25. Switch to mean-reversion when ADX < 20. This one change can dramatically improve both strategy types.
06

Signal Engineering

A raw indicator reading is not a signal. A **signal is a precise, testable rule with a binary output: enter or exit**. The engineering between "RSI is low" and a real trading signal is where most of the actual edge comes from.

Signal Quality Hierarchy

Level 1 — Raw Threshold

RSI < 30 → buy. Maximum false signals. Works occasionally by coincidence, not by design.

High Noise
Level 2 — Confirmation

RSI < 30 AND price > SMA(200). Filters out bear market trades. Better, but still incomplete.

Better
Level 3 — Regime Filter

RSI < 30 AND SMA(50) > SMA(200) AND ADX > 20. Only trades in the right market environment.

Stronger
Level 4 — Multi-Timeframe

Daily trend up → zoom into 4H → RSI oversold there → enter. Higher TF sets direction, lower TF gives timing.

Professional Grade

Regime-Aware Signal Logic

regime_logic.py Python
# Compute regime indicators first
# (assumes ADX_14, RSI_14, BB_upper, BB_lower, SMA_short and SMA_long are already computed)
data['SMA20']    = data['Close'].rolling(20).mean()
data['SMA200']   = data['Close'].rolling(200).mean()
data['BB_width'] = (data['BB_upper'] - data['BB_lower']) / data['SMA20']

def get_regime(row):
    if row['ADX_14'] > 25 and row['Close'] > row['SMA200']:
        return 'trending_up'
    elif row['ADX_14'] > 25 and row['Close'] < row['SMA200']:
        return 'trending_down'   # stay flat or short
    elif row['BB_width'] < 0.05:
        return 'squeeze'           # breakout strategy
    else:
        return 'ranging'           # mean reversion strategy

data['regime'] = data.apply(get_regime, axis=1)

# Apply different logic per regime
# (lag with data['signal'].shift(1) before multiplying by returns to avoid lookahead)
data['signal'] = 0
data.loc[(data['regime'] == 'trending_up') & (data['SMA_short'] > data['SMA_long']), 'signal'] = 1
data.loc[(data['regime'] == 'ranging')     & (data['RSI_14'] < 30), 'signal'] = 1
07

Backtesting — The Right Way

Backtesting is the most **abused tool in retail trading**. Done naively, it tells you nothing useful. The difference between a useful backtest and a lie is methodology.

The Three-Way Data Split

In-Sample (IS) ~60%

Design and tune parameters here. You're allowed to look at this data. e.g. 2015–2020.

Design Zone
Out-of-Sample (OOS) ~20%

Never touch during design. Run the strategy here once to validate. e.g. 2021–2022.

Validation
Live / Paper ~20%

The only truly clean test is real future data. Paper trade for 1-3 months before allocating real money.

Final Test

Realistic Cost Modelling

| Cost | What it is | Estimate | How to Model |
|---|---|---|---|
| Commission | Broker fee | ₹20–40/trade (Zerodha) | Subtract from each trade return |
| Slippage | Signal price vs fill price | 0.05–0.15% per trade | Adjust fill price by 0.1% against you |
| Bid-Ask Spread | Cost of crossing the spread | 0.02–0.05% (Nifty large-cap) | Add to slippage estimate |

Rule of Thumb: A strategy making 20% before costs often makes 14–16% after realistic transaction fees. If your strategy trades daily, costs alone can consume 5–8% of annual returns.
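
One way to model these costs in a vectorized backtest is to charge a proportional fee every time the position changes. This is a minimal sketch: the `net_returns` name and the single `cost_per_side` figure of 0.1% (slippage and spread combined) are assumptions, and a flat ₹20–40 commission would need to be converted to a percentage of trade size first.

```python
import numpy as np
import pandas as pd

def net_returns(signal: pd.Series, daily_return: pd.Series,
                cost_per_side: float = 0.001) -> pd.Series:
    """Strategy returns after charging cost_per_side on every entry and every exit."""
    position = signal.shift(1).fillna(0)            # act on yesterday's signal
    gross = position * daily_return
    # Turnover is 1 on days the position flips between 0 and 1, else 0
    turnover = position.diff().abs().fillna(position.abs())
    return gross - turnover * cost_per_side

# Tiny deterministic example: one round trip (entry + exit) is charged
signal = pd.Series([0, 1, 1, 0, 0, 1])
daily_return = pd.Series([0.01] * 6)
net = net_returns(signal, daily_return)
gross_total = (signal.shift(1).fillna(0) * daily_return).sum()
print(f"gross: {gross_total:.4f}  net: {net.sum():.4f}")
```

The more often the signal flips, the larger the turnover term becomes, which is why high-frequency daily strategies are hit hardest by costs.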
walk_forward.py Python
import pandas as pd

def walk_forward_test(data, is_months=18, oos_months=6):
    """
    Rolls a window: train on is_months, test on oos_months,
    advance by oos_months, repeat. Returns stitched OOS results.
    More rigorous than a single IS/OOS split.

    optimize_strategy() and run_strategy() are placeholders you supply:
    the first returns the best parameters for a slice, the second returns
    the results (Series or DataFrame) for a slice given those parameters.
    """
    results = []
    dates = pd.to_datetime(data.index)   # local copy; the input frame is not modified
    window_start = dates[0]

    while True:
        is_end  = window_start + pd.DateOffset(months=is_months)
        oos_end = is_end       + pd.DateOffset(months=oos_months)
        if oos_end > dates[-1]:
            break

        is_data  = data[(dates >= window_start) & (dates < is_end)]
        oos_data = data[(dates >= is_end)       & (dates < oos_end)]

        best_params = optimize_strategy(is_data)   # your param search
        oos_result  = run_strategy(oos_data, best_params)
        results.append(oos_result)
        window_start += pd.DateOffset(months=oos_months)

    # pd.concat raises on an empty list, so handle histories that are too short
    return pd.concat(results) if results else pd.Series(dtype=float)
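
The walk-forward loop calls `optimize_strategy` and `run_strategy` without defining them. Below is one hypothetical way to fill them in for an SMA crossover, choosing the window pair with the best in-sample Sharpe. The parameter grids and the zero risk-free Sharpe are assumptions for illustration only.

```python
import itertools
import numpy as np
import pandas as pd

def sma_returns(df: pd.DataFrame, short: int, long: int) -> pd.Series:
    """Daily returns of a long-only SMA crossover, with the signal lagged one day."""
    close = df["Close"]
    signal = (close.rolling(short).mean() > close.rolling(long).mean()).astype(int)
    return (close.pct_change() * signal.shift(1)).fillna(0.0)

def sharpe(returns: pd.Series) -> float:
    """Annualized Sharpe with a zero risk-free rate; 0.0 if there is no variation."""
    sd = returns.std()
    return 0.0 if sd == 0 or np.isnan(sd) else float(returns.mean() / sd * np.sqrt(252))

def optimize_strategy(is_data, shorts=(5, 10, 20), longs=(50, 100)):
    """Hypothetical parameter search: best (short, long) pair by in-sample Sharpe."""
    grid = [(s, l) for s, l in itertools.product(shorts, longs) if s < l]
    best = max(grid, key=lambda pair: sharpe(sma_returns(is_data, *pair)))
    return {"short": best[0], "long": best[1]}

def run_strategy(oos_data, params):
    """Apply the chosen parameters to unseen data and return daily strategy returns."""
    return sma_returns(oos_data, params["short"], params["long"])

# Synthetic prices so the sketch runs offline
rng = np.random.default_rng(3)
idx = pd.bdate_range("2020-01-01", periods=600)
prices = pd.DataFrame(
    {"Close": 100 * np.exp(np.cumsum(rng.normal(0.0004, 0.012, 600)))}, index=idx
)
is_data, oos_data = prices.iloc[:450], prices.iloc[450:]
params = optimize_strategy(is_data)
oos_returns = run_strategy(oos_data, params)
print(params, f"OOS Sharpe: {sharpe(oos_returns):.2f}")
```

One caveat of this sketch: `run_strategy` only sees the OOS slice, so the moving averages need `long` days to warm up and the strategy stays flat at the start of every OOS window. Passing some trailing in-sample rows as warm-up data would avoid that.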
08

Risk Management & Position Sizing

Position sizing is the **most underrated concept in retail trading**. You can have a profitable strategy and blow up your account with bad sizing. Conversely, a mediocre strategy with excellent sizing can survive for years.

Fixed Fractional Sizing

Fixed Fractional Position Sizing
Risk Amount = Account Balance * Risk% (e.g. 1%)
Position Size = Risk Amount / (Entry Price - Stop Loss Price)

Example: ₹1,00,000 account, 1% risk, entry ₹500, stop at ₹480 (₹20 risk)
Position = ₹1,000 / ₹20 = 50 shares
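
The sizing arithmetic above as a small function. This is a sketch for long positions only; the function name and the choice to round down to whole shares are assumptions.

```python
import math

def fixed_fractional_size(balance: float, risk_pct: float,
                          entry: float, stop: float) -> int:
    """Shares to buy so that hitting the stop loses roughly risk_pct of the account."""
    risk_per_share = entry - stop
    if risk_per_share <= 0:
        raise ValueError("stop must be below entry for a long position")
    risk_amount = balance * risk_pct
    return math.floor(risk_amount / risk_per_share)   # round down to whole shares

# The worked example from the text: ₹1,00,000 account, 1% risk, entry 500, stop 480
shares = fixed_fractional_size(100_000, 0.01, 500, 480)
print(shares)
```

Note how a tighter stop increases the share count: the rupee risk stays fixed while the position size adapts.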

The Kelly Criterion

The Kelly Criterion calculates the mathematically optimal allocation size to maximize long-term growth. Because Full Kelly sizing results in highly volatile drawdowns, most professional quants use **Half Kelly** sizing.

Kelly Sizing %
Kelly % = Win Rate - ((1 - Win Rate) / Reward-Risk Ratio)
Half Kelly = Kelly % / 2
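
A short sketch of the Kelly formula with illustrative inputs (a 45% win rate and a 2.5 reward-risk ratio, both chosen for the example). Treating a negative Kelly value as "do not trade" is an assumption of this sketch.

```python
import math

def kelly_fraction(win_rate: float, reward_risk: float) -> float:
    """Kelly % = W - (1 - W) / R, returned as a fraction of capital."""
    return win_rate - (1 - win_rate) / reward_risk

def half_kelly(win_rate: float, reward_risk: float) -> float:
    """Half Kelly, floored at zero because a negative Kelly means there is no edge."""
    return max(0.0, kelly_fraction(win_rate, reward_risk) / 2)

full = kelly_fraction(0.45, 2.5)     # 0.45 - 0.55 / 2.5
half = half_kelly(0.45, 2.5)
print(f"Full Kelly: {full:.1%}  Half Kelly: {half:.1%}")
```

Kelly is very sensitive to its inputs, and backtested win rates tend to be optimistic, which is another reason to size below the full figure.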

Stop-Loss Placement Methods

| Method | Formula | Best For |
|---|---|---|
| ATR-based | `Entry - 1.5 * ATR(14)` | Most strategies; adapts to current volatility regime |
| Structure-based | Below recent swing low | Trend and breakout strategies |
| Fixed % | `Entry * (1 - 0.05)` | Simple, but ignores volatility, so it can be too wide or too tight |
| Chandelier Exit | `Highest High(22) - 3 * ATR(22)` | Trailing stop for trend-following; adapts as price moves up |
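
The formula-based stops in the table can be sketched as below. The function names are illustrative, and the ATR is assumed to be computed elsewhere and passed in; the structure-based stop is omitted because it depends on how swing lows are defined.

```python
import math
import pandas as pd

def atr_stop(entry: float, atr: float, mult: float = 1.5) -> float:
    """Initial stop placed mult x ATR below the entry price (long position)."""
    return entry - mult * atr

def fixed_pct_stop(entry: float, pct: float = 0.05) -> float:
    """Initial stop a fixed percentage below the entry price."""
    return entry * (1 - pct)

def chandelier_exit(high: pd.Series, atr: pd.Series,
                    window: int = 22, mult: float = 3.0) -> pd.Series:
    """Trailing stop: highest high over the window minus mult x ATR."""
    return high.rolling(window).max() - mult * atr

# Small deterministic example with a 3-bar window for readability
high = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0])
atr = pd.Series([2.0] * 5)
trail = chandelier_exit(high, atr, window=3, mult=3.0)
print(atr_stop(500, 50), fixed_pct_stop(500), trail.iloc[-1])
```

In live use the Chandelier level is usually only ever ratcheted upward for a long position, which can be added with `.cummax()` over the life of the trade.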
09

Analytics — Every Metric Decoded

Return & Risk-Adjusted Metrics

CAGR

Compound Annual Growth Rate: the constant yearly rate that would grow the starting capital into the ending capital over the test period. Target: >15% to beat benchmark indices.

Sharpe Ratio

Annualized excess return (over the risk-free rate) per unit of total volatility, measured as the standard deviation of returns. Target: >1.0.

Sortino Ratio

Only penalizes downside volatility. Prefer over Sharpe for asymmetrical returns. Target: >1.5.

Calmar Ratio

Calculates CAGR divided by absolute maximum drawdown. Target: >1.5.

Sortino vs Sharpe: Sharpe penalizes ALL volatility (both up and down moves). Sortino only penalizes downside volatility. A strategy with large winning days and small losing days has a much better Sortino than Sharpe. Prefer Sortino.

Expectancy

Expectancy: The Most Important Per-Trade Metric
Expectancy = (Win Rate * Avg Win) - (Loss Rate * Avg Loss)

Example: 45% win rate, avg win ₹2,000, avg loss ₹800:
Expectancy = (0.45 * 2000) - (0.55 * 800) = 900 - 440 = +₹460 expected per trade
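
The expectancy formula as a function, checked against the worked example above. The second call expresses the trend-following case from earlier in this guide (40% win rate, winners 3× the size of losers) in units of the average loss; the function name is illustrative.

```python
import math

def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Expected profit per trade; avg_loss is passed as a positive number."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss

per_trade = expectancy(0.45, 2000, 800)      # the ₹ example from the text
trend_case = expectancy(0.40, 3.0, 1.0)      # in multiples of the average loss
print(per_trade, trend_case)
```

A positive value with a sub-50% win rate is exactly why win rate alone says little about profitability.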

Drawdown Analysis

| Metric | Measures | Thresholds |
|---|---|---|
| Max Drawdown | Worst peak-to-trough loss | <15% = comfortable, 15–30% = survivable, >45% = painful |
| Avg Drawdown | Mean depth of all drawdowns | Should remain <50% of the max drawdown level |
| Max DD Duration | Longest time to recover to new high | <3 months is good. >12 months is very tough psychologically |
| Recovery Factor | Total return / Max Drawdown | Target >3.0 (meaning total gains exceed the worst drawdown by 3x) |
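
A minimal sketch that computes the four drawdown metrics from an equity curve. Two assumptions: a drawdown "episode" is a run of consecutive bars below the previous peak, and duration is counted in bars rather than calendar days.

```python
import math
import pandas as pd

def drawdown_stats(equity: pd.Series) -> dict:
    """Max/average drawdown, longest underwater run (in bars), and recovery factor."""
    peak = equity.cummax()
    dd = equity / peak - 1.0
    underwater = dd < 0
    # Give every consecutive run of underwater / not-underwater bars its own id
    run_id = (underwater != underwater.shift()).cumsum()
    episode_depth = dd[underwater].groupby(run_id[underwater]).min()
    episode_len = underwater.groupby(run_id).sum()
    max_dd = dd.min()
    total_return = equity.iloc[-1] / equity.iloc[0] - 1.0
    return {
        "max_drawdown": max_dd,
        "avg_drawdown": episode_depth.mean(),
        "max_dd_bars": int(episode_len.max()),
        "recovery_factor": total_return / abs(max_dd) if max_dd < 0 else float("inf"),
    }

# Two drawdown episodes: a 10% dip lasting two bars and a 5% dip lasting one bar
equity = pd.Series([100.0, 110.0, 99.0, 104.5, 120.0, 114.0, 126.0])
stats = drawdown_stats(equity)
print(stats)
```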
10

Common Traps — How Backtests Lie

Lookahead Bias — The Silent Killer

Definition: Using information in your signal calculation that would not have been available at trade execution time.

Classic Example: Generating a signal using today's closing price, then "buying" at today's close. In reality, you only know the close after the market has closed.

Fix: Always shift your signals vector: df['signal'].shift(1). This ensures you execute tomorrow using today's market close signal.
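
To see how much damage a missing shift can do, the sketch below runs the same deliberately naive rule (be long on days that close higher than the previous day) with and without the lag, on synthetic random-walk prices. The rule and the data are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))
daily_ret = close.pct_change()

# "Up day" signal: only knowable once today's close has printed
signal = (close > close.shift(1)).astype(int)

biased = (daily_ret * signal).fillna(0)            # lookahead: today's signal, today's return
honest = (daily_ret * signal.shift(1)).fillna(0)   # yesterday's signal, today's return

biased_growth = (1 + biased).prod()
honest_growth = (1 + honest).prod()
print(f"with lookahead: x{biased_growth:.1f}   without: x{honest_growth:.2f}")
```

The biased version collects every up day and skips every down day, which no real trader can do. An equity curve that looks this clean is a reason to audit the code for lookahead, not a reason to celebrate.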

Curve-Fitting / Overfitting

Definition: Tuning parameters until the strategy looks perfect on historical data. The model memorizes history instead of learning a real edge.

Signs: Strategy has 5+ parameters. Returns are suspiciously smooth. Tiny parameter changes dramatically alter results. Sharpe drops >40% on OOS data.

Fix: Keep strategies simple (2–3 parameters). If OOS Sharpe is <60% of IS Sharpe, the strategy is overfit.

Survivorship Bias

Testing on current index constituents (e.g. Nifty 50) ignores stocks that were delisted, merged, or went bankrupt. Your universe is biased toward survivors. Returns can be overstated by 1–3% CAGR. Just be aware and discount your returns slightly.

Other Traps

| Trap | What happens | Fix |
|---|---|---|
| Too few trades | 5 trades with good results = luck, not signal | Minimum 30 trades; 100+ preferred for statistical confidence |
| Ignoring costs | Daily-signal strategies look great; costs eat all returns | Model commissions + slippage explicitly before declaring success |
| Data snooping | Testing 100 variations: some look great by pure chance | Fewer variations, strict IS/OOS separation |
| Regime mismatch | Strategy trained in a bull market fails in a bear market | Include at least one bear market period in IS data |
| Ignoring liquidity | Backtesting on thinly traded small-caps where real orders would not fill | Filter: 20-day average daily traded value > ₹5 crore before testing |
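
The data-snooping trap is easy to reproduce. The sketch below builds 100 "strategies" that are pure coin flips on a synthetic market, then compares the best Sharpe with the median. Everything here is simulated; no variant has a genuine edge by construction.

```python
import numpy as np

rng = np.random.default_rng(7)
n_days, n_variants = 252 * 3, 100

market = rng.normal(0.0, 0.01, n_days)                       # synthetic daily returns
positions = rng.integers(0, 2, size=(n_variants, n_days))    # random in/out decisions
strategy_returns = positions * market

# Annualized Sharpe (zero risk-free rate) for each random variant
sharpes = strategy_returns.mean(axis=1) / strategy_returns.std(axis=1) * np.sqrt(252)
best, median = sharpes.max(), np.median(sharpes)
print(f"best of {n_variants}: {best:.2f}   median: {median:.2f}")
```

The gap between the best variant and the median is what selection alone produces. Reporting only the winner of a large search overstates the edge by roughly that gap, which is why the winner has to be re-tested on data it has never seen.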
11

Python Quant Toolkit — Copy & Use

This production-ready utility toolkit calculates technical indicators, computes risk-adjusted performance metrics, and runs cointegration checks for pairs trading. Copy and paste this directly into your Python files.

quant_toolkit.py Python
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import coint

def compute_indicators(df):
    """
    Computes moving averages, ATR, RSI, MACD, Bollinger Bands, and ADX with standard definitions.
    """
    h = df['High']; l = df['Low']; c = df['Close']
    data = df.copy()

    # Moving Averages
    for n in [10, 20, 50, 200]:
        data[f'SMA_{n}'] = c.rolling(n).mean()
        data[f'EMA_{n}'] = c.ewm(span=n, adjust=False).mean()

    # ATR (Average True Range)
    prev_c = c.shift(1)
    tr     = pd.concat([h-l, (h-prev_c).abs(), (l-prev_c).abs()], axis=1).max(axis=1)
    data['ATR_14'] = tr.ewm(alpha=1/14, adjust=False).mean()

    # RSI (Wilder smoothing via ewm)
    delta = c.diff()
    gain  = delta.clip(lower=0).ewm(alpha=1/14, adjust=False).mean()
    loss  = (-delta.clip(upper=0)).ewm(alpha=1/14, adjust=False).mean()
    data['RSI_14'] = 100 - (100 / (1 + gain / loss))

    # MACD
    ema12 = c.ewm(span=12, adjust=False).mean()
    ema26 = c.ewm(span=26, adjust=False).mean()
    data['MACD']        = ema12 - ema26
    data['MACD_signal'] = data['MACD'].ewm(span=9, adjust=False).mean()
    data['MACD_hist']   = data['MACD'] - data['MACD_signal']

    # Bollinger Bands
    sma20 = c.rolling(20).mean()
    std20 = c.rolling(20).std()
    data['BB_upper']  = sma20 + 2 * std20
    data['BB_middle'] = sma20
    data['BB_lower']  = sma20 - 2 * std20
    data['BB_width']  = (data['BB_upper'] - data['BB_lower']) / sma20

    # ADX
    up   = h - h.shift(1)
    down = l.shift(1) - l
    pdm  = up.where((up > down) & (up > 0), 0)
    ndm  = down.where((down > up) & (down > 0), 0)
    pdi  = 100 * pdm.ewm(alpha=1/14, adjust=False).mean() / data['ATR_14']
    ndi  = 100 * ndm.ewm(alpha=1/14, adjust=False).mean() / data['ATR_14']
    dx   = 100 * (pdi - ndi).abs() / (pdi + ndi)
    data['ADX_14'] = dx.ewm(alpha=1/14, adjust=False).mean()

    return data

def backtest_metrics(returns, risk_free=0.07):
    """
    Computes CAGR, Sharpe Ratio, Sortino Ratio, Calmar, Max Drawdown, and Profit Factor.
    """
    r        = returns.dropna()
    cum      = (1 + r).cumprod()
    n_years  = len(r) / 252
    daily_rf = risk_free / 252

    total_return = cum.iloc[-1] - 1
    cagr         = cum.iloc[-1] ** (1 / n_years) - 1
    vol_ann      = r.std() * np.sqrt(252)

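    # Downside deviation (for Sortino): root-mean-square of the shortfall below the
    # daily risk-free target, with days above the target counted as zero
    downside    = np.minimum(r - daily_rf, 0)
    sortino_vol = np.sqrt((downside ** 2).mean()) * np.sqrt(252)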

    # Drawdown
    rolling_max = cum.cummax()
    drawdown    = (cum - rolling_max) / rolling_max
    max_dd      = drawdown.min()
    # Longest run of consecutive bars spent below a prior equity peak
    underwater  = drawdown < 0
    dd_dur      = underwater.groupby((underwater != underwater.shift()).cumsum()).sum().max()

    excess = cagr - risk_free
    sharpe  = excess / vol_ann      if vol_ann != 0      else 0
    sortino = excess / sortino_vol  if sortino_vol != 0  else 0
    calmar  = cagr / abs(max_dd)    if max_dd != 0        else 0

    in_mkt       = r[r != 0]
    win_rate     = (in_mkt > 0).mean()
    gross_profit = in_mkt[in_mkt > 0].sum()
    gross_loss   = abs(in_mkt[in_mkt < 0].sum())
    pf           = gross_profit / gross_loss if gross_loss != 0 else np.inf
    expectancy   = win_rate * in_mkt[in_mkt > 0].mean() + (1 - win_rate) * in_mkt[in_mkt < 0].mean()

    return {
        'total_return_pct': round(total_return * 100, 2), 
        'cagr_pct': round(cagr * 100, 2),
        'sharpe': round(sharpe, 3),   
        'sortino': round(sortino, 3), 
        'calmar': round(calmar, 3),
        'max_dd_pct': round(max_dd * 100, 2), 
        'max_dd_days': int(dd_dur),
        'win_rate_pct': round(win_rate * 100, 2), 
        'profit_factor': round(pf, 3),
        'expectancy_pct': round(expectancy * 100, 4)
    }

def pairs_analysis(price_a, price_b, lookback=60):
    """
    Runs the Engle-Granger cointegration test, estimates the hedge ratio by OLS,
    and builds a rolling z-score signal on the spread. Expects two pandas Series.
    """
    import statsmodels.api as sm
    score, pvalue, _ = coint(price_a, price_b)
    
    if pvalue > 0.05:
        print("⚠ Not cointegrated — do not trade this pair")
        return None

    result      = sm.OLS(price_a, sm.add_constant(price_b)).fit()
    hedge_ratio = result.params.iloc[1]   # slope on price_b (position 0 is the constant)
    spread      = price_a - hedge_ratio * price_b

    mu     = spread.rolling(lookback).mean()
    sd     = spread.rolling(lookback).std() # standard deviation
    zscore = (spread - mu) / sd

    # Start from NaN so ffill() carries an open position until the exit zone is reached
    signal = pd.Series(np.nan, index=zscore.index)
    signal[zscore < -2.0] =  1   # long spread
    signal[zscore >  2.0] = -1   # short spread
    signal[(zscore > -0.5) & (zscore < 0.5)] = 0  # exit zone
    signal = signal.ffill().fillna(0)

    return {'spread': spread, 'zscore': zscore, 'signal': signal, 'hedge_ratio': hedge_ratio}

Useful Libraries Reference

| Library | Install | Use For |
|---|---|---|
| backtesting.py | `pip install backtesting` | Clean event-driven backtesting with built-in interactive HTML charts |
| vectorbt | `pip install vectorbt` | Extremely fast vectorized backtesting and parameter optimization grids |
| statsmodels | `pip install statsmodels` | Cointegration tests, OLS regressions, ARIMA time series models |
| pyfolio-reloaded | `pip install pyfolio-reloaded` | Hedge-fund tear sheets (rolling Sharpe, risk allocations, drawdown curves) |
| plotly | `pip install plotly` | Interactive charts with zoom and hover |