Time Series Forecasting: From Baselines to Production

Forecasting predicts future values from historical temporal patterns. It is central to inventory planning, staffing, demand management, and capacity control.

Forecasting quality depends less on exotic models and more on correct framing, temporal validation, and operational integration.


Problem Definition That Actually Works

Before selecting model class, define:

  • forecast horizon (next day, week, month)
  • granularity (SKU, store, region, global)
  • update cadence (hourly, daily, weekly)
  • decision use (ordering, pricing, staffing)
  • acceptable error per segment

A model that is accurate globally may still fail critical SKUs or regions.


Baselines First

Strong baselines are mandatory:

  • naive: last observed value
  • seasonal naive: same time previous cycle
  • moving average

If advanced model does not consistently beat baseline on relevant slices, complexity is unjustified.


Feature Engineering for Forecasting

Common high-impact features:

  • lags (t-1, t-7, t-30)
  • rolling mean/std/max/min
  • seasonality indicators (day-of-week, month, festival periods)
  • promotions/pricing flags
  • weather and external drivers

Every feature must be available at prediction time for target horizon.


Model Choices

Classical models:

  • ARIMA/SARIMA for linear autocorrelation and seasonality
  • ETS for trend/seasonality decomposition

ML models:

  • gradient boosting with lag features
  • random forest for nonlinear tabular patterns

Deep models:

  • LSTM/TCN/transformers for complex multi-series contexts

Do not assume deep model superiority without clear evidence.


Temporal Validation

Random split is invalid for forecasting. Use rolling-origin validation:

  1. train on early period
  2. validate on next window
  3. roll forward and repeat

This approximates real deployment and reveals regime sensitivity.


Metrics and Decision Mapping

Useful metrics:

  • MAE / RMSE
  • WAPE / sMAPE for business reporting
  • quantile loss for uncertainty-aware planning

For inventory decisions, point forecast alone is insufficient. Prediction intervals or quantiles improve risk handling.


Multi-Series and Hierarchical Forecasting

Real businesses forecast many related series. Challenges:

  • sparse series with intermittent demand
  • hierarchy consistency (store -> region -> national)
  • cold start for new products

Use hierarchy reconciliation and pooled features where possible.


Production Failure Modes

  1. stale external feature feeds
  2. unhandled holiday/event shifts
  3. concept drift from policy or market change
  4. retraining cadence too slow
  5. evaluating only aggregate metrics

Forecasting systems must be monitored by segment, not only global averages.


Practical Deployment Pattern

  1. data quality checks
  2. feature generation with timestamp contract
  3. forecast generation and interval estimates
  4. business-rule postprocessing
  5. publish outputs to planning systems
  6. monitor forecast error and drift

Treat forecasting as recurring decision infrastructure.


End-to-End Code Example (Lag Features + Rolling Validation)

This example builds a forecasting model with:

  • lag/rolling features
  • time-based split
  • rolling-origin validation
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("daily_demand.csv", parse_dates=["date"])
df = df.sort_values("date").reset_index(drop=True)

# Create lag and rolling features
for lag in [1, 7, 14, 28]:
    df[f"lag_{lag}"] = df["demand"].shift(lag)

for w in [7, 14, 28]:
    df[f"roll_mean_{w}"] = df["demand"].shift(1).rolling(w).mean()
    df[f"roll_std_{w}"] = df["demand"].shift(1).rolling(w).std()

df["dow"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month
df = df.dropna().reset_index(drop=True)

feature_cols = [c for c in df.columns if c not in ["date", "demand"]]

def rolling_windows(data, train_days=365, valid_days=30, step_days=30):
    start = 0
    while True:
        train_end = start + train_days
        valid_end = train_end + valid_days
        if valid_end > len(data):
            break
        yield data.iloc[start:train_end], data.iloc[train_end:valid_end]
        start += step_days

mae_scores = []
for train_slice, valid_slice in rolling_windows(df):
    X_train = train_slice[feature_cols]
    y_train = train_slice["demand"]
    X_valid = valid_slice[feature_cols]
    y_valid = valid_slice["demand"]

    model = RandomForestRegressor(
        n_estimators=400,
        max_depth=12,
        random_state=42,
        n_jobs=-1,
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_valid)
    mae_scores.append(mean_absolute_error(y_valid, pred))

print("Rolling MAE mean:", round(np.mean(mae_scores), 3))
print("Rolling MAE std :", round(np.std(mae_scores), 3))

# Train final model on full history (except final holdout if you keep one)
final_model = RandomForestRegressor(
    n_estimators=400,
    max_depth=12,
    random_state=42,
    n_jobs=-1,
)
final_model.fit(df[feature_cols], df["demand"])

This pattern is easy to productionize and avoids leakage from random splits.


Key Takeaways

  • robust forecasting starts with clear horizon and decision context
  • temporal validation and leakage prevention are non-negotiable
  • simple baselines are strong references and fallback options
  • operational reliability matters as much as offline accuracy