Time Series Forecasting: From Baselines to Production
Forecasting predicts future values from historical temporal patterns. It is central to inventory planning, staffing, demand management, and capacity control.
Forecasting quality depends less on exotic models and more on correct framing, temporal validation, and operational integration.
Problem Definition That Actually Works
Before selecting a model class, define:
- forecast horizon (next day, week, month)
- granularity (SKU, store, region, global)
- update cadence (hourly, daily, weekly)
- decision use (ordering, pricing, staffing)
- acceptable error per segment
A model that is accurate globally may still fail on critical SKUs or regions.
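The framing decisions above can be captured explicitly before any modeling starts. The sketch below uses a hypothetical `ForecastSpec` dataclass; the field names and values are illustrative, not a standard API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ForecastSpec:
    """Hypothetical spec that pins down the forecasting problem up front."""
    horizon_days: int            # how far ahead we predict
    granularity: str             # "sku" | "store" | "region" | "global"
    update_cadence: str          # "hourly" | "daily" | "weekly"
    decision_use: str            # e.g. "ordering", "pricing", "staffing"
    max_wape_per_segment: float  # acceptable error threshold per segment

spec = ForecastSpec(
    horizon_days=7,
    granularity="sku",
    update_cadence="daily",
    decision_use="ordering",
    max_wape_per_segment=0.25,
)
print(spec)
```

Making the spec immutable (`frozen=True`) keeps the problem definition from silently drifting as the project evolves.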
Baselines First
Strong baselines are mandatory:
- naive: last observed value
- seasonal naive: same time previous cycle
- moving average
If an advanced model does not consistently beat these baselines on the relevant slices, the added complexity is unjustified.
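The three baselines are one-liners with pandas. A minimal sketch, assuming a daily series with weekly seasonality (the toy data and column names are illustrative):

```python
import pandas as pd

def baseline_forecasts(y: pd.Series, season: int = 7, window: int = 7) -> pd.DataFrame:
    """One-step-ahead baseline forecasts for a single series."""
    return pd.DataFrame({
        "naive": y.shift(1),                                # last observed value
        "seasonal_naive": y.shift(season),                  # same time previous cycle
        "moving_average": y.shift(1).rolling(window).mean(),  # recent average
    })

# Toy series with a perfect weekly cycle
y = pd.Series([10, 12, 15, 11, 9, 20, 25] * 8, dtype=float)
preds = baseline_forecasts(y).dropna()
mae = preds.sub(y.loc[preds.index], axis=0).abs().mean()
print(mae.round(2))
```

On perfectly periodic data the seasonal naive is exact, which is precisely why it is such a demanding reference for any seasonal series.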
Feature Engineering for Forecasting
Common high-impact features:
- lags (t-1, t-7, t-30)
- rolling mean/std/max/min
- seasonality indicators (day-of-week, month, festival periods)
- promotions/pricing flags
- weather and external drivers
Every feature must be available at prediction time for the target horizon.
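The availability constraint can be enforced mechanically: for a horizon of h steps, every lag must be shifted by at least h. A sketch with a hypothetical helper (the shift convention here treats lag 1 as the most recent value observable at forecast time):

```python
import pandas as pd

def make_lag_features(y: pd.Series, horizon: int, lags=(1, 7, 28)) -> pd.DataFrame:
    """Build lag features that only use values observable `horizon` steps back."""
    feats = {}
    for lag in lags:
        # lag 1 => y[t - horizon], lag 2 => y[t - horizon - 1], etc.
        feats[f"lag_{lag}"] = y.shift(horizon + lag - 1)
    return pd.DataFrame(feats)

y = pd.Series(range(100), dtype=float)
X = make_lag_features(y, horizon=7)
print(X.iloc[40])  # every value here was observed at least 7 steps before index 40
```

A feature built with plain `shift(1)` would leak six days of future information into a 7-day-ahead model; this kind of bug passes offline validation and fails only in production.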
Model Choices
Classical models:
- ARIMA/SARIMA for linear autocorrelation and seasonality
- ETS for trend/seasonality decomposition
ML models:
- gradient boosting with lag features
- random forest for nonlinear tabular patterns
Deep models:
- LSTM/TCN/transformers for complex multi-series contexts
Do not assume deep model superiority without clear evidence.
Temporal Validation
A random split is invalid for forecasting. Use rolling-origin validation:
- train on early period
- validate on next window
- roll forward and repeat
This approximates real deployment and reveals regime sensitivity.
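scikit-learn ships this pattern as `TimeSeriesSplit` (expanding training window by default; pass `max_train_size` for a fixed-width rolling window). A minimal sketch on a stand-in series:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

y = np.arange(24.0)  # stand-in for two years of monthly observations
tscv = TimeSeriesSplit(n_splits=4, test_size=3)
splits = list(tscv.split(y))
for train_idx, valid_idx in splits:
    # each training window ends strictly before its validation window
    print(len(train_idx), valid_idx.min(), valid_idx.max())
```

Each fold trains only on the past and validates on the next block, so the per-fold scores also reveal how stable the model is across regimes.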
Metrics and Decision Mapping
Useful metrics:
- MAE / RMSE
- WAPE / sMAPE for business reporting
- quantile loss for uncertainty-aware planning
For inventory decisions, a point forecast alone is insufficient. Prediction intervals or quantiles improve risk handling.
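WAPE and quantile (pinball) loss are small enough to implement directly. A sketch with illustrative numbers:

```python
import numpy as np

def wape(y_true, y_pred):
    """Weighted absolute percentage error: sum of |errors| over sum of |actuals|."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()

def quantile_loss(y_true, y_pred, q):
    """Pinball loss: for q > 0.5, under-forecasting costs more than over-forecasting."""
    e = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return np.mean(np.maximum(q * e, (q - 1) * e))

y_true = [100, 80, 120]
y_pred = [90, 90, 110]
print(round(wape(y_true, y_pred), 4))
print(round(quantile_loss(y_true, y_pred, 0.9), 4))
```

The asymmetry in the pinball loss is what connects forecasting to inventory decisions: training against the 0.9 quantile directly targets a roughly 90% service level.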
Multi-Series and Hierarchical Forecasting
Real businesses forecast many related series. Challenges:
- sparse series with intermittent demand
- hierarchy consistency (store -> region -> national)
- cold start for new products
Use hierarchy reconciliation and pooled features where possible.
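The simplest reconciliation strategy is bottom-up: forecast the leaves and define every higher level as the sum of its children, so consistency holds by construction. A sketch with a hypothetical store-to-region mapping:

```python
import pandas as pd

# Hypothetical store-level forecasts with their region assignments
forecasts = pd.DataFrame({
    "store": ["s1", "s2", "s3"],
    "region": ["north", "north", "south"],
    "forecast": [120.0, 80.0, 60.0],
})

# Bottom-up reconciliation: region and national forecasts are sums of children,
# so store -> region -> national figures can never disagree
region_fc = forecasts.groupby("region")["forecast"].sum()
national_fc = region_fc.sum()
print(region_fc.to_dict(), national_fc)
```

Bottom-up is trivially consistent but inherits the noise of sparse leaf series; more sophisticated reconciliation methods trade some of that simplicity for accuracy at aggregate levels.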
Production Failure Modes
- stale external feature feeds
- unhandled holiday/event shifts
- concept drift from policy or market change
- retraining cadence too slow
- evaluating only aggregate metrics
Forecasting systems must be monitored by segment, not only global averages.
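A per-segment error breakdown makes this concrete. In the hypothetical log below, the global WAPE looks acceptable while one region is badly served:

```python
import pandas as pd

# Hypothetical forecast log with actuals joined back, one row per region/day
log = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "actual": [100.0, 120.0, 10.0, 12.0],
    "forecast": [110.0, 115.0, 20.0, 5.0],
})
log["abs_err"] = (log["actual"] - log["forecast"]).abs()

# Global WAPE vs per-segment WAPE
global_wape = log["abs_err"].sum() / log["actual"].sum()
seg = log.groupby("region")[["abs_err", "actual"]].sum()
seg["wape"] = seg["abs_err"] / seg["actual"]
print(round(global_wape, 3), seg["wape"].round(3).to_dict())
```

Because WAPE is volume-weighted, large segments dominate the global figure; small but critical segments need their own alert thresholds.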
Practical Deployment Pattern
- data quality checks
- feature generation with timestamp contract
- forecast generation and interval estimates
- business-rule postprocessing
- publish outputs to planning systems
- monitor forecast error and drift
Treat forecasting as recurring decision infrastructure.
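Two of the steps above, quality gating and business-rule postprocessing, can be sketched as small guard functions. The checks and column names here are illustrative, not a complete contract:

```python
import pandas as pd

def quality_check(df: pd.DataFrame) -> None:
    """Minimal gate before feature generation (hypothetical checks)."""
    assert df["date"].is_monotonic_increasing, "dates out of order"
    assert df["demand"].notna().all(), "missing demand values"

def postprocess(forecast: pd.Series) -> pd.Series:
    """Business-rule postprocessing: demand forecasts cannot be negative."""
    return forecast.clip(lower=0)

df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "demand": [3.0, 5.0, 2.0, 4.0, 6.0],
})
quality_check(df)
raw = pd.Series([4.2, -0.5, 3.1])  # raw model output, possibly negative
print(postprocess(raw).tolist())
```

Failing loudly at the quality gate is preferable to publishing a forecast built on a broken feed; the postprocessing step is where domain constraints that the model cannot learn are enforced.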
End-to-End Code Example (Lag Features + Rolling Validation)
This example builds a forecasting model with:
- lag/rolling features
- time-based split
- rolling-origin validation
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
df = pd.read_csv("daily_demand.csv", parse_dates=["date"])
df = df.sort_values("date").reset_index(drop=True)
# Create lag and rolling features
for lag in [1, 7, 14, 28]:
    df[f"lag_{lag}"] = df["demand"].shift(lag)
for w in [7, 14, 28]:
    df[f"roll_mean_{w}"] = df["demand"].shift(1).rolling(w).mean()
    df[f"roll_std_{w}"] = df["demand"].shift(1).rolling(w).std()
df["dow"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month
df = df.dropna().reset_index(drop=True)
feature_cols = [c for c in df.columns if c not in ["date", "demand"]]
def rolling_windows(data, train_days=365, valid_days=30, step_days=30):
    start = 0
    while True:
        train_end = start + train_days
        valid_end = train_end + valid_days
        if valid_end > len(data):
            break
        yield data.iloc[start:train_end], data.iloc[train_end:valid_end]
        start += step_days
mae_scores = []
for train_slice, valid_slice in rolling_windows(df):
    X_train = train_slice[feature_cols]
    y_train = train_slice["demand"]
    X_valid = valid_slice[feature_cols]
    y_valid = valid_slice["demand"]
    model = RandomForestRegressor(
        n_estimators=400,
        max_depth=12,
        random_state=42,
        n_jobs=-1,
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_valid)
    mae_scores.append(mean_absolute_error(y_valid, pred))
print("Rolling MAE mean:", round(np.mean(mae_scores), 3))
print("Rolling MAE std :", round(np.std(mae_scores), 3))
# Train final model on full history (except final holdout if you keep one)
final_model = RandomForestRegressor(
    n_estimators=400,
    max_depth=12,
    random_state=42,
    n_jobs=-1,
)
final_model.fit(df[feature_cols], df["demand"])
This pattern is easy to productionize and avoids leakage from random splits.
Key Takeaways
- robust forecasting starts with clear horizon and decision context
- temporal validation and leakage prevention are non-negotiable
- simple baselines are strong references and fallback options
- operational reliability matters as much as offline accuracy