-----------------------------------
| 1. DATA SOURCES                 |
| POS sales records, inventory    |
| logs, weather APIs, and promo   |
| calendars feed forecasting.     |
| Kaggle retail datasets support  |
| benchmarking. Data completeness |
| and temporal alignment checked. |
-----------------------------------
                 |
                 |
                 v
-----------------------------------
| 2. INGESTION & PREPROCESSING    |
| PySpark pipelines ingest and    |
| transform rows, resolving       |
| missing timestamps. Outliers    |
| smoothed. Distributed feature   |
| extraction enables efficient    |
| large-scale preparation.        |
-----------------------------------
                 |
                 |
                 v
-----------------------------------
| 3. FEATURE ENGINEERING          |
| Lagged features, rolling        |
| windows, seasonal indicators,   |
| and event-based metadata        |
| engineered. Correlation         |
| screening prioritizes           |
| predictors.                     |
-----------------------------------
                 |
                 |
                 v
-----------------------------------
| 4. MODEL TRAINING (Prophet +    |
| XGBoost Hybrid)                 |
| Hybrid models capture           |
| seasonality (Prophet) and       |
| nonlinear interactions          |
| (XGBoost).                      |
-----------------------------------
                 |
                 |
                 v
-----------------------------------
| 5. INFERENCE & SERVING          |
| Forecasts computed in batch     |
| and published to BI dashboards. |
| APIs provide near real-time     |
| updates.                        |
-----------------------------------
                 |
                 |
                 v
-----------------------------------
| 6. MONITORING & FEEDBACK LOOP   |
| Forecast accuracy tracked by    |
| SKU. Drift triggers retraining. |
| Feedback refines features.      |
-----------------------------------
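
Stages 3 and 6 of the diagram can be illustrated with a minimal sketch in plain Python. Everything named here is hypothetical and not taken from the pipeline itself: the function names, the 7-day lag and window defaults, the choice of MAPE as the per-SKU accuracy metric, and the 0.2 drift threshold are all assumptions made for illustration. A production version would presumably run on PySpark, as stage 2 indicates.

```python
from collections import defaultdict
from statistics import mean


def add_lag_and_rolling(sales, lag=7, window=7):
    """Build one feature row per day once enough history exists.

    `sales` is a list of daily unit sales for a single SKU (assumed layout).
    """
    rows = []
    start = max(lag, window)  # first day with a full lag and a full window
    for t in range(start, len(sales)):
        rows.append({
            "t": t,
            "target": sales[t],
            "lag": sales[t - lag],
            # Window covers days t-window .. t-1, so day t itself never leaks in.
            "rolling_mean": mean(sales[t - window:t]),
        })
    return rows


def mape_by_sku(records):
    """Mean absolute percentage error per SKU.

    `records` is an iterable of (sku, actual, forecast) tuples.
    Rows with a zero actual are skipped to avoid division by zero.
    """
    errors = defaultdict(list)
    for sku, actual, forecast in records:
        if actual != 0:
            errors[sku].append(abs(actual - forecast) / abs(actual))
    return {sku: mean(errs) for sku, errs in errors.items()}


def skus_needing_retraining(records, threshold=0.2):
    """Return SKUs whose MAPE exceeds the (assumed) drift threshold."""
    scores = mape_by_sku(records)
    return sorted(sku for sku, score in scores.items() if score > threshold)


if __name__ == "__main__":
    features = add_lag_and_rolling([10, 12, 11, 13, 12, 14, 15, 16], lag=2, window=3)
    print(features[0])

    history = [("A", 100, 90), ("A", 100, 110), ("B", 100, 50), ("B", 0, 5)]
    print(skus_needing_retraining(history))
```

The rolling window deliberately stops at day t-1 so that each feature row uses only information available before the day being forecast. The drift check here is a simple threshold on recent error; the diagram does not say which drift test the pipeline uses, so treat this as one possible reading.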