------------------------------------------------------------       ------------------------------------------------------------
| 1. DATA SOURCES                                          |       | 2. INGESTION & FRAME PROCESSING                          |
| COCO, OpenImages, and custom CCTV video feeds supply     |  -->  | Video streams are decoded into frames at the target FPS. |
| diverse visual scenes. Data includes indoor, outdoor,    |       | Frames are resized, normalized, and deduplicated.        |
| and low-light frames. Annotation quality is manually     |       | Metadata such as timestamps and camera IDs are           |
| validated to ensure objects and bounding boxes are       |       | preserved. Preprocessing ensures consistent luminance    |
| accurately labeled for detection training.               |       | and reduces noise to improve detection accuracy.         |
------------------------------------------------------------       ------------------------------------------------------------
                                                                                                |
                                                                                                v
------------------------------------------------------------       ------------------------------------------------------------
| 4. MODEL TRAINING (YOLOv8 Optimization)                  |       | 3. ROI EXTRACTION & AUGMENTATION                         |
| YOLOv8 is trained with multi-scale inputs and            |  <--  | Regions of interest (ROIs) are identified and cropped    |
| augmented datasets. Hyperparameter sweeps optimize       |       | for targeted augmentation. Techniques include brightness |
| performance across object sizes. mAP metrics guide       |       | shifts, blur simulation, occlusion, and motion artifacts |
| improvements. Regularization and fine-tuning ensure      |       | to enhance model robustness across real-world scenes.    |
| stability under extreme lighting or motion conditions.   |       |                                                          |
------------------------------------------------------------       ------------------------------------------------------------
                             |                                                                  |
                             v                                                                  v
------------------------------------------------------------       ------------------------------------------------------------
| 5. INFERENCE & TRACKING                                  |       | 6. VISUALIZATION & FEEDBACK LOOP                         |
| The deployed model performs real-time detection on       |  -->  | Detection outputs stream to dashboards showing bounding  |
| GPU/edge accelerators. Multi-object tracking assigns     |       | boxes and confidence scores. Security teams review       |
| persistent identities across video frames. The system    |       | false positives/negatives. Continuous learning adjusts   |
| supports alert generation for restricted or unusual      |       | thresholds and retrains models using new video data.     |
| activity zones.                                          |       |                                                          |
------------------------------------------------------------       ------------------------------------------------------------
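
The identity assignment and zone alerting described in stage 5 could be sketched as follows. This is a minimal illustration, not the deployed system's method: the greedy IoU matching, the `GreedyIouTracker` class, the `in_zone` helper, the 0.3 threshold, and the rectangular zone are all assumptions introduced here, and detections are assumed to arrive as (x1, y1, x2, y2) pixel boxes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class GreedyIouTracker:
    """Hypothetical tracker: each detection takes the best-overlapping free track."""
    iou_threshold: float = 0.3  # assumed value, tune per camera
    next_id: int = 0
    tracks: Dict[int, Box] = field(default_factory=dict)

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        assigned: Dict[int, Box] = {}
        unmatched = dict(self.tracks)  # tracks not yet claimed in this frame
        for det in detections:
            best_id, best_iou = None, self.iou_threshold
            for tid, box in unmatched.items():
                score = iou(det, box)
                if score >= best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:
                # No existing track overlaps enough: start a new identity.
                best_id = self.next_id
                self.next_id += 1
            else:
                del unmatched[best_id]
            assigned[best_id] = det
        self.tracks = assigned  # tracks with no match are dropped immediately
        return assigned


def in_zone(box: Box, zone: Box) -> bool:
    """True when the box center lies inside a rectangular restricted zone."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return zone[0] <= cx <= zone[2] and zone[1] <= cy <= zone[3]


# Usage: two frames of detections and one restricted zone.
tracker = GreedyIouTracker()
frame1 = tracker.update([(10, 10, 50, 50), (200, 200, 240, 240)])
frame2 = tracker.update([(12, 11, 52, 51), (400, 400, 440, 440)])
restricted = (0, 0, 100, 100)
alerts = [tid for tid, box in frame2.items() if in_zone(box, restricted)]
print(sorted(frame2), alerts)  # prints [0, 2] [0]
```

The first object keeps identity 0 across both frames because its boxes overlap strongly, while the far-away detection in the second frame receives a new identity. A production tracker would normally add a motion model and keep unmatched tracks alive for a few frames to survive brief occlusions.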