IEEE Access, vol. 13, pp. 182430-182443, 2025 (SCI-Expanded, Scopus)
Information on crowd density, crowd type, and anomalies in crowd videos is critical for smart city and campus applications. Existing crowd-analysis approaches generally focus on either high-performance counting or anomaly detection: crowd-counting studies generate density maps with regression-based neural architectures and count people from these maps, while anomaly-detection approaches address narrow crime-classification tasks that do not generalize well, particularly to dense crowds. In this study, high-performance neural models are developed to classify the density, type, and anomalies of crowd images and videos. A CNN-based multi-task model is developed for density classification, which both generates the density map and classifies the density level. Type classification is performed with a frame-by-frame ViT model that identifies crowd scene categories such as gathering, concert, sports, and protest. Finally, a Swin Transformer model performs multi-class classification of dynamic video segments with respect to anomalies such as running, falling, panic, and violence. The developed models are integrated through Apache Kafka, and each module achieves an F1-score above 90% (density classification: 91.66%, anomaly classification: 96.63%, type classification: 90.9%).
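As a rough illustration of the multi-task density model described above (one shared CNN encoder with a density-map regression head and a density-class head), a minimal PyTorch sketch is given below. The abstract does not specify the paper's backbone, head design, or number of density classes, so the ResNet-18 encoder, head layers, and three-class setup here are assumptions for illustration only.

```python
# Minimal, illustrative sketch of a multi-task density model: a shared CNN
# backbone with (a) a density-map regression head and (b) a density-class head.
# Backbone choice, head sizes, and class count are assumptions, not the paper's
# exact architecture.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiTaskDensityNet(nn.Module):
    def __init__(self, num_density_classes: int = 3):
        super().__init__()
        backbone = resnet18(weights=None)
        # Keep the convolutional stages only (drop avgpool/fc) as a shared encoder.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        # Head 1: 1-channel density map regressed from the shared features.
        self.density_head = nn.Sequential(
            nn.Conv2d(512, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),
        )
        # Head 2: density-level classifier (e.g. sparse / medium / dense).
        self.class_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(512, num_density_classes),
        )

    def forward(self, x: torch.Tensor):
        feats = self.encoder(x)
        return self.density_head(feats), self.class_head(feats)

# Training would combine a pixel-wise loss on the density map with a
# cross-entropy loss on the density class; shapes shown for a quick check.
model = MultiTaskDensityNet()
images = torch.randn(4, 3, 224, 224)
density_maps, class_logits = model(images)
print(density_maps.shape, class_logits.shape)  # (4, 1, 7, 7), (4, 3)
```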