Pipeline for complex actions recognition in video surveillance systems

Tyumen State University Herald. Physical and Mathematical Modeling. Oil, Gas, Energy


Release:

2022. Vol. 8. № 2 (30)



For citation: Egorov Yu. A., Zakharova I. G. 2022. “Pipeline for complex actions recognition in video surveillance systems”. Tyumen State University Herald. Physical and Mathematical Modeling. Oil, Gas, Energy, vol. 8, no. 2 (30), pp. 165-182. DOI: 10.21684/2411-7978-2022-8-2-165-182

About the authors:

Yurij A. Egorov, Postgraduate Student, University of Tyumen; y.a.egorov@utmn.ru

Irina G. Zakharova, Cand. Sci. (Phys.-Math.), Professor, Department of Software, School of Computer Science, University of Tyumen, Tyumen, Russia; i.g.zakharova@utmn.ru, https://orcid.org/0000-0002-4211-7675

Abstract:

The development of intelligent video surveillance systems is an area of active research, with solutions targeting specific environments. At the same time, several open problems remain. Among them is the problem of recognizing complex actions, which consist of sequences of elementary actions and, as a rule, are difficult to classify from a single video frame.

The present study is devoted to the problem of recognizing complex actions in video recordings. The aim of the work is to develop a pipeline for recognizing the complex actions that an observed object performs in a video recording. The novelty of the work lies in modeling actions as sequences of elementary actions and in combining neural networks with stochastic models. The proposed solution can be used in intelligent video surveillance systems to ensure security at production facilities, including oil and gas industry facilities.

We analyzed video recordings of objects performing various actions and singled out the features that describe complex actions and their properties. The problem of recognizing complex actions represented by sequences of elementary actions is formulated. As a result, we developed a pipeline that implements a combined approach: elementary actions are described in graphical form using a skeletal model, each elementary action is recognized by a convolutional neural network, and complex actions are then modeled with a hidden Markov model. The pipeline was tested on videos of students whose actions were divided into two categories: cheating and ordinary actions. In the experiments, elementary actions were classified with an accuracy of 0.69, and the binary classification of complex actions reached an accuracy of 0.71.

In addition, we indicated the constraints of the developed pipeline and outlined further ways of enhancing the applied approaches, in particular, studying noise immunity.

References:

  1. Egorov Yu. A., Vorobyova M. S., Vorobyov A. M. 2017. “FDET algorithm for building space of classification patterns in graph model”. Tyumen State University Herald. Physical and Mathematical Modeling. Oil, Gas, Energy, vol. 3, no. 3, pp. 125-134. DOI: 10.21684/2411-7978-2017-3-3-125-134 [In Russian]

  2. Egorov Y. A., Zakharova I. G., Gasanov A. R., Filitsin A. A. 2020. “Stichastic modeling for skeleton based human action diagnostics”. Information systems and technologies: Proceedings of the 8th International Scientific Conference, pp. 96-102. [In Russian]

  3. Albanie S., Varlo G., Momeni L., Afouras T., Chung J. S., Fox N., A. Zisserman A. 2020 “BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues”. ECCV 2020: Computer Vision — ECCV 2020, pp. 35-53. DOI: 10.48550/arXiv.2007.12131

  4. Ali S., Bouguila N. 2019. “Variational learning of beta-liouville hidden Markov models for infrared action recognition”. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPRW). DOI: 10.1109/CVPRW.2019.00119

  5. Aslan M. F., Durdu A., Sabanci K. 2020. “Human action recognition with bag
    of visual words using different machine learning methods and hyperparameter optimization”. Neural Computing and Applications, no. 32, pp. 8585-8597. DOI: 10.1007/s00521-019-04365-9

  6. Bilal M., Maqsood M., Yasmin S., Hasan N. U., Seungmin Rho. 2020. “A transfer learning-based efficient spatiotemporal human action recognition framework for long and overlapping action classes”. The Journal of Supercomputing, vol. 78, no. 2, pp. 2873-2908. DOI: 10.1007/s11227-021-03957-4

  7. Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu. 2018. “Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation”. IJCAI’18: Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 786-792. DOI: 10.48550/arXiv.1804.06055

  8. Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu. 2017. “Skeleton-based action recognition with convolutional neural networks”. 2017 IEEE International
    Conference on Multimedia and Expo Workshops (ICMEW), pp. 597-600. DOI: 10.48550/arXiv.1704.07595

  9. Duta I. C., Uijlings J. R. R., Ionescu B., Aizawa K., Hauptmann A. G., Sebe N. 2017. “Efficient human action recognition using histograms of motion gradients and VLAD with descriptor shape information”. Multimedia Tools and Applications, vol. 76, no. 21, pp 22445-22472. DOI: 10.1007/s11042-017-4795-6

  10. Ghojogh B, Mohammadzade H., Mokari M. 2018. “Fisherposes for Human Action Recognition Using Kinect Sensor Data”. EEE Sensors Journal, vol. 18, no. 4, pp. 1612‑1627. DOI: 10.1109/JSEN.2017.2784425

  11. Guha R., Khan A. H., Singh P. K., Sarkar R., Bhattacharjee D. 2021. “CGA: a new feature selection model for visual human action recognition”. Neural Computing and Applications, no. 33, pp. 5267-5286. DOI: 10.1007/s00521-020-05297-5

  12. Gul M. A., Yousaf M. H., Nawaz S., Rehman Z. U., Kim H. 2020. “Patient monitoring by abnormal human activity recognition based on CNN architecture”. Electronics, vol. 9, no. 12, pp 1-14. DOI: 10.3390/electronics9121993

  13. Hongsong Wang, Liang Wang. 2018. “Learning content and style: Joint action recognition and person identification from human skeletons”. Pattern Recognition, vol. 81, pp. 23-25. DOI: 10.1016/j.patcog.2018.03.030

  14. Kapidis G., Poppe R., van Dam E., Noldus L. P. J. J., Veltkamp R. 2019. “Egocentric hand track and object-based human action recognition”. 2019 IEEE SmartWorld, Ubiquitous Intelligence and Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp. 922-929. DOI: 10.48550/arXiv.1905.00742

  15. Kundu J. N., Gor M., Uppala P. K., Babu R. V. 2019. “Unsupervised feature
    learning of human actions as trajectories in pose embedding manifold”. IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1459-1467. DOI: 10.48550/arXiv.1812.02592

  16. Lan Wang, Chenqiang Gao, Luyu Yang, Yue Zhao, Wangmeng Zuo, Deyu Meng. 2018. “PM-GANs: Discriminative representation learning for action recognition using partial‑modalities”. Proceedings of the European Conference on Computer Vision (ECCV), pp. 384-401. DOI: 10.48550/arXiv.1804.06248

  17. Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu. 2019. “Two-stream adaptive graph convolutional networks for skeleton-based action recognition”. Proceedings
    of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12026‑12035. DOI: 10.48550/arXiv.1805.07694

  18. Lei Wang, Koniusz P., Huynh Du Q. 2019. “Hallucinating IDT descriptors
    and I3D optical flow features for action recognition with CNNs”. Proceedings
    of the IEEE/CVF International Conference on Computer Vision (ICCV).
    Pp. 8698-8708. DOI: 10.48550/arXiv.1906.05910

  19. Ludl D., Gulde T., Curio C. “Simple yet efficient real-time pose-based action recognition”. IEEE Intelligent Transportation Systems Conference (ITSC),
    pp. 581-588. DOI: 10.48550/arXiv.1904.09140

  20. Maosen Li, Siheng Chen, Xu Chen, Ya Zhang, Yanfeng Wang, Qi Tian. 2019. “Actional‑structural graph convolutional networks for skeleton-based action recognition”. Proceedings of the IEEE/CVF Conference on Computer Vision
    and Pattern Recognition (CVPR), pp. 3595-3603. DOI: 10.48550/arXiv.1904.12659

  21. Mengyuan Liu, Hong Liu, Chen Chen. 2017. “Enhanced skeleton visualization for view invariant human action recognition”. Pattern Recognition, vol. 68, pp. 346-362. DOI: 10.1016/j.patcog.2017.02.030

  22. Nadeem A., Jalal A., Kim K. 2020. “Accurate physical activity recognition using multidimensional features and Markov model for smart health fitness”. Symmetry, vol. 12, no. 11, pp. 1766-1783. DOI: 10.3390/sym12111766

  23. Padoy N. 2019. “Machine and deep learning for workflow recognition during surgery”. Minimally Invasive Therapy and Allied Technologies, no. 28, pp. 82-90. DOI: 10.1080/13645706.2019.1584116

  24. Pengfei Zhang, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jianru Xue, Nanning Zheng. 2017. “View adaptive recurrent neural networks for high performance human action recognition from skeleton data”. Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2117-2126. DOI: 10.48550/arXiv.1703.08274

  25. Rahmani H., Bennamoun M. 2017. “Learning action recognition model from depth and skeleton videos”. Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 5832-5841. DOI: 10.1109/ICCV.2017.621

  26. Rezazadegan F., Shirazi S., Upcrofit B., Milford M. 2018. “Action recognition: From static datasets to moving robots”. 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 3185-3191. DOI: 10.48550/arXiv.1701.04925

  27. Rui Zhao, Wanru Xu, Hui Su, Qiang Ji. 2019. “Bayesian hierarchical dynamic model for human action recognition”. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7733-7742. DOI: 10.1109/CVPR.2019.00792

  28. Schofield D., Nagrani A., Zisserman A., Hayashi M., Matsuzawa M., Biro D., Carvalho S. 2019. “Chimpanzee face recognition from videos in the wild using deep learning”. Science Advances, vol. 5, no. 9, pp. 1-9. DOI: 10.1126/sciadv.aaw0736

  29. Sijie Song, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jiaying Liu. 2017. “An end-to-end spatio-temporal attention model for human action recognition from skeleton data”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1. DOI: 10.48550/arXiv.1611.06067

  30. Silva V., Soares F., Leão C. P., Esteves J. S., Vercelli G. 2021. “Skeleton driven action recognition using an image-based spatial-temporal representation and convolution neural network”. Sensors, vol. 21, no. 13, paper 4342. DOI: 10.3390/s21134342

  31. Weizhi Nie, Wei Wang, Xiangdong Huang. 2017. “SRNet: Structured relevance feature learning network from skeleton data for human action recognition”. EEE Access, vol. 7, pp. 132161-132172. DOI: 10.1109/ACCESS.2019.2940281

  32. Wu Zheng, Lin Li, Zhaoxiang Zhang, Yan Huang, Liang Wang. 2019. “Relational network for skeleton-based action recognition”. IEEE International Conference on Multimedia and Expo (ICME), pp. 826-831. DOI: 10.48550/arXiv.1805.02556

  33. Yansong Tang, Yi Tian, Jiwen Lu, Peiyang Li, Jie Zhou. 2018. “Deep progressive reinforcement learning for skeleton-based action recognition”. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5323-5332. DOI: 10.1109/CVPR.2018.00558

  34. Yi-Fan Song, Zhang Zhang, Caifeng Shan, Liang Wang. 2020. “Stronger, faster and more explainable: A graph convolutional baseline for skeleton-based action recognition”. Proceedings of the 28th ACM International Conference on Multimedia, pp. 1625-1633. DOI: 10.1145/3394171.3413802

  35. Zhiguo Pan, Chao Li. 2020. “Robust basketball sports recognition by leveraging motion block estimation”. Signal Processing: Image Communication, vol. 83. paper 115784. DOI: 10.1016/j.image.2020.115784

  36. Zhouning Du, Hiroaki Mukaidani, Ramasamy Saravanakumar. 2020. “Action recognition based on linear dynamical systems with deep features in videos”. 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2634-2639. DOI: 10.1109/SMC42975.2020.9283429

  37. Zhumazhanova S. S., Sulavko A. E., Ponomarev D. B., Pasenchuk V. A. 2019. “Statistical approach for subject’s state identification by face and neck thermograms with small training sample”. IFAC-PapersOnLine, vol. 52, no. 25, pp. 46-51. DOI: 10.1016/j.ifacol.2019.12.444

  38. Zi-Hao Lin, Albert Y. Chen, Shang-Hsien Hsieh. 2021. “Temporal image analytics for abnormal construction activity identification”. Automation in Construction, vol. 124. paper 103572. DOI: 10.1016/j.autcon.2021.103572