Pipeline for complex actions recognition in video surveillance systems

Tyumen State University Herald. Physical and Mathematical Modeling. Oil, Gas, Energy


Release:

2022. Vol. 8. № 2 (30)



For citation: Egorov Yu. A., Zakharova I. G. 2022. “Pipeline for complex actions recognition in video surveillance systems”. Tyumen State University Herald. Physical and Mathematical Modeling. Oil, Gas, Energy, vol. 8, no. 2 (30), pp. 165-182. DOI: 10.21684/2411-7978-2022-8-2-165-182

About the authors:

Yurij A. Egorov, Postgraduate Student, University of Tyumen; y.a.egorov@utmn.ru

Irina G. Zakharova, Cand. Sci. (Phys.-Math.), Professor, Department of Software, School of Computer Science, University of Tyumen, Tyumen, Russia; i.g.zakharova@utmn.ru, https://orcid.org/0000-0002-4211-7675

Abstract:

The development of intelligent video surveillance systems is an area of active research, with solutions targeting specific environments. At the same time, several open problems remain. Among them is the problem of recognizing complex actions, which consist of sequences of elementary actions and, as a rule, are difficult to classify from a single video frame.

The present study is devoted to the problem of recognizing complex actions in video recordings. The aim of the work is to develop a pipeline for recognizing the complex actions that an observed object performs in a video recording. The novelty of the work lies in modeling actions as sequences of elementary actions and in combining neural networks with stochastic models. The proposed solution can be used in intelligent video surveillance systems to ensure security at production facilities, including oil and gas industry facilities.

We analyzed video recordings of objects performing various actions and singled out the features that describe complex actions and their properties. The problem of recognizing complex actions represented by sequences of elementary actions is formulated. As a result, we developed a pipeline that implements a combined approach: elementary actions are described in graphical form using a skeletal model, each elementary action is recognized by a convolutional neural network, and complex actions are then modeled with a hidden Markov model. The pipeline was tested on videos of students whose actions were divided into two categories: cheating and ordinary actions. In the experiments, elementary actions were classified with an accuracy of 0.69, and the binary classification of complex actions reached an accuracy of 0.71.

In addition, we indicated the constraints of the developed pipeline and outlined further ways of enhancing the applied approaches, in particular, studying noise immunity.

References:

  1. Egorov Yu. A., Vorobyova M. S., Vorobyov A. M. 2017. “FDET algorithm for building space of classification patterns in graph model”. Tyumen State University Herald. Physical and Mathematical Modeling. Oil, Gas, Energy, vol. 3, no. 3, pp. 125-134. DOI: 10.21684/2411-7978-2017-3-3-125-134 [In Russian]

  2. Egorov Y. A., Zakharova I. G., Gasanov A. R., Filitsin A. A. 2020. “Stichastic modeling for skeleton based human action diagnostics”. Information systems and technologies: Proceedings of the 8th International Scientific Conference, pp. 96-102. [In Russian]

  3. Albanie S., Varlo G., Momeni L., Afouras T., Chung J. S., Fox N., A. Zisserman A. 2020 “BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues”. ECCV 2020: Computer Vision — ECCV 2020, pp. 35-53. DOI: 10.48550/arXiv.2007.12131

  4. Ali S., Bouguila N. 2019. “Variational learning of beta-liouville hidden Markov models for infrared action recognition”. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPRW). DOI: 10.1109/CVPRW.2019.00119

  5. Aslan M. F., Durdu A., Sabanci K. 2020. “Human action recognition with bag
    of visual words using different machine learning methods and hyperparameter optimization”. Neural Computing and Applications, no. 32, pp. 8585-8597. DOI: 10.1007/s00521-019-04365-9

  6. Bilal M., Maqsood M., Yasmin S., Hasan N. U., Seungmin Rho. 2020. “A transfer learning-based efficient spatiotemporal human action recognition framework for long and overlapping action classes”. The Journal of Supercomputing, vol. 78, no. 2, pp. 2873-2908. DOI: 10.1007/s11227-021-03957-4

  7. Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu. 2018. “Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation”. IJCAI’18: Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 786-792. DOI: 10.48550/arXiv.1804.06055

  8. Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu. 2017. “Skeleton-based action recognition with convolutional neural networks”. 2017 IEEE International
    Conference on Multimedia and Expo Workshops (ICMEW), pp. 597-600. DOI: 10.48550/arXiv.1704.07595

  9. Duta I. C., Uijlings J. R. R., Ionescu B., Aizawa K., Hauptmann A. G., Sebe N. 2017. “Efficient human action recognition using histograms of motion gradients and VLAD with descriptor shape information”. Multimedia Tools and Applications, vol. 76, no. 21, pp 22445-22472. DOI: 10.1007/s11042-017-4795-6

  10. Ghojogh B, Mohammadzade H., Mokari M. 2018. “Fisherposes for Human Action Recognition Using Kinect Sensor Data”. EEE Sensors Journal, vol. 18, no. 4, pp. 1612‑1627. DOI: 10.1109/JSEN.2017.2784425

  11. Guha R., Khan A. H., Singh P. K., Sarkar R., Bhattacharjee D. 2021. “CGA: a new feature selection model for visual human action recognition”. Neural Computing and Applications, no. 33, pp. 5267-5286. DOI: 10.1007/s00521-020-05297-5

  12. Gul M. A., Yousaf M. H., Nawaz S., Rehman Z. U., Kim H. 2020. “Patient monitoring by abnormal human activity recognition based on CNN architecture”. Electronics, vol. 9, no. 12, pp 1-14. DOI: 10.3390/electronics9121993

  13. Hongsong Wang, Liang Wang. 2018. “Learning content and style: Joint action recognition and person identification from human skeletons”. Pattern Recognition, vol. 81, pp. 23-25. DOI: 10.1016/j.patcog.2018.03.030

  14. Kapidis G., Poppe R., van Dam E., Noldus L. P. J. J., Veltkamp R. 2019. “Egocentric hand track and object-based human action recognition”. 2019 IEEE SmartWorld, Ubiquitous Intelligence and Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp. 922-929. DOI: 10.48550/arXiv.1905.00742

  15. Kundu J. N., Gor M., Uppala P. K., Babu R. V. 2019. “Unsupervised feature
    learning of human actions as trajectories in pose embedding manifold”. IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1459-1467. DOI: 10.48550/arXiv.1812.02592

  16. Lan Wang, Chenqiang Gao, Luyu Yang, Yue Zhao, Wangmeng Zuo, Deyu Meng. 2018. “PM-GANs: Discriminative representation learning for action recognition using partial‑modalities”. Proceedings of the European Conference on Computer Vision (ECCV), pp. 384-401. DOI: 10.48550/arXiv.1804.06248

  17. Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu. 2019. “Two-stream adaptive graph convolutional networks for skeleton-based action recognition”. Proceedings
    of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12026‑12035. DOI: 10.48550/arXiv.1805.07694

  18. Lei Wang, Koniusz P., Huynh Du Q. 2019. “Hallucinating IDT descriptors
    and I3D optical flow features for action recognition with CNNs”. Proceedings
    of the IEEE/CVF International Conference on Computer Vision (ICCV).
    Pp. 8698-8708. DOI: 10.48550/arXiv.1906.05910

  19. Ludl D., Gulde T., Curio C. “Simple yet efficient real-time pose-based action recognition”. IEEE Intelligent Transportation Systems Conference (ITSC),
    pp. 581-588. DOI: 10.48550/arXiv.1904.09140

  20. Maosen Li, Siheng Chen, Xu Chen, Ya Zhang, Yanfeng Wang, Qi Tian. 2019. “Actional‑structural graph convolutional networks for skeleton-based action recognition”. Proceedings of the IEEE/CVF Conference on Computer Vision
    and Pattern Recognition (CVPR), pp. 3595-3603. DOI: 10.48550/arXiv.1904.12659

  21. Mengyuan Liu, Hong Liu, Chen Chen. 2017. “Enhanced skeleton visualization for view invariant human action recognition”. Pattern Recognition, vol. 68, pp. 346-362. DOI: 10.1016/j.patcog.2017.02.030

  22. Nadeem A., Jalal A., Kim K. 2020. “Accurate physical activity recognition using multidimensional features and Markov model for smart health fitness”. Symmetry, vol. 12, no. 11, pp. 1766-1783. DOI: 10.3390/sym12111766

  23. Padoy N. 2019. “Machine and deep learning for workflow recognition during surgery”. Minimally Invasive Therapy and Allied Technologies, no. 28, pp. 82-90. DOI: 10.1080/13645706.2019.1584116

  24. Pengfei Zhang, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jianru Xue, Nanning Zheng. 2017. “View adaptive recurrent neural networks for high performance human action recognition from skeleton data”. Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2117-2126. DOI: 10.48550/arXiv.1703.08274

  25. Rahmani H., Bennamoun M. 2017. “Learning action recognition model from depth and skeleton videos”. Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 5832-5841. DOI: 10.1109/ICCV.2017.621

  26. Rezazadegan F., Shirazi S., Upcrofit B., Milford M. 2018. “Action recognition: From static datasets to moving robots”. 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 3185-3191. DOI: 10.48550/arXiv.1701.04925

  27. Rui Zhao, Wanru Xu, Hui Su, Qiang Ji. 2019. “Bayesian hierarchical dynamic model for human action recognition”. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7733-7742. DOI: 10.1109/CVPR.2019.00792

  28. Schofield D., Nagrani A., Zisserman A., Hayashi M., Matsuzawa M., Biro D., Carvalho S. 2019. “Chimpanzee face recognition from videos in the wild using deep learning”. Science Advances, vol. 5, no. 9, pp. 1-9. DOI: 10.1126/sciadv.aaw0736

  29. Sijie Song, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jiaying Liu. 2017. “An end-to-end spatio-temporal attention model for human action recognition from skeleton data”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1. DOI: 10.48550/arXiv.1611.06067

  30. Silva V., Soares F., Leão C. P., Esteves J. S., Vercelli G. 2021. “Skeleton driven action recognition using an image-based spatial-temporal representation and convolution neural network”. Sensors, vol. 21, no. 13, paper 4342. DOI: 10.3390/s21134342

  31. Weizhi Nie, Wei Wang, Xiangdong Huang. 2017. “SRNet: Structured relevance feature learning network from skeleton data for human action recognition”. EEE Access, vol. 7, pp. 132161-132172. DOI: 10.1109/ACCESS.2019.2940281

  32. Wu Zheng, Lin Li, Zhaoxiang Zhang, Yan Huang, Liang Wang. 2019. “Relational network for skeleton-based action recognition”. IEEE International Conference on Multimedia and Expo (ICME), pp. 826-831. DOI: 10.48550/arXiv.1805.02556

  33. Yansong Tang, Yi Tian, Jiwen Lu, Peiyang Li, Jie Zhou. 2018. “Deep progressive reinforcement learning for skeleton-based action recognition”. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5323-5332. DOI: 10.1109/CVPR.2018.00558

  34. Yi-Fan Song, Zhang Zhang, Caifeng Shan, Liang Wang. 2020. “Stronger, faster and more explainable: A graph convolutional baseline for skeleton-based action recognition”. Proceedings of the 28th ACM International Conference on Multimedia, pp. 1625-1633. DOI: 10.1145/3394171.3413802

  35. Zhiguo Pan, Chao Li. 2020. “Robust basketball sports recognition by leveraging motion block estimation”. Signal Processing: Image Communication, vol. 83. paper 115784. DOI: 10.1016/j.image.2020.115784

  36. Zhouning Du, Hiroaki Mukaidani, Ramasamy Saravanakumar. 2020. “Action recognition based on linear dynamical systems with deep features in videos”. 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2634-2639. DOI: 10.1109/SMC42975.2020.9283429

  37. Zhumazhanova S. S., Sulavko A. E., Ponomarev D. B., Pasenchuk V. A. 2019. “Statistical approach for subject’s state identification by face and neck thermograms with small training sample”. IFAC-PapersOnLine, vol. 52, no. 25, pp. 46-51. DOI: 10.1016/j.ifacol.2019.12.444

  38. Zi-Hao Lin, Albert Y. Chen, Shang-Hsien Hsieh. 2021. “Temporal image analytics for abnormal construction activity identification”. Automation in Construction, vol. 124. paper 103572. DOI: 10.1016/j.autcon.2021.103572