- 1.
Elman, J.L. Finding Structure in Time. Cogn. Sci. 1990, 14, 179–211. https://doi.org/10.1016/0364-0213(90)90002-E.
- 2.
Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; et al. Time Series Analysis: Forecasting and Control, 5th ed.; In Wiley Series in Probability and Statistics; Wiley: Hoboken, NJ, USA, 2016.
- 3.
Wang, C.; Mohamed, A.S.A. Group Activity Recognition in Computer Vision: A Comprehensive Review, Challenges, and Future Perspectives. arXiv 2023, arXiv:2307.13541.
- 4.
Ibrahim, M.S.; Muralidharan, S.; Deng, Z.; et al. A Hierarchical Deep Temporal Model for Group Activity Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1971–1980. https://doi.org/10.1109/CVPR.2016.217.
- 5.
Shu, T.; Xie, D.; Rothrock, B.; et al. Joint Inference of Groups, Events and Human Roles in Aerial Videos. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 4576–4584. https://doi.org/10.1109/CVPR.2015.7299088.
- 6.
Bagautdinov, T.; Alahi, A.; Fleuret, F.; et al. Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3425–3434. https://doi.org/10.1109/CVPR.2017.365.
- 7.
Qi, M.; Qin, J.; Li, A.; et al. stagNet: An Attentive Semantic RNN for Group Activity Recognition. In Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 104–120. https://doi.org/10.1007/978-3-030-01249-6_7.
- 8.
Ibrahim, M.S.; Mori, G. Hierarchical Relational Networks for Group Activity Recognition and Retrieval. In Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Volume 11027, pp. 742–758. https://doi.org/10.1007/978-3-030-01219-9_44.
- 9.
Wu, J.; Wang, L.; Wang, L.; et al. Learning Actor Relation Graphs for Group Activity Recognition. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 9964–9974. https://doi.org/10.1109/CVPR.2019.01020.
- 10.
Vaswani, A.; Shazeer, N.; Parmar, N.; et al. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
- 11.
Devlin, J.; Chang, M.-W.; Lee, K.; et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. https://doi.org/10.18653/v1/N19-1423.
- 12.
Wang, S.; Li, B.Z.; Khabsa, M.; et al. Linformer: Self-Attention with Linear Complexity. arXiv 2020, arXiv:2006.04768.
- 13.
Qin, Z.; Sun, W.; Li, D.; et al. Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models. arXiv 2024, arXiv:2401.04658.
- 14.
Choromanski, K.; Likhosherstov, V.; Dohan, D.; et al. Rethinking Attention with Performers. arXiv 2021, arXiv:2009.14794.
- 15.
Gu, A.; Goel, K.; Ré, C. Efficiently Modeling Long Sequences with Structured State Spaces. In Proceedings of the 10th International Conference on Learning Representations (ICLR), Virtual, 25–29 April 2022. Available online: https://openreview.net/forum?id=uYLFoz1vlAC (accessed on 26 September 2025).
- 16.
Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752.
- 17.
Zhang, Y.; Liu, W.; Xu, D.; et al. Bi-Causal: Group Activity Recognition via Bidirectional Causality. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 1450–1459. https://doi.org/10.1109/CVPR52733.2024.00144.
- 18.
Li, S.; Cao, Q.; Liu, L.; et al. GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021 (virtual); pp. 13668–13677. https://doi.org/10.1109/ICCV48922.2021.01341.
- 19.
Han, M.; Zhang, D.J.; Wang, Y.; et al. Dual-AI: Dual-Path Actor Interaction Learning for Group Activity Recognition. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 2990–2999. https://doi.org/10.1109/CVPR52688.2022.00300.
- 20.
Lu, L.; Lu, Y.; Yu, R.; et al. GAIM: Graph Attention Interaction Model for Collective Activity Recognition. IEEE Trans. Multimed. 2020, 22, 524–539. https://doi.org/10.1109/TMM.2019.2930344.
- 21.
Yan, R.; Xie, L.; Tang, J.; et al. HiGCIN: Hierarchical Graph-Based Cross Inference Network for Group Activity Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 6955–6968. https://doi.org/10.1109/TPAMI.2020.3034233.
- 22.
Pramono, R.R.A.; Fang, W.-H.; Chen, Y.T. Relational Reasoning for Group Activity Recognition via Self-Attention Augmented Conditional Random Field. IEEE Trans. Image Process. 2021, 30, 8184–8199. https://doi.org/10.1109/TIP.2021.3113570.
- 23.
Perez, M.; Liu, J.; Kot, A.C. Skeleton-Based Relational Reasoning for Group Activity Analysis. Pattern Recognit. 2022, 122, 108360. https://doi.org/10.1016/j.patcog.2021.108360.
- 24.
Amer, M.R.; Xie, D.; Zhao, M.; et al. Cost-Sensitive Top-Down/Bottom-Up Inference for Multiscale Activity Recognition. In Computer Vision—ECCV 2012. Proceedings of the 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7575, pp. 187–200. https://doi.org/10.1007/978-3-642-33765-9_14.
- 25.
Azar, S.M.; Atigh, M.G.; Nickabadi, A.; et al. Convolutional Relational Machine for Group Activity Recognition. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 7892–7901. https://doi.org/10.1109/CVPR.2019.00808.
- 26.
Demirel, B.; Ozkan, H. Decompl: Decompositional Learning with Attention Pooling for Group Activity Recognition from a Single Volleyball Image. In Proceedings of the 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 27–30 October 2024; pp. 977–983. https://doi.org/10.1109/ICIP51287.2024.10647499.
- 27.
Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.
- 28.
Zhai, X.; Hu, Z.; Yang, D.; et al. Spatial Temporal Network for Image and Skeleton Based Group Activity Recognition. In Proceedings of the 2022 Asian Conference on Computer Vision (ACCV), Macao, China, 4–8 December 2022; pp. 329–346. https://doi.org/10.1007/978-3-031-26316-3_20.
- 29.
Yuan, H.; Ni, D.; Wang, M. Spatio-Temporal Dynamic Inference Network for Group Activity Recognition. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021 (virtual); pp. 7456–7465. https://doi.org/10.1109/ICCV48922.2021.00738.
- 30.
Chung, J.; Gülçehre, C.; Cho, K.; et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
- 31.
Zheng, C.; Ding, W.; Shen, S.; et al. MAF: Multimodal Auto Attention Fusion for Video Classification. In Advances and Trends in Artificial Intelligence: Theory and Applications; Fujita, H., Wang, Y., Xiao, Y., Moonis, A., Eds.; Springer Nature: Cham, Switzerland, 2023; pp. 253–264. https://doi.org/10.1007/978-3-031-36819-6_22.
- 32.
Amer, M.R.; Todorovic, S.; Fern, A.; et al. Monte Carlo Tree Search for Scheduling Activity Recognition. In Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV), Sydney, NSW, Australia, 1–8 December 2013; pp. 1353–1360. https://doi.org/10.1109/ICCV.2013.171.
- 33.
Amer, M.R.; Lei, P.; Todorovic, S. HiRF: Hierarchical Random Field for Collective Activity Recognition in Videos. In Proceedings of the 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 572–585. https://doi.org/10.1007/978-3-319-10599-4_37.
- 34.
Shu, T.; Todorovic, S.; Zhu, S.-C. CERN: Confidence–Energy Recurrent Network for Group Activity Recognition. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4255–4263. https://doi.org/10.1109/CVPR.2017.453.
- 35.
Saito, T.; Rehmsmeier, M. The Precision–Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLoS ONE 2015, 10, e0118432. https://doi.org/10.1371/journal.pone.0118432.
- 36.
Kim, B.; Lee, J.; Kang, J.; et al. Detector-Free Weakly Supervised Group Activity Recognition. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022.
- 37.
Ng, X.L.; Ong, K.E.; Zheng, Q.; et al. Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022.
- 38.
He, K.; Gkioxari, G.; Dollár, P.; et al. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969. https://doi.org/10.1109/ICCV.2017.322.
- 39.
Tamura, M.; Vishwakarma, R.; Vennelakanti, R. Hunting Group Clues with Transformers for Social Group Activity Recognition. In Proceedings of the 17th European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022. https://doi.org/10.1007/978-3-031-19772-7_2.
- 40.
Rabiner, L.R. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE 1989, 77, 257–286. https://doi.org/10.1109/5.18626.
- 41.
Berndt, D.J.; Clifford, J. Using Dynamic Time Warping to Find Patterns in Time Series. In Papers from the AAAI Workshop on Knowledge Discovery in Databases (KDD-94); Technical Report WS-94-03; AAAI Press: Seattle, WA, USA, 1994; pp. 359–370. Available online: https://cdn.aaai.org/Workshops/1994/WS-94-03/WS94-03-031.pdf (accessed on 26 September 2025).
- 42.
Zheng, C.; Dagnew, T.M.; Yang, L.; et al. Animal-JEPA: Advancing Animal Behavior Studies through Joint Embedding Predictive Architecture in Video Analysis. In Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 15–18 December 2024; pp. 1909–1918. https://doi.org/10.1109/BigData62323.2024.10826081.