
Scientific and Technical Journal of Information Technologies, Mechanics and Optics


Creation and analysis of multimodal corpus for aggressive behavior recognition

https://doi.org/10.17586/2226-1494-2024-24-5-834-842

Abstract

The growth of digital communication systems is accompanied by an increasing number of disruptive behavior incidents that require a rapid response to prevent negative consequences. Because human aggression is weakly formalized, machine learning approaches are the most suitable for this domain; however, they require representative sets of relevant data for efficient aggression recognition. Building such datasets raises several problems: the relevance of dataset labels to real behavior, the consistency of the situations in which the behavior is manifested, and the naturalness of the behavior itself. The purpose of this work is to develop a methodology for creating aggressive behavior datasets that reflects the key aspects of aggression and provides relevant data. The paper presents the developed methodology for creating multimodal datasets of natural aggressive behavior. An analysis of the subject area of human aggression substantiates the key aspects of aggression manifestations (the presence of a subject and an object of aggression, and the destructiveness of the aggressive action), defines the units of behavior analysis — time intervals of audio and video with localized informants, specifies the types of aggression considered (physical and verbal overt direct aggression), and substantiates the criteria for assessing aggressive behavior as a set of aggressive actions that define each type of aggression. The methodology consists of the following stages: collecting videos from the Internet, identifying the time intervals in which aggression occurs, localizing informants in video frames, transcribing the informants' speech, collective labeling of physical and verbal aggressive actions by a group of annotators (raters), and assessing the reliability of annotation agreement using Fleiss' kappa coefficient. To evaluate the methodology, a new Audiovisual Aggressive Behavior in Online Streams corpus (AVABOS) was collected and labeled. The corpus contains video and audio segments with physical and verbal aggression, respectively, manifested by Russian-speaking informants during online video streams. The inter-rater agreement is substantial for physical aggression (κ = 0.74) and moderate for verbal aggression (κ = 0.48), which substantiates the developed methodology. The AVABOS corpus can be used for automatic aggression recognition tasks, and the developed methodology can also be applied to creating datasets of other types of behavior.
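
The article itself does not include code, but the reliability figures quoted above (κ = 0.74 and κ = 0.48) follow the standard Fleiss' kappa computation (Fleiss, 1971) interpreted on the Landis and Koch scale. Below is a minimal Python sketch, not taken from the paper, showing how such agreement could be computed over a segment-by-category count matrix; the toy ratings matrix and the number of raters are purely illustrative and do not come from AVABOS.

    import numpy as np

    def fleiss_kappa(counts):
        # counts: array of shape (n_segments, n_categories);
        # counts[i, j] = number of raters who assigned segment i to category j.
        # Assumes every segment was labeled by the same number of raters.
        counts = np.asarray(counts, dtype=float)
        n_segments, _ = counts.shape
        n_raters = counts[0].sum()
        # Per-segment agreement: proportion of agreeing rater pairs.
        p_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
        # Chance agreement from the marginal category proportions.
        p_j = counts.sum(axis=0) / (n_segments * n_raters)
        p_e = np.square(p_j).sum()
        return (p_i.mean() - p_e) / (1 - p_e)

    # Hypothetical example: 6 video segments, 4 raters, two labels
    # (column 0 = "aggressive action present", column 1 = "absent").
    ratings = [
        [4, 0],
        [3, 1],
        [0, 4],
        [4, 0],
        [1, 3],
        [2, 2],
    ]
    print(round(fleiss_kappa(ratings), 2))  # -> 0.43 for this toy matrix

On the Landis and Koch scale, values of 0.41–0.60 correspond to moderate agreement and 0.61–0.80 to substantial agreement, which is the interpretation applied to the reported verbal (κ = 0.48) and physical (κ = 0.74) aggression annotations.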

About the Authors

M. Yu. Uzdiaev
St. Petersburg Federal Research Center of the Russian Academy of Sciences
Russian Federation

Mikhail Yu. Uzdiaev - Junior Researcher

Saint Petersburg, 199178



A. A. Karpov
St. Petersburg Federal Research Center of the Russian Academy of Sciences
Russian Federation

Alexey A. Karpov - D.Sc., Professor, Head of Laboratory

Saint Petersburg, 199178



References

1. Lefter I., Rothkrantz L.J.M., Burghouts G.J. A comparative study on automatic audio–visual fusion for aggression detection using metainformation. Pattern Recognition Letters, 2013, vol. 34, no. 15, pp. 1953–1963. https://doi.org/10.1016/j.patrec.2013.01.002

2. Lefter I., Burghouts G.J., Rothkrantz L.J.M. An audio-visual dataset of human–human interactions in stressful situations. Journal on Multimodal User Interfaces, 2014, vol. 8, no. 1, pp. 29–41. https://doi.org/10.1007/s12193-014-0150-7

3. Lefter I., Jonker C.M., Tuente S.K., Veling W., Bogaerts S. NAA: A multimodal database of negative affect and aggression. Proc. of the Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), 2017, pp. 21–27. https://doi.org/10.1109/ACII.2017.8273574

4. Sernani P., Falcionelli N., Tomassini S., Contardo P., Dragoni A.F. Deep learning for automatic violence detection: Tests on the AIRTLab dataset. IEEE Access, 2021, vol. 9, pp. 160580–160595. https://doi.org/10.1109/ACCESS.2021.3131315

5. Ciampi L., Foszner P., Messina N., Staniszewski M., Gennaro C., Falchi F., Serao G., Cogiel M., Golba D., Szczęsna A., Amato G. Bus violence: An open benchmark for video violence detection on public transport. Sensors, 2022, vol. 22, no. 21, pp. 8345. https://doi.org/10.3390/s22218345

6. Perez M., Kot A.C., Rocha A. Detection of real-world fights in surveillance videos. Proc. of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 2662–2666. https://doi.org/10.1109/ICASSP.2019.8683676

7. Cheng M., Cai K., Li M. RWF-2000: An open large scale video database for violence detection. Proc. of the 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 4183–4190. https://doi.org/10.1109/ICPR48806.2021.9412502

8. Potapova R., Komalova L. On principles of annotated databases of the semantic field “aggression”. Lecture Notes in Computer Science, 2014, vol. 8773, pp. 322–328. https://doi.org/10.1007/978-3-319-11581-8_40

9. Apanasovich K.S., Makhnytkina O.V., Kabarov V.I., Dalevskaya O.P. RuPersonaChat: a dialog corpus for personalizing conversational agents. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 2, pp. 214–221. (in Russian). https://doi.org/10.17586/2226-1494-2024-24-2-214-221

10. Hassoun Al-Jawad M.M., Alharbi H., Almukhtar A.F., Alnawas A.A. Constructing Twitter corpus of Iraqi Arabic Dialect (CIAD) for sentiment analysis. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2022, vol. 22, no. 2, pp. 308–316. https://doi.org/10.17586/2226-1494-2022-22-2-308-316

11. Busso C., Bulut M., Lee C., Kazemzadeh A., Mower E., Kim S., Chang J.N., Lee S., Narayanan S.S. IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 2008, vol. 42, no. 4, pp. 335–359. https://doi.org/10.1007/s10579-008-9076-6

12. Perepelkina O., Kazimirova E., Konstantinova M. RAMAS: Russian multimodal corpus of dyadic interaction for affective computing. Lecture Notes in Computer Science, 2018, vol. 11096, pp. 501–510. https://doi.org/10.1007/978-3-319-99579-3_52

13. Ringeval F., Sonderegger A., Sauer J., Lalanne D. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. Proc. of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2013, pp. 1–8. https://doi.org/10.1109/FG.2013.6553805

14. Busso C., Parthasarathy S., Burmania A., AbdelWahab M., Sadoughi N., Provost E.M. MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception. IEEE Transactions on Affective Computing, 2017, vol. 8, no. 1, pp. 67–80. https://doi.org/10.1109/TAFFC.2016.2515617

15. Enikolopov S.N. The concept of aggression in the contemporary psychology. Prikladnaja psihologija, 2001, no. 1, pp. 60–72. (in Russian)

16. Groth-Marnat G., Wright A.J. Handbook of Psychological Assessment. John Wiley & Sons, 2016, 824 p.

17. Uzdiaev M., Vatamaniuk I. Investigation of manifestations of aggressive behavior by users of sociocyberphysical systems on video. Lecture Notes in Networks and Systems, 2021, vol. 231, pp. 593–604. https://doi.org/10.1007/978-3-030-90321-3_49

18. Buss A.H. The Psychology of Aggression. John Wiley & Sons, 1961, 307 p. https://doi.org/10.1037/11160-000

19. Radford A., Kim J.W., Xu T., Brockman G., McLeavey C., Sutskever I. Robust speech recognition via large-scale weak supervision. Proc. of the International Conference on Machine Learning (ICML), PMLR, 2023, vol. 202, pp. 28492–28518.

20. Plaquet A., Bredin H. Powerset multi-class cross entropy loss for neural speaker diarization. Proc. of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2023, pp. 3222–3226. https://doi.org/10.21437/Interspeech.2023-205

21. Lausberg H., Sloetjes H. Coding gestural behavior with the NEUROGES-ELAN system. Behavior Research Methods, 2009, vol. 41, no. 3, pp. 841–849. https://doi.org/10.3758/BRM.41.3.841

22. Fleiss J.L. Measuring nominal scale agreement among many raters. Psychological Bulletin, 1971, vol. 76, no. 5, pp. 378–382. https://doi.org/10.1037/h0031619

23. Uzdiaev M.Iu., Karpov A.A. Audiovisual Aggressive Behavior in Online Streams dataset – AVABOS. Certificate of state registration of the database 2022623239, 2022. (in Russian)

24. Landis J.R., Koch G.G. The measurement of observer agreement for categorical data. Biometrics, 1977, vol. 33, no. 1, pp. 159–174. https://doi.org/10.2307/2529310

25. Fleiss J.L., Levin B., Paik M.C. Statistical Methods for Rates and Proportions. John Wiley & Sons, 2013, 800 p.



For citations:


Uzdiaev M.Yu., Karpov A.A. Creation and analysis of multimodal corpus for aggressive behavior recognition. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2024;24(5):834-842. (In Russ.) https://doi.org/10.17586/2226-1494-2024-24-5-834-842



This work is licensed under a Creative Commons Attribution 4.0 License.

