Scientific and Technical Journal of Information Technologies, Mechanics and Optics

Explainability and interpretability are important aspects in ensuring the security of decisions made by intelligent systems (review article)

https://doi.org/10.17586/2226-1494-2025-25-3-373-386

Abstract

Issues of trust in decisions made by intelligent systems are becoming increasingly relevant. This article presents a systematic review of Explainable Artificial Intelligence (XAI) methods and tools aimed at bridging the gap between the complexity of neural networks and the need for interpretable results for end users. A theoretical analysis is carried out of the differences between explainability and interpretability in the context of artificial intelligence, as well as of their role in ensuring the security of decisions made by intelligent systems. It is shown that explainability implies the ability of a system to generate justifications understandable to humans, whereas interpretability concerns the passive clarity of internal mechanisms. A classification of XAI methods is proposed based on the stage of analysis (ante hoc or post hoc) and the scale of explanations (local or global). Popular tools, such as Local Interpretable Model-Agnostic Explanations (LIME), Shapley values, and Integrated Gradients, are considered, with an assessment of their strengths and limits of applicability. Practical recommendations are given on the choice of methods for various domains and scenarios. The architecture of an intelligent system based on the V.K. Finn model and adapted to modern requirements for “transparency” of decisions is discussed; its key components are the information environment, the problem solver, and the intelligent interface. The trade-off between model accuracy and explainability is considered: transparent models (“glass boxes”, for example, decision trees) are inferior in performance to deep neural networks but provide greater certainty in decision-making. Examples of methods and software packages for explaining and interpreting machine learning data and models are given. It is shown that the further development of XAI is associated with the integration of neuro-symbolic approaches that combine the capabilities of deep learning with logical interpretability.
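
As a pointer to what Shapley-value-based attribution computes (an illustrative sketch, not code or notation from the article), the plain-Python fragment below evaluates exact Shapley values for a toy three-feature scoring model. The feature names, baseline values, and model are hypothetical assumptions; practical tools such as the SHAP package approximate the same quantity for models with many features.

# Minimal, self-contained sketch of exact Shapley-value attribution
# for a toy model; all names and values below are illustrative only.
from itertools import combinations
from math import factorial

FEATURES = ["age", "income", "debt"]                        # hypothetical features
x = {"age": 42, "income": 55_000, "debt": 12_000}           # instance being explained
baseline = {"age": 35, "income": 40_000, "debt": 20_000}    # reference (baseline) input

def model(inp):
    # Toy linear scoring model standing in for a trained predictor.
    return 0.01 * inp["age"] + 0.0001 * inp["income"] - 0.0002 * inp["debt"]

def value(subset):
    # Model output when only features in `subset` take their real values;
    # the remaining features are held at the baseline (a common "removal" convention).
    inp = {f: (x[f] if f in subset else baseline[f]) for f in FEATURES}
    return model(inp)

def shapley(feature):
    # Exact Shapley value: weighted average of the feature's marginal
    # contribution over all subsets of the remaining features.
    others = [f for f in FEATURES if f != feature]
    n = len(FEATURES)
    total = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (value(set(subset) | {feature}) - value(set(subset)))
    return total

for f in FEATURES:
    print(f"{f}: {shapley(f):+.4f}")

For a linear model the attributions reduce to coefficient times the deviation from the baseline, and they sum to the difference between the model output on the explained instance and on the baseline, which is a convenient sanity check for any Shapley-based tool.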

About the Authors

D. N. Biryukov
Mozhaisky Military Aerospace Academy
Russian Federation

Denis N. Biryukov — D.Sc., Professor, Head of Department

Saint Petersburg, 197198

Scopus Author ID: 57188163400



A. S. Dudkin
Mozhaisky Military Aerospace Academy
Russian Federation

Andrey S. Dudkin — PhD, Associate Professor, Deputy Head of Department

Saint Petersburg, 197198

Scopus Author ID: 57211979130



References

1. Finn V.K. On intelligent data analysis. Novosti Iskusstvennogo Intellekta, 2004, no. 3, pp. 3–18. (in Russian)

2. Finn V.K. Artificial Intelligence: The Idea Base and the Main Product. Proc. of the 9th National Conference on Artificial Intelligence, 2004, vol. 1, pp. 11–20. (in Russian)

3. Biryukov D.N., Lomako A.G., Rostovtsev Yu.G. The appearance of anticipating cyber threats risk prevention systems. SPIIRAS Proceedings, 2015, no. 2(39), pp. 5–25. (in Russian)

4. Biryukov D.N., Lomako A.G. Denotational semantics of knowledge contexts in ontological modeling of subject domains of the conflict. SPIIRAS Proceedings, 2015, no. 5(42), pp. 155–179. (in Russian)

5. Biryukov D.N., Lomako A.G., Zholus R.B. Ontological knowledge system completion based on modeling inferences taking into account role semantics. SPIIRAS Proceedings, 2016, no. 4(47), pp. 105–129. (in Russian). https://doi.org/10.15622/sp.47.6

6. Namatēvs I., Sudars K., Dobrājs A. Interpretability versus explainability: classification for understanding deep learning systems and models. Computer Assisted Methods in Engineering and Science, 2022, vol. 29, no. 4, pp. 297–356. http://dx.doi.org/10.24423/cames.518

7. Gunning D. Explainable artificial intelligence (XAI), 2017. Available at: https://nsarchive.gwu.edu/sites/default/files/documents/5794867/National-Security-Archive-David-Gunning-DARPA.pdf (accessed: 21.10.2024).

8. Varshney K.R. Trustworthy machine learning and artificial intelligence. XRDS: Crossroads, The ACM Magazine for Students, 2019, vol. 25, no. 3, pp. 26–29. https://doi.org/10.1145/3313109

9. Doshi-Velez F., Kim B. Towards a rigorous science of interpretable machine learning. arXiv, 2017, arXiv:1702.08608v2. https://doi.org/10.48550/arXiv.1702.08608

10. Yuan W., Liu P., Neubig G. Can we automate scientific reviewing? arXiv, 2021, arXiv:2102.00176. https://doi.org/10.48550/arXiv.2102.00176

11. Arya V., Bellamy R.K.E., Chen P.-Yu., Dhurandhar A., Hind M., Hoffman S.C., Houde S., Liao V.Q., Luss R., Mojsilović A., et al. One explanation does not fit all: A toolkit and taxonomy of AI explainability techniques. arXiv, 2019, arXiv:1909.03012. https://doi.org/10.48550/arXiv.1909.03012

12. Samek W., Wiegand T., Müller K.-R. Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv, 2017, arXiv:1708.08296. https://doi.org/10.48550/arXiv.1708.08296

13. Angelov P., Soares E. Towards explainable deep neural networks (xDNN). Neural Networks, 2020, vol. 130, pp. 185–194. https://doi.org/10.1016/j.neunet.2020.07.010

14. Oh S.J., Augustin M., Schiele B., Fritz M. Towards reverse-engineering black-box neural networks. arXiv, 2018, arXiv:1711.01768. https://doi.org/10.48550/arXiv.1711.01768

15. Rai A. Explainable AI: From black box to glass box. Journal of the Academy of Marketing Science, 2020, vol. 48, no. 1, pp. 137–141. https://doi.org/10.1007/s11747-019-00710-5

16. Lipton Z.C. The mythos of model interpretability. arXiv, 2017, arXiv:1606.03490. https://doi.org/10.48550/arXiv.1606.03490

17. Montavon G., Samek W., Müller K.-R. Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 2018, vol. 73, pp. 1–15. https://doi.org/10.1016/j.dsp.2017.10.011

18. Mascharka D., Tran P., Soklaski R., Majumdar A. Transparency by design: Closing the gap between performance and interpretability in visual reasoning. arXiv, 2018, arXiv:1803.05268. https://doi.org/10.48550/arXiv.1803.05268

19. Beaudouin V., Bloch I., Bounie D., Clémençon S., d’Alché-Buc F., Eagan J., Maxwell W., Mozharovskyi P., Parekh J. Flexible and context-specific AI explainability: A multidisciplinary approach. arXiv, 2020, arXiv:2003.07703v1. https://doi.org/10.48550/arXiv.2003.07703

20. Sokol K., Flach P. Explainability fact sheets: A framework for systematic assessment of explainable approaches. Proc. of the 2020 Conference on Fairness, Accountability, and Transparency (FAT*’20), 2020, pp. 56–67. https://doi.org/10.1145/3351095.3372870

21. Xu F., Uszkoreit H., Du Y., Fan W., Zhao D., Zhu J. Explainable AI: A brief survey on history, research areas, approaches and challenges. Lecture Notes in Computer Science, 2019, vol. 11839, pp. 563–574. https://doi.org/10.1007/978-3-030-32236-6_51

22. Thompson N.C., Greenwald K., Lee K., Manso G.F. The computational limits of deep learning. arXiv, 2020, arXiv:2007.05558. https://doi.org/10.48550/arXiv.2007.05558

23. DuSell B., Chiang D. Learning context-free languages with nondeterministic stack RNNs. Proc. of the 24th Conference on Computational Natural Language Learning, 2020, pp. 507–519. https://doi.org/10.18653/v1/2020.conll-1.41

24. Flambeau J.K.F., Norbert T. Simplifying the explanation of deep neural networks with sufficient and necessary feature-sets: case of text classification. arXiv, 2020, arXiv:2010.03724v2. https://doi.org/10.48550/arXiv.2010.03724

25. Gunning D., Stefik M., Choi J., Miller T., Stumpf S., Yang G.-Z. XAI — Explainable artificial intelligence. Science Robotics, 2019, vol. 4, no. 37, pp. eaay7120. https://doi.org/10.1126/scirobotics.aay7120

26. Gilpin L.H., Bau D., Yuan B.Z., Bajwa A., Specter M., Kagal L. Explaining explanations: an overview of interpretability of machine learning. arXiv, 2018, arXiv:1806.00069v3. https://doi.org/10.48550/arXiv.1806.00069

27. Alber M. Software and application patterns for explanation methods. arXiv, 2019, arXiv:1904.04734v1. https://doi.org/10.48550/arXiv.1904.04734

28. Zhao X., Banks A., Sharp J., Robu V., Flynn D., Fisher M., Huang X. A safety framework for critical systems utilising deep neural networks. arXiv, 2020, arXiv:2003.05311v3. https://doi.org/10.1007/978-3-030-54549-9_16

29. Weller A. Transparency: Motivations and challenges. Lecture Notes in Computer Science, 2019, vol. 11700, pp. 23–40. https://doi.org/10.1007/978-3-030-28954-6_2

30. Raghu M., Schmidt E. A survey of deep learning for scientific discovery. arXiv, 2020, arXiv:2003.11755v1. https://doi.org/10.48550/arXiv.2003.11755

31. Hendricks L.A., Rohrbach A., Schiele B., Darrell T., Akata Z. Generating visual explanations with natural language. Applied AI Letters, 2021, vol. 2, no. 4, pp. e55. https://doi.org/10.1002/ail2.55

32. Kaplan J., McCandlish S., Henighan T., Brown T.B., Chess B., Child R., Gray S., Radford A., Wu J., Amodei D. Scaling laws for neural language models. arXiv, 2020, arXiv:2001.08361v1. https://doi.org/10.48550/arXiv.2001.08361

33. Towell G.G., Shavlik J.W. Extracting refined rules from knowledge-based neural networks. Machine Learning, 1993, vol. 13, no. 1, pp. 71–101. https://doi.org/10.1007/bf00993103

34. Molnar C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Christoph Molnar, 2025, 392 p.

35. Kim S., Jeong M., Ko B.C. Interpretation and simplification of deep forest. arXiv, 2020, arXiv:2001.04721v4. https://doi.org/10.48550/arXiv.2001.04721

36. Nam W.-J., Gur S., Choi J., Wolf L., Lee S.-W. Relative attributing propagation: interpreting the comparative contributions of individual units in deep neural networks. arXiv, 2019, arXiv:1904.00605v4. https://doi.org/10.48550/arXiv.1904.00605

37. Oramas J.M., Wang K., Tuytelaars T. Visual explanation by interpretation: Improving visual feedback capabilities of deep neural networks. arXiv, 2019, arXiv:1712.06302v3. https://doi.org/10.48550/arXiv.1712.06302

38. Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. arXiv, 2019, arXiv:1811.10154v3. https://doi.org/10.48550/arXiv.1811.10154

39. Samek W., Montavon G., Vedaldi A., Hansen L.K., Müller K.-R. Explainable AI: interpreting, explaining and visualizing deep learning. Lecture Notes in Computer Science, 2019, vol. 11700. 439 p. https://doi.org/10.1007/978-3-030-28954-6

40. Hansen L.K., Rieger L. Interpretability in intelligent systems – A new concept? Lecture Notes in Computer Science, 2019, vol. 11700, pp. 41–49. https://doi.org/10.1007/978-3-030-28954-6_3

41. Liao Q.V., Gruen D., Miller S. Questioning the AI: Informing design practices for explainable AI user experiences. arXiv, 2020, arXiv:2001.02478v2. https://doi.org/10.48550/arXiv.2001.02478

42. Holzinger A., Langs G., Denk H., Zatloukal K., Müller H. Causability and explainability of artificial intelligence in medicine. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2019, vol. 9, no. 4, pp. e1312. https://doi.org/10.1002/widm.1312

43. Miller T. Explanation in artificial intelligence: insights from the social sciences. Artificial Intelligence, 2019, vol. 267, pp. 1–38. https://doi.org/10.1016/j.artint.2018.07.007

44. Kulesza T., Burnett M., Wong W., Stumpf S. Principles of explanatory debugging to personalize interactive machine learning. Proc. of the 20th International Conference on Intelligent User Interfaces (IUI '15), 2015, pp. 126–137. https://doi.org/10.1145/2678025.2701399

45. Tintarev N. Explaining recommendations. Lecture Notes in Computer Science, 2007, vol. 4511, pp. 470–474. https://doi.org/10.1007/978-3-540-73078-1_67

46. Chrysostomou G., Aletras N. Improving the faithfulness of attention-based explanations with task-specific information for text classification. arXiv, 2021, arXiv:2105.02657v2. https://doi.org/10.48550/arXiv.2105.02657

47. Vilone G., Longo L. Explainable artificial intelligence: A systematic review. arXiv, 2020, arXiv:2006.00093v3. https://doi.org/10.48550/arXiv.2006.00093

48. Papenmeier A., Englebienne G., Seifert C. How model accuracy and explanation fidelity influence user trust. arXiv, 2019, arXiv:1907.12652v1. https://doi.org/10.48550/arXiv.1907.12652

49. Harutyunyan H., Achille A., Paolini G., Majumder O., Ravichandran A., Bhotika R., Soatto S. Estimating informativeness of samples with smooth unique information. arXiv, 2021, arXiv:2101.06640v1. https://doi.org/10.48550/arXiv.2101.06640

50. Liu S., Wang X., Liu M., Zhu J. Towards better analysis of machine learning models: a visual analytics perspective. Visual Informatics, 2017, vol. 1, no. 1, pp. 48–56. https://doi.org/10.1016/j.visinf.2017.01.006

51. Arrieta A.B., Díaz-Rodríguez N., Del Ser J., Bennetot A., Tabik S., Barbado A., García S., Gil-López S., Molina D., Benjamins R., Chatila R., Herrera F. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 2020, vol. 58, pp. 82–115. https://doi.org/10.1016/j.inffus.2019.12.012

52. Girshick R., Donahue J., Darrell T., Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587. https://doi.org/10.1109/CVPR.2014.81

53. Ancona M., Ceolini E., Öztireli C., Gross M. Towards better understanding of gradient-based attribution methods for deep neural networks. arXiv, 2018, arXiv:1711.06104v4. https://doi.org/10.48550/arXiv.1711.06104

54. Rumelhart D.E., Hinton G.E., Williams R.J. Learning internal representations by error propagation. Readings in Cognitive Science: A Perspective from Psychology and Artificial Intelligence, 2013, pp. 399–421.

55. Kindermans P.-J., Hooker S., Adebayo J., Alber M., Schütt K.T., Dähne S., Erhan D., Kim B. The (Un)reliability of saliency methods. Lecture Notes in Computer Science, 2019, vol. 11700, pp. 267–280. https://doi.org/10.1007/978-3-030-28954-6_14

56. Roscher R., Bohn B., Duarte M.F., Garcke J. Explainable machine learning for scientific insights and discoveries. arXiv, 2020, arXiv:1905.08883v3. https://doi.org/10.48550/arXiv.1905.08883



For citations:


Biryukov D.N., Dudkin A.S. Explainability and interpretability are important aspects in ensuring the security of decisions made by intelligent systems (review article). Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2025;25(3):373-386. (In Russ.) https://doi.org/10.17586/2226-1494-2025-25-3-373-386



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2226-1494 (Print)
ISSN 2500-0373 (Online)