Explainability and interpretability are important aspects in ensuring the security of decisions made by intelligent systems (review article)

D. N. Biryukov, A. S. Dudkin

doi:10.17586/2226-1494-2025-25-3-373-386


Abstract

Trust in decisions made (or generated) by intelligent systems is becoming an increasingly pressing issue. This paper presents a systematic review of the methods and tools of Explainable Artificial Intelligence (XAI). These methods aim to bridge the gap between the complexity of neural networks and end users' need for interpretable results. The paper gives a theoretical analysis of how explainability and interpretability differ in the context of artificial intelligence and of their role in ensuring the security of decisions made by intelligent systems. Explainability is shown to mean a system's ability to generate human-understandable justifications. Interpretability, by contrast, concerns the passive comprehensibility of the model's internal mechanisms. XAI methods are classified by their approach (ante hoc or post hoc analysis) and by the scope of their explanations (local or global). Popular tools are considered, including Local Interpretable Model-agnostic Explanations (LIME), Shapley values and Integrated Gradients, with an assessment of their strengths and the limits of their applicability. Practical recommendations are given for choosing methods for different domains and scenarios. The paper discusses the architecture of an intelligent system built on V.K. Finn's model and adapted to current requirements for decision "transparency". Its key components are the information environment, the problem solver and the intelligent interface. The paper also examines the trade-off between model accuracy and explainability. Transparent models ("glass boxes", e.g., decision trees) perform worse than deep neural networks, but their decisions are easier to justify and harder to dispute. Examples of methods and software packages for explaining and interpreting data and machine learning models are provided. Finally, further progress in XAI is shown to depend on integrating neuro-symbolic approaches, which combine the capabilities of deep learning with logical interpretability.
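The abstract lists Shapley values among the surveyed post hoc tools for local explanations. The sketch below is only an illustration and is not code from the article. It computes exact Shapley values by enumerating every coalition of features. It assumes that a feature outside a coalition is replaced by a fixed baseline value, which is one of several possible conventions. The names `exact_shapley` and `toy_model` are hypothetical and introduced here.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(f: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> list[float]:
    """Exact Shapley value of each feature of x for the model f.

    Assumption: a feature outside the coalition takes its baseline value.
    Cost grows as 2**n model calls, so this is only practical for small n.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Coalition features come from x, all other features from the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        # Sum over every coalition S that does not contain feature i.
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Weighted marginal contribution of feature i when added to S.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z: Sequence[float]) -> float:
    # Hypothetical model: an additive term plus a pairwise interaction.
    return 2 * z[0] + 3 * z[1] * z[2]


if __name__ == "__main__":
    phi = exact_shapley(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
    print([round(p, 6) for p in phi])  # [2.0, 1.5, 1.5]
```

In this example, feature 0 receives all of the credit for the additive term. The interaction term 3·z1·z2 is split equally between features 1 and 2. The attributions always sum to f(x) minus f(baseline) (the efficiency property), which makes a cheap sanity check. Exact enumeration needs on the order of 2^n model evaluations, so practical tools rely on sampling or model-specific approximations instead.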

Keywords

artificial intelligence, neural networks, deep learning, "black box", explainability, interpretability, XAI

About the authors

D. N. Biryukov

A.F. Mozhaisky Military Space Academy
Russia

Denis N. Biryukov, D.Sc. (Engineering), Professor, Head of Department

Saint Petersburg, 197198

Scopus Author ID: 57188163400

A. S. Dudkin

A.F. Mozhaisky Military Space Academy
Russia

Andrey S. Dudkin, PhD (Engineering), Associate Professor, Deputy Head of Department

Saint Petersburg, 197198

Scopus Author ID: 57211979130

References

1. Finn V.K. On intelligent data analysis // Novosti Iskusstvennogo Intellekta [Artificial Intelligence News]. 2004. N 3. P. 3–18. (In Russ.)

2. Finn V.K. Artificial intelligence: the conceptual basis and the main product // Proc. of the IX National Conference "Artificial Intelligence-2004". 2004. V. 1. P. 11–20. (In Russ.)

3. Biryukov D.N., Lomako A.G., Rostovtsev Yu.G. The concept of anticipating systems for preventing the risks of cyber threat realization // SPIIRAS Proceedings. 2015. N 2(39). P. 5–25. (In Russ.)

4. Biryukov D.N., Lomako A.G. Denotational semantics of knowledge contexts in ontological modeling of conflict subject domains // SPIIRAS Proceedings. 2015. N 5(42). P. 155–179. (In Russ.)

5. Biryukov D.N., Lomako A.G., Zholus R.B. Replenishment of ontological knowledge systems based on modeling inferences with regard to role semantics // SPIIRAS Proceedings. 2016. N 4(47). P. 105–129. (In Russ.) https://doi.org/10.15622/sp.47.6

6. Namatēvs I., Sudars K., Dobrājs A. Interpretability versus explainability: classification for understanding deep learning systems and models // Computer Assisted Methods in Engineering and Science. 2022. V. 29. N 4. P. 297–356. https://doi.org/10.24423/cames.518

7. Gunning D. Explainable artificial intelligence (XAI). 2017. [Electronic resource]. URL: https://nsarchive.gwu.edu/sites/default/files/documents/5794867/National-Security-Archive-DavidGunning-DARPA.pdf (accessed: 21.10.2024).

8. Varshney K.R. Trustworthy machine learning and artificial intelligence // XRDS: Crossroads, The ACM Magazine for Students. 2019. V. 25. N 3. P. 26–29. https://doi.org/10.1145/3313109

9. Doshi-Velez F., Kim B. Towards a rigorous science of interpretable machine learning // arXiv. 2017. arXiv:1702.08608v2. https://doi.org/10.48550/arXiv.1702.08608

10. Yuan W., Liu P., Neubig G. Can we automate scientific reviewing? // arXiv. 2021. arXiv:2102.00176. https://doi.org/10.48550/arXiv.2102.00176

11. Arya V., Bellamy R.K.E., Chen P.-Y., Dhurandhar A., Hind M., Hoffman S.C., Houde S., Liao Q.V., Luss R., Mojsilović A., et al. One explanation does not fit all: A toolkit and taxonomy of AI explainability techniques // arXiv. 2019. arXiv:1909.03012. https://doi.org/10.48550/arXiv.1909.03012

12. Samek W., Wiegand T., Müller K.-R. Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models // arXiv. 2017. arXiv:1708.08296. https://doi.org/10.48550/arXiv.1708.08296

13. Angelov P., Soares E. Towards explainable deep neural networks (xDNN) // Neural Networks. 2020. V. 130. P. 185–194. https://doi.org/10.1016/j.neunet.2020.07.010

14. Oh S.J., Augustin M., Schiele B., Fritz M. Towards reverse-engineering black-box neural networks // arXiv. 2018. arXiv:1711.01768. https://doi.org/10.48550/arXiv.1711.01768

15. Rai A. Explainable AI: From black box to glass box // Journal of the Academy of Marketing Science. 2020. V. 48. N 1. P. 137–141. https://doi.org/10.1007/s11747-019-00710-5

16. Lipton Z.C. The mythos of model interpretability // arXiv. 2017. arXiv:1606.03490. https://doi.org/10.48550/arXiv.1606.03490

17. Montavon G., Samek W., Müller K.-R. Methods for interpreting and understanding deep neural networks // Digital Signal Processing. 2018. V. 73. P. 1–15. https://doi.org/10.1016/j.dsp.2017.10.011

18. Mascharka D., Tran P., Soklaski R., Majumdar A. Transparency by design: Closing the gap between performance and interpretability in visual reasoning // arXiv. 2018. arXiv:1803.05268. https://doi.org/10.48550/arXiv.1803.05268

19. Beaudouin V., Bloch I., Bounie D., Clémençon S., d’Alché-Buc F., Eagan J., Maxwell W., Mozharovskyi P., Parekh J. Flexible and context-specific AI explainability: A multidisciplinary approach // arXiv. 2020. arXiv:2003.07703v1. https://doi.org/10.48550/arXiv.2003.07703

20. Sokol K., Flach P. Explainability fact sheets: A framework for systematic assessment of explainable approaches // Proc. of the 2020 Conference on Fairness, Accountability, and Transparency (FAT*’20). 2020. P. 56–67. https://doi.org/10.1145/3351095.3372870

21. Xu F., Uszkoreit H., Du Y., Fan W., Zhao D., Zhu J. Explainable AI: A brief survey on history, research areas, approaches and challenges // Lecture Notes in Computer Science. 2019. V. 11839. P. 563–574. https://doi.org/10.1007/978-3-030-32236-6_51

22. Thompson N.C., Greenwald K., Lee K., Manso G.F. The computational limits of deep learning // arXiv. 2020. arXiv:2007.05558. https://doi.org/10.48550/arXiv.2007.05558

23. DuSell B., Chiang D. Learning context-free languages with nondeterministic stack RNNs // Proc. of the 24th Conference on Computational Natural Language Learning. 2020. P. 507–519. https://doi.org/10.18653/v1/2020.conll-1.41

24. Flambeau J.K.F., Norbert T. Simplifying the explanation of deep neural networks with sufficient and necessary feature-sets: case of text classification // arXiv. 2020. arXiv:2010.03724v2. https://doi.org/10.48550/arXiv.2010.03724

25. Gunning D., Stefik M., Choi J., Miller T., Stumpf S., Yang G.-Z. XAI — Explainable artificial intelligence // Science Robotics. 2019. V. 4. N 37. P. eaay7120. https://doi.org/10.1126/scirobotics.aay7120

26. Gilpin L.H., Bau D., Yuan B.Z., Bajwa A., Specter M., Kagal L. Explaining explanations: an overview of interpretability of machine learning // arXiv. 2018. arXiv:1806.00069v3. https://doi.org/10.48550/arXiv.1806.00069

27. Alber M. Software and application patterns for explanation methods // arXiv. 2019. arXiv:1904.04734v1. https://doi.org/10.48550/arXiv.1904.04734

28. Zhao X., Banks A., Sharp J., Robu V., Flynn D., Fisher M., Huang X. A safety framework for critical systems utilising deep neural networks // arXiv. 2020. arXiv:2003.05311v3. https://doi.org/10.1007/978-3-030-54549-9_16

29. Weller A. Transparency: Motivations and challenges // Lecture Notes in Computer Science. 2019. V. 11700. P. 23–40. https://doi.org/10.1007/978-3-030-28954-6_2

30. Raghu M., Schmidt E. A survey of deep learning for scientific discovery // arXiv. 2020. arXiv:2003.11755v1. https://doi.org/10.48550/arXiv.2003.11755

31. Hendricks L.A., Rohrbach A., Schiele B., Darrell T., Akata Z. Generating visual explanations with natural language // Applied AI Letters. 2021. V. 2. N 4. P. e55. https://doi.org/10.1002/ail2.55

32. Kaplan J., McCandlish S., Henighan T., Brown T.B., Chess B., Child R., Gray S., Radford A., Wu J., Amodei D. Scaling laws for neural language models // arXiv. 2020. arXiv:2001.08361v1. https://doi.org/10.48550/arXiv.2001.08361

33. Towell G.G., Shavlik J.W. Extracting refined rules from knowledge-based neural networks // Machine Learning. 1993. V. 13. N 1. P. 71–101. https://doi.org/10.1007/bf00993103

34. Molnar C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Christoph Molnar, 2025. 392 p.

35. Kim S., Jeong M., Ko B.C. Interpretation and simplification of deep forest // arXiv. 2020. arXiv:2001.04721v4. https://doi.org/10.48550/arXiv.2001.04721

36. Nam W.-J., Gur S., Choi J., Wolf L., Lee S.-W. Relative attributing propagation: interpreting the comparative contributions of individual units in deep neural networks // arXiv. 2019. arXiv:1904.00605v4. https://doi.org/10.48550/arXiv.1904.00605

37. Oramas J.M., Wang K., Tuytelaars T. Visual explanation by interpretation: Improving visual feedback capabilities of deep neural networks // arXiv. 2019. arXiv:1712.06302v3. https://doi.org/10.48550/arXiv.1712.06302

38. Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead // arXiv. 2019. arXiv:1811.10154v3. https://doi.org/10.48550/arXiv.1811.10154

39. Samek W., Montavon G., Vedaldi A., Hansen L.K., Müller K.-R. Explainable AI: interpreting, explaining and visualizing deep learning // Lecture Notes in Computer Science. 2019. V. 11700. 439 p. https://doi.org/10.1007/978-3-030-28954-6

40. Hansen L.K., Rieger L. Interpretability in intelligent systems – A new concept? // Lecture Notes in Computer Science. 2019. V. 11700. P. 41–49. https://doi.org/10.1007/978-3-030-28954-6_3

41. Liao Q.V., Gruen D., Miller S. Questioning the AI: Informing design practices for explainable AI user experiences // arXiv. 2020. arXiv:2001.02478v2. https://doi.org/10.48550/arXiv.2001.02478

42. Holzinger A., Langs G., Denk H., Zatloukal K., Müller H. Causability and explainability of artificial intelligence in medicine // Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2019. V. 9. N 4. P. e1312. https://doi.org/10.1002/widm.1312

43. Miller T. Explanation in artificial intelligence: insights from the social sciences // Artificial Intelligence. 2019. V. 267. P. 1–38. https://doi.org/10.1016/j.artint.2018.07.007

44. Kulesza T., Burnett M., Wong W., Stumpf S. Principles of explanatory debugging to personalize interactive machine learning // Proc. of the 20th International Conference on Intelligent User Interfaces (IUI '15). 2015. P. 126–137. https://doi.org/10.1145/2678025.2701399

45. Tintarev N. Explaining recommendations // Lecture Notes in Computer Science. 2007. V. 4511. P. 470–474. https://doi.org/10.1007/978-3-540-73078-1_67

46. Chrysostomou G., Aletras N. Improving the faithfulness of attention-based explanations with task-specific information for text classification // arXiv. 2021. arXiv:2105.02657v2. https://doi.org/10.48550/arXiv.2105.02657

47. Vilone G., Longo L. Explainable artificial intelligence: A systematic review // arXiv. 2020. arXiv:2006.00093v3. https://doi.org/10.48550/arXiv.2006.00093

48. Papenmeier A., Englebienne G., Seifert C. How model accuracy and explanation fidelity influence user trust // arXiv. 2019. arXiv:1907.12652v1. https://doi.org/10.48550/arXiv.1907.12652

49. Harutyunyan H., Achille A., Paolini G., Majumder O., Ravichandran A., Bhotika R., Soatto S. Estimating informativeness of samples with smooth unique information // arXiv. 2021. arXiv:2101.06640v1. https://doi.org/10.48550/arXiv.2101.06640

50. Liu S., Wang X., Liu M., Zhu J. Towards better analysis of machine learning models: a visual analytics perspective // Visual Informatics. 2017. V. 1. N 1. P. 48–56. https://doi.org/10.1016/j.visinf.2017.01.006

51. Arrieta A.B., Díaz-Rodríguez N., Del Ser J., Bennetot A., Tabik S., Barbado A., García S., Gil-López S., Molina D., Benjamins R., Chatila R., Herrera F. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI // Information Fusion. 2020. V. 58. P. 82–115. https://doi.org/10.1016/j.inffus.2019.12.012

52. Girshick R., Donahue J., Darrell T., Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation // Proc. of the IEEE Conference on Computer Vision and Pattern Recognition. 2014. P. 580–587. https://doi.org/10.1109/CVPR.2014.81

53. Ancona M., Ceolini E., Öztireli C., Gross M. Towards better understanding of gradient-based attribution methods for deep neural networks // arXiv. 2018. arXiv:1711.06104v4. https://doi.org/10.48550/arXiv.1711.06104

54. Rumelhart D.E., Hinton G.E., Williams R.J. Learning internal representations by error propagation // Readings in Cognitive Science: A Perspective from Psychology and Artificial Intelligence. 2013. P. 399–421.

55. Kindermans P.-J., Hooker S., Adebayo J., Alber M., Schütt K.T., Dähne S., Erhan D., Kim B. The (Un)reliability of saliency methods // Lecture Notes in Computer Science. 2019. V. 11700. P. 267–280. https://doi.org/10.1007/978-3-030-28954-6_14

56. Roscher R., Bohn B., Duarte M.F., Garcke J. Explainable machine learning for scientific insights and discoveries // arXiv. 2020. arXiv:1905.08883v3. https://doi.org/10.48550/arXiv.1905.08883


For citation:

Biryukov D.N., Dudkin A.S. Explainability and interpretability are important aspects in ensuring the security of decisions made by intelligent systems (review article). Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2025;25(3):373-386. (In Russ.) https://doi.org/10.17586/2226-1494-2025-25-3-373-386

Content is available under the Creative Commons Attribution 4.0 License.

ISSN 2226-1494 (Print)
ISSN 2500-0373 (Online)
