Scientific and Technical Journal of Information Technologies, Mechanics and Optics

Advanced methods for knowledge injection in large language models

https://doi.org/10.17586/2226-1494-2024-24-4-588-593

Abstract

Transformer-based language models have revolutionized natural language processing (NLP), driven by advances in language modeling techniques. Current transformer architectures use attention mechanisms to model dependencies in text effectively. Studies have shown that these models encode syntactic structure and knowledge, which explains their performance on tasks involving syntactic and semantic elements. However, transformer-based models are prone to hallucination, in which the knowledge they have incorporated is not used effectively. To address this, methods are emerging to mitigate hallucination and to integrate external knowledge sources such as knowledge graphs (e.g., Freebase, WordNet, ConceptNet, ATOMIC). Knowledge graphs represent real-world knowledge as entities and the relationships between them, offering a potential injection point for improving model performance on inference tasks. Injection approaches fall into three groups: input, architectural, and output injections. Input injections modify data preprocessing, architectural injections add layers for knowledge integration, and output injections adjust the loss (error) function to correct knowledge incorporation during training. Despite ongoing research, a universal solution to hallucination remains elusive, and there is no standardized benchmark for comparing injection methods. This study investigates knowledge graphs as one way to mitigate hallucination and examines how they can be integrated into Large Language Models. Comparative experiments on the General Language Understanding Evaluation (GLUE) benchmark tasks showed that ERNIE 3.0 and XLNet outperform the other injection methods, with average scores of 91.1% and 90.1%, respectively.
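
The input and output injection points described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy knowledge graph, the naive substring matching, the `[SEP]` separator string, and every function name below are assumptions made only for illustration. Architectural injection (adding layers) is omitted because it needs a full model definition.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One knowledge-graph edge: (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str


# Toy knowledge graph, invented for this example (not taken from Freebase etc.).
KG: List[Triple] = [
    Triple("Paris", "capital_of", "France"),
    Triple("France", "located_in", "Europe"),
    Triple("Berlin", "capital_of", "Germany"),
]


def find_triples(text: str, kg: List[Triple]) -> List[Triple]:
    """Return triples whose head entity occurs in the text.

    Naive case-insensitive substring match; a real system would use
    entity linking instead.
    """
    lowered = text.lower()
    return [t for t in kg if t.head.lower() in lowered]


def verbalize(triple: Triple) -> str:
    """Turn a triple into a plain-text fact, e.g. 'Paris capital of France.'"""
    return f"{triple.head} {triple.relation.replace('_', ' ')} {triple.tail}."


def inject_into_input(text: str, kg: List[Triple], sep: str = " [SEP] ") -> str:
    """Input injection: append matching facts to the text during preprocessing."""
    facts = [verbalize(t) for t in find_triples(text, kg)]
    if not facts:
        return text
    return text + sep + " ".join(facts)


def combined_loss(task_loss: float, knowledge_loss: float, weight: float = 0.1) -> float:
    """Output injection: add a weighted knowledge term to the training loss.

    The weight value is a hypothetical hyperparameter.
    """
    return task_loss + weight * knowledge_loss


if __name__ == "__main__":
    print(inject_into_input("Where is Paris?", KG))
    print(combined_loss(1.0, 2.0, weight=0.5))
```

In this sketch the model itself is untouched: input injection only changes what the model reads, and output injection only changes what it is penalized for during training.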

About the Authors

Nikita I. Kulin
PhD Student, ITMO University
Saint Petersburg, 197101, Russian Federation

Sergey B. Muravyov
PhD, Associate Professor, ITMO University
Saint Petersburg, 197101, Russian Federation


For citations:


Kulin N.I., Muravyov S.B. Advanced methods for knowledge injection in large language models. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2024;24(4):588-593. https://doi.org/10.17586/2226-1494-2024-24-4-588-593

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2226-1494 (Print)
ISSN 2500-0373 (Online)