Lightweight approach for malicious domain detection using machine learning
https://doi.org/10.17586/2226-1494-2022-22-2-262-268
Abstract
Web-based attacks exploit the vulnerabilities of end users and their systems to perform malicious activities such as stealing sensitive information, injecting malware, and redirecting users to malicious sites without their knowledge. Malicious website links are spread through social media posts, emails, and messages. The victim can be an individual or an organization, and such attacks cause large financial losses every year. A recent Internet Security report states that 83% of systems on the Internet were infected by malware during the last 12 months because users were not aware of malicious URLs (Uniform Resource Locators) and their impact. Existing methods for detecting and preventing access to malicious domain names on the Internet fall into three categories: blacklist-based approaches, heuristic-based methods, and machine/deep learning-based methods. This study provides a lightweight machine learning-based solution for classifying malicious domain names. Most existing research focuses on increasing the number of features to improve classification accuracy. In contrast, the proposed approach uses fewer features, comprising lexical, content-based, bag-of-words, and popularity features, for malicious domain classification. The experimental results show that the proposed approach performs better than the existing one.
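To make the feature categories concrete, the following minimal Python sketch shows how a few lexical and bag-of-words style features might be computed from a domain name. It is an illustration only, not the authors' feature set: the function names, the chosen features, and the `SUSPICIOUS_WORDS` list are assumptions introduced here, and the content-based and popularity features mentioned in the abstract are not covered.

```python
import math
from collections import Counter

# Hypothetical keyword list for bag-of-words style features (assumption, not from the paper).
SUSPICIOUS_WORDS = ("login", "secure", "verify", "account", "update")


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(domain: str) -> dict:
    """Return a small, illustrative feature vector for one domain name."""
    name = domain.lower().strip(".")
    labels = name.split(".")
    features = {
        "length": len(name),                             # total characters, dots included
        "num_labels": len(labels),                       # dot-separated parts
        "num_digits": sum(ch.isdigit() for ch in name),  # digit count
        "num_hyphens": name.count("-"),                  # hyphen count
        "entropy": shannon_entropy(name),                # randomness of the character mix
    }
    # Bag-of-words style indicators: 1 if the keyword occurs anywhere in the name.
    for word in SUSPICIOUS_WORDS:
        features[f"has_{word}"] = int(word in name)
    return features


if __name__ == "__main__":
    print(lexical_features("secure-login.example.com"))
```

A feature dictionary of this kind could then be passed to any standard classifier. Keeping the vector small is what makes such an approach lightweight, at the cost of discarding signals that richer feature sets would capture.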
About the Authors
G. Pradeepa
Russian Federation
Ganesan Pradeepa — Research Scholar
Pallavaram, Chennai, 600117
R. Devi
Russian Federation
Radhakrishnan Devi — Associate Professor
Pallavaram, Chennai, 600117
Scopus Author ID: 57195412460
For citations:
Pradeepa G., Devi R. Lightweight approach for malicious domain detection using machine learning. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2022;22(2):262-268. https://doi.org/10.17586/2226-1494-2022-22-2-262-268