PVE: A log parsing method based on VAE using embedding vectors

Institution:	1. International College, Renmin University of China, No. 58, Zhongguancun Street, Haidian District, Beijing 100872, China; 2. School of Insurance, Guangdong University of Finance, 527 Yingfu Road, Tianhe District, Guangzhou, Guangdong 510521, China; 1. College of Economics and Management, Fujian Agriculture and Forestry University, Fuzhou 350002, China; 2. School of Management, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; 3. Business Administration Department, Applied College, Najran University, Najran, Saudi Arabia; 4. Shariaa, Educational and Humanities Research Center (SEHRC), Najran University, Najran, Saudi Arabia; 5. Department of Industrial & Systems Engineering, College of Engineering, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia; 6. Department of Industrial Engineering, College of Engineering in Al-Qunfudah, Umm Al-Qura University, Makkah 21955, Saudi Arabia; 1. Information Research Institute of Qilu University of Technology (Shandong Academy of Sciences), Jinan, PR China; 2. School of Management, Xi'an University of Architecture and Technology, Xi'an, PR China; 3. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, PR China; 4. University of Jinan, Jinan, PR China

Abstract:	Log parsing is a critical task that converts unstructured raw logs into structured data for downstream tasks. Existing methods often rely on manual string-matching rules to extract template tokens, which limits their adaptability across log datasets. To address this issue, we propose PVE, an automated log parsing method that uses a Variational Auto-Encoder (VAE) to build a semi-supervised model for categorizing log tokens. Motivated by the observation that log template tokens often consist of words, we use common words and their combinations as training data to diversify the structural features of template tokens. Specifically, PVE constructs two types of embedding vectors, the sum embedding and the n-gram embedding, for each word and word combination. Training the VAE on these embeddings lets it learn the structural features of template tokens. During log parsing, PVE categorizes a token as a template token if it is similar to the training data. To improve efficiency, we determine the token type from the average similarity between the token embedding and VAE samples rather than from the reconstruction error. Evaluations on 16 real-world log datasets show that our method achieves an average accuracy of 0.878 and outperforms the comparison methods in parsing accuracy and adaptability.
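
The similarity-based token decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: no VAE is trained here, and a small hand-picked word list (`SAMPLE_WORDS`) stands in for the VAE samples. The character n-gram count vector is only one plausible reading of the "n-gram embedding", and the function names and the `threshold` value are hypothetical.

```python
import math
from collections import Counter

# Assumption: a few common words stand in for samples drawn from a trained VAE.
SAMPLE_WORDS = ["connection", "failed", "user", "session", "request", "timeout"]


def ngram_embedding(token, n=3):
    """Sparse character n-gram count vector, with < and > as boundary markers."""
    padded = f"<{token.lower()}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


SAMPLES = [ngram_embedding(word) for word in SAMPLE_WORDS]


def average_similarity(token, samples=SAMPLES):
    """Mean cosine similarity between the token embedding and all samples."""
    embedding = ngram_embedding(token)
    return sum(cosine(embedding, s) for s in samples) / len(samples)


def is_template_token(token, samples=SAMPLES, threshold=0.05):
    """Treat a token as a template token if it resembles the word-like samples.

    The threshold is a hypothetical value chosen for this toy word list.
    """
    return average_similarity(token, samples) >= threshold
```

With this toy setup, a word-like token such as `connection` is classified as a template token, while a value-like token such as `192.168.0.1` shares no character trigrams with the samples and is treated as a parameter. Averaging similarities over a fixed set of samples needs only one embedding per token, which reflects the efficiency argument the abstract makes against computing a reconstruction error.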

Keywords:	Log parsing; Variational auto-encoder; Data analysis; Software engineering
This article is indexed in ScienceDirect and other databases.
