

PVE: A log parsing method based on VAE using embedding vectors
Institution: 1. International College, Renmin University of China, No.58, Zhongguancun Street, Haidian District, Beijing 100872, China; 2. School of Insurance, Guangdong University of Finance, 527 Yingfu Road, Tianhe District, Guangzhou Guangdong 510521, China; 1. College of Economics and Management, Fujian Agriculture and Forestry University, Fuzhou 350002, China; 2. School of Management, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; 3. Business Administration Department, Applied College, Najran University, Najran, Saudi Arabia; 4. Shariaa, Educational and Humanities Research Center (SEHRC), Najran University, Najran, Saudi Arabia; 5. Department of Industrial & Systems Engineering, College of Engineering, Princess Nourah Bint Abdulrahman University, P.O.Box 84428, Riyadh 11671, Saudi Arabia; 6. Department of Industrial Engineering, College of Engineering in Al-Qunfudah, Umm Al-Qura University, Makkah 21955, Saudi Arabia; 1. Information Research Institute of Qilu University of Technology (Shandong Academy of Sciences), Jinan, PR China; 2. School of Management, Xi'an University of Architecture and Technology, Xi'an, PR China; 3. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, PR China; 4. University of Jinan, Jinan, PR China
Abstract: Log parsing is a critical task that converts unstructured raw logs into structured data for downstream tasks. Existing methods often rely on manual string-matching rules to extract template tokens, which limits their adaptability across log datasets. To address this issue, we propose an automated log parsing method, PVE, which leverages a Variational Auto-Encoder (VAE) to build a semi-supervised model for categorizing log tokens. Inspired by the observation that log template tokens often consist of words, we use common words and their combinations as training data to diversify the structural features of template tokens. Specifically, PVE constructs two types of embedding vectors, the sum embedding and the n-gram embedding, for each word and word combination. Training the VAE on these embeddings lets it learn the structural features of template tokens. During log parsing, PVE categorizes a token as a template token if it is similar to the training data. To improve efficiency, we determine the token type from the average similarity between the token embedding and VAE samples, rather than from the reconstruction error. Evaluations on 16 real-world log datasets show that our method achieves an average accuracy of 0.878 and outperforms the comparison methods in both parsing accuracy and adaptability.
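The decision rule described in the abstract (embed a token, then compare it with reference samples by average similarity) can be illustrated with a small sketch. This is not the authors' implementation: the hashed character n-gram embedding, the dimension `DIM`, the word list `COMMON_WORDS`, and the threshold argument are all assumptions made for illustration, and embeddings of a few common words stand in for samples drawn from a trained VAE.

```python
import math
import zlib

DIM = 64  # hypothetical embedding size, not taken from the paper


def char_ngrams(token, n=3):
    """Split a token into character n-grams, with boundary markers."""
    padded = f"<{token.lower()}>"
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_embedding(token, n=3, dim=DIM):
    """Hash each n-gram into a fixed-size count vector, then L2-normalize.

    Assumption: a simple hashing trick replaces the learned n-gram
    embedding mentioned in the abstract.
    """
    vec = [0.0] * dim
    for gram in char_ngrams(token, n):
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def average_similarity(token, samples):
    """Mean similarity between the token embedding and reference samples."""
    emb = ngram_embedding(token)
    return sum(cosine(emb, s) for s in samples) / len(samples)


def is_template_token(token, samples, threshold):
    """Label a token as a template token if it resembles the samples."""
    return average_similarity(token, samples) >= threshold


# Stand-in for VAE samples: embeddings of a few common words (hypothetical).
COMMON_WORDS = ["connection", "failed", "request", "started", "session", "closed"]
REFERENCE_SAMPLES = [ngram_embedding(w) for w in COMMON_WORDS]
```

Calling `average_similarity("timeout", REFERENCE_SAMPLES)` returns a score between 0 and 1, and `is_template_token` compares that score with a threshold that would have to be tuned per dataset. Averaging over a fixed set of samples needs only dot products, which is consistent with the efficiency argument in the abstract for avoiding a reconstruction pass.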
Keywords: Log parsing; Variational auto-encoder; Data analysis; Software engineering
This article is indexed in ScienceDirect and other databases.