Papers
arxiv:2407.18877

Code Structure-Aware through Line-level Semantic Learning for Code Vulnerability Detection

Published on Jul 26
Authors:
,
Ge Li ,
,
,
,

Abstract

Different from the flow semantics of natural languages, programming languages are inherently rigid in structure and grammar. Existing fine-tuning methodologies for code vulnerability detection generally treat code as long text sequences, stripping away structural elements such as newlines ('/n') and whitespace. However, this approach inadvertently results in the loss of crucial structural information, diminishing the distinct characteristics of code and impairing the accuracy of vulnerability detection. To address these challenges, we propose a novel network architecture method based on pre-trained code models, which incorporates structural information awareness. We propose an enhanced code text processing workflow that retains structural elements prior to modeling. This refinement allows the model to retain and exploit line-level structural information and semantic information during the modeling process. Furthermore, we introduce a new network architecture, the Code Structure-Aware Network through Line-level Semantic Learning (CSLS), which integrates three key components: global vulnerability awareness, line-structural awareness, and sensitive-line awareness. We have conducted comprehensive experiments using vulnerability detection datasets from real-world projects. Extensive experiments were conducted on vulnerability detection datasets derived from real-world projects. The results demonstrate that our new code pre-processing flow significantly improves existing baselines (e.g., a 3\% accuracy improvement on the Devign dataset when applied to popular models such as CoderBert and UniXcoder). The proposed network architecture also demonstrates superior accuracy in detecting vulnerabilities, surpassing newly established benchmarks. These findings underscore the importance of structural information in enhancing the efficacy of code vulnerability detection models.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2407.18877 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2407.18877 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2407.18877 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.