Using AI to reduce the burden of MISRA

Posted under Programming, Technology by James Steward

Introducing static code analysis tools into the development process can be a daunting task, especially for ongoing projects with an existing codebase. In addition to the effort of selecting a proper set of guidelines and establishing workflows, teams often struggle with productivity drops caused by the initially large number of violations they must address to claim compliance with the standard. This overwhelming number of violations can typically be grouped into clusters of findings caused by the same code constructs or originating from similar code patterns, even if they violate different guidelines. The classic approach of distributing static analysis results to developers based on authorship or project structure turns out to be ineffective.
I’m going to share the results of internal research on how artificial intelligence and machine learning can reduce the effort of achieving MISRA compliance for state-of-the-art automotive software. We explore various machine learning techniques that cluster and sequence static analysis findings to optimize remediation effort.
Static analysis (SA) is an effective method of eliminating software bugs that may affect software safety, security, and reliability. It is also widely used as an error prevention technique in the development of critical systems. For prevention, several coding standards are available in the industry that provide guidance on the safe use of programming language features and constructs. MISRA C and MISRA C++ are among the most popular coding standards for C and C++ in the development of safety-critical systems. Compliance with a coding standard such as MISRA is often a mandatory step in achieving overall compliance with functional safety standards such as ISO 26262, IEC 62304, EN 50128, and others.
However, introducing static code analysis and coding standard enforcement into the development process is a difficult task and a significant investment.  
I propose a development workflow that includes several ML-based techniques for classification and prioritization of static analysis findings, to optimize developers’ effort in achieving compliance with a coding standard such as MISRA. The experimental workflow includes the following classification and prioritization techniques:
- Part A: filtering noise by classifying findings as “to be fixed” or “to be ignored”
- Part B: matching findings to the developers best skilled to fix them, based on technical-skill profiles
- Part C: clustering findings that share the same root cause
- Part D: sequencing findings located in similar code blocks
The workflow discussed has been internally tested at Parasoft on an internal codebase. One of the metrics we selected for measuring productivity improvement was the average time spent fixing a static analysis violation. The metric was computed as the total time spent by all developers fixing violations, divided by the total number of violations that were addressed (either fixed or suppressed/deviated). The metric was also computed for each developer individually. The research and experiments were conducted without academic rigor, hence we do not share detailed results, and we acknowledge that more research is required to fully assess the gain in developers’ productivity, but the initial results are very promising.
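Expressed as code, the metric is simple arithmetic; the numbers below are invented for illustration:

```python
# Hypothetical illustration of the productivity metric described above.
team_minutes = {"alice": 540, "bob": 380}   # total time spent addressing violations
addressed = {"alice": 45, "bob": 31}        # violations fixed or suppressed/deviated

# Team-wide average time per addressed violation.
team_avg = sum(team_minutes.values()) / sum(addressed.values())

# The same metric per developer.
per_dev = {dev: team_minutes[dev] / addressed[dev] for dev in team_minutes}
print(f"team: {team_avg:.1f} min/violation; per developer: {per_dev}")
```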
Enforcing compliance with a coding standard such as MISRA requires continuous monitoring of the codebase. In an ideal scenario, teams would start with an empty codebase and not allow noncompliant code to be merged. With this approach, enforcing compliance is a simpler task, focused on controlling small increments of new code. In real life, this is never the case. Projects start with a significant amount of pre-existing code that is frequently far from standards compliance. While there are situations where teams decide to exclude pre-existing code from compliance, in most cases development teams face the challenge of eliminating noncompliance from inherited code while assuring that new code is created according to the standard.
Typically, the violation backlog contains different categories of violations. Randomly selecting which problems to fix causes a lot of inefficiency and duplicated work in the compliance process. Inefficiency in the development process distracts developers and hinders adoption of the compliance process.
The adoption of the compliance process and its effectiveness can be improved by introducing additional automated classification and prioritization techniques that use ML to post-process static analysis findings: grouping them into clusters, recommending the optimal order of addressing them, and pre-assigning problems to the individuals on the team best skilled to fix them.
Our research identified several aspects that can be used to automatically classify violations and pre-assign them to specific team members: whether a finding is noise, how well it matches a developer’s skills, whether it shares a root cause with other findings, and whether it sits in code similar to that of other findings.
Below we discuss the value of distinguishing each category, describe the classification algorithms, and propose how to use this information to optimize development workflows.
Static analysis findings that a team considers noise are inevitable in the compliance process. They can be either false positive alarms reported due to analysis inaccuracy, or legitimate issues that the team feels are acceptable and chooses to deviate from the coding standard. SA noise is a significant problem that hinders the adoption of static analysis and delays the reaction to important findings. Instead of first fixing “strategic” issues that have the potential to propagate throughout the codebase, developers process a homogeneous violation queue in which low-priority or ignorable findings are mixed with critical issues. The ability to automatically distinguish “real” problems from “noise” greatly improves developers’ productivity.
Parasoft’s static analysis solution uses ML and AI-based classification to group violations into “to be fixed” and “ignore” classes. The system uses violation metadata, such as author, rule, module, branch, and a special characteristic extracted from the parse tree, as input to a model. The model is trained either in a dedicated manual session, where a developer triages the problems, or in an automated mode, where historical data is used. Once the model is sufficiently trained, it is used to classify violations automatically. By default, the system uses a RandomForest classifier, which can be changed to other classifiers such as AdaBoost or BaggingClassifier in the solution’s configuration. The output from the classifier, with “to be fixed” violations prioritized over “ignored” ones, is passed to the next step in the violation processing chain.
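To make the approach concrete, here is a minimal sketch of such a triage classifier in Python with scikit-learn. This is our illustration rather than Parasoft’s implementation: the metadata fields mirror those listed above, the labels are hypothetical, and the parse-tree characteristic is omitted for brevity.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical triage data: metadata per violation plus the label a
# developer assigned during the manual training session.
violations = [
    {"author": "alice", "rule": "MISRAC2012-RULE_8_13", "module": "net",  "branch": "main"},
    {"author": "bob",   "rule": "MISRAC2012-RULE_10_3", "module": "core", "branch": "main"},
    {"author": "alice", "rule": "MISRAC2012-RULE_8_13", "module": "net",  "branch": "dev"},
]
labels = ["ignore", "fix", "ignore"]

# DictVectorizer one-hot encodes the categorical metadata before it
# reaches the RandomForest (AdaBoost etc. could be swapped in here).
model = make_pipeline(DictVectorizer(sparse=False),
                      RandomForestClassifier(n_estimators=100, random_state=0))
model.fit(violations, labels)

# New findings are classified automatically; "fix" items go to the front
# of the processing queue.
print(model.predict([{"author": "bob", "rule": "MISRAC2012-RULE_8_13",
                      "module": "net", "branch": "main"}]))
```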
In most development teams, individual developers have their own areas of expertise or technical profiles, usually related to their previous experience. For example, developer A may be an expert in POSIX threads programming while developer B is very well versed in UDP sockets.
At the same time, the violation backlog may contain clusters of violations located in code related to POSIX threads or UDP sockets, or otherwise connected to these subjects. Similarly, there may be coding guidelines that require specific knowledge.
We put forward a hypothesis that by observing developers as they fix problems, we can build individual profiles that reflect their skills and technical expertise. The process includes clustering violations and letting users rate problems as “like” or “dislike” while working on items randomly assigned from different clusters.
Profile building happens in the initial phase, when the AI model is trained. The approach was inspired by popular multimedia streaming platforms.
With the user technical-skill profiles ready, the system prioritizes new violations with an understanding of whether a specific problem is a good match for a given developer. The assumption, obviously, is that developers are more productive and faster when working on problems that belong to their area of expertise. The filtering algorithm is based on a matrix factorization approach, sketched below. The productivity gain of this approach is most apparent when working with large codebases and teams.
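The article does not name the exact factorization method used, so as a minimal sketch, assuming “like”/“dislike” ratings collected during the training phase, non-negative matrix factorization from scikit-learn can stand in for the recommender; all data here is invented.

```python
import numpy as np
from sklearn.decomposition import NMF

# Rows: developers; columns: violation clusters (POSIX threads, UDP
# sockets, ...). 1 = "like", 0 = "dislike" or unrated. Hypothetical data.
ratings = np.array([
    [1, 0, 1, 0],   # developer A
    [0, 1, 0, 1],   # developer B
    [1, 0, 0, 0],   # developer C
])

nmf = NMF(n_components=2, init="random", random_state=0, max_iter=500)
dev_profiles = nmf.fit_transform(ratings)   # latent skill factors per developer
cluster_topics = nmf.components_            # latent topic factors per cluster

# Reconstructed scores estimate how well each cluster matches each
# developer; new violations are routed to the best-scoring developer.
scores = dev_profiles @ cluster_topics
print(np.argmax(scores, axis=0))   # index of the best developer per cluster
```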
There are two potential problems that we anticipate, though. First, some users may get annoyed at constantly being handed the same type of problem to fix; second, this approach does not promote self-education and agility in development teams. In our local experiments we did not observe these problems, but it is clear that an unorganized distribution of static analysis findings reduces fatigue and forces developers to learn random sections of the code, broadening their expertise.
More research is required in this area, but we believe these negative effects can be mitigated by injecting a certain percentage of randomly selected violations into the developer-specific, automatically prioritized queues.
Another trait that can be used to group violations is the root cause. For a subset of rules, it is possible that several violations are generated by the same code construct. Finding clusters of violations that can be removed by eliminating a single problem in the code helps developers maximize ROI and quickly reduce the size of the violation backlog, which in turn simplifies compliance process management.
In our experimental workflow, we applied the existing functionality of Parasoft’s static analysis solution for identifying clusters of violations with the same root cause. This capability is currently available for a specific subclass of violations identified by data and control flow analysis. These findings are reported with a stack trace documenting all steps between the root cause and the location in the code where the problem surfaces. This additional stack-trace metadata simplifies comparing violations and identifying those caused by the same construct. We experimented with different approaches for deciding whether two violations share a root cause, including AI-based algorithms. In the end, a classic approach was selected, in which the violations’ stack traces are compared, with some heuristics to simplify comparing alternative paths.
In our experiment, we assumed that a cluster needs to contain more than five violations. The most numerous clusters we found in our internal codebase had several dozen findings. The prioritization engine orders the clusters by the number of violations and passes them to the next phase in the processing chain.
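As a toy illustration of the selected approach (greatly simplified; the real comparison works on full stack traces with heuristics for alternative paths), findings can be grouped by the leading frames of their traces and the clusters ordered by size:

```python
from collections import defaultdict

def root_key(trace, depth=2):
    """Use the first `depth` frames (file, line) as the root-cause key."""
    return tuple(trace[:depth])

# Hypothetical flow-analysis findings: (rule ID, stack trace).
findings = [
    ("MISRAC2012-DIR_4_1",  [("parser.c", 42), ("lex.c", 10), ("main.c", 7)]),
    ("MISRAC2012-RULE_1_3", [("parser.c", 42), ("lex.c", 10), ("io.c", 55)]),
    ("MISRAC2012-DIR_4_1",  [("util.c", 99), ("main.c", 3)]),
]

clusters = defaultdict(list)
for rule, trace in findings:
    clusters[root_key(trace)].append(rule)

# Largest clusters first: one fix at the shared root removes them all.
for key, rules in sorted(clusters.items(), key=lambda kv: -len(kv[1])):
    print(key, rules)
```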
Fixing a static analysis violation requires an understanding of the source code around the place where the problem was reported; this is critical to making a safe modification. Building that understanding and creating a mental image of the source code is sometimes a demanding and time-consuming activity. When processing many violations in a session, users have to jump from one context to another, rebuilding their understanding of different code blocks each time. This is highly inefficient.
In our workflow, we built a prototype system that analyzes the code block around a selected violation and scans the other violations, selecting those located in similar code blocks. Violations in similar code blocks are suggested as the next tasks for the developer during the working session. This allows developers to fix more than one problem with less time spent on time-consuming code analysis.
Our system was built on code2vec, an open-source, general-purpose model pre-trained on publicly available codebases. Code2vec vectorizes a given code block, converting it into a vector of a couple of hundred floating-point numbers. The conversion is performed once the developer fixes the first violation during the session. The system is then queried for other violations with similar vectors representing their source code. If there are violations in the queue whose source code vectors are close enough, they are suggested to the developer as the next tasks in the working session.
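A minimal sketch of the nearest-neighbor query, assuming each queued violation’s surrounding code block has already been embedded (for example by code2vec) into a fixed-length vector; the vectors, dimension, and threshold below are illustrative stand-ins.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def suggest_next(fixed_vec, queue, threshold=0.0, k=3):
    """Return up to k queued violation IDs whose code vectors are closest."""
    scored = sorted(((cosine(fixed_vec, vec), vid) for vid, vec in queue.items()),
                    reverse=True)
    return [vid for score, vid in scored[:k] if score >= threshold]

# Random stand-ins for real code2vec embeddings of code blocks.
rng = np.random.default_rng(0)
queue = {f"violation-{i}": rng.normal(size=384) for i in range(50)}
fixed_vec = queue.pop("violation-0")   # the violation the developer just fixed
print(suggest_next(fixed_vec, queue))
```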
To further improve the accuracy and effectiveness of the system, we experimented with an extension of the algorithm that includes violation similarity, in addition to source code similarity, when calculating the “proximity” of the next violation to suggest, as sketched below. The assumption was that two similar violations located in similar code blocks are even easier to fix in a row. This extension requires an additional input: groups of static analysis rules that are considered similar. For this purpose, we used the chapters of the MISRA C:2012 standard as groupings, such as chapter “8.6 Types” with two rules, chapter “8.7 Literals and constants” with four rules, and chapter “8.8 Declarations and definitions” with fourteen rules. In our research, rules from the same chapter of the standard were considered similar.
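A sketch of the extended proximity score: code similarity plus a bonus when the two rules fall in the same MISRA C:2012 chapter. The rule ID scheme and the 0.8/0.2 weighting are our own illustrative choices, not values from the research.

```python
def chapter(rule_id):
    # e.g. "RULE_8_6_1" -> ("8", "6"), i.e. chapter "8.6 Types"
    # (hypothetical rule ID scheme).
    return tuple(rule_id.split("_")[1:3])

def proximity(code_similarity, rule_a, rule_b, w_code=0.8, w_rule=0.2):
    rule_similarity = 1.0 if chapter(rule_a) == chapter(rule_b) else 0.0
    return w_code * code_similarity + w_rule * rule_similarity

print(proximity(0.7, "RULE_8_6_1", "RULE_8_6_2"))    # 0.76: same chapter
print(proximity(0.7, "RULE_8_6_1", "RULE_8_8_14"))   # 0.56: different chapters
```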
The results of this extension are very promising: the quality of suggestions is visibly higher compared to the initial approach, where only code block similarity is considered. More research and work are required, though, to better divide a coding standard’s rules into groups.
In the workflow that we internally tested at Parasoft, we combine the techniques described in Parts A through D into a chain for processing, classifying, and distributing violations. We use existing functionality of Parasoft’s products for the techniques described in Parts A and C, and we built a prototype solution for the techniques described in Parts B and D. Static analysis with the described experimental workflow is used on an internal C/C++ codebase containing non-safety-critical code.
The experimental workflow is fully functional, and it enables developers to work on static analysis findings that are automatically pre-assigned to them. The AI models used for classification were manually trained on an initial subset of randomly selected violations during the training phase of the workflow.
In the first step of the workflow, violations are classified as “to be fixed” or “to be ignored” according to the algorithms described in Part A. Violations classified as “to be fixed” are prioritized over those classified as “to be ignored”, to ensure that developers focus on important findings first.
In the second step of the workflow, the system groups violations into queues pre-assigned to specific team members based on their skills and experience, as described in Part B. This step ensures that developers focus on problems they handle most effectively. In our experimental classification and prioritization chain, we put this block in the second position, right after the initial step where the “noise” is filtered out. Output from the first step is developer agnostic; starting from the second step, the system maintains a separate queue of violations for each team member.
In the third step, the system groups violations in the individual developers’ queues into clusters of items that share the same root cause, as described in Part C. This step maximizes the return on investment.
In the last step of the workflow, the system observes developers’ activity as they fix problems and looks for other findings in their queues located in code similar to the code that was just fixed. This step optimizes developers’ productivity.
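Putting the four steps together, here is a high-level, runnable sketch of the chain. Every function is a deliberately simplified stand-in for the components described above, not the product’s actual API, and all data is invented.

```python
def classify_noise(findings):                    # Part A: keep only "to be fixed"
    return [f for f in findings if f["label"] == "fix"]

def assign_by_skill(findings, team):             # Part B: route to best-matching dev
    queues = {dev: [] for dev in team}
    for f in findings:
        best = max(team, key=lambda dev: team[dev].get(f["topic"], 0.0))
        queues[best].append(f)
    return queues

def cluster_by_root_cause(queue):                # Part C: keep shared roots adjacent
    return sorted(queue, key=lambda f: f["root"])

findings = [
    {"label": "fix",    "topic": "sockets", "root": "parser.c:42"},
    {"label": "ignore", "topic": "threads", "root": "util.c:7"},
    {"label": "fix",    "topic": "threads", "root": "parser.c:42"},
]
team = {"alice": {"threads": 0.9}, "bob": {"sockets": 0.8}}

queues = {dev: cluster_by_root_cause(q)
          for dev, q in assign_by_skill(classify_noise(findings), team).items()}
print(queues)   # Part D then reorders each queue live as fixes land
```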
The combined chain feeds violations into individual developers’ trays as they work on the findings. Static analysis violations are surfaced to developers via a web-based interface; we have also enabled import into the VSCode editor. In our research, we introduced a training phase for the workflow in which developers pre-train the AI models used for the techniques described in Parts A, B, and C.
We focused on assessing the productivity improvement in achieving compliance with the MISRA coding standard, using the metric described earlier in this document. We compared the metric values computed for a basic static analysis solution and for our experimental setup.
Depending on the developer, we observed a 21–28% drop in the average time required to fix or suppress a problem. The average reduction in the time required to fix a single violation across the entire team was 23%. These initial results are very promising, and the decision was made to productize the experimental setup. I plan to continue the research to precisely quantify the productivity gain.