How Healthy is Your Codebase? Introducing Biomarkers for Code

We at Empear make heavy use of CodeScene ourselves. We use the tool as part of our services. Over the past years we have analyzed hundreds of different codebases, and there are some patterns that we have seen repeated over and over again. Thus, we have started to implement support in CodeScene for auto-detecting those patterns, and we have called the feature code biomarkers. We chose that name because we wanted to avoid terms like “quality” or “maintenance effort” since they suggest an absolute truth; instead, we wanted a concept that doesn’t judge, but acts like a friendly, unbiased, and skilled team member.

Detect Your Code’s Biomarkers

In medicine, a biomarker is a measure that might indicate a particular disease or physiological state of an organism. CodeScene’s biomarkers do the same for code. Combined with our biomarker trend measures, you get a high-level summary of the state of your hotspots and the direction your code is moving in. Code biomarkers act like a virtual code reviewer that looks for patterns that might indicate problems.

Code biomarkers show the status of your hotspots at a glance.

Code biomarkers are scored from A to E where A is the best and E indicates code with severe potential problems. CodeScene also aggregates those scores into a total score for the whole project. This lets you keep track of the overall status. As an example, the next figure shows a particular codebase that has improved over the past month, indicated by the move from a D score to a C.

Code Biomarkers summary on the analysis dashboard.

I spend a lot of time reviewing code, and over the years I’ve learned to look for certain high-level patterns that are likely to indicate problematic designs. Our goal with the biomarkers concept is to automate that pattern detection. Hence, you can click on a hotspot and inspect the biomarkers in detail:

Detailed Biomarker information for a specific hotspot.

The detailed information is intended to help developers select appropriate refactoring steps. For example, in the previous figure the hotspot contains a large Brain Method, GenerateInput. A Brain Method is simply a large function with high complexity that seems to do too many things. Splitting the method into smaller, well-named methods with clear responsibilities is likely to improve the design, because the overall algorithm becomes easier to read and understand.
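To make the refactoring advice concrete, here is a minimal sketch of that kind of split. It is purely illustrative: the function names and the order-report logic are hypothetical and have nothing to do with the GenerateInput method in the figure, whose contents are not shown here.

```python
# Hypothetical example: none of these names come from the analyzed codebase.

def generate_report_before(orders):
    """A Brain Method in miniature: validation, pricing, and formatting mixed together."""
    lines = []
    total = 0.0
    for order in orders:
        if order.get("quantity", 0) <= 0 or order.get("unit_price", 0) < 0:
            continue  # skip invalid orders
        amount = order["quantity"] * order["unit_price"]
        if amount > 100:
            amount *= 0.9  # bulk discount
        total += amount
        lines.append(f"{order['name']}: {amount:.2f}")
    lines.append(f"TOTAL: {total:.2f}")
    return "\n".join(lines)


# The same behavior, split into small functions with one responsibility each.

def is_valid(order):
    """An order needs a positive quantity and a non-negative price."""
    return order.get("quantity", 0) > 0 and order.get("unit_price", 0) >= 0


def discounted_amount(order):
    """Price of the order, with a 10% discount above 100."""
    amount = order["quantity"] * order["unit_price"]
    return amount * 0.9 if amount > 100 else amount


def format_line(label, amount):
    """Render one report line."""
    return f"{label}: {amount:.2f}"


def generate_report_after(orders):
    """The overall algorithm now reads as a short sequence of named steps."""
    amounts = [(o["name"], discounted_amount(o)) for o in orders if is_valid(o)]
    lines = [format_line(name, amount) for name, amount in amounts]
    lines.append(format_line("TOTAL", sum(amount for _, amount in amounts)))
    return "\n".join(lines)
```

Both versions produce the same report. The difference is that the second one lets a reader grasp the algorithm from `generate_report_after` alone and look into the helpers only when needed.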

Biomarkers introduce short Feedback Loops

In large-scale systems, social factors tend to be at least as important as any technical issues you might have. In fact, as I wrote in my book, we often mistake organizational problems for technical issues. Hence, we have developed biomarkers that detect organizational issues that are known to correlate with unwanted properties like defects and low organizational system mastery. The next figure shows an example:

Social Biomarker indication found in a specific hotspot.

Altogether, code biomarkers fill a number of important gaps by providing feedback loops in an organization:

  • Bridge the gap between developers and non-technical stakeholders: The biomarkers help you decide when it’s time to take a step back and invest in technical improvements, versus when it’s OK to continue to add features at a high pace.
  • Get immediate feedback on improvements: Biomarker trends give you immediate and visual feedback on the investments you make in refactorings. Not only is it motivating – it also helps ensure that you’re on track.
  • Share an objective picture of your codebase: A successful project is one where everyone has a shared understanding of what the code looks like and how it evolves. CodeScene provides an additional monitor view where the biomarkers are continuously updated with the status of your ongoing work. Present the view on a TV in the office, as shown in the next figure, to create awareness of your technical debt.

Display an always up-to-date view of your biomarkers in the office.

Integrate Code Biomarkers in your Continuous Integration Pipeline

CodeScene offers integration points that let you incorporate the analysis results into your build pipeline. We have expanded that integration to also auto-detect files that seem to degrade in quality through issues introduced in the current commit or pull request. CodeScene does this by calculating the code biomarkers for the change and monitoring their trend. The next figure shows an example using CodeScene’s Jenkins plugin.

A delta analysis, triggered from Jenkins, detects degrading biomarkers.
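The idea of monitoring a trend can be sketched in a few lines. The following is an assumption-laden illustration, not CodeScene's implementation or the Jenkins plugin's API: it assumes you already have a biomarker grade (A to E) per file before and after a change, and the `degraded_files` helper is a hypothetical name.

```python
# Hypothetical sketch: not CodeScene's API. Grades run from A (best) to E (worst).
GRADE_ORDER = "ABCDE"


def degraded_files(before, after):
    """Return (path, old_grade, new_grade) for files whose grade got worse.

    `before` and `after` map file paths to grades. Files without a
    baseline grade (for example, newly added files) are skipped.
    """
    result = []
    for path, new_grade in sorted(after.items()):
        old_grade = before.get(path)
        if old_grade is None:
            continue  # no baseline to compare against
        if GRADE_ORDER.index(new_grade) > GRADE_ORDER.index(old_grade):
            result.append((path, old_grade, new_grade))
    return result


if __name__ == "__main__":
    before = {"src/parser.py": "B", "src/report.py": "C"}
    after = {"src/parser.py": "C", "src/report.py": "B", "src/new.py": "E"}
    for path, old, new in degraded_files(before, after):
        print(f"{path}: {old} -> {new}")
```

A build step could use a check like this to flag or fail a pull request when the list is non-empty. Whether to block the build or only warn is a team decision.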

Metrics must be Actionable

Biomarker scores use baseline data from thousands of codebases, and your code is scored against an industry average of similar codebases. The biomarkers concept is built on top of CodeScene’s other metrics and behavioral data. That means we only score the prioritized parts of the codebase, the parts that are most likely to impact development and maintenance costs.

Biomarkers are built on top of CodeScene's prioritized hotspots.

Hence, the biomarkers provide insights into the parts of your code that are most likely to benefit from improvements.

Get Code Biomarkers for Your own Code

The code biomarkers feature is available in the latest release of CodeScene on-prem. Contact us if you want to know more and get started with CodeScene – your code is worth it.

Adam Tornhill avatar.