I found this awesome article on Bug Prediction at Google. To get a feel for its accuracy, I tried the bugspots app by Ilya Grigorik on the Rails and Git repositories. I picked these two repositories because they have a contributing guide and a large number of commit messages.
The algorithm works by parsing each commit message and determining whether it is a bug fix commit. Then, it ranks the files by the number of bug fix commits issued against each file. The file with the most bug fix commits is usually the file with the most bugs. To work reliably, the algorithm needs to be able to distinguish a bug fix commit from the rest. I picked the abovementioned repositories because they have excellent contributing guides that define a commit message standard. At Authentise, we use a commit message standard that looks like this:
Summarize changes in around 50 characters or less

More detailed explanatory text up to about 72 characters.

AE-1223 #time 1h 30m
We require all commit messages to have a long description, a task, and a time log. The format of the task and time log can vary depending on the issue tracker that you are using. In my example, I am using Atlassian Jira. I like to separate my pull request (PR) into small commits before sending it for code review. This allows the reviewer to look at each commit and approve or reject it. With this approach, I may end up with many commits that address just one bug, which inflates the bug fix count of the files involved and games the bug prediction algorithm.
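To make the problem concrete, here is a minimal sketch of the counting approach described above, not the actual bugspots code. The regular expression, the file names, and the task numbers AE-1301 and AE-1302 are assumptions made up for illustration.

```python
import re
from collections import Counter

# Assumed pattern for spotting bug fix commits; adjust it to your own
# commit message standard.
FIX_PATTERN = re.compile(r"\b(fix(es|ed)?|close[sd]?)\b", re.IGNORECASE)


def rank_hotspots(commits):
    """Rank files by the number of bug fix commits that touched them.

    `commits` is an iterable of (message, changed_files) pairs.
    """
    counts = Counter()
    for message, changed_files in commits:
        if FIX_PATTERN.search(message):
            # Count each file at most once per commit.
            counts.update(set(changed_files))
    return counts.most_common()


# Hypothetical history: one bug split into three small commits,
# and two separate bugs fixed with one commit each.
commits = [
    ("Fix null check in parser\n\nAE-1223 #time 20m", ["parser.py"]),
    ("Fix parser test fixture\n\nAE-1223 #time 10m", ["parser.py"]),
    ("Fix typo in parser error message\n\nAE-1223 #time 5m", ["parser.py"]),
    ("Fix race in uploader\n\nAE-1301 #time 1h", ["uploader.py"]),
    ("Fix retry limit in uploader\n\nAE-1302 #time 45m", ["uploader.py"]),
]

for path, count in rank_hotspots(commits):
    print(path, count)
```

In this made-up history, parser.py ends up ranked above uploader.py even though it had one bug and uploader.py had two, simply because the single fix was split into three commits.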
Improve the accuracy
I’ve changed the initial bugspots application to only parse merge messages. From the merge message, I can source the task number (AE-1223). Using the task number, I can query Jira and find out the type of the task (Task, Bug, Improvement). This improved the final results considerably.
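The merge-based variant could look roughly like the sketch below. It is an illustration under stated assumptions, not the modified bugspots code: the merge message format, the branch names, and the `ISSUE_TYPES` dictionary that stands in for the Jira query are all hypothetical. In a real setup, `issue_type_of` would call the issue tracker instead of reading a dictionary.

```python
import re
from collections import Counter

# Matches Jira-style task keys such as AE-1223.
TASK_RE = re.compile(r"\b([A-Z][A-Z0-9]*-\d+)\b")


def bug_fix_counts(merges, issue_type_of):
    """Count, per file, the merges whose task is of type "Bug".

    `merges` is an iterable of (merge_message, changed_files) pairs.
    `issue_type_of` maps a task key to its type, or None if unknown.
    """
    counts = Counter()
    for message, changed_files in merges:
        match = TASK_RE.search(message)
        if match is None:
            continue  # no task number, so the type cannot be determined
        if issue_type_of(match.group(1)) == "Bug":
            counts.update(set(changed_files))
    return counts


# Hypothetical stand-in for querying Jira.
ISSUE_TYPES = {"AE-1223": "Bug", "AE-1224": "Improvement"}

# Hypothetical merge messages and file lists.
merges = [
    ("Merge pull request #7 from authentise/AE-1223-null-check",
     ["api/models.py"]),
    ("Merge pull request #8 from authentise/AE-1224-faster-query",
     ["api/models.py", "api/views.py"]),
]

counts = bug_fix_counts(merges, ISSUE_TYPES.get)
print(counts.most_common())
```

Because each merge is counted once, however many commits the PR contains, splitting a fix into small commits no longer inflates the score.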
At Authentise, we use Slack for everyday communication and Sentry to track exceptions that affect our customers. Sentry has a neat feature that lets you resolve an exception and post a message as you resolve it. Engineers mention the Jira task in which the exception was resolved. I’ve included these Sentry reports when determining the type of a Jira task. For all unresolved exceptions, I used the source file of the exception and incremented the bug fix counter by 2 instead of 1, so that a Sentry exception weighs more than a bug commit message.
Authentise is a rewarding environment, and we use Slack with Changetip to show our appreciation and to recognize our peers. I’ve further edited the bugspots application to parse Changetip messages that include a task number, and set the weight to 3 whenever the type of the task is Bug.
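Putting the three sources together, the weighting can be sketched as follows. The weights 1, 2, and 3 come from the description above; everything else is an assumption for illustration, including the source labels, the file names, and the idea that each signal has already been resolved to a file path.

```python
from collections import Counter

# Weights as described above: bug fix commit 1, Sentry exception 2,
# Changetip message on a Bug task 3.
WEIGHTS = {"commit": 1, "sentry": 2, "changetip": 3}


def score_files(signals):
    """Sum weighted bug signals per file.

    `signals` is an iterable of (source, file_path) pairs, where
    `source` is one of the keys in WEIGHTS.
    """
    scores = Counter()
    for source, path in signals:
        scores[path] += WEIGHTS[source]
    return scores.most_common()


# Hypothetical signals, already mapped to the files they concern.
signals = [
    ("commit", "api/models.py"),
    ("sentry", "api/views.py"),
    ("changetip", "api/models.py"),
    ("commit", "api/views.py"),
]

for path, score in score_files(signals):
    print(path, score)
```

Keeping the weights in a single dictionary makes it easy to add another source or to rebalance the existing ones.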
The tool is tightly integrated with the Authentise tool stack, but I’ve open sourced a repository named Fulcrum that will make it possible for anyone to use this tool. The goal is to log in with your GitHub, Bitbucket, or GitLab account, add services that can be linked from a commit message, and put a weight on each service to generate a bug prediction report.