Source Code Analytics

Oleksandr Panchenko, Hasso Plattner, and Alexander Zeier

Keywords

reverse engineering, business analytics, source code analytics, source code repository, performance engineering

Abstract

To perform reverse engineering tasks, software engineers often have to analyze a number of atomic facts extracted from source code. Examples of such atomic facts are occurrences of certain patterns in code, software product metrics, and dependencies between software components. Each fact typically has several characteristics, such as its type, the location in the code where it was found, and some attributes. The analysis of large software systems in particular requires the ability to process a large number of such facts efficiently. To manage them, software engineers typically select a subset of the facts based on one or several characteristics and aggregate that subset based on one or several other characteristics. This paper shows that the data structure used to represent these facts, and the way software engineers work with the data, are similar to those found in business analytics systems. It proposes a new approach, based on the principles of business analytics systems, to support reverse engineering tasks. Several possible application scenarios are discussed and illustrated with real examples from a large reverse engineering project.
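
The select-then-aggregate workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Fact class, its field names, the component names, and the sample data are all hypothetical and chosen only to show how facts might be filtered on one characteristic (type) and aggregated on another (location).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One atomic fact extracted from source code (hypothetical schema)."""
    fact_type: str   # e.g. "pattern", "metric", "dependency"
    location: str    # where the fact was found (illustrative component name)
    value: float     # a single illustrative attribute


# Made-up sample data, for illustration only.
FACTS = [
    Fact("pattern", "billing", 1.0),
    Fact("pattern", "billing", 1.0),
    Fact("pattern", "orders", 1.0),
    Fact("metric", "billing", 12.0),
    Fact("dependency", "orders", 1.0),
]


def count_by_location(facts, fact_type):
    """Select facts of one type, then count them per location."""
    selected = (f for f in facts if f.fact_type == fact_type)
    return Counter(f.location for f in selected)


pattern_counts = count_by_location(FACTS, "pattern")
print(dict(pattern_counts))  # {'billing': 2, 'orders': 1}
```

The same shape (filter on some characteristics, group and aggregate on others) is what an analytical query in a business analytics system would express, which is the similarity the abstract points to.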
