Data Sets



This section presents several open data sets on software evolution. Open data sets matter because they make experimentation, learning and research easy. They also provide a good foundation for reproducible research.

These data sets have several distinctive features. First, they are easy to use: flat CSV files that can be imported (e.g. in R) with a single line. Second, the metrics provided include uncommon measures such as rule-checking, mailing-list and configuration management data, on top of the more classical source code measures. Finally, they provide three consistent layers of information for each version of the software: application, files, and functions.
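
As an illustration of how little code a flat CSV file needs, here is a minimal Python sketch using only the standard library. The sample content and its column names (version, sloc, commits) are invented for the example and do not come from the data sets; with a real file, the in-memory sample would be replaced by an open file handle.

```python
import csv
import io


def load_dataset(source):
    """Read a flat CSV data set into a list of dicts, one per row."""
    reader = csv.DictReader(source)
    return [dict(row) for row in reader]


# Hypothetical two-row sample standing in for an application-level file;
# the column names are assumptions made for illustration only.
sample = io.StringIO("version,sloc,commits\n1.0,1000,12\n1.1,1200,30\n")
rows = load_dataset(sample)
print(len(rows), rows[0]["version"])
```

Every value is read as a string, so numeric columns need an explicit conversion (for instance with int or float) before any computation.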

In application-level data sets, each line describes a version of the full product. Included metrics cover code, configuration management and mailing lists.
In file-level data sets, each line describes a file in the product. Included metrics cover code and configuration management.
In function-level data sets, each line describes a function in the product. Only code metrics are included.

Included Metrics

The data sets define 159 metrics: 20 from code, 16 from configuration management, 30 from change management, and 93 from rule-checking tools. See their description here.

Checked Rules

Rule-checking tools look for common anti-patterns in the code. They make it possible to follow the evolution of development practices over the years. The proposed data sets include rules from PMD (58 rules), CheckStyle (39 rules) and SQuORE (21 rules).
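
To make the idea of following a practice over time concrete, the sketch below computes, for one rule, the number of violations in each version and the change since the previous version. It assumes rows already loaded as dictionaries and ordered by version; the column name "rule_x" is a placeholder and not a rule identifier taken from the data sets.

```python
def violation_trend(rows, rule):
    """Return (version, count, change since previous version) for one rule."""
    trend = []
    previous = None
    for row in rows:
        count = int(row[rule])
        # The first version has no predecessor, so its change is None.
        delta = None if previous is None else count - previous
        trend.append((row["version"], count, delta))
        previous = count
    return trend


# Hypothetical application-level rows, ordered by version; the values
# and the "rule_x" column are invented for illustration.
rows = [
    {"version": "1.0", "rule_x": "40"},
    {"version": "1.1", "rule_x": "35"},
    {"version": "1.2", "rule_x": "37"},
]
for version, count, delta in violation_trend(rows, "rule_x"):
    print(version, count, delta)
```

A falling count across versions suggests the practice behind the rule is being adopted, although raw counts should be read against code size, which also changes between versions.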



Ant weekly: 636 apps, 654K files, 6 887K funcs


Ant releases: 23 apps, 18K files, 194K funcs

Ant Data Sets

These data sets were produced during the Maisqual project and are used to demonstrate some of the algorithms developed. They feature a hundred metrics on various versions of the Apache Ant project, in CSV format for easy use.

The Ant data set was also featured in the MSR 2014 data track, held in Hyderabad, India. You can download the article describing the data set here, and the poster used during the conference here.



JMeter weekly: 655 apps, 360K files, 3 376K funcs


JMeter releases: 23 apps, 17K files, 168K funcs

JMeter Data Sets

JMeter is another well-known project from the Apache Software Foundation, widely used for testing HTTP components. Both releases and weekly snapshots are provided.