Data Sets Rules

Rules are associated to practices that have an impact on some characteristic of quality. We list hereafter the rule checks that we decided to include in our data sets, with information about their associated practices and further references. The categories defined in the rule sets are mapped to the ISO 9126 decomposition of quality (analysability, changeability, stability, testability, etc.) and to development concerns (programming techniques, architecture, etc.).

Many of these rules can be linked to coding conventions published by standardisation organisms like MISRA (the Motor Industry Software Reliability Association, which C rule set is well spread) or CERT (Carnegie Mellon’s secure coding instance). They usually give a ranking on the remediation cost and the severity of the rule. There are also language-specific coding conventions, as is the case with Sun’s coding conventions for the Java programming language [171].

Percentage of  acquired practices along years for Apache Ant

In the proposed data sets, there are 39 rules from Checkstyle, 58 rules from PMD, and 21 rules from SQuORE. We define below only a subset of them, corresponding to the most common and prejudicial errors. In the data sets, conformity to rules is displayed as a number of violations of the rule for the given artefact.

SQuORE

SQuORE rules apply to C, C++ and Java code. The following families of rules are defined: fault tolerance (2 rules), analysability (7 rules), maturity (1 rule), stability (10 rules), changeability (12 rules) and testability (13 rules). The most common checks are:

  • No fall through ( R_NOFALLTHROUGH ). There shall be no fall through the next case in a switch statement. It threatens analysability of code (one may not understand where execution goes through) and stability (one needs to modify existing code to add a feature). It is related to rules Cert MSC17-C, Cert MSC18-CPP, Cert MSC20-C and Misra-C (2004) 15-2.
  • Compound if ( R_COMPOUNDIFELSE ). An if (expression) construct shall be followed by a compound statement. The else keyword shall be followed by either a compound statement, or another if statement. It is related to Misra-C (2004) rule 14.9.
  • Final else ( R_ELSEFINAL ). All if ... else if constructs shall be terminated with an else clause. This impacts changeability, because developers can identify quickly what is the default treatment, and fault tolerance because unseen cases are caught. It is related to Misra-C (2004) rule 14.10.
  • No multiple breaks ( R_SGLBRK ). For any iteration statement there shall be at most one break statement used for loop termination. It is related to Misra-C (2004) rule 14.6 and impacts the analysability (developers will have trouble understanding the control flow) and testability (more paths to test) of code.
  • No goto ( R_NOGOTO ). Gotos are considered bad practice (Misra-C (2004) rule 14.4) and may be hazardous (see CERT MSC35-CPP): they threaten the analysability of code, because one needs to scroll through the file instead of following a clear sequence of steps, and makes the test cases harder to write. Similarly, the no backward goto ( R_BWGOTO ) rule searches for goto operators that link to code that lies before the goto. Backward gotos shall not be used. They shall be rewritten using a loop instead.
  • No assignment in Boolean ( R_NOASGINBOOL ). Assignment operators shall not be used in expressions that yield a boolean value. It is related to Cert EXP45-C, Cert EXP19-CPP and Misra-C (2004) rule 13.1.
  • No assignment in expressions without comparison ( R_NOASGCOND ). Assignment operators shall not be used in expressions that do not contain comparison operators.
  • Case in switch ( R_ONECASE ). Every switch statement shall have at least one case clause. It is releated to Misra-C (2004) rule 15.5.
  • Label out of a switch ( R_NOLABEL ). A switch label shall only be used when the most closely-enclosing compound statement is the body of a switch statement. It is related to Misra-C (2004) rule 15.1.
  • Missing Default ( R_DEFAULT ). The final clause of a switch statement shall be the default clause. It is related to Cert MSC01-C, Cert MSC01-CPP and Misra-C (2004) rule 15.3.
  • Code before first case ( R_NOCODEBEFORECASE ). There shall be no code before the first case of a switch statement. It is related to Cert DCL41-C.

Checkstyle

We identified 39 elements from the Checkstyle 5.6 rule set, corresponding to useful practices generally well adopted by the community. The quality attributes impacted by these rules are: analysability (23 rules), reusability (11 rules), reliability (5 rules), efficiency (5 rules), testability (3 rules), robustness (2 rules) and portability (1 rule). All rules are described on the project page at http://checkstyle.sourceforge.net/config.html.

  • Javadoc checks ( JAVADOCMETHOD , JAVADOCPACKAGE , JAVADOCTYPE , JAVADOCVARIABLE ) ensure that Javadoc comments are present at the different levels of information as it is recommended by Sun [171].
  • Import checks: UNUSEDIMPORTS looks for declared imports that are not used in the file. They clutter space and are misleading for readers. AVOIDSTARIMPORT checks that no generic import is used; specific import statements shall be used to help the reader grasp what is inherited in the file. REDUNDANTIMPORT looks for duplicates in imports, which uselessly takes up visual space.
  • Equals Hash Code ( EQUALSHASHCODE ). A class that overrides equals() shall also override hashCode(). A caller may use both methods without knowing that one of them has not been modified to fit the behaviour intended when modifying the other one (consistency in behaviour). It impacts reusability, reliability and fault tolerance.
  • No Hard Coded Constant ( MAGICNUMBER ). Hard coded constant or magic numbers shall not be used. Magic numbers are actual numbers like 27 that appear in the code that require the reader to figure out what 27 is being used for. One should consider using named constants for any number other than 0 and 1. Using meaningful names for constants instead of using magic numbers in the code makes the code self-documenting, reducing the need for trailing comments. This rule is related to programming technics and changeability.
  • Illegal Throws ( ILLEGALTHROWS ) lists exceptions that are illegal or too generic; throwing java.lang.Error or java.lang.RuntimeException is considered to be almost never acceptable. In these cases a new exception type shall be defined to reflect the distinctive features of the throw.
  • New line at end of file ( NEWLINEATENDOFFILE ). Checks that there is a trailing newline at the end of each file. This is an ages-old convention, but many tools still complain when they find no trailing newline. Examples include diff or cat commands, and some SCM systems like CVS will print a warning when they encounter a file that does not end with a newline.
  • Anonymous Inner Length ( ANONINNERLENGTH ). Checks for long anonymous inner classes. For analysability reasons these should be defined as self-standing classes if they embed too much logic.
  • Multiple String Literals ( MULTIPLESTRINGLITERALS ) checks for multiple occurrences of the same string literal within a single file. It should be defined as a constant, both for reusability and changeability, so people can change the string at first shot without forgetting occurrences.

PMD

We selected 58 rules from the PMD 5.0.5 rule set. These are related to the following quality attributes: analysability (26 rules), maturity (31 rules), testability (13 rules), changeability (5 rules), and efficiency (5 rules). The full rule set is documented on the PMD web site: http://pmd.sourceforge.net/pmd-5.0.5/rules/.

  • Jumbled incrementer ( JUMBLEDINCREMENT ) detects when a variable used in a structure is modified in a nested structure. One shall avoid jumbled loop incrementers — it is usually a mistake, and even when it is intended it is confusing for the reader.
  • Return from finally block ( RETURNFROMFINALLYBLOCK ). One shall avoid returning from a finally block since this can discard exceptions. This rule has an effect on analysability (developers will have trouble understanding where the exception comes from) and fault tolerance (the return method in the finally block may be stopping the exception that happened in the try block from propagating up even though it is not caught). It is related to the Cert ERR04-J rule.
  • Unconditional if statements ( UNCONDITIONALIFSTATEMENT ), empty if statements ( EMPTYIFSTMT ), empty switch statements ( EMPTYSWITCHSTATEMENTS ), empty synchronized block ( EMPTYSYNCHRONIZEDBLOCK ), and empty while statements ( EMPTYWHILESTMT ) are useless and clutter code. They impact analysability – developers will spend more time trying to understand what they are for, and they may have undesirable side effects. As for the empty while statements, if it is a timing loop then Thread.sleep() is better suited; if it does a lot in the exit expression then it should be rewritten to make it clearer. All these are related to Cert MSC12-C.
  • Empty catch blocks ( EMPTYCATCHBLOCK ) are instances where an exception is caught, but nothing is done. In most circumstances an exception should either be acted on or reported, as examplified in the Cert ERR00-J rule. Empty try blocks ( EMPTYTRYBLOCK ) and empty finally blocks ( EMPTYFINALBLOCK ) serve no purpose and should be remove because they clutter the file’s analysability.
  • The God Class ( GODCLASS ) rule detects the God Class design flaw using metrics. God classes do too many things, are very big and overly complex. They should be split apart to be more object-oriented. The rule uses the detection strategy described in Object-Oriented Metrics in Practice [118].
  • Avoid Catching Throwable ( AVOIDCATCHINGTHROWABLE ). CatchingThrowable errors is not recommended since its scope is very broad. It includes runtime issues such as OutOfMemoryError that should be exposed and managed separately. It is related to the ERR07-J rule.
  • Avoid Catching NPE ( AVOIDCATCHINGNPE ) Code should never throw NullPointerExceptions under normal circumstances. A catch block for such an exception may hide the original error, causing other, more subtle problems later on. It is related to the Cert ERR08-J rule.
  • Non Thread Safe Singleton ( NONTHREADSAGESINGLETON ) Non-thread safe singletons can result in bad, unpredictable state changes. Static singletons are usually not needed as only a single instance exists anyway: they can be eliminated by instanciating the object directly. Other possible fixes are to synchronize the entire method or to use an initialize-on-demand holder class. See Effective Java [25], item 48 and Cert MSC07-J.

Bibliography

  1. Panagiotis Louridas. Static Code Analysis. IEEE Software 23(4):58–61, 2006.
    @article{Louridas2006,
    	author = "Louridas, Panagiotis",
    	journal = "IEEE Software",
    	number = 4,
    	pages = "58--61",
    	title = "{Static Code Analysis}",
    	volume = 23,
    	year = 2006
    }
    
  2. Nick Rutar, Christian B Almazan and Jeffrey S Foster. A comparison of bug finding tools for Java. In 15th International Symposium on Software Reliability Engineering. 2004, 245–256. URL
    @inproceedings{Rutar2004,
    	author = "Rutar, Nick and Almazan, Christian B and Foster, Jeffrey S",
    	booktitle = "15th International Symposium on Software Reliability Engineering",
    	pages = "245--256",
    	title = "{A comparison of bug finding tools for Java}",
    	url = "http://ieeexplore.ieee.org/xpls/abs\_all.jsp?arnumber=1383122",
    	year = 2004
    }