How to add a new CPD language

First of all, thanks for the contribution!

Happily for you, adding CPD support for a new language is now easier than ever!

Pro Tip: If you wish to add a new language, there are more than 50 languages you could easily add with just an Antlr grammar.

All you need to do is follow these few steps:

  1. Create a new module for your language. You can take the Golang module as an example.
  2. Create a Tokenizer

    • For Antlr grammars, you can take the grammar from here and extend AntlrTokenizer, taking Go as an example:
       public class GoTokenizer extends AntlrTokenizer {

           @Override
           protected AntlrTokenManager getLexerForSource(SourceCode sourceCode) {
               CharStream charStream = AntlrTokenizer.getCharStreamFromSourceCode(sourceCode);
               return new AntlrTokenManager(new GolangLexer(charStream), sourceCode.getFileName());
           }
       }
    

If you’re using Antlr or JavaCC, update the pom.xml of your submodule to use the appropriate ant wrapper. See pmd-go/pom.xml and pmd-python/pom.xml for examples.

  1. Create your Language class

     public class GoLanguage extends AbstractLanguage {    
            
         public GoLanguage() {   
             super("Go", "go", new GoTokenizer(), ".go");   
         }  
     } 
    
    Pro Tip: Yes, keep looking at Go!

    You are almost there!

  2. Update the list of supported languages

    • Write the fully-qualified name of your Language class to the file src/main/resources/META-INF/services/net.sourceforge.pmd.cpd.Language

    • Update the test that asserts the list of supported languages by updating the SUPPORTED_LANGUAGES constant in BinaryDistributionIT

  3. Please don’t forget to add some tests. You can, once again, look at the Go implementation ;)

    If you read this far, I’m inclined to think you would also love to support some extra CPD configuration (ignoring imports or crazy things like that).
    If that’s the case, you came to the right place!

  4. You can add your custom properties using a Token filter
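
To make the idea of a token filter more concrete, here is a minimal, standalone sketch of what "ignore imports" could look like. It deliberately does not use PMD's classes: the class name ImportSkippingFilterSketch, the filter method, the ignoreImports flag and the assumption that an import statement ends with ";" are all hypothetical and only illustrate the technique of discarding tokens before CPD compares them.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, standalone sketch of a token filter. It does not use PMD's
 * classes; it only illustrates dropping import statements from a token stream.
 */
public class ImportSkippingFilterSketch {

    /**
     * Returns the tokens with every "import ... ;" statement removed when
     * ignoreImports is true, and an unchanged copy otherwise.
     */
    public static List<String> filter(List<String> tokens, boolean ignoreImports) {
        List<String> kept = new ArrayList<>();
        boolean insideImport = false;
        for (String token : tokens) {
            if (ignoreImports && !insideImport && token.equals("import")) {
                insideImport = true;   // start discarding at the keyword
            }
            if (!insideImport) {
                kept.add(token);
            } else if (token.equals(";")) {
                insideImport = false;  // the terminating ';' is discarded too
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("import", "foo", ";", "x", "=", "1", ";");
        System.out.println(filter(tokens, true));
    }
}
```

In a real language module the same decision (keep or discard the current token) would live in the token filter attached to your Tokenizer, and the flag would come from a CPD property rather than a method parameter.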

Testing your implementation

Add a Maven dependency on pmd-lang-test (scope test) in your pom.xml. This contains utilities to test your Tokenizer.

For simple tests, create a test class that extends CpdTextComparisonTest. That class is written in Kotlin, but you can extend it in Java as well.

To add tests, write regular JUnit @Test-annotated methods and call the method doTest with the name of the test file, without its extension.

For example, for the Dart language:


public class DartTokenizerTest extends CpdTextComparisonTest {

    /**********************************
      Implementation of the superclass
    ***********************************/


    public DartTokenizerTest() {
        super(".dart"); // the file extension for the dart language
    }

    @Override
    protected String getResourcePrefix() {
        // If your class is in                  src/test/java     /some/package
        // you need to place the test files in  src/test/resources/some/package/cpdData
        return "cpdData";
    }

    @Override
    public Tokenizer newTokenizer() {
        // Override this abstract method to return the correct tokenizer
        return new DartTokenizer();
    }

    /**************
      Test methods
    ***************/


    @Test  // don't forget the JUnit annotation
    public void testLiterals() {
        // This will look for a file named literals.dart
        // in the directory identified by getResourcePrefix,
        // tokenize it, then compare the result against a baseline
        // literals.txt file in the same directory

        // If the baseline file does not exist, it is created automatically
        doTest("literals");
    }

}
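
The baseline workflow described in the comments above (create the baseline on the first run, compare against it afterwards) can be sketched in isolation as follows. This is not the code of CpdTextComparisonTest; the class BaselineSketch, its methods and the "created" / "match" / "mismatch" results are hypothetical names chosen for illustration, and the token dumps are replaced by plain strings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, standalone sketch of a baseline ("golden file") comparison.
 * It is not the implementation used by CpdTextComparisonTest.
 */
public class BaselineSketch {

    /** Creates the baseline if it is missing, otherwise compares against it. */
    public static String check(Path baseline, String actual) throws IOException {
        if (!Files.exists(baseline)) {
            Files.write(baseline, actual.getBytes(StandardCharsets.UTF_8));
            return "created";
        }
        String expected = new String(Files.readAllBytes(baseline), StandardCharsets.UTF_8);
        return expected.equals(actual) ? "match" : "mismatch";
    }

    /** Runs three checks against a baseline in a fresh temporary directory. */
    public static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("cpdData");
            Path baseline = dir.resolve("literals.txt");
            List<String> results = new ArrayList<>();
            results.add(check(baseline, "tokens-v1")); // no baseline yet
            results.add(check(baseline, "tokens-v1")); // same output as the baseline
            results.add(check(baseline, "tokens-v2")); // output changed
            return results;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The practical consequence for your own tests: review a freshly created baseline file before committing it, because from then on it defines what the tokenizer is expected to produce.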