java - Match a single scenario with ANTLR and skip everything else as noise

May 15, 2011

I defined a simple grammar using the ANTLR v4 Eclipse plugin. I want to parse a file that contains ColdFusion cfscript code and find every instance of a property definition. For example:

property name="producttypeid" ormtype="string" length="32" fieldtype="id" generator="uuid" unsavedvalue="" default="";

That is, the property keyword followed by any number of attributes, with the line terminated by a semicolon.

The .g4 file:

grammar cfproperty;

property : 'property ' (ATR '=' STRING)+ EOL; // match the keyword property followed by attribute definitions

ATR    : [a-zA-Z]+;          // match lower- and upper-case identifier names
STRING : '"' .*? '"';        // match any string
WS     : [ \t\r\n]+ -> skip; // skip spaces, tabs, newlines
EOL    : ';';                // end of the property line

I put together a simple Java project that uses the generated parser, tree walker, etc. to print out the occurrences of matches.

The input I'm testing with is:

"property id=\"actionid\" name=\"actionname\" attr=\"actionattr\"    hbmethod=\"hbmethod\"; public function funtion  {//some text} property name=\"actionid\" name=\"actionname\" attr=\"actionattr\" hbmethod=\"hbmethod\"; \n more noise "

My issue is that it only matches:

property id="actionid" name="actionname" attr="actionattr"    hbmethod="hbmethod";

and because it doesn't understand that everything else is noise, it doesn't match the second instance of the property definition.

How can I match multiple instances of the property definition and have everything in between skipped as noise?

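You can use lexer modes to do what you want: one mode for the property and its related tokens, and one mode for the noise. The idea behind modes is that the lexer moves from one mode (a state) to another depending on the tokens it finds during lexing.

To do this, you have to cut your grammar into two files, with the parser in one file and the lexer in the other.

Here is the lexer part (named testlexer.g4 in my case):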

lexer grammar testlexer;

// normal mode

PROPERTY : 'property';
EQUALS   : '=';
ATR      : [a-zA-Z]+;                  // match lower- and upper-case identifier names
STRING   : '"' .*? '"';                // match any string
WS       : [ \t\r\n]+ -> skip;         // skip spaces, tabs, newlines
EOL      : ';' -> pushMode(NOISE);     // when ';' is found, go to noise mode, where everything is skipped

mode NOISE;

NOISE_PROPERTY : 'property' -> type(PROPERTY), popMode; // when 'property' is found, emit it as a PROPERTY token and go back to normal mode
ALL            : .+? -> skip;                           // skip everything else

Here is the parser part (named test.g4 in my case):

grammar test;

options { tokenVocab=testlexer; }

root     : property+;
property : PROPERTY (ATR EQUALS STRING)+ EOL; // match the keyword property followed by attribute definitions

This should work :)
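To see the mode-switching idea in isolation, here is a minimal hand-written sketch in plain Java. It is not ANTLR-generated code and does not use the ANTLR runtime; the class name ModeSketch and the method scan are made up for illustration. It only mimics the two modes of the lexer grammar above: in normal mode it produces tokens, a ';' switches it to noise mode, and in noise mode it drops characters until it sees the word property again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a two-mode scanner; not part of ANTLR.
class ModeSketch {
    enum Mode { NORMAL, NOISE }

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Returns the text of each token kept; whitespace and noise are dropped. */
    static List<String> scan(String in) {
        List<String> tokens = new ArrayList<>();
        Mode mode = Mode.NORMAL;
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (mode == Mode.NOISE) {
                // Noise mode: ignore everything until 'property' shows up.
                if (in.startsWith("property", i)) {
                    tokens.add("property");
                    i += "property".length();
                    mode = Mode.NORMAL;          // like popMode
                } else {
                    i++;                         // like '.+? -> skip'
                }
            } else if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++;                             // like 'WS -> skip'
            } else if (c == ';') {
                tokens.add(";");
                i++;
                mode = Mode.NOISE;               // like pushMode(NOISE)
            } else if (c == '=') {
                tokens.add("=");
                i++;
            } else if (c == '"') {
                // Quoted string: read up to the next double quote.
                int end = in.indexOf('"', i + 1);
                if (end < 0) end = in.length() - 1; // unterminated: take the rest
                tokens.add(in.substring(i, end + 1));
                i = end + 1;
            } else if (isAsciiLetter(c)) {
                // Identifier or the keyword 'property': a run of ASCII letters.
                int j = i;
                while (j < in.length() && isAsciiLetter(in.charAt(j))) j++;
                tokens.add(in.substring(i, j));
                i = j;
            } else {
                i++; // simplification: unknown characters in normal mode are dropped
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "property id=\"a\" name=\"b\"; public function f {//text} "
                     + "property name=\"c\"; \n more noise ";
        System.out.println(scan(input));
    }
}
```

Run on the test input from the question, this sketch keeps the tokens of both property definitions and none of the text between them. One caveat carries over to the grammar as well: the scanner starts in normal mode, so noise that appears before the first property keyword is not skipped.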
