php - Better regex for security and auditing? -


For personal use and for work I have written various regular expressions to find variables in PHP. The purpose of the regex is security, in particular to vet scripts and plugins. The expression is as follows:

\${1,1}[\w]+[" +"]{0,}=[" +"]{0,}['"][a-zA-Z0-9" "]+['"]+[;]{0,}

The above regular expression finds $vars and what they are set to. I use it to search entire directories and sites using Dreamweaver. Below is an example of the kind of PHP variables found by the above regex.

$var = 'sample';
$var = "sampletext"
$var="sampletext"
$$$var  = "sampletext"
$var      = "sampletext"
$var=     'sampletext';
$var = 'here sample text';
var = 'here more sample text';
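As a quick check, the pattern can be run against these samples with preg_match. This is only a minimal sketch: the `/` delimiters, the nowdoc strings and the variable names are choices made for the example, and the character range is written as a-zA-Z. The last sample, as printed here, has no leading `$`, so the pattern does not match it.

```php
<?php
// The question's pattern, wrapped in "/" delimiters (a choice made for this sketch).
$pattern = <<<'REGEX'
/\${1,1}[\w]+[" +"]{0,}=[" +"]{0,}['"][a-zA-Z0-9" "]+['"]+[;]{0,}/
REGEX;

// The sample assignments, one per line. A nowdoc keeps "$var" literal.
$source = <<<'SAMPLES'
$var = 'sample';
$var = "sampletext"
$var="sampletext"
$$$var  = "sampletext"
$var      = "sampletext"
$var=     'sampletext';
$var = 'here sample text';
var = 'here more sample text';
SAMPLES;

// Collect the matched text for every line the pattern finds something in.
$matched = [];
foreach (explode("\n", $source) as $line) {
    if (preg_match($pattern, $line, $m) === 1) {
        $matched[] = $m[0];
    }
}

print_r($matched);
```

For `$$$var`, the match starts at the last `$`, because `[\w]+` cannot consume a dollar sign.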

As you can see, there are slight variations among the above variables: some use double quotes, some single quotes, some have semicolons while others don't, and the spacing varies.

So, my question: can I simplify this regular expression? If you have other regular expressions that you use to vet and analyze code, PHP in particular, that would be nice. Thanks for taking the time to read this.

Both the regex in the question and the one in this answer match variable assignment expressions. If you are looking only for the first assignment of a variable, that complicates matters, and it is better, as @mario says, to use PHP_Parser.
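To give an idea of what a token-based approach looks like, here is a minimal sketch that uses PHP's built-in tokenizer (token_get_all) rather than the parser library mentioned above; that substitution is made only to keep the example self-contained. The input string and variable names are hypothetical, and the sketch only detects a plain `=` directly after a variable.

```php
<?php
// Hypothetical input; a real audit would read this from a file.
$code = '<?php $a = 1; $b="x;y"; echo $a;';

$tokens   = token_get_all($code);
$assigned = [];
$n        = count($tokens);

for ($i = 0; $i < $n; $i++) {
    // Multi-character tokens are arrays: [id, text, line].
    if (is_array($tokens[$i]) && $tokens[$i][0] === T_VARIABLE) {
        // Skip whitespace tokens after the variable.
        $j = $i + 1;
        while ($j < $n && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        // A plain "=" is returned as a one-character string token.
        if ($j < $n && $tokens[$j] === '=') {
            $assigned[] = $tokens[$i][1];
        }
    }
}

print_r($assigned);
```

Because the tokenizer already knows where strings begin and end, the semicolon inside "x;y" needs no special handling here.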

There are a lot of weird aspects to this regex. First, a small enumeration of the weird regex constructs:

  1. \${1,1}

    {1,1} means between one and one times. That is rather useless and can be replaced by \$.

  2. [\w]+

    Here you use a character class with only one type of character, so it is semantically equivalent to \w. But there is also a part that is wrong: the documentation says the name of a variable must start with a letter or underscore, followed by any number of letters, underscores and digits. \w covers that whole last category, digits included, so the expression $0 would be matched too. The documentation shows how to specify a variable name:

    [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*

  3. [" +"]{0,}

    Here I'm not entirely sure what you aim to do; you seem to give the regex a choice of zero or more repetitions of quotes ("), spaces ( ) and plus signs (+). If you want zero or more spacing characters, you can use \s*. The same holds for the parts after the assignment.

  4. =

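    Here you assume a variable can only be declared with a plain assignment. That's not true: PHP allows you to use the default value and, for instance, write $var += 3;. In that case $var is "initialized" to 3, since the default value is treated as 0. I agree this is bad design. Optionally you can allow compound operators with ([-+*/%.&|^]|<<|>>)?; the hyphen is placed first in the character class so that it is read as a literal and not as a range.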

  5. Spacing again; see number 3.

  6. ['"][a-zA-Z0-9" "]+['"]+[;]{0,}

    This is the hard part: parsing the PHP expression to the right of the assignment operator. It can be a constant such as a number, but it can also be a variable, a string, a function call, and so on. Function calls can be nested, as in f(1,2,g(3,'a')). With standard regexes such calls cannot be processed correctly: this is a consequence of the pumping lemma for regular languages. PHP's regex engine has an extension for balanced brackets, so in theory it can be done. In that case, however, you need to dig into the context-free grammar of PHP, which makes it harder.

    You furthermore state that some of the expressions don't end with a semicolon. The php -a interactive shell doesn't seem to like that idea much:

    $ php -a
    php > $var
    php > echo $var;
    PHP Parse error:  syntax error, unexpected 'echo' (T_ECHO) in php shell code on line 2

    You can thus use the semicolon as a way to find out where the expression terminates. For instance:

    .*?; 

    This works, but there is a problem: a semicolon can be placed inside a string as well. In that case one needs to ignore the semicolon. You can replace the dot (.) with this regex:

    ([^"']|(["'][^"]*["']))*? 

    But this again results in problems, because a quote can be escaped inside a string (as in "\""); in that case you don't want the regex to interpret the second " as the end of the string. You can solve this by making the regex a bit more complicated:

    ([^"']|(["']([^"\\]|\\.)*["']))*? 
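The difference between the three candidates for the right-hand side can be seen by running them on two small inputs. This is a minimal sketch: the `\$\w+\s*=\s*` prefix is a simplification used only here, `~` is an arbitrary delimiter choice, and firstMatch is a hypothetical helper.

```php
<?php
// Three candidate patterns for "everything up to the terminating semicolon".
$naive = <<<'REGEX'
~\$\w+\s*=\s*.*?;~
REGEX;
$quoteAware = <<<'REGEX'
~\$\w+\s*=\s*([^"']|(["'][^"]*["']))*?;~
REGEX;
$escapeAware = <<<'REGEX'
~\$\w+\s*=\s*([^"']|(["']([^"\\]|\\.)*["']))*?;~
REGEX;

// Hypothetical helper: return the first match of $pattern in $code, or null.
function firstMatch($pattern, $code)
{
    return preg_match($pattern, $code, $m) === 1 ? $m[0] : null;
}

$plain   = '$a = "x;y";';             // semicolon inside a string
$escaped = '$b = "say \"hi;\" now";'; // escaped quotes inside a string

foreach ([$naive, $quoteAware, $escapeAware] as $pattern) {
    var_dump(firstMatch($pattern, $plain), firstMatch($pattern, $escaped));
}
```

The naive pattern stops at the semicolon inside the string in both inputs, the quote-aware one handles the first input but is cut short by the escaped quote in the second, and only the escape-aware one returns both statements in full.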

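As a result, the regex reads (with the hyphen placed first in the operator class so that it is a literal and not a range):

\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*\s*([-+*/%.&|^]|<<|>>)?=\s*([^"']|(["']([^"\\]|\\.)*["']))*?;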
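A minimal sketch of how the combined pattern might be used with preg_match_all. The sample source text is hypothetical, `~` is used as the delimiter because the operator class contains `/`, and the hyphen is written first in that class so that it is read as a literal.

```php
<?php
// Combined pattern: variable name, optional compound operator, "=", then
// everything up to the first semicolon that is not inside a string.
$pattern = <<<'REGEX'
~\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*\s*([-+*/%.&|^]|<<|>>)?=\s*([^"']|(["']([^"\\]|\\.)*["']))*?;~
REGEX;

// Hypothetical source text to scan.
$source = <<<'SOURCE'
$count = 0;
$count += 3;
$title = "a;b";
$0 = 'not a variable';
echo $title;
$flags <<= 2;
SOURCE;

preg_match_all($pattern, $source, $m);

// $m[0] holds the full text of every assignment that was found.
print_r($m[0]);
```

The `$0` line is skipped because a name may not start with a digit, and `echo $title;` is skipped because no assignment operator follows the variable. Note that a comparison such as `$x == 1;` would still be picked up, since the pattern only looks for the first `=`.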

regex101 demo.

As said before, this requires each expression to end with a semicolon. Semicolons inside strings are ignored.

