Excel VBA to find and mask PAN data using regex for PCI DSS compliance -
because of tools discover credit card data in file systems no more list suspicious files, tools needed mask data in files must retained.
for excel files, loads of credit card data may exist, figure macro detects credit card data in selected column/row using regex , replaces middle 6-8 digits xs useful many. sadly, i'm not guru in regex macro space.
the below works regex 3 card brands only, , works if pan in cell other data (e.g. comments fields)
the below code works, improved. improve regex make work more/all card brands , reduce false-positives including luhn algorithm check.
improvements/problems remaining :
- match card brand's pans expanded regex
- include luhn algorithm checking (fixed - idea ron)
- improve while logic (fixed stribizhev)
- even better handling of cells don't contain pans (fixed)
here's have far seems working ok amex, visa , mastercard:
sub pci_mask_card_numbers() ' written mask credit card numbers in excel files in accordance pci dss. ' highlight credit card data in excel sheet, run macro. dim strpattern string: strpattern = "([4][0-9]{3})([^a-za-z0-9_]?[0-9]{4})([^a-za-z0-9_]?[0-9]{4})([^a-za-z0-9_]?[0-9]{4})|" & _ "([5][0-9]{3})([^a-za-z0-9_]?[0-9]{4})([^a-za-z0-9_]?[0-9]{4})([^a-za-z0-9_]?[0-9]{4})|" & _ "([3][0-9]{2})([^a-za-z0-9_]?[0-9]{4})([^a-za-z0-9_]?[0-9]{4})([^a-za-z0-9_]?[0-9]{4})|" & _ "([3][0-9]{3})([^a-za-z0-9_]?[0-9]{3})([^a-za-z0-9_]?[0-9]{4})([^a-za-z0-9_]?[0-9]{4})|" & _ "([3][0-9]{3})([^a-za-z0-9_]?[0-9]{4})([^a-za-z0-9_]?[0-9]{3})([^a-za-z0-9_]?[0-9]{4})|" & _ "([3][0-9]{3})([^a-za-z0-9_]?[0-9]{4})([^a-za-z0-9_]?[0-9]{4})([^a-za-z0-9_]?[0-9]{3})|" & _ "([3][0-9]{3})([^a-za-z0-9_]?[0-9]{6})([^a-za-z0-9_]?[0-9]{5})" ' regex patterns pans above broken multiple parts (between brackets) ' such when regex matches first part of pan fit 1 of rmatch(k).submatches(#) # 0, 4, 8, 12, 16, 20 or 24. ' visa start 4 , 16 digits long. typically data entry pattern 4 groups of 4 digits ' mastercard start 5 , 16 digits long. typically data entry pattern 4 groups of 4 digits ' amex start 3 , 15 digits long. typically pattern 4-6-5, data entry seems inconsistent dim strreplace string: strreplace = "" ' dim regex new regexp ' if line used instead of next 2, ms vbs regex v5.5 needs enabled manually. next 2 lines seem within script dim regex object set regex = createobject("vbscript.regexp") dim regex new regexp dim strinput string dim myrange range dim newpan string dim aproblem string dim masked long dim problems long dim total long regex .global = true .multiline = true .ignorecase = false .pattern = strpattern ' sets regex pattern match pattern above end set myrange = selection msgbox ("the macro start masking credit card numbers identified in selected cells only. if entire columns selected, each column take 10-30 seconds complete. ditto rows.") each cell in myrange total = total + 1 ' check cell candidate holding pan, not long number if strpattern <> "" _ , cell.hasformula = false _ , left(cell.numberformat, 1) <> "$" _ , mid(cell.numberformat, 3, 1) <> "$" ' cell.numberformat = "@" strinput = cell.value ' depending on data matching regex pattern, fix if regex.test(strinput) set rmatch = regex.execute(strinput) k = 0 rmatch.count - 1 toreplace = rmatch(k).value ' if regex matched, replace pan based on regex segment select case 2 case < len(rmatch(k).submatches(0)) strreplace = rmatch(k).submatches(0) & "xxxxxxxx" & trim(rmatch(k).submatches(3)) masked = masked + 1 case < len(rmatch(k).submatches(4)) strreplace = rmatch(k).submatches(4) & "xxxxxxxx" & trim(rmatch(k).submatches(7)) masked = masked + 1 case < len(rmatch(k).submatches(8)) strreplace = rmatch(k).submatches(8) & "xxxxxxxx" & trim(rmatch(k).submatches(11)) masked = masked + 1 case < len(rmatch(k).submatches(12)) strreplace = rmatch(k).submatches(12) & "xxxxxxxx" & trim(rmatch(k).submatches(13)) masked = masked + 1 case < len(rmatch(k).submatches(16)) strreplace = rmatch(k).submatches(16) & "xxxxxxxx" & trim(rmatch(k).submatches(19)) masked = masked + 1 case < len(rmatch(k).submatches(20)) strreplace = rmatch(k).submatches(20) & "xxxxxxxx" & trim(rmatch(k).submatches(23)) masked = masked + 1 case < len(rmatch(k).submatches(24)) strreplace = rmatch(k).submatches(24) & "xxxxxxxx" & trim(rmatch(k).submatches(26)) masked = masked + 1 case else aproblem = cell.value problems = problems + 1 ' msgbox (aproblem) ' needed when curios end select if cell.value <> aproblem cell.value = replace(strinput, toreplace, strreplace) end if next k else ' adds cell value variable allow macro move past cell ' once macro trusted not loop forever, message box can removed ' msgbox ("problem. regex fail? bad data = " & aproblem) end if end if next cell ' done, tell user msgbox ("cardholder data masked" & vbcr & vbcr & "total cells highlighted (including blanks) = " & total & vbcr & "cells masked = " & masked & vbcr & "possible problem cells = " & problems & vbcr & "all other cells ignored") end sub
back vacation. here's simple vba function test luhn algorithm. argument string of digits; result boolean.
it generates checksum digit , compares digit 1 in digit string feed it.
option explicit function luhn(snum string) boolean 'modulus 10 algorithm various numbers dim x long, long, j long = len(snum) - 1 1 step -2 x = x + doublesumdigits(mid(snum, i, 1)) if > 1 x = x + mid(snum, - 1, 1) next if right(snum, 1) = (x * 9) mod 10 luhn = true else luhn = false end if end function function doublesumdigits(l long) long dim x long x = l * 2 if x > 9 x = val(left(x, 1)) + val(right(x, 1)) doublesumdigits = x end function
Comments
Post a Comment