How to filter only printable characters in a file on Bash (Linux) or Python?

March 15, 2011

I want to make a file that includes non-printable characters contain only printable characters. I think the problem is related to ASCII control actions, but I could not find a solution, and I don't understand the meaning of .[16D (an ASCII control character?) in the following file.

Hexdump of the input file:

    00000000: 4845 4c4c 4f20 5448 4953 2049 5320 5448  HELLO THIS IS TH
    00000010: 4520 5445 5354 1b5b 3136 4420 2020 2020  E TEST.[16D
    00000020: 2020 2020 2020 2020 2020 201b 5b31 3644             .[16D
    00000030: 2020

When I cat the file in Bash, I get "HELLO ". I think this is because cat interprets the ASCII control actions by default, i.e. the two .[16D sequences.

Why do the two .[16D strings make cat print only "HELLO"? And how can I make the file contain only the printable characters, i.e. "HELLO "?

The dot that the hexdump shows in .[16D is the escape character, \x1b.
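ESC[nD is the ANSI escape code that moves the cursor back (left) n columns; it does not erase anything by itself. ESC[16D therefore moves the cursor back 16 columns, to just after "HELLO ", and the spaces that follow overwrite "THIS IS THE TEST". That explains the cat output. Note that cat itself does not interpret anything: it copies the bytes to the terminal unchanged, and it is the terminal that acts on the escape sequences.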
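To make the cursor arithmetic concrete, here is a small Python 3 sketch that replays the sample data and records the cursor column after each piece. It assumes that the only control sequence present is ESC[nD and that every other character is printable and one column wide. That holds for this file but not for files in general. trace_cursor is a hypothetical helper name, not part of any library.

```python
import re

# Split the data into runs of plain text and CSI "cursor back" sequences (ESC [ n D).
# The capturing group makes re.split keep the escape sequences in the result.
PIECES = re.compile(r'(\x1b\[\d*D)')


def trace_cursor(data):
    """Return (piece, column) pairs giving the cursor column after each piece.

    Assumption: anything that is not ESC[nD is a printable, one-column character.
    """
    col = 0
    steps = []
    for piece in PIECES.split(data):
        if not piece:
            continue
        if piece.startswith('\x1b['):
            n = int(piece[2:-1] or 1)  # a missing count means 1
            col = max(0, col - n)      # the cursor stops at the left margin
        else:
            col += len(piece)          # each printable character advances one column
        steps.append((piece, col))
    return steps


data = 'HELLO THIS IS THE TEST\x1b[16D' + ' ' * 16 + '\x1b[16D  '
for piece, col in trace_cursor(data):
    print(f'{piece!r:30} -> cursor at column {col}')
```

The first ESC[16D takes the cursor from column 22 back to column 6, just after "HELLO ", and the 16 spaces then overwrite "THIS IS THE TEST". The second ESC[16D and the two trailing spaces only rewrite blanks, so the visible line is "HELLO".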

There are various ways to remove ANSI escape codes from a file, either with Bash commands (e.g. sed, as in anubhava's answer) or with Python.
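For comparison, here is a rough Python 3 version of simply deleting the escape codes. The regular expression is my own assumption and covers only the common 7-bit CSI form. It is not the sed command from the other answer. Note what it does with this file: it removes the two ESC[16D sequences but keeps "THIS IS THE TEST", because it throws the cursor movement away instead of applying it.

```python
import re

# A common regex for 7-bit CSI sequences: ESC '[', parameter bytes (0x30-0x3F),
# intermediate bytes (0x20-0x2F), then one final byte (0x40-0x7E).
# It does not handle OSC strings, 8-bit C1 controls or other escape types.
CSI_RE = re.compile(r'\x1b\[[0-?]*[ -/]*[@-~]')


def strip_csi(text):
    """Delete CSI escape sequences from text without interpreting them."""
    return CSI_RE.sub('', text)


raw = b'HELLO THIS IS THE TEST\x1b[16D' + b' ' * 16 + b'\x1b[16D  '
cleaned = strip_csi(raw.decode('ascii'))
print(repr(cleaned.rstrip()))  # 'HELLO THIS IS THE TEST'
```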

However, in cases like this it may be better to run the file through a terminal emulator that interprets the editing control sequences it contains. The result is then what the file's author intended once those editing sequences have been applied.
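If installing a third-party module is not an option, the same idea can be approximated with the standard library for a small, explicitly chosen subset of controls. The sketch below uses a hypothetical helper, apply_line_edits. It honours only newline, carriage return, backspace, ESC[nD, ESC[nC and ESC[K, and silently drops every other control character or sequence. It is nowhere near a real VT100 emulator. It only illustrates "apply the edits, then keep what is visible".

```python
import re

# A token is either a 7-bit CSI sequence (ESC [ params intermediates final) or one character.
TOKEN = re.compile(r'\x1b\[([0-?]*)[ -/]*([@-~])|(.)', re.DOTALL)


def apply_line_edits(text):
    """Replay text on a tiny, unbounded 'screen' and return the visible lines.

    Honoured: printable characters (written over the cell under the cursor),
    newline, carriage return, backspace, ESC[nD (cursor left), ESC[nC
    (cursor right) and ESC[K / ESC[0K (erase to end of line).
    Every other control character or CSI sequence is ignored.
    """
    rows = [[]]  # one list of characters per output line
    col = 0      # cursor column on the current (last) row
    for m in TOKEN.finditer(text):
        params, final, ch = m.groups()
        row = rows[-1]
        if final is not None:
            # A missing or zero count means 1 for cursor movement.
            n = max(1, int(params)) if params.isdigit() else 1
            if final == 'D':
                col = max(0, col - n)
            elif final == 'C':
                col += n
            elif final == 'K' and params in ('', '0'):
                del row[col:]
            continue
        if ch == '\n':
            rows.append([])
            col = 0
        elif ch == '\r':
            col = 0
        elif ch == '\b':
            col = max(0, col - 1)
        elif ch.isprintable():
            if col > len(row):
                row.extend(' ' * (col - len(row)))  # fill the gap left by ESC[nC
            if col < len(row):
                row[col] = ch   # overwrite an existing cell
            else:
                row.append(ch)  # extend the line
            col += 1
    # Trailing blanks are invisible on a terminal, so strip them.
    return '\n'.join(''.join(r).rstrip() for r in rows)


data = 'HELLO THIS IS THE TEST\x1b[16D' + ' ' * 16 + '\x1b[16D  '
print(apply_line_edits(data))  # HELLO
```

On the sample data this gives the same "HELLO" that the terminal shows. On real-world captures (cursor-up moves, screen clears, wide characters) it will diverge, and that is where a proper terminal emulator is worth using.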

One way to do this in Python is to use pyte, a Python module that implements a simple VTxxx-compatible terminal emulator. You can install it using pip, and its documentation is on Read the Docs.

Here's a simple demo program that interprets the data given in the question. It's written for Python 2, but it's easy to adapt to Python 3. pyte is Unicode-aware, and its standard Stream class expects Unicode strings. This example uses a ByteStream instead, so that I can pass it a plain byte string.

    #!/usr/bin/env python

    ''' pyte VTxxx terminal emulator demo

        Interpret a byte string containing text and ANSI / VTxxx control sequences

        Code adapted from the demo script in the pyte tutorial at
        http://pyte.readthedocs.org/en/latest/tutorial.html#tutorial

        Posted to http://stackoverflow.com/a/30571342/4014959

        Written by PM 2Ring 2015.06.02
    '''

    import pyte

    #Hex dump of data
    #00000000  48 45 4c 4c 4f 20 54 48  49 53 20 49 53 20 54 48  |HELLO THIS IS TH|
    #00000010  45 20 54 45 53 54 1b 5b  31 36 44 20 20 20 20 20  |E TEST.[16D     |
    #00000020  20 20 20 20 20 20 20 20  20 20 20 1b 5b 31 36 44  |           .[16D|
    #00000030  20 20                                             |  |

    data = 'HELLO THIS IS THE TEST\x1b[16D                \x1b[16D  '

    #Create a default sized screen that tracks changed lines
    screen = pyte.DiffScreen(80, 24)
    screen.dirty.clear()
    stream = pyte.ByteStream()
    stream.attach(screen)
    stream.feed(data)

    #Get index of last line containing text
    last = max(screen.dirty)

    #Gather lines, stripping trailing whitespace
    lines = [screen.display[i].rstrip() for i in range(last + 1)]

    print '\n'.join(lines)

Output

    HELLO

Hex dump of output

    00000000  48 45 4c 4c 4f 0a                                 |HELLO.|
