Kate and cleaning up the Byte Order Mark


Once upon a time, a BOM stole a few hours…

Over the years, the Kate editor has become my favorite editor, and even more. Thanks to constant, active development, Kate is more powerful than ever. Initially, I needed a lightweight KDE editor for my Python projects, and Kate performed great. Over time, I used Kate more and more, and now it has become my primary tool.

Recently, I noticed that my session was somehow set to add a Byte Order Mark (BOM) to UTF-8 files. The BOM is the Unicode character U+FEFF, placed at the very start of a text file to signal its byte order. UTF-8 has no byte order to signal, so in a UTF-8 file the BOM is simply the three bytes EF BB BF at the beginning. More importantly, these stray bytes can confuse Python tools such as the Mako template library or Babel. In Kate, you can check whether the BOM is enabled for a specific file by looking at Tools->Add Byte Order Mark (BOM).

As you might suspect, when you open hundreds of files and accidentally attach a BOM to each one, cleaning them up by hand (unchecking Tools->Add Byte Order Mark (BOM) file by file) is quite painful. Here is the solution. First, find all files that start with a BOM using find and awk:


find . -type f -print0 | xargs -0r awk '
/^\xEF\xBB\xBF/ {print FILENAME}
{nextfile}'
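To check a single file by hand, the test comes down to comparing its first three bytes with EF BB BF. Below is a minimal sketch of a hypothetical shell function, has_bom, that makes this comparison with head -c and od instead of relying on awk's \x regex escapes. It assumes a head that supports -c, as GNU and BSD head do.

```shell
#!/bin/sh
# has_bom FILE: exit status 0 if FILE starts with the UTF-8 BOM (EF BB BF).
# Hypothetical helper for illustration; not part of Kate or any package.
has_bom() {
    # head -c 3 keeps the first three bytes, od prints them as hex without
    # offsets, and tr drops the separating spaces and the trailing newline.
    [ "$(head -c 3 "$1" | od -A n -t x1 | tr -d ' \n')" = "efbbbf" ]
}
```

For example, `has_bom page.mako && echo "page.mako has a BOM"` prints the message only when the file starts with those three bytes. An empty or shorter file simply fails the comparison.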

And to remove the BOM from all *.mako files (this relies on GNU sed, which understands \xHH escapes and in-place editing with -i):


find . -type f -iname '*.mako' -exec sed -i '1s/^\xEF\xBB\xBF//' {} +
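The sed approach depends on GNU sed, and other sed implementations may not understand \xHH escapes. As a hedged, more portable sketch, the hypothetical function strip_bom below checks each file's first three bytes. When they are EF BB BF, it rewrites the file starting from the fourth byte with tail -c +4. It assumes head -c and mktemp are available, which is the case on common Linux and BSD systems.

```shell
#!/bin/sh
# strip_bom FILE...: remove a leading UTF-8 BOM (EF BB BF) in place.
# Hypothetical helper for illustration; files without a BOM are left alone.
strip_bom() {
    for file in "$@"; do
        # Compare the first three bytes, printed as hex by od.
        if [ "$(head -c 3 "$file" | od -A n -t x1 | tr -d ' \n')" = "efbbbf" ]; then
            tmp=$(mktemp) || return 1
            # tail -c +4 starts output at the 4th byte, skipping the BOM.
            # Copying back with cat (not mv) keeps the file's permissions.
            tail -c +4 "$file" > "$tmp" && cat "$tmp" > "$file"
            rm -f "$tmp"
        fi
    done
}
```

For example, `strip_bom templates/*.mako` cleans every .mako file in a single directory. Writing to a temporary file first avoids truncating the original before it has been read.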

And everyone lived happily ever after…