UTF-8 Converter


The other day at work, I needed to batch convert one or two hundred files from MacRoman encoding to UTF-8. It turns out there is a command-line utility for exactly this, called iconv. I was very pleased when I found it, because it was going to save me a lot of time. Then I ran it and got confused: iconv does convert the text encoding, but it doesn’t write the result back to a file. It just prints the converted text to standard output, which by default is the terminal window. Mildly frustrated, I decided to take matters into my own hands and write a script that would take that output and put it in a file with the same name in a destination directory. This is the result:

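#!/bin/bash

# Usage: <script> <source-dir> <destination-dir>
for f in "$1"/* ; do
    o=$(basename "$f")
    # Files that file(1) already reports as Unicode are copied unchanged.
    if file "$f" | grep -q Unicode ; then
        cp "$f" "$2"
    else
        # Quoting keeps file names with spaces intact.
        iconv -f MACROMAN -t UTF-8 "$f" >"$2/$o"
    fi
done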
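
The loop above writes into a separate destination directory, and that matters: redirecting iconv's output straight back onto its own input file would truncate the file before iconv gets a chance to read it. If you really do want to convert a file in place, going through a temporary file is one way around that. Here is a minimal sketch, assuming bash and mktemp are available; convert_in_place is a hypothetical helper name of mine, not part of the script above, and the encoding names it accepts depend on your iconv build.

```shell
#!/bin/bash

# convert_in_place is a hypothetical helper (an illustration only).
# Usage: convert_in_place <from-encoding> <file>
convert_in_place() {
    local encoding="$1" file="$2" tmp
    # Create the temporary file next to the target so the final mv
    # stays on the same filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if iconv -f "$encoding" -t UTF-8 "$file" >"$tmp"; then
        # Replace the original only after a successful conversion.
        mv "$tmp" "$file"
    else
        # Leave the original untouched if iconv failed.
        rm -f "$tmp"
        return 1
    fi
}

# Example call (assumes notes.txt exists and is MacRoman-encoded):
# convert_in_place MACROMAN notes.txt
```

One caveat with this approach: the converted file ends up with the permissions of the temporary file, so you may need to restore the original mode afterwards.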

I went further and added options, a debug mode, a verbose mode, and the like, and even a man page! The syntax is:

# roman_to_utf8 [options] <input> <output>

The input and output can be either directories or individual files.

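#!/bin/bash

usage() {
    echo "Usage: $0 [-v | --verbose] [-d | --debug] [-e | --encoding <encoding>] <input> <output>"
    exit 1
}

VERBOSE=false
ENCODING=MACROMAN
DEBUG=false

# Parse leading options; stop at the first non-option argument.
while true; do
    case "$1" in
        -v | --verbose) VERBOSE=true;;
        -d | --debug) DEBUG=true;;
        -e | --encoding)
            # The usage text advertises this option, so consume its value here.
            [ -n "$2" ] || usage
            ENCODING="$2"
            shift;;
        -*) echo "Bad option $1"; usage;;
        *) break;;
    esac
    shift
done

SOURCE="$1"
DESTINATION="$2"

# Debug mode only prints the parsed settings and exits.
if [ "$DEBUG" = true ]; then
    echo "VERBOSE = $VERBOSE"
    echo "SOURCE = $SOURCE"
    echo "DESTINATION = $DESTINATION"
    echo "ENCODING = $ENCODING"
    exit
fi

if [ -z "$SOURCE" ] || [ -z "$DESTINATION" ]; then
    usage
fi

# Print a message only in verbose mode; always returns success.
log() {
    if $VERBOSE; then echo "$@"; fi
}

convert() {
    INPUT="$1"
    OUTPUT="$2"
    FILENAME=$(basename "$INPUT")
    # When the output is a directory, keep the input's file name inside it;
    # otherwise treat the output as the target file itself.
    if [ -d "$OUTPUT" ]; then
        TARGET="$OUTPUT/$FILENAME"
    else
        TARGET="$OUTPUT"
    fi
    if file "$INPUT" | grep -q Unicode ; then
        cp "$INPUT" "$TARGET" && log "Successfully copied $FILENAME"
    else
        # Report success only if iconv actually succeeded.
        iconv -s -f "$ENCODING" -t UTF-8 "$INPUT" >"$TARGET" &&
            log "Successfully converted $FILENAME"
    fi
}

if [ -d "$SOURCE" ]; then
    for INPUT in "$SOURCE"/* ; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$INPUT" ] || continue
        convert "$INPUT" "$DESTINATION"
    done
else
    convert "$SOURCE" "$DESTINATION"
fi

exit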
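
After a batch run like this, it can be reassuring to check that everything in the output directory really is well-formed UTF-8. One way to sketch that is to ask iconv to convert each file from UTF-8 to UTF-8 and look only at its exit status. This assumes your iconv exits non-zero on invalid input, which GNU iconv does; is_valid_utf8 and check_directory are hypothetical helper names, not part of the script above.

```shell
#!/bin/bash

# is_valid_utf8 is a hypothetical helper (an illustration only).
# It succeeds (exit status 0) when iconv can decode the file as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# check_directory is also hypothetical: it lists every regular file in a
# directory that fails the check and returns 1 if any file failed.
check_directory() {
    local dir="$1" f status=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if ! is_valid_utf8 "$f"; then
            echo "Not valid UTF-8: $f"
            status=1
        fi
    done
    return $status
}

# Example call (assumes ./converted holds the script's output):
# check_directory ./converted
```

Note that this only confirms the bytes are well-formed UTF-8. It cannot tell you whether the source encoding passed to the conversion was the right one.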

In any case, it worked and saved me a ton of time. To use it, copy the script into a file, make it executable with chmod +x, and run it. Enjoy!
