UTF-8 Aware Cron Scripts

I’ve recently been having a spot of bother with UTF-8 data in a Perl script on an old Linux box.

Specifically, I have been importing data from a RESTful service that includes the name Michael Bublé. That accented e at the end of Michael’s name has been problematic.

When I run my code from the command line, it imports correctly into my system. However, when run as a cron job, it imports as Michael Bublé. The é is a multibyte character, but the script was trying to read it as separate characters and getting into a muddle.
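To see where those two stray characters come from: in UTF-8 the é is stored as two bytes (0xC3 0xA9), and if each byte is read as a character in its own right (as in ISO-8859-1) they become Ã and ©. Here is a rough sketch of that misreading, assuming iconv is installed. The function name is made up for illustration, and this mimics the symptom rather than showing what my script actually did.

```shell
# Hypothetical helper: treat the incoming bytes as ISO-8859-1 and re-encode
# them as UTF-8, which mimics reading UTF-8 data one byte per character.
misread_as_latin1() {
    iconv -f ISO-8859-1 -t UTF-8
}

# "Bublé" with the é written out as its two UTF-8 bytes (octal 303 251).
printf 'Bubl\303\251\n' | misread_as_latin1
```

On a UTF-8 terminal this prints Bublé, the same garbling the cron job produced.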

At first I assumed the service I was consuming had changed the encoding, but running via the command line showed no problems. The problem was down to a difference between the command line and cron environments.

Checking the locale using the locale command I got this on the command line…

… but when running that command as a cron job and piping the results to a file in /tmp, I got the following…
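To repeat the comparison on another machine, something like the sketch below should be close. It assumes that a cron job starts with an almost empty environment, which env -i approximates. The helper name and the file name in the commented crontab entry are arbitrary choices for illustration.

```shell
# Hypothetical helper: run `locale` with an emptied environment, which is
# roughly what a cron job sees (only a short PATH is passed through here).
cron_like_locale() {
    env -i PATH=/usr/bin:/bin locale
}

echo "--- interactive shell ---"
locale
echo "--- cron-like environment ---"
cron_like_locale

# To capture the real thing, a crontab entry along these lines would do
# (the file name is an arbitrary choice):
# * * * * * locale > /tmp/cron-locale.txt 2>&1
```

If the second listing shows an empty LANG while the first names a UTF-8 locale, the two environments differ in the same way mine did.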

Cron jobs were being executed in an environment that wasn’t UTF-8 aware. The solution was to set LANG in the /etc/environment file like this…
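As an illustration, the entry might look like the line below. The locale name en_GB.UTF-8 is an assumption; use whichever UTF-8 locale `locale -a` reports as installed on the box.

```shell
# /etc/environment
# The locale name is an assumption; pick one that `locale -a` lists.
LANG=en_GB.UTF-8
```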

… then restart the cron daemon using
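The exact restart command depends on the distribution and init system. As a rough sketch, assuming an older SysV-style box where the init script is called either cron (Debian-style) or crond (Red Hat-style), a small helper could pick whichever exists. The function name and its optional directory argument are hypothetical.

```shell
# Hypothetical helper: restart cron via whichever SysV init script exists.
# The optional first argument overrides the init directory (default /etc/init.d).
# Needs to be run as root to actually restart the daemon.
restart_cron() {
    init_dir=${1:-/etc/init.d}
    for name in cron crond; do
        if [ -x "$init_dir/$name" ]; then
            "$init_dir/$name" restart
            return $?
        fi
    done
    echo "no cron init script found in $init_dir" >&2
    return 1
}
```

Calling `restart_cron` as root then restarts the daemon, or reports that no init script was found.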

Now my scripts import multibyte UTF-8 data correctly whether they are run on the command line or as a cron job.

The /etc/environment file is used to set variables that specify the basic environment for all processes, so it should be the best place to set the LANG variable.
