Configuring locales on Linux

(2015-12-20)

Modern desktop environments on Linux set up a locale with UTF-8 support. Users who prefer not to run a desktop environment, or who use SSH to work on remote computers, occasionally have trouble setting up their locale correctly.

On Linux, the locale is defined via the environment variable LANG and a set of environment variables starting with LC_, e.g. LC_MESSAGES.
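To see which of these variables are set in your current shell, you can filter the environment. This is a minimal sketch for bash or zsh. show_locale_env is a hypothetical helper name, and the pattern also matches GNU's LANGUAGE variable, which shows up in the locale output further down:

```shell
# show_locale_env: hypothetical helper that prints the locale-related
# variables exported in the current environment.
show_locale_env() {
  env | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+)='
}

show_locale_env || echo "no locale variables are set"

# A variable can also be overridden for a single command only:
LC_TIME=C date
```

Because these are ordinary environment variables, a prefix assignment such as LC_TIME=C affects only that one command and leaves your shell's settings unchanged.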

Your system’s available locales

With locale -a, you can get a list of available locales, e.g.:

$ locale -a
C
C.UTF-8
de_CH.utf8
en_DK.utf8
en_US.utf8
POSIX

The format of these values is [language[_territory][.codeset][@modifier]], see wikipedia:Locale.
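To make the format concrete, the following sketch splits a locale name into its four parts using only shell parameter expansion. parse_locale is a hypothetical helper written for bash or zsh. It simply strips the optional parts from the right and does not check whether the locale exists:

```shell
# parse_locale: hypothetical helper that splits a name of the form
# language[_territory][.codeset][@modifier] into its components.
parse_locale() {
  local name=$1 language territory='' codeset='' modifier=''
  # The @modifier comes last, so strip it first.
  case $name in *@*) modifier=${name#*@}; name=${name%%@*} ;; esac
  # Then the optional .codeset.
  case $name in *.*) codeset=${name#*.}; name=${name%%.*} ;; esac
  # What remains is language, optionally followed by _territory.
  case $name in *_*) territory=${name#*_} ;; esac
  language=${name%%_*}
  printf 'language=%s territory=%s codeset=%s modifier=%s\n' \
    "$language" "$territory" "$codeset" "$modifier"
}

parse_locale de_CH.UTF-8   # language=de territory=CH codeset=UTF-8 modifier=
```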

You can configure the locales that should be generated on your system in /etc/locale.gen. On Debian, sudo dpkg-reconfigure locales brings up a front-end for that configuration file, which automatically runs locale-gen(8) after you've made changes.
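If you prefer not to use the interactive front-end, you can enable an entry in /etc/locale.gen directly. On Debian, the file lists entries such as "# de_CH.UTF-8 UTF-8", and removing the leading "#" enables them. The following is a sketch under that assumption. enable_locale is a hypothetical helper, and it only uncomments a line that exactly matches the given entry:

```shell
# enable_locale: hypothetical helper that uncomments an entry in a
# locale.gen-style file (default: /etc/locale.gen, which needs root).
enable_locale() {
  local entry=$1 file=${2:-/etc/locale.gen} tmp
  tmp=$(mktemp) || return
  awk -v entry="$entry" '
    { line = $0; sub(/^#[ \t]*/, "", line) }  # candidate without leading "#"
    line == entry { print entry; next }       # exact match: print uncommented
    { print }                                 # all other lines unchanged
  ' "$file" > "$tmp" && cat "$tmp" > "$file"  # overwrite in place, keep perms
  rm -f "$tmp"
}
```

After enabling an entry as root, run sudo locale-gen so that the locale is actually generated.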

The environment variables

  1. LC_ALL overrides all variables starting with LC_ and LANG.
  2. LANG is the fallback when more specific LC_ environment variables are not set.
  3. The individual LC_ variables are documented in The Single UNIX® Specification.
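The precedence rules above can be written down as a small lookup. The following sketch is for bash or zsh. effective_locale is a hypothetical helper that resolves a category the way the list describes: LC_ALL first, then the category's own variable, then LANG. Empty values count as unset. With nothing set, it reports C, which is what glibc falls back to. It reads the exported environment (what programs actually see) and ignores GNU's LANGUAGE variable:

```shell
# effective_locale: hypothetical helper printing the locale that a
# category such as LC_TIME resolves to, according to the precedence
# LC_ALL > LC_<category> > LANG > C. Empty values are treated as unset.
effective_locale() {
  local value
  value=$(printenv LC_ALL || true)
  [ -n "$value" ] || value=$(printenv "$1" || true)
  [ -n "$value" ] || value=$(printenv LANG || true)
  # Nothing set at all: fall back to the C (POSIX) locale.
  printf '%s\n' "${value:-C}"
}

effective_locale LC_TIME
```

This also shows why setting LC_ALL is rarely a good idea: as soon as it is non-empty, every other variable is ignored.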

Based on the above, my advice is to never set LC_ALL, to set LANG to the locale you want to use, and, where needed, to override specific aspects using the relevant LC_ variables.

As an example, my personally preferred setup looks like this:

unset LC_ALL
export LANG=de_CH.UTF-8
export LC_MESSAGES=C
export LC_TIME=en_DK.UTF-8

This slightly peculiar setup sets LANG=de_CH.UTF-8 so that the default value corresponds to Switzerland, where I live.

Next, LC_MESSAGES=C specifies that I prefer program output not to be translated. For all programs relevant to me, this results in English output. Strictly speaking, though, you could come across programs whose untranslated language isn't English, so you might prefer LC_MESSAGES=en_US.UTF-8.

Finally, LC_TIME=en_DK.UTF-8 configures date/time output to use ISO 8601 formatting, i.e. YYYY-mm-dd HH:MM:SS with a 24-hour clock.
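You can compare the locale-dependent %c format with an explicitly spelled-out ISO 8601-style format string. The sketch below assumes GNU coreutils date, where -d @0 selects the Unix epoch. The %c output depends on whether en_DK.UTF-8 is installed on your system, so it is not shown:

```shell
# Locale-dependent: %c uses the date/time format of the LC_TIME locale.
LC_TIME=en_DK.UTF-8 date +%c

# Locale-independent: spell out the format explicitly instead.
date '+%Y-%m-%d %H:%M:%S'

# The same format for a fixed instant (GNU date), handy for comparisons.
TZ=UTC date -d @0 '+%Y-%m-%d %H:%M:%S'   # prints 1970-01-01 00:00:00
```

The explicit format string is useful in scripts, where you don't want output to change with the user's locale. LC_TIME is the better choice for interactive use.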

Displaying the current locale setup

By running locale without any arguments, you can see the currently effective locale configuration:

$ locale
LANG=de_CH.UTF-8
LANGUAGE=
LC_CTYPE="de_CH.UTF-8"
LC_NUMERIC="de_CH.UTF-8"
LC_TIME=en_DK.UTF-8
LC_COLLATE="de_CH.UTF-8"
LC_MONETARY="de_CH.UTF-8"
LC_MESSAGES=C
LC_PAPER="de_CH.UTF-8"
LC_NAME="de_CH.UTF-8"
LC_ADDRESS="de_CH.UTF-8"
LC_TELEPHONE="de_CH.UTF-8"
LC_MEASUREMENT="de_CH.UTF-8"
LC_IDENTIFICATION="de_CH.UTF-8"
LC_ALL=

Where to set the environment variables?

Unfortunately, there is no single configuration file in which you can set environment variables. Instead, each shell reads a slightly different set of configuration files; see wikipedia:Unix_shell#Configuration_files for an overview. If you're unsure which shell you are using, try readlink /proc/$$/exe.
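The readlink trick can be combined with a lookup of a common startup file for the detected shell. This is a sketch only. current_shell is a hypothetical helper, it is Linux-specific because it relies on /proc, and the suggested files are common choices rather than the only correct ones:

```shell
# current_shell: hypothetical helper printing the name of the running
# shell, based on the /proc/$$/exe symlink (Linux only).
current_shell() {
  basename "$(readlink /proc/$$/exe)"
}

# Suggest a startup file for the locale variables.
case "$(current_shell)" in
  zsh)  echo ~/.zshenv ;;   # read by every zsh, interactive or not
  bash) echo ~/.bashrc ;;   # interactive non-login bash; login shells read ~/.bash_profile
  *)    echo ~/.profile ;;  # read by POSIX-style login shells
esac
```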

Configuring the environment variables in the shell covers logins on the text console and via SSH, but you'll still need to set them for graphical sessions. If you're using a desktop environment such as GNOME, it configures the locale for you. If you're using a standalone window manager, set the variables in your Xsession script (typically ~/.xsession or ~/.xinitrc).

To keep the configuration centralized, I recommend you create a file that you can include from both your shell config and your Xsession:

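# Write the locale settings to a single file.
cat > ~/.my_locale_env <<'EOT'
unset LC_ALL
export LANG=de_CH.UTF-8
export LC_MESSAGES=C
export LC_TIME=en_DK.UTF-8
EOT

# Include it from zsh, which reads ~/.zshenv on every start.
echo 'source ~/.my_locale_env' >> ~/.zshenv
# Include it from the Xsession, right after its first (#!) line. Use "."
# rather than "source": a POSIX sh such as dash has no "source" builtin.
sed -i '2i . ~/.my_locale_env' ~/.xsession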

Remember to make these settings both on your local machines and on the machines you log into remotely.

Non-interactive SSH sessions

Notably, the above setup only reliably covers interactive sessions. When you run, e.g., ssh server ls /tmp, the command runs in a non-interactive, non-login shell on the server. For most shells, this means that the usual configuration files are not read. zsh's ~/.zshenv, which every zsh instance reads, is an exception.

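In order for your locale setup to apply to non-interactive SSH commands as well, ensure that your SSH client is configured with SendEnv LANG LC_* so that it sends these environment variables to the SSH server when connecting. On the server, you'll need AcceptEnv LANG LC_* in the sshd configuration. Many distributions' OpenSSH packages (Debian's, for example) include these settings by default in /etc/ssh/ssh_config and /etc/ssh/sshd_config, respectively. If that's not the case on your machine, add the line SendEnv LANG LC_* to ~/.ssh/config. Note that echo "SendEnv LANG LC_*" >> ~/.ssh/config only works if the file contains no Host or Match sections. Options that follow a Host line apply only to the hosts it matches, so place the line above the first Host line.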
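The following is a minimal sketch that puts the option at the top of the client config instead of appending it. add_sendenv is a hypothetical helper. It does nothing if the exact line is already present anywhere in the file:

```shell
# add_sendenv: hypothetical helper that prepends "SendEnv LANG LC_*" to an
# ssh client config file (default ~/.ssh/config), so that no preceding
# Host section restricts it. Does nothing if the line already exists.
add_sendenv() {
  local cfg=${1:-$HOME/.ssh/config} tmp
  mkdir -p -m 700 "$(dirname "$cfg")"
  touch "$cfg"
  grep -qxF 'SendEnv LANG LC_*' "$cfg" && return 0
  tmp=$(mktemp) || return
  { printf '%s\n' 'SendEnv LANG LC_*'; cat "$cfg"; } > "$tmp" &&
    cat "$tmp" > "$cfg"   # overwrite in place to keep the file's permissions
  rm -f "$tmp"
}
```

With OpenSSH 6.8 or later, ssh -G server | grep -i sendenv prints the configuration that would apply to that host, which is a quick way to check the result without connecting.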

To verify which variables are getting sent, run SSH with the -v flag and look for the line “debug1: Sending environment.”:

$ ssh -v localhost env
[…]
debug1: Sending environment.
debug1: Sending env LC_TIME = en_DK.UTF-8
debug1: Sending env LANG = de_CH.UTF-8
debug1: Sending env LC_MESSAGES = C
debug1: Sending command: env
[…]

Debugging locale issues

You can introspect the locale-related environment variables of any of your processes by inspecting the /proc/${PID}/environ file (where ${PID} is the process ID). Note that this file reflects the environment the process was started with, not later changes the process made to it. For example, if you use i3, this is how you verify that your window manager uses the expected configuration:

$ tr '\0' '\n' < /proc/$(pidof i3)/environ | grep -e '^\(LANG\|LC_\)'
LC_MESSAGES=C
LANG=de_CH.UTF-8
LC_TIME=en_DK.UTF-8

For Unicode text input to work, both your terminal emulator (e.g. urxvt) and the program running inside it (e.g. a shell such as zsh, a terminal multiplexer such as screen, or a chat program such as irssi) must use a locale whose codeset is UTF-8.
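On the program side, you can check the codeset from within the shell: locale charmap prints the character map of the current LC_CTYPE locale. The following is a minimal sketch. is_utf8_locale is a hypothetical helper, and it cannot check the terminal emulator's own settings:

```shell
# is_utf8_locale: hypothetical helper; succeeds if "locale charmap"
# reports UTF-8 for the current LC_CTYPE locale.
is_utf8_locale() {
  [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

is_utf8_locale || echo "warning: codeset is $(locale charmap), not UTF-8" >&2
```

A line like the last one could go into the file you include from your shell config, so that a misconfigured session is flagged right at login.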

A good end-to-end test is the following perl command:

$ perl -MPOSIX -Mlocale -e 'print strftime("%c", localtime) . "\n"'
2015-12-20T16:22:09 CET

In case your locales are misconfigured, perl will complain loudly:

$ LC_TIME=en_AU.UTF-8 perl -MPOSIX -Mlocale -e 'print strftime("%c", localtime) . "\n"'
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_TIME = "en_AU.UTF-8",
    LC_MESSAGES = "C",
    LANG = "de_CH.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("de_CH.UTF-8").
Son 20 Dez 2015 16:22:58 CET
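perl only warns when you happen to run it, and many other programs fall back silently. As an alternative, the following sketch checks every locale variable that is currently set against the output of locale -a. Both sides are normalized first, so that de_CH.UTF-8 matches the de_CH.utf8 listed by locale -a. The normalization (lower-case the codeset, drop non-alphanumeric characters) is a simplified approximation of what glibc does. normalize_locale and check_locale_env are hypothetical helpers:

```shell
# normalize_locale: lower-case the codeset and drop non-alphanumerics,
# roughly like glibc, so "de_CH.UTF-8" compares equal to "de_CH.utf8".
normalize_locale() {
  local name=$1 modifier='' codeset
  case $name in *@*) modifier="@${name#*@}"; name=${name%%@*} ;; esac
  case $name in
    *.*) codeset=$(printf '%s' "${name#*.}" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]')
         name="${name%%.*}.$codeset" ;;
  esac
  printf '%s%s\n' "$name" "$modifier"
}

# check_locale_env: report every set LANG/LC_* variable whose value is not
# an installed locale; returns 1 if any is missing.
check_locale_env() {
  local installed var value status=0
  installed=$(locale -a | while IFS= read -r l; do normalize_locale "$l"; done)
  for var in LANG LC_ALL LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY \
             LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE \
             LC_MEASUREMENT LC_IDENTIFICATION; do
    value=$(printenv "$var" || true)
    [ -n "$value" ] || continue
    if ! printf '%s\n' "$installed" | grep -qxF "$(normalize_locale "$value")"; then
      echo "$var=$value is not installed" >&2
      status=1
    fi
  done
  return $status
}
```

Unlike perl's warning, this points at the specific variable at fault, e.g. "LC_TIME=en_AU.UTF-8 is not installed" for the misconfiguration above.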