pygettext – Python equivalent of xgettext(1)

Many systems (Solaris, Linux, GNU) provide extensive tools that ease the internationalization of C programs. Most of these tools are independent of the programming language and can be used from within Python programs. Martin von Loewis’ work[1] helps considerably in this regard.

There’s one problem though; xgettext is the program that scans source code looking for message strings, but it groks only C (or C++). Python introduces a few wrinkles, such as dual quoting characters, triple quoted strings, and raw strings. xgettext understands none of this.

Enter pygettext, which uses Python’s standard tokenize module to scan Python source code, generating .pot files identical to what GNU xgettext[2] generates for C and C++ code. From there, the standard GNU tools can be used.
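To illustrate the idea, here is a minimal sketch of scanning Python source with the standard tokenize module. It is not pygettext's actual implementation: the function name find_marked_strings and the sample source are made up for illustration, and the sketch only recognizes calls whose sole argument is one or more adjacent string literals.

```python
import ast
import io
import tokenize


def find_marked_strings(source, keyword="_"):
    """Return (lineno, text) pairs for calls like keyword("literal").

    Hypothetical helper, not part of pygettext.
    """
    skip = {tokenize.NL, tokenize.COMMENT}
    tokens = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in skip
    ]
    found = []
    for i, tok in enumerate(tokens[:-1]):
        if not (tok.type == tokenize.NAME and tok.string == keyword):
            continue
        nxt = tokens[i + 1]
        if not (nxt.type == tokenize.OP and nxt.string == "("):
            continue
        # Collect adjacent string literals (implicit concatenation).
        j = i + 2
        parts = []
        while j < len(tokens) and tokens[j].type == tokenize.STRING:
            parts.append(tokens[j].string)
            j += 1
        if not parts or j >= len(tokens) or tokens[j].string != ")":
            continue
        try:
            # Let Python itself decode quotes, raw strings, and triple quotes.
            value = ast.literal_eval(" ".join(parts))
        except (ValueError, SyntaxError):
            continue  # e.g. f-strings or mixed bytes/str literals
        if isinstance(value, str):
            found.append((tokens[i + 2].start[0], value))
    return found


# Made-up sample input exercising the quoting styles mentioned above.
SAMPLE = r'''
print(_("Hello"))
title = _(r'\d+ items')
doc = _("""Multi
line""")
plain = "not marked"
'''

for lineno, text in find_marked_strings(SAMPLE):
    print(lineno, repr(text))
```

Because the tokenizer hands back each literal exactly as written, dual quoting characters, triple-quoted strings, and raw strings need no special handling in the scanner itself.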

A word about marking Python strings as candidates for translation. GNU xgettext recognizes the following keywords: gettext, dgettext, dcgettext, and gettext_noop. But those can be a lot of text to include all over your code. C and C++ have a trick: they use the C preprocessor. Most internationalized C source includes a #define for gettext() to _() so that what has to be written in the source is much less. Thus these are both translatable strings:

    gettext("Translatable String")
    _("Translatable String")

Python of course has no preprocessor so this doesn’t work so well. Thus, pygettext searches only for _() by default, but see the -k/--keyword flag below for how to augment this.

NOTE: pygettext attempts to be option and feature compatible with GNU xgettext wherever possible. However, some options are still missing or are not fully implemented. Also, xgettext’s use of command line switches with option arguments is broken, and in these cases, pygettext just defines additional switches.

Usage: pygettext [options] inputfile ...


-a --extract-all

Extract all strings.

-d name --default-domain=name

Rename the default output file from messages.pot to name.pot

-E --escape

Replace non-ASCII characters with octal escape sequences.

-h --help

Print this help message and exit.

-k word --keyword=word

Keywords to look for in addition to the default set, which is: _

You can have multiple -k flags on the command line.

-K --no-default-keywords

Disable the default set of keywords (see above). Any keywords explicitly added with the -k/--keyword option are still recognized.

--no-location

Do not write filename/lineno location comments.

-n --add-location

Write filename/lineno location comments indicating where each extracted string is found in the source. These lines appear before each msgid. The style of comments is controlled by the -S/--style option. This is the default.

-S stylename --style stylename

Specify which style to use for location comments. Two styles are supported:

Solaris    # File: filename, line: line-number
GNU        #: filename:line

The style name is case insensitive. GNU style is the default.

-o filename --output-file=filename

Rename the default output file from messages.pot to filename. If filename is `-' then the output is sent to standard output.

-p dir --output-dir=dir

Output files will be placed in directory dir.

-v --verbose

Print the names of the files being processed.

-V --version

Print the version of pygettext and exit.

-w columns --width=columns

Set width of output to columns.

-x filename --exclude-file=filename

Specify a file that contains a list of strings that are not to be extracted from the input files. Each string to be excluded must appear on a line by itself in the file.
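As a rough sketch of the exclude-file format described for -x, the following reads one string per line and drops matching messages. The helper names load_excluded and filter_messages are hypothetical, an in-memory io.StringIO stands in for the file, and exact whole-string matching is an assumption; the text above does not say how pygettext compares strings internally.

```python
import io


def load_excluded(lines):
    """Collect one excluded string per non-empty line (hypothetical helper)."""
    return {line.rstrip("\n") for line in lines if line.strip()}


def filter_messages(messages, excluded):
    """Keep only messages that are not listed for exclusion (exact match assumed)."""
    return [msg for msg in messages if msg not in excluded]


# Stand-in for an exclude file with two entries, each on its own line.
exclude_file = io.StringIO("OK\nCancel\n")
excluded = load_excluded(exclude_file)

print(filter_messages(["OK", "Save", "Cancel", "Open"], excluded))
```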

If `inputfile' is -, standard input is read.
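The two location comment styles listed under -S can be sketched as a small formatting function. This is only an illustration of the formats shown above, with a made-up function name and sample filename, not pygettext's own code.

```python
def location_comment(filename, lineno, style="GNU"):
    """Format a location comment in one of the two documented styles.

    Hypothetical helper; the style name is matched case-insensitively.
    """
    style = style.lower()
    if style == "gnu":
        return f"#: {filename}:{lineno}"
    if style == "solaris":
        return f"# File: {filename}, line: {lineno}"
    raise ValueError(f"unknown style: {style!r}")


print(location_comment("app.py", 12))
print(location_comment("app.py", 12, style="Solaris"))
```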

pygettext.usage(code, msg='')


class pygettext.TokenEater(options)
