<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html401/loose.dtd">
<html>
<!-- This file documents the GNU C library.
This is Edition 0.13, last updated 2011-07-19,
of The GNU C Library Reference Manual, for version
2.14 (Ubuntu EGLIBC 2.15-0ubuntu10) .
Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002,
2003, 2007, 2008, 2010, 2011 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "Free Software Needs Free Documentation"
and "GNU Lesser General Public License", the Front-Cover texts being
"A GNU Manual", and with the Back-Cover Texts as in (a) below. A
copy of the license is included in the section entitled "GNU Free
Documentation License".
(a) The FSF's Back-Cover Text is: "You have the freedom to
copy and modify this GNU manual. Buying copies from the FSF
supports it in developing GNU and promoting software freedom."
-->
<!-- Created on April 20, 2012 by texi2html 1.82
texi2html was written by:
Lionel Cons <Lionel.Cons@cern.ch> (original author)
Karl Berry <karl@freefriends.org>
Olaf Bachmann <obachman@mathematik.uni-kl.de>
and many others.
Maintained by: Many creative people.
Send bugs and suggestions to <texi2html-bug@nongnu.org>
-->
<head>
<title>The GNU C Library: 8. Message Translation</title>
<meta name="description" content="The GNU C Library: 8. Message Translation">
<meta name="keywords" content="The GNU C Library: 8. Message Translation">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="texi2html 1.82">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
pre.display {font-family: serif}
pre.format {font-family: serif}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: serif; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: serif; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.roman {font-family:serif; font-weight:normal;}
span.sansserif {font-family:sans-serif; font-weight:normal;}
ul.toc {list-style: none}
-->
</style>
</head>
<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Message-Translation"></a>
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="libc_7.html#Yes_002dor_002dNo-Questions" title="Previous section in reading order"> < </a>]</td>
<td valign="middle" align="left">[<a href="#Message-catalogs-a-la-X_002fOpen" title="Next section in reading order"> > </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libc_7.html#Locales" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="libc.html#Top" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="libc_9.html#Searching-and-Sorting" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libc.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libc_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libc_42.html#Concept-Index" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libc_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<a name="Message-Translation-1"></a>
<h1 class="chapter">8. Message Translation</h1>
<p>A program’s interface with its users should be designed to make their
task easier. One way to do this is to print messages in whatever
language the user prefers.
</p>
<p>Printing messages in different languages can be implemented in different
ways. One could put all the different languages in the source code and
choose among the variants every time a message has to be printed. This is
certainly not a good solution since extending the set of languages is
difficult (the code must be changed) and the code itself can become
really big with dozens of message sets.
</p>
<p>A better solution is to keep the message sets for each language in
separate files which are loaded at runtime depending on the language
selection of the user.
</p>
<p>The GNU C Library provides two different sets of functions to support
message translation. The problem is that neither of the interfaces is
officially defined by the POSIX standard. The <code>catgets</code> family of
functions is defined in the X/Open standard, but that standard reflects
industry decisions and is therefore not necessarily based on sound
design.
</p>
<p>As mentioned above, the message catalog handling provides easy
extensibility by using external data files which contain the message
translations. That is, these files contain, for each of the messages used
in the program, a translation for the appropriate language. So the tasks
of the message handling functions are to
</p>
<ul>
<li>
locate the external data file with the appropriate translations.
</li><li>
load the data and make it possible to address the messages
</li><li>
map a given key to the translated message
</li></ul>
<p>The two approaches mainly differ in the implementation of this last
step. The design decisions made for this step influence everything else.
</p>
<table class="menu" border="0" cellspacing="0">
<tr><td align="left" valign="top"><a href="#Message-catalogs-a-la-X_002fOpen">8.1 X/Open Message Catalog Handling</a></td><td> </td><td align="left" valign="top"> The <code>catgets</code> family of functions.
</td></tr>
<tr><td align="left" valign="top"><a href="#The-Uniforum-approach">8.2 The Uniforum approach to Message Translation</a></td><td> </td><td align="left" valign="top"> The <code>gettext</code> family of functions.
</td></tr>
</table>
<hr size="6">
<a name="Message-catalogs-a-la-X_002fOpen"></a>
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="#Message-Translation" title="Previous section in reading order"> < </a>]</td>
<td valign="middle" align="left">[<a href="#The-catgets-Functions" title="Next section in reading order"> > </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="#Message-Translation" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="#Message-Translation" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="libc_9.html#Searching-and-Sorting" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libc.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libc_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libc_42.html#Concept-Index" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libc_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<a name="X_002fOpen-Message-Catalog-Handling"></a>
<h2 class="section">8.1 X/Open Message Catalog Handling</h2>
<p>The <code>catgets</code> functions are based on the simple scheme:
</p>
<blockquote><p>Associate every message to translate in the source code with a unique
identifier. To retrieve a message from a catalog file solely the
identifier is used.
</p></blockquote>
<p>This means for the author of the program that s/he will have to make
sure the meaning of the identifier in the program code and in the
message catalogs is always the same.
</p>
<p>Before a message can be translated the catalog file must be located.
The user of the program must be able to guide the responsible function
to find whatever catalog the user wants. This choice is independent of
what the programmer had in mind.
</p>
<p>All the types, constants and functions for the <code>catgets</code> functions
are defined/declared in the ‘<tt>nl_types.h</tt>’ header file.
</p>
<table class="menu" border="0" cellspacing="0">
<tr><td align="left" valign="top"><a href="#The-catgets-Functions">8.1.1 The <code>catgets</code> function family</a></td><td> </td><td align="left" valign="top"></td></tr>
<tr><td align="left" valign="top"><a href="#The-message-catalog-files">8.1.2 Format of the message catalog files</a></td><td> </td><td align="left" valign="top"></td></tr>
<tr><td align="left" valign="top"><a href="#The-gencat-program">8.1.3 Generate Message Catalogs files</a></td><td> </td><td align="left" valign="top"> How to generate message catalogs files which
can be used by the functions.
</td></tr>
<tr><td align="left" valign="top"><a href="#Common-Usage">8.1.4 How to use the <code>catgets</code> interface</a></td><td> </td><td align="left" valign="top"></td></tr>
</table>
<hr size="6">
<a name="The-catgets-Functions"></a>
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="#Message-catalogs-a-la-X_002fOpen" title="Previous section in reading order"> < </a>]</td>
<td valign="middle" align="left">[<a href="#The-message-catalog-files" title="Next section in reading order"> > </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="#Message-Translation" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="#Message-catalogs-a-la-X_002fOpen" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="libc_9.html#Searching-and-Sorting" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libc.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libc_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libc_42.html#Concept-Index" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libc_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<a name="The-catgets-function-family"></a>
<h3 class="subsection">8.1.1 The <code>catgets</code> function family</h3>
<dl>
<dt><a name="index-catopen"></a><u>Function:</u> nl_catd <b>catopen</b><i> (const char *<var>cat_name</var>, int <var>flag</var>)</i></dt>
<dd><p>The <code>catopen</code> function tries to locate the message data file named
<var>cat_name</var> and loads it when found. The return value is of an
opaque type and can be used in calls to the other functions to refer to
this loaded catalog.
</p>
<p>The return value is <code>(nl_catd) -1</code> in case the function failed and
no catalog was loaded. The global variable <var>errno</var> contains a code
for the error causing the failure. But even if the function call
succeeded this does not mean that all messages can be translated.
</p>
<p>Locating the catalog file must happen in a way which lets the user of
the program influence the decision. It is up to the user to decide
about the language to use and sometimes it is useful to use alternate
catalog files. All this can be specified by the user by setting some
environment variables.
</p>
<p>The first problem is to find out where all the message catalogs are
stored. Every program could have its own place to keep all the
different files but usually the catalog files are grouped by languages
and the catalogs for all programs are kept in the same place.
</p>
<a name="index-NLSPATH-environment-variable"></a>
<p>To tell the <code>catopen</code> function where the catalog for the program
can be found the user can set the environment variable <code>NLSPATH</code> to
a value which describes her/his choice. Since this value must be usable
for different languages and locales it cannot be a simple string.
Instead it is a format string (similar to <code>printf</code>’s). An example
is
</p>
<table><tr><td> </td><td><pre class="smallexample">/usr/share/locale/%L/%N:/usr/share/locale/%L/LC_MESSAGES/%N
</pre></td></tr></table>
<p>First one can see that more than one directory can be specified (with
the usual syntax of separating them by colons). The next things to
observe are the format elements, <code>%L</code> and <code>%N</code> in this case.
The <code>catopen</code> function knows about several of them and each is
replaced by a different value.
</p>
<dl compact="compact">
<dt> <code>%N</code></dt>
<dd><p>This format element is substituted with the name of the catalog file.
This is the value of the <var>cat_name</var> argument given to
<code>catopen</code>.
</p>
</dd>
<dt> <code>%L</code></dt>
<dd><p>This format element is substituted with the name of the currently
selected locale for translating messages. How this is determined is
explained below.
</p>
</dd>
<dt> <code>%l</code></dt>
<dd><p>(This is the lowercase ell.) This format element is substituted with the
language element of the locale name. The string describing the selected
locale is expected to have the form
<code><var>lang</var>[_<var>terr</var>[.<var>codeset</var>]]</code> and this format uses the
first part <var>lang</var>.
</p>
</dd>
<dt> <code>%t</code></dt>
<dd><p>This format element is substituted by the territory part <var>terr</var> of
the name of the currently selected locale. See the explanation of the
format above.
</p>
</dd>
<dt> <code>%c</code></dt>
<dd><p>This format element is substituted by the codeset part <var>codeset</var> of
the name of the currently selected locale. See the explanation of the
format above.
</p>
</dd>
<dt> <code>%%</code></dt>
<dd><p>Since <code>%</code> is used as a metacharacter there must be a way to
express the <code>%</code> character in the result itself. Using <code>%%</code>
does this just like it works for <code>printf</code>.
</p></dd>
</dl>
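<p>The substitution rules for the format elements can be sketched in a
few lines of C. This is only an illustration of the rules listed above:
<code>expand_nlspath_template</code> is a hypothetical helper, not a library
function, and it does not claim to mirror how the C library performs the
search. It expands a single template (one colon-separated entry of
<code>NLSPATH</code>) and assumes the locale name has the form
<code><var>lang</var>[_<var>terr</var>[.<var>codeset</var>]]</code>.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the C library.  Expand one NLSPATH
   template TMPL into OUT (of size OUTSZ) for the catalog NAME and the
   locale name LOCALE.  Returns 0 on success, -1 if OUT is too small.  */
int
expand_nlspath_template (const char *tmpl, const char *name,
                         const char *locale, char *out, size_t outsz)
{
  assert (tmpl != NULL && name != NULL && locale != NULL && out != NULL);
  if (outsz == 0)
    return -1;

  /* Split LOCALE into its language, territory and codeset parts.  */
  size_t lang_len = strcspn (locale, "_.");
  const char *terr = "";
  size_t terr_len = 0;
  const char *codeset = "";
  size_t codeset_len = 0;
  const char *p = locale + lang_len;
  if (*p == '_')
    {
      terr = p + 1;
      terr_len = strcspn (terr, ".");
      p = terr + terr_len;
    }
  if (*p == '.')
    {
      codeset = p + 1;
      codeset_len = strlen (codeset);
    }

  size_t used = 0;
  for (; *tmpl != '\0'; ++tmpl)
    {
      /* By default copy the current character unchanged.  */
      const char *piece = tmpl;
      size_t piece_len = 1;
      if (*tmpl == '%' && tmpl[1] != '\0')
        {
          ++tmpl;
          switch (*tmpl)
            {
            case 'N': piece = name; piece_len = strlen (name); break;
            case 'L': piece = locale; piece_len = strlen (locale); break;
            case 'l': piece = locale; piece_len = lang_len; break;
            case 't': piece = terr; piece_len = terr_len; break;
            case 'c': piece = codeset; piece_len = codeset_len; break;
            case '%': piece = "%"; piece_len = 1; break;
            default:
              /* Unknown element: this sketch copies it verbatim.  */
              piece = tmpl - 1;
              piece_len = 2;
              break;
            }
        }
      /* Always leave room for the terminating null byte.  */
      if (used + piece_len >= outsz)
        return -1;
      memcpy (out + used, piece, piece_len);
      used += piece_len;
    }
  out[used] = '\0';
  return 0;
}
```

<p>With the template <code>/usr/share/locale/%L/%N</code>, the catalog name
<code>hello</code> and the locale name <code>de_DE.UTF-8</code> this function
produces <code>/usr/share/locale/de_DE.UTF-8/hello</code>.
</p>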
<p>Using <code>NLSPATH</code> allows arbitrary directories to be searched for
message catalogs while still allowing different languages to be used.
If the <code>NLSPATH</code> environment variable is not set, the default value
is
</p>
<table><tr><td> </td><td><pre class="smallexample"><var>prefix</var>/share/locale/%L/%N:<var>prefix</var>/share/locale/%L/LC_MESSAGES/%N
</pre></td></tr></table>
<p>where <var>prefix</var> is given to <code>configure</code> while installing the GNU
C Library (this value is in many cases <code>/usr</code> or the empty string).
</p>
<p>The remaining problem is to decide which locale name must be used. This
value determines the substitution of the format elements mentioned above.
First of all, the user can specify a path in the message catalog name
(i.e., the name contains a slash character). In this situation the
<code>NLSPATH</code> environment variable is not used. The catalog must exist
as specified in the program, perhaps relative to the current working
directory. This situation is not desirable and catalog names should
never be written this way. Besides this, the behavior is not portable
to all other platforms providing the <code>catgets</code> interface.
</p>
<a name="index-LC_005fALL-environment-variable"></a>
<a name="index-LC_005fMESSAGES-environment-variable"></a>
<a name="index-LANG-environment-variable"></a>
<p>Otherwise the values of environment variables from the standard
environment are examined (see section <a href="libc_25.html#Standard-Environment">Standard Environment Variables</a>). Which
variables are examined is decided by the <var>flag</var> parameter of
<code>catopen</code>. If the value is <code>NL_CAT_LOCALE</code> (which is defined
in ‘<tt>nl_types.h</tt>’) then the <code>catopen</code> function uses the name of
the locale currently selected for the <code>LC_MESSAGES</code> category.
</p>
<p>If <var>flag</var> is zero the <code>LANG</code> environment variable is examined.
This is a leftover from the early days when the concept of locales
had not even reached the level of POSIX locales.
</p>
<p>The environment variable and the locale name should have a value of the
form <code><var>lang</var>[_<var>terr</var>[.<var>codeset</var>]]</code> as explained above.
If no environment variable is set the <code>"C"</code> locale is used which
prevents any translation.
</p>
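<p>The selection rule described above can be summarized by the following
sketch. The function <code>catalog_locale_name</code> is hypothetical and only
restates the rule in code. It assumes that for <code>NL_CAT_LOCALE</code> the
variables are consulted in the usual precedence of the standard
environment (<code>LC_ALL</code>, then <code>LC_MESSAGES</code>, then
<code>LANG</code>); an actual implementation may instead query the locale
selected for the <code>LC_MESSAGES</code> category.
</p>

```c
#include <assert.h>
#include <nl_types.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the C library.  Return the locale
   name which, following the rule described in the text, would be used
   for the given FLAG argument of catopen.  */
const char *
catalog_locale_name (int flag)
{
  /* Assumed precedence: LC_ALL overrides LC_MESSAGES overrides LANG.  */
  static const char *const locale_vars[] = { "LC_ALL", "LC_MESSAGES", "LANG" };
  const size_t nvars = sizeof locale_vars / sizeof locale_vars[0];

  assert (flag == 0 || flag == NL_CAT_LOCALE);

  /* With a zero flag only LANG (the last entry) is examined.  */
  size_t first = (flag == NL_CAT_LOCALE) ? 0 : nvars - 1;
  for (size_t i = first; i < nvars; ++i)
    {
      const char *value = getenv (locale_vars[i]);
      if (value != NULL && strlen (value) > 0)
        return value;
    }

  /* Nothing is set: fall back to the "C" locale, i.e. no translation.  */
  return "C";
}
```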
<p>The return value of the <code>catgets</code> function (described below) is
in any case a valid string. Either it is a translation from a message
catalog or it is the same as the <var>string</var> parameter. So a piece of
code to decide whether a translation actually happened must look like this:
</p>
<table><tr><td> </td><td><pre class="smallexample">{
  char *trans = catgets (desc, set, msg, input_string);
  if (trans == input_string)
    {
      /* Something went wrong. */
    }
}
</pre></td></tr></table>
<p>When an error occurs, the global variable <var>errno</var> is set to one of the following values:
</p>
<dl compact="compact">
<dt> <var>EBADF</var></dt>
<dd><p>The catalog does not exist.
</p></dd>
<dt> <var>ENOMSG</var></dt>
<dd><p>The set/message tuple does not name an existing element in the
message catalog.
</p></dd>
</dl>
<p>While it can sometimes be useful to test for errors, programs normally
avoid any such test. If the translation is not available it is no big
problem if the original, untranslated message is printed. Either the
user understands this as well or s/he will look for the reason why the
messages are not translated.
</p></dd></dl>
<p>Please note that the currently selected locale does not depend on a call
to the <code>setlocale</code> function. It is not necessary that the locale
data files for this locale exist and calling <code>setlocale</code> succeeds.
The <code>catopen</code> function directly reads the values of the environment
variables.
</p>
<dl>
<dt><a name="index-catgets"></a><u>Function:</u> char * <b>catgets</b><i> (nl_catd <var>catalog_desc</var>, int <var>set</var>, int <var>message</var>, const char *<var>string</var>)</i></dt>
<dd><p>The function <code>catgets</code> has to be used to access the message catalog
previously opened using the <code>catopen</code> function. The
<var>catalog_desc</var> parameter must be a value previously returned by
<code>catopen</code>.
</p>
<p>The next two parameters, <var>set</var> and <var>message</var>, reflect the
internal organization of the message catalog files. This will be
explained in detail below. For now it is interesting to know that a
catalog can consist of several sets and the messages in each set are
individually numbered. Neither the set numbers nor the
message numbers need to be consecutive. They can be arbitrarily chosen.
But each message (unless equal to another one) must have its own unique
pair of set and message number.
</p>
<p>Since it is not guaranteed that the message catalog for the language
selected by the user exists the last parameter <var>string</var> helps to
handle this case gracefully. If no matching string can be found
<var>string</var> is returned. This means for the programmer that
</p>
<ul>
<li>
the <var>string</var> parameters should contain reasonable text (this also
helps to understand the program since otherwise there would be no hint
on the string which is expected to be returned).
</li><li>
all <var>string</var> arguments should be written in the same language.
</li></ul>
</dd></dl>
<p>It is somewhat uncomfortable to write a program using the <code>catgets</code>
functions if no supporting functionality is available. Since each
set/message number tuple must be unique the programmer must keep lists
of the messages at the same time the code is written. And the work
between several people working on the same project must be coordinated.
We will see how these problems can be relaxed a bit (see section <a href="#Common-Usage">How to use the <code>catgets</code> interface</a>).
</p>
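<p>One way to keep such a list is a table in the program itself. The
following sketch assumes a hypothetical <code>struct message_id</code> and a
hypothetical checking function <code>find_duplicate_id</code>; neither is
provided by the C library. The check simply reports the first entry
whose set/message pair repeats an earlier one. It does not handle the
case where the same message is deliberately listed twice.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping record: one entry per message used in the
   program, together with the default text passed to catgets.  */
struct message_id
{
  int set;
  int msg;
  const char *fallback;
};

/* Return the index of the first entry whose (set, msg) pair repeats an
   earlier entry, or -1 if all pairs in TABLE are unique.  */
int
find_duplicate_id (const struct message_id *table, size_t n)
{
  assert (table != NULL || n == 0);
  for (size_t i = 1; i < n; ++i)
    for (size_t j = 0; j < i; ++j)
      if (table[i].set == table[j].set && table[i].msg == table[j].msg)
        return (int) i;
  return -1;
}
```

<p>Running such a check in a test or at program start makes conflicting
numbers visible early when several people add messages independently.
</p>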
<dl>
<dt><a name="index-catclose"></a><u>Function:</u> int <b>catclose</b><i> (nl_catd <var>catalog_desc</var>)</i></dt>
<dd><p>The <code>catclose</code> function can be used to free the resources
associated with a message catalog which previously was opened by a call
to <code>catopen</code>. If the resources can be successfully freed the
function returns <code>0</code>. Otherwise it returns <code>-1</code> and the
global variable <var>errno</var> is set. Errors can occur if the catalog
descriptor <var>catalog_desc</var> is not valid in which case <var>errno</var> is
set to <code>EBADF</code>.
</p></dd></dl>
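<p>The three functions are normally used together. The following sketch
shows one possible sequence of calls; <code>lookup_message</code> is a
hypothetical wrapper written for this illustration, and the set and
message numbers passed to it must of course match an existing catalog.
The text is copied into a caller-supplied buffer before the catalog is
closed, on the assumption that the string returned by <code>catgets</code>
may live in storage belonging to the catalog.
</p>

```c
#include <nl_types.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper, not part of the C library.  Look up message
   MSG of set SET in the catalog CAT_NAME and copy the result to BUF.
   Returns 1 if a translation was found, 0 if FALLBACK was used.  */
int
lookup_message (const char *cat_name, int set, int msg,
                const char *fallback, char *buf, size_t bufsz)
{
  const char *text = fallback;
  int translated = 0;

  nl_catd catd = catopen (cat_name, NL_CAT_LOCALE);
  if (catd != (nl_catd) -1)
    {
      /* catgets returns FALLBACK itself if no translation exists.  */
      text = catgets (catd, set, msg, fallback);
      translated = (text != fallback);
    }

  /* Copy the text while the catalog is still open.  */
  snprintf (buf, bufsz, "%s", text);

  if (catd != (nl_catd) -1 && catclose (catd) != 0)
    perror ("catclose");

  return translated;
}
```

<p>If the catalog cannot be opened the wrapper leaves the untranslated
default text in the buffer, which matches the advice given above that a
missing translation is usually not treated as a fatal error.
</p>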
<hr size="6">
<a name="The-message-catalog-files"></a>
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="#The-catgets-Functions" title="Previous section in reading order"> < </a>]</td>
<td valign="middle" align="left">[<a href="#The-gencat-program" title="Next section in reading order"> > </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="#Message-Translation" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="#Message-catalogs-a-la-X_002fOpen" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="libc_9.html#Searching-and-Sorting" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libc.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libc_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libc_42.html#Concept-Index" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libc_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<a name="Format-of-the-message-catalog-files"></a>
<h3 class="subsection">8.1.2 Format of the message catalog files</h3>
<p>The only reasonable way to translate all the messages of a program and
store the result in a message catalog file which can be read by the
<code>catopen</code> function is to hand all the message text to the
translator and let her/him translate them all. I.e., we must have a
file with entries which associate the set/message tuple with a specific
translation. This file format is specified in the X/Open standard and
is as follows:
</p>
<ul>
<li>
Lines containing only whitespace characters or empty lines are ignored.
</li><li>
Lines which contain as the first non-whitespace character a <code>$</code>
followed by a whitespace character are comments and are also ignored.
</li><li>
If a line contains as the first non-whitespace characters the sequence
<code>$set</code> followed by a whitespace character an additional argument
is required to follow. This argument can either be:
<ul class="toc">
<li>-
a number. In this case the value of this number determines the set
to which the following messages are added.
</li><li>-
an identifier consisting of alphanumeric characters plus the underscore
character. In this case the set is automatically assigned a number.
This value is one more than the largest set number which has appeared so far.
<p>How to use the symbolic names is explained in section <a href="#Common-Usage">How to use the <code>catgets</code> interface</a>.
</p>
<p>It is an error if a symbol name appears more than once. All following
messages are placed in a set with this number.
</p></li></ul>
</li><li>
If a line contains as the first non-whitespace characters the sequence
<code>$delset</code> followed by a whitespace character an additional argument
is required to follow. This argument can either be:
<ul class="toc">
<li>-
a number. In this case the value of this number determines the set
which will be deleted.
</li><li>-
an identifier consisting of alphanumeric characters plus the underscore
character. This symbolic identifier must match a name for a set which
previously was defined. It is an error if the name is unknown.
</li></ul>
<p>In both cases all messages in the specified set will be removed. They
will not appear in the output. But if this set is selected again later
with a <code>$set</code> command, messages can be added again and these
messages will appear in the output.
</p>
</li><li>
If a line contains after leading whitespace the sequence
<code>$quote</code>, the quoting character used for this input file is
changed to the first non-whitespace character following the
<code>$quote</code>. If no non-whitespace character is present before the
line ends, quoting is disabled.
<p>By default no quoting character is used. In this mode strings are
terminated with the first unescaped line break. If there is a
<code>$quote</code> sequence present newline need not be escaped. Instead a
string is terminated with the first unescaped appearance of the quote
character.
</p>
<p>A common usage of this feature would be to set the quote character to
<code>"</code>. Then any appearance of the <code>"</code> in the strings must
be escaped using the backslash (i.e., <code>\"</code> must be written).
</p>
</li><li>
Any other line must start with a number or an alphanumeric identifier
(with the underscore character included). The following characters
(starting after the first whitespace character) will form the string
which gets associated with the currently selected set and the message
number represented by the number or identifier, respectively.
<p>If the start of the line is a number the message number is obvious. It
is an error if the same message number already appeared for this set.
</p>
<p>If the leading token is an identifier the message number is
assigned automatically. The value is the current maximum message
number for this set plus one. It is an error if the identifier was
already used for a message in this set. It is OK to reuse the
identifier for a message in another set. How to use the symbolic
identifiers will be explained below (see section <a href="#Common-Usage">How to use the <code>catgets</code> interface</a>). There is
one limitation with the identifier: it must not be <code>Set</code>. The
reason will be explained below.
</p>
<p>The text of the messages can contain escape characters. The usual bunch
of characters known from the ISO C language are recognized
(<code>\n</code>, <code>\t</code>, <code>\v</code>, <code>\b</code>, <code>\r</code>, <code>\f</code>,
<code>\\</code>, and <code>\<var>nnn</var></code>, where <var>nnn</var> is the octal coding of
a character code).
</p></li></ul>
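<p>The line rules listed above can be summarized in a small classifier.
This is only a sketch of the rules as stated in this section, not the code
used by <code>gencat</code>; the names <code>line_kind</code>,
<code>has_keyword</code> and <code>classify_line</code> are hypothetical.
As an additional assumption, a keyword directly followed by the end of the
string is treated like a keyword followed by whitespace.
</p>
<table><tr><td> </td><td><pre class="smallexample">#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum line_kind
{
  LINE_IGNORED, LINE_COMMENT, LINE_SET, LINE_DELSET,
  LINE_QUOTE, LINE_MESSAGE, LINE_INVALID
};

/* True if LINE starts with KEYWORD followed by whitespace or by the
   end of the string (the latter is an assumption of this sketch).  */
static int
has_keyword (const char *line, const char *keyword)
{
  size_t len = strlen (keyword);
  return strncmp (line, keyword, len) == 0
         && (line[len] == '\0' || isspace ((unsigned char) line[len]));
}

/* Classify one line of a message catalog source file.  */
static enum line_kind
classify_line (const char *line)
{
  while (isspace ((unsigned char) *line))
    ++line;
  if (*line == '\0')
    return LINE_IGNORED;
  if (has_keyword (line, "$set"))
    return LINE_SET;
  if (has_keyword (line, "$delset"))
    return LINE_DELSET;
  if (has_keyword (line, "$quote"))
    return LINE_QUOTE;
  if (has_keyword (line, "$"))
    return LINE_COMMENT;
  /* Message lines start with a number or an identifier.  */
  if (isalnum ((unsigned char) *line) || *line == '_')
    return LINE_MESSAGE;
  return LINE_INVALID;
}

int
main (void)
{
  static const char *const samples[] =
    { "$ a comment", "$quote \"", "", "$set SetOne", "1 Message with ID 1." };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
    printf ("%d: %s\n", (int) classify_line (samples[i]), samples[i]);
  return 0;
}
</pre></td></tr></table>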
<p><strong>Important:</strong> The handling of identifiers instead of numbers for
the set and messages is a GNU extension. Systems strictly following the
X/Open specification do not have this feature. An example for a message
catalog file is this:
</p>
<table><tr><td> </td><td><pre class="smallexample">$ This is a leading comment.
$quote "

$set SetOne
1 Message with ID 1.
two " Message with ID \"two\", which gets the value 2 assigned"

$set SetTwo
$ Since the last set got the number 1 assigned this set has number 2.
4000 "The numbers can be arbitrary, they need not start at one."
</pre></td></tr></table>
<p>This small example shows various aspects:
</p><ul>
<li>
Lines 1 and 9 are comments since they start with <code>$</code> followed by
a whitespace.
</li><li>
The quoting character is set to <code>"</code>. Otherwise the quotes in the
message definition would have to be omitted and in this case the
message with the identifier <code>two</code> would lose its leading whitespace.
</li><li>
Mixing numbered messages with messages having symbolic names is no
problem and the numbering happens automatically.
</li></ul>
<p>While this file format is pretty easy it is not the best possible for
use in a running program. The <code>catopen</code> function would have to
parse the file and handle syntactic errors gracefully. This is not so
easy and the whole process is pretty slow. Therefore the <code>catgets</code>
functions expect the data in another more compact and ready-to-use file
format. There is a special program <code>gencat</code> which is explained in
detail in the next section.
</p>
<p>Files in this other format are not human readable. To be easy to use by
programs it is a binary file. But the format is byte order independent
so translation files can be shared by systems of arbitrary architecture
(as long as they use the GNU C Library).
</p>
<p>Details about the binary file format are not important to know since
these files are always created by the <code>gencat</code> program. The
sources of the GNU C Library also provide the sources for the
<code>gencat</code> program and so the interested reader can look through
these source files to learn about the file format.
</p>
<hr size="6">
<a name="The-gencat-program"></a>
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="#The-message-catalog-files" title="Previous section in reading order"> < </a>]</td>
<td valign="middle" align="left">[<a href="#Common-Usage" title="Next section in reading order"> > </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="#Message-Translation" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="#Message-catalogs-a-la-X_002fOpen" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="libc_9.html#Searching-and-Sorting" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libc.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libc_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libc_42.html#Concept-Index" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libc_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<a name="Generate-Message-Catalogs-files"></a>
<h3 class="subsection">8.1.3 Generate Message Catalogs files</h3>
<a name="index-gencat"></a>
<p>The <code>gencat</code> program is specified in the X/Open standard and the
GNU implementation follows this specification and so processes
all correctly formed input files. Additionally some extensions are
implemented which help to work in a more reasonable way with the
<code>catgets</code> functions.
</p>
<p>The <code>gencat</code> program can be invoked in two ways:
</p>
<table><tr><td> </td><td><pre class="example">`gencat [<var>Option</var>]… [<var>Output-File</var> [<var>Input-File</var>]…]`
</pre></td></tr></table>
<p>This is the interface defined in the X/Open standard. If no
<var>Input-File</var> parameter is given input will be read from standard
input. Multiple input files will be read as if they are concatenated.
If <var>Output-File</var> is also missing, the output will be written to
standard output. To offer the command line syntax familiar from other
programs a second interface is provided.
</p>
<table><tr><td> </td><td><pre class="smallexample">`gencat [<var>Option</var>]… -o <var>Output-File</var> [<var>Input-File</var>]…`
</pre></td></tr></table>
<p>The option ‘<samp>-o</samp>’ is used to specify the output file and all file
arguments are used as input files.
</p>
<p>Besides this one can use ‘<tt>-</tt>’ or ‘<tt>/dev/stdin</tt>’ for
<var>Input-File</var> to denote the standard input. Correspondingly one can
use ‘<tt>-</tt>’ and ‘<tt>/dev/stdout</tt>’ for <var>Output-File</var> to denote
standard output. Using ‘<tt>-</tt>’ as a file name is allowed in X/Open
while using the device names is a GNU extension.
</p>
<p>The <code>gencat</code> program works by concatenating all input files and
then <strong>merging</strong> the resulting collection of message sets with a
possibly existing output file. This is done by removing all messages
with set/message number tuples matching any of the generated messages
from the output file and then adding all the new messages. Regenerating
a catalog file while ignoring the old contents therefore
requires removing the output file if it exists. If the output is
written to standard output no merging takes place.
</p>
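<p>The merge rule described above can be modeled in a few lines of C.
The following sketch is not taken from the <code>gencat</code> sources; it
only mirrors the behavior stated in this section. The type
<code>struct message</code> and the function <code>merge_messages</code> are
hypothetical names, and a fixed-size array stands in for the catalog.
</p>
<table><tr><td> </td><td><pre class="smallexample">#include <stddef.h>
#include <stdio.h>

struct message
{
  int set;
  int number;
  const char *text;
};

/* Merge COUNT_NEW messages into CATALOG, which holds *COUNT entries
   and has room for CAPACITY.  An entry with the same set/message
   tuple is replaced, any other message is appended.  Returns 0 on
   success and -1 if the catalog is full.  */
static int
merge_messages (struct message *catalog, size_t *count, size_t capacity,
                const struct message *new_msgs, size_t count_new)
{
  for (size_t i = 0; i < count_new; ++i)
    {
      size_t j;
      for (j = 0; j < *count; ++j)
        if (catalog[j].set == new_msgs[i].set
            && catalog[j].number == new_msgs[i].number)
          break;
      if (j == *count)
        {
          if (*count == capacity)
            return -1;
          ++*count;
        }
      catalog[j] = new_msgs[i];
    }
  return 0;
}

int
main (void)
{
  struct message catalog[4] = { { 1, 1, "old one" }, { 1, 2, "old two" } };
  size_t count = 2;
  const struct message fresh[] = { { 1, 2, "new two" }, { 2, 1, "new" } };

  if (merge_messages (catalog, &count, 4, fresh, 2) != 0)
    return 1;
  for (size_t i = 0; i < count; ++i)
    printf ("%d/%d: %s\n", catalog[i].set, catalog[i].number,
            catalog[i].text);
  return 0;
}
</pre></td></tr></table>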
<p>The following table shows the options understood by the <code>gencat</code>
program. The X/Open standard does not specify any option for the
program so all of these are GNU extensions.
</p>
<dl compact="compact">
<dt> ‘<samp>-V</samp>’</dt>
<dt> ‘<samp>--version</samp>’</dt>
<dd><p>Print the version information and exit.
</p></dd>
<dt> ‘<samp>-h</samp>’</dt>
<dt> ‘<samp>--help</samp>’</dt>
<dd><p>Print a usage message listing all available options, then exit successfully.
</p></dd>
<dt> ‘<samp>--new</samp>’</dt>
<dd><p>Never merge the new messages from the input files with the old content
of the output file. The old content of the output file is discarded.
</p></dd>
<dt> ‘<samp>-H</samp>’</dt>
<dt> ‘<samp>--header=name</samp>’</dt>
<dd><p>This option is used to emit the symbolic names given to sets and
messages in the input files for use in the program. Details about how
to use this are given in the next section. The <var>name</var> parameter to
this option specifies the name of the output file. It will contain a
number of C preprocessor <code>#define</code>s to associate a name with a
number.
</p>
<p>Please note that the generated file only contains the symbols from the
input files. If the output is merged with the previous content of the
output file the possibly existing symbols from the file(s) which
generated the old output files are not in the generated header file.
</p></dd>
</dl>
<hr size="6">
<a name="Common-Usage"></a>
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="#The-gencat-program" title="Previous section in reading order"> < </a>]</td>
<td valign="middle" align="left">[<a href="#Not-using-symbolic-names" title="Next section in reading order"> > </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="#Message-Translation" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="#Message-catalogs-a-la-X_002fOpen" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="libc_9.html#Searching-and-Sorting" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libc.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libc_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libc_42.html#Concept-Index" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libc_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<a name="How-to-use-the-catgets-interface"></a>
<h3 class="subsection">8.1.4 How to use the <code>catgets</code> interface</h3>
<p>The <code>catgets</code> functions can be used in two different ways: by
strictly following the X/Open specification without relying on any extension,
or by using the GNU extensions. We will take a look at the former
method first to understand the benefits of the extensions.
</p>
<hr size="6">
<a name="Not-using-symbolic-names"></a>
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="#Common-Usage" title="Previous section in reading order"> < </a>]</td>
<td valign="middle" align="left">[<a href="#Using-symbolic-names" title="Next section in reading order"> > </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="#Message-Translation" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="#Common-Usage" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="libc_9.html#Searching-and-Sorting" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libc.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libc_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libc_42.html#Concept-Index" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libc_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<h4 class="subsubsection">8.1.4.1 Not using symbolic names</h4>
<p>Since the X/Open format of the message catalog files does not allow
symbol names we have to work with numbers all the time. When we start
writing a program we have to replace all appearances of translatable
strings with something like
</p>
<table><tr><td> </td><td><pre class="smallexample">catgets (catdesc, set, msg, "string")
</pre></td></tr></table>
<p><var>catdesc</var> is retrieved from a call to <code>catopen</code> which is
normally done once at the program start. The <code>"string"</code> is the
string we want to translate. The problems start with the set and
message numbers.
</p>
<p>In a bigger program several programmers usually work at the same time on
the program and so coordinating the number allocation is crucial.
While two different strings must never be indexed by the same tuple of
numbers, it is highly desirable to reuse the numbers for equal strings
with equal translations (please note that there might be strings which
are equal in one language but have different translations due to
different contexts).
</p>
<p>The allocation process can be relaxed a bit by using different set numbers for
different parts of the program. So the number of developers who have to
coordinate the allocation can be reduced. But lists which keep
track of the allocation must still be maintained and errors can easily happen. These errors
cannot be discovered by the compiler or the <code>catgets</code> functions.
Only the user of the program might see wrong messages printed. In the
worst cases the messages are so misleading that they cannot be
recognized as wrong. Think about the translations for <code>"true"</code> and
<code>"false"</code> being exchanged. This could result in a disaster.
</p>
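<p>To make the purely numeric style concrete, here is a minimal sketch.
The constants <code>SET_MAIN</code>, <code>MSG_TRUE</code> and
<code>MSG_FALSE</code>, the helper <code>bool_text</code> and the catalog name
<code>"app"</code> are hypothetical. Nothing verifies that the numbers
agree with those in the message catalog source, which is exactly the
weakness discussed above.
</p>
<table><tr><td> </td><td><pre class="smallexample">#include <nl_types.h>
#include <stdio.h>

/* Hand-maintained numbering (hypothetical).  Swapping the two message
   numbers would compile without any warning.  */
enum { SET_MAIN = 1 };
enum { MSG_TRUE = 1, MSG_FALSE = 2 };

/* Look up the translation of "true" or "false" in CATDESC.  */
static const char *
bool_text (nl_catd catdesc, int value)
{
  return value
         ? catgets (catdesc, SET_MAIN, MSG_TRUE, "true")
         : catgets (catdesc, SET_MAIN, MSG_FALSE, "false");
}

int
main (void)
{
  /* If the catalog cannot be opened the default strings are used.  */
  nl_catd catdesc = catopen ("app", NL_CAT_LOCALE);

  printf ("%s\n", bool_text (catdesc, 1));
  printf ("%s\n", bool_text (catdesc, 0));
  if (catdesc != (nl_catd) -1)
    catclose (catdesc);
  return 0;
}
</pre></td></tr></table>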
<hr size="6">
<a name="Using-symbolic-names"></a>
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="#Not-using-symbolic-names" title="Previous section in reading order"> < </a>]</td>
<td valign="middle" align="left">[<a href="#How-does-to-this-allow-to-develop" title="Next section in reading order"> > </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="#Message-Translation" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="#Common-Usage" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="libc_9.html#Searching-and-Sorting" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libc.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libc_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libc_42.html#Concept-Index" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libc_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<h4 class="subsubsection">8.1.4.2 Using symbolic names</h4>
<p>The problems mentioned in the last section derive from the fact that:
</p>
<ol>
<li>
the numbers are allocated once and due to the possibly frequent use of
them it is difficult to change a number later.
</li><li>
the numbers do not allow one to guess anything about the string and
therefore collisions can easily happen.
</li></ol>
<p>By consistently using symbolic names and by providing a method which maps
the string content to a symbolic name (however this will happen) one can
prevent both problems above. The cost of this is that the programmer
has to write a complete message catalog file while s/he is writing the
program itself.
</p>
<p>This is necessary since the symbolic names must be mapped to numbers
before the program sources can be compiled. In the last section it was
described how to generate a header containing the mapping of the names.
E.g., for the example message file given in the last section we could
call the <code>gencat</code> program as follows (assume ‘<tt>ex.msg</tt>’ contains
the sources).
</p>
<table><tr><td> </td><td><pre class="smallexample">gencat -H ex.h -o ex.cat ex.msg
</pre></td></tr></table>
<p>This generates a header file with the following content:
</p>
<table><tr><td> </td><td><pre class="smallexample">#define SetTwoSet 0x2 /* ex.msg:8 */
#define SetOneSet 0x1 /* ex.msg:4 */
#define SetOnetwo 0x2 /* ex.msg:6 */
</pre></td></tr></table>
<p>As can be seen the various symbols given in the source file are mangled
to generate unique identifiers and these identifiers get numbers
assigned. Reading the source file and knowing about the rules will
allow one to predict the content of the header file (it is deterministic)
but this is not necessary. The <code>gencat</code> program can take care of
everything. All the programmer has to do is to put the generated header
file in the dependency list of the source files of her/his project and
to add a rule to regenerate the header if any of the input files
changes.
</p>
<p>One word about the symbol mangling. Every symbol consists of two parts:
the name of the message set plus the name of the message or the special
string <code>Set</code>. So <code>SetOnetwo</code> means this macro can be used to
access the translation with identifier <code>two</code> in the message set
<code>SetOne</code>.
</p>
<p>The other names denote the names of the message sets. The special
string <code>Set</code> is used in the place of the message identifier.
</p>
<p>If in the code the second string of the set <code>SetOne</code> is used the C
code should look like this:
</p>
<table><tr><td> </td><td><pre class="smallexample">catgets (catdesc, SetOneSet, SetOnetwo,
" Message with ID \"two\", which gets the value 2 assigned")
</pre></td></tr></table>
<p>Writing the function call this way allows changing the message number
and even the set number without requiring any change in the C source
code. (The text of the string is normally not the same; this is only
for this example.)
</p>
<hr size="6">
<a name="How-does-to-this-allow-to-develop"></a>
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="#Using-symbolic-names" title="Previous section in reading order"> < </a>]</td>
<td valign="middle" align="left">[<a href="#The-Uniforum-approach" title="Next section in reading order"> > </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="#Message-Translation" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="#Common-Usage" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="libc_9.html#Searching-and-Sorting" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libc.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libc_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libc_42.html#Concept-Index" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libc_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<h4 class="subsubsection">8.1.4.3 How does this allow one to develop</h4>
<p>To illustrate the usual way to work with the symbolic version numbers
here is a little example. Assume we want to write the very complex and
famous greeting program. We start by writing the code as usual:
</p>
<table><tr><td> </td><td><pre class="smallexample">#include <stdio.h>
int
main (void)
{
printf ("Hello, world!\n");
return 0;
}
</pre></td></tr></table>
<p>Now we want to internationalize the message and therefore replace the
message with whatever the user wants.
</p>
<table><tr><td> </td><td><pre class="smallexample">#include <nl_types.h>
#include <stdio.h>
#include "msgnrs.h"
int
main (void)
{
nl_catd catdesc = catopen ("hello.cat", NL_CAT_LOCALE);
printf (catgets (catdesc, MainSet, MainHello,
"Hello, world!\n"));
catclose (catdesc);
return 0;
}
</pre></td></tr></table>
<p>We see how the catalog object is opened and the returned descriptor used
in the other function calls. It is not really necessary to check for
failure of any of the functions since even in these situations the
functions will behave reasonably. If no translation is available,
<code>catgets</code> simply returns the default string passed as its last
argument.
</p>
<p>What remains unspecified here are the constants <code>MainSet</code> and
<code>MainHello</code>. These are the symbolic names describing the
message. To get the actual definitions which match the information in
the catalog file we have to create the message catalog source file and
process it using the <code>gencat</code> program.
</p>
<table><tr><td> </td><td><pre class="smallexample">$ Messages for the famous greeting program.
$quote "

$set Main
Hello "Hallo, Welt!\n"
</pre></td></tr></table>
<p>Now we can start building the program (assume the message catalog source
file is named ‘<tt>hello.msg</tt>’ and the program source file ‘<tt>hello.c</tt>’):
</p>
<table><tr><td> </td><td><table class="cartouche" border="1"><tr><td>
<pre class="smallexample">% gencat -H msgnrs.h -o hello.cat hello.msg
% cat msgnrs.h
#define MainSet 0x1 /* hello.msg:4 */
#define MainHello 0x1 /* hello.msg:5 */
% gcc -o hello hello.c -I.
% cp hello.cat /usr/share/locale/de/LC_MESSAGES
% echo $LC_ALL
de
% ./hello
Hallo, Welt!
%
</pre></td></tr></table>
</td></tr></table>
<p>The call of the <code>gencat</code> program creates the missing header file
‘<tt>msgnrs.h</tt>’ as well as the message catalog binary. The former is
used in the compilation of ‘<tt>hello.c</tt>’ while the latter is placed in a
directory in which the <code>catopen</code> function will try to locate it.
Please check the <code>LC_ALL</code> environment variable and the default path
for <code>catopen</code> presented in the description above.
</p>
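<p>For programs that do want to notice a missing catalog, a variant of the
greeting program with explicit checks might look like the following sketch.
The two macros repeat the values from the generated
‘<tt>msgnrs.h</tt>’ shown above so that the sketch compiles on its own; the
helper <code>greeting</code> is a hypothetical name. The sketch also calls
<code>setlocale</code>, since with <code>NL_CAT_LOCALE</code> the catalog is
located through the <code>LC_MESSAGES</code> category of the current locale.
</p>
<table><tr><td> </td><td><pre class="smallexample">#include <locale.h>
#include <nl_types.h>
#include <stdio.h>

/* Normally provided by the generated "msgnrs.h"; repeated here so
   that the sketch is self-contained.  */
#define MainSet 0x1
#define MainHello 0x1

/* Return the greeting from CATDESC, or the English default if no
   translation is available.  */
static const char *
greeting (nl_catd catdesc)
{
  return catgets (catdesc, MainSet, MainHello, "Hello, world!\n");
}

int
main (void)
{
  /* Select the locale from the environment before opening the
     catalog.  */
  setlocale (LC_ALL, "");

  nl_catd catdesc = catopen ("hello.cat", NL_CAT_LOCALE);
  if (catdesc == (nl_catd) -1)
    perror ("catopen (using untranslated messages)");

  /* fputs avoids interpreting the translated text as a format.  */
  fputs (greeting (catdesc), stdout);

  if (catdesc != (nl_catd) -1)
    catclose (catdesc);
  return 0;
}
</pre></td></tr></table>
<p>Printing with <code>fputs</code> instead of <code>printf</code> is a design
choice of this sketch: a translation containing a stray <code>%</code> cannot
then be misread as a conversion specification.
</p>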
<hr size="6">
<a name="The-Uniforum-approach"></a>
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="#How-does-to-this-allow-to-develop" title="Previous section in reading order"> < </a>]</td>
<td valign="middle" align="left">[<a href="#Message-catalogs-with-gettext" title="Next section in reading order"> > </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="#Message-Translation" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="#Message-Translation" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="libc_9.html#Searching-and-Sorting" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libc.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libc_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libc_42.html#Concept-Index" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libc_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<a name="The-Uniforum-approach-to-Message-Translation"></a>
<h2 class="section">8.2 The Uniforum approach to Message Translation</h2>
<p>Sun Microsystems tried to standardize a different approach to message
translation in the Uniforum group. There never was a real standard
defined but still the interface was used in Sun’s operating systems.
Since this approach fits better in the development process of free
software it is also used throughout the GNU project and the GNU
‘<tt>gettext</tt>’ package provides support for this outside the GNU C
Library.
</p>
<p>The code of the ‘<tt>libintl</tt>’ from GNU ‘<tt>gettext</tt>’ is the same as
the code in the GNU C Library. So the documentation in the GNU
‘<tt>gettext</tt>’ manual is also valid for the functionality here. The
following text will describe the library functions in detail. But the
numerous helper programs are not described in this manual. Instead
people should read the GNU ‘<tt>gettext</tt>’ manual
(see <a href="../gettext/index.html#Top">(gettext)Top</a> section ‘GNU gettext utilities’ in <cite>Native Language Support Library and Tools</cite>).
We will only give a short overview.
</p>
<p>Though the <code>catgets</code> functions are available by default on more
systems the <code>gettext</code> interface is at least as portable as the
former. The GNU ‘<tt>gettext</tt>’ package can be used wherever the
functions are not available.
</p>
<table class="menu" border="0" cellspacing="0">
<tr><td align="left" valign="top"><a href="#Message-catalogs-with-gettext">8.2.1 The <code>gettext</code> family of functions</a></td><td> </td><td align="left" valign="top"></td></tr>
<tr><td align="left" valign="top"><a href="#Helper-programs-for-gettext">8.2.2 Programs to handle message catalogs for <code>gettext</code></a></td><td> </td><td align="left" valign="top"></td></tr>
</table>
<hr size="6">
<a name="Message-catalogs-with-gettext"></a>
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="#The-Uniforum-approach" title="Previous section in reading order"> < </a>]</td>
<td valign="middle" align="left">[<a href="#Translation-with-gettext" title="Next section in reading order"> > </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="#Message-Translation" title="Beginning of this chapter or previous chapter"> << </a>]</td>
<td valign="middle" align="left">[<a href="#The-Uniforum-approach" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="libc_9.html#Searching-and-Sorting" title="Next chapter"> >> </a>]</td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left"> </td>
<td valign="middle" align="left">[<a href="libc.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="libc_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[<a href="libc_42.html#Concept-Index" title="Index">Index</a>]</td>
<td valign="middle" align="left">[<a href="libc_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<a name="The-gettext-family-of-functions"></a>
<h3 class="subsection">8.2.1 The <code>gettext</code> family of functions</h3>
<p>Although the paradigm underlying the <code>gettext</code> approach to message
translation is different from that of the <code>catgets</code> functions, the
basic functionality is equivalent. There are functions of the following
categories:
</p>
<table class="menu" border="0" cellspacing="0">
<tr><td align="left" valign="top"><a href="#Translation-with-gettext">8.2.1.1 What has to be done to translate a message?</a></td><td> </td><td align="left" valign="top"></td></tr>
<tr><td align="left" valign="top"><a href="#Locating-gettext-catalog">8.2.1.2 How to determine which catalog to be used</a></td><td> </td><td align="left" valign="top"></td></tr>
<tr><td align="left" valign="top"><a href="#Advanced-gettext-functions">8.2.1.3 Additional functions for more complicated situations</a></td><td> </td><td align="left" valign="top"></td></tr>
<tr><td align="left" valign="top"><a href="#Charset-conversion-in-gettext">8.2.1.4 How to specify the output character set <code>gettext</code> uses</a></td><td> </td><td align="left" valign="top"></td></tr>
<tr><td align="left" valign="top"><a href="#GUI-program-problems">8.2.1.5 How to use <code>gettext</code> in GUI programs</a></td><td> </td><td align="left" valign="top"></td></tr>
<tr><td align="left" valign="top"><a href="#Using-gettextized-software">8.2.1.6 User influence on <code>gettext</code></a></td><td> </td><td align="left" valign="top"> The possibilities of the user to influence
the way <code>gettext</code> works.
</td></tr>
</table>
<hr size="6">
<a name="Translation-with-gettext"></a>
<a name="What-has-to-be-done-to-translate-a-message_003f"></a>
<h4 class="subsubsection">8.2.1.1 What has to be done to translate a message?</h4>
<p>The <code>gettext</code> functions have a very simple interface. The most
basic function just takes the string to be translated as its
argument and returns the translation. This is fundamentally
different from the <code>catgets</code> approach, where an extra key is
necessary and the original string is only used for the error case.
</p>
<p>If the string to be translated is the only argument, the string
itself must be the key. That is, the translation is
selected based on the original string. The message catalogs must
therefore contain the original strings plus one translation for each such
string. The task of the <code>gettext</code> function is to compare the
argument string with the available strings in the catalog and return the
appropriate translation. This lookup is optimized so that
it is no more expensive than an access using an atomic key
as in <code>catgets</code>.
</p>
<p>The <code>gettext</code> approach has some advantages but also some
disadvantages. Please see the GNU ‘<tt>gettext</tt>’ manual for a detailed
discussion of the pros and cons.
</p>
<p>All the definitions and declarations for <code>gettext</code> can be found in
the ‘<tt>libintl.h</tt>’ header file. On systems where these functions are
not part of the C library they can be found in a separate library named
‘<tt>libintl.a</tt>’ (or accordingly different for shared libraries).
</p>
<dl>
<dt><a name="index-gettext"></a><u>Function:</u> char * <b>gettext</b><i> (const char *<var>msgid</var>)</i></dt>
<dd><p>The <code>gettext</code> function searches the currently selected message
catalogs for a string which is equal to <var>msgid</var>. If there is such a
string available it is returned. Otherwise the argument string
<var>msgid</var> is returned.
</p>
<p>Please note that although the return value is <code>char *</code>, the
returned string must not be changed. This broken type results from the
history of the function and does not reflect the way the function should
be used.
</p>
<p>Please note that above we wrote “message catalogs” (plural). This is
a specialty of the GNU implementation of these functions and we will
say more about this when we talk about the ways message catalogs are
selected (see section <a href="#Locating-gettext-catalog">How to determine which catalog to be used</a>).
</p>
<p>The <code>gettext</code> function does not modify the value of the global
<var>errno</var> variable. This is necessary to make it possible to write
something like
</p>
<table><tr><td> </td><td><pre class="smallexample"> printf (gettext ("Operation failed: %m\n"));
</pre></td></tr></table>
<p>Here the <var>errno</var> value is used in the <code>printf</code> function while
processing the <code>%m</code> format element. If the <code>gettext</code>
function changed this value (it is called before <code>printf</code> is
called), we would get a wrong message.
</p>
<p>So there is no easy way to detect a missing message catalog besides
comparing the argument string with the result. But it is normally the
task of the user to react to missing catalogs. The program cannot guess
when a message catalog is really necessary, since a user who speaks
the language the program was developed in does not need any translation.
</p></dd></dl>
<p>The remaining two functions to access the message catalog add some
functionality to select a message catalog which is not the default one.
This is important if parts of the program are developed independently.
Every part can have its own message catalog and all of them can be used
at the same time. The C library itself is an example: internally it
uses the <code>gettext</code> functions but since it must not depend on a
currently selected default message catalog it must specify all ambiguous
information.
</p>
<dl>
<dt><a name="index-dgettext"></a><u>Function:</u> char * <b>dgettext</b><i> (const char *<var>domainname</var>, const char *<var>msgid</var>)</i></dt>
<dd><p>The <code>dgettext</code> function acts just like the <code>gettext</code>
function. It only takes an additional first argument <var>domainname</var>
which guides the selection of the message catalogs which are searched
for the translation. If the <var>domainname</var> parameter is the null
pointer the <code>dgettext</code> function is exactly equivalent to
<code>gettext</code> since the default value for the domain name is used.
</p>
<p>As for <code>gettext</code> the return value type is <code>char *</code> which is an
anachronism. The returned string must never be modified.
</p></dd></dl>
<dl>
<dt><a name="index-dcgettext"></a><u>Function:</u> char * <b>dcgettext</b><i> (const char *<var>domainname</var>, const char *<var>msgid</var>, int <var>category</var>)</i></dt>
<dd><p>The <code>dcgettext</code> function adds another argument to those which
<code>dgettext</code> takes. This argument <var>category</var> specifies the last
piece of information needed to locate the message catalog. I.e., the
domain name and the locale category exactly specify which message
catalog has to be used (relative to a given directory, see below).
</p>
<p>The <code>dgettext</code> function can be expressed in terms of
<code>dcgettext</code> by using
</p>
<table><tr><td> </td><td><pre class="smallexample">dcgettext (domain, string, LC_MESSAGES)
</pre></td></tr></table>
<p>instead of
</p>
<table><tr><td> </td><td><pre class="smallexample">dgettext (domain, string)
</pre></td></tr></table>
<p>This also shows which values are expected for the third parameter. One
has to use the selectors for the categories available in
‘<tt>locale.h</tt>’. Normally the available values are <code>LC_CTYPE</code>,
<code>LC_COLLATE</code>, <code>LC_MESSAGES</code>, <code>LC_MONETARY</code>,
<code>LC_NUMERIC</code>, and <code>LC_TIME</code>. Please note that <code>LC_ALL</code>
must not be used and, even though the names might suggest this, there is
no relation to the environment variables of the same names.
</p>
<p>The <code>dcgettext</code> function is only implemented for compatibility with
other systems which have <code>gettext</code> functions. There is not really
any situation where it is necessary (or useful) to use a value other
than <code>LC_MESSAGES</code> for the <var>category</var> parameter. We are
dealing with messages here and any other choice can only be confusing.
</p>
<p>As for <code>gettext</code> the return value type is <code>char *</code> which is an
anachronism. The returned string must never be modified.
</p></dd></dl>
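<p>The following sketch shows how an independently developed program part might
look up its own messages without touching the default domain. The domain name
<code>example-lib</code> and the helper <code>lib_tr</code> are assumptions made
for illustration only.
</p>

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical helper for a library that keeps its messages in its own
   domain.  "example-lib" is an assumed domain name.  */
const char *
lib_tr (const char *msgid)
{
  /* Performs the same lookup as dgettext ("example-lib", msgid).  */
  return dcgettext ("example-lib", msgid, LC_MESSAGES);
}
```

<p>In practice <code>dgettext</code> is the more natural choice here; the longer
form is used only to make the category explicit.
</p>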
<p>When using the three functions above in a program, the <var>msgid</var>
argument is frequently a constant string, so it is worthwhile to
optimize this case. A little thought shows that
as long as no new message catalog is loaded the translation of a message
will not change. This optimization is actually implemented by the
<code>gettext</code>, <code>dgettext</code> and <code>dcgettext</code> functions.
</p>
<hr size="6">
<a name="Locating-gettext-catalog"></a>
<a name="How-to-determine-which-catalog-to-be-used"></a>
<h4 class="subsubsection">8.2.1.2 How to determine which catalog to be used</h4>
<p>The functions to retrieve the translations for a given message have a
remarkably simple interface. But to still give the user of the program
the opportunity to select exactly the translation s/he wants, and
to give the programmer the possibility to influence how
catalog files are located, there is a quite complicated
underlying mechanism which controls all this. The code is complicated,
but the use is easy.
</p>
<p>Basically we have two different tasks to perform which can also be
performed by the <code>catgets</code> functions:
</p>
<ol>
<li>
Locate the set of message catalogs. There are a number of files for
different languages which all belong to the package. Usually they
are all stored in the filesystem below a certain directory.
<p>There can be arbitrarily many packages installed and they can follow
different guidelines for the placement of their files.
</p>
</li><li>
Relative to the location specified by the package the actual translation
files must be searched, based on the wishes of the user. I.e., for each
language the user selects the program should be able to locate the
appropriate file.
</li></ol>
<p>This is the functionality required by the specifications for
<code>gettext</code> and this is also what the <code>catgets</code> functions are
able to do. But there are some problems unresolved:
</p>
<ul>
<li>
The language to be used can be specified in several different ways.
There is no generally accepted standard for this and the user always
expects the program to understand what s/he means. E.g., to select the
German translation one could write <code>de</code>, <code>german</code>, or
<code>deutsch</code> and the program should always react the same way.
</li><li>
Sometimes the specification of the user is too detailed. If s/he, e.g.,
specifies <code>de_DE.ISO-8859-1</code>, which means German, spoken in Germany,
coded using the ISO 8859-1 character set, there is the possibility
that a message catalog matching this exactly is not available. But
there could be a catalog matching <code>de</code> and if the character set
used on the machine is always ISO 8859-1 there is no reason why this
latter message catalog should not be used. (We call this <em>message
inheritance</em>.)
</li><li>
If a catalog for a wanted language is not available it is not always the
second best choice to fall back on the language of the developer and
simply not translate any message. Instead a user might be better able
to read the messages in another language and so the user of the program
should be able to define a precedence order of languages.
</li></ul>
<p>We can divide the configuration actions into two parts: one is
performed by the programmer, the other by the user. We will start with
the functions the programmer can use since the user configuration will
be based on this.
</p>
<p>As already mentioned in the descriptions of the functions in the last section, separate
sets of messages can be selected by a <em>domain name</em>. This is a
simple string which should be unique for each program part which uses a
separate domain. It is possible to use arbitrarily many
domains in one program at the same time. E.g., the GNU C Library itself uses a domain
named <code>libc</code> while the program using the C Library could use a
domain named <code>foo</code>. The important point is that at any time
exactly one domain is active. This is controlled with the following
function.
</p>
<dl>
<dt><a name="index-textdomain"></a><u>Function:</u> char * <b>textdomain</b><i> (const char *<var>domainname</var>)</i></dt>
<dd><p>The <code>textdomain</code> function sets the default domain, which is used in
all future <code>gettext</code> calls, to <var>domainname</var>. Please note that
<code>dgettext</code> and <code>dcgettext</code> calls are not influenced if the
<var>domainname</var> parameter of these functions is not the null pointer.
</p>
<p>Before the first call to <code>textdomain</code> the default domain is
<code>messages</code>. This is the name specified in the specification of
the <code>gettext</code> API. This name is as good as any other name. No
program should ever really use a domain with this name since this can
only lead to problems.
</p>
<p>The function returns the value which is from now on taken as the default
domain. If the system ran out of memory the returned value is
<code>NULL</code> and the global variable <var>errno</var> is set to <code>ENOMEM</code>.
Despite the return value type being <code>char *</code> the returned string must
not be changed. It is allocated internally by the <code>textdomain</code>
function.
</p>
<p>If the <var>domainname</var> parameter is the null pointer no new default
domain is set. Instead the currently selected default domain is
returned.
</p>
<p>If the <var>domainname</var> parameter is the empty string the default domain
is reset to its initial value, the domain with the name <code>messages</code>.
The usefulness of this possibility is questionable since the domain <code>messages</code>
really never should be used.
</p></dd></dl>
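<p>The query form of <code>textdomain</code> can be sketched as follows. The helper
name <code>using_initial_domain</code> is hypothetical; it only reads the current
default domain and does not change it.
</p>

```c
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the default domain is still the
   initial one, "messages", and 0 otherwise.  */
int
using_initial_domain (void)
{
  /* A null pointer argument queries the default domain without
     setting a new one.  */
  const char *current = textdomain (NULL);

  return current != NULL && strcmp (current, "messages") == 0;
}
```

<p>A program could use such a check during development to notice that it forgot
to select its own domain.
</p>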
<dl>
<dt><a name="index-bindtextdomain"></a><u>Function:</u> char * <b>bindtextdomain</b><i> (const char *<var>domainname</var>, const char *<var>dirname</var>)</i></dt>
<dd><p>The <code>bindtextdomain</code> function can be used to specify the directory
which contains the message catalogs for domain <var>domainname</var> for the
different languages. To be correct, this is the directory where the
hierarchy of directories is expected. Details are explained below.
</p>
<p>For the programmer it is important to note that the translations which
come with the program have to be placed in a directory hierarchy starting
at, say, ‘<tt>/foo/bar</tt>’. Then the program should make a
<code>bindtextdomain</code> call to bind the domain for the current program to
this directory. This ensures that the catalogs are found. A correctly
running program does not depend on the user setting an environment
variable.
</p>
<p>The <code>bindtextdomain</code> function can be used several times and if the
<var>domainname</var> argument is different the previously bound domains
will not be overwritten.
</p>
<p>If a program which uses <code>bindtextdomain</code> at some point
calls the <code>chdir</code> function to change the current working
directory, it is important that the <var>dirname</var> string be an
absolute pathname. Otherwise the addressed directory might vary over
time.
</p>
<p>If the <var>dirname</var> parameter is the null pointer <code>bindtextdomain</code>
returns the currently selected directory for the domain with the name
<var>domainname</var>.
</p>
<p>The <code>bindtextdomain</code> function returns a pointer to a string
containing the name of the selected directory. The string is
allocated internally in the function and must not be changed by the
user. If the system ran out of memory during the execution of
<code>bindtextdomain</code> the return value is <code>NULL</code> and the global
variable <var>errno</var> is set accordingly.
</p></dd></dl>
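<p>Put together, the programmer's part of the configuration might look like the
following sketch. The function name <code>init_messages</code> is hypothetical,
and the domain name and directory are supplied by the caller; the directory
should be an absolute pathname for the reason given above.
</p>

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical start-up routine.  Returns 0 on success and -1 if
   bindtextdomain or textdomain reports failure.  */
int
init_messages (const char *domain, const char *localedir)
{
  /* Adopt the locale settings of the user.  The result is ignored in
     this sketch, so the program stays in the "C" locale if the
     settings are not usable.  */
  setlocale (LC_ALL, "");

  /* Tell the library where the catalog hierarchy for DOMAIN starts.  */
  if (bindtextdomain (domain, localedir) == NULL)
    return -1;

  /* Make DOMAIN the default for all later gettext calls.  */
  if (textdomain (domain) == NULL)
    return -1;

  return 0;
}
```

<p>A call such as <code>init_messages ("example-prog", "/opt/example/share/locale")</code>,
with both arguments being made-up values, would normally be one of the first
statements in <code>main</code>.
</p>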
<hr size="6">
<a name="Advanced-gettext-functions"></a>
<a name="Additional-functions-for-more-complicated-situations"></a>
<h4 class="subsubsection">8.2.1.3 Additional functions for more complicated situations</h4>
<p>The functions of the <code>gettext</code> family described so far (and all the
<code>catgets</code> functions as well) have one problem in the real world
which has been neglected completely in all existing approaches. What
is meant here is the handling of plural forms.
</p>
<p>Looking through Unix source code before the time anybody thought about
internationalization (and, sadly, even afterwards) one can often find
code similar to the following:
</p>
<table><tr><td> </td><td><pre class="smallexample"> printf ("%d file%s deleted", n, n == 1 ? "" : "s");
</pre></td></tr></table>
<p>After the first complaints from people internationalizing the code, people
either completely avoided formulations like this or used strings like
<code>"file(s)"</code>. Both look unnatural and should be avoided. The first
attempts to solve the problem correctly looked like this:
</p>
<table><tr><td> </td><td><pre class="smallexample"> if (n == 1)
   printf ("%d file deleted", n);
 else
   printf ("%d files deleted", n);
</pre></td></tr></table>
<p>But this does not solve the problem. It helps languages where the
plural form of a noun is not simply constructed by adding an ‘s’, but
that is all. Once again people fell into the trap of believing the
rules their language is using are universal. But the handling of plural
forms differs widely between the language families. Two
things can differ between languages (and even within language families):
</p>
<ul>
<li>
The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an ‘s’) is hardly found in German.
</li><li>
The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romance and Germanic languages
since here the number is the same (there are two).
<p>But other language families have only one form or many forms. More
information on this is given in a separate section.
</p></li></ul>
<p>The consequence of this is that application writers should not try to
solve the problem in their code. This would be localization since it is
only usable for certain, hardcoded language environments. Instead the
extended <code>gettext</code> interface should be used.
</p>
<p>Instead of the one key string, these extra functions take two
strings and a numerical argument. The idea behind this is that using
the numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the translator.
The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
<code>gettext</code> behavior). In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the singular
form, the second the plural form.
</p>
<p>This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language. This is a limitation but since the GNU C library
(as well as the GNU <code>gettext</code> package) is written as part of the
GNU package and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.
</p>
<dl>
<dt><a name="index-ngettext"></a><u>Function:</u> char * <b>ngettext</b><i> (const char *<var>msgid1</var>, const char *<var>msgid2</var>, unsigned long int <var>n</var>)</i></dt>
<dd><p>The <code>ngettext</code> function is similar to the <code>gettext</code> function
as it finds the message catalogs in the same way. But it takes two
extra arguments. The <var>msgid1</var> parameter must contain the singular
form of the string to be converted. It is also used as the key for the
search in the catalog. The <var>msgid2</var> parameter is the plural form.
The parameter <var>n</var> is used to determine the plural form. If no
message catalog is found <var>msgid1</var> is returned if <code>n == 1</code>,
otherwise <code>msgid2</code>.
</p>
<p>An example for the use of this function is:
</p>
<table><tr><td> </td><td><pre class="smallexample"> printf (ngettext ("%d file removed", "%d files removed", n), n);
</pre></td></tr></table>
<p>Please note that the numeric value <var>n</var> has to be passed to the
<code>printf</code> function as well. It is not sufficient to pass it only to
<code>ngettext</code>.
</p></dd></dl>
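<p>The fallback behavior of <code>ngettext</code> can be made visible with a small
helper which formats into a buffer. The name <code>format_removed</code> is
hypothetical. The sketch assumes that no catalog is found, for example because
the program runs in the <code>"C"</code> locale, so the Germanic rule applies.
</p>

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: write the message for N removed files into BUF.
   Returns the value of snprintf.  */
int
format_removed (char *buf, size_t size, unsigned long int n)
{
  /* N is passed twice: once to select the form, once as the value
     printed for %lu.  */
  return snprintf (buf, size,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
}
```

<p>Without a catalog, only the value 1 selects the first string; every other
value, including 0, selects the second one.
</p>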
<dl>
<dt><a name="index-dngettext"></a><u>Function:</u> char * <b>dngettext</b><i> (const char *<var>domain</var>, const char *<var>msgid1</var>, const char *<var>msgid2</var>, unsigned long int <var>n</var>)</i></dt>
<dd><p>The <code>dngettext</code> function is similar to the <code>dgettext</code> function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <code>ngettext</code> handles them.
</p></dd></dl>
<dl>
<dt><a name="index-dcngettext"></a><u>Function:</u> char * <b>dcngettext</b><i> (const char *<var>domain</var>, const char *<var>msgid1</var>, const char *<var>msgid2</var>, unsigned long int <var>n</var>, int <var>category</var>)</i></dt>
<dd><p>The <code>dcngettext</code> function is similar to the <code>dcgettext</code> function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <code>ngettext</code> handles them.
</p></dd></dl>
<a name="The-problem-of-plural-forms"></a>
<h4 class="subsubheading">The problem of plural forms</h4>
<p>A description of the problem can be found at the beginning of the last
section. Now there is the question how to solve it. Without the input
of linguists (which was not available) it was not possible to determine
whether there are only a few different forms in which plural forms are
formed or whether the number can increase with every new supported
language.
</p>
<p>Therefore the solution implemented is to allow the translator to specify
the rules of how to select the plural form. Since the formula varies
with every language this is the only viable solution except for
hardcoding the information in the code (which still would require the
possibility of extensions to not prevent the use of new languages). The
details are explained in the GNU <code>gettext</code> manual. Here only a
bit of information is provided.
</p>
<p>The information about the plural form selection has to be stored in the
header entry (the one with the empty <code>msgid</code> string). It looks
like this:
</p>
<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
</pre></td></tr></table>
<p>The <code>nplurals</code> value must be a decimal number which specifies how
many different plural forms exist for this language. The string
following <code>plural</code> is an expression which uses the C language
syntax. Exceptions are that no negative numbers are allowed, numbers
must be decimal, and the only variable allowed is <code>n</code>. This
expression will be evaluated whenever one of the functions
<code>ngettext</code>, <code>dngettext</code>, or <code>dcngettext</code> is called. The
numeric value passed to these functions is then substituted for all uses
of the variable <code>n</code> in the expression. The resulting value then
must be greater than or equal to zero and smaller than the value given as the
value of <code>nplurals</code>.
</p>
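<p>Since the <code>plural</code> expression uses C syntax, a rule can be tried out
by writing it as an ordinary C function. The sketch below does this for the
two-form rule shown above and for the three-form rule listed for Polish later
in this section. The function names are hypothetical; the library evaluates the
expression from the catalog header itself and does not call functions like
these.
</p>

```c
/* nplurals=2; plural=n == 1 ? 0 : 1;  */
unsigned long int
plural_two_forms (unsigned long int n)
{
  return n == 1 ? 0 : 1;
}

/* nplurals=3; the rule listed for Polish.  */
unsigned long int
plural_polish (unsigned long int n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}

/* Hypothetical check of the constraint on the result: the index must be
   smaller than nplurals.  It cannot be negative because it is unsigned.  */
int
plural_index_valid (unsigned long int index, unsigned long int nplurals)
{
  return index < nplurals;
}
```

<p>For the Polish rule, 2, 3, 4 and 22 select form 1, while 5, 12 and 21 select
form 2; only the value 1 selects form 0.
</p>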
<p>The following rules are known at this point. The languages are listed
with their families. But this does not necessarily mean the information can be
generalized for the whole family (as can be easily seen in the table
below).<a name="DOCF1" href="libc_fot.html#FOOT1">(1)</a>
</p>
<dl compact="compact">
<dt> Only one form:</dt>
<dd><p>Some languages only require one single form. There is no distinction
between the singular and plural form. An appropriate header entry
would look like this:
</p>
<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=1; plural=0;
</pre></td></tr></table>
<p>Languages with this property include:
</p>
<dl compact="compact">
<dt> Finno-Ugric family</dt>
<dd><p>Hungarian
</p></dd>
<dt> Asian family</dt>
<dd><p>Japanese, Korean
</p></dd>
<dt> Turkic/Altaic family</dt>
<dd><p>Turkish
</p></dd>
</dl>
</dd>
<dt> Two forms, singular used for one only</dt>
<dd><p>This is the form used in most existing programs since it is what English
is using. A header entry would look like this:
</p>
<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=2; plural=n != 1;
</pre></td></tr></table>
<p>(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)
</p>
<p>Languages with this property include:
</p>
<dl compact="compact">
<dt> Germanic family</dt>
<dd><p>Danish, Dutch, English, German, Norwegian, Swedish
</p></dd>
<dt> Finno-Ugric family</dt>
<dd><p>Estonian, Finnish
</p></dd>
<dt> Latin/Greek family</dt>
<dd><p>Greek
</p></dd>
<dt> Semitic family</dt>
<dd><p>Hebrew
</p></dd>
<dt> Romance family</dt>
<dd><p>Italian, Portuguese, Spanish
</p></dd>
<dt> Artificial</dt>
<dd><p>Esperanto
</p></dd>
</dl>
</dd>
<dt> Two forms, singular used for zero and one</dt>
<dd><p>Exceptional case in the language family. The header entry would be:
</p>
<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=2; plural=n>1;
</pre></td></tr></table>
<p>Languages with this property include:
</p>
<dl compact="compact">
<dt> Romance family</dt>
<dd><p>French, Brazilian Portuguese
</p></dd>
</dl>
</dd>
<dt> Three forms, special case for zero</dt>
<dd><p>The header entry would be:
</p>
<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
</pre></td></tr></table>
<p>Languages with this property include:
</p>
<dl compact="compact">
<dt> Baltic family</dt>
<dd><p>Latvian
</p></dd>
</dl>
</dd>
<dt> Three forms, special cases for one and two</dt>
<dd><p>The header entry would be:
</p>
<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
</pre></td></tr></table>
<p>Languages with this property include:
</p>
<dl compact="compact">
<dt> Celtic</dt>
<dd><p>Gaeilge (Irish)
</p></dd>
</dl>
</dd>
<dt> Three forms, special case for numbers ending in 1[2-9]</dt>
<dd><p>The header entry would look like this:
</p>
<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=3; \
plural=n%10==1 && n%100!=11 ? 0 : \
n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
</pre></td></tr></table>
<p>Languages with this property include:
</p>
<dl compact="compact">
<dt> Baltic family</dt>
<dd><p>Lithuanian
</p></dd>
</dl>
</dd>
<dt> Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]</dt>
<dd><p>The header entry would look like this:
</p>
<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=3; \
plural=n%100/10==1 ? 2 : n%10==1 ? 0 : (n+9)%10>3 ? 2 : 1;
</pre></td></tr></table>
<p>Languages with this property include:
</p>
<dl compact="compact">
<dt> Slavic family</dt>
<dd><p>Croatian, Czech, Russian, Ukrainian
</p></dd>
</dl>
</dd>
<dt> Three forms, special cases for 1 and 2, 3, 4</dt>
<dd><p>The header entry would look like this:
</p>
<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=3; \
plural=(n==1) ? 1 : (n>=2 && n<=4) ? 2 : 0;
</pre></td></tr></table>
<p>Languages with this property include:
</p>
<dl compact="compact">
<dt> Slavic family</dt>
<dd><p>Slovak
</p></dd>
</dl>
</dd>
<dt> Three forms, special case for one and some numbers ending in 2, 3, or 4</dt>
<dd><p>The header entry would look like this:
</p>
<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=3; \
plural=n==1 ? 0 : \
n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
</pre></td></tr></table>
<p>Languages with this property include:
</p>
<dl compact="compact">
<dt> Slavic family</dt>
<dd><p>Polish
</p></dd>
</dl>
</dd>
<dt> Four forms, special case for one and all numbers ending in 02, 03, or 04</dt>
<dd><p>The header entry would look like this:
</p>
<table><tr><td> </td><td><pre class="smallexample">Plural-Forms: nplurals=4; \
plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
</pre></td></tr></table>
<p>Languages with this property include:
</p>
<dl compact="compact">
<dt> Slavic family</dt>
<dd><p>Slovenian
</p></dd>
</dl>
</dd>
</dl>
<hr size="6">
<a name="Charset-conversion-in-gettext"></a>
<a name="How-to-specify-the-output-character-set-gettext-uses"></a>
<h4 class="subsubsection">8.2.1.4 How to specify the output character set <code>gettext</code> uses</h4>
<p><code>gettext</code> not only looks up a translation in a message catalog. It
also converts the translation on the fly to the desired output character
set. This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.
</p>
<p>The output character set is, by default, the value of <code>nl_langinfo
(CODESET)</code>, which depends on the <code>LC_CTYPE</code> part of the current
locale. But programs which store strings in a locale independent way
(e.g. UTF-8) can request that <code>gettext</code> and related functions
return the translations in that encoding, by use of the
<code>bind_textdomain_codeset</code> function.
</p>
<p>Note that the <var>msgid</var> argument to <code>gettext</code> is not subject to
character set conversion. Also, when <code>gettext</code> does not find a
translation for <var>msgid</var>, it returns <var>msgid</var> unchanged –
independently of the current output character set. It is therefore
recommended that all <var>msgid</var>s be US-ASCII strings.
</p>
<dl>
<dt><a name="index-bind_005ftextdomain_005fcodeset"></a><u>Function:</u> char * <b>bind_textdomain_codeset</b><i> (const char *<var>domainname</var>, const char *<var>codeset</var>)</i></dt>
<dd><p>The <code>bind_textdomain_codeset</code> function can be used to specify the
output character set for message catalogs for domain <var>domainname</var>.
The <var>codeset</var> argument must be a valid codeset name which can be used
for the <code>iconv_open</code> function, or a null pointer.
</p>
<p>If the <var>codeset</var> parameter is the null pointer,
<code>bind_textdomain_codeset</code> returns the currently selected codeset
for the domain with the name <var>domainname</var>. It returns <code>NULL</code> if
no codeset has yet been selected.
</p>
<p>The <code>bind_textdomain_codeset</code> function can be used several times.
If used multiple times with the same <var>domainname</var> argument, the
later call overrides the settings made by the earlier one.
</p>
<p>The <code>bind_textdomain_codeset</code> function returns a pointer to a
string containing the name of the selected codeset. The string is
allocated internally in the function and must not be changed by the
user. If the system runs out of memory during the execution of
<code>bind_textdomain_codeset</code>, the return value is <code>NULL</code> and the
global variable <var>errno</var> is set accordingly. </p></dd></dl>
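<p>A minimal sketch of how a program that keeps its strings in UTF-8 might use this function. The helper names <code>use_utf8_for_domain</code> and <code>domain_uses_codeset</code> are hypothetical, and the sketch assumes that <code>"UTF-8"</code> is a codeset name accepted on the system.
</p>

```c
#include <libintl.h>
#include <string.h>

/* Ask gettext to return translations for DOMAIN in UTF-8.
   Returns 0 on success and -1 if bind_textdomain_codeset failed.  */
static int
use_utf8_for_domain (const char *domain)
{
  return bind_textdomain_codeset (domain, "UTF-8") != NULL ? 0 : -1;
}

/* Return nonzero if CODESET is the codeset currently selected for DOMAIN.
   Passing a null pointer as the codeset only queries the setting.  */
static int
domain_uses_codeset (const char *domain, const char *codeset)
{
  const char *current = bind_textdomain_codeset (domain, NULL);
  return current != NULL && strcmp (current, codeset) == 0;
}
```

<p>The returned string belongs to the library, so the sketch only reads it and never modifies or frees it.
</p>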
<hr size="6">
<a name="GUI-program-problems"></a>
<a name="How-to-use-gettext-in-GUI-programs"></a>
<h4 class="subsubsection">8.2.1.5 How to use <code>gettext</code> in GUI programs</h4>
<p>One place where the <code>gettext</code> functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs). The
problem is that many of the strings which have to be translated are very
short. They have to appear in pull-down menus, which restricts the
length. But strings which do not contain entire sentences, or at
least large fragments of a sentence, may appear in more than one
situation in the program and might need a different translation in each. This is
especially true for the one-word strings which are frequently used in
GUI programs.
</p>
<p>As a consequence many people say that the <code>gettext</code> approach is
wrong and that <code>catgets</code> should be used instead, which indeed does not
have this problem. But there is a very simple and powerful method to
handle this kind of problem with the <code>gettext</code> functions.
</p>
<p>As an example consider the following fictional situation. A GUI program
has a menu bar with the following entries:
</p>
<table><tr><td> </td><td><pre class="smallexample">+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
  | Open     |  | Select   |
  | New      |  | Open     |
  +----------+  | Connect  |
                +----------+
</pre></td></tr></table>
<p>To have the strings <code>File</code>, <code>Printer</code>, <code>Open</code>,
<code>New</code>, <code>Select</code>, and <code>Connect</code> translated, there has to be
a call to a function of the <code>gettext</code> family at some point in the
code. But in two places the string passed into the function would be
<code>Open</code>. The translations might not be the same and therefore we
are in the dilemma described above.
</p>
<p>One solution to this problem is to artificially lengthen the strings
to make them unambiguous. But what would the program do if no
translation is available? The lengthened string is not what should be
printed. So we should use a slightly modified version of the functions.
</p>
<p>To lengthen the strings a uniform method should be used. E.g., in the
example above the strings could be chosen as
</p>
<table><tr><td> </td><td><pre class="smallexample">Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
</pre></td></tr></table>
<p>Now all the strings are different, and if the following little wrapper
function is used instead of <code>gettext</code>, everything works just
fine:
</p>
<a name="index-sgettext"></a>
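<table><tr><td> </td><td><pre class="smallexample">/* Requires libintl.h (gettext) and string.h (strrchr).  */
char *
sgettext (const char *msgid)
{
  char *msgval = gettext (msgid);
  /* No translation found: gettext returned its argument unchanged.  */
  if (msgval == msgid)
    msgval = strrchr (msgid, '|') + 1;
  return msgval;
}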
</pre></td></tr></table>
<p>What this little function does is to recognize the case when no
translation is available. This can be done very efficiently by a
pointer comparison since the return value is the input value. If there
is no translation we know that the input string is in the format we used
for the Menu entries and therefore contains a <code>|</code> character. We
simply search for the last occurrence of this character and return a
pointer to the character following it. That’s it!
</p>
<p>If one now consistently uses the lengthened string form and replaces
the <code>gettext</code> calls with calls to <code>sgettext</code> (this is normally
limited to very few places in the GUI implementation) then it is
possible to produce a program which can be internationalized.
</p>
<p>With advanced compilers (such as GNU C) one can write the
<code>sgettext</code> function as an inline function or as a macro like this:
</p>
<a name="index-sgettext-1"></a>
<table><tr><td> </td><td><pre class="smallexample">#define sgettext(msgid) \
  ({ const char *__msgid = (msgid); \
     char *__msgval = gettext (__msgid); \
     if (__msgval == __msgid) \
       __msgval = strrchr (__msgid, '|') + 1; \
     __msgval; })
</pre></td></tr></table>
<p>The other <code>gettext</code> functions (<code>dgettext</code>, <code>dcgettext</code>
and the <code>ngettext</code> equivalents) can and should have corresponding
functions as well which look almost identical, except for the parameters
and the call to the underlying function.
</p>
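<p>As an illustration of such a corresponding function, here is a sketch of a <code>dgettext</code> counterpart. The name <code>sdgettext</code> is hypothetical. Unlike the wrapper above, this version also checks for a <var>msgid</var> without a <code>|</code> character, which is a defensive addition and not something the text above requires.
</p>

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical counterpart of sgettext for dgettext.  */
static const char *
sdgettext (const char *domainname, const char *msgid)
{
  const char *msgval = dgettext (domainname, msgid);
  /* No translation found: dgettext returned its argument unchanged.  */
  if (msgval == msgid)
    {
      /* Drop the disambiguating prefix, if there is one.  */
      const char *sep = strrchr (msgid, '|');
      if (sep != NULL)
        msgval = sep + 1;
    }
  return msgval;
}
```

<p>When no catalog for the domain can be found, a call such as <code>sdgettext ("no-such-domain", "Menu|Printer|Open")</code> yields the plain string <code>Open</code>.
</p>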
<p>Now there is of course the question of why such functions do not exist in
the GNU C Library. The answer to this question has two parts.
</p>
<ul>
<li>
They are easy to write and therefore can be provided by the project they
are used in. This is not an answer by itself and must be seen together
with the second part which is:
</li><li>
There is no way the C library can contain a version which can work
everywhere. The problem is the selection of the character to separate
the prefix from the actual string in the lengthened string. The
examples above used <code>|</code>, which is quite a good choice because it
resembles a notation frequently used in this context and it is also a
character not often used in message strings.
<p>But what if the character is used in message strings? Or what if the chosen
character is not available in the character set on the machine on which one
compiles? (E.g., <code>|</code> is not required to exist for ISO C; this is
why the ‘<tt>iso646.h</tt>’ file exists in ISO C programming environments.)
</p></li></ul>
<p>There is only one more comment to make. The wrapper functions above
require that the translated strings are not lengthened themselves.
This is only logical. There is no need to disambiguate the translations
(since they are never used as keys for a search), and leaving out the
prefix also saves memory and disk space.
</p>
<hr size="6">
<a name="Using-gettextized-software"></a>
<a name="User-influence-on-gettext"></a>
<h4 class="subsubsection">8.2.1.6 User influence on <code>gettext</code></h4>
<p>The last sections described what the programmer can do to
internationalize the messages of the program. But it is ultimately up to
the user to select the messages s/he wants to see, since s/he must
understand them.
</p>
<p>The POSIX locale model uses the environment variables <code>LC_COLLATE</code>,
<code>LC_CTYPE</code>, <code>LC_MESSAGES</code>, <code>LC_MONETARY</code>, <code>LC_NUMERIC</code>,
and <code>LC_TIME</code> to select the locale which is to be used. This way
the user can influence lots of functions. As we mentioned above the
<code>gettext</code> functions also take advantage of this.
</p>
<p>To understand how this happens it is necessary to take a look at the
various components of the filename which gets computed to locate a
message catalog. It is composed as follows:
</p>
<table><tr><td> </td><td><pre class="smallexample"><var>dir_name</var>/<var>locale</var>/LC_<var>category</var>/<var>domain_name</var>.mo
</pre></td></tr></table>
<p>The default value for <var>dir_name</var> is system specific. It is computed
from the value given as the prefix while configuring the C library.
This value normally is ‘<tt>/usr</tt>’ or ‘<tt>/</tt>’. For the former the
complete <var>dir_name</var> is:
</p>
<table><tr><td> </td><td><pre class="smallexample">/usr/share/locale
</pre></td></tr></table>
<p>We can use ‘<tt>/usr/share</tt>’ since the ‘<tt>.mo</tt>’ files containing the
message catalogs are system independent, so all systems can use the same
files. If the program executed the <code>bindtextdomain</code> function for
the message domain that is currently handled, the <var>dir_name</var>
component is exactly the value which was given to the function as
the second parameter. I.e., <code>bindtextdomain</code> allows overriding
the only system-dependent and fixed value, making it possible to
address files anywhere in the filesystem.
</p>
<p>The <var>category</var> is the name of the locale category which was selected
in the program code. For <code>gettext</code> and <code>dgettext</code> this is
always <code>LC_MESSAGES</code>, for <code>dcgettext</code> this is selected by the
value of the third parameter. As said above, using a category other
than <code>LC_MESSAGES</code> should be avoided.
</p>
<p>The <var>locale</var> component is computed based on the category used. Just
as for the <code>setlocale</code> function, this is where the user's selection
comes into play. Some environment variables are examined in a fixed order
and the first environment variable set determines the result of
the lookup process. In detail, for the category <code>LC_xxx</code> the
following variables are examined in this order:
</p>
<dl compact="compact">
<dt> <code>LANGUAGE</code></dt>
<dt> <code>LC_ALL</code></dt>
<dt> <code>LC_xxx</code></dt>
<dt> <code>LANG</code></dt>
</dl>
<p>This looks very familiar. With the exception of the <code>LANGUAGE</code>
environment variable this is exactly the lookup order the
<code>setlocale</code> function uses. But why introduce the <code>LANGUAGE</code>
variable?
</p>
<p>The reason is that the extended value syntax needed here differs from
what the <code>setlocale</code> function expects. If we
set <code>LC_ALL</code> to a value following the extended syntax, the
<code>setlocale</code> function would no longer be able to use the
value of that variable. An additional variable removes this
problem, and it also lets us select the language independently of the locale
setting, which is sometimes useful.
</p>
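<p>The lookup order listed above can be sketched as follows. This is a simplified illustration and not the library's code: the function names are hypothetical, treating an empty value like an unset variable is an assumption, and the real implementation applies further rules.
</p>

```c
#include <stddef.h>
#include <stdlib.h>

/* Return the first entry of VALUES that is neither null nor empty,
   or a null pointer if there is none.  */
static const char *
first_nonempty (const char *const values[], size_t count)
{
  for (size_t i = 0; i < count; ++i)
    if (values[i] != NULL && values[i][0] != '\0')
      return values[i];
  return NULL;
}

/* Examine LANGUAGE, LC_ALL, the category variable, and LANG, in that
   order.  CATEGORY_NAME is a variable name such as "LC_MESSAGES".  */
static const char *
locale_value_from_env (const char *category_name)
{
  const char *const values[] = {
    getenv ("LANGUAGE"), getenv ("LC_ALL"),
    getenv (category_name), getenv ("LANG")
  };
  return first_nonempty (values, sizeof values / sizeof values[0]);
}
```

<p>Keeping the ordering logic in <code>first_nonempty</code> separates it from the environment access, so it can be exercised without changing the environment.
</p>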
<p>While for the <code>LC_xxx</code> variables the value should consist of
exactly one locale specification, the value of the <code>LANGUAGE</code>
variable can consist of a colon-separated list of locale names. The
attentive reader will realize that this is how we
implement one of our additional demands above: we want to be able to
specify an ordered list of languages.
</p>
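<p>To illustrate the colon-separated format, the following sketch extracts the entry at a given position from such a list. The function <code>language_entry</code> is hypothetical, and skipping empty entries (as in <code>de::en</code>) is an assumption of this sketch.
</p>

```c
#include <string.h>

/* Copy the INDEX-th non-empty entry (counting from 0) of the
   colon-separated LIST into BUF.  Returns 0 on success, -1 if there is
   no such entry or it does not fit into SIZE bytes.  */
static int
language_entry (const char *list, size_t index, char *buf, size_t size)
{
  const char *p = list;
  while (*p != '\0')
    {
      size_t len = strcspn (p, ":");
      if (len > 0)
        {
          if (index == 0)
            {
              if (len >= size)
                return -1;
              memcpy (buf, p, len);
              buf[len] = '\0';
              return 0;
            }
          --index;
        }
      p += len;
      if (*p == ':')
        ++p;
    }
  return -1;
}
```

<p>For the value <code>de_AT:de:en</code>, position 0 yields <code>de_AT</code> and position 2 yields <code>en</code>. A caller would try the entries in order until a catalog is found.
</p>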
<p>Returning to the constructed filename, only one component is still missing.
The <var>domain_name</var> part is the name which was either registered using
the <code>textdomain</code> function or which was given to <code>dgettext</code> or
<code>dcgettext</code> as the first parameter. Now it becomes obvious that a
good choice for the domain name in the program code is a string which is
closely related to the program/package name. E.g., for the GNU C
Library the domain name is <code>libc</code>.
</p>
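<p>With all four components described, the composition of the catalog file name can be sketched as below. The function <code>catalog_path</code> is hypothetical and only mirrors the pattern shown earlier. The <var>category</var> argument is expected to be the full category name, such as <code>LC_MESSAGES</code>.
</p>

```c
#include <stdio.h>
#include <string.h>

/* Build dir_name/locale/category/domain_name.mo in BUF.
   Returns 0 on success, -1 if the result does not fit into SIZE bytes.  */
static int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  /* sizeof "///.mo" covers the three separators, the ".mo" suffix,
     and the terminating null byte.  */
  size_t needed = strlen (dir_name) + strlen (locale) + strlen (category)
                  + strlen (domain_name) + sizeof "///.mo";
  if (needed > size)
    return -1;
  snprintf (buf, size, "%s/%s/%s/%s.mo",
            dir_name, locale, category, domain_name);
  return 0;
}
```

<p>For example, the arguments <code>/usr/share/locale</code>, <code>de</code>, <code>LC_MESSAGES</code>, and <code>libc</code> produce <code>/usr/share/locale/de/LC_MESSAGES/libc.mo</code>.
</p>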
<p>A little piece of example code should show how the programmer is supposed
to work:
</p>
<table><tr><td> </td><td><pre class="smallexample">{
  setlocale (LC_ALL, "");
  textdomain ("test-package");
  bindtextdomain ("test-package", "/usr/local/share/locale");
  puts (gettext ("Hello, world!"));
}
</pre></td></tr></table>
<p>At the program start the default domain is <code>messages</code>, and the
default locale is "C". The <code>setlocale</code> call sets the locale
according to the user’s environment variables; remember that correct
functioning of <code>gettext</code> relies on the correct setting of the
<code>LC_MESSAGES</code> locale (for looking up the message catalog) and
of the <code>LC_CTYPE</code> locale (for the character set conversion).
The <code>textdomain</code> call changes the default domain to
<code>test-package</code>. The <code>bindtextdomain</code> call specifies that
the message catalogs for the domain <code>test-package</code> can be found
below the directory ‘<tt>/usr/local/share/locale</tt>’.
</p>
<p>If the user now sets the variable <code>LANGUAGE</code> in her/his environment
to <code>de</code>, the <code>gettext</code> function will try to use the
translations from the file
</p>
<table><tr><td> </td><td><pre class="smallexample">/usr/local/share/locale/de/LC_MESSAGES/test-package.mo
</pre></td></tr></table>
<p>From the above descriptions it should be clear which component of this
filename is determined by which source.
</p>
<p>In the above example we assumed that the <code>LANGUAGE</code> environment
variable is set to <code>de</code>. This might be an appropriate selection, but what
happens if the user wants to use <code>LC_ALL</code> because it applies more
widely, and the required value there is <code>de_DE.ISO-8859-1</code>? We
already mentioned above that a situation like this is not infrequent.
E.g., a person might prefer reading a dialect and, if this is not
available, fall back on the standard language.
</p>
<p>The <code>gettext</code> functions know about situations like this and can
handle them gracefully. The functions recognize the format of the value
of the environment variable. They can split the value into different pieces
and, by leaving out one part or another, construct new
values. This happens of course in a predictable way. To understand
this one must know the format of the environment variable value. There
is one more or less standardized form, originally from the X/Open
specification:
</p>
<p><code>language[_territory[.codeset]][@modifier]</code>
</p>
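<p>A sketch of how a value in this format could be split into its parts. The structure <code>locale_parts</code>, the function <code>split_locale_name</code>, and the fixed field sizes are all assumptions made for this illustration. The sketch is also slightly more lenient than the format, since it accepts a codeset without a preceding territory.
</p>

```c
#include <string.h>

struct locale_parts
{
  char language[16];
  char territory[16];
  char codeset[32];
  char modifier[32];
};

/* Copy LEN bytes of SRC into DST as a string; -1 if it does not fit.  */
static int
copy_part (char *dst, size_t size, const char *src, size_t len)
{
  if (len >= size)
    return -1;
  memcpy (dst, src, len);
  dst[len] = '\0';
  return 0;
}

/* Split NAME of the form language[_territory[.codeset]][@modifier].
   Missing parts are left as empty strings.  Returns 0 on success.  */
static int
split_locale_name (const char *name, struct locale_parts *out)
{
  memset (out, 0, sizeof *out);
  size_t len = strcspn (name, "_.@");
  if (len == 0 || copy_part (out->language, sizeof out->language, name, len))
    return -1;
  const char *p = name + len;
  if (*p == '_')
    {
      len = strcspn (++p, ".@");
      if (copy_part (out->territory, sizeof out->territory, p, len))
        return -1;
      p += len;
    }
  if (*p == '.')
    {
      len = strcspn (++p, "@");
      if (copy_part (out->codeset, sizeof out->codeset, p, len))
        return -1;
      p += len;
    }
  if (*p == '@')
    return copy_part (out->modifier, sizeof out->modifier, p + 1,
                      strlen (p + 1));
  return 0;
}
```

<p>For <code>de_DE.ISO-8859-1</code> this yields the language <code>de</code>, the territory <code>DE</code>, the codeset <code>ISO-8859-1</code>, and an empty modifier.
</p>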
<p>To form less specific locale names, the parts are stripped off in the order of the
following list:
</p>
<ol>
<li>
<code>codeset</code>
</li><li>
<code>normalized codeset</code>
</li><li>
<code>territory</code>
</li><li>
<code>modifier</code>
</li></ol>
<p>The <code>language</code> field will never be dropped for obvious reasons.
</p>
<p>The only new thing is the <code>normalized codeset</code> entry. This is
another convenience, introduced to help reduce the chaos which
derives from the lack of standardized names for
character sets. Instead of ISO-8859-1 one can often see 8859-1,
88591, iso8859-1, or iso_8859-1. The <code>normalized
codeset</code> value is generated from the user-provided character set name by
applying the following rules:
</p>
<ol>
<li>
Remove all characters except digits and letters.
</li><li>
Fold letters to lowercase.
</li><li>
If the name contains only digits, prepend the string <code>"iso"</code>.
</li></ol>
<p>So all of the above names will be normalized to <code>iso88591</code>. This
gives the program user much more freedom in choosing the locale name.
</p>
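<p>The three normalization rules can be sketched as follows. The function <code>normalize_codeset</code> is a hypothetical helper written for this illustration, not the library's internal routine, and it assumes the name consists of ASCII characters.
</p>

```c
#include <ctype.h>
#include <string.h>

/* Write the normalized form of CODESET into BUF.
   Returns 0 on success, -1 if the result does not fit into SIZE bytes.  */
static int
normalize_codeset (const char *codeset, char *buf, size_t size)
{
  size_t len = 0;
  int only_digits = 1;

  /* First pass: count the characters that survive rule 1 and find out
     whether rule 3 applies.  */
  for (const char *p = codeset; *p != '\0'; ++p)
    if (isalnum ((unsigned char) *p))
      {
        ++len;
        if (!isdigit ((unsigned char) *p))
          only_digits = 0;
      }

  if ((only_digits ? 3 : 0) + len + 1 > size)
    return -1;

  char *out = buf;
  if (only_digits)
    {
      memcpy (out, "iso", 3);   /* rule 3 */
      out += 3;
    }
  /* Second pass: apply rules 1 and 2.  */
  for (const char *p = codeset; *p != '\0'; ++p)
    if (isalnum ((unsigned char) *p))
      *out++ = (char) tolower ((unsigned char) *p);
  *out = '\0';
  return 0;
}
```

<p>Each of the spellings ISO-8859-1, 8859-1, 88591, iso8859-1, and iso_8859-1 comes out as <code>iso88591</code>.
</p>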
<p>Even this extended functionality still does not help to solve the
problem that completely different names can be used to denote the same
locale (e.g., <code>de</code> and <code>german</code>). To be of help in this
situation the locale implementation and also the <code>gettext</code>
functions know about aliases.
</p>
<p>The file ‘<tt>/usr/share/locale/locale.alias</tt>’ (replace ‘<tt>/usr</tt>’ with
whatever prefix you used for configuring the C library) contains a
mapping of alternative names to more regular names. The system manager
is free to add new entries to meet her/his own needs. The selected
locale from the environment is compared with the entries in the first
column of this file, ignoring case. If they match, the value of the
second column is used instead for further handling.
</p>
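<p>The case-insensitive alias lookup described above can be sketched with an in-memory table. Reading the alias file itself is left out. The type <code>alias</code>, the function <code>resolve_alias</code>, and any table entries passed to it are illustrative assumptions.
</p>

```c
#include <stddef.h>
#include <strings.h>

/* One line of the alias file: first and second column.  */
struct alias
{
  const char *name;
  const char *value;
};

/* Return the second-column value for LOCALE if the first column
   matches it ignoring case, otherwise return LOCALE unchanged.  */
static const char *
resolve_alias (const struct alias *table, size_t count, const char *locale)
{
  for (size_t i = 0; i < count; ++i)
    if (strcasecmp (table[i].name, locale) == 0)
      return table[i].value;
  return locale;
}
```

<p>With a table that maps <code>german</code> to <code>de_DE.ISO-8859-1</code>, the name <code>GERMAN</code> resolves to <code>de_DE.ISO-8859-1</code>, while <code>de</code> is returned unchanged.
</p>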
<p>In the description of the format of the environment variables we already
mentioned the character set as a factor in the selection of the message
catalog. In fact, only catalogs which contain text written using the
character set of the system/program can be used (directly; there will
come a solution for this some day). This means that the user
always has to take care of this. If the collection of
message catalogs contains files for the same language but coded using
different character sets, the user has to be careful.
</p>
<hr size="6">
<a name="Helper-programs-for-gettext"></a>
<a name="Programs-to-handle-message-catalogs-for-gettext"></a>
<h3 class="subsection">8.2.2 Programs to handle message catalogs for <code>gettext</code></h3>
<p>The GNU C Library does not contain the source code for the programs to
handle message catalogs for the <code>gettext</code> functions. As part of
the GNU project the GNU gettext package contains everything the
developer needs. The functionality provided by the tools in this
package by far exceeds the abilities of the <code>gencat</code> program
described above for the <code>catgets</code> functions.
</p>
<p>The <code>msgfmt</code> program is the equivalent of the
<code>gencat</code> program. From the human-readable and
-editable form of the message catalog it generates a binary file which can be used by
the <code>gettext</code> functions. But there are several more programs
available.
</p>
<p>The <code>xgettext</code> program can be used to automatically extract the
translatable messages from a source file. I.e., the programmer need not
maintain by hand the list of messages which have to be
translated. S/He will simply wrap the translatable strings in calls to
<code>gettext</code> et al. and the rest will be done by <code>xgettext</code>. This
program has a lot of options which help to customize the output or
help it to understand the input better.
</p>
<p>Other programs help to manage the development cycle when new messages appear
in the source files or when a new translation of the messages appears.
Here it should only be noted that using all the tools in GNU gettext it
is possible to <em>completely</em> automate the handling of message
catalogs. Besides marking the translatable strings in the source code and
generating the translations, the developers do not have anything to do
themselves.
</p><hr size="6">
<p>
<font size="-1">
This document was generated by <em>root</em> on <em>April 20, 2012</em> using <a href="http://www.nongnu.org/texi2html/"><em>texi2html 1.82</em></a>.
</font>
<br>
</p>
</body>
</html>