Page moved

Scott Lahteine 2017-09-23 22:03:26 -05:00
parent 0c96ba1331
commit f2460662c3

# LCD Language Font System

[This page has moved!](http://marlinfw.org/docs/development/lcd_language.html)
We deal with a variety of different displays and we try to display a lot of different languages in different scripts on them. This system is meant to solve some of the related problems.
## The Displays
We have two different technologies for the displays:
### Character-based displays:
- Have a fixed set of symbols (charset, i.e. font) in their ROM.
- All of them have a similar, but not identical, symbol set at positions 0 to 127, close to US-ASCII. Symbols at positions above 127, on the other hand, differ greatly.
So far we know of (and support):
- HD44780 and similar with Kana charset A00 [HD44780](https://www.sparkfun.com/datasheets/LCD/HD44780.pdf) (Page 17). These are very common, but sadly not very useful when writing in European languages.
- HD44780 and similar with Western charset A02 [HD44780](https://www.sparkfun.com/datasheets/LCD/HD44780.pdf) (Page 18). These are rare, but fairly useful for European languages. They also offer a limited number of Cyrillic symbols.
- HD44780 and similar with [Cyrillic charset](http://store.comet.bg/download-file.php?id=466) (Page 14). Some of our Russian friends use them.
On all these displays you can define 8 custom symbols to display at once. In Marlin these characters are used on the Info Screen for the Bed Temp, Degree symbol, Thermometer, "FR" (feed-rate), Clock, and Progress Bar. On the SD Card listing screens some of them get re-used for Up-level, Folder, and Refresh.
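Each of these custom symbols is a small bitmap stored in the display's character RAM: 8 rows, with the low 5 bits of each row byte giving the pixels. The sketch below is illustrative only: the degree-symbol bitmap is a made-up example rather than Marlin's actual glyph data, and `render_glyph_row` is a hypothetical preview helper. On Arduino, the `LiquidCrystal` library uploads such a bitmap into one of the 8 slots with `createChar(slot, bitmap)`.

```c
#include <stdint.h>

/* One HD44780 custom character: 8 rows, 5 pixels per row (low 5 bits used).
   Illustrative degree symbol, not Marlin's real glyph data. */
static const uint8_t degree_glyph[8] = {
  0x0C, /* .##.. */
  0x12, /* #..#. */
  0x12, /* #..#. */
  0x0C, /* .##.. */
  0x00, 0x00, 0x00, 0x00
};

/* Hypothetical helper: render one glyph row as text,
   '#' for a lit pixel, '.' for a dark one. out must hold 6 bytes. */
static void render_glyph_row(uint8_t row, char out[6]) {
  for (int col = 0; col < 5; col++)
    out[col] = (row & (0x10 >> col)) ? '#' : '.'; /* bit 4 is the leftmost pixel */
  out[5] = '\0';
}
```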
### Full Graphical Displays
Graphical displays give us full freedom to display whatever we want, as long as we provide a program for it. Currently we deal with 128x64-pixel displays and divide this area into ~5 lines with ~22 columns, so we need monospace fonts with a bounding box of about 6x10.
- Until now we've been using a custom Marlin font similar to ISO10646-1 but with special symbols at the end, which made 'ü' and 'ä' inaccessible at 6x10 size.
- Because these letters were too big for some positions on the Info Screen, we use a full ISO10646-1 font at 6x9 (3200 bytes).
- When we define `USE_BIG_EDIT_FONT` we use an additional ISO10646-1 9x18 font, eating up another 3120 bytes of PROGMEM - but readable without glasses!
## The Languages
For the moment Marlin wants to support a lot of languages:
Code|Language
----|--------
en|English
an|Aragonese
bg|Bulgarian
ca|Catalan
cn|Chinese
cz|Czech
de|German
el|Greek
el-gr|Greek (Greece)
es|Spanish
eu|Basque-Euskera
fi|Finnish
fr|French
gl|Galician
hr|Croatian
it|Italian
kana|Japanese
kana_utf8|Japanese (UTF8)
nl|Dutch
pl|Polish
pt|Portuguese
pt-br|Portuguese (Brazilian)
pt-br_utf8|Portuguese (Brazilian UTF8)
pt_utf8|Portuguese (UTF8)
ru|Russian
tr|Turkish
uk|Ukrainian
## The Problem
All these languages, except English, normally use extended symbol sets not contained in US-ASCII. Even the English translation uses some symbols not in US-ASCII ('`\002`' for Thermometer, `STR_h3` for '³'). Worse, the code itself uses symbols without taking into account the display they're written on ([this may still be true only for displays with the Japanese charset](https://github.com/MarlinFirmware/Marlin/blob/Development/Marlin/ultralcd_implementation_hitachi_HD44780.h)). On Western displays you'll see a '`~`', while on Cyrillic displays you'll see an "arrow coming from the top, pointing left", which is quite the opposite of what the programmer wanted. German speakers want to use "`ÄäÖöÜüß`", Finnish speakers at least "`äö`". Other European languages want to see the accents on their letters too. For other scripts like Cyrillic, Japanese, Greek, Hebrew, ... you have to find totally different symbol sets.
Until now these problems were widely ignored. The German translation used UTF-8 'ä' and 'ö' and did not care about showing garbage on ALL displays. Russian translators knew their system only worked on Cyrillic displays and relied on special LCD routines (`LiquidCrystalRus.cpp`) to handle UTF-8, but forgot to implement a proper `strlen()`.
The Japanese translator dealt with two scripts. He introduced a special font for the Full Graphic Displays and made use of the Japanese version of the character displays. Therefore he ended up with two pretty unreadable `language.h` files full of '`\xxx`' definitions.
Other languages either tried to avoid words that included special symbols or ignored the problem and used the basic symbols without the accents, dots... whatever.
## The (Partial) Solution
On a 'perfect' system like Windows or Linux we'd dig out `unifont.ttf` and some code from the libraries and they'd do what we want. But on an embedded system with very limited resources we have to find ways to limit the used space (Just `unifont.ttf` alone is about 12MB!), requiring some compromise.
### Aims:
- Make the input for translators as convenient as possible. (Unicode UTF8)
- Make the displays show the scripts as best as they can. (fonts, mapping tables)
- Don't destroy the existing language files.
- Don't use more CPU resources.
- Don't use too much memory.
### Actions:
- Declare the display hardware we use. (`Configuration.h`)
- Declare the language or script we use. (`Configuration.h`)
- Declare the kind of input we use: either direct pointers into the font (`\xxx`) or UTF-8, plus the font to use on graphic displays. (`language_xx.h`)
- Declare the needed translations. (`language_xx.h`)
- Make strlen() work with UTF8. (`ultralcd.cpp`)
- Separate the Marlin Symbols into their own font. (`dogm_font_data_Marlin_symbols.h`)
- Make the fontswitch function remember the last used font. (`ultralcd_impl_DOGM.h`)
- Make output functions that count the number of written chars and switch the font to Marlin symbols and back when needed. (`ultralcd_impl_DOGM.h`) (`ultralcd_impl_HD44780.h`)
- Make three fonts to simulate the HD44780 charsets on DOGM displays. With these fonts translators can check how their translation will look on the character-based displays.
- Make ISO fonts for Cyrillic and Katakana, because they do not need a mapping table, are faster to deal with, and have a better charset (fewer compromises) than the HD44780 fonts.
- Make mapping functions and tables to convert from UTF8 to the fonts and integrate in the new output functions. (`utf_mapper.h`)
- Delete the obsolete `LiquidCrystalRus.xxx` files and their calls in '`ultralcd_implementation_hitachi_HD44780.h`'.
- Split '`dogm_font_data_Marlin.h`' into separate fonts and delete the original. (+`dogm_font_data_6x9_marlin.h`, +`dogm_font_data_Marlin_symbols.h`, -`dogm_font_data_Marlin.h`)
- Do a bit of preprocessor magic to match displays, fonts, and mappers in `utf_mapper.h`.
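Making `strlen()` work with UTF-8 mostly means counting characters instead of bytes: every UTF-8 continuation byte has the bit pattern `10xxxxxx`, so counting only the other bytes gives one count per displayed character. A minimal sketch, assuming well-formed UTF-8 input; `utf8_display_len` is a hypothetical name and not necessarily the function in `ultralcd.cpp`. With `MAPPER_NON` every byte is one symbol, so a plain byte count is already correct there.

```c
#include <stddef.h>

/* Count displayed characters in a UTF-8 string (hypothetical helper).
   Continuation bytes look like 10xxxxxx; every other byte starts a character. */
static size_t utf8_display_len(const char *s) {
  size_t count = 0;
  for (; *s != '\0'; s++)
    if (((unsigned char)*s & 0xC0u) != 0x80u)
      count++;
  return count;
}
```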
## Translator's Handbook
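- Check whether there is already a `language_xx.h` file for your language. If there is, check which mapper it declares; if not, see the items below on creating a new language file.
- If the file declares `MAPPER_NON`, the next items on `\xxx` symbols apply; if it declares another mapper, see the items on UTF-8 input.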
- Symbols outside the normal ASCII range (32-127) are written as "`\xxx`" and point directly into the font of the hardware you declared in `Configuration.h`.
- This is one of the three fonts of the character based Hitachi displays (`JAPANESE`, `WESTERN`, `CYRILLIC`) set by `DISPLAY_CHARSET_HD44780`.
- Even on the full graphic displays one of these will be used when `SIMULATE_ROMFONT` is defined.
- If you don't make use of the extended character set, your file will look like `language_en.h` and will work on all the displays.
- If you make extensive use, your file will look like `language_kana.h` and your language file will only work on one of the displays (in this case `DISPLAY_CHARSET_HD44780` == `JAPANESE`).
- Be careful with the characters `0x5C` (`'\'`) and `0x7B` to `0x7F` (`{|}~` and DEL in ASCII). These are not the same on all variants.
- `MAPPER_NON` is the fastest and least memory-hungry variant.
- If you want to use more than a few symbols outside standard ASCII, or want to improve portability to more types of displays, use UTF-8 input. That means declaring another mapper.
- When a mapper other than `MAPPER_NON` is declared, UTF-8 input is used. Instead of "`\xE1`" (on a display with the Japanese font) or `STR_ae`, simply write "ä". Read byte by byte, "ä" expands to "`\xC3\xA4`", "Я" to "`\xD0\xAF`", and "ホ" to "`\xE3\x83\x9B`".
- To limit memory usage we can't use all of UTF-8 at the same time. We define a subset matching the language or script we use.
- `MAPPER_C2C3` corresponds well with Western-European languages. The possible symbols are listed at [this Latin-1 page](http://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)).
- `MAPPER_D0D1` corresponds well with the Cyrillic languages. See [this Cyrillic page](http://en.wikipedia.org/wiki/Cyrillic_(Unicode_block)).
- `MAPPER_E382E383` works with the Japanese Katakana script. See [this Katakana page](http://en.wikipedia.org/wiki/Katakana_(Unicode_block)).
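The mapper names are simply the UTF-8 lead-in bytes they accept. The hypothetical classifier below makes that correspondence concrete; it looks only at the first byte(s) and does not validate the rest of the sequence, so treat it as an illustration rather than Marlin's mapper code.

```c
/* Which of the mappers listed above would accept the sequence at s?
   Hypothetical helper: checks only the lead-in byte(s). */
enum lead_in { LEAD_ASCII, LEAD_C2C3, LEAD_D0D1, LEAD_E382E383, LEAD_NONE };

static enum lead_in classify_lead_in(const char *s) {
  const unsigned char *b = (const unsigned char *)s;
  if (b[0] < 0x80) return LEAD_ASCII;                  /* plain 7-bit ASCII */
  if (b[0] == 0xC2 || b[0] == 0xC3) return LEAD_C2C3;  /* Latin-1 Supplement */
  if (b[0] == 0xD0 || b[0] == 0xD1) return LEAD_D0D1;  /* Cyrillic */
  if (b[0] == 0xE3 && (b[1] == 0x82 || b[1] == 0x83))
    return LEAD_E382E383;                              /* Katakana */
  return LEAD_NONE;                                    /* no mapper catches it */
}
```

Note that a raw `\xE1`, written for the Japanese ROM under `MAPPER_NON`, is not caught by any of these mappers.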
Mapper functions will only catch the 'lead-in' described in the mapper's name. If the input doesn't match, the mapper will output a '?' or garbage.
The last byte in the sequence either points directly into a matching ISO10646 font or, via a mapper table, into one of the HD44780 fonts.
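For `MAPPER_C2C3` paired with an ISO10646-1 font the arithmetic is especially simple, because code points U+0080 to U+00FF sit at the same positions as in a Latin-1 font. A minimal sketch; `map_c2c3` is a hypothetical name, and Marlin's own mapper in `utf_mapper.h` may be structured differently.

```c
#include <stdint.h>

/* Decode one character for a C2/C3 mapper feeding a Latin-1-ordered font.
   Writes the font index to *glyph and returns the number of bytes consumed. */
static int map_c2c3(const unsigned char *s, uint8_t *glyph) {
  if (s[0] < 0x80) {                 /* ASCII maps to itself */
    *glyph = s[0];
    return 1;
  }
  if ((s[0] == 0xC2 || s[0] == 0xC3) && (s[1] & 0xC0) == 0x80) {
    /* 2-byte UTF-8: 5 payload bits from the lead-in, 6 from the last byte */
    *glyph = (uint8_t)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
    return 2;
  }
  *glyph = '?';                      /* outside this mapper: emit '?', skip one byte */
  return 1;
}
```

Because unmatched input advances one byte at a time, a 3-byte sequence such as "ホ" comes out as several '?' characters, which matches the "'?' or garbage" behaviour described above.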
The mapper tables do their best to find a similar symbol in the HD44780 fonts, for example by replacing small letters with the matching capital letters. But they may fail to find anything matching and will then output a '?'. Some combinations of language and display simply have no corresponding symbols, like Cyrillic on a Japanese display or vice versa; in those cases the compiler will throw an error.
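A mapping table is then just a lookup indexed by the decoded character, with '?' as the fallback. In this sketch the table name, `map_to_rom`, and every entry except the `\xE1` for 'ä' on the Japanese ROM (taken from this page) are assumptions for illustration, not Marlin's real tables.

```c
#include <stdint.h>

/* Hypothetical table: Latin-1 U+00C0..U+00FF -> HD44780 A00 ROM position.
   0 means "no similar symbol found". */
static const uint8_t latin1_to_a00[64] = {
  [0xE4 - 0xC0] = 0xE1,  /* 'a' with diaeresis: \xE1 on the Japanese ROM (from this page) */
  [0xE9 - 0xC0] = 'e',   /* 'e' with acute: plain 'e' as a lookalike (assumption) */
};

static uint8_t map_to_rom(uint8_t latin1) {
  if (latin1 < 0x80) return latin1;  /* ASCII passes through; note 0x5C, 0x7B-0x7F differ per ROM */
  if (latin1 >= 0xC0 && latin1_to_a00[latin1 - 0xC0] != 0)
    return latin1_to_a00[latin1 - 0xC0];
  return '?';                        /* nothing similar in this ROM */
}
```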
In short: choose a mapper that works with the symbols you want to use, and use only symbols matching that mapper. On full graphic displays everything will be fine, but on character-based displays check for bad replacements or question marks by defining `SIMULATE_ROMFONT` and trying the different variants.
If you get a lot of question marks on the Hitachi-based displays with your new translation, maybe creating an additional language file with the format `language_xx_utf8.h` is the way to go.
- `MAPPER_NON` is the fastest and least memory-hungry variant.
- Mappers together with an ISO10646 font are the second-best choice regarding speed and memory consumption. Only a few more decisions are made per character.
- Mappers together with the HD44780 fonts use about 128 additional bytes for the mapping table.
- Creating a new language file is not a big deal. Just make a new file named '`language_xx.h`' or maybe '`language_xx_utf8.h`', define a mapper and a font in it, and translate some of the strings defined in `language_en.h`. You can drop the surrounding `#ifndef` `#endif`. You don't have to translate all the strings: missing ones will be filled in from `language_en.h`, in English of course.
- If you can't find a matching mapper, things will be a bit more complex. With the Hitachi-based displays you won't be able to make anything useful unless you have one with a matching charset. For a full graphic display, let's take the example of Greek:
  - Find a matching charset. ([Greek and Coptic](http://en.wikipedia.org/wiki/Greek_and_Coptic))
  - Provide a font containing the symbols in the right size: normal ASCII in the lower half (positions 0 to 127), your selection in the upper half.
  - Write a mapper that catches, in this case, the lead-in bytes `0xCD` to `0xCF` and add it to `utf_mapper.h`.
  - In case of an ISO10646 font we have a `MAPPER_ONE_TO_ONE` and don't have to make a table.
  - If you discover enough useful symbols in one of the HD44780 fonts, you can provide a mapping table. For example, `WESTERN` contains 'alpha', 'beta', 'pi', 'Sigma', 'omega', and 'mu', which is probably not enough to make a useful table.
- To integrate an entirely new variant of a Hitachi-based display, add it to `Configuration.h` and define mapper tables in `utf_mapper.h`. You may need to add a new mapper function.
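The English fallback works through the preprocessor: the language file is read first, and `language_en.h` only defines a string that is still undefined. Compressed into one file for illustration; the message names, the German text, and the font macro `DISPLAY_CHARSET_ISO10646_1` are assumptions, not copied from Marlin's files.

```c
/* --- would live in language_de.h (illustrative) --- */
#define MAPPER_C2C3                 /* UTF-8 input, Latin-1 range */
#define DISPLAY_CHARSET_ISO10646_1  /* assumed font selector name */
#define MSG_PREPARE "Vorbereitung"  /* translated */
/* MSG_CONTROL deliberately left untranslated */

/* --- would live in language_en.h: fill in whatever is still undefined --- */
#ifndef MSG_PREPARE
  #define MSG_PREPARE "Prepare"
#endif
#ifndef MSG_CONTROL
  #define MSG_CONTROL "Control"
#endif

static const char *const menu_items[] = { MSG_PREPARE, MSG_CONTROL };
```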
The length of the strings is limited. "17 characters" was a crude rule of thumb; obviously 17 is too long for the 16x2 displays. A more exact rule would be `max_strlen = LCD_WIDTH - 2 - strlen(edit_value)`. Since that is a bit complicated, my rule of thumb is: try it and count.
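The formula is simple arithmetic. For example, a 20-column display editing a 3-character value such as "215" leaves 20 - 2 - 3 = 15 columns for the label, while a 16x2 display leaves 11. A sketch with a hypothetical helper; `edit_value_len` should be the number of displayed characters, which for UTF-8 text is not the byte count.

```c
/* max_strlen = LCD_WIDTH - 2 - strlen(edit_value), clamped at 0.
   Hypothetical helper, not Marlin code. */
static int max_label_len(int lcd_width, int edit_value_len) {
  int n = lcd_width - 2 - edit_value_len;
  return n > 0 ? n : 0;
}
```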
On the 16x2 displays the strings are cut at the end to fit on the display, so it's a good idea to make them differ early. ('`Someverylongoptionname x`' -> '`x Someverylongoptionname`')
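To see why, here is a small sketch of that cutting. `cut_to_width` and `collide_when_cut` are hypothetical helpers that count bytes, so they assume ASCII labels.

```c
#include <stdbool.h>
#include <string.h>

/* Copy at most `width` bytes of label into out (out needs width + 1 bytes),
   mimicking a display that cuts strings at the right edge. */
static void cut_to_width(const char *label, size_t width, char *out) {
  size_t i = 0;
  for (; i < width && label[i] != '\0'; i++)
    out[i] = label[i];
  out[i] = '\0';
}

/* True if two labels look identical once cut to `width` columns. */
static bool collide_when_cut(const char *a, const char *b, size_t width) {
  return strncmp(a, b, width) == 0;
}
```

On a 16-column display both "Someverylongoptionname x" and "Someverylongoptionname y" become "Someverylongopti", while the reordered variants stay distinguishable.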
You'll find all translatable strings in `language_en.h`. Strings in `language.h` are for serial output, so don't require any translation. Core error strings must always be in English to satisfy host protocols.
For information about fonts see [`buildroot/share/fonts/README.md` file](/MarlinFirmware/Marlin/tree/1.1.x/buildroot/share/fonts#readme).
## User Instructions
Define your hardware and the desired language in `Configuration.h`.
To find out what character set your hardware uses, set `#define LCD_LANGUAGE test` and compile Marlin. In the menu you'll see two lines from the upper half of the character set:
- `JAPANESE` displays "`バパヒビピフブプヘベペホボポマミ`"
- `WESTERN` displays "`ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß`"
- `CYRILLIC` displays "`РСТУФХЦЧШЩЪЫЬЭЮЯ`"
If you get an error message about "missing mappers" during compilation - lie about your display's hardware font to see at least some garbage, or select another language.
English works on all hardware.
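For reference, the charset test might look like this in `Configuration.h`. The option names are the ones used on this page; where exactly they sit, and whether `SIMULATE_ROMFONT` lives in `Configuration.h` or elsewhere, depends on the Marlin version, so treat this as a sketch.

```c
/* Configuration.h (excerpt, sketch only) */
#define LCD_LANGUAGE test                /* temporary: shows the charset test lines; switch back to e.g. en afterwards */
#define DISPLAY_CHARSET_HD44780 WESTERN  /* set to what the test showed: JAPANESE, WESTERN or CYRILLIC */
//#define SIMULATE_ROMFONT               /* graphic displays only: emulate the HD44780 ROM font */
```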