Coding Practices
Have the appropriate coding practices been followed in producing the product?
Externalize Translatable Information
Are all strings externalized? It is not a good practice, long term, to simply pile all resource strings into one giant .properties file. There needs to be a strategy for separating items by type (errors, etc.) as well as by Internationalization effort. There are several types of strings in a product:
- Company Owned and Translatable. An example would be error messages. The proper model is not “change the .properties entry”. The proper model is to provide an extensible error mechanism that allows more messages to be added.
- Company Owned and not Translatable. An example would be any string used as a key in a lookup or logic point.
- Non-localizable resource strings. An example would be any HTML tags that are not locale sensitive (which is most of them).
- User Modifiable and Translatable. An example would be the explanatory text at the top of a UI page and any error messages that cannot be locked up and owned by the product.
- User Modifiable and not Translatable. An example would be HTML tags that are locale sensitive or that are likely to need modification by the user. For example, font-face needs to be configurable to allow for items like Japanese characters.
Consider designing and implementing a generic resource/string manager. Once your needs go beyond the default “1 world per VM” functionality in Java, treating a suite of products as a single application will not result in an internationalization-friendly product. This generic resource/string manager, no matter who implements it, could be a platform-level service, possibly a standalone servlet if that makes sense.
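As a rough illustration of what such a manager might look like, here is a minimal sketch that keeps strings separated by category and takes the locale on every lookup instead of relying on the VM default. The class name `StringManager`, its methods, and the fallback order (exact locale, then language only, then root) are assumptions made for this example, not an existing API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a string manager keyed by category and locale.
class StringManager {
    // category -> locale language tag -> key -> text
    private final Map<String, Map<String, Map<String, String>>> store = new HashMap<>();

    void put(String category, Locale locale, String key, String text) {
        store.computeIfAbsent(category, c -> new HashMap<>())
             .computeIfAbsent(locale.toLanguageTag(), l -> new HashMap<>())
             .put(key, text);
    }

    // Tries the exact locale, then the language alone, then Locale.ROOT.
    String get(String category, Locale locale, String key) {
        Map<String, Map<String, String>> byLocale = store.getOrDefault(category, Map.of());
        Locale[] chain = { locale, Locale.forLanguageTag(locale.getLanguage()), Locale.ROOT };
        for (Locale candidate : chain) {
            String text = byLocale.getOrDefault(candidate.toLanguageTag(), Map.of()).get(key);
            if (text != null) {
                return text;
            }
        }
        // A visible marker makes missing strings easy to spot during testing.
        return "[" + category + "." + key + "]";
    }
}
```

Because the locale is a parameter, two requests in the same VM can be served in different languages, and the categories (errors, UI text, and so on) can be owned and translated separately.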
Is the text externalized from the images? Text included on images can be a very labor intensive item in localization. The text should be isolated from the graphic by either removing it completely or by placing it on a layer of the graphic which is easily accessible.
Are characters processed correctly for all relevant locales (Unicode)? You must use the appropriate functions and operations to get the next character in a string. Incrementing a pointer by 1 is not an option when dealing with double-byte characters, and a single string can mix single-byte and double-byte characters.
Are all fonts externalized? Hard-coded font faces in generated HTML are problematic at best. With the set chosen you are probably covered for European locales but not for the Pacific Rim. In the long run, hard coding a font face is not internationalization-friendly, and you will need to investigate mechanisms for generating locale-specific HTML from servlets and JSPs. Over the short haul you could just make sure that the font tags are in localizable resources and that the fonts specified are appropriate for the locale. The user can (at least in Netscape) override the font choices you make, so that may provide you with another short-term solution.
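In Java the analogous trap is stepping through a string one `char` at a time: strings are UTF-16, so characters outside the Basic Multilingual Plane occupy two `char` units. The sketch below walks a string by code point using the standard `String.codePointAt` and `Character.charCount` methods; the class and method names are made up for this example.

```java
class CodePoints {
    // Counts Unicode code points rather than UTF-16 code units.
    static int countCodePoints(String s) {
        int count = 0;
        // Advance by 1 or 2 chars depending on the code point at position i.
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            count++;
        }
        return count;
    }
}
```

For a string that contains a surrogate pair, `length()` and `countCodePoints` disagree, which is exactly the situation where "increment by 1" logic breaks.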
Is string concatenation avoided? String concatenation is a bane to localizers. When faced with a partial string the translator does not know how to handle the translation because there is no way to know what comes next. Even if the translator works directly in the source files (not a good idea) the fragments of strings can be a big problem for word order. This only gets worse when the fragments are used more than once to make up different strings.
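One common alternative in Java is to externalize a whole sentence as a single pattern with numbered placeholders and fill it in with `java.text.MessageFormat`. The sketch below assumes the pattern would normally come from a resource file; the `Messages` class and the sample patterns are illustrative only.

```java
import java.text.MessageFormat;
import java.util.Locale;

class Messages {
    // The translator sees the complete sentence and may reorder {0} and {1}.
    static String format(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }
}
```

A call such as `Messages.format("File {0} could not be copied to {1}.", Locale.ENGLISH, "a.txt", "/tmp")` keeps the sentence intact, and a translation is free to place `{1}` before `{0}` if its word order requires that.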
Is all character encoding externalized?
Do all error messages have a unique identifier? This is good, if for no other reason than it makes life a little easier when trying to debug a user’s issue. If the user can tell you which message number is shown, you can look it up and determine what it says without relying on the user’s ability to translate it. Generally, when a user tries to translate a message on the fly you get something like “well, the message has something to do with the program not working right”. That is true of all error messages, so it does not help you much. If the user can say that message number E1042 followed by message I2388 is shown on the console, then you have exactly what you need to find out where the issue occurred (if you do a good job of recording errors). This also means that you cannot fall back on the sloppy, general catchall messages so popular with hackers.
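A minimal sketch of this idea, and of the extensible error mechanism mentioned earlier, is a catalog that refuses duplicate identifiers and always prints the identifier in front of the text. `ErrorCatalog` and its methods are hypothetical names for this example; the identifier E1042 is borrowed from the paragraph above, and the message text is invented.

```java
import java.text.MessageFormat;
import java.util.LinkedHashMap;
import java.util.Map;

class ErrorCatalog {
    private final Map<String, String> patterns = new LinkedHashMap<>();

    // Products can add their own messages, but each id may be used only once.
    void register(String id, String pattern) {
        if (patterns.putIfAbsent(id, pattern) != null) {
            throw new IllegalArgumentException("Duplicate message id: " + id);
        }
    }

    // Renders "ID: text" so the id is visible whatever language the text is in.
    String render(String id, Object... args) {
        String pattern = patterns.get(id);
        if (pattern == null) {
            throw new IllegalArgumentException("Unknown message id: " + id);
        }
        return id + ": " + MessageFormat.format(pattern, args);
    }
}
```

With this shape, support staff can search logs for the identifier alone, and a translated pattern can replace the text without touching the identifier.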
Databases and Files
Is there a database storage schema for multilingual deployment?
Are all strings or files being parsed as true Unicode?
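In Java, one frequent cause of trouble here is reading bytes with the platform default encoding. The sketch below names the charset explicitly when decoding a stream; UTF-8 is an assumption for the example (the real encoding should come from configuration or the data source), and `Utf8Io.readAll` is a hypothetical helper.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf8Io {
    // Decodes the whole stream as UTF-8 instead of the platform default charset.
    static String readAll(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```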
Are there any limitations in file operations, such as constraints on valid file and directory names?
Processing at the Client
Does adjusting regional settings on the client side cause any problems? The problems could occur on either the client side or the server side of the application, so remember to check it both ways.
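One way such problems show up is in numbers typed on the client and parsed on the server. The sketch below, which assumes the server knows the client's locale, parses input with `java.text.NumberFormat` for that locale; `LocaleAwareInput.parseNumber` is a hypothetical helper for illustration.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class LocaleAwareInput {
    // Parses user-entered text using the conventions of the client's locale.
    static double parseNumber(String text, Locale clientLocale) {
        try {
            return NumberFormat.getNumberInstance(clientLocale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                "Not a number for " + clientLocale + ": " + text, e);
        }
    }
}
```

The text "1.234,56" entered under German settings and "1,234.56" entered under US settings both mean the same value, but only if each is parsed with the matching locale, which is why the check needs to be run from both sides.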
Are there any modules or fragments of code that do not fully support Unicode?
Have accessibility issues been addressed? Sometimes addressing issues dealing with accessibility can lead you to novel ways to handle linguistic and internationalization issues.
Code Organization
Is there one set of source code for all language editions? One set of source code and a build environment to handle the various language editions is better, in the long run, than having a separate set of code for each language.