-----Original Message-----
From: Arko, Phil [mailto:phil.arko@scr.siemens.com] 
Sent: Friday, October 24, 2003 8:33 PM
To: 'public-i18n-geo@w3.org'
Subject: [w3 i18n geo] Q&A: Setting Encoding in Web Authoring Applications

Greetings all!

Below is the Q&A about setting encoding in various web authoring applications. Your feedback is appreciated.


Phil Arko
Sr. Human Factors Engineer
Siemens Corporate Research
User Interface Design Center



How do I set character encoding in my web authoring application? [??? or: "Where is the feature hidden in my application?" ???]


Content on the web can be authored using a variety of software applications. Even within a single site, the content may have been created using multiple authoring tools. For example, a website that was created using Macromedia Dreamweaver might also include a page created using Microsoft Access's data access page feature, as well as a dynamic Flash movie that allows for language selection. For all of these files to display their text correctly, each one needs to be properly encoded.

This article is not a tutorial on defining and using character encoding within web authoring applications; rather, it identifies where some of the key functionality is located. The list of software is not complete. It covers some of the more popular web authoring applications in use.

As software evolves, it is possible that the location of the functionality may change. In addition, specific options of character encodings may vary depending on the user's installation version and location, and so these are not discussed in detail for each application. For more detailed information, refer to the specific application's help content or user manuals. Common keywords for searches include Character Encoding, Internationalization, Multilingual, Unicode, and UTF.

There are two main points to remember when creating properly encoded files:

  1. The markup within the document must properly designate the encoding (such as charset=iso-8859-1 in an XHTML/HTML meta tag, or encoding="UTF-8" in an XML declaration).
  2. The file itself must be saved in the proper encoding (such as UTF-8).

Most of these applications will save the file in the proper encoding, but may not insert the proper markup within the document.
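
The two points above can be checked against each other. The following is a minimal Python sketch, not part of any of the applications listed here, that reads the encoding a file declares in its markup and tests whether the bytes on disk actually decode in that encoding. The function names and the sample page are assumptions made for illustration only.

```python
import re

# Hypothetical patterns for the two declaration styles mentioned above.
XML_RE = re.compile(
    rb'<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', re.IGNORECASE)
META_RE = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def declared_encoding(raw: bytes):
    """Return the encoding named in the XML declaration or meta tag, or None."""
    head = raw[:1024]  # declarations are expected near the top of the file
    match = XML_RE.search(head) or META_RE.search(head)
    return match.group(1).decode("ascii").lower() if match else None


def matches_declaration(raw: bytes) -> bool:
    """True if the bytes decode cleanly in the encoding the markup declares."""
    name = declared_encoding(raw)
    if name is None:
        return False
    try:
        raw.decode(name)
    except (UnicodeDecodeError, LookupError):
        return False
    return True


page = ('<meta http-equiv="Content-Type" '
        'content="text/html; charset=utf-8"><p>café</p>')
print(matches_declaration(page.encode("utf-8")))       # saved as declared
print(matches_declaration(page.encode("iso-8859-1")))  # saved differently
```

Note that this check only catches some mismatches: any byte sequence decodes without error as ISO-8859-1, so a UTF-8 file that declares iso-8859-1 would pass unnoticed.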

Another key element in the markup is the language indicator. Many of the applications listed here combine the encoding and language in the user-selectable options. If the language is not included by the application, it is good practice to also include that in the markup manually. Some applications may acquire the regional settings of your operating system to create a locale tag.


[??? Adobe Acrobat ???]

[??? can't find anything specific yet ???]

[??? Adobe FrameMaker ???]

[??? can't find anything specific yet ???]

Adobe GoLive 5.0 (Mac)

[??? Newer version?, PC version the same? ???]

To specify the character encoding for your pages, go to Edit > Preferences > Encodings category.

[??? Adobe Page Maker ???]

[??? can't find anything specific yet ???]

Apple TextEdit

You will need to enter the proper encoding declaration into the XHTML/HTML file yourself. Files are natively saved as UTF-8, so no further action is necessary when saving.

Macromedia ColdFusion (Windows)

To properly configure a ColdFusion application, become familiar with the various encoding-related commands and functions (a few of which include "setEncoding," "cfcontent," and the form attribute "enctype").

Macromedia Dreamweaver MX (Mac & Windows)

To specify the character encoding for your pages, go to Modify > Page Properties. Select the proper encoding from the "Document Encoding" dropdown menu.

To specify the character encoding for viewing pages while editing, go to Edit > Preferences > Fonts category (Dreamweaver > Preferences > Fonts category on Mac).

Macromedia Flash MX (Mac & Windows)

When efficiently designed, multilingual Flash movies often store the text for each language in separate include files (#include), which reduces the download time of a Flash movie because only the data for the selected language is sent. UTF-8 text can be stored in an include file. The include file should start with "//!-- UTF8" and must be saved in UTF-8 format.
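
As an illustration of the include file layout described above, here is a minimal Python sketch that writes a file whose first line is the "//!-- UTF8" marker and whose contents are saved as UTF-8. The file name, the variable names, and the assignment syntax of the generated lines are assumptions made for this example, not something taken from the Flash documentation.

```python
import os
import tempfile

UTF8_MARKER = "//!-- UTF8"  # first line of the include file, as described above


def write_include(path: str, assignments: dict) -> None:
    """Write simple string assignments to a UTF-8 encoded include file."""
    lines = [UTF8_MARKER]
    for name, text in assignments.items():
        # Escape backslashes and double quotes so the string literal stays valid.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{name} = "{escaped}";')
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")


# Hypothetical file and variable names, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "strings_fr.as")
write_include(path, {"greeting": "Bonjour", "farewell": "À bientôt"})

with open(path, "rb") as handle:
    data = handle.read()
print(data.startswith(UTF8_MARKER.encode("ascii")))
```

Generating the files from a script rather than by hand is one way to make sure every language file is saved in the same encoding.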

Unicode characters can also be specified by escape sequence in Flash's ActionScript environment. For example, U+0000 would be written as "\u0000" within the ActionScript code.
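
An escape sequence of this kind names a Unicode code point, not the bytes of any particular encoding. The following Python sketch illustrates the difference. It relies on Python's own "\u" escape syntax, which happens to look the same; it is not ActionScript, and the helper name to_escape is an assumption made for this example.

```python
def to_escape(text: str) -> str:
    """Render each character as a \\uXXXX escape (U+0000 to U+FFFF only)."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("characters beyond U+FFFF are not handled here")
        out.append(f"\\u{code:04X}")
    return "".join(out)


print(to_escape("é"))        # \u00E9
print("\u00e9" == "é")       # True: the escape and the character are the same
print("é".encode("utf-8"))   # b'\xc3\xa9': the UTF-8 bytes differ from the code point
```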

Another setting worth noting is the code page setting for the end user's Flash Player. It defaults to false (System.useCodepage = false;), which means UTF-8 is used. If it has been set to true for some special purpose, it must be set back to false before UTF-8 text is displayed again, by placing the appropriate ActionScript in the timeline before any new text is called.

Macromedia HomeSite+

You need to input the encoding information in the file. You can then go to File > Save As and select the proper encoding using the Encoding dropdown menu.

There is also an HTML Tidy feature that can check your code as you type. The encoding options are located here: Options > Settings > CodeSweeper category > HTML Tidy CodeSweeper subcategory > Macromedia HTML subcategory > Char encoding dropdown menu.

Microsoft Office -- Access, Excel, PowerPoint, and Word

(version 2000 for Windows, version X for Mac OS X) [??? NEED TO CHECK IF THIS IS THE SAME IN OFFICE XP ???]

Microsoft Word is often used to export documents directly to HTML. Increasingly, spreadsheets and presentations (from Excel and PowerPoint, respectively) are also being exported to web pages. Exporting database content into web pages has become easier for the desktop user with the addition of data access pages within Microsoft Access (Windows only).

Go to Tools > Options > General tab > Web Options button > Encoding tab. Choose the appropriate encoding in the "Save document as" dropdown menu.

Note: In Access, first open the data access page in design view.

Microsoft FrontPage 2000 (Windows)

The encoding options are under "Language (character set)." Go to: Tools > Page Options > Default Font tab. You will notice an option that says "Multilingual (UTF-8)."

Microsoft Notepad (Windows)

If you create or edit documents using Notepad, you will need to specify the character encoding and language when you write the markup code. When you save the document, go to File > Save As and select the proper encoding from the Encoding dropdown list at the bottom. Be aware that there is a known issue with this, which can be fixed with a Perl script. [??? CAN ANYONE PROVIDE MORE INFO ABOUT THIS ???]

Helios TextPad

The proper markup for encoding will need to be entered into the file. When saving the document, the proper file format can be selected here: File > Save As > Encoding dropdown menu.

W3C Amaya (Mac, Unix, Windows)

When saving the file, go to File > Save as. Amaya will make sure that the encoding is correct in the XML declaration (for XHTML) and the <meta> statement. Amaya also uses the appropriate encoding ('charset') in the HTTP headers when it saves a document remotely using PUT. Amaya understands several other encodings when loading a document, but is not able to save in any of them.
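
To show what the 'charset' in an HTTP header looks like from the receiving side, here is a minimal Python sketch that extracts it from a Content-Type value using the standard library. It says nothing about how Amaya itself builds the header; the function name and the sample header values are assumptions made for this example.

```python
from email.message import Message


def charset_from_content_type(header_value: str):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


print(charset_from_content_type("application/xhtml+xml; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                             # None
```

A header without a charset parameter leaves the receiver to rely on the declaration inside the document.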


Keep in mind that the end user can select both the encoding to use and the font to use for each encoding [??? CAN THIS BE OVERWRITTEN BY CSS ???]. For example, in Microsoft Internet Explorer, the current encoding can be viewed (and changed) through the cascading menus under View > Encoding. Note that "Right-To-Left Document" or "Left-To-Right Document" will also appear there when the direction has been set.

Another option that Internet Explorer users can select is "Always send URLs as UTF-8." This can be found here: Tools > Internet Options > Advanced tab > Browsing category.
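
To illustrate why the encoding used for URLs matters, the following Python sketch percent-encodes the same path once as UTF-8 and once as ISO-8859-1. It does not reproduce what Internet Explorer does internally; the sample path is an assumption made for this example.

```python
from urllib.parse import quote, unquote

path = "/café"  # hypothetical path containing a non-ASCII character

as_utf8 = quote(path, encoding="utf-8")
as_latin1 = quote(path, encoding="iso-8859-1")

print(as_utf8)    # /caf%C3%A9
print(as_latin1)  # /caf%E9

# A server that expects UTF-8 recovers the original only from the UTF-8 form.
print(unquote(as_utf8, encoding="utf-8") == path)    # True
print(unquote(as_latin1, encoding="utf-8") == path)  # False
```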

When content is ready to be published, it is good practice to also validate it using the W3C validation tool [http://validator.w3.org/].


Hints & Tips: Character Encodings http://www.w3.org/International/O-charset.html

Unicode Enabled Products http://www.unicode.org/onlinedat/products.html

Encoding Forms http://www.unicode.org/standard/principles.html#Encoding_Forms