Because the HTML code that Word generates is not automatically accessible to all users, there are two options for making Word documents accessible.
At first, both of these options may seem daunting, but neither is as difficult as you might think. For a basic Word document without much formatting, you can create a new HTML document based on an accessible template. For documents containing more complex features and formatting, it may be easier to edit the code Word has generated. This second approach is the one you will take with the sample document.
Note: The following explanation covers Word 2000 and Word XP. Word 2007 behaves somewhat differently; those differences are covered at the end of this section.
The next section looks at the inaccessible aspects of the HTML code generated by Word and how to correct them by hand.
To make changes to the source code for your web page, you have three options for opening the HTML file:
Notepad will be used for this example. All Windows computers should have Notepad installed.
The first line of your file should be <html>
which tells the web browser what language this code is written in. The
first thing a document must contain in order to be accessible and compliant
with standards is a DOCTYPE statement. The DOCTYPE statement tells the
browser exactly which version of HTML it is dealing with. Copy the following
line into the code, just above the <html> line.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
This tells the browser that this document should comply with the W3C standards for HTML version 4.01. The "Transitional" clause simply means that there may be some older elements and tags used since this is a transitional period and not all browsers support the "Strict" implementation of this document type. The web address tells the browser where it can find the details of this specification if it does not recognize the DOCTYPE. Make sure this line is the first thing in your html file (see Image 11).
Now the beginning of your document should look like the following:
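As a minimal sketch, assuming Word's opening tag is the plain <html> described above with no extra attributes, the top of the file would read:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
```

If your opening <html> tag carries additional attributes from Word, leave them in place for now and only add the DOCTYPE line above it.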
Next, you should tell the browser what language the text in this file
is written in. For a user who can see, it is easy to know that the page is
in English. For someone using a screen reader, it is not so easy. The
screen reader needs to know the language in order to know how to read
the document. This information goes in the <html>
tag.
<html lang="EN-US">
The <head> tag tells the browser
that all the information in this section is for the browser's use and
should not be displayed.
The next thing to address is the <title>...</title>
line.
If you used Word XP to generate the HTML, you should have given your
document a title already. If you used Word 2000, you may see the file
name or something like "Untitled." Make sure not to erase either
of the <title> tags or the brackets
around them, but change the title to something meaningful if necessary.
When a user views your web page, the title will appear in the browser's
blue title bar (at the top of the browser window). The title is also what
shows up by default in the Favorites list when a user bookmarks your page.
For this example, make the title even more descriptive by changing it
to "ITSK 1701 Syllabus."
The next block of code begins with <style>
and tells the browser how to display certain elements of the page. Word
generally includes a large amount of extra style information here.
The W3C guidelines and other accessibility organizations recommend the
use of external style sheets, so styling is best kept outside of the document.
Select everything from the opening <style> tag
through the closing </style> tag and delete it.
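If you later want to restore some formatting, an external style sheet can be attached from the head section. The sketch below assumes a style sheet saved next to the page under the hypothetical name "syllabus.css"; neither the file nor its contents come from the sample document.

```html
<head>
<title>ITSK 1701 Syllabus</title>
<!-- "syllabus.css" is a hypothetical file name; substitute your own style sheet -->
<link rel="stylesheet" type="text/css" href="syllabus.css">
</head>
```

Keeping the rules in a separate file means the same styles can be shared by several pages, and the HTML itself stays short enough to edit by hand.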
In addition, you will want to remove the "meta" tags that Word has inserted. These can cause problems in displaying "smart quotes" and other special characters.
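For orientation, the meta tags Word inserts usually resemble the lines below. Treat the exact values as illustrative: the version number and character set depend on your copy of Word, and your file may contain more or fewer of these lines.

```html
<!-- Typical Word-generated meta tags (values are illustrative) -->
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 10">
<meta name=Originator content="Microsoft Word 10">
```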
Once you have removed the above-mentioned information, you should now
have something that looks like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html lang="EN-US">
<head>
<title>ITSK 1701 Syllabus</title>
</head>
The next section is the body, denoted by <body>.
This tells the browser that this is the main part of the page and the
things in this section should be displayed to the user.
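First, since you have deleted all of Word's style definitions, use Find and Replace to get rid of the remaining style information, such as the class and style attributes that Word adds to individual tags. HTML ignores extra white space in a document, so any gaps left behind by these deletions will not affect how the page is displayed.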
Word sometimes includes <span> tags
in strange places. These can also be deleted. Make sure to remove the opening
and closing tags (<span> and </span>).
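The cleanup described above might look like the following before and after. This is only a sketch: the class name, style values, and sentence are illustrative and are not taken from the sample syllabus.

```html
<!-- Before: typical Word output with class, style, and span clutter -->
<p class=MsoNormal style='margin-left:.5in'><span style='font-size:12.0pt'>Office hours are by appointment.</span></p>

<!-- After: the same paragraph with the attributes and span tags removed -->
<p>Office hours are by appointment.</p>
```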
The next thing you should change is in the table that displays the instructor's
contact information. Find the first <table> tag (see Image
13).
Most users can look at a table and quickly see the relationships represented. A user who cannot see may have a harder time. For this reason, you should modify the table in a couple of ways:
First, add a summary attribute to the <table> tag so that a screen reader can announce what the table contains: summary="Instructor contact information"
Second, mark the header cells. The <td> tag represents a "normal" data cell and the <th> tag represents a header cell. In this example, the left column contains the headers, so those are the tags to change. Find the <td> tags for each of the left-hand cells and replace "td" with "th." Since header cells are generally displayed as bold and centered, you will also add the attribute "align=left." You should also replace the matching closing tags (</td>) with </th> closing tags.
Image 14 offers a sample of the HTML code with changes highlighted and deletions crossed out. The final document with all changes made should look like Image 15.


Repeat this process for each cell and table in the document.
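A sketch of the instructor table after both changes is shown below. The row labels and cell text are placeholders, and scope="row" is an optional addition assumed here to mirror the scope="col" attribute used on column headers in the data tables later in this section.

```html
<table summary="Instructor contact information">
<tr>
<!-- Left-hand cells become header cells, aligned left -->
<th align="left" scope="row">Instructor</th>
<td>Instructor name goes here</td>
</tr>
<tr>
<th align="left" scope="row">Email</th>
<td>Email address goes here</td>
</tr>
</table>
```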
The table containing the textbook information is an exception. That table is essentially for layout, so it does not need a summary or headers. The W3C recommends that you do not use tables for layout. The best thing to do is remove the layout table and present its content in a non-table format, reading the cells left-to-right and top-to-bottom; this "linearizes" the table.
To create a non-table format, move the content into plain HTML without the table tag or any TH or TD tags around the content. In this example, the display order of the content should be:
The resultant webpage will look like Image 16.
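To illustrate the idea of linearizing, here is a generic two-by-two layout table and its non-table equivalent. The cell text is placeholder wording that only shows the reading order; it is not the textbook content from the sample document.

```html
<!-- Before: a table used only for layout -->
<table>
<tr>
<td>Top-left cell</td>
<td>Top-right cell</td>
</tr>
<tr>
<td>Bottom-left cell</td>
<td>Bottom-right cell</td>
</tr>
</table>

<!-- After: the same content in reading order, with no table tags -->
<p>Top-left cell</p>
<p>Top-right cell</p>
<p>Bottom-left cell</p>
<p>Bottom-right cell</p>
```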



The HTML code for the bulleted and numbered lists should be the last
elements to check for accessibility. When two lists are in close proximity
to each other, Word assumes that the second list is a continuation of
the first. For this reason, there is an extra ordered list (<ol>)
tag that you should delete (see Image
17).
The <ol> tag tells the browser that
this will be a numbered "ordered list." When the browser sees
that tag, it looks for a list item (<li>)
tag to start a line in the list. In this example, there is not a <li>,
just an unordered "bulleted" list represented by the <ul>
tag. In order to be completely compliant with HTML guidelines, you should
correct Word's mistake.
To correct the problem, remove the opening <ol> and closing </ol>
tags around the "class expectations" list.
Your resulting HTML code should appear similar to Image 18.
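In outline, the correction looks like this. The list items are placeholders, and the stray tag in your file may carry extra attributes that Word added.

```html
<!-- Before: a bulleted list wrapped in a stray ordered list -->
<ol>
<ul>
<li>First expectation</li>
<li>Second expectation</li>
</ul>
</ol>

<!-- After: the ol tags are removed and the bulleted list stands alone -->
<ul>
<li>First expectation</li>
<li>Second expectation</li>
</ul>
```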
The data table that the Word document contained is not fully accessible.
When data tables are used in an accessible HTML document, they must contain
CAPTION, THEAD and TBODY tags and use TH and TD tags appropriately.
Here is the basic template that data tables should follow:
<table summary="SUMMARY HERE">
<caption> CAPTION TO ASSOCIATE WITH THE DATA TABLE </caption>
<thead> HEADER INFORMATION HERE ABOUT TABLE COLUMNS WITH TR AND TH TAGS </thead>
<tbody> DATA CONTAINED HERE WITH TR AND TD TAGS </tbody>
</table>
In this example syllabus there are two data tables, one for the percentage of the overall grade to which each assignment contributes and one for grade cutoffs (what's an A, what's a B, etc.).
The HTML that Word generated for these two tables is quite close to accessible HTML, but you should add the summary, caption, and header information for complete compliance. For this example, an appropriate caption for the first table would be "How much each assignment counts towards your overall grade" and "Letter grades for overall numeric grades" for the second. Summaries for each would be similar (and are displayed below). Adding the CAPTION, THEAD, and TBODY tags is easy to do.
As you can see from the following HTML for the two tables, they are relatively easy to correct and make fully accessible.
The accessible "grade values" table:
<table border=1 cellspacing=0 cellpadding=0 style='border-collapse:collapse;
border:none;' summary="Grade values for each assignment">
<caption> How much each assignment counts towards your overall grade
</caption>
<thead>
<tr>
<th scope="col">Assignment</th>
<th scope="col">Percentage of grade</th>
</tr>
</thead>
<tbody> THIS IS THE MATERIAL THAT WORD GENERATED - YOU DON'T NEED
TO CHANGE IT </tbody>
</table>
The accessible "grade distribution" table:
<table border=1 cellspacing=0 cellpadding=0 style='border-collapse:collapse;
border:none;' summary="Letter grade distribution">
<caption> Letter grades for overall numeric grades </caption>
<thead>
<tr>
<th scope="col">Letter grade earned</th>
<th scope="col">Numeric grade range</th>
</tr>
</thead>
<tbody> THIS IS THE MATERIAL THAT WORD GENERATED - YOU DON'T NEED
TO CHANGE IT </tbody>
</table>
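For reference, a complete small table that follows the template might look like the sketch below. The two body rows use placeholder values, not the real cutoffs from the syllabus, and Word's inline style attribute is left out for brevity.

```html
<table border="1" cellspacing="0" cellpadding="0" summary="Letter grade distribution">
<caption>Letter grades for overall numeric grades</caption>
<thead>
<tr>
<th scope="col">Letter grade earned</th>
<th scope="col">Numeric grade range</th>
</tr>
</thead>
<tbody>
<!-- Placeholder rows: replace with the rows Word generated -->
<tr>
<td>A</td>
<td>90-100</td>
</tr>
<tr>
<td>B</td>
<td>80-89</td>
</tr>
</tbody>
</table>
```

Note that the header cells sit inside a row (<tr>) within <thead>, just as the data cells sit inside rows within <tbody>.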
When you have finished, choose "Save" from the File menu. Open your web page in a browser and you will see that it looks almost identical to the original, but it is now accessible to everyone.
If you view the webpage code generated by Word 2007's "Save as Webpage" feature, you will immediately notice that much of it looks like nonsense. Depending on the length of your content, roughly the first 50-75% of the generated code will be "XML" markup, which you (the editor) do not need to change.
An example "XML" tag: <w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="267">. Similar tags, along with CSS style sheets, appear throughout the <head> section.
To begin editing the content, you must first find the opening <body> tag. You can either search for "body" or look for the end of the head section, which is marked by </head>.
A noticeable difference between the webpage code generated by Word 2007 and by Word 2000/XP is the location of the "lang" declaration. In Word 2007, it appears in the body tag, so your body tag may look something like this: <body lang=EN-US style='tab-interval:.5in'>.
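If you want the Word 2007 file to match the convention used earlier, where the language is declared on the <html> tag, one possible edit is sketched below. This is an assumption rather than a required step, and any other attributes Word placed on your opening <html> tag are omitted here.

```html
<!-- As generated by Word 2007: the language is declared on the body tag -->
<body lang=EN-US style='tab-interval:.5in'>

<!-- Optional: declare the same language on the opening html tag as well -->
<html lang="EN-US">
```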
As you browse through the body section, you will notice more XML markup preceding every <img> (image) tag. To avoid having to edit this, skip past it and continue editing the remainder of the code. If you do need to change an image, the best method is to edit it in the original Word document and repeat the process. This is tedious, so make sure you are happy with your images before you continue.
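The markup around an image generally follows the pattern below. This is an abbreviated sketch: the IDs, sizes, and file names are illustrative, and your file will usually contain considerably more markup before the <img> tag.

```html
<!-- Word's extra image markup: skip from the first conditional comment to the closing endif -->
<!--[if gte vml 1]><v:shape id="Picture_x0020_1" type="#_x0000_t75"
 style='width:150pt;height:100pt'>
 <v:imagedata src="syllabus_files/image001.png" o:title="logo"/>
</v:shape><![endif]--><![if !vml]><img width=200 height=133
src="syllabus_files/image002.jpg" v:shapes="Picture_x0020_1"><![endif]>
```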
Look at the selected text in Image N01 if you are having trouble discerning which text is related to an image.
Tables, lists, and other elements generated by Word 2007's "Save as Webpage" feature are standard HTML, just as in previous versions. Therefore, you can refer to the instructions above for making these elements accessible.
The next page will discuss publishing Word documents to the course management system, Blackboard.