When multimedia presentations (such as video with audio) are available on a web page, the audio portion must be captioned and the captions must be synchronized with the presentation.
Audio Information
Core Techniques (W3C Source)
Auditory presentations must be accompanied by text transcripts of auditory events. When these transcripts are presented synchronously with a video presentation, they are called "captions" and are used by people who cannot hear the audio track of the video material.
Some media formats (e.g., QuickTime and SMIL) allow captions and video descriptions (audio description of visual content) to be added to a multimedia clip. Microsoft's SAMI allows captions to be added. The following example demonstrates that captions should include speech as well as any other sounds in the environment that help viewers understand what is going on.
Example: Captions for a scene from "E.T." The phone rings three times, then is answered.
[phone rings]
[ring]
[ring]
"Hello?"
End example.
Until the format you are using supports alternative tracks, two versions of a movie could be made available, one with captions and video descriptions, and one without. Some technologies, such as SMIL and SAMI, allow audio/visual files to be combined with separate text files via a synchronization file to create captioned audio and movies.
Some technologies also allow users to choose from multiple sets of captions to match their reading skills. For more information see the SMIL specification.
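The idea of keeping caption text in a separate, timed track can be sketched in a few lines of Python. This is only an illustration of the synchronization concept, not SMIL or SAMI syntax: the `Cue` class, the `caption_at` function, the track names, the cue timings, and the wording of the simplified track are all assumptions made for the example, applied to the scene from the example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cue:
    """One caption, shown while start <= t < end (times in seconds)."""
    start: float
    end: float
    text: str


# Hypothetical caption sets for the scene above. The timings are invented
# for illustration; a real synchronization file would carry the actual ones.
CAPTION_TRACKS: Dict[str, List[Cue]] = {
    # Full captions: every sound event gets its own cue.
    "full": [
        Cue(0.0, 1.5, "[phone rings]"),
        Cue(2.0, 3.5, "[ring]"),
        Cue(4.0, 5.5, "[ring]"),
        Cue(6.0, 7.5, '"Hello?"'),
    ],
    # Simplified captions: fewer, longer cues for slower readers.
    "simplified": [
        Cue(0.0, 5.5, "[phone rings three times]"),
        Cue(6.0, 7.5, '"Hello?"'),
    ],
}


def caption_at(track: List[Cue], t: float) -> Optional[str]:
    """Return the caption text to display at playback time t, or None."""
    for cue in track:
        if cue.start <= t < cue.end:
            return cue.text
    return None
```

A player would call `caption_at` with the user's chosen track and the current playback time each time the display is refreshed, so the text stays in step with the audio regardless of which caption set is selected.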
Equivalents for sounds can be provided in the form of a text phrase on the page that links to a text transcript or description of the sound file. The link to the transcript should appear in a highly visible location, such as at the top of the page. However, if a script automatically loads a sound, it should also automatically load a visual indication that the sound is currently being played and provide a description or transcript of the sound.
Note: Some controversy surrounds this last technique because ideally the browser should simply load the visual form of the information instead of the auditory form if the user preferences are set to do so. However, strategies must also work with today's browsers.
For more information, refer to the National Center for Accessible Media (NCAM).
Text Equivalents for Multimedia
HTML Techniques (W3C Source)
Where visual information is needed to understand a web page, a text equivalent should be provided for it. For example, consider a repeating animation that shows cloud cover and precipitation as part of a weather status report. Since the animation only supplements the rest of the weather report (which is presented in text), a brief description of the animation is sufficient. However, if the animation appears in a pedagogical setting where students are learning about cloud formations in relation to land mass, then the animation ought to be described in full for those who cannot view it but want to learn the lesson.
Note: The following is a related concern that is not included in the Section 508 guidelines:
WAI Guideline 1.3: Until user agents can automatically read aloud the text equivalent of a visual track, provide an auditory description of the important information of the visual track of a multimedia presentation. [Priority 1]
For core techniques, see Visual information and motion.