by Matt Campbell and Mike Calvo
Screen readers often need to obtain detailed information about the contents of the display. For the past several years, these products have obtained such information by chaining their own code to the display driver, at least under Windows 2000, XP, and Server 2003. Today, the new display driver model introduced in Windows Vista does not support display driver chaining, so the currently dominant screen readers now install a mirror driver instead. However, Windows must disable significant functionality to make the mirror driver work. The results of disabling that functionality have the potential to adversely affect even users who are totally blind. To support mirror drivers as used by screen readers, Windows must also continue to support the antiquated Windows XP display driver model. Meanwhile, newer screen readers are using an alternative technique called API hooking to obtain essentially the same information about the contents of the display, without any crippling side effects, even under Windows Vista. API hooking has considerable advantages over display-driver-based techniques now, and the advantages will become more significant as Windows continues to evolve. This paper explains the history that has led to the use of mirror drivers, the problems introduced by mirror drivers, and the advantages of API hooking.
To understand why the currently dominant screen readers now use mirror drivers, it is important to understand the problem that mirror drivers are intended to solve and the techniques that these products have previously used to solve this problem.
The primary problem that Windows screen reader developers face is that neither Windows nor the majority of applications were designed to be used by the blind. Screen reader developers must therefore retrofit accessibility onto Windows and a wide variety of applications. This was especially true in the formative years of the Windows screen reader industry. Before the introduction of Microsoft Active Accessibility and the rich document object models of Internet Explorer and Microsoft Office, screen readers received very little help from applications or Windows itself. Even today, screen reader developers must spend considerable resources retrofitting accessibility onto applications which do not support it by design.
To make applications or operating system components accessible in the absence of a proper accessibility framework or a sufficiently rich object model, screen readers must obtain detailed information about the contents of the display itself. At its simplest, this information includes all of the text on the display as well as some form of identifier for each image or icon. This information is often much more detailed; it has historically included the position, foreground color, background color, and font attributes of each piece of text on the display, and even the exact position of each character. All of this information about the contents of the display is collectively known as an off-screen model.
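To make the idea of an off-screen model concrete, the following Python sketch shows one possible in-memory representation of the attributes listed above: text, position, colors, a font attribute, and per-character positions. The class and field names (TextRun, OffScreenModel, char_x, and so on) are hypothetical. The rule that new text drawn at the same starting point replaces old text is a deliberate simplification; a real off-screen model must also handle scrolling, clipping, overlapping windows, and erasure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TextRun:
    """One piece of text as drawn on the display (hypothetical structure)."""
    text: str
    x: int                          # left edge of the run, in screen pixels
    y: int                          # line position, in screen pixels
    width: int                      # total width of the run, in pixels
    foreground: tuple[int, int, int]  # (red, green, blue)
    background: tuple[int, int, int]
    bold: bool = False
    char_x: list[int] = field(default_factory=list)  # left edge of each character


@dataclass
class OffScreenModel:
    runs: list[TextRun] = field(default_factory=list)

    def add(self, run: TextRun) -> None:
        # Simplification: text drawn at the same starting point replaces old text.
        self.runs = [r for r in self.runs if (r.x, r.y) != (run.x, run.y)]
        self.runs.append(run)

    def line_text(self, y: int) -> str:
        """All text on one line, ordered left to right."""
        on_line = sorted((r for r in self.runs if r.y == y), key=lambda r: r.x)
        return " ".join(r.text for r in on_line)

    def char_at(self, x: int, y: int) -> str | None:
        """The character drawn at a given point, using per-character positions."""
        for r in self.runs:
            if r.y != y or len(r.char_x) != len(r.text):
                continue
            edges = r.char_x + [r.x + r.width]  # the run's right edge closes the last character
            for i, ch in enumerate(r.text):
                if edges[i] <= x < edges[i + 1]:
                    return ch
        return None


model = OffScreenModel()
black, white = (0, 0, 0), (255, 255, 255)
model.add(TextRun("File", 0, 0, 32, black, white, char_x=[0, 8, 16, 24]))
model.add(TextRun("Edit", 40, 0, 32, black, white, bold=True, char_x=[40, 48, 56, 64]))
print(model.line_text(0))    # File Edit
print(model.char_at(50, 0))  # d
```

Queries like char_at are what let a screen reader announce the character under a mouse pointer or caret even when the application exposes nothing beyond what it draws.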
Because neither Windows nor most applications provide this information in a form that can be readily queried or presented through speech output or Braille, screen readers must use unusual techniques to gather this information. Specifically, these products intercept basic graphics operations at some point between the application and the display driver, before the contents of the display are reduced to pixels. (Once the information is reduced to pixels, optical character recognition and other artificial intelligence techniques are required to process it.) Fortunately for screen reader developers, the graphics commands sent to Windows display drivers have historically contained the information that screen readers need. Thus, to gather information for their off-screen models, screen readers can either use API hooking or intercept graphics commands at the display driver level.
API hooking is the process of inserting code into running applications at the boundary between the application and the Windows operating system for the purpose of intercepting application operations before Windows carries them out. It follows that screen readers can use API hooking to intercept the operations that applications use to draw text and images, such as the functions in the Windows Graphics Device Interface (GDI). In fact, some screen readers used API hooking effectively under Windows 3.1, as well as Windows 95, 98, and Millennium Edition, collectively called 9x. However, the introduction of the Windows NT line of operating systems, which now includes Windows 2000, XP, Server 2003, and Vista, brought with it some new challenges in the implementation of API hooking, as well as uncertainty about this technique's viability in the future. In addition, there are some elements of the display, such as standard window title bars and standard menu bars, for which screen readers cannot obtain precise information for an off-screen model through API hooking. Therefore, the currently dominant screen reader vendors have abandoned API hooking in favor of display driver chaining and now mirror drivers, which we will discuss shortly. We will revisit API hooking later in this paper.
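The following Python sketch simulates the core idea of API hooking within a single program: an observer is spliced into the path between an application and an operating system function, sees each call, and then lets the original function run. It is not Windows code. Real API hooking injects code into another process and patches either that process's import tables or the first instructions of the target function, which cannot be shown portably in Python. Here a dictionary named import_table (hypothetical) stands in for the application's route to the operating system, and os_text_out stands in for a GDI text-drawing function.

```python
from __future__ import annotations

from typing import Callable

# What the "display" actually received, and what the screen reader captured.
screen: list[tuple[int, int, str]] = []
captured: list[tuple[int, int, str]] = []


def os_text_out(x: int, y: int, text: str) -> bool:
    """Stand-in for an operating system text-drawing function such as GDI's TextOut."""
    screen.append((x, y, text))
    return True


# Stand-in for the table through which the application reaches the OS
# (loosely analogous to a Windows import address table).
import_table: dict[str, Callable[..., bool]] = {"TextOut": os_text_out}


def application_paint() -> None:
    """The application draws through the table and knows nothing about hooks."""
    import_table["TextOut"](10, 20, "Hello")


def install_hook(name: str, observer: Callable[..., None]) -> Callable[..., bool]:
    """Redirect one table entry through a wrapper; return the original for removal."""
    original = import_table[name]

    def hook(*args):
        observer(*args)         # see the call before the OS carries it out
        return original(*args)  # then let the OS do its normal work

    import_table[name] = hook
    return original


original = install_hook("TextOut", lambda x, y, text: captured.append((x, y, text)))
application_paint()                 # drawn on screen AND recorded by the hook
import_table["TextOut"] = original  # remove the hook
application_paint()                 # drawn on screen only
print(captured)     # [(10, 20, 'Hello')]
print(len(screen))  # 2
```

The important property is that the application's own code is unchanged and the drawing still happens normally; the hook only adds an observation step in front of it.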
The Windows 9x operating systems provided a means for screen readers to install display driver hooks at run time, for the purpose of intercepting graphics commands. Some screen readers used this technique with good results under these operating systems. Windows 9x had no meaningful security constraints; all applications, including screen readers, had unfettered access to the system. No system restart was needed to install or remove display driver hooks, so this technique was suitable even for the installers of these screen readers. Furthermore, no significant functionality needed to be disabled; at worst, these products had to make slight changes to the appearance of the display. Thus, run-time display driver hooking was a reasonable solution under Windows 9x.
As stated earlier, the currently dominant screen reader vendors decided to abandon API hooking under the Windows NT operating systems. These operating systems do not provide a run-time display driver hooking mechanism analogous to the one provided by Windows 9x. Thus, these screen reader developers devised a technique called display driver chaining. When a screen reader which uses this technique is installed, it installs a virtual display driver and registers that virtual display driver as the system's primary display driver. After a system restart, this virtual display driver receives all graphics commands from Windows, sends the appropriate information to the screen reader if it is running, and passes those commands on to the next driver in the chain, which may be either the actual display driver or another screen reader's virtual display driver. When the screen reader is uninstalled, it attempts to remove its virtual display driver without breaking the display driver configuration.
This technique was fraught with problems from its conception. The most infamous problem was that in this technique's first implementations, multiple screen readers had to be uninstalled in precisely the reverse of the order in which they were installed, or the display driver configuration would be broken. To solve this problem, the screen reader developers of the time jointly devised a system called the Driver Chaining Manager (DCM), which Microsoft ratified. Even with DCM, it is not unusual for the display driver chain to be broken, rendering a screen reader unusable or even severely crippling the system's display functionality. Furthermore, because administrative privileges are required to set up display driver chaining, a screen reader which uses this technique cannot be run from portable media on a locked-down system unless the system administrator has already installed the required virtual display driver for that product. In short, display driver chaining was far from an optimal solution.
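A toy Python model makes the ordering problem visible. Each virtual driver forwards every command to the next link in the chain, and each installer remembers the primary driver it replaced so that its uninstaller can restore it. The Driver and DisplayConfig classes and the naive_uninstall rule are hypothetical; they model the early, pre-DCM behavior described above only in outline and are not how any real product or DCM works.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Driver:
    name: str
    next_driver: Driver | None = None  # where this driver forwards commands

    def draw_text(self, text: str, trace: list[str]) -> None:
        trace.append(self.name)  # a virtual driver would notify its screen reader here
        if self.next_driver is not None:
            self.next_driver.draw_text(text, trace)  # pass the command down the chain


class DisplayConfig:
    """Hypothetical model of the system's 'primary display driver' setting."""

    def __init__(self, real: Driver) -> None:
        self.primary = real
        self.saved: dict[str, Driver] = {}  # what each installer replaced

    def install(self, name: str) -> None:
        self.saved[name] = self.primary
        self.primary = Driver(name, next_driver=self.primary)

    def naive_uninstall(self, name: str) -> None:
        # Restore whatever was primary when this link was installed,
        # without checking for links that were installed on top of it.
        self.primary = self.saved.pop(name)

    def path(self) -> list[str]:
        """Names of the drivers that receive one command, in order."""
        trace: list[str] = []
        self.primary.draw_text("sample", trace)
        return trace


cfg = DisplayConfig(Driver("real"))
cfg.install("A")
cfg.install("B")
print(cfg.path())         # ['B', 'A', 'real']
cfg.naive_uninstall("A")  # out of order: A was installed before B
print(cfg.path())         # ['real']  (B is still installed but receives nothing)
```

Uninstalling B first and then A would restore each saved value in turn and leave a working chain; the failure appears only when the order is reversed, which is exactly the fragility that DCM was created to address.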
Windows Vista introduced an overhauled infrastructure for display drivers. Unlike display drivers designed for earlier Windows NT systems, display drivers designed for Windows Vista receive graphics commands that are useless to screen readers. For example, when a display driver designed for Windows Vista is being used, text is typically not sent to that display driver as text but as pixels. From Microsoft's vantage point, this overhaul simplifies display driver development and has the potential to make Windows much more robust. However, to screen reader developers, it means that display driver chaining would be useless even if it were possible under the Windows Vista display driver model.
However, Windows Vista does support a type of virtual display driver called a mirror driver. While mirror drivers have existed for several years, they did not receive the information that screen readers need until Windows Vista. Still, mirror drivers are based on the Windows XP display driver model and not the new Windows Vista display driver model. A mirror driver is installed along with the screen reader and runs alongside the actual display driver. Windows sends graphics commands to all active mirror drivers as well as the actual display driver, so a mirror driver is not responsible for passing commands on to the actual display driver or to any other mirror driver that may be active. A mirror driver is only active while it is needed, that is, while the screen reader is running. Because Windows treats mirror drivers as separate from the actual display driver, mirror drivers can be managed more easily and more reliably than a driver chain. In this sense, the use of mirror drivers is an improvement over display driver chaining.
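For contrast, the following sketch models the mirror-driver arrangement in the same simplified way: the operating system itself delivers each command to the actual display driver and to every active mirror, so no mirror needs to know about any other. DisplayDispatcher and its methods are hypothetical names, and the sketch says nothing about the real Windows driver interfaces.

```python
from __future__ import annotations

from typing import Callable

Handler = Callable[[str], None]


class DisplayDispatcher:
    """Hypothetical model: the OS, not the drivers, fans each command out."""

    def __init__(self, real_driver: Handler) -> None:
        self.real_driver = real_driver
        self.mirrors: dict[str, Handler] = {}

    def activate_mirror(self, name: str, handler: Handler) -> None:
        self.mirrors[name] = handler

    def deactivate_mirror(self, name: str) -> None:
        # No other driver holds a reference to this one, so removal cannot break a chain.
        self.mirrors.pop(name, None)

    def draw_text(self, text: str) -> None:
        self.real_driver(text)
        for handler in self.mirrors.values():
            handler(text)  # mirrors only observe; they never forward anything


real_log: list[str] = []
a_log: list[str] = []
b_log: list[str] = []
display = DisplayDispatcher(real_log.append)
display.activate_mirror("A", a_log.append)
display.activate_mirror("B", b_log.append)
display.draw_text("one")
display.deactivate_mirror("A")  # removal order does not matter
display.draw_text("two")
print(real_log, a_log, b_log)   # ['one', 'two'] ['one'] ['one', 'two']
```

Because the fan-out lives in one place, activating or deactivating a mirror is a local change, which is why mirror drivers are easier to manage than a chain.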
The fundamental problem with mirror drivers under Windows Vista is that while a mirror driver is active, both the Windows Vista display driver model and the Windows XP display driver model must operate concurrently. This requirement imposes a few significant limitations on the display subsystem and on Microsoft's ability to improve Windows in the future.
Perhaps the most visible new feature of Windows Vista is Windows Aero, a new look and feel that sports transparency in window title bars and borders, fade effects when windows open and close, 3D task switching, window thumbnails, and more. At the heart of Windows Aero is the Desktop Window Manager (DWM), a new system program which radically changes the way that the desktop is rendered. Prior to DWM, applications usually rendered their window contents directly to the screen. With DWM, applications first render their windows into off-screen areas in video memory. Many times per second, DWM then combines all of these off-screen areas into a single image on screen, in a process called compositing. In addition to enabling the eye candy mentioned above, the use of off-screen rendering and compositing generally makes the visual experience smoother than before.
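The following Python sketch is a toy model of compositing, with characters standing in for pixels. Each hypothetical Window renders into its own off-screen buffer, and the hypothetical composite() function copies the buffers into a single frame from back to front, so nearer windows cover farther ones. The real DWM composites video-memory surfaces on the graphics hardware and adds effects such as transparency; none of that is modeled here. Note that the finished frame holds only pixels, not the text that produced them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Window:
    x: int             # position of the window's top-left corner on screen
    y: int
    buffer: list[str]  # the window's off-screen surface, one string per row


def composite(windows: list[Window], width: int, height: int, background: str = ".") -> list[str]:
    """Combine off-screen surfaces into one frame, back to front."""
    frame = [[background] * width for _ in range(height)]
    for window in windows:  # later windows in the list are nearer the viewer
        for row, line in enumerate(window.buffer):
            for col, pixel in enumerate(line):
                fx, fy = window.x + col, window.y + row
                if 0 <= fx < width and 0 <= fy < height:  # clip to the screen
                    frame[fy][fx] = pixel
    return ["".join(r) for r in frame]


back = Window(0, 0, ["AAAA", "AAAA"])
front = Window(2, 1, ["BB", "BB"])
for line in composite([back, front], width=5, height=3):
    print(line)
# AAAA.
# AABB.
# ..BB.
```

Because each window keeps its own surface, the compositor can redraw the frame (for example, to fade or thumbnail a window) without asking the application to repaint.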
However, the Desktop Window Manager depends heavily on the new features of the Windows Vista display driver model. It cannot operate efficiently under the Windows XP display driver model, which is still used by mirror drivers. Even if it could, it's possible that the information required by screen readers would be lost by the time DWM renders the desktop to the screen. Therefore, while a mirror driver is active, Windows disables DWM and Windows Aero.
The currently dominant screen reader vendors make the seemingly reasonable assertion that disabling Windows Aero does not cause any loss in functionality but only changes how the desktop looks, which is irrelevant to blind users anyway. However, there are two problems with disabling Windows Aero that these vendors overlook. First, even though blind users are not directly affected by the appearance of the desktop, the sighted people that they work with will notice the change. In particular, if a blind person is providing technical support (perhaps remotely) for a non-technical sighted user, it would be undesirable for the sighted user's first reaction to be, "What did you do to my screen? It doesn't look the same!" First impressions are important, and non-technical sighted users tend to judge first by appearance. A blind person may want to provide remote technical support to sighted users in a configuration where the sighted users don't hear the speech output; in this case, the sighted users don't need to know that a blind person is helping them. Downgrading system functionality, even if only at a superficial level, undermines the potential of remote technical support to level the playing field for blind professionals.
Second, while the Desktop Window Manager currently only enables superficial changes in the appearance of the desktop, it has the potential to become much more. After all, DWM represents a complete overhaul in the way that windows are rendered and managed. DWM provides an application programming interface (API) which application developers can use to take advantage of DWM's advanced features. While no currently known applications require DWM to be enabled, it's conceivable that future applications, wishing to take full advantage of Windows Vista's much-touted new features, will require DWM. Furthermore, it's likely that Microsoft will focus future user interface developments on DWM and pay much less attention to the classic, pre-DWM windowing system. It's not hard to imagine that in a future version of Windows, disabling DWM will seriously cripple Windows itself. Microsoft may even want to remove the classic windowing system in future versions of Windows; in this case, disabling DWM would not be an option. Although disabling DWM is not currently known to significantly degrade functionality, the need to disable it is a considerable risk for all screen readers that use a mirror driver, as DWM evolves to be more than just a skin.
According to the DirectX Diagnostic Tool (DxDiag), DirectDraw acceleration and AGP texture acceleration are not available while a mirror driver is active. Direct3D acceleration is apparently available, but given that some forms of acceleration are disabled by the use of a mirror driver, it is conceivable that Direct3D acceleration is limited to some extent as well. Note that Direct3D underlies the new Windows Presentation Foundation (WPF), one of the most touted new features of Windows Vista for application developers. This means that the use of a mirror driver has the potential to reduce system performance, especially as advanced graphics technologies make their way into mainstream applications beyond multimedia and gaming. However, this is speculation; more research is required to determine the actual consequences of this reduced acceleration, especially the effects of a mirror driver, if any, on Direct3D and WPF.
The presence of a mirror driver interferes with Windows Vista's content protection system. This means that playback of protected high-definition video, such as HD-DVD and Blu-ray movies or HDTV, is disabled while a mirror driver is active. This was confirmed by a very technically competent user, who said the following in March of 2008:
Earlier this month a DVD finale to a TV show I watched came out. Having gotten it in the mail yesterday, I was quite excited to watch it. I hooked my Dell laptop up to my TV using the HDMI connection and started to play the DVD. Windows Media Player returned an error stating that the current display driver was incompatible because Microsoft was unable to start copy-protection on the driver. Being familiar with how the technology works, I turned JAWS off and started System Access. The DVD started like a charm. This highlights the reasons for not using legacy technologies to access the screen-- a simple thing like Windows Media player's digital copy-protection feature is disabled by the use of a mirror driver to access screen elements.
As stated above, the deepest problem with mirror drivers under Windows Vista is that they require both the Windows Vista display driver model and the Windows XP display driver model to operate concurrently. During the development of Windows Vista, this requirement and the need to be compatible with existing hardware limited the degree to which Microsoft could create a new, clean infrastructure for display drivers. Microsoft has stated that display drivers are a substantial stability problem, and that one major goal of the Windows Vista display driver model was to improve stability through a clean break from legacy infrastructure. As hardware advances, Microsoft may want to eliminate the Windows XP display driver model entirely in a future version of Windows. In this case, two scenarios are possible as long as screen readers use mirror drivers: either Microsoft will maintain some form of the Windows XP display driver model for the sake of screen readers, thus compromising system robustness at least while the mirror driver is active, or screen reader developers who have depended on mirror drivers will need to switch to API hooking. Either way, if Microsoft attempts to move further from the Windows XP display driver model, someone will lose as long as screen readers depend on mirror drivers.
Fortunately for the future of Windows screen readers, API hooking is a viable alternative to intercepting display driver commands, which is the aim of both display driver chaining and using a mirror driver. Newer screen readers, such as Serotek's System Access, are using API hooking to gather detailed information about the contents of the display and build an off-screen model. To be clear, a screen reader cannot depend solely on an off-screen model built using API hooking; as we mentioned earlier, API hooking cannot intercept precise information about some user interface elements. However, Microsoft Active Accessibility and the object models of various applications have significantly reduced the need for a comprehensive off-screen model. We hope this trend will continue, especially as new accessibility frameworks such as Microsoft's UI Automation and IBM's IAccessible2 become more widely used. However, there will always be applications that don't implement an accessibility framework or provide a document object model, and API hooking is an effective way to fill in the gaps. In fact, API hooking has considerable advantages over the currently dominant approach now, and the advantages will become more significant as Windows continues to evolve.
Both display driver chaining and mirror drivers require administrative privileges to set up. In contrast, API hooking is effective even under the most restrictive guest accounts on public computers. This makes API hooking suitable for screen readers which run directly from portable media without prior installation. As blind people become more mobile, this feature will become increasingly important, because it means that blind people can take their accessibility with them.
To be clear, screen readers under Windows Vista do need administrative privileges at installation time to enable full access to the User Account Control dialogs introduced in Windows Vista. However, this is not likely to be a concern when running from portable media, especially on public computers.
API hooking does not disable or hinder any system functionality. It works well with the Desktop Window Manager and therefore Windows Aero. It also has no effect on advanced graphics acceleration, including Direct3D and the Windows Presentation Foundation. This means that screen readers which use API hooking will not cripple cutting-edge applications or render them useless.
The effectiveness of both display driver chaining and mirror drivers in screen readers hinges on the assumption that Windows will send graphics commands to the display driver in a form that is useful to the screen reader. As more advanced graphics technologies make their way into mainstream applications beyond gaming and multimedia, this assumption is proving unreliable. For example, the new Windows Calendar application in Windows Vista renders some important text in such a way that the currently dominant screen readers cannot intercept that text, so they currently cannot make Windows Calendar accessible. In contrast, API hooking can be used to support emerging graphics technologies. System Access already supports Windows Calendar, and it will also be adapted to support other cutting-edge applications as needed.
By its nature, API hooking has greater immunity to changes in Windows than display-driver-based techniques. This is because the code inserted through API hooking sits at the boundary between applications and the operating system, rather than inside the operating system core. The System Access off-screen model required little modification to work well under Windows Vista, even with Windows Aero enabled. If Microsoft decides to completely eliminate the Windows XP display driver model in a future version of Windows, API hooking will continue to work well. It is also worth noting that Microsoft itself uses API hooking, especially in its application compatibility shims and in the hot-patching feature which enables security updates to be applied without restarting applications or services. Thus, screen readers which use API hooking instead of display-driver-based techniques are prepared for the future.
In the context of screen readers under Windows Vista and beyond, the use of mirror drivers is clearly not a long-term solution. Throughout the development of Windows Vista, Microsoft has attempted to break with the past and radically change many aspects of Windows for the better, including the display driver model; this trend will surely continue in future versions of Windows. It is short-sighted to assume that the use of mirror drivers will never lead to serious degradation of system functionality. Furthermore, it is naive to assume that Microsoft will indefinitely allow innovation and progress to be stifled for the sake of currently dominant screen readers. Fortunately for blind users, newer screen readers are leveraging the power of API hooking, along with accessibility frameworks and document object models, to ensure that access to Windows for the blind has a bright future.
© 2007-2010 Serotek Corporation. All rights reserved.
Last updated on April 12, 2010.