eM Client support for accessibility: speech recognition navigation of menus, etc.

This is more of an FYI than a particular request, although I will be mentioning some things that eM Client might do better, as well as some things I’m pretty happy with eM Client for.

I am evaluating eM Client. I suppose that you might say that I am historically a power user, e.g. using quite a few Gmail labels, enough to cause eM Client to take 10 to 12 seconds every time I open up my tags list. One of the things that attracts me about eM Client is that it supports Gmail labels properly, not just via the IMAP folders kluge. This in itself might be enough to make me purchase eM Client and recommend it to others, as long as there are no other showstoppers. (10-12 seconds is a bit of a showstopper :slight_smile: )

But more importantly, I suffer from computeritis, which makes it painful and slow for me to do a lot of typing on the keyboard. I use speech recognition to control programs as much as possible, not just dictating text, but also writing speech commands to perform operations such as traversing menus to colorize and highlight text, etc.

I am fortunate in that I can still type and use a mouse, although I do try to minimize such usage and do as much as possible by voice. Others, of course, are much more limited and need to do almost everything by voice.

In any case, I am evaluating eM Client with great emphasis on how accessible it is to somebody who uses speech recognition. I use Nuance Dragon NaturallySpeaking speech recognition, both for dictation and for speech commands. Others might use Windows Speech Recognition, mostly for dictation. Note that Microsoft has recently acquired Nuance, the maker of Dragon.

There are certain important things in making software accessible for speech recognition control.

—+ Ideally applications would be speech-enabled (but I’m not holding my breath, most aren’t)

Let’s skip over the least likely: in an ideal world, all applications would make use of Microsoft SAPI (Speech Application Programming Interface) and be fully speech-enabled, with Select-and-Say etc. However, not very many applications are so enabled; mostly some, but by no means all, Microsoft apps. Like I said, this is an ideal world, and unrealistic; I’m an open-source guy, and in fact most of the apps I use are not fully speech-enabled, nor do they even have helper apps or plug-ins provided.

In an even more ideal world, applications would have a command interface, like AppleScript, elisp (gag!), or Visual Basic for Applications.

—+ Sending keyboard and mouse events for speech control of applications

But realistically, speech control of applications is usually done by sending keyboard and mouse events from the speech software to the application. E.g. I can make the selected word bold by saying “Press Control B”. Now, it happens that Dragon already knows that most Windows applications make text bold via ^B, so I could also say “bold that”. Similarly, Dragon knows how to click many, but not all, menu items by name.

But navigating menus by reciting menu name, then submenu name, then … quickly becomes tiring. For that reason many people use speech commands written in Visual Basic or AutoHotKey or …, which send the keyboard sequences rather than uttering the menu item names. (Actually, it’s better if the automation scripts can invoke the menu items by name, but that is usually not possible unless you have a scripting language like AppleScript.) Anyway, speech commands will typically emit keyboard events, i.e. hotkeys or keyboard shortcuts or menu accelerators, to navigate the menus, often selecting multiple menu items at one time. I.e. speech commands are often compound menu operations.
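To make that concrete, here is a minimal sketch of what the body of such a compound command might look like, written in AutoHotKey (the menu and accelerator letters here are placeholders, not eM Client’s actual menu layout):

    ; Minimal sketch of a compound menu command: instead of uttering
    ; menu, submenu, and item names one at a time, one speech command
    ; emits the whole accelerator sequence. Letters are placeholders.
    ArchiveCurrentMessage()
    {
        SendInput, !m      ; Alt+M opens the application menu
        Sleep, 100         ; open-loop delay; see Timing Dependency below
        SendInput, s       ; accelerator letter for a submenu
        Sleep, 100
        SendInput, a       ; accelerator letter for the final item
    }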

Using speech recognition has slightly different characteristics than using a menu interface designed for the mouse: with mouse & menus you are very much limited in what you can have on the screen, whereas with speech you can fairly easily remember how to say many more commands than are visible on the screen at any time.

I will typically start automating an application like eM Client for speech recognition

  1. by copy/pasting all of your keyboard shortcuts and then editing them into speech commands, whose names are nearly always the short description of the keyboard shortcut

  2. traversing many if not all of your menus that can be opened by keyboard shortcuts, and recording the keyboard accelerator sequences for things which do not already have keyboard shortcuts. I love it when I can do this automatically; otherwise I do it for the most frequent commands.

E.g. for eM Client I am almost certain to have commands that I will call something like

“tags menu” or “tags dialog” => {alt}{m}{t} – just to make it easier to get to the top-level menu for tags

but also

“gmail tag (menu|dialog)” => {alt}{m}{t}{shift+tab 3}{down 3}{enter}

and probably also

“gmail tags” => {AppsKey}{g}{down 10}{enter}

which selects the alternate eM Client way of setting a tag on a message, providing the option of searching. (Unfortunately, this example incurs a 10 to 15 second stall whenever I use it; eM Client is not quite ready to handle my roughly 300 Gmail labels/tags.)
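In AutoHotKey terms (one way I implement the command bodies; Dragon’s scripting is another), those sequences come out roughly as follows. The Sleep values are guesses that have to be tuned:

    ; Rough AutoHotKey equivalents of the command bodies above.
    ; "tags menu":
    SendInput, !m
    Sleep, 150                         ; guessed delay for the menu to open
    SendInput, t

    ; "gmail tag menu/dialog":
    SendInput, !m
    Sleep, 150
    SendInput, t
    SendInput, +{Tab 3}{Down 3}{Enter}

    ; "gmail tags" (context menu route, with its 10-15 second stall):
    SendInput, {AppsKey}
    Sleep, 150
    SendInput, g{Down 10}{Enter}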

My apologies if you already know all of the above. I just wanted to provide background, because many people don’t know it. Many people have trouble believing that the state of the art in speech control of applications is so primitive.

Yes, it’s quite a bit of work to create speech commands. But if the alternative is that you cannot use a computer… yes, it would be really nice if all application software companies created speech-enabled apps, but since that hasn’t happened and isn’t likely to happen soon, people like me write and share speech commands, and there is a community of consultants who do it for hire for people who are not programmers.

I don’t expect software vendors to fully enable speech accessibility for their applications. But most modern software uses keyboard shortcuts and menu accelerators as well as mouse clicking, and there are simple things that can be done when designing your keyboard shortcuts and other aspects of your application that make it much easier for people like me to enable speech control.

—+ Keyboard Control as much as possible

First, with speech recognition it is much easier and more reliable to send/emulate keyboard events than mouse events. It is much easier to utter or emit a keyboard sequence like {alt m}… than it is to move the mouse to the second icon in the upper left corner of the window. This is why I am very fortunate to still be able to use my mouse, rather than having to resort to a “mouse grid”.

Therefore, it is really, really good if there is a keyboard-driven way of accessing every menu and icon on the screen, and all subitems on each of their submenus and dialogs, etc.

Most applications compliant with Windows and Apple user interface design standards at least enable this for the top level of menus, but they often stop providing it once you are in a submenu or a sub-submenu :frowning:

NOTE: I am NOT asking for keyboard shortcuts for everything. There just aren’t enough unused keys with alt/shift/control modifiers.

—+ Consistency Please!

An annoying problem when automating Windows-style menu key accelerators is when the keys used depend greatly on context. E.g. {alt m} opens the eM Client menu, {alt m}{s}{s} opens the Messages submenu, {alt m}{s}{s}{a} archives the message and {alt m}{s}{s}{a}{a} adds a note - except when eM Client archiving is not enabled (e.g. because you are using Gmail archiving), in which case {alt m}{s}{s}{a} will open a note instead of archiving the message. That’s not too bad in itself, but if it’s part of a longer sequence of commands things can go badly wrong.

Basically, if keyboard navigation to a menu item or icon is very context-sensitive, or frequently grayed out, reliable speech automation cannot use it in compound commands - and frequently it’s dangerous to use it at all.

Life would be easier if a given keyboard accelerator letter were used once and only once in any menu. That’s unlikely, I know.
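The only defensive trick I know is to verify each step before emitting the next letters, and abort otherwise. A sketch, assuming eM Client’s menus are standard Win32 popup menus (ahk_class #32768), which they may well not be:

    ; Guarded version of a compound command: check that the menu actually
    ; opened before sending the next accelerator. Assumes standard Win32
    ; popup menus (ahk_class #32768); custom-drawn menus defeat this check.
    SendInput, !m
    WinWait, ahk_class #32768,, 1    ; wait up to 1 s for the popup to appear
    if ErrorLevel                    ; timed out: the menu never opened
    {
        MsgBox, Menu did not open - aborting compound command.
        return
    }
    SendInput, ss                    ; safe(r) to continue descending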

A further example: in the eM Client message composition window you can use back-tab, {shift+tab}, to navigate through things like the To and From fields, and the line of icons above the text box. Unfortunately, the number of times you have to press {shift+tab} to get to an icon such as the [A] font style icon, which opens up a dialog to choose foreground and background colors, CHANGES ACCORDING TO WINDOW SIZE.

Yes, I know, Microsoft’s UI standards encourage such context sensitivity. Somebody who is controlling the app interactively can look at what’s selected. But unless the speech command can similarly look at what’s selected, well, it’s hard to use such keyboard shortcuts.

This would not be all that bad for the bold and italic icons at the top of the eM Client New Message window, since there are alternative ways of doing the same thing, Ctrl+B and Ctrl+I. But as far as I can tell there is no way of specifying the text background color in eM Client except by using those icons (or editing HTML, which is what I do in Thunderbird).
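The only open-loop workaround I know of is to sniff the window size and pick the {shift+tab} count accordingly. A fragile sketch; the pixel threshold and tab counts are invented for illustration and would have to be measured:

    ; Fragile workaround: choose how many {Shift+Tab} presses to send
    ; based on the window width. The threshold and counts below are
    ; invented for illustration; the real breakpoints must be measured.
    WinGetPos,,, winWidth,, A        ; width of the active window
    tabCount := (winWidth > 1200) ? 5 : 7
    SendInput, +{Tab %tabCount%}{Enter}   ; then activate the focused icon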

If you can provide a consistent way for keyboard navigation to an equivalent command or menu item, please…

—+ Discoverability of keyboard menu accelerators

It’s really good that Microsoft’s UI standards make keyboard accelerators for menus discoverable, e.g. by underlining the letter to be typed, and/or listing the even faster keyboard shortcuts on the side.

It’s unfortunate that sometimes they are not so discoverable except by trial and error.

E.g. in the {alt}{m}{s} message menu, the fact that the letter A can be typed to select Add Note is not visible, at least not on my screen.

—+ Absolute vs Toggle & Relative Commands and Menu Items

Because there are space limitations on menu items, keyboard shortcuts, and accelerators, mouse-driven menu items are often relative or toggle commands - e.g. they might toggle between on and off, or rotate between Normal, Wide, and Narrow views.

An interactive speech user can use such toggle or relative commands. But it is quite easy for an interactive speech user to say “bold on” or “bold off”, whereas toggling can leave you confused.

It is hard to use such relative or toggle operations in compound commands, whether speech or otherwise, if the speech command cannot tell whether the setting is already on or off.

A classic cause of bugs in speech commands is when they have implicit dependencies on initial state that cannot be verified. It’s really bad when the command does something completely different from what you wanted it to do.
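The usual workaround is for the speech command to shadow the state it cannot read, which is exactly the kind of unverifiable implicit dependency I mean. A sketch of the anti-pattern, assuming a generic app where Ctrl+B toggles bold:

    ; Anti-pattern sketch: the script shadows state it cannot verify.
    ; If the user ever toggles bold directly (mouse, Ctrl+B), boldIsOn
    ; goes stale and "bold on" silently does the opposite of its name.
    boldIsOn := false

    BoldOn()                         ; body of a "bold on" speech command
    {
        global boldIsOn
        if (!boldIsOn)
        {
            SendInput, ^b            ; Ctrl+B toggles bold in most apps
            boldIsOn := true
        }
    }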

—+ Enable context sensitivity for speech control

The basic problem with automating applications by emitting mouse or keyboard events is that it is often open-loop. The speech command system often cannot tell whether you are in a message window or browsing the message list. If the user invokes a speech command in the wrong context, and the keyboard events drive completely different menus than are expected, there can be a cascade of errors.

Speech command systems have some ability to detect application context.

For example, Dragon can enable different commands based on application (.EXE name) and window title. Inside Dragon speech command Basic code, or in AutoHotKey, you can also check the name and class of the focused control, or of the control under the mouse.

It is nice if the window title distinguishes, as eM Client’s does, the main window (title /^eM Client$/) with the message list from the message composition window (title /.*New Message$/). Unfortunately it cannot distinguish a completely new message from a reply or a forwarded message, at least not if the user has deleted Re: or Fw:, but beggars can’t be choosers.
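In AutoHotKey that distinction looks something like this, using exactly those two titles:

    ; Scope speech-command hotkeys by window title, using regex matching.
    SetTitleMatchMode, RegEx

    #IfWinActive ^eM Client$         ; main window with the message list
    ; ...commands that are safe in the message list go here...

    #IfWinActive New Message$        ; message composition window
    ; ...dictation and formatting commands go here...

    #IfWinActive                     ; end of window-specific sections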

Commands often need to know whether they are in the message body or the To/From/Subject lines, e.g. to determine whether speech commands that do things like change text color are allowed (or are likely to go astray). The focused control class/name can tell us this, e.g. ClassNN WindowsForms10.Window.8.app.0.3b93019_r3_ad1 vs Chrome_WidgetWin_01. But any good programmer hyperventilates at the fragility of doing something like this; such wizard-generated names often change. You know, the programmer is allowed to specify a friendlier name for these controls… you probably don’t want to do that for every control, but for the really most important controls, like the message contents, it might be nice…
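Here is the kind of check I mean, in AutoHotKey; the hard-coded class prefix is exactly the fragility I’m hyperventilating about:

    ; Gate a formatting command on the class of the focused control.
    ; The hard-coded class prefix is the fragile part: wizard-generated
    ; names like WindowsForms10.Window.8.app.0.3b93019_r3_ad1 change.
    ControlGetFocus, focusedCtrl, A              ; ClassNN of focused control
    if InStr(focusedCtrl, "Chrome_WidgetWin")    ; message body, it seems
    {
        ; safe to run text-color / formatting commands here
    }
    else
        MsgBox, Focus is not in the message body - skipping command.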

In an email program’s main window, it is really nice to be able to know whether the focus is in the message list, the folder list, or the reading pane. eM Client is about par for the course with Outlook and Thunderbird: the focused control distinguishes the reading pane from anywhere else, but you can’t tell whether the folder list or the message list is focused. This can lead to speech commands going astray if menus differ depending on where the focus is.

Because of this, I often try to put the window in a known good state before running a command. E.g. if there is a keyboard shortcut that always puts the focus in the folder list, hallelujah: an absolute keyboard command, rather than a relative or toggle command.

Outlook has a limited number of such absolute focus-positioning key sequences. I haven’t found any so far for eM Client. I actually hope that eM Client doesn’t need them; I hope that the keyboard shortcuts and menu accelerators are independent of where the focus is, but I doubt it.

—+ Speech recognition errors can lead to error cascades

One of the biggest problems with automating applications by emitting mouse & keyboard events is that it is often open-loop. The speech command system often cannot tell what the application/window status is, so cannot tell whether a particular key sequence K will perform action A1 or A2.

I have reason to hope that eM Client may be better in this regard than Thunderbird, since eM Client was drawn to my attention by a post in a speech recognition forum saying “What I love so far is that I can dictate anywhere in the client and it so far hasn’t triggered random commands”.

See next item.

Also, note that I’m not just criticizing eM Client; I’m mentioning the things that eM Client seems to be doing well.

—+ Unmodified keyboard shortcuts considered dangerous

A classic problem with speech commands is when what you intended to be a command is misrecognized as dictation. Now, it is not that bad if a command like “tag menu” is misrecognized as the dictation “tag menswear” and entered into an email message; the user can see it and easily fix it. But it is really, really bad if the ordinary, unmodified letter {t} actually does something, i.e. is itself a keyboard shortcut, and so on for the rest of the letters.

Fortunately, eM Client seems pretty well behaved. As far as I can tell, all of the eM Client keyboard shortcuts require a modifier like Ctrl or Alt. THANK YOU!

(By comparison, Thunderbird and Gmail do not require modifiers and can be very dangerous to use. They have naked printable keyboard shortcuts, e.g. in Gmail # deletes and e archives. It is easy to archive a whole slew of messages by accident, and quite possible to delete a whole slew. Sometimes you can catch it in time to undo, sometimes not, depending on how much misrecognized dictation was entered and the size of the undo buffer.)

Another class of error is when a key sequence terminates early, e.g. presses a grayed-out menu item… but the command is still emitting keys. eM Client does seem to be vulnerable to this sort of error. The only ways I really know to mitigate this sort of problem are to try to flush all pending keyboard events (but there’s no real standard way to do that), or to absorb incoming keyboard events for a certain amount of time and toss them on the floor, or to open a dialog box warning about the error. I know, this is annoying; I only mention the problem because you might be able to think of a better way.
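For what it’s worth, the “absorb and toss” mitigation looks something like this in AutoHotKey; the half-second window is arbitrary:

    ; "Absorb and toss" mitigation: after detecting that a command went
    ; astray, swallow whatever keystrokes are still arriving for a short,
    ; arbitrary window, then warn the user.
    AbsorbStrayKeys()
    {
        Input, junk, T0.5 L100   ; invisibly capture keys for up to 0.5 s
        MsgBox, Command went astray - stray keystrokes were discarded.
    }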

—+ Timing Dependency

Another of the classic problems for application automation, not just speech recognition but any automation, is timing dependency. It is scary how many times the person writing the speech command has to add a delay to give a window or menu a chance to open. It can be really bad if a speech command tries to send keyboard events to a menu that is not yet open, or which has failed to open, and if those keyboard events do something completely different.

Speech commands are capable of waiting until a window or menu is open, as long as they can tell what they need to wait for, e.g. as long as the menu has a recognizable and unique title.

While it would be good if speech commands could do this all the time, realistically it only gets done for particularly slow menus and windows.

My poster child for a slow eM Client menu is opening up my list of Gmail labels/eM Client tags, which takes between 10 and 15 seconds, and sometimes more, at least on my machine. I really don’t want to wait a worst-case time of 20 or 30 seconds all the time. Unfortunately, there does not seem to be any unique identifier or window title for the list of Gmail labels, so I cannot tell when it is open. Fortunately, while the list is waiting to be opened the parent menu transitions to ahk_class Ghost, so I can wait until it’s done. Obviously this is fragile; I suspect that if you change your widget library it will break my speech commands. It would be nicer if the menu had a name like “list of Gmail tags”.
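Concretely, the wait looks something like this in AutoHotKey (ahk_class Ghost is just what Windows renames a hung window to, which is why it’s fragile):

    ; Wait out the tags-list stall. While eM Client is unresponsive,
    ; Windows re-classes the hung window as ahk_class Ghost, so wait for
    ; that class to appear and then to go away again. Fragile, as noted.
    SendInput, !mt                       ; open the tags list (the slow part)
    WinWait, ahk_class Ghost,, 2         ; did the window go unresponsive?
    if !ErrorLevel
        WinWaitClose, ahk_class Ghost,, 30   ; up to 30 s for it to recover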

—+ Wrapping Up

If you have read this far, then you are a conscientious tech support person.

If my having written this and your having read this has increased the chances of making your application slightly better for speech recognition, then it has achieved my goal. (Plus, I am recording this diatribe for my blog.) I mainly just wanted to write this once and only once in detail, rather than having to make small explanations about why I am asking several of my eM Client forum questions.

If eM Client makes itself more accessible for speech recognition users, and thereby makes more sales, great.

I just wanted to point out that making your application friendlier for speech recognition accessibility is not necessarily an all-or-nothing question of rewriting it to use Microsoft SAPI.

I am excited to find eM Client, the first Windows app I have found, apart from Gmail itself, that has proper support for Gmail tags, and which also provides the message sorting and saved searches that Gmail does not. I hope that I find no showstoppers in using eM Client with speech recognition. (Unfortunately, the 10-second stall opening the list of Gmail labels/eM Client tags is a showstopper, but it should not be that hard to fix.)