Caret Navigation in Web Applications
A little over two years ago, I left Google. In my farewell blog post, I noted:
"One of the many things Google has taught me is that building simple things is often extremely complicated and [Google Tasks] was no exception. (I think I've spent at least one man-month trying to figure out the best way for the cursor to move up and down between tasks, but that's a topic for another post.)"
Somehow, my blog post made it onto Reddit where some disgruntled group of people modded down my post because they were hoping to hear more about this whole cursoring thing.
Fast-forward to today where a little company named Asana has decided to take a stab at task management software. Now, I still use Google Tasks heavily, and even though the Tasks community has loudly and clearly expressed its desire to share task lists, Google has failed to come through. Finding myself in need of a shared task list, I decided to give Asana a try.
Upon firing it up, one of the things that Asana tells you is that it aspires to be a lightweight text editor, of sorts. This is precisely the approach that we took with Google Tasks, so I was intrigued to see how Asana chose to deal with all of the design issues that my team encountered several years earlier.
I was disappointed to discover that the Asana team has taken a number of shortcuts: hierarchy is limited, tasks are not linkified, and most importantly, task text does not wrap. From experience, I can tell you that each of these shortcuts makes things considerably easier to implement, but with three years of development and $10.2M in funding, I would expect a little more.
But my goal here is not to bash Asana (on the contrary, I really like their sharing and tag features, email integration, and support for headings), but to shed some light on why dealing with wrapped text and cursoring is so complicated. Hopefully this will help others build high-fidelity, web-based user interfaces that require text editing.
Caret navigation in Google Tasks
In a normal text editor, when you use the up or down arrow keys to navigate between lines, the editor generally does its best to preserve the x-coordinate of the cursor as the y-coordinate changes. For example, consider navigating the cursor downwards through the following chunk of text starting from the middle of the word "little":
    Here is a nice lit|tle passage. (Character offset: 18)
    It contains three| sentences. (Character offset: 17)
    None of which is| all that interesting. (Character offset: 16)
If you do so in a native text editor, the cursor will follow the path marked by the | characters. Note that each | appears at a different character offset in each of the three lines. Feel free to verify this yourself by pasting the text into a textarea.
This is significant because this means that it is invalid to use the character offset of the cursor position on the previous line as the character offset for the cursor on the next line. (Note this would not be an issue if a monospace font were used, and there were many days that I got out of bed wondering if we could get away with releasing Google Tasks with a monospace font, though you will see that indenting and wrapping preclude using a monospace font as a drop-in solution to this problem.)
In order to emulate native cursoring, you need to "do the math" and calculate where the cursor is and where it should go. This decision process roughly breaks down as follows:
- Which element had focus when the user pressed a key?
- Where was the cursor when the user pressed that key?
- If the cursor should be moved to a new task, where should it be placed?
In Google Tasks, each task is displayed in a contentEditable div (except on Firefox 2, which doesn't support contentEditable elements, and will be discussed later). By comparison, Asana has one input element per task, which means that task text cannot wrap. This is my primary issue with Asana, as I often want to write long task descriptions, all of which can be viewed on one screen. Asana forces me to move more information into the "Notes" section, which is frustrating because it is only possible to see one note at a time.
Although it has not been widely publicized, there is a fullscreen view for Google Tasks: https://mail.google.com/tasks/desktop. For each task in your list, you can see the first line of notes associated with each task without digging into the details pane. (Though if you want to get to the details pane, it is easily accessible via either clicking the arrow on the right of the task, or hitting shift+enter when the task has keyboard focus.)
If you are a heavy Tasks user, then you should definitely bookmark the fullscreen view. I have created a Chrome extension that provides a link to Tasks from the bar that is at the top of most Google web properties. Unfortunately, sometimes when Google updates its web properties, my extension breaks, but I do my best to keep it up to date.
Fortunately for Asana engineers, displaying each task in an input element makes cursor management considerably easier to implement. Because task text is only one line, when the user hits the down arrow, it always means the user is navigating to the next task (likewise for the up arrow). By comparison, when a user hits the down arrow in Google Tasks, the user may either be navigating within a multi-line task, or navigating to the next task. This seemingly small difference in product makes a world of difference in engineering.
Which element had focus when the user pressed a key?
Because key events bubble, it is possible to add a single key event listener at the root element that contains all of the contentEditable task elements. When the listener is triggered, the target can be inspected to determine which element had focus when the key event was fired, and the keyCode can be inspected to determine whether the user pressed the up or down arrow.
Adding one key event listener per task div is not desirable because it would use more memory since the number of listeners would grow with the length of the task list. Listeners are frequently constructed as anonymous functions, which means they often carry references to environment objects that cannot be garbage collected until the listener is removed.
Further, using a single listener makes bookkeeping simpler. As the user modifies the list, the divs used to display task text can be added and removed from the DOM without having to keep the set of key event handlers in sync.
You can learn more about the more general form of this pattern on Kushal's blog.
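As a rough sketch of the single-listener pattern described above (plain JavaScript rather than Closure; the function name installArrowKeyListener and the onArrow callback are made up for illustration, not taken from Google Tasks), one listener on the root can classify the key and hand the focused task to a callback:

```javascript
// Key codes for the arrow keys (standard DOM keyCode values).
const KEY_UP = 38;
const KEY_DOWN = 40;

/**
 * Installs one keydown listener on the element that contains every task.
 * `root` only needs addEventListener(); `onArrow` is a hypothetical callback
 * that receives the focused task element, the direction, and the event.
 */
function installArrowKeyListener(root, onArrow) {
  root.addEventListener('keydown', function (event) {
    let direction = null;
    if (event.keyCode === KEY_UP) direction = 'up';
    if (event.keyCode === KEY_DOWN) direction = 'down';
    if (direction === null) return; // Not an arrow key we care about.

    // The event bubbled up from a task, so event.target identifies the
    // element that had focus when the key was pressed.
    onArrow(event.target, direction, event);
  });
}
```

Because the listener lives on the root, task elements can be added or removed without registering or unregistering anything.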
Assuming up or down was pressed, the next step is to determine whether to allow the key event to proceed as it normally would (moving the cursor within the task), or whether it should be suppressed and the cursor should be moved into an adjacent task. The latter occurs when either the cursor is in the top line of the div and up is pressed, or when the cursor is in the bottom line of the div and down is pressed. Therefore, the location of the cursor must be determined in order to make this decision.
Where was the cursor when the user pressed that key?
Unfortunately, there is no cross-browser API to determine the (x, y) location of the cursor directly, so we must use other heuristics. Specifically, we use the DOM range API to determine the cursor position, which often, but not always, maps to a unique (x, y) position on the screen.
For those of you who are unfamiliar with the DOM range API, it is basically a way of representing a contiguous subtree within a DOM. (I became intimately familiar with this API when working on Chickenfoot.) When you select HTML on a web page, the selected HTML is always represented as a DOM range. When a single point in the DOM is selected, as is the case for a cursor, it is referred to as a "collapsed" range.
Therefore, when a key is pressed, we can ask the browser for its selected range at that instant, double-check that it is collapsed (i.e., represents the cursor position), and then map that range to an (x, y) coordinate on the screen.
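The "ask for the range and check that it is collapsed" step might look something like the following in browsers that implement the standard selection API. This is a minimal sketch: getCaretRange is a hypothetical helper name, and older versions of Internet Explorer exposed a different selection API that is not covered here.

```javascript
/**
 * Returns the caret as a collapsed range, or null when there is no
 * selection or the selection spans more than a single point.
 * `selection` is expected to look like the object returned by
 * window.getSelection() (it must provide rangeCount and getRangeAt).
 */
function getCaretRange(selection) {
  if (!selection || selection.rangeCount === 0) return null;
  const range = selection.getRangeAt(0);
  return range.collapsed ? range : null;
}
```

In a key handler this would be called as getCaretRange(window.getSelection()); a null result means there is no single cursor position to measure, so the key event can simply be left alone.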
Originally, I did this by recreating the HTML of the task in an offscreen element with a span inserted at the range position so I could use standard DOM APIs to calculate the position of the span, which I would then use to calculate the position of the cursor relative to the div containing the task. For example, suppose a task was rendered in the list using the following HTML:
    <div class="task" contenteditable>Take out the trash and bundle the recycling.</div>
and that the selection was a range whose start and end elements were the text node within the div and the start and end offset were 23 (just before the "b" in "bundle"). The next step would be to add the following HTML to the DOM (the whitespace between nodes is added for clarity, but would have to be removed in practice to match the original task HTML exactly):
    <div id="buffer" style="position: absolute; left: -10000px; top: -10000px">
      <div class="task" contenteditable>
        Take out the trash and <span id="cursor"></span>bundle the recycling.
      </div>
    </div>
Now the (x, y) offset relative to the upper-left-hand corner of the task can be calculated as follows using some standard utilities from the Closure Library:
    var cursorEl = goog.dom.getElement('cursor');
    var cursorOffset = goog.style.getPageOffset(cursorEl);
    var bufferEl = goog.dom.getElement('buffer');
    var bufferBounds = goog.style.getBounds(bufferEl);
    var xOffset = cursorOffset.x - bufferBounds.left;
    var yOffset = cursorOffset.y - bufferBounds.top;
    var taskHeight = bufferBounds.height;
The yOffset can be used to determine whether the cursor is in the top or bottom line of the div. Specifically, if yOffset is less than or equal to the top padding of the .task CSS, then the cursor is in the first line of rendered text. Conversely, if yOffset is greater than taskHeight minus the sum of the line height of the task text and the bottom padding of the .task CSS, then the cursor is in the last line of rendered text. (Note that due to browser differences and/or subpixel rendering, you may have to add a pixel of tolerance when doing these calculations.)
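The first-line and last-line tests above reduce to a little arithmetic. The sketch below assumes the padding and line height are known in pixels and passed in as a metrics object; the names classifyCaretRow and shouldLeaveTask and the default tolerance of one pixel are illustrative choices, not the Google Tasks implementation.

```javascript
/**
 * Classifies the caret row from the measurements described above.
 * All values are in pixels; `metrics` (paddingTop, paddingBottom,
 * lineHeight) and `tolerance` are assumed inputs.
 */
function classifyCaretRow(yOffset, taskHeight, metrics, tolerance = 1) {
  const firstLineMaxY = metrics.paddingTop + tolerance;
  const lastLineMinY =
      taskHeight - metrics.lineHeight - metrics.paddingBottom - tolerance;
  return {
    inFirstLine: yOffset <= firstLineMaxY,
    inLastLine: yOffset >= lastLineMinY,
  };
}

/**
 * Returns true when the arrow key should move the caret into an adjacent
 * task instead of being handled natively inside the current task.
 */
function shouldLeaveTask(direction, row) {
  return (direction === 'up' && row.inFirstLine) ||
      (direction === 'down' && row.inLastLine);
}
```

Note that for a single-line task both flags are true, so either arrow key leaves the task.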
When I originally tried to implement this technique, my first instinct was to insert a tiny character wrapped in a span to serve as the element to measure because I assumed it would need to take up a non-zero amount of space for the DOM calculations to work. I used a pipe (|) character to minimize the change in width of the task content, so I originally had something like:
    Take out the trash and <span id="cursor">|</span>bundle the recycling.
It turned out that inserting this extra character could cause wrapping behavior that would screw up the calculations. Fortunately, my teammate suggested I try dropping the character, which turned out to work just fine.
Bear in mind that the #buffer must display text exactly the same as the original element in order to reflect line breaks accurately. The HTML snippet above neglects this in two important ways.
First, the width of #buffer should be specified as an inline style to match that of the div being cloned. Without a fixed width, the absolutely positioned #buffer will display as one long string of text, in which case the cursor calculation logic would always determine the cursor to be in both the first and last line of task text.
Second, although the clone inside #buffer has the same CSS class as the original div, that does not mean that it has inherited all of the same styles as the div. For example, if the original HTML were:
    <style>
      body { font-size: 14px; }
      .task { font-size: 120%; }
    </style>
    <div style="font-size: 10px">
      <div class="task">
        Take out the trash and bundle the recycling.
      </div>
    </div>
In this scenario, the resulting font size of the .task element would be 12px (120% of its parent's 10px), but if that element were cloned and added directly under body, then its font size would be 16.8px (120% of the body's 14px). This difference in font sizes would cause the text to draw (and possibly wrap) differently in #buffer, which would throw off the cursor calculations. Therefore, it is important to include any styles that cascade into .task that affect rendering as inline styles of #buffer.
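One way to carry those cascading styles over is to copy a handful of text-layout properties from the computed style of the original task's parent onto the buffer, along with a fixed width. The sketch below is only an illustration: the property list is an assumption and certainly not exhaustive, and syncBufferStyles is a made-up name.

```javascript
// Properties that commonly change how text wraps; this list is an
// assumption and is not exhaustive.
const TEXT_LAYOUT_PROPERTIES = [
  'fontFamily', 'fontSize', 'fontWeight', 'fontStyle',
  'lineHeight', 'letterSpacing', 'wordSpacing', 'whiteSpace',
];

/**
 * Copies text-layout styles and a fixed width onto the buffer.
 * `parentComputed` is a style map for the element that contains the
 * original task, e.g. window.getComputedStyle(taskEl.parentNode), so that
 * rules on .task itself (such as font-size: 120%) still apply on top.
 * `bufferStyle` is the buffer's inline style object (bufferEl.style).
 */
function syncBufferStyles(parentComputed, widthPx, bufferStyle) {
  TEXT_LAYOUT_PROPERTIES.forEach(function (property) {
    bufferStyle[property] = parentComputed[property];
  });
  bufferStyle.width = widthPx + 'px';
  return bufferStyle;
}
```

With the example above, the buffer would receive font-size: 10px inline, so the cloned .task inside it would again resolve to 12px.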
It is true that if the original .task element were used in place of the buffer, then there would be no need to set the width or cascading styles on the buffer element. However, redrawing the content of a contentEditable element while it has focus may disrupt focus considerably on some browsers. Further, drawing into an absolutely positioned element rather than one in the middle of the document should, in theory, reduce the amount of document reflowing that the browser has to do after modifying the DOM.
In practice, setting the width of the #buffer and determining the styles that cascade into it is not that difficult. If you look at the UI for Google Tasks in Gmail, Calendar, or the standalone view, you will see that it actually wraps itself in an iframe so that it can be embedded anywhere without having to worry about unexpected styles from parent elements cascading into it, which would throw off the cursoring calculations. I have discussed other advantages of this approach on my blog, as well as on p. 392 of my book in the section on "Externs versus exports."
The final gotcha in this approach, which is what makes this technique imprecise, is that a collapsed DOM range may not map to a unique (x, y) position on the screen. This happens when the cursor is at the beginning or end of a line. For example, if the area where the task was rendered were particularly narrow such that the text wrapped before the "t" in "the recycling," then the cursor could be in either of the following positions, both of which map to the same DOM range:
    Take out the trash and bundle |    (red cursor: end of the first line)
    |the recycling.                    (blue cursor: start of the second line)
This is significant because if the cursor were at the end of the first line, then hitting the up arrow should navigate into the previous task while hitting the down arrow should move the cursor to the end of the second line of the task. Conversely, if the cursor were at the start of the second line, then hitting the up arrow should navigate to the beginning of the first line of the task while hitting the down arrow should move the cursor into the next task.
Feel free to use the arrow keys in the following textarea to see how native cursoring behaves. At least in the current versions of Chrome and Safari, using the left and right arrows to move the cursor seems to "ignore" the space after the "e" in "bundle," as right-arrowing from there brings you to the start of the next line. However, it is possible to use the mouse to place the cursor at the end of the space after that "e" on the first line, though strangely hitting the right arrow from that position moves the cursor into the second character of "the" instead of to the start of the line.
The question remains: how do you implement a solution that accounts for this edge case? One option is to treat it as a hysteretic system such that all previous keyboard and mouse input should be recorded so that the current cursor position can always be calculated, though that would be rather complicated.
A simpler, but imperfect, solution is to resolve the ambiguity by assuming that the cursor is at the beginning of the lower line rather than at the end of the upper line:
|the recycling.
In practice, this is more likely the case because the user is probably arrowing down through the left side of the task list, or the user has right-arrowed to get to the start of the next line. The only case where this heuristic is incorrect is when the user uses the mouse to place the cursor on the right side of the space at the end of the line.
(You can verify this failure case yourself in Google Tasks by typing the above task text into Tasks in Gmail, using the mouse to place your cursor where the red mark is, and then hitting the down arrow. Assuming it is not the last task in the list, the cursor will go to the start of the next task rather than to the end of the current task because Google Tasks assumed you were moving the cursor from the blue position.)
Although this appears to be a practical way to resolve the ambiguity, this introduces the challenge of determining whether you are in an ambiguous case! For example, consider the following HTML (whitespace between HTML tags is included for clarity):
    <div id="buffer" style="position: absolute; width: 189px">
      <div contenteditable class="task">
        Take out the trash and bundle <span id="cursor"></span>the recycling.
      </div>
    </div>
This is the HTML that represents the selection at character offset 30 within the task. Using the cursor calculations introduced earlier yields an xOffset of 177 and a yOffset of 0. This (x, y) position corresponds to the red cursor position on the upper line rather than the blue cursor position on the lower line. Unfortunately, this is the opposite of the behavior we want, so we need to amend our technique for calculating the cursor position.
The solution is to take a second measurement. For example, when cursoring down, we perform the same cursor calculation at one additional offset within the selection. In this case, performing the cursor calculation for the DOM range that corresponds to character offset 31 yields an xOffset of 4 and a yOffset of 16. Because the yOffset in the second measurement is greater than the yOffset of the first measurement, the current cursor position must be on the boundary of a line break. We have now identified being in the ambiguous case of the red versus blue cursor position, but as explained above, we choose to assume the cursor is at the beginning of the lower line, and move the cursor into the next task accordingly.
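The two-measurement check can be sketched as a small function. Here measure(offset) is a hypothetical helper that returns the {x, y} position for a character offset (for instance, by placing a span in the offscreen buffer as described earlier); resolveCaretRow is likewise an illustrative name.

```javascript
/**
 * Determines which row the caret should be treated as being on.
 * If the position one character later is on a lower row, the caret is on
 * a line-break boundary, and it is assumed to be at the start of the
 * lower line.
 * `measure(offset)` is a hypothetical helper returning {x, y} in pixels.
 */
function resolveCaretRow(measure, offset, textLength) {
  const here = measure(offset);
  if (offset >= textLength) {
    return { y: here.y, ambiguous: false }; // No later offset to compare.
  }
  const next = measure(offset + 1);
  const ambiguous = next.y > here.y;
  return { y: ambiguous ? next.y : here.y, ambiguous: ambiguous };
}
```

With the measurements from the example (offset 30 at y = 0, offset 31 at y = 16), the result reports the ambiguous case with a y of 16, which is then used for the last-line test.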
If the cursor should be moved to a new task, where should it be placed?
As explained earlier, the yOffset and taskHeight can be used from the cursor calculations to determine whether the cursor should be moved to the previous or next task. The remaining question is how to determine where to place the cursor in the destination task.
From the calculations, we have the xOffset of where the cursor was, so the x in the (x, y) of the new cursor position should be as close to xOffset as possible while still maintaining the correct row position. When moving the cursor downward, the desired row is always the first row of the new task where the new yOffset should be 0. However, when moving the cursor upward, the taskHeight of the destination task must be calculated in order to determine the yOffset for cursor positions in the last row of that task.
Once the desired destination xOffset and yOffset values have been calculated, we employ a binary search to find the closest cursor position in the destination task. Specifically, each character offset in the destination task text is a candidate location for the new cursor position. We can create a search space by populating our existing #buffer element with span elements at each character offset. For example, if the destination task text were "homework," then the HTML would be as follows (again, whitespace between HTML tags is included for clarity):
    <div id="buffer" style="position: absolute; width: 189px">
      <div contenteditable class="task">
        <span id="buffer-0"></span>h
        <span id="buffer-1"></span>o
        <span id="buffer-2"></span>m
        <span id="buffer-3"></span>e
        <span id="buffer-4"></span>w
        <span id="buffer-5"></span>o
        <span id="buffer-6"></span>r
        <span id="buffer-7"></span>k
        <span id="buffer-8"></span>
      </div>
    </div>
To get the (x, y) position of character offset n, we can use goog.dom.getElement('buffer-' + n) to get the corresponding placeholder element and then use our existing cursor calculation logic to determine the corresponding (x, y) position.
At each step of the binary search, the candidate (x, y) position is compared to the destination xOffset and yOffset values:
    var evaluate = function(n) {
      var cursorEl = goog.dom.getElement('buffer-' + n);
      var cursorOffset = goog.style.getPageOffset(cursorEl);
      var yDelta = yOffset - cursorOffset.y;
      // If the candidate point is not in the right row,
      // then this cannot be a match.
      if (yDelta !== 0) {
        return yDelta;
      }
      return xOffset - cursorOffset.x;
    };

    // Number of characters in the task text if the task element has a single
    // child node, which is a text node.
    var taskLength = taskEl.firstChild.nodeValue.length;

    // Create an array of 0..taskLength where each element is its index.
    var candidates = new Array(taskLength + 1);
    for (var i = 0, len = candidates.length; i < len; i++) {
      candidates[i] = i;
    }
    var characterOffset = goog.array.binarySelect(candidates, evaluate);
The resulting characterOffset can be mapped to a collapsed DOM range that can be used as the selection to set the new cursor position. Note that when characterOffset is negative, indicating that no exact match was found, additional logic is required to find the best match.
When characterOffset is negative, the corresponding insertionPoint is -1 * (characterOffset + 1).
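To see the return convention in action without a browser, here is a standalone sketch that mimics the behavior of goog.array.binarySelect over a made-up table of caret positions. The positions array, makeEvaluator, and toInsertionPoint are hypothetical stand-ins for the DOM measurements; only the search logic and the insertion-point formula come from the text above.

```javascript
/**
 * Binary search with the same return convention as Closure's
 * goog.array.binarySelect: the index of a match, or -(insertionPoint + 1)
 * when no candidate evaluates to 0. The evaluator returns a positive
 * number when the target lies after the candidate and a negative number
 * when it lies before.
 */
function binarySelect(candidates, evaluate) {
  let left = 0;
  let right = candidates.length;
  let found = false;
  while (left < right) {
    const middle = (left + right) >> 1;
    const result = evaluate(candidates[middle]);
    if (result > 0) {
      left = middle + 1; // Target is after this candidate.
    } else {
      right = middle; // Target is at or before this candidate.
      found = result === 0;
    }
  }
  return found ? left : -(left + 1);
}

// Builds an evaluator over a precomputed (hypothetical) table of caret
// positions, one {x, y} entry per character offset.
function makeEvaluator(positions, xOffset, yOffset) {
  return function (n) {
    const yDelta = yOffset - positions[n].y;
    return yDelta !== 0 ? yDelta : xOffset - positions[n].x;
  };
}

// Decodes a negative search result into an insertion point.
function toInsertionPoint(characterOffset) {
  return -1 * (characterOffset + 1);
}

// Made-up layout for a nine-offset task that wraps after offset 4.
const positions = [
  { x: 0, y: 0 }, { x: 8, y: 0 }, { x: 16, y: 0 }, { x: 28, y: 0 },
  { x: 36, y: 0 }, { x: 8, y: 16 }, { x: 16, y: 16 }, { x: 22, y: 16 },
  { x: 30, y: 16 },
];
const candidates = positions.map(function (unused, index) { return index; });
```

Searching this layout for the target (20, 0) finds no exact match and returns -4, which decodes to an insertionPoint of 3; searching for (16, 16) returns the exact match at offset 6.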
When cursoring downward, either insertionPoint or insertionPoint - 1 should be the desired characterOffset. For example, if the binary search "overshoots" and returns an insertionPoint that corresponds to the first character in the second row, then the cursor should actually be at the last character in the first row, which should be insertionPoint - 1:
    var pickBestPoint = function(insertionPoint) {
      if (insertionPoint === 0) {
        return 0;
      }
      if (insertionPoint > taskLength) {
        // The search ran past the last placeholder (buffer-<taskLength>),
        // so there is no element to measure: clamp to the end of the text.
        return taskLength;
      }
      var cursorEl = goog.dom.getElement('buffer-' + insertionPoint);
      var cursorOffset = goog.style.getPageOffset(cursorEl);
      if (yOffset !== cursorOffset.y) {
        // insertionPoint corresponds to the wrong row: choose the previous index.
        return insertionPoint - 1;
      }
      var previousCursorEl = goog.dom.getElement('buffer-' + (insertionPoint - 1));
      var previousCursorOffset = goog.style.getPageOffset(previousCursorEl);
      var xDelta1 = xOffset - previousCursorOffset.x;
      var xDelta2 = cursorOffset.x - xOffset;
      return xDelta1 < xDelta2 ? insertionPoint - 1 : insertionPoint;
    };
There are also edge cases when cursoring upwards and characterOffset is negative, but how best to handle those is left as an exercise for the reader.
One final note is that when determining the xOffset of both the old and new cursor position, bear in mind that if the adjacent tasks are not at the same depth in the hierarchy, then some adjustment will have to be made.
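If each level of hierarchy indents a task by a fixed number of pixels (an assumption; the actual indentation scheme is not described here), the adjustment amounts to translating the xOffset from the source task's coordinate space into the destination's. The function name and the indentPx parameter below are hypothetical.

```javascript
/**
 * Converts an xOffset measured from the left edge of the source task into
 * one measured from the left edge of the destination task.
 * Assumes every level of depth shifts a task right by `indentPx` pixels.
 */
function adjustXOffsetForDepth(xOffset, fromDepth, toDepth, indentPx) {
  const adjusted = xOffset + (fromDepth - toDepth) * indentPx;
  // A caret left of the destination's text snaps to its left edge.
  return Math.max(0, adjusted);
}
```

For example, moving from a sub-task at depth 1 to a top-level task with a 20-pixel indent turns an xOffset of 50 into 70, which keeps the caret at the same place on the screen.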
Additional challenges
The previous section explained the basics required to emulate native cursor movement in a web application. This section discusses a number of additional product requirements imposed by Google Tasks that made the problem even more difficult.
Supporting user agents that do not support contentEditable elements
When I started working on Google Tasks, Firefox 3.0 was still in beta and IE7 had just been released, so supporting Firefox 2.0 and IE6 was a firm requirement for the product. Like most mid-2000s frontend engineers, I did the bulk of my development in Firefox and would go back and add hacks for IE later to provide cross-browser support. Unfortunately, Firefox 2.0 did not support contentEditable elements, so I was unaware of them when I started on Tasks, and therefore my original implementation did not use them at all.
The contentEditable attribute was a feature pioneered by Internet Explorer 5.5, which was not supported by Mozilla until Firefox 3.0. I first learned about contentEditable when talking to two Googlers who worked on Closure Library's rich text editor widget, as they were intimately familiar with the difficulties of creating a text editor in the browser. A loose account of this conversation is provided by Nick Santos in the foreword of my book, Closure: The Definitive Guide.
Today it may seem as though worrying about user agents that do not support contentEditable is a problem of the past, but support on mobile is fairly recent. Specifically, contentEditable support was only introduced in iOS 5.0 and Android 3.0, while Opera Mini/Mobile do not yet support it at all.
Originally, each task was displayed as an ordinary div. A task could receive keyboard focus in one of two ways: the user could click on a task with the mouse, or the user could hit the up or down arrow to navigate to an adjacent task. (Note that there are other edge cases that we will not even discuss, such as hitting backspace at the beginning of a task, which joins it with the previous task, putting the cursor at the boundary of the join.) In either case, as a result of the user action, a singleton textarea would be "shuttled" across the screen so that it was displayed over the task's div. The content of the textarea was updated to match the content of the task exactly, and once the textarea was in place, the cursor was moved to the point where the user would expect, using logic similar to that described in the previous section. It was imperative that the textarea line up exactly, or else the task text would appear to jitter as the user cursored up and down through the task list. (It took a considerable amount of experimentation with CSS to eliminate this jitter.)
You might wonder: why not just have one textarea per task rather than waste so much energy shuttling a single textarea around? Initially, one of my weaker arguments was that this approach would make it easier to select task text because you can drag to select text across div elements, but you cannot do so across input or textarea elements. (Go ahead, try selecting a bunch of tasks in Asana, copying them to the clipboard, and pasting them in a text editor.) It turns out that once we supported drag-and-drop in Google Tasks, most attempts at selecting task text were misinterpreted as drag-and-drops, anyway.
However, the more compelling reason turned out to be support for rich text in tasks. Today, the only task formatting that Google Tasks supports besides plaintext is hyperlinks, but because tasks are displayed in divs rather than textareas, all sorts of additional markup is possible. Again, compare this to Asana where if a task contains a URL, you cannot click on it. Users clamored for linkification of tasks until we added it.
Unfortunately, one of the drawbacks to the "shuttle" approach was that the textarea had to be continually resized as the user entered text in order to make sure that all of the task text was visible while the user was typing. This turned out to be error-prone because occasionally there would be cases where text wrapped differently in the div used to measure the text than it did in the textarea used to display it, such that the textarea was too small. This caused all sorts of visual errors, so moving to a contentEditable element that resized itself natively as the user typed eliminated this problem altogether.
Ideally, once the contentEditable solution was introduced, we would have deleted all of the code for the "shuttle" solution, but Firefox 2.0 died a slow death, so we had to keep it around for quite some time. These different code paths created a significant additional burden in testing, so we were eager to eliminate it. Although Google Apps dropped support for Firefox 2.0 on March 1, 2010, spoofing the user-agent as Firefox 2.0 on the fullscreen view for Google Tasks today indicates that the textarea code path is still alive and well.
Wrapping task text
We believed that to be an effective task list, you needed to be able to see it alongside your email. Obviously, Gmail already has a lot going on, so there was not much real estate to allot for Tasks. Google Tasks received a small, collapsible space called a "mole" because opening and closing chat windows in Gmail is akin to "whack-a-mole."
Originally, the Tasks mole was slightly wider than the chat moles, but no taller. (Though recently, it appears that someone failed to update the CSS for the Tasks mole in one of the recent Gmail redesigns, so now the heights do not match, and the Tasks mole is narrower than the chat moles.) Because the Tasks mole was less than 250 pixels wide, we knew that we had to allow task text to wrap, and that task text would wrap often.
Consider the Asana UI where each task is displayed in a single-line input. If that interface were embedded in the same space as Tasks, you would only be able to see about four words per task (or fewer for a sub-task, which Asana does not allow) before the rest of the task text would be clipped.
Because Asana is currently used only as a fullscreen webapp, the clipping may be tolerable to most users. However, this design makes it difficult to reuse the existing UI as an embeddable widget that can be viewed alongside other applications where you want to be able to see your task list, such as mail and calendar. For example, Tasks is limited to 162 pixels of horizontal real estate in Google Calendar—it may be cramped, but it's readable!
For the most part, the use of contentEditable elements to display tasks addressed the wrapping issue. However, some browsers would not automatically wrap long strings of characters (such as URLs, which appeared frequently in tasks), in which case Tasks had to provide the browser some hints on how such text should be wrapped. These "hints" were strategically placed word-break tokens in long sequences of non-whitespace characters. The exact type of word-break token to use varied by browser, as format.js in the Closure Library determines the appropriate HTML to use for word-breaking as follows:
    /**
     * Constant for the WBR replacement used by insertWordBreaks. Safari requires
     * <wbr></wbr>, Opera needs the &shy; entity, though this will give a visible
     * hyphen at breaks. IE8 uses a zero width space.
     * Other browsers just use <wbr>.
     * @type {string}
     */
    goog.format.WORD_BREAK_HTML =
        goog.userAgent.WEBKIT ? '<wbr></wbr>' :
        goog.userAgent.OPERA ? '&shy;' :
        goog.format.IS_IE8_OR_ABOVE_ ? '&#8203;' : '<wbr>';
Note that once goog.format.WORD_BREAK_HTML was introduced into the task text, the content of a contentEditable task div was no longer guaranteed to be a single text node, but may now be a series of alternating text nodes and wbr elements. Therefore, any of the aforementioned cursoring logic that assumed a task was always rendered as a single text node (of which there is definitely some) must be updated to account for this relaxed restriction.
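One piece of logic that has to change is the mapping from a character offset in the visible text to a position in the DOM. A minimal sketch of such a mapping follows; locateOffset is a hypothetical helper, and it deliberately ignores the cases where the word-break hint is itself a character inside a text node (the soft hyphen and zero width space variants above).

```javascript
const TEXT_NODE = 3; // Standard DOM nodeType value for text nodes.

/**
 * Maps a character offset in the task's visible text to a
 * {node, offset} pair, skipping non-text children such as wbr elements.
 * `children` is an array-like of nodes (e.g. taskEl.childNodes); only
 * nodeType and nodeValue are read.
 */
function locateOffset(children, characterOffset) {
  let remaining = characterOffset;
  let lastText = null;
  for (let i = 0; i < children.length; i++) {
    const node = children[i];
    if (node.nodeType !== TEXT_NODE) continue; // e.g. a wbr element
    const length = node.nodeValue.length;
    if (remaining <= length) {
      return { node: node, offset: remaining };
    }
    remaining -= length;
    lastText = node;
  }
  // Offset past the end: clamp to the end of the last text node, if any.
  return lastText ?
      { node: lastText, offset: lastText.nodeValue.length } : null;
}
```

The resulting node and offset are what a collapsed range would be set to. An offset that falls exactly on a word break resolves to the end of the text node before the break, which is an arbitrary choice in this sketch.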
Unfortunately, introducing these word breaks interferes with cursoring on some browsers. For example, a task containing the text browserfeature.js in Google Tasks will be rendered as the following HTML due to its word-break logic:
    browserfeature.<wbr></wbr>js
If you right-arrow from the beginning of the task on Firefox 11, after you arrow past the . character, hitting the right arrow again fails to move the cursor, and hitting the right arrow once more takes the cursor to the start of the following task instead of to the right of the j. It appears as though the cursor gets "stuck" in the wbr somehow, interfering with the native cursoring behavior. Ideally, some sort of browser-specific workaround would be introduced to fix this bug.
Displaying URLs as hyperlinks
Initially, Tasks did not allow for any sort of formatting for a task: it only allowed plain text. There were many requests for rich formatting, but we resisted: we wanted Tasks to feel lightweight, and we feared introducing too many options would have detracted from that.
One compromise would have been to allow GChat/wiki-style formatting where surrounding text with underscores leads to italicized text, asterisks lead to bold, etc. To get the benefits of unformatted text editing but formatted text viewing, it would make sense for the task to display itself as wiki text when it was being edited, but as rendered HTML when it did not have keyboard focus. However, this would have introduced a jitter as the user cursored through tasks because the number of displayed characters would change as a task gained or lost keyboard focus.
Nevertheless, the one concession we made on this front was hyperlinks. We frequently found ourselves pasting URLs into Tasks, and it was infuriating being unable to click on them (as is the case in Asana). At first glance, the logic to implement this feature seems trivial:
- When displaying a task, linkify the task text.
- When editing a task, use goog.dom.getTextContent() or an equivalent utility to normalize the content of the task div, escape it, and set it as the innerHTML of the contentEditable div for editing.
A naïve word-break inserter might insert a wbr every 20 characters, while a naïve linkifier might search for substrings that start with http and wrap them in <a> tags.
The problem is that both of those functions take plain text as input and produce HTML as output, so it is not appropriate to compose the output of one as the input of the other. For example, linkifying the text http://cnn.com/ might yield:
<a target="_blank" href="http://cnn.com/">http://cnn.com/</a>
If this string were passed to our naïve word-break inserter, it would become:
<a target="_blank" h<wbr>ref="http://cnn.com/<wbr>">http://cnn.com/</a<wbr>>
The result is far from valid HTML.
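The failure is easy to reproduce. The following is a minimal sketch, assuming two hypothetical helpers, naiveLinkify() and naiveInsertWordBreaks(); neither is the actual Tasks code, and the linkifier deliberately does no escaping. Feeding the HTML output of one into the other yields exactly the mangled markup shown above.

```javascript
// Hypothetical naive linkifier: wraps anything that starts with "http://" or
// "https://" in an <a> tag. It takes plain text and returns HTML.
function naiveLinkify(text) {
  return text.replace(/https?:\/\/\S+/g,
      (url) => `<a target="_blank" href="${url}">${url}</a>`);
}

// Hypothetical naive word-break inserter: places a <wbr> after every
// `interval` characters. It also expects plain text and returns HTML.
function naiveInsertWordBreaks(text, interval = 20) {
  const chunks = [];
  for (let i = 0; i < text.length; i += interval) {
    chunks.push(text.slice(i, i + interval));
  }
  return chunks.join('<wbr>');
}

// Composing the two treats HTML as if it were plain text, so the <wbr>
// elements land inside the tag and attribute syntax.
const linkified = naiveLinkify('http://cnn.com/');
const broken = naiveInsertWordBreaks(linkified);
console.log(broken);
```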
One solution is to linkify the text first, and then apply the word-break inserter to each text node produced by the linkification step. This enables code reuse without sacrificing correctness.
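Here is a rough sketch of that ordering. It is not the Tasks implementation: escapeHtml(), wordBreakText(), and renderTask() are hypothetical names, the URL pattern is a simplification, and the sketch works on string segments rather than real DOM text nodes. The point is only that the word-break inserter ever sees plain text, never markup.

```javascript
// Escapes the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return text
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;');
}

// Inserts a <wbr> after every `interval` characters of a plain-text segment,
// escaping each chunk so the result is safe to use as HTML.
function wordBreakText(text, interval = 20) {
  const chunks = [];
  for (let i = 0; i < text.length; i += interval) {
    chunks.push(escapeHtml(text.slice(i, i + interval)));
  }
  return chunks.join('<wbr>');
}

// Linkifies first by splitting the plain text into URL and non-URL segments,
// then applies the word-break inserter to each segment's text only.
function renderTask(text, interval = 20) {
  const urlPattern = /https?:\/\/\S+/g;
  let html = '';
  let last = 0;
  for (const match of text.matchAll(urlPattern)) {
    html += wordBreakText(text.slice(last, match.index), interval);
    const url = match[0];
    html += `<a target="_blank" href="${escapeHtml(url)}">` +
        wordBreakText(url, interval) + '</a>';
    last = match.index + url.length;
  }
  return html + wordBreakText(text.slice(last), interval);
}

console.log(renderTask('read http://cnn.com/ later'));
```

Because the href attribute is built from the untouched URL and only the visible link text passes through wordBreakText(), the anchor tag itself can never be split.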
If you look at the HTML produced by Google Tasks, you can see that linkified URLs get special treatment with respect to word breaks. For example, if you have the URL http://code.google.com/p/closure-library/source/detail?r=736 as part of a task, Google Tasks will render it as follows (again, whitespace between HTML tags is included for clarity):
<a href="http://code.google.com/p/closure-library/source/detail?r=736"> http:/ <wbr></wbr> /code. <wbr></wbr> google. <wbr></wbr> com/ <wbr></wbr> p/closure-library/ <wbr></wbr> source/ <wbr></wbr> detail?r= <wbr></wbr> 736 </a>
Note how the wbr elements are not inserted at arbitrary intervals in the URL text. Instead, they are placed at more "natural" boundaries so that if the URL text wraps, it is easier to read. This is no accident! I added this logic specifically so that URLs would always be easy to use in Tasks, regardless of the width of the UI.
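To illustrate the idea, here is a simplified heuristic that allows a break after each /, ., =, or & in the URL text. This is an assumption for illustration, not the rule Tasks uses: the real output above is more selective (for example, it does not break after p/), so this sketch produces more break opportunities than Tasks does. The function name breakUrlText() is hypothetical.

```javascript
// Escapes the characters that are significant in HTML text.
function escapeHtml(text) {
  return text
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;');
}

// Splits URL text into pieces that each end with a delimiter (/, ., =, or &),
// plus any trailing piece, and joins the escaped pieces with <wbr> so the
// browser may wrap only at those boundaries.
function breakUrlText(urlText) {
  const pieces = urlText.match(/[^\/.=&]*[\/.=&]|[^\/.=&]+/g) || [];
  return pieces.map(escapeHtml).join('<wbr>');
}

console.log(breakUrlText(
    'http://code.google.com/p/closure-library/source/detail?r=736'));
```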
Testing caret behavior
Implementing the caret-positioning logic was fairly complicated: there were many edge cases it had to tolerate and a lot of browser-specific code under the hood. Once you finally solved an edge case on one of the browsers, you were more than happy to write a test for it because you would want to be alerted to any sort of regression. The problem was that there was no way to write tests.
Not only was there no way to write tests, but no one believed me when I told them that there was no way to write tests, and everyone kept asking me why I wasn't using Selenium. At the time, only Selenium 1.0 was stable and WebDriver was far from ready.
Selenium 1.0 works by emulating user input events in JavaScript. That means if you want to test your autocomplete widget, you instantiate a KeyEvent object in code and invoke a method on the input element to dispatch the event object. This will send the event through the event system, following the same bubbling and capturing paths that a native event would. Take a look at the events code in the Selenium project to see how event objects can be created in JavaScript. This code contains a lot of browser-specific logic, but look for calls to createEvent(), createEventObject(), initMouseEvent(), and initKeyEvent() to see how you can simulate browser events from unprivileged JavaScript.
The problem is that programmatically dispatching a keydown event for the letter a on an input field exercises the key handler, but it does not type the letter a in the field and move the cursor forward one character (the input value is not changed at all). This is a significant problem when you are trying to test a mix of native and custom cursoring, as I was in Google Tasks. This limitation is what made it impossible for me to write tests using existing frameworks.
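The limitation can be sketched in a few lines. This example uses the KeyboardEvent constructor rather than the older createEvent() and initKeyEvent() calls mentioned above. FakeInput is a hypothetical stand-in for an input element, and the fallback to the plain Event constructor exists only so the sketch also runs outside a browser; in a real page you would pass an actual input element as the target.

```javascript
// Builds and dispatches a synthetic keydown event. KeyboardEvent exists in
// browsers; falling back to Event is an assumption made so that this sketch
// also runs in environments without a DOM, such as Node.js.
function dispatchSyntheticKeydown(target, key) {
  const EventCtor = typeof KeyboardEvent === 'function' ? KeyboardEvent : Event;
  const synthetic =
      new EventCtor('keydown', {key, bubbles: true, cancelable: true});
  target.dispatchEvent(synthetic);
  return synthetic;
}

// Hypothetical stand-in for an <input> element: an event target with a value.
class FakeInput extends EventTarget {
  constructor() {
    super();
    this.value = '';
  }
}

const field = new FakeInput();
let handlerCalls = 0;
field.addEventListener('keydown', () => { handlerCalls++; });

const dispatched = dispatchSyntheticKeydown(field, 'a');

console.log(handlerCalls);          // how many times the key handler ran
console.log(dispatched.isTrusted);  // whether the event came from real input
console.log(field.value);           // the field's text after the dispatch
```

The handler runs, but nothing performs the default action of typing: the event is untrusted, and the field's value is whatever it was before the dispatch.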
I sought out a solution that would trigger the same logic as native events, and I suspected that I would have to write some low-level code to do so. As I was not interested in learning C or the Windows API to solve this problem, I spent many hours trying to find an easy-to-use API for injecting native input events across platforms. I have periodically performed a search for such an API, and the only solution that I have ever found is java.awt.Robot.
It turned out that Robot had precisely the API that I needed, though some browsers did not respond to it perfectly. For example, to test out Robot, I created a Java applet that would fire a click event on a div with a JavaScript click handler attached to it. If I remember correctly, in Internet Explorer, the click handler did not fire until I physically moved the mouse after Robot had sent the click event. (Perhaps the click was stuck in some sort of input event processing queue internally on Windows?) Although Robot appeared to provide the right API, it was clear that I was not going to be able to use it on all browsers.
Fortunately, Robot seemed to work just fine on Firefox, which again, was my development browser of choice, anyway. Because I had become intimately familiar with LiveConnect (a technology that allows JavaScript to talk to Java applets) as part of my Master's thesis, Chickenfoot, I was excited at the prospect of writing my functional tests in JavaScript so that I could access both the in-memory data structures in my webapp as well as Robot from JavaScript.
Because Chickenfoot ran JavaScript in a privileged environment, it could also call Firefox APIs as well as execute calls in the shell (such as writing to files). For this reason, I ended up writing all of the functional tests for Tasks in Chickenfoot. (I discuss this system in more detail in my 2009 essay, "Functional Testing for Web Applications.") The only drawback was that getting Chickenfoot to run on Google's continuous integration system would have been quite complicated, so everyone on my team was responsible for running the tests manually before checking in a change.
This created a situation analogous to xkcd 303: "Compiling", in that while the tests were running, you could not do any work because using the mouse or keyboard would interfere with the tests. Your only option was to sit back and watch the show put on by Robot.
At the time, I did not feel like learning about xvfb so that I could run the functional tests in the background (or maybe I just enjoyed the five-minute break I got while running tests), but eventually the Tasks team got a Noogler who wrote a Python script to do exactly that.
Fortunately, WebDriver has come a long way since 2007, and today it would be my preferred way of writing these types of functional tests. The WebDriver team has taken on the dirty work of writing native code for all modern user agents so that user input can be emulated accurately during testing. This solves the primary limitation of the system I cobbled together using Chickenfoot, which was that it only worked on Firefox. Nevertheless, at the time, having functional tests for Firefox was better than not having any functional tests at all!
The big picture
Although I am extremely excited about the community's recent progress in providing (and documenting!) better tools for building web applications, I think that it is important to remember that using the latest and greatest tools is not sufficient to guarantee the best product. In this essay, I have discussed many of the details that went into the interface for Google Tasks. I assure you that there were just as many interesting problems in the data layer that also required creative solutions, but no one would care about any of them if there weren't a powerful and performant interface that surfaced that data to the user.
Today, when I look at Hacker News, articles about technologies like Backbone.js and Meteor are at the top of the list. Even Asana seems to have distracted itself with Luna, its in-house web framework, rather than focus on the UI for its flagship product. I worry that we will end up with a collection of webapps with beautiful architectures (hopefully that even work offline!), yet are borderline unusable because their interfaces are a thoughtless collection of jQuery UI widgets.
But to be fair, I have to remind myself that I am biased. I was fortunate enough to work on two consumer web products at Google with millions of users: Calendar and Tasks. Both presented the challenge of building a high-fidelity interface in a web browser when web browsers weren't designed to support such interfaces. I enjoyed those challenges, and I found the work incredibly satisfying.
Yet few web applications require that level of detail: many enterprise applications are nothing more than a polite sequence of web forms. Admittedly, those same applications solve real problems and make users more productive. There is an incredible amount of value there, which is why we continue to see new tools and frameworks that help build those types of solutions.
In sum, to move the web forward, we need both better frameworks and better interfaces. Better frameworks are well under way, but developers need to be more proactive in talking about their UI challenges and communicating these challenges to browser vendors. To build robust products, we need good APIs for building custom user interfaces as well as testing them. Again, not every application may require a fancy frontend solution, but look out for such opportunities and exploit them! I suspect that pushing for a more creative UI will increase both your job satisfaction and your user base.