Developing a spreadsheet rendering component with Excel-like formulas
For the social network ITnetwork.cz, I developed a spreadsheet component using their custom component system.
This included creating a library for evaluating spreadsheet formulas, support for conditional formatting, and thorough unit testing.
For internal purposes, the social network ITnetwork.cz needed a spreadsheet component
that could evaluate simple Excel-like formulas and apply conditional styling to cells based on their calculated value.
For this purpose, I developed an extensible library for evaluating spreadsheet formulas
and built a spreadsheet component using their custom component system,
which handled extracting spreadsheet data from a simple HTML table and transforming it by
adding row and column headers, evaluating formulas using the aforementioned library
and applying user-specified conditional styling to ranges of cells.
As part of our agreement, I was allowed to release the library responsible for evaluating spreadsheet formulas.
This allowed me to continue working on it as a personal project after the commission was done,
adding several improvements, including support for editing cells.
Because I don't want to go into too much detail about the client's proprietary component system,
the rest of this case study will focus primarily on the formula evaluation library.
The result
One result of this project is a spreadsheet component with support for formulas and conditional formatting, which helps the client
simplify and improve some internal issues.
Another result is a simple library for evaluating spreadsheet formulas. It originated as part of the work for the client,
and later I continued developing it as a personal project (with the client's permission to publish the library's code, of course).
To show off the capabilities of the formula evaluation library without going into details about
the spreadsheet component I developed for the client, I built a demo in the form of a simple spreadsheet editor.
Using the library, it took only about 300 lines of JavaScript to quickly hack
together a spreadsheet editor with editable cells, the ability to undo/redo changes and support for basic keyboard navigation.
Play around with the demo below. For example, you can try to fix the syntax error on D8 (hint: I think someone "forgot" to add a
closing parenthesis). Or watch how changing the price or quantity of a product automatically updates the total.
If you mess up, you can always use Ctrl+Z to undo your changes.
If you're interested, you can also take a look at the reference for the formula syntax
or check out the source code of the editor.
(Note that this is not production-ready code, just a simple example of how the library can be used.)
My approach
Decoupling UI and formula evaluation
From the beginning, it was quite clear that the logic for evaluating spreadsheet formulas should be
contained in a library separate from the spreadsheet component itself, which would be primarily responsible
for how the calculated results are presented to the user.
In addition to more readable code and the ability to reuse the library for different purposes, this separation of concerns
also allows to easily unit test the formula evaluation logic independently from the user interface.
Storing cell data
To store data about spreadsheet cells, I decided not to use a grid-like data structure such as
a 2D array, but instead to use a Map with cell names such as "B3" as keys. Besides making it trivial
to look up and modify the data of a particular cell, this decision was based on the following observations:
Spreadsheets are often sparsely populated and can contain many empty cells.
The number of cells in a spreadsheet is width * height, which can quickly get into the thousands
once the spreadsheet is a few dozen cells wide and high.
When calculating cell values, we don't really care about the physical layout or size of the spreadsheet.
What matters is being able to quickly look up cells by their name.
Using a Map to store cell data makes it possible to simply ignore empty cells and
makes the library more independent of how the calculated results are used. For example,
the library doesn't need to know anything about the dimensions of the spreadsheet.
Conversely, using an array would, in the worst case of a large spreadsheet with only a few non-empty cells,
require dealing with thousands of empty cells.
Parsing and evaluating formulas
Tokenization and parsing
The library parses and evaluates cell formulas lazily/on-demand, calculating a cell's value only when being asked for it.
This gives the consumer of the library more control over when calculations are performed and avoids potentially unnecessary work.
The library does not know (or care) when the value of a cell will be needed, or if it will be needed at all – for example,
a cell may never be displayed to the user.
For parsing spreadsheet formulas, I decided to use a recursive descent parser,
because it is relatively simple and easy to understand.
This makes it easier to change the formula syntax in case it's not enough to just add another spreadsheet function.
To simplify the parser and improve overall performance, there is an additional "tokenization" step that happens before parsing,
where the formula string is broken up into individual chunks ("tokens"), such as numbers, strings, identifiers, operators, and so on.
Evaluating formulas
The evaluation of a cell's syntax tree also happens recursively.
But unlike the parser, which is only concerned with the text of one cell at a time,
the evaluator sometimes needs to know the resulting values of other cells when they are mentioned in a cell's formula.
To avoid evaluating the same cell repeatedly, the computed value of a cell is cached
(it may need to be recomputed when a cell is edited, but more on that later).
The evaluator must also keep track of all references between different cells to detect circular references
(such as A1 → B1 → C1 → A1) and provide a helpful error message if such a cycle is found.
Customizable spreadsheet functions
An important feature of the library is the ability to easily define the functions available
in spreadsheet formulas (such as SUM(), AVERAGE(), etc.) in the form of plain old JavaScript
functions. The library also supports functions, whose arguments need to be evaluated lazily,
such as IF(condition, ifTrue, ifFalse), where only one of the arguments is evaluated depending on the condition.
Adding cell editing capabilities
Since the spreadsheet component I built for ITnetwork.cz didn't need to have
editing capabilities (the component just rendered a static spreadsheet with the calculated results),
the initial version of the formula evaluation library did not support changing the contents of cells after the
spreadsheet was initialized. However, I later added support for editing cells while continuing to work
on the library as a personal project.
Adding support for editing the value of a cell required invalidating cached values of other cells, whose formulas
referenced the edited cell. I also added a callback to the library, that gets
a list of all cells whose values have changed after an edit.
Although editing a cell invalidates the cached values of other cells that depend on it, the
parsed syntax tree of the dependent cells remains the same, because it depends only on the text of a cell.
Because of this, I added a separate cache for parsed formulas (after verifying that it actually makes a difference
by profiling the code with and without caching on a large test spreadsheet containing formulas with many dependencies between cells).
Unit testing
The logic for evaluating formulas isn't always trivial and leaves a lot of room for edge cases, especially
when invalidating cached values comes into play. For this reason, an important part of the project was unit testing.
To write and execute unit tests, I used the currently most popular JavaScript testing framework, Jest.
The library is only tested through its public interface. I deliberately avoided testing implementation details,
even if it meant that writing tests was sometimes a bit more challenging (for example when making sure that edge cases around
caching were well covered without having access to the contents of the internal cache).
Testing only through the public interface had the advantage of allowing me to refactor parts of the code without
changing the unit tests – avoiding the introduction of bugs by updating tests simultaneously with the code
and always making it clear that a failed test meant a problem with a change I had made.
What I learned
While working on this project, I came to really appreciate how useful unit testing can be, beyond just checking that the code I just wrote works as expected.
For one thing, I saw how writing tests forces me to precisely define the desired behavior, rather than leaving it to chance how the
code will behave in some edge cases.
Second, I came to value having good unit tests while refactoring the inner workings of the library,
because I could immediately see if a change in the code caused a bug or a change in observable behavior. And when I inevitably
found some bugs that still slipped through, adding some tests helped prevent regressions and made sure the same
problem didn't happen again in the future.
This project also taught me a thing or two about parsing things in JavaScript with decent performance and
using the built-in profiler of web browsers.
Besides optimizing the code of the parser and interpreter, another important part was
caching the results and intermediate values of calculations and removing them when a cached value is no longer valid.
After lately working a lot with React and Next.js, it was nice to be reminded of how much fun it can be to build things with vanilla JavaScript, without any other frameworks or libraries.
Developing a simple spreadsheet editor, including support for undo/redo and basic keyboard navigation was a matter of two afternoons and a few hundred lines of code!
I am grateful I got to work on such an interesting project and
hope to continue improving the formula evaluation library in the future.
I already have some thoughts about improving the way formulas are evaluated,
and there is always room to add more built-in functions, operators, or supported data types.