About the project

For internal purposes, the social network ITnetwork.cz needed a spreadsheet component that could evaluate simple Excel-like formulas and apply con­di­tion­al styling to cells based on their calculated value.

For this purpose, I developed an extensible library for evaluating spreadsheet formulas and built a spreadsheet component using their custom component system, which handled ex­tract­ing spreadsheet data from a simple HTML table and transforming it by adding row and column headers, evaluating formulas using the afore­men­tioned library and applying user-specified conditional styling to ranges of cells.

As part of our agreement, I was allowed to release the library responsible for eval­u­at­ing spread­sheet formulas. This allowed me to continue working on it as a person­al project after the commission was done, adding several im­prove­ments, including sup­port for editing cells. Because I don't want to go into too much detail about the client's pro­pri­etary com­po­nent system, the rest of this case study will focus primarily on the formula evaluation library.

The result

One result of this project is a spreadsheet component with support for formulas and con­di­tion­al formatting, which helps the client simplify and improve some internal issues. Another result is a simple library for evaluating spreadsheet formulas. It orig­i­nat­ed as part of the work for the client, and later I continued developing it as a personal project (with the client's per­mis­sion to publish the library's code, of course).

To show off the capabilities of the formula evaluation library without going into details about the spreadsheet component I developed for the client, I built a demo in the form of a simple spreadsheet editor. Using the library, it took only about 300 lines of JavaScript to quickly hack together a spreadsheet editor with editable cells, the ability to undo/redo changes and support for basic keyboard navigation.

Play around with the demo below. For example, you can try to fix the syntax error on D8 (hint: I think someone "forgot" to add a closing parenthesis). Or watch how changing the price or quantity of a product au­to­mat­i­cal­ly updates the total. If you mess up, you can always use Ctrl+Z to undo your changes.

Spreadsheet editor demo
Fullscreen

If you're interested, you can also take a look at the reference for the formula syntax or check out the source code of the editor. (Note that this is not production-ready code, just a simple example of how the library can be used.)

My approach

Decoupling UI and formula evaluation

From the beginning, it was quite clear that the logic for evaluating spreadsheet formulas should be contained in a library separate from the spreadsheet component itself, which would be primarily responsible for how the calculated results are presented to the user. In addition to more readable code and the ability to reuse the library for different purposes, this sep­a­ra­tion of concerns also allows to easily unit test the formula evaluation logic in­de­pen­dent­ly from the user interface.

Library interface overview.

Storing cell data

To store data about spreadsheet cells, I decided not to use a grid-like data structure such as a 2D array, but instead to use a Map with cell names such as "B3" as keys. Besides making it trivial to look up and modify the data of a particular cell, this decision was based on the fol­low­ing observations:

  • Spreadsheets are often sparsely populated and can contain many empty cells.
  • The number of cells in a spreadsheet is width * height, which can quickly get into the thousands once the spreadsheet is a few dozen cells wide and high.
  • When calculating cell values, we don't really care about the physical layout or size of the spreadsheet. What matters is being able to quickly look up cells by their name.

Using a Map to store cell data makes it possible to simply ignore empty cells and makes the library more independent of how the calculated results are used. For example, the library doesn't need to know anything about the dimensions of the spreadsheet. Conversely, using an array would, in the worst case of a large spreadsheet with only a few non-empty cells, require dealing with thousands of empty cells.

Parsing and evaluating formulas

Tokenization and parsing

The library parses and evaluates cell formulas lazily/on-demand, calculating a cell's value only when being asked for it. This gives the consumer of the library more control over when calculations are performed and avoids potentially unnecessary work. The library does not know (or care) when the value of a cell will be needed, or if it will be needed at all – for ex­am­ple, a cell may never be displayed to the user.

For parsing spreadsheet formulas, I decided to use a recursive descent parser, because it is relatively simple and easy to understand. This makes it easier to change the formula syntax in case it's not enough to just add another spreadsheet function. To simplify the parser and improve overall performance, there is an additional "to­keniza­tion" step that happens before parsing, where the formula string is broken up into individual chunks ("tokens"), such as numbers, strings, identifiers, operators, and so on.

Steps of evaluating a formula.

Evaluating formulas

The evaluation of a cell's syntax tree also happens recursively. But unlike the parser, which is only concerned with the text of one cell at a time, the evaluator sometimes needs to know the resulting values of other cells when they are mentioned in a cell's formula.

To avoid evaluating the same cell repeatedly, the computed value of a cell is cached (it may need to be recomputed when a cell is edited, but more on that later). The evaluator must also keep track of all ref­er­ences between different cells to detect circular references (such as A1 → B1 → C1 → A1) and provide a helpful error message if such a cycle is found.

Customizable spreadsheet functions

An important feature of the library is the ability to easily define the functions available in spreadsheet formulas (such as SUM(), AVERAGE(), etc.) in the form of plain old JavaScript functions. The library also supports functions, whose arguments need to be evaluated lazily, such as IF(condition, ifTrue, ifFalse), where only one of the arguments is evaluated depending on the condition.

Adding cell editing capabilities

Since the spreadsheet component I built for ITnetwork.cz didn't need to have editing ca­pa­bil­i­ties (the component just rendered a static spreadsheet with the calculated results), the ini­tial version of the formula evaluation library did not support changing the contents of cells after the spreadsheet was initialized. However, I later added support for editing cells while con­tin­u­ing to work on the library as a personal project.

Adding support for editing the value of a cell required invalidating cached values of other cells, whose formulas referenced the edited cell. I also added a callback to the library, that gets a list of all cells whose values have changed after an edit.

Although editing a cell invalidates the cached values of other cells that depend on it, the parsed syntax tree of the dependent cells remains the same, because it depends only on the text of a cell. Because of this, I added a separate cache for parsed formulas (after verifying that it actually makes a difference by profiling the code with and without caching on a large test spreadsheet containing formulas with many dependencies between cells).

Unit testing

The logic for evaluating formulas isn't always trivial and leaves a lot of room for edge cases, especially when invalidating cached values comes into play. For this reason, an important part of the project was unit testing. To write and execute unit tests, I used the currently most popular JavaScript testing framework, Jest.

The library is only tested through its public interface. I deliberately avoided testing im­ple­men­ta­tion details, even if it meant that writing tests was sometimes a bit more challenging (for ex­am­ple when making sure that edge cases around caching were well covered with­out having access to the contents of the in­ter­nal cache). Testing only through the public interface had the advantage of allowing me to re­factor parts of the code without changing the unit tests – avoiding the introduction of bugs by updating tests si­mul­ta­ne­ous­ly with the code and always making it clear that a failed test meant a problem with a change I had made.

What I learned

While working on this project, I came to really appreciate how useful unit testing can be, beyond just checking that the code I just wrote works as expected. For one thing, I saw how writing tests forces me to precisely define the desired behavior, rather than leaving it to chance how the code will behave in some edge cases. Second, I came to value having good unit tests while refactoring the inner workings of the library, because I could immediately see if a change in the code caused a bug or a change in observable behavior. And when I in­evitably found some bugs that still slipped through, adding some tests helped prevent regressions and made sure the same problem didn't happen again in the future.

This project also taught me a thing or two about parsing things in JavaScript with decent perfor­mance and using the built-in profiler of web browsers. Besides optimizing the code of the parser and interpreter, another important part was caching the results and in­ter­me­di­ate values of calculations and removing them when a cached value is no longer valid.

After lately working a lot with React and Next.js, it was nice to be reminded of how much fun it can be to build things with vanilla JavaScript, without any other frameworks or libraries. De­vel­op­ing a simple spreadsheet editor, including support for undo/redo and basic keyboard navigation was a matter of two afternoons and a few hundred lines of code!

I am grateful I got to work on such an interesting project and hope to continue improving the formula evaluation library in the future. I already have some thoughts about improving the way formulas are evaluated, and there is always room to add more built-in functions, operators, or supported data types.