Developing a spreadsheet rendering component with Excel-like formulas

Skip to select language
Skip to main content

About the project

For internal purposes, the social network ITnetwork.cz needed a spreadsheet component that could evaluate simple Excel-like formulas and apply conditional styling to cells based on their calculated value.

For this purpose, I developed an extensible library for evaluating spreadsheet formulas and built a spreadsheet component using their custom component system, which handled extracting spreadsheet data from a simple HTML table and transforming it by adding row and column headers, evaluating formulas using the aforementioned library and applying user-specified conditional styling to ranges of cells.

As part of our agreement, I was allowed to release the library responsible for evaluating spreadsheet formulas. This allowed me to continue working on it as a personal project after the commission was done, adding several improvements, including support for editing cells. Because I don't want to go into too much detail about the client's proprietary component system, the rest of this case study will focus primarily on the formula evaluation library.

The result

One result of this project is a spreadsheet component with support for formulas and conditional formatting, which helps the client simplify and improve some internal issues. Another result is a simple library for evaluating spreadsheet formulas. It originated as part of the work for the client, and later I continued developing it as a personal project (with the client's permission to publish the library's code, of course).

To show off the capabilities of the formula evaluation library without going into details about the spreadsheet component I developed for the client, I built a demo in the form of a simple spreadsheet editor. Using the library, it took only about 300 lines of JavaScript to quickly hack together a spreadsheet editor with editable cells, the ability to undo/redo changes and support for basic keyboard navigation.

Play around with the demo below. For example, you can try to fix the syntax error on D8 (hint: I think someone "forgot" to add a closing parenthesis). Or watch how changing the price or quantity of a product automatically updates the total. If you mess up, you can always use Ctrl+Z to undo your changes.

Spreadsheet editor demo

Fullscreen

If you're interested, you can also take a look at the reference for the formula syntax or check out the source code of the editor. (Note that this is not production-ready code, just a simple example of how the library can be used.)

My approach

Decoupling UI and formula evaluation

From the beginning, it was quite clear that the logic for evaluating spreadsheet formulas should be contained in a library separate from the spreadsheet component itself, which would be primarily responsible for how the calculated results are presented to the user. In addition to more readable code and the ability to reuse the library for different purposes, this separation of concerns also allows to easily unit test the formula evaluation logic independently from the user interface.

Storing cell data

To store data about spreadsheet cells, I decided not to use a grid-like data structure such as a 2D array, but instead to use a Map with cell names such as "B3" as keys. Besides making it trivial to look up and modify the data of a particular cell, this decision was based on the following observations:

Spreadsheets are often sparsely populated and can contain many empty cells.
The number of cells in a spreadsheet is width * height, which can quickly get into the thousands once the spreadsheet is a few dozen cells wide and high.
When calculating cell values, we don't really care about the physical layout or size of the spreadsheet. What matters is being able to quickly look up cells by their name.

Using a Map to store cell data makes it possible to simply ignore empty cells and makes the library more independent of how the calculated results are used. For example, the library doesn't need to know anything about the dimensions of the spreadsheet. Conversely, using an array would, in the worst case of a large spreadsheet with only a few non-empty cells, require dealing with thousands of empty cells.

Parsing and evaluating formulas

Tokenization and parsing

The library parses and evaluates cell formulas lazily/on-demand, calculating a cell's value only when being asked for it. This gives the consumer of the library more control over when calculations are performed and avoids potentially unnecessary work. The library does not know (or care) when the value of a cell will be needed, or if it will be needed at all – for example, a cell may never be displayed to the user.

For parsing spreadsheet formulas, I decided to use a recursive descent parser, because it is relatively simple and easy to understand. This makes it easier to change the formula syntax in case it's not enough to just add another spreadsheet function. To simplify the parser and improve overall performance, there is an additional "tokenization" step that happens before parsing, where the formula string is broken up into individual chunks ("tokens"), such as numbers, strings, identifiers, operators, and so on.

Evaluating formulas

The evaluation of a cell's syntax tree also happens recursively. But unlike the parser, which is only concerned with the text of one cell at a time, the evaluator sometimes needs to know the resulting values of other cells when they are mentioned in a cell's formula.

To avoid evaluating the same cell repeatedly, the computed value of a cell is cached (it may need to be recomputed when a cell is edited, but more on that later). The evaluator must also keep track of all references between different cells to detect circular references (such as A1 → B1 → C1 → A1) and provide a helpful error message if such a cycle is found.

Customizable spreadsheet functions

An important feature of the library is the ability to easily define the functions available in spreadsheet formulas (such as SUM(), AVERAGE(), etc.) in the form of plain old JavaScript functions. The library also supports functions, whose arguments need to be evaluated lazily, such as IF(condition, ifTrue, ifFalse), where only one of the arguments is evaluated depending on the condition.

Adding cell editing capabilities

Since the spreadsheet component I built for ITnetwork.cz didn't need to have editing capabilities (the component just rendered a static spreadsheet with the calculated results), the initial version of the formula evaluation library did not support changing the contents of cells after the spreadsheet was initialized. However, I later added support for editing cells while continuing to work on the library as a personal project.

Adding support for editing the value of a cell required invalidating cached values of other cells, whose formulas referenced the edited cell. I also added a callback to the library, that gets a list of all cells whose values have changed after an edit.

Although editing a cell invalidates the cached values of other cells that depend on it, the parsed syntax tree of the dependent cells remains the same, because it depends only on the text of a cell. Because of this, I added a separate cache for parsed formulas (after verifying that it actually makes a difference by profiling the code with and without caching on a large test spreadsheet containing formulas with many dependencies between cells).

Unit testing

The logic for evaluating formulas isn't always trivial and leaves a lot of room for edge cases, especially when invalidating cached values comes into play. For this reason, an important part of the project was unit testing. To write and execute unit tests, I used the currently most popular JavaScript testing framework, Jest.

The library is only tested through its public interface. I deliberately avoided testing implementation details, even if it meant that writing tests was sometimes a bit more challenging (for example when making sure that edge cases around caching were well covered without having access to the contents of the internal cache). Testing only through the public interface had the advantage of allowing me to refactor parts of the code without changing the unit tests – avoiding the introduction of bugs by updating tests simultaneously with the code and always making it clear that a failed test meant a problem with a change I had made.

What I learned

While working on this project, I came to really appreciate how useful unit testing can be, beyond just checking that the code I just wrote works as expected. For one thing, I saw how writing tests forces me to precisely define the desired behavior, rather than leaving it to chance how the code will behave in some edge cases. Second, I came to value having good unit tests while refactoring the inner workings of the library, because I could immediately see if a change in the code caused a bug or a change in observable behavior. And when I inevitably found some bugs that still slipped through, adding some tests helped prevent regressions and made sure the same problem didn't happen again in the future.

This project also taught me a thing or two about parsing things in JavaScript with decent performance and using the built-in profiler of web browsers. Besides optimizing the code of the parser and interpreter, another important part was caching the results and intermediate values of calculations and removing them when a cached value is no longer valid.

After lately working a lot with React and Next.js, it was nice to be reminded of how much fun it can be to build things with vanilla JavaScript, without any other frameworks or libraries. Developing a simple spreadsheet editor, including support for undo/redo and basic keyboard navigation was a matter of two afternoons and a few hundred lines of code!

I am grateful I got to work on such an interesting project and hope to continue improving the formula evaluation library in the future. I already have some thoughts about improving the way formulas are evaluated, and there is always room to add more built-in functions, operators, or supported data types.