I was asked recently (for the 97,000th time), “I loved the treemap layout algorithm in SpaceMonger. I’d like to implement something similar. How did that algorithm work?”
I’ve been meaning to write a few articles on this for a while now, so here we go with the first one: How did SpaceMonger 1.x’s treemapping work?
First, Dr. Ben Shneiderman the basic concept of the treemap in 1990, and he has a great treemaps page with all sorts of links and resources.
That said, I didn’t use any of the information on Dr. Shneiderman’s page when I created SM 1.0 in 1997. I don’t believe his page even existed then; I heard about the concept of laying out a screen using nested rectangles in a book or magazine or something, and I derived myself a little algorithm that solved the problem inside a Windows program written in C++ with MFC. I’d recently installed Windows 95 on my computer, and was appalled at how much disk space was just gone compared to the Windows 3.1 days. In the space of an afternoon, I made a useful little thing that could explain what the heck happened to all my storage, and I called it SpaceMonger (mostly for complete lack of a better name at the time, hah! — I so totally should never be allowed to name things).
During the SM 2.x days, I figured that SM 1.x’s algorithm was passé, so I published it on the website of Sixty-Five, the company I’d started to sell SM 2.x. The company and website are both gone (but StarDock now sells SM 3.x for Windows 10!). But archive.org has a very nice copy of the algorithm page, preserved almost pixel-perfect to how it looked a decade ago.
While the code published there doesn’t do anything to explain how SM 2.x’s magical on-the-fly treemap rendering works, or how it could do that and even zoom live-in-and-out on the display even on woefully underpowered hardware like your average Windows 98 box circa 2006, that page still does a good job of explaining the basics of SM 1.x’s layout algorithm.
In an attempt to ensure that history doesn’t lose that explanation (I myself had to do a bit of searching to remember where I published it), I’m going to copy all of the meaningful content below in this blog posting.
Below is a verbatim copy of my entire treemap article, circa 2008. I’d love to put a big quote rectangle around it, but WordPress apparently doesn’t make that easy now.
How SpaceMonger Implements Treemaps*
So how does SpaceMonger’s treemapping algorithm work? While we can’t give you the whole details in a short space, we can give you an overview of it. The text below is somewhat technical in nature, but any second-year computer-science student should be able to follow it without too much difficulty.
The scanner and in-memory database
SpaceMonger maintains the disk’s data as a tree of ScanFile and ScanFolder objects in RAM. A ScanFolder can be thought of as an array that stores multiple ScanFiles. Each ScanFile object contains information about a file, such as its name and size and type, and a pointer to a possible deeper ScanFolder.
ScanFolders, in addition to storing an array of ScanFiles, also store a sum of the sizes of all the data under them. During the scan, or when data changes, SpaceMonger updates higher ScanFolder objects when their descendants get added, changed, or removed to ensure that every ScanFolder in the tree is always valid for the known data.
SpaceMonger maintains a “watch” on the disk (using the ReadDirectoryChangesW API), updates the tree when the disk changes, and notifies anything displaying parts of the disk (the treemap, the pie chart, etc.) that they need to update their displays accordingly.
The treemapper
SpaceMonger’s treemapper is broken into three parts: The layout engine, the renderer, and the widget. The layout engine is responsible for determining where files should be positioned in the treemap. The renderer is responsible for drawing files and folders in the treemap. The widget is repsonsible for accepting and processing user input, such as mouse clicks and key presses.
(This design is vaguely model-view-controller-esque, in that the model is the tree itself, the view is jointly managed by all three parts of the treemapper, and the widget is entirely responsible for the controller aspects of the design.)
These three components represent roughly 4,200 lines of code, so again, we’ll only give a broad overview to its design, mainly focusing on the layout engine (the renderer and the widget have a lot of details, but are conceptually fairly straightforward).
The layout engine is responsible for apportioning the space in the display to files and folders. SpaceMonger does this using a pair of recursive algorithms. Our tests determined that a single good general-purpose algorithm beats special-purpose algorithms, so SpaceMonger uses the same algorithms to divide up space no matter how that space is structured.
The layout algorithm is embodied in a function named LayoutFolder, which is given a folder from the tree and the coordinates of the folder** in the display. If the folder is the root of the tree, then the coordinates are those of the entire display.
LayoutFolder starts by sorting the files within the folder by size (it uses a counting sort and radix sort to do this especially quickly, but any sorting algorithm would work). It then takes the sorted files and passes them and the folder’s coordinates to a function named DivideDisplayArea. DivideDisplayArea does the majority of the hard layout work. If there’s only one file or less, the file obviously consumes all of the folder’s space. If there are two or more, though, its task is a bit more complicated.
DivideDisplayArea takes all of the files it’s given, and, using a greedy algorithm, divides them into two lists, A and B, whose total sizes are nearly equal***. Then the available area is divided in half, with each sub-area proportional to the total size of each list that must fit in it. This split is done horizontally if the available area has a high aspect ratio, and vertically if it has a low aspect ratio. The remaining partial lists and partial areas are then fed back into DivideDisplayArea, which will eventually recurse until each area only has one file to place inside it.
When every file has been positioned, DivideDisplayArea returns back to LayoutFolder, which then goes through each of its file again and calls LayoutFolder on it if the file is actually a folder. This, then, causes every file to be fully positioned in the display.
Once a file is positioned for rendering, rendering it to the display is a relatively simple phase that can be performed after layout. Rendering is left as an exercise for the reader.
Pseudocode for a SpaceMonger-like treemapping engine
The algorithms described above can be approximated by the following (rather lengthy) pseudocode. This pseudocode is written in a vaguely C-ish style, but should be fairly comprehensible to any programmer familiar with C, C++, Perl, Java, Javascript, or PHP (and is likely to be at least somewhat readable to fans of Pascal and Ada too).
// Compute the complete layout for a folder and all of its // descendants. function LayoutFolder(folder_data, folder_rectangle) { // Sort the folder's children by their sizes, largest first. // We leave the result in a separate array along with the // sizes of the files. If you wanted to lay out the data // by a different sizing technique (by logical size, physical // size, count, etc.), this would be the place to assign // "true" sizes to each entry. sorted_array = new array(folder_data.num_children); sort(folder_data.children, sorted_array); // Divide up the display area for the children. DivideDisplayArea(sorted_array, folder_rectangle, 0, folder_data.num_children, folder_data.total_size_of_descendants); // Go within each child, and if it's a folder, lay it out too. foreach (child in folder_data.children) { if (child is a folder) { area_for_children = child.rectangle; // If you want to "steal" pixels from the children // for the folder (for example, to display the // folder's title), simply shrink area_for_children // here before calling LayoutChildren(). LayoutChildren(child, area_for_children); } } } // DivideDisplayArea fully positions a group of children in // the given rectangle. It is very recursive. function DivideDisplayArea(children, area_rectangle, first_child, num_children_to_layout, totalsize) { if (num_children_to_layout <= 0) { // Easy case: No files to position. return; } if (num_children_to_layout == 1) { // Equally easy case: Only one file to position. children[first_child].rectangle = area_rectangle; } // General case: Position two or more files. Start by // creating two imaginary empty lists. We will eventually // move all the files from the original array to these lists // such that the two lists are of approximately equal size. // There are a variety of ways to do this; below, we'll // show the method that produces the 'squarified' layout // used in default installs of SpaceMonger. children_in_list_a = 0; size_a = 0; children_in_list_b = 0; size_b = 0; // Add the first file to the first list. Because the incoming // list is sorted by size, we know this will always be the // largest child. size_a = children[first_child].size; children[first_child].list = LIST_A; children_in_list_a++; // Now the fun part. We want to try to make the two lists // approximately equal in size. We do this by linearly // searching for the midpoint of the sorted list such that // a few large files on one side are approximately equal // in size to many small files on the other side. // Walk through the files and find the "center of gravity" // of the data. Each file before the "center of gravity" // will be put into list A. for (child = first_child + 1; child < first_child + num_children_to_layout && (size_a + children[child].size) * 2 < totalsize && children[child].size > 0; child++) { size_a += children[child].size; children[child].list = LIST_A; children_in_list_a++; } // Keep walking through the list and mark the rest of the // files as being for list B. for (; child < first_child + num_children_to_layout; child++) { size_b += children[child].size; children[child].list = LIST_B; children_in_list_b++; } // Now copy the files from the original array into // two temporary arrays. These will both still be // sorted by size. temp_array_a = new array(children_in_list_a); temp_array_b = new array(children_in_list_b); dest_a = 0; dest_b = 0; for (src = first_child; src < first_child + num_children_to_layout; src++) { if (children[src].list == LIST_A) temp_array_a[dest_a++] = children[src]; else temp_array_b[dest_b++] = children[src]; } // The temporary arrays really aren't necessary, // so now that we've "unsorted" our data into their // separate lists, we can move it back to the // original array. Then we can free the temporary // arrays (note that if you're clever, you can share // the temporary arrays across multiple invocations // of DivideDisplayArea, thus avoiding allocating // or freeing them at all.) dest = first_child; for (src = 0; src < dest_a; src++) children[dest++] = temp_array_a[src++]; delete temp_array_a; for (src = 0; src < dest_b; src++) children[dest++] = temp_array_b[src++]; delete temp_array_b; // We can now divide up the available area into two // parts according to the lists' sizes. x = area_rectangle.x; y = area_rectangle.y; width = area_rectangle.width; height = area_rectangle.height; if (size_a + size_b <= 0) { // Degenerate case: All size-zero entries. midpoint = 0; orientation = HORIZONTAL; } else { // If the available area is wider than it is tall, // split it vertically. Otherwise, split it // horizontally. The position of the split is // proportional to the sizes of the two lists. if (width >= height) { midpoint = (size_a * width) / totalsize; orientation = HORIZONTAL; } else { midpoint = (size_a * height) / totalsize; orientation = VERTICAL; } } // Once we've split, we recurse to divide up the two // new areas further, and we keep recursing until // we're only trying to fit one entry into any // given area. This way, even size-zero entries will // eventually be assigned a location somewhere in the // display. The rectangles below are created in // (x, y, width, height) format. if (orientation == HORIZONTAL) { DivideDisplayArea(children, Rectangle(x, y, midpoint, height), first_child, children_in_list_a, size_a); DivideDisplayArea(children, Rectangle(x + midpoint, y, width - midpoint, height), first_child + children_in_list_a, children_in_list_b, size_b); } else { DivideDisplayArea(children, Rectangle(x, y, width, midpoint), first_child, children_in_list_a, size_a); DivideDisplayArea(children, Rectangle(x, y + midpoint, width, height - midpoint), first_child + children_in_list_a, children_in_list_b, size_b); } }
Conclusion
SpaceMonger uses many, many optimizations on these techniques to ensure that the layout is performed very, very fast. Some of those optimizations are Sixty-Five trade secrets, so we can’t release them. Others include avoiding copies in the loops above by using pointers; caching size lookups in the sort arrays so that sizes can be sampled from a variety of sources (logical, physical, count, etc.); and splitting off the free space and system space specially so that they are always separate from the rest of the display. But even without all these optimizations and tricks, the basic algorithm above will produce a treemap layout that’s very similar to that of SpaceMonger’s, even if it’s not as fast or as flexible as SpaceMonger’s.
Footnotes
* Why are we releasing information about how our treemap algorithm works? Well, we believe in the free exchange of ideas, for one thing; and for another, we have benefited from public programming documents ourselves. It would be hypocritical then to build on knowledge we learned from the public and not return at least some of our work to the public. You’re welcome to use any of the information you see above any way you see fit, if it benefits you.
** LayoutFolder, in SpaceMonger, is not actually given the coordinates of the folder because it doesn’t need them; instead, it’s merely given the aspect ratio of the folder. But most typical implementations would probably want to include the coordinates of the folder instead.
*** It’s impossible for DivideDisplayArea to do this splitting perfectly. First, it’s very unlikely that the lists can be made to have equal sizes. And second, it’s been shown that finding the “most equally-sized sublists” is a problem that’s NP-complete. Thus SpaceMonger uses a greedy algorithm that, while not always perfect, usually achieves very, very good results.