I work with ideas about computers. I think about the things computers can do, and I try to find ways to make computers do those things better or faster, and I write all those ideas down. And I try to find ways to stop computers from ever being slow, so that we don’t have to make them faster. I also think a lot about if there are things that computers can or can’t do, and if it’s important that computers can or can’t do them. It’s bad for people when computers are slow or when computers can’t do things because we want computers to help us with things we want to do. But making computers do things that they can’t do is hard, and making computers go faster can be hard too. So sometimes I use ideas from other people to make the things computers do faster or better, and sometimes I find my own ideas too, and then I write those ideas down and tell everyone about them so all of us can make computers do more things better.
Lately, I’ve been growing increasingly obsessed with this problem. While my solution is very fast (O(n) is pretty fast!), I’ve been concerned about a few possible issues with it:
First, I wasn’t certain it was anything better than locally-optimal. It’s guaranteed not to produce a bad result, but will it produce a good result? I couldn’t be sure.
Second, it relies on floating-point arithmetic, while most other solutions don’t.
It has the nice upside of being able to operate in constant space (not including the O(k) output space), and linear time, but those two caveats are potentially problematic. If it turned out to be a really bad solution, who’d use it? And the floating-point numbers felt too fuzzy to be safe. So I started poking at it again.
Being a computer scientist is a funny thing. You live on the edge of knowledge in a weird realm that’s not quite mathematics and not quite physical machinery. Like most of the sciences, of course, anything you “discover” was likely already discovered several decades before. But every great once in a while, you bump into a problem that seems to have received very little attention, and you see the existing solutions, and you think, “Huh, it seems like there must be a better way.”
I did that, once, a long time ago: I discovered a really neat technique for computing treemaps on-the-fly for arbitrarily large datasets, and that’s why SpaceMonger 2+ have such incredibly fast renderings. I don’t know for sure if it’s a novel solution, but I haven’t seen anyone else do it. One of these years, I need to properly publish a paper on how that works: I haven’t been intentionally keeping that algorithm private; I just haven’t gotten around to writing up a really good description of how it works.
But today, I’m going to talk about another algorithm entirely: Linear partitioning a sequence of numbers.
Twenty years ago today, I got a Knuth Check. Knuth dated it October 14, 2001, and I received it in the mail in early November that year, I think.
I’d heard, of course, about Knuth Checks, but I didn’t really know as much about them then as I do today. I was a kid a few years out of college, just trying to eke out a living building software; and when I found a mistake in an algorithm in his book, I intended to write a note in a short e-mail to Dr. Knuth, but I found out he was (wisely) avoiding the medium, so I sent him a letter. And a few months later, I got back a check.
It’s sat in a frame on my desk for twenty years, singularly my proudest professional accomplishment. I’ve shipped tons of software, I’ve made and discovered amazing things, I’ve won other awards — but nothing holds a candle to having a Knuth Check with your name on it.
(More accurately, the original is in my safe-deposit box at the bank, and a copy has sat in a frame for twenty years, because I’m not leaving something this valuable out on my desk! 😃)
So here’s the story:
The Art of Computer Programming, Volume 1, has an algorithm on page 26 that describes an efficient solution for calculating a fractional logarithm in binary, using pure integer math. The algorithm in question derives from one first published in 1624 by Henry Briggs. At the bottom of page 26 in the 3rd Edition, the book mentions that Richard Feynman (yes, that Feynman) determined an efficient inverse algorithm for calculating a fractional exponent in binary, and it proposes to the reader to solve it as problem #28.
As is typical of all of Knuth’s “problems for the reader,” he includes the answer in the rather large appendix that comprises the last 150 pages of the book. The solution to problem #28 can be found at the bottom of page 470, and, unfortunately, it is not the same algorithm that Feynman discovered: It has a critical error in the first step:
E1. [Initialize.] If 1 – ϵ is the largest possible value of x, set y ← (nearest approximation to b1-ϵ), x ← 1 – ϵ, k ← 1. (The quantity yb-x will remain approximately constant in the following steps.)
TAoCP, V1, 3rd ed., pp. 470
That has a rather unfortunate error in it: x ← 1 – ϵ is potentially a meaningful expression in that context, but it’s wrong, and it will calculate not the correct exponent but garbage data, with, as I recall, mostly zero bits and a few semi-randomly-distributed one bits.
The correct expression is x ← 1 – x. It’s possible the other form was a typographical error on Knuth’s part, but it’s not in the same category as a simple omission or spelling error, since if you follow the mathematics as written, you will simply get garbage, and not get a valid fractional exponent.
(I struggled for a few days with the original: It was Knuth; surely I’d been the one to make the mistake in my code, not the good Doctor in his! But ultimately when I began to experiment with the broken algorithm, the answer became obvious.)
So I wrote Knuth a letter in the spring of 2001 saying that I thought that was wrong as published, since it generated garbage output, and I included that I thought that x ← 1 – x was actually the correct form, since it appeared to be correct. And he sent me back a rather nice letter and a check for $2.56 for having found an error.
(I had also suggested that the inherent parallel execution implied by the commas in step E2 appeared to be wrong, that sequential execution appeared to be needed there; Knuth chastised me in the letter that no, parallel execution was in fact intended. But I still got an error check for a broken algorithm!)
Interestingly, I didn’t get my check by just searching for errors, like many people have: I was trying to build something; I encountered that error merely by remembering that the books contained an answer to the problem I was facing. But it seems that not many people build things by literally reading the textbook and doing exactly what it says; and apparently, nobody had implemented Feynman’s exponentiation algorithm from the book after it appeared in the 3rd Edition in 1997. [ed. note: I checked, and Knuth added it to the 3rd Ed. in April 1995; my error can be seen in his addenda.]
That said, an entertaining question to posit: Why the heck was I trying to implement fractional logarithms and exponents in the first place?
The answer was that it was 2001, and Windows 98 and ’486 PCs were still pretty common, and you couldn’t assume a computer supported floating-point math. And I needed to calculate powers: As simple as xy, just like on your calculator, and it was even constrained: In my case, x was always an integer, and y was always a number between 1 and 2. But I didn’t have guaranteed floating-point hardware available so I could just write pow(x, y). So I improvised: I used the Briggs and Feynman methods to calculate the logarithm and the exponent in pure integer math, so that I could implement calculation of a power as exp(y * log x) instead, and effectively use multiplication — which the hardware supported — to produce a power (which it definitely didn’t).
Today, of course, I’d write pow(x, y), just like you would. But when you’re stuck on super-limited hardware, you have to be a bit more clever.
(For what it’s worth, you can see the code live, in action, if you have a copy of SpaceMonger 2 or 3; just go into the Settings, and turn up the “Exaggeration” slider: Every file’s size will be raised to the power you choose, using exactly this math!)
Feynman, of course, derived that algorithm for exactly the same reason I needed it: They were using mechanical adding machines during WWII as part of the Manhattan Project, and those machines didn’t do powers. But the relativistic math for the atomic bomb needed lots of powers; so he derived the exponent algorithm as the converse to the logarithm algorithm, so that they were able to just use addition, subtraction, and multiplication, which the adding machines could do, to still efficiently calculate powers, which the machines couldn’t.
So that’s my Knuth Check story. It’s unlikely I’ll ever earn another; I certainly haven’t in the last twenty years! I still haven’t even finished reading Volume 4A yet, and I’m far too busy to randomly pore over the other tomes again just in search of a mistake. But I still managed to earn a prize that’s among the most coveted in this field, and twenty years later, I’m still happy to have done it even once.
I’ve had to say this several times over the years, so I thought I’d repeat it in a place where I can just make a link to it: The Dictionary<K,V> class in .NET does not store your data in order.
When you .Add() or .Remove() items from a dictionary, they get stored in whatever order the dictionary finds is most efficient. It just so happens that as long as you never.Remove() an item, the underlying data structure ends up with the items in order when you enumerate them, but this is not guaranteed.
Here’s an example program to demonstrate that it doesn’t work the way you probably expect:
You might think reading this that the first output would be in order of [A, B, C, D, E, F], and the second would go [A, B, C, E, F, G]. But that’s not the output that’s produced:
Fruits:
A -> Apple
B -> Banana
C -> Cherry
D -> Durian
E -> Elderberry
F -> Fig
Fruits:
A -> Apple
B -> Banana
C -> Cherry
G -> Grape
E -> Elderberry
F -> Fig
The new entry gets slotted in where the last one was removed, which is exactly the behavior you’d expect if you had read the underlying code, but it’s very different than what you’d expect if you just assumed everything will always have insertion order.
Moral of the story: If you need dictionary data to be ordered, don’t assume it’s ordered — explicitly order it!
If you need things to be in insertion order, keep them in a list.
If you need things to be ordered by key, consider a SortedDictionary<K,V> instead.
If you only need things sometimes to be ordered by key, use Linq: dictionary.OrderBy(pair => pair.Key)
Comments Off on Reminder: .NET Dictionary isn’t ordered
So 14 months ago, I wrote an essay about what Donald Trump would do to tear apart this country. I’m pleased to say that I was wrong. -ish.
Let’s go down the list and see what my score was:
✓ Trump will send servicemen into a city that doesn’t want them to quell the protests. This happened in both Seattle and Washington, DC.
✗ A shot will be fired by a serviceman, and somebody will die. Well… not really. January 6 doesn’t count.
✗ The protests will switch to “The government is out to get us.” Maybe in some places, but again, not really. Certainly not like the protests of the 1960s.
✗ The protests will get bigger. Actually, they didn’t, after the summer of 2020. If anything, they got smaller.
✓ COVID-19 will get worse. Well past half a million dead in the US, and counting.
✗ Trump will double down on the protests. The protests settled down, so that didn’t happen.
✗ People will fight back, and the summer of 2020 will be violent. The summer of 2020 was the calm before the storm, actually.
✗ The election of November 2020 will be suspended. Thank God it wasn’t. There was a giant mess following it, but we had the election on-time, and it went surprisingly smoothly.
✓✗ Trump will declare himself president “until further notice.” I think we all know that Biden is the President, but some of Trump’s die-hard supporters are convinced he’ll somehow still be reinstated, and I’m not entirely sure even he believes he lost, even though, y’know, those pesky facts say he did. But I’m gonna claim half-credit on this, since, y’know, on January 6, he did try to commit a coup d’état.
✗ The year-zero rule will take out Trump. Didn’t happen. And here’s hoping President Biden stays safe.
✓ Historians will rank Trump as the worst president ever. Okay, there’s debate on this, but among respectable historians, the only real question is whether he’s at the bottom or just in the bottom five. I’m still claiming it, though.
That’s 3½ out of 12! I officially suck at predicting the future!
Of course, Trump did try to commit a coup d’état, and he is still tearing apart the country, and the hard right is still so far off the crazy deep end that every day I expect those lunatics to march on the Capitol again — but at least as regards the “Trump becomes the American Dictator” timeline, I’ve never been happier to be wrong.
Comments Off on Happy to Be Wrong — But I Almost Wasn’t
It’s not a bad piece of software. It does its job. But I’m a coder, and I like having control over how things end up, and ever since I installed WordPress here years ago, I’ve been doing things its way. I don’t really love the look and feel of this site right now, but I do like how little I have to do to maintain it.
But there’s a growing movement of blogs and websites that are pure, static content, like I used to do back in the day. Once upon a time, my website was just a bunch of text files, mashed together in interesting ways by the m4 macro processor to produce static HTML as output. I started doing that in 1996, and 25 years later, the rest of the world has started doing that too, using Go and JavaScript and Ruby, among others: Static content has made a real comeback, now that modern JavaScript and CSS can fill in the interactive parts.
What I liked about using m4 back in the day was that the code simply did what I wanted: There were no database or administrative tools to get in the way, just powerful macros that constructed HTML from single sources of truth. I ran a lot of websites on that technology, for years, and I miss the elegance of it.
So I don’t know. I may spend some time hacking my website into a new shape. I haven’t redesigned things in some years, and maybe it’s time to export all my old content and rebuild this site in a brand new way.
In the old days if you encountered a badly-coded website, you could often fix it with tools like TamperMonkey: Hack a little JavaScript, and you could fix the worst of the site’s abuses, or even make it nicer and add features the original developers didn’t (or couldn’t) add. Everything was fairly open, so you could just drop new script on the page, and even substitute out functions if you had a better implementation.
We now live in a reality driven by WebPack, where every website’s JavaScript is packed, minified, bundled, and, importantly, smashed together inside a closure, making all of its innards and mechanics hidden and encapsulated, hidden even from its own code.
That’s great for application stability, if you have good developers. But it also means that the app you get is the app you get: Every bug, every design flaw, and every misfeature is hidden away inside that bundled ball — which means that no matter how bad it is, you can’t fix somebody else’s broken website anymore.
I switched from Chrome to Firefox about four or five years ago, and I generally haven’t looked back. Firefox has its issues, to be sure, but Mozilla’s pretty open about them, and Firefox is free in a way that Chrome isn’t, and runs everywhere in a way that Chrome doesn’t, and the work they’re doing on Rust in it is nothing short of amazing. I wholeheartedly endorse using Firefox, and I only open other browsers for those bad pages out there that insist on being “designed for Chrome” (shame on you, what year is this, 2002?).
So a pro tip for all you Firefox users out there: Like so many browsers, Firefox can get a little twitchy if you leave it up and running long enough: Its memory footprint grows over time, and sometimes its CPU usage will creep up until it’s eating the universe, especially if you typically have forty to a hundred tabs open (like me!).
So here’s a way to easily put a safe “Restart Firefox” button in the corner of Firefox’s window, without installing any Extensions or Add-ons —
I’m leaving my role at Suvoda. I’ve enjoyed my time there, and I made some good stuff, but thanks to fairly strict nondisclosure policies, I haven’t been able to talk here much about my job like I could when I worked at HomeNet and Cox. I made some new friends at Suvoda, and some of us will stay in touch after I leave. I’ll still be rooting for Suvoda from the sidelines: There’s a need for well-run clinical drug trials, and society is the benefactor.
But — !
IAM Robotics reached out to me a few weeks ago, and after some whirlwind interviews, I accepted a new role with them as a software architect. I’ll be starting in June, and I’m really excited: I have a big responsibility to help scale their robots to the next level, and I absolutely intend to deliver on it. They have really complex challenges for the kinds of software work they do, and that’s absolutely my kind of problem: Crazy sky’s-the-limit challenges where there’s no perfect answer and every option is on the table. I’ll be doing substantial coding for them, but as the title implies, it’s a lot more design-focused than my job has ever been: I always did architectural work in all of my jobs before, just never as my official title.
I hope in the coming months that I’ll be able to post here more often on topics of interest. I’ve been doing a lot of interesting things, and I have a lot to say about them. As always, stay tuned — the best is yet to come!