Journal: Dr. Dobb's Journal Nov 1992 v17 n11 p171(6) ------------------------------------------------------------------------- Title: The good, the bad, and the run-sliced. (Bresenham's run-length slice algorithm) (Graphics Programming) (Column) Author: Abrash, Michael Attached: Program: GP-NOV92.ASC Source code listing. Program: XSHARP21.ZIP Library for 3-D animation. Abstract: Programmers developing graphics applications should never believe that they have developed the fastest possible code; others often have better ideas. Performance-intensive applications should never have code that performs the same calculation more than once. Programmers have a duty to implement the desired functions while enabling the hardware to work as efficiently as possible. Bresenham's run-length slice algorithm is a good model for efficient programming. The approach implemented by the algorithm steps one pixel at a time along the major axis, while an integer term indicating how close the line is to advancing halfway to the next major pixel along the minor axis is maintained. The algorithm provides an automatic decision-making structure that allows infrequent decisions to be made as quickly as possible. ------------------------------------------------------------------------- Full Text: Years ago, I worked at a company that asked me to write blazingly fast line-drawing code for an AutoCAD driver. I implemented the basic Bresenham's line-drawing algorithm; streamlined it as much as possible; special-cased horizontal, diagonal, and vertical lines; broke out separate, optimized routines for lines in each octant; and massively unrolled the loops. When I was done, I had line drawing down to a mere five or six instructions per pixel, and I handed the code over to the AutoCAD driver person, content in the knowledge that I had pushed the theoretical limits of the Bresenham's algorithm on the 80x86 architecture, and that this was as fast as line drawing could get on a PC. That feeling lasted for about a week, until Dave Miller, who these days is a Windows display-driver whiz at Engenious Solutions, casually mentioned Bresenham's faster run-length slice line-drawing algorithm. Remember Bill Murray's safety tip in Ghostbusters? It goes something like this. Harold Ramis tells the Ghostbusters not to cross the beams of the antighost guns. "Why?" Murray asks. "It would be bad," Ramis says. Murray says, "I'm fuzzy on the whole good/bad thing. What exactly do you mean by 'bad'?" It turns out that what Ramis means by bad is basically the destruction of the universe. "Important safety tip," Murray comments dryly. I learned two important safety tips from my line-drawing experience; neither involves the possible destruction of the universe, so far as I know, but they are nonetheless worth keeping in mind. First, never, never, never think you've written the fastest possible code. Odds are, you haven't. Run your code past another good programmer, and he or she will probably say, "But why don't you do this?" and you'll realize that you could indeed do that, and your code would then be faster. Or relax and come back to your code later, and you may well see another, faster approach. There are a million ways to implement code for any task, and you can almost always find a faster way if you need to. Second, when performance matters, never have your code perform the same calculation more than once. This sounds obvious, but it's astonishing how often it's ignored. For example, consider the snippet of code in Example 1. Here, the programmer knows which way the line is going before the main loop begins--but nonetheless performs that test every time through the loop, when calculating the address of the next pixel. Far better to perform the test only once, outside the loop, as shown in Example 2. Think of it this way. A program is a state machine. It takes a set of inputs and produces a corresponding set of outputs by passing through a set of states. Your primary job as a programmer is to implement the desired state machine. Your additional job as a performance programmer is to minimize the lengths of the paths through the state machine. This means performing as many tests and calculations as possible outside the loops, so that the loops themselves can do as little work--pass through as few states--as possible. Which brings us full circle to Bresenham's run-length slice line-drawing algorithm, which just happens to be an excellent example of a minimized state machine. In case you're fuzzy on the good/bad performance thing, that's "good"--as in fast. Run-length Slice Fundamentals First off, I have a confession to make: I'm not sure that the algorithm I'll discuss is actually, precisely Bresenham's run-length slice algorithm. It's been a long time since I read about this algorithm; in the intervening years, I've misplaced Bresenham's article, and I was unable to locate it in time for this column. (Vermont libraries leave something to be desired in the high-tech area.) As a result, I had to derive the algorithm from scratch, which was admittedly more fun than reading about it, and also ensured that I understood it inside and out. The upshot is that what I discuss may or may not be Bresenham's run-length slice algorithm--but it surely is fast. The place to begin understanding the run-length slice algorithm is the standard Bresenham's line-drawing algorithm. (I discussed the standard Bresenham's algorithm at length in the May 1989 issue of the now-defunct Programmer's Journal.) The basis of the standard approach is stepping one pixel at a time along the major axis (the longer dimension of the line), while maintaining an integer error term that indicates at each major-axis step how close the line is to advancing halfway to the next pixel along the minor axis. Figure 1 illustrates standard Bresenham's line drawing. The key point here is that a calculation and a test are performed once for each step along the major axis. The run-length slice algorithm rotates matters 90 degrees, with salubrious results. The basis of the run-length slice algorithm is stepping one pixel at a time along the minor axis (the shorter dimension), while maintaining an integer error term indicating how close the line is to advancing an extra pixel along the major axis, as illustrated by Figure 2. Consider this: When you're called upon to draw a line with an X-dimension of 35 and a Y-dimension of 10, you have a great deal of information available, some of which is ignored by standard Bresenham's. In particular, because the slope is between 1/3 and 1/4, you know that every single run--a run being a set of pixels at the same minor-axis coordinate--must be either three or four pixels long. No other length is possible, as shown in Figure 3 (apart from the first and last runs, which are special cases that I'll discuss shortly). Therefore, for this line, there's no need to perform an error-term calculation and test for each pixel. Instead, we can just perform one test per run, to see whether the run is three or four pixels long, thereby eliminating about 70 percent of the calculations in drawing this line. Take a moment to let the idea behind run-length slice drawing soak in. Periodic decisions must be made to control pixel placement. The key to speed is to make those decisions as infrequently and quickly as possible. Of course, it will work to make a decision at each pixel--that's standard Bresenham's. However, most of those per-pixel decisions are redundant, and in fact we have enough information before we begin to know which are the redundant decisions. Run-length slice drawing is exactly equivalent to standard Bresenham's, but it pares the decision-making process down to a minimum. It's some-what analogous to the difference between finding the greatest common divisor of two numbers using Euclid's algorithm and finding it by trying every possible divisor. Both approaches produce the desired result, but that which takes maximum advantage of the available information and minimizes redundant work is preferable. Run-length Slice Implementation We know that for any line, a given run will always be one of two possible lengths. How, though, do we know which length to select? Surprisingly, this is easy to determine. For the following discussion, assume that we have a slope of 1/3.5, so that X is the major axis; however, the discussion also applies to Y-major lines, with X and Y reversed. The minimum possible length for any run in an X-major line is int(XDelta/YDelta), where XDelta is the X-dimension of the line and YDelta is the Y-dimension. The maximum possible length is int(XDelta/YDelta) + 1. The trick, then, is knowing which of these two lengths to select for each run. To see how we can make this selection, refer to Figure 4. For each one-pixel step along the minor axis (Y, in this case), we advance at least three pixels. The full advance distance along X (the major axis) is actually three-plus pixels, because there is also a fractional portion to the advance along X for a single-pixel Y step. This fractional advance is the key to deciding when to add an extra pixel to a run. The fraction indicates what portion of an extra pixel we advance along X (the major axis) during each run. If we keep a running sum of the fractional parts, we have a measure of how close we are to needing an extra pixel; when the fractional sum reaches 1, it's time to add an extra pixel to the current run. Then we can subtract 1 from the running sum (because we just advanced one pixel), and continue on. Practically speaking, however, we can't work with fractions because floating-point arithmetic is slow and fixed-point arithmetic is imprecise. Therefore, we take a cue from standard Bresenham's and scale all the error-term calculations up so that we can work with integers. The fractional X (major axis) advance per one-pixel Y (minor axis) advance is the fractional portion of XDelta/YDelta. This value is exactly equivalent to (XDelta % YDelta)/YDelta. We'll scale this up by multiplying it by YDelta*2, so that the amount by which we adjust the error term up for each one-pixel minor-axis advance is (XDelta % YDelta)*2. We'll similarly scale up the one pixel by which we adjust the error term down after it turns over, so our downward error-term adjustment is YDelta*2. Therefore, before drawing each run, we'll add (XDelta % YDelta)*2 to the error term. If the error term turns over (reaches one full pixel), then we'll lengthen the run by 1, and subtract YDelta*2 from the error term. (All values are multiplied by 2 so that the initial error term, which involves a 0.5 term, can be scaled up to an integer, as discussed below.) This is not a complicated process, involving only integer addition and subtraction and a single test, and it lends itself to many and varied optimizations. For example, you could break out hardwired optimizations for drawing each possible pair of run lengths. For the aforementioned line with a slope of 1/3.5, for example, you could have one routine hardwired to blast in a run of three pixels as quickly as possible, and another hardwired to blast in a run of four pixels. These routines would ideally have no looping, but rather just a series of instructions customized to draw the desired number of pixels at maximum speed. Each routine would know that the only possibilities for the length of the next run would be three and four, so they could increment the error term, then jump directly to the appropriate one of the two routines depending on whether the error term turned over. Properly implemented, it should be possible to reduce the average per-run overhead of line drawing to less than one branch, with only two additions and two tests (the number of runs must also be counted down), plus a subtraction half the time. On a 486, this amounts to something on the order of 150 nanoseconds of overhead per pixel, exclusive of the time required to actually write the pixel to display memory. That's good. Run-length Slice Details A couple of run-length slice implementation details yet remain. First is the matter of how error-term turnover is detected. This is done in much the same way as it is with standard Bresenham's: The error term is initialized to a value equivalent to -1 pixel and incremented for each step; when the error term reaches 0, we've advanced one full pixel along the major axis, and it's time to add an extra pixel to the current run. This means that we only have to test the sign of the error term after advancing it to determine whether or not to add an extra pixel to each run. The second and more difficult detail is balancing the runs so that they're centered around the ideal line, and therefore draw the same pixels that standard Bresenham's would draw. If we just drew full-length runs from the start, we'd end up with an unbalanced line, as shown in Figure 5. Instead, we have to split the initial pixel plus one full run as evenly as possible between the first and last runs of the line, and adjust the initial error term appropriately for the initial half-run. The initial error term is simply one-half of the normal fractional advance along the major axis, because the initial step is only one-half pixel along the minor axis. This half-step gets us exactly halfway between the initial pixel and the next pixel along the minor axis. All the error-term adjusts are scaled up by two times precisely so that we can scale up this halved error term for the initial run by two times, and thereby make it an integer. The other trick here is that if an odd number of pixels are allocated between the first and last partial runs, we'll end up with an odd pixel, since we are unable to draw a half-pixel. This odd pixel is accounted for by adding half a pixel to the error term. That's all there is to run-length slice line drawing; the partial first and last runs are the only tricky part. Listing One (page 190) is a run-length slice implementation in C. This is not an optimized implementation, nor is it meant to be; this listing is provided so that you can see how the run-length slice algorithm works. Next month, I'll move on to an optimized version, but this month's listing will make it much easier to grasp the principles of run-length slice drawing, and to understand next month's code. Notwithstanding that it's not optimized, Listing One is reasonably fast. If you run Listing Two (page 191), a sample line-drawing program that you can use to test-drive Listing One, you may be as surprised as I was at how quickly the screen fills with vectors, considering that Listing One is entirely in C and has some redundant divides. Or perhaps you won't be surprised, in which case I suggest you check back next month. Next Time Next month, I'll switch to assembly language and speed up run-length slice lines considerably. I'll also spend some time discussing the limitations of run-length slice drawing, and I'll look at possible further optimizations. After that, perhaps we'll have a look at seed fills, or more 3-D animation, or some new 2-D animation topics--or maybe something completely different. Your suggestions are, as always, welcome.