Improving performance by moving the inner loop to integer accumulators and replacing the per-pixel division with a multiplication by a precomputed fixed-point reciprocal followed by a shift.
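
A minimal sketch of the reciprocal technique, not the actual change. It assumes a simple box average over runs of `n` 8-bit samples with `n` at most 256, so the integer accumulator stays below 2^16. The names `make_recip`, `div_by_recip`, `box_average`, and `RECIP_SHIFT`, and the choice of a 32-bit shift, are hypothetical. Under these bounds, multiplying by `ceil(2^32 / n)` and shifting right by 32 gives the same result as truncating integer division.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fixed-point precision: the reciprocal is scaled by 2^32. */
#define RECIP_SHIFT 32

/* Precompute ceil(2^32 / d). Assumes 1 <= d <= 65536. */
static uint64_t make_recip(uint32_t d) {
    return ((UINT64_C(1) << RECIP_SHIFT) + d - 1) / d;
}

/* Returns floor(x / d) using one multiply and one shift.
 * Exact when x < 65536 and recip == make_recip(d) with d <= 65536,
 * because the rounding error of the reciprocal is less than d, so the
 * extra term x * error stays below 2^32 and is dropped by the shift. */
static uint32_t div_by_recip(uint32_t x, uint64_t recip) {
    return (uint32_t)(((uint64_t)x * recip) >> RECIP_SHIFT);
}

/* Hypothetical inner loop: average each run of n samples (1 <= n <= 256).
 * src must hold out_len * n samples; dst receives out_len averages. */
void box_average(const uint8_t *src, uint8_t *dst, size_t out_len, uint32_t n) {
    const uint64_t recip = make_recip(n);  /* one real division, hoisted out of the loop */
    for (size_t i = 0; i < out_len; i++) {
        uint32_t acc = 0;                  /* integer accumulator, at most 255 * 256 */
        for (uint32_t k = 0; k < n; k++) {
            acc += src[i * n + k];
        }
        dst[i] = (uint8_t)div_by_recip(acc, recip);  /* no per-pixel division */
    }
}
```

The reciprocal is kept in a `uint64_t` because `ceil(2^32 / 1)` does not fit in 32 bits, and the product `x * recip` can need up to 49 bits. A wider accumulator or a larger divisor would need a different shift and a fresh check of the error bound.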