The Magic Wand: A GPU/CPU Hybrid Flood Fill
The magic wand tool selects regions of similar color, and it has two modes: non-contiguous (select all matching pixels everywhere) and contiguous (flood fill from a seed point). Non-contiguous is trivially parallel — each GPU thread independently checks its pixel against the seed color. Contiguous is the hard one.
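To make the "trivially parallel" claim concrete, here is a minimal CPU-side sketch in Swift of the per-pixel check that each GPU thread would perform. The `RGBA` type, the function names, and the max-channel-difference metric are all assumptions for illustration; the actual tool may compare colors differently.

```swift
/// Hypothetical 8-bit RGBA pixel used only for this sketch.
struct RGBA {
    var r, g, b, a: UInt8
}

/// Assumed metric: the largest absolute per-channel difference
/// must not exceed the tolerance.
func matches(_ p: RGBA, seed: RGBA, tolerance: Int) -> Bool {
    let dr = abs(Int(p.r) - Int(seed.r))
    let dg = abs(Int(p.g) - Int(seed.g))
    let db = abs(Int(p.b) - Int(seed.b))
    let da = abs(Int(p.a) - Int(seed.a))
    return max(max(dr, dg), max(db, da)) <= tolerance
}

/// Non-contiguous selection: every pixel is tested on its own,
/// so no iteration depends on any other. On the GPU, each
/// iteration of this map would be one thread.
func selectNonContiguous(pixels: [RGBA], seed: RGBA, tolerance: Int) -> [UInt8] {
    return pixels.map { matches($0, seed: seed, tolerance: tolerance) ? 255 : 0 }
}
```

Because the result for one pixel never reads the result for another, the loop can be split across threads in any order, which is what makes this mode a natural fit for a compute shader.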
Classic flood fill is inherently sequential: you expand outward from a seed, and each step depends on the previous one. Running BFS on the GPU is possible but complex (wavefront expansion with synchronization barriers). We tried a pure CPU approach first, but reading back the full composited texture and scanning pixels in Swift took ~9 seconds on large images.
The solution was a two-phase hybrid. Phase 1 runs on the GPU: a compute shader marks every pixel as "eligible" (within color tolerance of the seed) or "barrier" (too different) and writes the result to a shared MTLBuffer as a uint8 array. Phase 2 runs on the CPU: a standard BFS starting from the seed pixel, but operating on the eligibility buffer instead of the raw image data. The BFS only visits eligible pixels reachable from the seed, skipping barriers.
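Phase 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual code: a plain `[UInt8]` stands in for the contents of the shared MTLBuffer, a nonzero byte means "eligible", the output mask uses 255 for selected pixels, and connectivity is 4-way. The function name `floodSelect` is hypothetical.

```swift
/// BFS over an eligibility map (nonzero = eligible, 0 = barrier).
/// Returns a mask with 255 for every eligible pixel reachable
/// from the seed through 4-connected eligible neighbors.
func floodSelect(eligibility: [UInt8], width: Int, height: Int,
                 seedX: Int, seedY: Int) -> [UInt8] {
    var mask = [UInt8](repeating: 0, count: width * height)
    let start = seedY * width + seedX
    // A seed that is itself a barrier selects nothing.
    guard eligibility[start] != 0 else { return mask }

    // Array plus head index serves as a FIFO queue without
    // paying for removeFirst() on every step.
    var queue = [start]
    var head = 0
    mask[start] = 255

    // Enqueue a neighbor if it is eligible and not yet selected.
    func visit(_ n: Int) {
        if eligibility[n] != 0 && mask[n] == 0 {
            mask[n] = 255
            queue.append(n)
        }
    }

    while head < queue.count {
        let i = queue[head]
        head += 1
        let x = i % width
        let y = i / width
        if x > 0 { visit(i - 1) }
        if x < width - 1 { visit(i + 1) }
        if y > 0 { visit(i - width) }
        if y < height - 1 { visit(i + width) }
    }
    return mask
}
```

Marking a pixel as selected at the moment it is enqueued, rather than when it is dequeued, ensures each pixel enters the queue at most once. The BFS never touches color data; it reads one byte per pixel from the eligibility map.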
The key insight is that the work splits cleanly. The GPU does the expensive, fully parallel part (color comparison across millions of pixels), and the CPU does the topological part (connectivity via BFS). The BFS is fast because it visits each pixel at most once and reads the eligibility buffer directly from shared memory, so no texture readback is needed.
On top of this, the magic wand supports live drag-to-adjust: hold the mouse button and drag to change the tolerance in real time. Each drag event first restores the original mask, then re-runs the entire pipeline (GPU eligibility + CPU BFS). An antialiasing pass runs afterward that softens corner boundaries while keeping straight edges crisp: it smooths a boundary pixel only if it has neighbors in both the horizontal and vertical directions.
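One possible reading of the corner test is sketched below. It is an interpretation, not the tool's actual pass: it assumes "neighbors" means unselected neighbors, treats pixels outside the image as unselected, and uses a fixed half-coverage value of 128 as a placeholder for whatever smoothing the real pass applies. The function name `softenCorners` is hypothetical.

```swift
/// Softens selected pixels that sit on a corner of the selection.
/// A pixel counts as a corner when it has an unselected neighbor
/// along the horizontal axis AND one along the vertical axis.
/// Straight-edge pixels are open on only one axis and stay at 255.
func softenCorners(mask: [UInt8], width: Int, height: Int) -> [UInt8] {
    // Out-of-bounds coordinates are treated as unselected (assumption).
    func selected(_ x: Int, _ y: Int) -> Bool {
        guard x >= 0, x < width, y >= 0, y < height else { return false }
        return mask[y * width + x] != 0
    }

    // Read from the original mask and write to a copy, so the
    // result does not depend on traversal order.
    var out = mask
    for y in 0..<height {
        for x in 0..<width where selected(x, y) {
            let openH = !selected(x - 1, y) || !selected(x + 1, y)
            let openV = !selected(x, y - 1) || !selected(x, y + 1)
            if openH && openV {
                out[y * width + x] = 128 // placeholder smoothing value
            }
        }
    }
    return out
}
```

For a 3×3 selected square, only the four corner pixels are softened under this rule; the edge midpoints and the center keep full coverage, which matches the "soft corners, crisp straight edges" behavior described above.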