A decent heuristic is to think back to early work on edge detection and how that sort of thinking applies to hands and feet. How many edges does an arm have? Two: the outside, relative to your body, and the inside. How many does a boob have? One more than the arm, if you consider the boob to be three sides of a square. How about something more complicated, like a nose? The outside boundaries where the nose meets the face, plus the nostrils. Now take fingers: how many edges are there? At least 10 per hand. Now consider that predicting each edge has some probability of error, and that these probabilities aren't independent. So the total error for a hand ends up larger than for body parts with fewer edges.
Now, modern AI doesn't do edge detection. It does latent feature detection, and there are many more latent features than edges. So the error compounds even more for hands and feet than for the features representing other body parts.
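To put rough numbers on the compounding argument, here is a minimal Python sketch. It assumes every edge (or feature) has the same error probability and, purely to keep the arithmetic simple, that errors are independent, which the argument above says is not really the case. The function name `p_any_error` and the 5% per-edge error rate are made up for illustration; only the edge counts come from the tallies above.

```python
def p_any_error(n_edges: int, p_edge: float) -> float:
    """Chance that at least one of n_edges is wrong.

    Assumes independent errors with the same probability per edge,
    which is a simplification of the argument in the text.
    """
    return 1.0 - (1.0 - p_edge) ** n_edges


# Hypothetical per-edge error rate, chosen only for illustration.
P_EDGE = 0.05

# Edge counts follow the rough tallies in the text.
EDGE_COUNTS = {"arm": 2, "hand": 10}

for part, n in EDGE_COUNTS.items():
    print(f"{part}: {n} edges, P(at least one error) = {p_any_error(n, P_EDGE):.3f}")
```

With the same per-edge error rate, the part with more edges has a noticeably higher chance of at least one visible mistake. Swapping edges for a much larger number of latent features pushes the same formula further in the same direction.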