Yeah, this smells like a bug in Caddy or something. I agree with trying nginx or something else to see whether it's Caddy or something in the host's configuration. The only thing I can think of is that Caddy isn't caching DNS responses and is getting rate limited, so it appears slower while it waits on DNS requests. But I'm shooting in the dark, since I haven't spent much time with Caddy.
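One rough way to test the DNS theory is to time repeated lookups from the same host. Below is a minimal Python sketch. The hostname is a placeholder, and it times the operating system's resolver through `getaddrinfo`, not Caddy's own lookups. Lookups that are consistently slow, or that get slower as you repeat them, would point at the resolver. Fast lookups would make the DNS explanation less likely.

```python
import socket
import time


def time_lookups(host: str, count: int = 5, port: int = 443) -> list[float]:
    """Resolve `host` `count` times and return each lookup's duration in ms.

    This times the host's own resolver (via getaddrinfo), not Caddy's, so it
    only shows whether DNS lookups from this machine are slow or throttled.
    """
    durations = []
    for _ in range(count):
        start = time.perf_counter()
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


if __name__ == "__main__":
    # Hypothetical hostname; replace it with the upstream or subdomain in question.
    for i, ms in enumerate(time_lookups("example.com"), start=1):
        print(f"lookup {i}: {ms:.1f} ms")
```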
As I mentioned above, the subdomains were using HTTP/3 and the root entry was not. I don't know if I've misconfigured something or if HTTP/3 is just new and possibly buggy. Either way, I disabled it globally, and performance is now the same for the root domain and the subdomains.
I can't remember if it's enabled by default or not, but it's easy enough to enable pprof and get a helpful performance profile from /debug/pprof. See https://caddy.community/t/hangs-on-reload/12010/18 for an example.
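Grabbing a profile can be scripted. Below is a minimal Python sketch. It assumes Caddy's admin API is on its default address (`http://localhost:2019`) and serves Go's standard pprof handlers under `/debug/pprof/`. Check both against your setup. The request blocks while the profile is collected, so send traffic to a slow subdomain during that window.

```python
import urllib.request

# Assumption: Caddy's admin API listens on its default address and exposes
# Go's standard net/http/pprof handlers under /debug/pprof/.
DEFAULT_ADMIN = "http://localhost:2019"


def profile_url(admin: str = DEFAULT_ADMIN, seconds: int = 30) -> str:
    """Build the URL for a CPU profile covering `seconds` of activity."""
    return f"{admin.rstrip('/')}/debug/pprof/profile?seconds={seconds}"


def fetch_cpu_profile(path: str, admin: str = DEFAULT_ADMIN, seconds: int = 30) -> int:
    """Download a CPU profile to `path` and return its size in bytes.

    The request blocks for roughly `seconds` while the profile is collected,
    so generate traffic against the slow subdomain during that window.
    """
    url = profile_url(admin, seconds)
    with urllib.request.urlopen(url, timeout=seconds + 30) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


if __name__ == "__main__":
    size = fetch_cpu_profile("caddy-cpu.prof")
    print(f"wrote {size} bytes; inspect with: go tool pprof caddy-cpu.prof")
```

The saved file can then be opened with `go tool pprof`, which requires a local Go toolchain. That view shows which functions dominate CPU time.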
I've found that, even when I'm unfamiliar with a codebase, it's often pretty easy to identify which part of the call stack is slow and file a very useful performance bug report on GitHub. Check out the profile and see if it leads to any obvious conclusions about why the subdomains are so much slower. There may be some function whose results are trivial to cache, and caching them could bring performance back to what you'd expect.
I tried to dig into that but couldn't come up with a good test. But if NAT hairpinning wasn't working right, I'd be limited to my ISP's 50 Mbit, right? I could get 200+ Mbit on Wi-Fi. I also tested this from work (50 Mbit symmetric fiber), and the subdomains were always slower. I figured out today that HTTP/3 is causing my problems. I don't know if I care to troubleshoot any further, since it's working great with HTTP/1 and HTTP/2.
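A simple comparison test is to download the same large file through the root domain and through a subdomain from inside the LAN. Under the reasoning above, a result stuck near the ISP's 50 Mbit would be suspicious. Below is a minimal Python sketch. The URLs and file name are hypothetical. Because `urllib` only speaks HTTP/1.1, it measures the non-HTTP/3 path and is a baseline only, not a reproduction of the HTTP/3 slowdown.

```python
import time
import urllib.request


def mbit_per_s(nbytes: int, seconds: float) -> float:
    """Convert a byte count over a duration into megabits per second."""
    return nbytes * 8 / seconds / 1e6


def measure(url: str, chunk: int = 64 * 1024) -> float:
    """Download `url` once and return the observed throughput in Mbit/s.

    urllib speaks HTTP/1.1 only, so this measures the non-HTTP/3 path.
    """
    total = 0
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=60) as resp:
        while True:
            block = resp.read(chunk)
            if not block:
                break
            total += len(block)
    return mbit_per_s(total, time.perf_counter() - start)


if __name__ == "__main__":
    # Hypothetical URLs: the same large file served from the root domain and a subdomain.
    for url in ("https://example.com/test.bin", "https://sub.example.com/test.bin"):
        print(f"{url}: {measure(url):.1f} Mbit/s")
```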