It shouldn't be surprising that text that's repeatedly fed into LLMs for training gets flagged as similar to what comes out of them. The Constitution, the Bible, and other works widely copied across the internet are likewise going to be flagged.
If you’re serious about using a particular method to detect something, and you’re worried about false positives/negatives, you have to dig a bit deeper into the statistics. Just like with COVID tests, you need to know the sensitivity and specificity of the method instead of just calling it unreliable.
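To make that concrete, here's a rough sketch of the arithmetic. The numbers are made up for illustration and are not measurements of any real detector; the function name is just something I picked. It uses Bayes' rule to ask: if an essay gets flagged, how likely is it to actually be AI-written?

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a flagged essay really is AI-written (Bayes' rule).

    sensitivity: fraction of AI-written essays the detector flags
    specificity: fraction of human-written essays the detector clears
    prevalence:  fraction of all submitted essays that are AI-written
    """
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical detector: catches 90% of AI essays, clears 95% of human ones,
# in a class where 10% of essays are AI-written (all assumed values).
ppv = positive_predictive_value(sensitivity=0.90, specificity=0.95, prevalence=0.10)
print(f"{ppv:.2f}")  # prints 0.67
```

So even with numbers that sound good, about a third of the flagged essays in this made-up scenario would be false accusations, and it gets worse as the share of AI-written essays goes down.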
Interesting article that goes into some specific detail on what things AI detectors look out for.
Interestingly, after reading the article I was able to get ChatGPT to write an essay that both GPTZero and ChatGPT classified as human-written just by asking it to write with "very high perplexity" (and then with "more perplexity" after the first one failed to pass the test).
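For anyone wondering what "perplexity" means here: in the usual definition it's the exponential of the average negative log-probability a language model assigns to each token, so predictable text scores low and surprising text scores high. A minimal sketch, assuming you already have per-token probabilities from some model (the probabilities below are invented, and I don't know how GPTZero actually computes its score):

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability per token)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Invented per-token probabilities, for illustration only.
predictable_text = [0.9, 0.8, 0.95, 0.85]  # the model saw each token coming
surprising_text = [0.2, 0.05, 0.3, 0.1]    # the model was often caught off guard

print(perplexity(predictable_text) < perplexity(surprising_text))  # prints True
```

Under that definition, asking for "more perplexity" amounts to asking for word choices the model itself would consider less likely.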
Maybe writing assignments should be done in class, instead of at home. Anything you let students complete in their own time has always been open to cheating, via calculators, excessive help from parents, or straight-up paying someone online to write it for them. This isn't really any different, just a bit faster and cheaper. You always need to stand behind a kid and watch them work if you want to be sure they're really doing it themselves.