It's a question based on a purposeful misunderstanding of the technology; it's like expecting a beekeeper to know each bee's name and bedtime. Really, it's like asking a bricklayer where each brick in the pile came from: he can tell you the batch, but he's not going to know that this brick came from the fourth row of the sixth pallet, two from the left. There is no reason to remember that; it's not important to anyone.
They don't log it because it would take huge amounts of resources and gain nothing.
Compiling quality datasets is enormously challenging and labour-intensive. OpenAI absolutely knows the provenance of the data they train on, as it's part of their secret sauce. And there's no damn way their CTO won't have a broad-strokes understanding of the origins of those datasets.
To be fair, these datasets are one of their biggest competitive edges. But saying "I cannot tell you" to an interviewer is not very nice, so you can take the American politician approach and say "I don't know/remember", which you can never be held accountable for.