The answer is that it doesn't have to be, but it's simpler that way.
A transcript has to be written for a YouTube video essay anyway, and there's no major reason you couldn't turn that transcript and footage into an article with a series of embedded GIFs or graphics. It's just much simpler for creators to dedicate their time to one medium and not worry about things getting lost in translation.
For so many YouTube videos, I just end up expanding the video description, and that gives me all the information I would otherwise spend 15 minutes watching the video to get.