Many modern theories in cognitive science posit that the brain is a kind of "prediction machine": it predicts the incoming stream of sensory information from the top down, as well as processing it from the bottom up. This is sometimes summed up by the aphorism "perception is controlled hallucination".
In a sense... yes! Although of course it's thought to be across many modalities and time-scales, and not just text. Also a crucial piece of the picture is the Bayesian aspect - which also involves estimating one's uncertainty over predictions. Further info: https://en.wikipedia.org/wiki/Predictive_coding
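To make the Bayesian part a bit more concrete, here's a toy sketch (my own illustration, not taken from the linked article): a single Gaussian belief gets nudged toward a noisy observation, with the size of the nudge set by how uncertain the prediction is relative to the senses. The names `Belief`, `update` and `sensory_variance` are made up for this example, and real predictive coding models are hierarchical and far richer than this.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    """A 1-D Gaussian belief: a predicted value plus its uncertainty."""
    mean: float
    variance: float


def update(prior: Belief, observation: float, sensory_variance: float) -> Belief:
    """Combine a top-down prediction with a bottom-up observation.

    This is the standard conjugate Gaussian update. The prediction error is
    weighted by a gain that grows when the prior is uncertain and shrinks
    when the sensory signal is noisy.
    """
    prediction_error = observation - prior.mean
    gain = prior.variance / (prior.variance + sensory_variance)
    return Belief(
        mean=prior.mean + gain * prediction_error,
        variance=(1.0 - gain) * prior.variance,
    )


if __name__ == "__main__":
    belief = Belief(mean=0.0, variance=1.0)
    # Hypothetical stream of noisy readings of some quantity near 2.0.
    for reading in [2.1, 1.8, 2.3, 1.9]:
        belief = update(belief, reading, sensory_variance=0.5)
        print(f"mean={belief.mean:.3f} variance={belief.variance:.3f}")
```

The thing to notice is that the variance shrinks with every observation, so later prediction errors move the estimate less: the uncertainty estimate is doing real work, not just riding along.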
It's also important to note the recent trends towards so-called "Embodied" and "4E cognition", which emphasize the importance of being situated in a body, in an environment, with control over actions, as essential to explaining the nature of mental phenomena.
But yeah, it's very exciting how in recent years we've begun to tap into the power of these kinds of self-supervised learning objectives in practical systems like Word2Vec and large language/multimodal models.
We can have robots with bodies that talk and form relationships with people now. Not deep intimate relationships, but simple things like maintaining conversations with people. You wouldn’t need much more software on top of the LLM to make a really functional person.
in which strategies are proposed to decouple the AI's internal "world model" from its language capabilities, to facilitate hierarchical planning and mitigate hallucination.