LocalLLaMA @sh.itjust.works hok @lemmy.dbzer0.com 2 wk. ago

Llama 3.3 70b - End of open-weight pretrained models from Meta or just a better Llama 3.1 405b finetune?

People are talking about the new Llama 3.3 70b release, which has generally better performance than Llama 3.1 (approaching 3.1's 405b performance): https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_3

However, something to note:

Llama 3.3 70B is provided only as an instruction-tuned model; a pretrained version is not available.

Is this the end of open-weight pretrained models from Meta, or is Llama 3.3 70b instruct just a better-instruction-tuned version of a 3.1 pretrained model?

Comparing the model cards:
3.1: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md
3.3: https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md

The same knowledge cutoff, same amount of training data, and same training time give me hope that it's just a better finetune of maybe Llama 3.1 405b.

6 comments

On Hugging Face, someone said it's still the same base model: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/discussions/10

And I remember watching an interview with Zuckerberg this year where he said that releasing the models to the public, including base models, is what he wants and is part of their strategy.
- Thank you so much! That answers my question exactly: an official response (that guy works at Meta) confirming it's the same base model.
  
  I was concerned mainly because the release notes strangely didn't mention it anywhere, and I thought it would have been important enough to mention.
