New two-stage PixArt ensemble of experts (2x 900M)
Originally posted on /r/stablediffusion by /u/terminusresearchorg on 2024-07-21.
why
As the original 900M PixArt project went well, a recurring thought was something like, "what if the model only had to deal with half of the tasks?"
we're discrete here, ma'am
Image generation for the PixArt model is technically 1000 discrete tasks it is learning to do, one per denoising timestep. That's a lot. I can't do that many things! We can split the workload in two, similar to SDXL's failed attempt at the same thing: the first stage handles composition and depth, and then the second stage handles minor repairs and fine details to augment and finish the image off.
this is just sdxl, right?
SDXL's approach to eDiff-I was incomplete to the point of being flawed: the base model was still trained on all 1000 timesteps. NVIDIA's original eDiff-I setup had 7 experts, each trained exclusively on its own portion of the timestep schedule.
Additionally, SDXL was just plain inefficient - its base model has 2.6B parameters and its refiner more than 3B.
PixArt 900M, a strong base model platform
PixArt 900M expands the original PixArt Sigma 600M model with an additional 14 layers, but that's really not enough to help it learn things like typography.
PixArt Sigma 600M and 900M both display very strong fine details - it's actually harder to get them to learn composition! Theoretically, the 2nd stage of image generation (the final 400 timesteps) could be holding the base model back, since composition and fine detail are such different objectives.
Thus, to train these models, you have to restrict each expert's learned timestep range exclusively, the way eDiff-I did - see the sketch below.
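A minimal sketch of what that restriction looks like during training (the names and boundary wiring here are illustrative, not SimpleTuner's internals):

```python
import torch

# Illustrative split: stage 1 learns the high-noise portion of the schedule
# (timesteps 400-999, composition), stage 2 the low-noise portion (0-399,
# fine detail). The 0.4 boundary mirrors --refiner_training_strength=0.4.
NUM_TRAIN_TIMESTEPS = 1000
BOUNDARY = int(0.4 * NUM_TRAIN_TIMESTEPS)  # 400

def sample_timesteps(batch_size: int, stage: int) -> torch.Tensor:
    """Sample training timesteps restricted to the expert's range."""
    if stage == 1:
        # Stage 1 only ever sees the first 60% of denoising (t >= 400).
        return torch.randint(BOUNDARY, NUM_TRAIN_TIMESTEPS, (batch_size,))
    # Stage 2 only ever sees the final 40% (t < 400).
    return torch.randint(0, BOUNDARY, (batch_size,))
```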
shut up, give us weights already
You'll need to use an SDXL refiner-like workflow for PixArt. For Diffusers folks, check out the main python script in the HF Space; a rough sketch of the handoff follows the weight links below.
For people who just want to give it a whirl, you can access it for free via this space:
The weights:
https://huggingface.co/ptx0/pixart-900m-1024-ft-v0.7-stage1
https://huggingface.co/ptx0/pixart-900m-1024-ft-v0.7-stage2
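For reference, here's a minimal hand-rolled sketch of the two-stage handoff in Diffusers. This is not the HF Space script: the prompt, the scheduler handling, and the hard-coded 400-timestep boundary below are assumptions based on the description above.

```python
import torch
from diffusers import PixArtSigmaPipeline

device, dtype = "cuda", torch.float16
boundary = 400  # stage 2 takes over for t < 400 (the final 40% of the schedule)

# Both stages share the text encoder and VAE, so reuse stage 1's copies.
stage1 = PixArtSigmaPipeline.from_pretrained(
    "ptx0/pixart-900m-1024-ft-v0.7-stage1", torch_dtype=dtype
).to(device)
stage2 = PixArtSigmaPipeline.from_pretrained(
    "ptx0/pixart-900m-1024-ft-v0.7-stage2",
    text_encoder=stage1.text_encoder,
    tokenizer=stage1.tokenizer,
    vae=stage1.vae,
    torch_dtype=dtype,
).to(device)

prompt, guidance_scale, num_steps = "a photo of a corgi", 4.5, 30
embeds, mask, neg_embeds, neg_mask = stage1.encode_prompt(
    prompt, do_classifier_free_guidance=True, negative_prompt="", device=device
)
prompt_embeds = torch.cat([neg_embeds, embeds])
prompt_mask = torch.cat([neg_mask, mask])

scheduler = stage1.scheduler
scheduler.set_timesteps(num_steps, device=device)
latents = torch.randn((1, 4, 128, 128), device=device, dtype=dtype)
latents = latents * scheduler.init_noise_sigma

for t in scheduler.timesteps:
    # Route each step to the expert that owns this part of the schedule.
    transformer = stage1.transformer if t >= boundary else stage2.transformer
    latent_in = scheduler.scale_model_input(torch.cat([latents] * 2), t)
    noise_pred = transformer(
        latent_in,
        encoder_hidden_states=prompt_embeds,
        encoder_attention_mask=prompt_mask,
        timestep=t.reshape(1).expand(latent_in.shape[0]),
        added_cond_kwargs={"resolution": None, "aspect_ratio": None},
        return_dict=False,
    )[0]
    uncond, text = noise_pred.chunk(2)
    noise_pred = uncond + guidance_scale * (text - uncond)
    # The transformer also predicts variance; keep only the epsilon half.
    noise_pred = noise_pred.chunk(2, dim=1)[0]
    latents = scheduler.step(noise_pred, t, latents).prev_sample

image = stage1.vae.decode(
    latents / stage1.vae.config.scaling_factor, return_dict=False
)[0]
image = stage1.image_processor.postprocess(image, output_type="pil")[0]
image.save("two_stage.png")
```

One caveat: a multistep scheduler like DPM-Solver keeps a short history of model outputs, so the step right at the handoff mixes predictions from both experts; a sketch like this ignores that detail.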
training details
I used simpletuner for this with the following flags:
--refiner_training --refiner_training_strength=0.4 [--refiner_training_invert_schedule]
You only set the inverted schedule on the stage 1 model. See the MIXTURE_OF_EXPERTS document and the PixArt Sigma training quickstart guide for more information.
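Put together, the two training runs differ only in the inverted-schedule flag. A hypothetical pair of invocations (the entry point and the elided arguments are placeholders, not a complete SimpleTuner command line):

```bash
# Stage 1: trains only on the first 60% of the schedule (high noise).
./train.sh ... --refiner_training --refiner_training_strength=0.4 \
    --refiner_training_invert_schedule

# Stage 2: trains only on the final 40% of the schedule (low noise).
./train.sh ... --refiner_training --refiner_training_strength=0.4
```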
MIT license
- allows commercial use
- allows free redistribution
- requires attribution (Terminus Research Group)