Stable Diffusion
- Samachiy/Qinti: An A1111 frontend focused on ease of use, packaged as a click-and-run executable.
YouTube Video
https://github.com/Samachiy/Qinti
- Evaluating how different keywords affect realism in SDXL (magicflow.ai)
SDXL is great for realistic photos; learn which keywords make it even better and which do not.
- rupeshs/fastsdcpu Release v1.0.0-beta.33: Aura SR (4x GigaGAN-based upscaling) support (github.com)
Fast Stable Diffusion on CPU.
- An Update and FAQ on the Open Model Initiative (old.reddit.com/r/StableDiffusion/comments/1dp2as9/update_and_faq_on_the_open_model_initiative_your/)
Quoted from Reddit:
Hello r/StableDiffusion --
A sincere thanks to the overwhelming engagement and insightful discussions following our announcement yesterday of the Open Model Initiative. If you missed it, check it out here.
We know there are a lot of questions, and some healthy skepticism about the task ahead. We'll share more details as plans are formalized. We're taking things step by step, seeing who's committed to participating over the long haul, and charting the course forward.
That all said: with as much community and financial/compute support as is being offered, I have no hesitation that we have the fuel needed to get where we all aim for this to take us. We just need to align and coordinate the work to execute on that vision.
-----
We also wanted to officially announce and welcome some folks to the initiative, who will support with their expertise on model finetuning, datasets, and model training:
- AstraliteHeart, founder of PurpleSmartAI and creator of the very popular PonyXL models
- Some of the best model finetuners including Robbert "Zavy" van Keppel and Zovya
- Simo Ryu, u/cloneofsimo, a well-known contributor to Open Source AI
- Austin, u/AutoMeta, Founder of Alignment Lab AI
- Vladmandic & SD.Next
- And over 100 other community volunteers, ML researchers, and creators who have submitted their request to support the project
Due to voiced community concern, we’ve discussed with LAION and agreed to remove them from formal participation with the initiative at their request. Based on conversations occurring within the community we’re confident that we’ll be able to effectively curate the datasets needed to support our work.
-----
Frequently Asked Questions (FAQs) for the Open Model Initiative
We’ve compiled a FAQ to address some of the questions that were coming up over the past 24 hours.
How will the initiative ensure the models are competitive with proprietary ones?
We are committed to developing models that are not only open but also competitive in terms of capability and performance. This includes leveraging cutting-edge technology, pooling resources and expertise from leading organizations, and continuous community feedback to improve the models.
The community is passionate. We have many AI researchers who have reached out in the last 24 hours who believe in the mission, and who are willing and eager to make this a reality. In the past year, open-source innovation has driven the majority of interesting capabilities in this space.
We’ve got this.
What does ethical really mean?
We recognize that there's a healthy sense of skepticism any time words like "Safety," "Ethics," or "Responsibility" are used in relation to AI.
With respect to the model that the OMI will aim to train, the intent is to provide a capable base model that is not pre-trained with the following capabilities:
- Recognition of unconsented artist names, in such a way that their body of work is singularly referenceable in prompts
- Generating the likeness of unconsented individuals
- The production of AI Generated Child Sexual Abuse Material (CSAM).
There may be those in the community who chafe at the above restrictions being imposed on the model. It is our stance that these are capabilities that don’t belong in a base foundation model designed to serve everyone.
The model will be designed and optimized for fine-tuning, and individuals can make personal values decisions (as well as take the responsibility) for any training built into that foundation. We will also explore tooling that helps creators reference styles without the use of artist names.
Okay, but what exactly do the next 3 months look like? What are the steps to get from today to a usable/testable model?
We have 100+ volunteers we need to coordinate and organize into productive participants of the effort. While this will be a community effort, it will need some organizational hierarchy in order to operate effectively - With our core group growing, we will decide on a governance structure, as well as engage the various partners who have offered support for access to compute and infrastructure.
We’ll make some decisions on architecture (Comfy is inclined to leverage a better designed SD3), and then begin curating datasets with community assistance.
What is the anticipated cost of developing these models, and how will the initiative manage funding?
The cost of model development can vary, but it mostly boils down to participants' time and compute/infrastructure. Each of the initial initiative members has a business model that supports actively pursuing open research, and the OMI has already received verbal support from multiple compute providers for the initiative. We will formalize those into agreements once we better define the compute needs of the project.
This gives us confidence we can achieve what is needed with the supplemental support of the community volunteers who have offered to support data preparation, research, and development.
Will the initiative create limitations on the models' abilities, especially concerning NSFW content?
It is not our intent to make the model incapable of NSFW material. “Safety” as we’ve defined it above, is not restricting NSFW outputs. Our approach is to provide a model that is capable of understanding and generating a broad range of content.
We plan to curate datasets that avoid any depictions/representations of children, as a general rule, in order to avoid the potential for AIG CSAM/CSEM.
What license will the model and model weights have?
TBD, but we've mostly settled on either an MIT or Apache 2 license.
What measures are in place to ensure transparency in the initiative’s operations?
We plan to regularly update the community on our progress, challenges, and changes through the official Discord channel. As we evolve, we’ll evaluate other communication channels.
Looking Forward
We don’t want to inundate this subreddit so we’ll make sure to only update here when there are milestone updates. In the meantime, you can join our Discord for more regular updates.
If you're interested in being a part of a working group or advisory circle, or a corporate partner looking to support open model development, please complete this form and include a bit about your experience with open-source and AI.
Thank you for your support and enthusiasm!
Sincerely,
The Open Model Initiative Team
- Generative AI: Learn How to Control Image Generation with Stable Diffusion (sicara.fr)
Explore the power of Stable Diffusion for AI-driven image generation. Master image control and enhance creativity with our expert guide and tips.
- Civitai Joins the Open Model Initiative (civitai.com)
Today, we’re excited to announce the launch of the Open Model Initiative, a new community-driven effort to promote the development and adoption of ...
- Stability AI Secures Significant New Investment (stability.ai)
Stability AI Secures Significant New Investment from World-Class Investor Group and Appoints Prem Akkaraju as CEO.
- The Open Model Initiative: Invoke, Comfy Org, Civitai, LAION, and others coordinating a new next-gen model (old.reddit.com/r/StableDiffusion/comments/1do5gvz/the_open_model_initiative_invoke_comfy_org/)
Quoted from Reddit:
Today, we’re excited to announce the launch of the Open Model Initiative, a new community-driven effort to promote the development and adoption of openly licensed AI models for image, video and audio generation.
We believe open source is the best way forward to ensure that AI benefits everyone. By teaming up, we can deliver high-quality, competitive models with open licenses that push AI creativity forward, are free to use, and meet the needs of the community.
Ensuring access to free, competitive open source models for all.
With this announcement, we are formally exploring all available avenues to ensure that the open-source community continues to make forward progress. By bringing together deep expertise in model training, inference, and community curation, we aim to develop open-source models of equal or greater quality to proprietary models and workflows, but free of restrictive licensing terms that limit the use of these models.
Without open tools, we risk having these powerful generative technologies concentrated in the hands of a small group of large corporations and their leaders. From the beginning, we have believed that the right way to build these AI models is with open licenses. Open licenses allow creatives and businesses to build on each other's work, facilitate research, and create new products and services without restrictive licensing constraints. Unfortunately, recent image and video models have been released under restrictive, non-commercial license agreements, which limit the ownership of novel intellectual property and offer compromised capabilities that are unresponsive to community needs.
Given the complexity and costs associated with building and researching the development of new models, collaboration and unity are essential to ensuring access to competitive AI tools that remain open and accessible.
We are at a point where collaboration and unity are crucial to achieving the shared goals in the open source ecosystem. We aspire to build a community that supports the positive growth and accessibility of open source tools.
For the community, by the community
Together with the community, the Open Model Initiative aims to bring together developers, researchers, and organizations to collaborate on advancing open and permissively licensed AI model technologies.
The following organizations serve as the initial members:
- Invoke, a Generative AI platform for Professional Studios
- ComfyOrg, the team building ComfyUI
- Civitai, the Generative AI hub for creators
- LAION, one of the largest open source data networks for model training
To get started, we will focus on several key activities:
- Establishing a governance framework and working groups to coordinate collaborative community development.
- Facilitating a survey to document feedback on what the open-source community wants to see in future model research and training.
- Creating shared standards to improve future model interoperability and compatible metadata practices so that open-source tools are more compatible across the ecosystem.
- Supporting model development that meets the following criteria:
- True open source: Permissively licensed using an approved Open Source Initiative license, and developed with open and transparent principles
- Capable: A competitive model built to provide the creative flexibility and extensibility needed by creatives
- Ethical: Addressing major, substantiated complaints about unconsented references to artists and other individuals in the base model while recognizing training activities as fair use.
We also plan to host community events and roundtables to support the development of open source tools, and will share more in the coming weeks.
Join Us
We invite any developers, researchers, organizations, and enthusiasts to join us.
If you’re interested in hearing updates, feel free to join our Discord channel.
If you're interested in being a part of a working group or advisory circle, or a corporate partner looking to support open model development, please complete this form and include a bit about your experience with open-source and AI.
Sincerely,
Kent Keirsey CEO & Founder, Invoke
comfyanonymous Founder, Comfy Org
Justin Maier CEO & Founder, Civitai
Christoph Schuhmann Lead & Founder, LAION
- SD.Next Release for 2024-06-23 (github.com/vladmandic/automatic)
SD.Next: advanced implementation of Stable Diffusion and other diffusion-based generative image models.
Highlights for 2024-06-23
Following the zero-day SD3 release, here's a refresh 10 days later with 10+ improvements, including full prompt attention, support for compressed weights, and additional text-encoder quantization modes.
But there's more than SD3:
- support for quantized T5 text encoder FP16/FP8/FP4/INT8 in all models that use T5 (SD3, PixArt-Σ, etc.); see the sketch after this list
- support for PixArt-Sigma in small/medium/large variants
- support for HunyuanDiT 1.1
- additional NNCF weights compression support: SD3, PixArt, ControlNet, Lora
- integration of MS Florence VLM/VQA Base and Large models
- (finally) new release of Torch-DirectML
- additional efficiencies for users with low VRAM GPUs
- over 20 overall fixes
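SD.Next handles the quantized T5 loading through its own UI; for context, a roughly equivalent pattern in plain diffusers looks something like the sketch below. The diffusers-format repo name, the use of bitsandbytes for INT8, and the prompt/step settings are assumptions, not SD.Next's actual code.

```python
# Minimal sketch, not SD.Next's implementation: load SD3 with its T5 text
# encoder quantized to INT8 via bitsandbytes to cut VRAM use.
import torch
from transformers import T5EncoderModel, BitsAndBytesConfig
from diffusers import StableDiffusion3Pipeline

model_id = "stabilityai/stable-diffusion-3-medium-diffusers"  # assumed diffusers-format repo
text_encoder_3 = T5EncoderModel.from_pretrained(
    model_id,
    subfolder="text_encoder_3",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    text_encoder_3=text_encoder_3,
    torch_dtype=torch.float16,
    device_map="balanced",  # let accelerate place the quantized encoder
)
image = pipe("a watercolor fox", num_inference_steps=28, guidance_scale=7.0).images[0]
```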
- Update on SD3 on Civitai (civitai.com)
Standard disclaimer: this post does not constitute legal advice. How you interact with SAI and their product is up to you. You should seek your own...
- MackinationsAi/Upgraded-Depth-Anything-V2 (github.com)
Upgraded repo with more capabilities: the cmd .py scripts converted to function more intuitively, 147 depth output colour map methods added, batch image and video processing introduced, everything automatically saved to an outputs folder (with file-naming conventions), and the .pth models converted to .safetensors.
- The Next Step for ComfyUI (blog.comfy.org)
As some of you already know, I have resigned from Stability AI and am starting a new chapter. I am partnering with mcmonkey4eva, Dr.Lt.Data, pythongossssss, robinken, and yoland68 to start Comfy Org. We will continue to develop and improve ComfyUI with a lot more resources.
- Comfy Sigma Portable: Standalone PixArt Sigma solution for beginners, v0.1 (civitai.com)
Important note: Comfy Sigma Portable is a portable standalone package based on ComfyUI and designed for beginners. Main motto: easy to unzip, eas...
- The future of ComfyUI: Comfy Org (comfy.org)
Creators of ComfyUI. We are a team dedicated to iterating on and improving ComfyUI and to supporting the ComfyUI ecosystem with tools like the node manager, node registry, CLI, automated testing, and public documentation.
- MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
Abstract
>Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created Meshes (AMs), i.e., meshes created by human artists. Specifically, current mesh extraction methods rely on dense faces and ignore geometric features, leading to inefficiencies, complicated post-processing, and lower representation quality.
>
>To address these issues, we introduce MeshAnything, a model that treats mesh extraction as a generation problem, producing AMs aligned with specified shapes. By converting 3D assets in any 3D representation into AMs, MeshAnything can be integrated with various 3D asset production methods, thereby enhancing their application across the 3D industry.
>
>The architecture of MeshAnything comprises a VQ-VAE and a shape-conditioned decoder-only transformer. We first learn a mesh vocabulary using the VQ-VAE, then train the shape-conditioned decoder-only transformer on this vocabulary for shape-conditioned autoregressive mesh generation. Our extensive experiments show that our method generates AMs with hundreds of times fewer faces, significantly improving storage, rendering, and simulation efficiencies, while achieving precision comparable to previous methods.
Paper: https://arxiv.org/abs/2406.10163
Code: https://github.com/buaacyw/MeshAnything
Project Page: https://buaacyw.github.io/mesh-anything/
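Not the authors' implementation, but a rough sketch of the idea the abstract describes: mesh faces are tokenized against a VQ-VAE codebook, and a decoder-only transformer predicts those tokens autoregressively conditioned on features of the input shape. All sizes, the cross-attention conditioning, and the assumed point-cloud encoder are illustrative choices, not details from the paper.

```python
# Illustrative sketch only: autoregressive mesh-token generation conditioned on a shape.
import torch
import torch.nn as nn

class ShapeConditionedMeshDecoder(nn.Module):
    def __init__(self, vocab_size=8192, d_model=512, n_heads=8, n_layers=8,
                 max_tokens=4096, shape_dim=768):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size + 1, d_model)   # +1 for an assumed EOS token
        self.pos_emb = nn.Embedding(max_tokens, d_model)
        self.shape_proj = nn.Linear(shape_dim, d_model)           # shape features -> conditioning tokens
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size + 1)

    def forward(self, mesh_tokens, shape_feats):
        # mesh_tokens: (B, T) indices from the VQ-VAE mesh vocabulary
        # shape_feats: (B, S, shape_dim) features of the target shape (e.g. from a point-cloud encoder)
        B, T = mesh_tokens.shape
        pos = torch.arange(T, device=mesh_tokens.device)
        x = self.token_emb(mesh_tokens) + self.pos_emb(pos)
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(mesh_tokens.device)
        h = self.decoder(x, memory=self.shape_proj(shape_feats), tgt_mask=causal)
        return self.head(h)  # next-token logits; trained with cross-entropy, sampled autoregressively

# Usage sketch: logits = model(tokens[:, :-1], shape_feats), loss against tokens[:, 1:].
```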
- Licensing Questions Plague Stability AI as SD3 Image Generator Gets Banned (decrypt.co)
A leading AI image generator is mired in controversy, and community hub CivitAI is shutting it out for now.
- Help with Running the Pixel Art Generator Locally (perchance.org: AI Pixel Art Generator, free, no sign-up, no limits)
AI pixel art maker. Create sprites, landscapes, portraits, characters, scenes. Make pixel art OCs, villains, RPG/DnD/fantasy/fictional characters from text, via Stable Diffusion - it's completely free, no sign-up needed. Can do pixel art, and various other styles. Get the AI to draw your pixel art c...
Hello,
I'm trying to run the pixel art generator (https://perchance.org/ai-pixel-art-generator) locally on my machine so that I can run it programmatically from Python. From what I've gathered (mainly from this post: https://lemmy.world/post/5926365), the model behind the generator is SD 1.5. However, I've tried running it locally (downloaded from https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main, tried both emaonly and pruned) and I can't seem to replicate the results.

It would be helpful to know the exact configuration used to prompt SD 1.5, to get some help setting up an API, or even a pointer to a GitHub repo with the code, which I haven't been able to find and which might not exist. I've tried to read all the documentation I could find but was not able to use any of the provided resources (like this one: https://perchance.org/diy-perchance-api, or this one: https://perchance.org/text-to-image-plugin). If anyone could help with any of the above, I would be eternally grateful <3.

I will also list everything I've pieced together so far on how the generator works, in case someone else finds it useful. The images generated with this configuration are similar in style to what comes out of the generator, but they are fundamentally different in quality: the ones from the generator are much better at depicting the prompt.
What I've gathered so far (a diffusers sketch of these settings follows the list):
- Model used: SD 1.5
- Width x Height: 512x512
- Sampling method (not a clue what this should be): DPM++ SDE
- Prompt: <prompt>, best pixel art, neo-geo graphical style, retro nostalgic masterpiece, 128px, 16-bit pixel art , 2D pixel art style, adventure game pixel art, inspired by the art style of hyper light drifter, masterful dithering, superb composition, beautiful palette, exquisite pixel detail
- Negative Prompt: glitched, deep fried, jpeg artifacts, out of focus, gradient, soft focus, low quality, poorly drawn, blur, grainy fabric texture, text, bad art, boring colors, blurry platformer screenshot
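Not the generator's actual backend (which isn't public), just a minimal diffusers sketch of the settings above. The example subject, step count, and guidance scale are assumptions, and "DPM++ SDE" is approximated with diffusers' DPMSolverSDEScheduler (which needs the torchsde package).

```python
# Sketch: prompt SD 1.5 from Python with the configuration gathered above.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverSDEScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DPMSolverSDEScheduler.from_config(pipe.scheduler.config)  # ~ "DPM++ SDE"

subject = "a knight on horseback"  # hypothetical example subject
prompt = (
    f"{subject}, best pixel art, neo-geo graphical style, retro nostalgic masterpiece, "
    "128px, 16-bit pixel art, 2D pixel art style, adventure game pixel art, "
    "inspired by the art style of hyper light drifter, masterful dithering, "
    "superb composition, beautiful palette, exquisite pixel detail"
)
negative = (
    "glitched, deep fried, jpeg artifacts, out of focus, gradient, soft focus, low quality, "
    "poorly drawn, blur, grainy fabric texture, text, bad art, boring colors, "
    "blurry platformer screenshot"
)
image = pipe(prompt, negative_prompt=negative, width=512, height=512,
             num_inference_steps=20, guidance_scale=7.5).images[0]  # steps/CFG are guesses
image.save("pixel_art.png")
```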
- Temporary Stable Diffusion 3 Ban (civitai.com)
Unfortunately, due to a lack of clarity in the license associated with Stable Diffusion 3, we are temporarily banning: all SD3-based models, all mo...
- How British tech star Stability AI imploded with debt and lawsuits (thetimes.com)
Unpaid bills, key talent leaving — the fall of Emad Mostaque’s billion-dollar company is a cautionary tale of AI mania
Without paywall: https://archive.ph/QD9v1
- The developer of Comfy, who also helped train some versions of SD3, has resigned from SAI
Excerpt from the relevant “ComfyUI dev” Matrix room:
matt3o: and what is it then?
comfyanonymous: "safety training"
matt3o: why does it trigger on certain keywords and it's like it's scrambling the image?
comfyanonymous: the 2B wasn't the one I had been working on so I don't really know the specifics
matt3o: I was even able to trick it by sending certain negatives
comfyanonymous: I was working on a T5-only 4B model which ironically would have been safer without breaking everything, because T5 doesn't know any image data, so it was only able to generate images in the distribution of the filtered training data
comfyanonymous: but they canned my 4B and I wasn't really following the 2B that closely
[…]
comfyanonymous: yeah they did something with the weights; the model arch of the 2B was never changed at all
BVH: weights directly? oh boy, abliteration, the worst kind
comfyanonymous: also they apparently messed up the pretraining on the 2B so it was never supposed to actually be released
[…]
comfyanonymous: yeah the 2B apparently was a bit of a failed experiment by the researchers that left, but there was a strong push by the top of the company to release the 2B instead of the 4B and 8B
Additional excerpt (after the Reddit post) from Stable Diffusion Discord “#sd3”:
comfy: Yes I resigned over 2 weeks ago and Friday was my last day at Stability
- EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
Video
Abstract
>Recent advancements in image generation have enabled the creation of high-quality images from text conditions. However, when facing multi-modal conditions, such as text combined with reference appearances, existing methods struggle to balance multiple conditions effectively, typically showing a preference for one modality over others. To address this challenge, we introduce EMMA, a novel image generation model accepting multi-modal prompts built upon the state-of-the-art text-to-image (T2I) diffusion model, ELLA. EMMA seamlessly incorporates additional modalities alongside text to guide image generation through an innovative Multi-modal Feature Connector design, which effectively integrates textual and supplementary modal information using a special attention mechanism. By freezing all parameters in the original T2I diffusion model and only adjusting some additional layers, we reveal an interesting finding that the pre-trained T2I diffusion model can secretly accept multi-modal prompts. This interesting property facilitates easy adaptation to different existing frameworks, making EMMA a flexible and effective tool for producing personalized and context-aware images and even videos. Additionally, we introduce a strategy to assemble learned EMMA modules to produce images conditioned on multiple modalities simultaneously, eliminating the need for additional training with mixed multi-modal prompts. Extensive experiments demonstrate the effectiveness of EMMA in maintaining high fidelity and detail in generated images, showcasing its potential as a robust solution for advanced multi-modal conditional image generation tasks.
Paper: https://arxiv.org/abs/2406.09162
Code: https://github.com/TencentQQGYLab/ELLA
Project Page: https://tencentqqgylab.github.io/EMMA/
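The released code linked above is for ELLA. As a purely illustrative sketch of the idea in the abstract (freeze the pretrained T2I model and train only a small connector that fuses text tokens with an extra modality), something like the following, with the base model, module names, and dimensions all assumed and not taken from the paper:

```python
# Illustrative sketch only: frozen T2I backbone + trainable multi-modal connector.
import torch
import torch.nn as nn
from diffusers import StableDiffusionPipeline

class FeatureConnector(nn.Module):
    """Hypothetical connector: text tokens cross-attend to extra-modality tokens."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, extra_tokens):
        fused, _ = self.attn(text_tokens, extra_tokens, extra_tokens)
        return self.norm(text_tokens + fused)   # becomes the UNet's conditioning

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
for module in (pipe.unet, pipe.text_encoder, pipe.vae):   # freeze the pretrained model
    module.requires_grad_(False)
connector = FeatureConnector()                              # only this is trained
optimizer = torch.optim.AdamW(connector.parameters(), lr=1e-4)
```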
- The Evolution of Image Gen with SD3 (civitai.com)
What a day! SD3 just landed and it's already been quite a ride, so gather 'round and take a moment to reflect on where we've been, where we are, a...
- Towards Pony Diffusion V7... I mean V6.9! (civitai.com)
Hello everyone, in my latest update on Pony Diffusion, I expressed my interest in leveraging SD3 for the upcoming V7, so let's talk about it! Toda...
- Stable Diffusion 3 Medium (stability.ai)
We are excited to announce the launch of Stable Diffusion 3 Medium, the latest and most advanced text-to-image AI model in our Stable Diffusion 3 series.
Stable Diffusion 3 Medium Weights: https://huggingface.co/stabilityai/stable-diffusion-3-medium
Stable Diffusion 3 Medium TensorRT Weights: https://huggingface.co/stabilityai/stable-diffusion-3-medium-tensorrt
ComfyUI Example Workflows: https://huggingface.co/stabilityai/stable-diffusion-3-medium/tree/main/comfy_example_workflows
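For diffusers users, loading the model looks roughly like this; the diffusers-format repo name, prompt, step count, and guidance scale below are assumptions (the weight links above are the ComfyUI-style checkpoints).

```python
# Minimal SD3 Medium sketch with diffusers (requires a release with SD3 support).
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
).to("cuda")
image = pipe(
    "a photo of a red fox in a snowy forest",  # hypothetical prompt
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3_medium.png")
```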
- Hackers Target AI Users With Malicious Stable Diffusion Tool on GitHub to Protest 'Art Theft' (404media.co)
An extension for a popular Stable Diffusion graphical user interface on Github appears to have been stealing users’ login credentials.
- Regions - Linking Prompts to Paint Layers - Krita AI Plugin
YouTube Video
>Regions are new in Krita Diffusion Plugin v1.18.0! Set up area-specific text prompts and control layers. They are linked to layer groups, and can be re-used throughout your entire workflow.
Website: https://www.interstice.cloud
GitHub: https://github.com/Acly/krita-ai-diffusion
- Project Odyssey: Announcing a $28,000 AI Filmmaking Competition (projectodyssey.ai)
A worldwide initiative to unite creators, communities, and companies across the AI and Film industries.
- lllyasviel/stable-diffusion-webui-forge: Forge Announcement (June 8) (github.com, Discussion #801)
> Hi forge users,
>
> Today the dev branch of upstream sd-webui has updated many progress about performance. Many previous bottlenecks should be resolved. As discussed here, we recommend a majority of users to change back to upstream webui (directly use webui dev branch or wait for the dev branch to be merged to main).
>
> At the same time, many features of forge (like unet-patcher and modern memory management) are considered to be too costly to be implemented in the current webui's ecosystem.
>
> Forge will then be turned into an experimental repo to mainly test features that are costly to integrate. We will experiment with Gradio 4 and add our implementation of a local GPU version of huggingface space's zero GPU memory management based on LRU process scheduling and pickle-based process communication in the next version of forge. This will lead to a new Tab in forge called "Forge Space" (based on the Gradio 4 SDK @spaces.GPU namespace) and another Tab titled "LLM".
>
> These updates are likely to break almost all extensions, and we recommend all users in production environments to change back to upstream webui for daily use.
>
> We invite a small group of users to stay here to test Gradio 4, since feedback and extensions for Gradio 4 are also necessary for upstream's considerations or adaptations, with regard to gradio's recent advancement in LLM interface and streaming system, image editors and displays, and Gradio sdk's seamless integration of zero-gpu computation management system.
>
> Finally, we recommend forge users to backup your files right now (or just change back to upstream webui if possible). If you mistakenly updated forge without being aware of this announcement, the last commit before this announcement is 29be1da
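For context on the "@spaces.GPU" reference: on Hugging Face ZeroGPU Spaces, that decorator marks functions that should have a GPU attached only while they run. A minimal sketch of the upstream pattern follows; Forge's planned local re-implementation may look different.

```python
# Sketch of the Hugging Face ZeroGPU pattern that Forge Space plans to mirror locally.
import gradio as gr
import spaces                      # Hugging Face ZeroGPU helper package
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

@spaces.GPU                        # a GPU is attached only for the duration of this call
def generate(prompt: str):
    return pipe.to("cuda")(prompt).images[0]

gr.Interface(generate, inputs="text", outputs="image").launch()
```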
- AIrjen/OneButtonPrompt Adds Anime Model Prompt Support (github.com)
Readme: https://github.com/AIrjen/OneButtonPrompt/blob/main/user_guides/anime_model_mode.md
- pOps: Photo-Inspired Diffusion Operators
Video
Abstract
>Text-guided image generation enables the creation of visual content from textual descriptions. However, certain visual concepts cannot be effectively conveyed through language alone. This has sparked a renewed interest in utilizing the CLIP image embedding space for more visually-oriented tasks through methods such as IP-Adapter. Interestingly, the CLIP image embedding space has been shown to be semantically meaningful, where linear operations within this space yield semantically meaningful results. Yet, the specific meaning of these operations can vary unpredictably across different images. To harness this potential, we introduce pOps, a framework that trains specific semantic operators directly on CLIP image embeddings. Each pOps operator is built upon a pretrained Diffusion Prior model. While the Diffusion Prior model was originally trained to map between text embeddings and image embeddings, we demonstrate that it can be tuned to accommodate new input conditions, resulting in a diffusion operator. Working directly over image embeddings not only improves our ability to learn semantic operations but also allows us to directly use a textual CLIP loss as an additional supervision when needed. We show that pOps can be used to learn a variety of photo-inspired operators with distinct semantic meanings, highlighting the semantic diversity and potential of our proposed approach.
Paper: https://arxiv.org/abs/2406.01300
Code: https://github.com/pOpsPaper/pOps
Demo: https://huggingface.co/spaces/pOpsPaper/pOps-space
Project Page: https://popspaper.github.io/pOps/
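A purely illustrative sketch of the core idea, not the pOps code: an operator is trained directly on CLIP image embeddings. In the paper it is a fine-tuned Diffusion Prior; a plain regressor stands in here to keep the sketch short, and all names and dimensions are assumptions.

```python
# Illustrative sketch only: a semantic operator over CLIP image embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingOperator(nn.Module):
    """Maps two input CLIP image embeddings to an output image embedding."""
    def __init__(self, dim=1024, hidden=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, emb_a, emb_b):
        return self.net(torch.cat([emb_a, emb_b], dim=-1))

op = EmbeddingOperator()
opt = torch.optim.AdamW(op.parameters(), lr=1e-4)
# One toy step: inputs (a, b) -> target embedding of the desired composed photo.
emb_a, emb_b, target = torch.randn(4, 1024), torch.randn(4, 1024), torch.randn(4, 1024)
loss = F.mse_loss(op(emb_a, emb_b), target)
loss.backward()
opt.step()
# The predicted embedding is then decoded to pixels with an unCLIP- or IP-Adapter-style decoder.
```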
- Higher quality images by prompting individual UNet blocks
YouTube Video
- BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
YouTube Video
Abstract
>Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this work, we develop a novel weight quantization method that quantizes the UNet from Stable Diffusion v1.5 to 1.99 bits, achieving a model with 7.9X smaller size while exhibiting even better generation quality than the original one. Our approach includes several novel techniques, such as assigning optimal bits to each layer, initializing the quantized model for better performance, and improving the training strategy to dramatically reduce quantization error. Furthermore, we extensively evaluate our quantized model across various benchmark datasets and through human evaluation to demonstrate its superior generation quality.
Paper: https://arxiv.org/abs/2406.04333
Code: https://github.com/snap-research/BitsFusion (coming soon)
Project Page: https://snap-research.github.io/BitsFusion/
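The BitsFusion code isn't out yet; as a toy sketch of the general direction (per-layer bit allocation plus uniform quantization), the snippet below uses a crude variance heuristic that is purely an assumption. The paper's actual sensitivity analysis, quantized-model initialization, and training tricks are not shown.

```python
# Toy sketch only: mixed-precision weight quantization with per-layer bit widths.
import torch

def quantize_uniform(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

def assign_bits(weights, bit_choices=(1, 2, 3)):
    """Crude stand-in heuristic: layers with higher weight variance get more bits."""
    order = sorted(weights, key=lambda name: weights[name].var().item())
    bits = {}
    for i, name in enumerate(order):
        bits[name] = bit_choices[min(i * len(bit_choices) // len(order), len(bit_choices) - 1)]
    return bits

# Hypothetical layer names standing in for a UNet state dict.
weights = {"down.0.conv": torch.randn(64, 64, 3, 3), "mid.attn.qkv": torch.randn(512, 512)}
for name, b in assign_bits(weights).items():
    weights[name] = quantize_uniform(weights[name], b)
```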
- jhj0517/stable-diffusion-webui-MusePose (github.com)
MusePose extension for stable-diffusion-webui.
MusePose extracts a pose video from an input video and uses it, together with a reference image, to animate generated characters.
- Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation
Abstract
>Controllable text-to-image (T2I) diffusion models have shown impressive performance in generating high-quality visual content through the incorporation of various conditions. Current methods, however, exhibit limited performance when guided by skeleton human poses, especially in complex pose conditions such as side or rear perspectives of human figures. To address this issue, we present Stable-Pose, a novel adapter model that introduces a coarse-to-fine attention masking strategy into a vision Transformer (ViT) to gain accurate pose guidance for T2I models. Stable-Pose is designed to adeptly handle pose conditions within pre-trained Stable Diffusion, providing a refined and efficient way of aligning pose representation during image synthesis. We leverage the query-key self-attention mechanism of ViTs to explore the interconnections among different anatomical parts in human pose skeletons. Masked pose images are used to smoothly refine the attention maps based on target pose-related features in a hierarchical manner, transitioning from coarse to fine levels. Additionally, our loss function is formulated to allocate increased emphasis to the pose region, thereby augmenting the model's precision in capturing intricate pose details. We assessed the performance of Stable-Pose across five public datasets under a wide range of indoor and outdoor human pose scenarios. Stable-Pose achieved an AP score of 57.1 in the LAION-Human dataset, marking around 13% improvement over the established technique ControlNet.
Paper: https://arxiv.org/abs/2406.02485
Code: https://github.com/ai-med/StablePose
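As a rough sketch of what "coarse-to-fine attention masking" toward the pose region could look like, not the authors' implementation: the bias form, mask schedule, and tensor shapes below are assumptions for illustration only.

```python
# Illustrative sketch only: bias ViT self-attention toward pose-covered patches.
import torch
import torch.nn.functional as F

def pose_masked_attention(q, k, v, pose_mask, strength):
    # q, k, v: (B, heads, N, d) patch tokens; pose_mask: (B, N), 1.0 on pose patches.
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    bias = (1.0 - pose_mask)[:, None, None, :] * (-strength)   # penalize non-pose keys
    return F.softmax(scores + bias, dim=-1) @ v

# Coarse-to-fine: early stages use a dilated (coarse) pose mask and a strong bias;
# later stages shrink the mask toward the skeleton and reduce the bias.
q = k = v = torch.randn(2, 8, 196, 64)
mask = (torch.rand(2, 196) > 0.7).float()
out = pose_masked_attention(q, k, v, mask, strength=5.0)
```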