Misplaced Pages

Talk:Stable Diffusion: Difference between revisions

Revision as of 00:18, 1 November 2022 by Benlisquare (Regarding Square Brackets as Negative Prompt: clarify that this is not exhaustive)
Revision as of 23:43, 11 November 2022 by Smeagol 17 (Image variety)

Revision as of 23:43, 11 November 2022

This article has not yet been rated on Misplaced Pages's content assessment scale.
It is of interest to the following WikiProjects:
Please add the quality rating to the {{WikiProject banner shell}} template instead of this project banner. See WP:PIQA for details.
WikiProject Computing: Software / CompSci (Low-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Misplaced Pages. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Low-importance on the project's importance scale.
This article is supported by WikiProject Software (assessed as Mid-importance).
This article is supported by WikiProject Computer science (assessed as Low-importance).
WikiProject Technology
This article is within the scope of WikiProject Technology, a collaborative effort to improve the coverage of technology on Misplaced Pages. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

Not Open Source

The license has usage restrictions, and therefore does not meet the Open Source Definition (OSD):

https://opensource.org/faq#restrict
https://stability.ai/blog/stable-diffusion-public-release

Nor is the "Creative ML OpenRAIL-M" license OSI-approved:

https://opensource.org/licenses/alphabetical

It would be correct to refer to it as "source available" or perhaps "ethical source", but it certainly isn't Open Source.

Gladrim (talk) 12:40, 7 September 2022 (UTC)

This is my understanding as well, and I thought about editing this article to reflect this. However I'm not sure how to do this in a way that is compliant with WP:NOR, as the Stability press release clearly states that the model is open source and I have been unable to find a WP:RS that clearly contradicts that specific claim. The obvious solution is to say "Stability claims it is open source" but even that doesn't seem appropriate given the lack of sourcing saying anything else (after all, the explicit purpose of that language is to cast implicit doubt on the claim). I have a relatively weak understanding of Misplaced Pages policy and would be more than happy if someone can point to evidence that correcting this claim would be consistent with Misplaced Pages policy, but at the current moment I don't see a way to justify it.
It's also worth noting that the OSI-approved list hasn't been updated since Stable Diffusion came out, and SD is the first model to be released with this license as far as I can tell. Thus the lack of endorsement is not evidence of non-endorsement. Perhaps we could say "Stability claims it is open source, though OSI has not commented on the novel license" (this is poorly worded but you get my point)
Stellaathena (talk) 17:41, 7 September 2022 (UTC)
The license is adapted from Open RAIL-M (Responsible AI Licenses), in which the 'M' means the usage restrictions apply only to the published Model or derivatives of the Model, not to the source code.
Open RAIL has various types of licenses available: RAIL-D (use restrictions apply only to the Data), RAIL-A (use restrictions apply only to the application/executable), RAIL-M (use restrictions apply only to the Model), and RAIL-S (use restrictions apply only to the Source code). These can be combined in D-A-M-S order, e.g. RAIL-DAMS, RAIL-MS, RAIL-AM.
The term 'Open' can be added to a license name to clarify that the license is royalty-free and that the works and subsequent derivative works can be re-licensed 'as long as the Use Restrictions similarly apply to the relicensed artifacts'.
"
Open RAIL Licenses
Does a RAIL License include open-access/free-use terms, akin to what is used with open source software?
If it does, it would be helpful for the community to know upfront that the license promotes free use and re-distribution of the applicable artifact, albeit subject to Use Restrictions. We suggest the use of the prefix "Open" to each RAIL license to clarify, on its face, that the licensor offers the licensed artifact at no charge and allows licensees to re-license such artifact or any subsequent derivative works as they choose, as long as the Use Restrictions similarly apply to the relicensed artifacts and its subsequent derivatives. A RAIL license that does not offer the artifact royalty-free and/or does not permit downstream licensing of the artifact or derivative versions of it in any form would not use the “Open” prefix." source
so technically the source code is 'Open Source'
Maybe a useful link:
https://huggingface.co/blog/open_rail
https://www.licenses.ai/ai-licenses
https://www.licenses.ai/blog/2022/8/26/bigscience-open-rail-m-license Dorakuthelekor (talk) 23:04, 17 September 2022 (UTC)
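For anyone who finds the naming scheme easier to follow as code, here is a minimal sketch of the convention described in the comment above: artifact letters are combined in D-A-M-S order, and an "Open" prefix marks royalty-free, re-licensable terms. The function name `rail_license_name` and the exact spacing of the "Open" prefix are assumptions made for illustration; nothing here comes from an official tool.

```python
# Canonical artifact order as described above: Data, Application, Model, Source code.
ARTIFACT_ORDER = "DAMS"


def rail_license_name(artifacts, open_terms=False):
    """Build a RAIL license label such as 'RAIL-MS' or 'Open RAIL-M'.

    `artifacts` is any iterable of single letters from D, A, M, S
    (hypothetical helper, not part of any official RAIL tooling).
    """
    letters = {a.upper() for a in artifacts}
    unknown = letters - set(ARTIFACT_ORDER)
    if not letters or unknown:
        raise ValueError(f"expected letters from {ARTIFACT_ORDER!r}, got {sorted(letters)}")
    # Emit the letters in canonical D-A-M-S order regardless of input order.
    suffix = "".join(ch for ch in ARTIFACT_ORDER if ch in letters)
    name = f"RAIL-{suffix}"
    return f"Open {name}" if open_terms else name


print(rail_license_name("MA"))                  # letters are reordered to A, M
print(rail_license_name("M", open_terms=True))  # the variant discussed in this thread
```

Under this reading, the "M" in the Stable Diffusion license only says which artifact the use restrictions attach to; it says nothing by itself about whether the license meets the Open Source Definition.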
It is definitely not open source, and to describe it that way is misleading. Ichiji (talk) 15:56, 3 October 2022 (UTC)

Gallery of examples?

The Japanese language page has a gallery of various examples that Stable Diffusion can create, perhaps we should do the same to showcase a few examples for people to see. I'd be curious to hear others weigh in. Camdoodlebop (talk) 00:57, 11 September 2022 (UTC)

The built-in Batch, Matrix and XY plot functions are great for this. Please feel free to use this example for the Img2Img section to explain parameters: https://i.imgur.com/I6I4AGu.jpeg Here I've used an original photo of a dirty bathroom window and transformed it using the prompt "(jean-michel basquiat) painting" using various CFG and denoising 73.28.226.42 (talk) 16:42, 8 October 2022 (UTC)
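As a rough illustration of what an X/Y plot over CFG scale and denoising strength enumerates, the sketch below builds one job per grid cell. The function name, the dictionary keys, and the numeric values are all hypothetical; they are not taken from any particular Stable Diffusion client, and the actual image generation step is left out.

```python
from itertools import product


def xy_plot_jobs(prompt, cfg_scales, denoising_strengths):
    """Return one settings dict per grid cell, row by row.

    Rows vary the denoising strength and columns vary the CFG scale
    (hypothetical layout chosen for this sketch).
    """
    return [
        {"prompt": prompt, "cfg_scale": cfg, "denoising_strength": dn}
        for dn, cfg in product(denoising_strengths, cfg_scales)
    ]


# Illustrative values only; the example image linked above may have used different ones.
jobs = xy_plot_jobs("(jean-michel basquiat) painting", [5.0, 7.5, 10.0], [0.3, 0.6])
for job in jobs:
    print(job)
```

Each job would be run against the same source image and seed, so that any difference between cells can be attributed to the two parameters being varied.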

Image variety

Benlisquare, I appreciate all the work you've done expanding this article, including the addition of images, but I think the article would be improved if we could get a greater variety of subject matter in the examples. To be honest, I think any amount of "cute anime girl with eye-popping cleavage" content has the potential to raise the hackles of readers who are sensitive to the well-known biases of Misplaced Pages's editorship, so it might be better to avoid that minefield altogether. At the very least though, we should strive for variety.

I was thinking about maybe replacing the inpainting example with figure 12 from the latent diffusion paper, but that's not totally ideal since it's technically not the output of Stable Diffusion itself (but rather a model trained by LMU researchers under very similar conditions, though I think with slightly fewer parameters). Colin M (talk) 21:49, 28 September 2022 (UTC)

My rationale for leaving the examples as-is is threefold:
  1. Firstly, based on my completely anecdotal and non-scientific experimentation from generating 9,500+ images (approx. 11GB+) using SD, non-photorealistic images play best with the ability of img2img to upscale images and fill in tiny, closer details without the final result appearing too uncanny to the human eye, which is why I opted to generate a non-photorealistic image of a person for my inpainting/outpainting example. Sure, we theoretically could leave all our demonstration examples as 512x512 images (akin to how the majority of example images throughout that paper were small squares), but my spicy and highly subjective take on this is, why not strive for better? If we can generate high-detail, high-resolution images, then we may as well do so. The technology exists, the software exists, the means to go above and beyond exists. At least, that's how I feel.
  2. Specifically regarding figure 12 from that paper, it makes no mention as to whether or not the original inpainted images are generated through txt2img which were then inpainted using img2img, or whether they used img2img to inpaint an existing real-world photograph. If it is the latter, then we'd run into issues concerning Commons:Derivative works. At least with all of the txt2img images that I generate, I can guarantee that there wouldn't be any concern in this area, as long as I don't outright prompt to generate a copyrighted object like the Eiffel Tower or Duke Nukem or something.
  3. Finally, I don't particularly think the systemic bias issue on this page is that severe. Out of the four images currently on this article, we have a photorealistic image of an astronaut, an architectural diagram, and two demonstration images containing artworks featuring non-photorealistic women. From my perspective, I don't think that's at the point of concern. Of course, if you still express concern in spite of my assurances, given time, I could generate another 10+ row array of different txt2img prompts featuring a different subject, but it'll definitely take me quite some time to finetune and perfect to a reasonable standard (given the unpredictability of txt2img outputs). As a sidenote, the original 13-row array I generated was 300MB+ with dimensions of 14336 x 26624 pixels, and the filesize limit for uploading to Commons was 100MB, which is why I needed to split the image into four parts.
Let me know of your thoughts, @Colin M. Cheers, --benlisquareTCE 03:08, 29 September 2022 (UTC)
Actually, now that I think about it, would you be keen on a compromise where I generate a fifth image, either containing a landscape, or an object, or a man, to demonstrate how negative prompting works, as a counterbalance to the images already present? The final result would be something like this: Astronaut in the infobox, diagram under "Architecture", the 13-row matrix comparing art styles under "Usage" (right-align), some nature landscape or urban skyline image under "Text to image generation" (left-align), the inpainting/outpainting demonstration under "Inpainting and outpainting" (right-align). I'm open to adjustments if suggested, of course. --benlisquareTCE 03:33, 29 September 2022 (UTC)
Regarding your point 1:
  1. I don't think we're obliged to carefully curate prompts and outputs that give the nicest possible results. We're trying to document the actual capabilities of the model, not advertise it. Seeing the ways that the model fails to generate photorealistic faces, for example, could be very helpful to the reader's understanding.
  2. Even if we accept the reasoning of your point 1, that's merely an argument for showing examples in a non-photorealistic style. But why specifically non-photorealistic images of sexualized young women? Why not cartoonish images of old women, or sharks, or clocktowers, or literally anything else? It's distracting and borderline WP:GRATUITOUS.
Colin M (talk) 04:19, 29 September 2022 (UTC)
Creating the inpainting example took me quite a few hours worth of trial-and-error, given that for any satisfactory img2img output obtained one would need to cherrypick through dozens upon dozens of poor quality images with deformities and physical mutations, so I hope you can understand why I might be a bit hesitant with replacing it. Yes, I'm aware that's not a valid argument for keeping or not keeping something, I'm merely providing my viewpoint. As for WP:GRATUITOUS, I don't think that particularly applies, the subject looks like any other youthful woman one would find on the street in inner-city Melbourne during office hours, but I can understand the concern that it may reflect poorly on the systemic bias of Misplaced Pages's editorbase. Hence, my suggested solution to that issue would be to balance it out with more content, since there's always room for prose and image expansion. --benlisquareTCE 06:01, 29 September 2022 (UTC)
I've gone ahead and added the landscape art demonstration for negative prompting to the article. When generating these, this time I've specifically left in a couple of visual defects (e.g. roof tiles appearing out of nowhere from inside a tree, and strange squiggles appearing on the sides of some columns), because what you mentioned earlier about also showcasing Stable Diffusion's flaws and imperfections does also make sense. There are two potential ways we can lay these out, at least with the current amount of prose we have (which, one would hope, will increase): this revision and this revision. Which would seem preferable? --benlisquareTCE 06:05, 29 September 2022 (UTC)
+1 on avoiding the exploitive images. The history of AI is rife with them, let's not add to that. Ichiji (talk) 15:59, 3 October 2022 (UTC)
I agree with Ichiji that editors should not be adding "exploitive images". I also agree with Colin M above in questioning why editors would be adding "images of sexualized young women" to this article. And I agree with Ovinus below that "we shouldn't have a preponderance of pictures appealing to the male gaze." In a moment I will be removing four unsourced, unnecessary, user-generated images created with "Prompt: busty young girl ..." We are not obligated to host anyone's "busty young girl" image collection. PLEASE NOTE: We don't need four editors to take over a week to disagree with someone spamming into this article 95 user-generated images from their "busty young girl" image collection. The obligation to provide sources and gain consensus is on the editor who wants their content included. An editor adding 90+ unsourced, user-generated images to an article is obvious spam and can be just removed on sight. Elspea756 (talk) 17:09, 8 October 2022 (UTC)
Nonsense. The consensus takes time and thorough discussion, there is no WP:DEADLINE to rush anything through without dialogue. Also, consider reading WP:SPAM to see what spam actually is. Your allegations are bordering upon personal attacks here. --benlisquareTCE 18:22, 8 October 2022 (UTC)
If certain types of images are characteristic of the usage of AI in general or this particular program in particular, why should this article pretend otherwise? Of course, it would be ideal if they were published in some RS first, but this is to be balanced with other concerns, like permissive licenses. See Visual novel article illustrations, for example. Smeagol 17 (talk) 10:12, 16 October 2022 (UTC)
So, how about returning at least Inpainting and Outpainting images? They were very illustrative. Smeagol 17 (talk) 23:43, 11 November 2022 (UTC)

Using images to promote unsourced opinions

I've removed two different versions of editors trying to use images to promote unsourced legal opinions and other viewpoints. Please, just use reliable sources that support the claim that these images illustrate these opinions, if those sources exist. You can't just place an image and claim that it illustrates an unsourced opinion. Thanks. Elspea756 (talk) 15:54, 6 October 2022 (UTC)

"But Stable Diffusion’s lack of safeguards compared to systems like DALL-E 2 poses tricky ethical questions for the AI community. Even if the results aren’t perfectly convincing yet, making fake images of public figures opens a large can of worms." - TechCrunch. "And three weeks ago, a start-up named Stable AI released a program called Stable Diffusion. The AI image-generator is an open-source program that, unlike some rivals, places few limits on the images people can create, leading critics to say it can be used for scams, political disinformation and privacy violations." - Washington Post. I don't know what additional convincing you need. As for your edit summary of "Removing unsourced claim that it is a "common concern" that this particular image might mislead people to believe this is an actual photograph of Vladimir Putin", nowhere in the caption was that ever mentioned, that's purely your own personal interpretation that completely misses the mark of what the caption meant. --benlisquareTCE 16:02, 6 October 2022 (UTC)
Thank you for discussing here on the talk page. I see images of Barack Obama and Boris Johnson included in that TechCrunch article, so those do seem to illustrate the point you are trying to make and are supported by the source you are citing. Can we agree to replace the previously used unsourced image with either that Barack Obama image or series of Boris Johnson images? Elspea756 (talk) 16:06, 6 October 2022 (UTC)
That would not be suitable, because those images of Boris Johnson and Barack Obama are copyrighted by whoever created those images in Stable Diffusion and added those to the TechCrunch article. Per the WP:IUP and WP:NFCC policies, we do not use non-free images if a free-licence image is already available. A free licence image is available, because I literally made one, and released it under a Creative Commons licence. --benlisquareTCE 16:09, 6 October 2022 (UTC)
OK, now I understand why you feel so strongly about this image, it's because as you say you "literally made" an image and now you want to include your image in this wikipedia article. I hope you can understand you are not a neutral editor when it comes to decisions about this image you "literally made", that you have a conflict of interest here, and shouldn't be spamming your image you made into this article. Your image you are spamming into this article does not accurately illustrate the topic, so it should be removed. Elspea756 (talk) 16:15, 6 October 2022 (UTC)
It's your prerogative to gain WP:CONSENSUS for your revert, given that you are the reverting party. If you can convince me, and the wider community of editors, that your revert is justified, then I will by all means happily agree with your revert. --benlisquareTCE 16:17, 6 October 2022 (UTC)
Nope, it is your obligation to provide sources and gain consensus for your "image made literally 27 minutes ago ffs." We have no obligation to host your "image made literally 27 minutes ago ffs." Elspea756 (talk) 16:21, 6 October 2022 (UTC)
Point to me the Misplaced Pages policy that says this. Almost all image content on Misplaced Pages is user self-created, anyway; your idea that Misplaced Pages editors cannot upload their own files to expand articles is completely nonsensical. All of your arguments have not been grounded in any form of Misplaced Pages policy; rather, they are exclusively grounded in subjective opinion, and a misunderstanding of how Misplaced Pages works. "We" "Your" - my brother in Christ, you joined Misplaced Pages on 2021-06-14, it's wholly inappropriate for you to be condescending as if you were the exclusive in-group participant here. --benlisquareTCE 16:23, 6 October 2022 (UTC)
As I've said, you have a very clear conflict of interest here. It is very evident from your language choices here, writing "ffs," "Christ," etc., that you are not a neutral editor and that you feel very strongly about spamming your unsourced, user-generated content here. I understand very clearly where you are coming from now. There is no need for you to continually restate your opinions with further escalating profanity. Elspea756 (talk) 16:34, 6 October 2022 (UTC)
I totally agree with Elspea756 and removed some images. This is clearly original research; while it is reasonable to be more lax with WP:OR as it applies to images, WP:OI (quite reasonably) states: so long as they do not illustrate or introduce unpublished ideas or arguments. Commentary on the differences between specific pictures is very different than something like File:Phospholipids aqueous solution structures.svg, which is inspired by existing diagrams and does not introduce "unpublished ideas". Yes, the idea that "AI images are dependent on the input", is published; no, no one independent has analyzed these specific pictures. Also, using AI-generated art with prompts asking to emulate specific artists' styles is not only blatantly unethical, but also potentially a copyright violation; that it is legally acceptable is not yet established. Finally, we shouldn't have a preponderance of pictures appealing to the male gaze. There are thousands, millions of potential subjects, and there is nothing special about these. Ovinus (talk) 01:47, 8 October 2022 (UTC)
This thread is specifically in reference to the Putin image used in the "Societal impacts" section, however. The disagreement here is whether or not it's appropriate to use the Putin image to illustrate the ethics concerns raised in the TechCrunch article; my position is that we cannot use the Boris Johnson image from the TechCrunch article as that would fall afoul of WP:NFCC. As discussed in a previous thread, I had already planned to replace a few of the sample images in the article with ones that are less entwined with the female form and/or male gaze, I just haven't found the time to do so yet, since creating prompts of acceptable quality is more time-consuming than most might actually assume. --benlisquareTCE 02:07, 8 October 2022 (UTC)
I understand, but the images in this article are broadly problematic, not just the Putin image. It's quite arguably a WP:BLP violation, actually. A much less controversial alternative could be chosen; for example, using someone who's been dead for a while. Ovinus (talk) 02:12, 8 October 2022 (UTC)
In that case, that's definitely an easy job to fix. I'll figure out an alternative deceased person in due time. --benlisquareTCE 02:15, 8 October 2022 (UTC)
Thank you, Ovinus, for "total agree"ment that these spam images are a problem and that they are "clearly original research." In a moment I will be once again removing the unsourced, inaccurate image of Vladimir Putin from this article that has been repeatedly spammed into this article. Besides being obvious spam and unsourced original research, it is also a non-neutral political illustration created to express an individual wikipedia editor's personal point of view, and its subject is a living person so this violates our policy on biographies of living persons. Once again, the obligation to provide sources and gain consensus is on the editor who wants their content included. We do not need a week-long discussion before removing an unsourced user-generated spam image expressing a personal political viewpoint about a living person. Elspea756 (talk) 13:22, 13 October 2022 (UTC)
I wouldn't call it spam. Benlisquare is clearly here in good faith. Ovinus (talk) 14:42, 13 October 2022 (UTC)

External links

The AUTOMATIC1111 fork of Stable Diffusion is indubitably the most popular client for Stable Diffusion. It should definitely have its place in the external links section. Thoughts? Leszek.hanusz (talk) 16:26, 7 October 2022 (UTC)

Reddit comments aren't reliable for anything, and Misplaced Pages is WP:NOT a link directory. We should not be providing links to clients at all. MrOllie (talk) 16:30, 7 October 2022 (UTC)
This is just one metric, it has more than 7K stars on GitHub, what more do you want? Do you actually use Stable Diffusion yourself? It is now THE reference. Leszek.hanusz (talk) 16:37, 7 October 2022 (UTC)
GitHub stars (or any form of social media likes) are also indicative of precisely nothing. What I want is that you do not advertise on wikipedia by adding external links to your own project. - MrOllie (talk) 16:38, 7 October 2022 (UTC)
Automatic1111 is not my own project, it has nothing to do with me. http://diffusionui.com is my own project and I agree it should not be in the external links. Leszek.hanusz (talk) 16:51, 7 October 2022 (UTC)
I agree with MrOllie. Nothing here (reddit comments, GitHub stars) is the type of sourcing that would suggest this should be included in this article. Elspea756 (talk) 16:53, 7 October 2022 (UTC)
I think it's evident in that nearly all published/shared prompts for SD use the parentheses/brackets/prompt-editing syntactic sugar, which is a feature exclusive to Automatic1111-webui's version. That should be a good indicator of its popularity if you can't use GitHub stats for some reason. 73.28.226.42 (talk) 13:41, 10 October 2022 (UTC)

Wiki Education assignment: WRIT 340 for Engineers - Fall 2022 - 66826

This article was the subject of a Wiki Education Foundation-supported course assignment, between 22 August 2022 and 2 December 2022. Further details are available on the course page. Student editor(s): Bruhjuice, Aswiki1, Johnheo1128, Kyang454 (article contribs).

— Assignment last updated by 1namesake1 (talk) 23:38, 17 October 2022 (UTC)

Developer(s)

The infobox and article present Stability AI as developer. However, this is incorrect:

- Stable Diffusion is essentially the same approach as the Latent Diffusion Models (LDM) developed by the CompVis group at LMU Munich and a coauthor from Runway. Patrick Esser, one of the license holders of Stable Diffusion (https://github.com/CompVis/stable-diffusion/blob/main/LICENSE): „When we wrote that paper we showed that it actually works, nicely! Then it was like - can we scale this up? And that led us to Stable Diffusion. It is really the same model, slight changes but not too essential. Just on a bigger scale in terms of our resources.“, https://research.runwayml.com/the-research-origins-of-stable-difussion

Comparing the code from the LDM and SD github confirms this. Moreover, the depictions and explanations of the approach on this Misplaced Pages article are all discussing the ideas of the original approach.

- Both the source code as well as the models had so far been released by the CompVis group at LMU Munich (https://github.com/CompVis/stable-diffusion). The license is issued on their Github (https://github.com/CompVis/stable-diffusion/blob/main/LICENSE) by the original authors of https://arxiv.org/abs/2112.10752, Rombach et al.

- The CompVis Github lists the contribution of Stability AI as providing funding for compute, stating that “Thanks to a generous compute donation from Stability AI and support from LAION, we were able to train a Latent Diffusion Model on 512x512 images from a subset of the LAION-5B database.“

- Cristobal Valenzuela (CEO of Runway): “This version of Stable Diffusion is a continuation of the original High-Resolution Image Synthesis with Latent Diffusion Models work that we created and published (now more commonly referred to as Stable Diffusion) we thank Stability AI for the compute donation to retrain the original model“, https://huggingface.co/runwayml/stable-diffusion-v1-5/discussions/1

- The GitHub and Hugging Face pages of CompVis, Runway, etc. cite https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html as the reference to the Stable Diffusion approach.


TL;DR

The approach was actually developed by the CompVis group at LMU Munich (leading authors) + a coauthor from Runway. The references above show that the team then made only minor modifications to retrain essentially their original approach on a larger dataset. For this retraining, Stability AI donated the compute on AWS servers. However, donating compute for a model built by another research team does not make them the (sole) developer. Instead, all repositories are crediting the original authors. 89.206.112.10 (talk) 19:40, 21 October 2022 (UTC)

Regarding Square Brackets as Negative Prompt

This is my first time contributing to a discussion, so please be understanding if I'm not following etiquette properly. I read the talk guidelines but it assumes a rather high level of familiarity of these systems, which I do not have.

In any case, I just wanted to start a discussion regarding the interpretation of citation 16. If I am reading the source material correctly, brackets around a keyword actually create emphasis around the keyword rather than a negative correlation. It states: "The result of a qauntitative analysis is that square brackets in combination with the aforementioned prompt have a small but statistically significant positive effect. No effect or a negative effect can be excluded with 99.98% confidence." It goes on to state that with very specific prompt engineering square brackets can be used to create an inconsistent and negligible effect.

It then discusses exclamation points and that they do seem to have some negative effect on reducing the appearance of certain keywords in images. Since I am new to both contributing to Misplaced Pages and to Stable Diffusion I wanted to see if someone smarter than me could confirm my interpretation of the source material before making the corrections to the article. Thank you. — Preceding unsigned comment added by Abidedadude (talk • contribs) 19:58, 30 October 2022 (UTC)

By design (as mentioned here, here and here as examples; note these are just for use as examples, I'd strongly recommend not citing them in any serious work), parentheses, for example (Tall) man with (((brown))) hair, increases emphasis, while square brackets, for example car with ]], decreases emphasis. Gaessler's findings suggest that while attempting to decrease the occurrence of something via architecture, it actually has a "small but statistically positive effect" yet also "not perceptible to humans" based on his data and methodology; meanwhile, the use of rainy!! weather to emphasise (i.e. increase the occurrence of; since ! can only be used for emphasis and not de-emphasis) was not very coherent and resulted in a high chi2/NDF, and the use of negative prompts to decrease the occurrence of keywords resulted in a highly statistically significant change in outcome. --benlisquareTCE 02:31, 31 October 2022 (UTC)
I probably should also point out that some Stable Diffusion GUI implementations might use different formatting rules for emphasis; for example, NovelAI (which is a custom-modified, anime/furry-oriented model checkpoint of Stable Diffusion hosted on a SaaS cloud service) uses curly brackets, for example {rusty} tractor, for positive emphasis instead of parentheses. Not all Stable Diffusion implementations will process prompt formatting in the same way. --benlisquareTCE 02:56, 31 October 2022 (UTC)
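As a rough illustration of the marker syntax described above, here is a minimal Python sketch of how a frontend might turn nesting depth into a per-chunk weight. The multiplier of 1.1 per level, the function name parse_emphasis, and the keyword "rust" in the usage example are all assumptions made for illustration; they are not taken from the cited source, and actual UI implementations may parse prompts differently.

```python
EMPHASIS_FACTOR = 1.1  # assumed per-level multiplier; real frontends may use another value


def parse_emphasis(prompt, factor=EMPHASIS_FACTOR):
    """Split a prompt into (text, weight) chunks based on bracket nesting depth.

    Each enclosing "(" multiplies the weight by `factor`; each enclosing "["
    divides it by `factor`. This is a hypothetical sketch, not any UI's parser.
    """
    chunks = []
    depth_round = 0   # current nesting depth of parentheses
    depth_square = 0  # current nesting depth of square brackets
    buffer = []

    def flush():
        # Emit the buffered text with the weight implied by the current depths.
        text = "".join(buffer).strip()
        if text:
            weight = factor ** (depth_round - depth_square)
            chunks.append((text, round(weight, 4)))
        buffer.clear()

    for ch in prompt:
        if ch == "(":
            flush()
            depth_round += 1
        elif ch == ")":
            flush()
            depth_round = max(depth_round - 1, 0)
        elif ch == "[":
            flush()
            depth_square += 1
        elif ch == "]":
            flush()
            depth_square = max(depth_square - 1, 0)
        else:
            buffer.append(ch)
    flush()
    return chunks


if __name__ == "__main__":
    print(parse_emphasis("(Tall) man with (((brown))) hair"))
    print(parse_emphasis("car with [[rust]]"))  # "rust" is a made-up keyword
```

Under these assumptions, three levels of parentheses give a weight of about 1.331 and two levels of square brackets about 0.826, which may help explain why the visible effect of a single marker is so small.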
X/Y plot demonstrating how deemphasis and emphasis markers work

In case it satisfies your curiosity, I've just generated this X/Y plot in Stable Diffusion to demonstrate to you how (emphasis) prompting works in practice. In my personal opinion, you can barely see any visual difference at all among the , so usually I don't see much point in using them at all while prompting. Of course, all of this is original research, so this is just an FYI explanation, nothing more and nothing less. --benlisquareTCE 03:48, 31 October 2022 (UTC)

That's definitely interesting. I do like experimenting with prompts and learning about other people's experiences. You're saying that your info is based on NovelAI's custom implementation, yes? If so, perhaps it would be better to put the emphasis/de-emphasis info in the article about NovelAI? Because it seems the default checkpoint for Stable Diffusion hasn't been trained in any specific way to handle square brackets, and the citation in question doesn't really seem to support the assertion in the article about emphasis. In any case, perhaps it's all moot if everybody seems to more or less agree that the effect is imperceptible, even if it does exist. Maybe this sort of granular prompt fine-tuning just shouldn't be mentioned at all, given that it's all pretty unreliable and results can be unpredictable with any machine learning? As a side note, with regard to anecdotal FYI info, I have been experimenting with JSON to input prompts (with default checkpoints) and the results have been pretty interesting. It's obvious that it hasn't been trained to interpret that in any way, but it really seems to make minor changes to the prompt result in much more significant differences vs natural language in terms of the resulting image. I definitely haven't experienced brackets the way anybody else is describing them, but again, it's all anecdotal. Abidedadude (talk) 07:58, 31 October 2022 (UTC)
No, 100.000000% (9 significant figures) of what I've covered above has nothing to do with NovelAI. I just mentioned in passing that NovelAI uses curly brackets instead of parentheses.
"given that it's all pretty unreliable"
That's precisely what the citation says: that using emphasis markers is less reliable than using negative prompts. And that's what's asserted in the Misplaced Pages article as well: "The use of negative prompts has a highly statistically significant effect... compared to the use of emphasis markers" (bold emphasis mine).
"it really seems to make minor changes to the prompt result in much more significant differences"
Yes, that is correct, and it's because even slight adjustments to things like word order, punctuation, and spelling add additional noise to the prompt, which will lead to a different output. The model doesn't parse the prompt like a human would, and we see this when "big red sedan driving in highway" and "highway, with red big sedan, driving on it" result in different outputs even with the same seed value. --benlisquareTCE 10:54, 31 October 2022 (UTC)
I have to be honest, I'm even more confused now than when we started this discussion, so I'm probably just going to go ahead and bow out at this point. I'm pretty sure your own original research in the example above was intended to show me that negative prompting was less effective than emphasis, but right now you're telling me that the assertion in the article - about negative prompts being more effective - is correct. Even though the original research you conducted is consistent with the citation (which is then inconsistent with the statement in the article). Perhaps it's all a joke of some kind, because the 9 significant digits bit is pretty funny. All that extra emphasis on significance, and yet it doesn't change the end result one bit, much like the topic of discussion, no? Anyway, I did enjoy the discussion, but I'm afraid it's either going over my head or that the contradictions are just becoming too much for me to care about. I thought I was just helping out with a quick fix. I do appreciate you taking the time to engage with me either way. Abidedadude (talk) 17:34, 31 October 2022 (UTC)
An example of one of many different UI implementations of Stable Diffusion. This particular one is built upon the Gradio web frontend library, but there are non-web frontends for Windows and macOS as well. These UI frontends allow the user to interact with the model checkpoint without needing to type commands into a Python console, lowering the barrier to entry for new users. All of these UIs have separate text fields for "prompt" and "negative prompt", as seen above. You enter what you want to see in the output image into the "prompt" text field; you enter what you don't want to see in the "negative prompt" field.
My bad, I should definitely be clearer in my explanation. My first question to you is, are you using a UI frontend while using Stable Diffusion, or are you directly inputting the settings (e.g. sampler steps, CFG, prompts, etc.) into a command-line interface? If you are using a UI frontend, which one are you using? Are you running the model locally on your own computer, or are you using a cloud service via a third-party website?
As mentioned in the article, these features are provided by open-source UI implementations of Stable Diffusion, and not the 3.97GB model checkpoint itself. The UI implementation acts as an interface between the user and the model, so that the user doesn't need to punch parameters into a Python console window. There are many different open-source UI implementations for Stable Diffusion, including Stable Diffusion UI by cmdr2, Web-based UI for Stable Diffusion by Sygil.Dev, Stable Diffusion web UI by AUTOMATIC1111, InvokeAI Stable Diffusion Toolkit, and stablediffusion-infinity by lkwq007, among others. All of the aforementioned implementations utilise both negative prompting features and emphasis features. In fact, almost every single Stable Diffusion user interface now has these features; it is now the norm, rather than the exception, for Stable Diffusion prompts to feature negative prompting and emphasis marking, given that they significantly reduce the quantity of wasteful, low-quality generations to sift through. Go to any prompt sharing website or Stable Diffusion online discussion thread, and the vast majority of shared prompts will feature negative prompts or emphasis markers, or even both.
Since this is a common question raised by someone else above, I should point out it is inappropriate for the Misplaced Pages article itself to list all of these UI implementations, as Misplaced Pages is WP:NOT a repository of external links; the examples I've provided above are just to make sure you have full context on what's going on.
Just like how the original CompVis repo provided a collection of Python scripts that allow the user to interact with the model checkpoint file (the 3.97GB *.ckpt file that does much of the heavy lifting, so to speak), and those Python scripts aren't "part" of the model checkpoint, open-source user interfaces likewise implement their own interface between the user and the 3.97GB *.ckpt. This space has been rapidly evolving and improving over the past few months, mostly as an open-source, community-driven effort, and the "norm" for the range of configurable settings available to make prompts has shifted considerably since September.
If you have any additional questions relating to this topic in particular, or if you would like assistance on how to set up any of the aforementioned UI implementations or how to improve your prompting to obtain better outputs, feel free to let me know. As someone who has generated over 33GB of images through experimentation in Stable Diffusion and is quite passionate about fine-tuning prompts to find the most perfect outputs, I'd be quite glad to help out. --benlisquareTCE 22:20, 31 October 2022 (UTC)