[00:00:02.010] - Nathan Wrigley
Welcome to the WP Builds podcast, bringing you the latest news from the WordPress community. Now welcome your hosts, David Waumsley and Nathan Wrigley. Hello there and welcome once again to the WP Builds podcast. You have reached episode number 314, entitled Imajinn, adding images to your website with AI. It was published on Thursday, the 9th of February 2023. My name is Nathan Wrigley and I'll be joined by two guests in a few moments so that we can talk about Imajinn, which is a SaaS service as well as a WordPress plugin for creating AI art. But before we do that, just a couple of bits of housekeeping. The Page Builder Summit is back. We've officially launched the website. This is version five of the Page Builder Summit. It's happening between the 20th and the 24th of February, so pretty soon. If you go to pagebuildersummit.com, once more, pagebuildersummit.com, there are lots of orange buttons entitled Get Your Free Ticket. And if you click that button, well, that's what you're going to do: get your free ticket. We will keep you updated as the gates open and as the event unfolds during the week. If you go to that page, you can see a whole list of what's happening.
[00:01:32.590] - Nathan Wrigley
You can see information about some of the speakers and what they're talking about. So I would strongly encourage you to join us at that wonderful free event in a couple of weeks' time: pagebuildersummit.com. The only other thing to mention is that we have our Mastodon install, WP Builds Social. A few of you joined up during the last week or so, so feel free to do that. It's a completely free hosted version of Mastodon, and if you're a bit fed up with Twitter, well, feel free to join us over there.
[00:02:05.050] - Nathan Wrigley
The WP Builds podcast is brought to you today by GoDaddy Pro. GoDaddy Pro, the home of managed WordPress hosting that includes free domain, SSL and 24/7 support. Bundle that with The Hub by GoDaddy Pro to unlock more free benefits to manage multiple sites in one place, invoice clients and get 30% off new purchases. You can find out more by visiting go.me/wpbuilds. Once more, go.me/wpbuilds. And it is with sincere thanks that I say thank you to GoDaddy Pro for their ongoing support of the WP Builds podcast.
[00:02:49.590] - Nathan Wrigley
Okay, what have we got on the show for you today? Well, it's going to be a fascinating introduction to AI art. We have the founders of Imajinn, which is a SaaS app as well as a WordPress plugin. The two people are Josh Dailey and Aaron Edwards. They've been in the WordPress space, as you'll hear, for a very long time. And we're talking about their product, which is called Imajinn. It's got a quirky spelling, I-M-A-J-I-N-N, and it will create AI art for you. So we get into the subject of how that art is created, and honestly, it is utterly fascinating how the images pop up. I really didn't understand how the process worked and how it kind of spreads like a ripple on a pond out from a prompt that you give it. It's been trained on 8 billion images and it does a pretty good job. But Aaron and Josh explain how sometimes it can be tripped up and goes in a bit of a weird direction and creates some quirks. We explain how it works, how it puts things into your WordPress media library, how you can tweak images that you think need a bit of adjustment. And of course, we get into the subject of how much it costs to use the service as well, because it uses a lot of computer power.
[00:04:03.040] - Nathan Wrigley
It's a really, really interesting episode and I hope that you enjoy it. I am joined on the podcast today by two fine gentlemen coming to us from North America. One, I believe, is in Arizona and the other, I believe, is in Texas. We are joined by Josh Dailey and Aaron, or Erin, Edwards. Which one is it? Aaron? I apologise.
[00:04:29.430] - Aaron Edwards
Depends where you're from. I guess Aaron is how I say it.
[00:04:33.670] - Nathan Wrigley
Well, that'll do. I'll go with Aaron in that case. How are you both doing?
[00:04:38.070] - Aaron Edwards
We're great. Thanks for having us, Nathan.
[00:04:40.730] - Nathan Wrigley
Yeah, you're very welcome.
[00:04:41.760] - Josh Dailey
[00:04:42.930] - Nathan Wrigley
Yeah, it's very nice. Josh has been on the podcast several times. I'm not sure if we've had Aaron on before, but yeah, Josh has been on. If you want to listen to a previous episode with Josh, there's another product which these guys are involved in called Infinite Uploads, so go and check that out. But today we're talking about a new product that they have launched relatively recently. By the time this episode comes out, it'll have probably been on the shelves for a couple of months. It's all about image generation through AI modelling. It's all the hotness at the moment. It's genuinely really interesting and it's called, let me try and get the name pronounced correctly, I think I've got it right, it's called Imajinn, but it's not spelled how you would imagine. It's spelled I-M-A-J-I-N-N. It's on the wordpress.org repo: wordpress.org, plugins, Imajinn AI. Go and check it out. Pause the podcast, go and check it out. And when you've checked it out, you can come back and you'll be far more schooled in what's going on. A couple of things. First, let's deal with you one at a time.
[00:05:48.350] - Nathan Wrigley
I'm going to go to Josh first, if that's all right. Very briefly, that sort of 1 minute potted history of Josh's life and WordPress. On your marks, get set, go.
[00:05:59.410] - Josh Dailey
Oh, my. This is a challenge. Actually, it's not too bad. If you roll back to about 2012, I was doing videos, a lot of content creation for local businesses and small nonprofits in the state of Arizona, and they needed a way to distribute that content, and so they asked if I could come up with a way to put it on the web. And that's when I found WordPress, actually from Aaron kind of showing me how to install it on my own server and get that out there. And so then I found out real quick that I needed additional help because of all the security, the pharma hacks, that kind of stuff at the time, and got really involved in the WordPress community after that time frame. Yeah, that's a little bit about me and how I found WordPress, and the rest is history.
[00:07:01.260] - Nathan Wrigley
Perfect. Thank you. And you did it in an admirable 58 seconds, which is just on the money. And Aaron. 1 minute. Go.
[00:07:10.650] - Aaron Edwards
Sure. I started with WordPress version 2.6, I think, way back in 2008 maybe, and basically I really loved the multisite aspect of it. And I was building a site for myself and joined some online communities and started learning how to develop plugins for it. And then it ended up becoming my full-time job when I started working for WPMU DEV, which is still my day job. I'm the CTO there now, many years later, and we've grown to quite a big company, so I have a lot of experience building different products for WordPress. And in the last few years, me and Josh have been doing some side projects and building some things on our own, starting with Infinite Uploads and Web3 WP, and now our Imajinn plugin that we just launched.
[00:08:04.730] - Nathan Wrigley
Thank you very much. I don't know whether it's a win, but that was slightly shorter; but it's neither here nor there. Okay, so Imajinn is the focus of today's podcast and it's all about AI image generation. Now, I genuinely think that you would have had to have been under a rock for the last couple of years to not be aware that AI is taking over, well, almost everything. You've got AI models in all sorts of things. We've got WordPress connecting up with things like GPT-3 so that you can create text-based posts and titles and all of that with the click of a button, really kind of trimming down the amount of time that you spend on things. But rewinding a couple of years ago, I was just saying to the guys before we hit record, when people said to me, when I got into that conversation that you do from time to time, when you're sort of sitting around and you say, how are computers different from humans? My constant reply was, well, humans can do art and computers cannot do art. They never will be able to do art. It's a function of a human, because we've got this unique spark, we've got consciousness, whatever that is, and we can create it.
[00:09:16.480] - Nathan Wrigley
And there's no way that a computer can ever do that. Well, blow me. Two years or so down the road, it turns out I was completely wrong. Because computers, not only can they do art, they can do sublimely good art, and they can do it in the amount of time it takes me to basically get out of the chair to pick up the brush. They can create multiple pieces of incredible art in just split seconds. So I guess, I don't know which one of you wants to tackle this question, my first one is more on the technical side of things. It's magical. You push the button, you give it some prompts, and out comes a piece of art. But how on God's green earth does it even do it? So how is the technology built? What's going on in the background when I click that button? What pieces of technology are linked together? So whoever wants to tackle that, I would really, genuinely love to know.
[00:10:15.090] - Aaron Edwards
That's probably me, Aaron, being the technical side of things. It is very mysterious. I think, just starting from the beginning, all these AI models are built by collecting a huge amount of data. So usually that's from scraping the internet. So in this case, with the model that we're using, it's an open data set that was just released called LAION, and it has about 5 billion images that were scraped from all over the internet and then categorised in different categories, like watermark, no watermark, adult or not, different things like that. And they take the text that went along with it. So if it had alt text, or there's text on the web page that was associated with that image that was scraped, there's kind of that relation. And then they took that and they used a model called CLIP, which basically was a model that, I can't remember who created it first, I think it might have been Google, but basically it's a way of turning an image into a text description. So that's been out for a while, and those have gotten better than humans at describing an image using those kinds of AI models.
[00:11:29.970] - Aaron Edwards
So they basically took that to expand their data. So it's just kind of building on, building on, building on; each development creates a new innovation. So in this case, now they have a whole bunch of text and a whole bunch of images that are associated. So then they train the AI, basically saying this colour, this pixel, is associated with a certain amount of weight with this word or this phrase. The thing that's different about this is that to collect this amount of data, and then to train an AI with this amount of data, costs a whole lot of money, a whole lot of compute time. So it probably cost a few hundred thousand dollars to build the model that we're using. So we didn't do that ourselves. There was actually an organisation that was funded by a tech millionaire, and he basically said, let's try to do a more libertarian approach to AI. Instead of having Google or Facebook or whatever, that has all this money and compute power, create these models and keep them private, let's first create this data set of all the images and release it publicly, and then let's train it. And they kind of copied the way DALL-E did it.
[00:12:43.120] - Aaron Edwards
So DALL-E, if you've used that, that's the model by OpenAI, and that's the one that was kind of announced first and impressed everyone, just seeing the amazing kind of images that could be created through it. But it's still very closed. There's no API that you can use. You have to use their site, and they're very restrictive on the things that you can do with it. So they kind of copied some of the ways that they published how they were going to create this, and they trained their own model, which took hundreds of thousands of dollars in compute time to train. And then very recently, just in August, they actually publicly released what's called the weights, which is basically just a big file that has all the training weights for all that data set. And that's what made it open; that's what made us able to build this product, basically.
[00:13:36.390] - Nathan Wrigley
So somebody went around and found images on the web which are completely open, freely available, and I think you said did you say 8 billion with a B?
[00:13:49.290] - Aaron Edwards
Yeah, I think the original data set is 8 billion. They filtered it down to like 5 billion of the highest quality images, filtering out bad stuff as much as possible, before they actually did the training.
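The kind of dataset pruning Aaron describes, dropping watermarked, flagged, or captionless images before training, can be sketched in a few lines. All field names here are hypothetical illustrations, not LAION's actual schema:

```python
# Toy sketch of pre-training dataset filtering: keep only captioned
# images that pass basic quality and content checks. The record
# fields ("watermark", "nsfw", "caption", "quality") are invented
# for illustration.

def filter_dataset(records, min_quality=0.5):
    """Return only the records suitable for image/text training."""
    kept = []
    for rec in records:
        if rec.get("watermark"):            # drop watermarked images
            continue
        if rec.get("nsfw"):                 # drop flagged adult content
            continue
        if not rec.get("caption"):          # training needs a text pair
            continue
        if rec.get("quality", 0.0) < min_quality:
            continue
        kept.append(rec)
    return kept

raw = [
    {"caption": "a cat on a sofa", "quality": 0.9},
    {"caption": "stock photo", "quality": 0.8, "watermark": True},
    {"caption": "", "quality": 0.7},
    {"caption": "a mountain lake", "quality": 0.3},
]
print(len(filter_dataset(raw)))  # prints 1
```

Only the first record survives: the others are watermarked, uncaptioned, or below the quality threshold, mirroring how the 8 billion scraped images were pared down before training.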
[00:14:02.930] - Nathan Wrigley
Okay. And then the intention of the AI is to, so as an example, if there was an image with a cat in it, and there was alt text on the image, for example, that said there's a cat in this image, or something along those lines, and the AI is then presented with countless other images with cats, it learns that, okay, this is kind of like the shape and the look and the texture, if you like, of a cat. Am I on kind of the right lines? Which then could be extrapolated for a cow and a human and a car and a road and a dog and a cloud and whatever noun we might pick.
[00:14:39.350] - Aaron Edwards
Yeah, it doesn't really understand what it is. It just understands that once it creates this pixel, starting from random noise, then for the next pixel next to it, for this given set of text, what is the most likely colour, basically. And it basically does that, and it starts out very blurry, and then it just runs it through a whole bunch of iterations and steps until the picture becomes clearer and clearer, based on the original random noise that it started with.
[00:15:07.000] - Nathan Wrigley
Okay, so if we were to watch the AI in progress, and I know that it's in a fraction of a second, so we'd be slicing time into thousandths of a second, we would start with a random noise picture, almost like the background universe noise. Just garbage, just static, exactly. And then slowly but surely, almost as if you were sort of slowly opening your eyes, things would start to appear and pixels would be generated. And then it would make decisions about what the adjacent pixels would be, based upon what it was taught previously. And then again, if you could just follow through that journey, those slices of thousandths of a second, you would see it take shape: there's a human appearing over there, and there's a cow appearing over there, and look, now we can see a cloud appearing, until finally it says, I'm done. That's it. That's as much as I can do.
[00:16:07.990] - Aaron Edwards
Yeah. And there isn't really any done. It's just a matter of how much time you want to wait, and compute power, before it gets to a usable picture. There's always a trade-off of how much money you want to spend, basically, compared to what's good enough for a human.
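The denoising loop the two of them just walked through can be caricatured in a few lines of Python. This is purely illustrative: a fixed target list stands in for the neural network's prediction, and real diffusion models work on image tensors, not flat lists, but the loop structure (start from static, take many small steps toward the prediction, stop whenever it's good enough) is the same idea:

```python
# Toy caricature of iterative denoising: begin with pure random
# noise, then repeatedly nudge every pixel a little toward what the
# "model" predicts. Here the prediction is just a fixed target list.
import random

def denoise(target, steps=50, seed=0):
    rng = random.Random(seed)
    image = [rng.random() for _ in target]   # start as pure static
    for _ in range(steps):
        # each step blends 10% of the way toward the prediction,
        # so the picture gets clearer and clearer
        image = [px + 0.1 * (t - px) for px, t in zip(image, target)]
    return image

def error(image, target):
    """Total distance from the 'clean' picture."""
    return sum(abs(px - t) for px, t in zip(image, target))

target = [0.2, 0.8, 0.5, 0.9]                # the "clean" picture
static = denoise(target, steps=0)            # zero steps: still noise
result = denoise(target, steps=50)
print(error(result, target) < error(static, target))  # prints True
```

As Aaron says, there is no natural "done": more steps just mean a smaller error, at the cost of more compute.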
[00:16:25.070] - Nathan Wrigley
Okay. We've just said all of that, and I'm just utterly flabbergasted. I don't know if I've ever used that word before. I'm completely flabbergasted that any of this works. I mean, it's completely remarkable. But the intention with plugins like Imajinn, though, is that, in a sense, the AI has been taught by the metadata and the images it's been given, but in order to create an image, I then have to feed it some seed text. And I'm guessing it is text. So I have to say to your plugin, okay, I would like a variety of pictures with, I don't know, sheep and wheelchairs and computer monitors and headphones, and then it just gets to work and it decides where they should be. But there's no defined structure. The headphones could be top left or bottom right or in the middle, or large or small or blue; anything goes. There's no rule set which is going to dictate what it will look like once it's finished. It's not a formula. I couldn't predict the output from the input.
[00:17:35.650] - Aaron Edwards
Right. Not perfectly, but it's obviously following the weights that it's getting from the text that you enter.
[00:17:42.800] - Nathan Wrigley
[00:17:43.600] - Aaron Edwards
It actually is becoming a new art form in itself in the fact that we call it a prompt, which is the text description of what you want to generate. It's actually quite the learned skill to figure out how to prompt it in such a way that it generates what you want.
[00:18:02.600] - Nathan Wrigley
So it's akin to how, when everybody started using search engines, they had to adapt the language to get the best out of Google, because you just figure out over time that, okay, there are all sorts of words in a sentence that are superfluous. They don't actually make the search better. You don't need words like "it" and "a" and so on. You just need the important language, if you like. Is it a bit like that?
[00:18:29.650] - Aaron Edwards
Yeah, that's a great example. Because, for example, in the past, if you asked, what is corn, or something like that, well, you're not going to find anywhere on the web where someone actually wrote "what is corn?" and then gives the answer. They just write about corn. So Google had to adapt its algorithms to more the way people search, to return the right information that they want. So right now, we're kind of in those early stages, where if you just ask, create a picture of this or whatever, it's not going to do what you expect. So there's a bit of an art form in knowing, okay, how to request what you want, and what styles you want, and different things like that, to generate the art that you want.
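The prompt-crafting skill Aaron is describing often comes down to bolting subject, style, and modifier phrases together. A trivial, hypothetical helper (not part of the Imajinn plugin) shows the shape of a well-structured prompt:

```python
# Illustrative prompt builder: a bare subject plus style and
# modifier phrases yields a far more specific prompt than a
# conversational request. The modifier vocabulary is just an
# example of commonly used phrasing, not an official list.

def build_prompt(subject, style=None, modifiers=()):
    parts = [subject]
    if style:
        parts.append(f"in the style of {style}")
    parts.extend(modifiers)          # e.g. lighting, detail, quality cues
    return ", ".join(parts)

print(build_prompt(
    "a queen standing on a castle wall",
    style="a fantasy oil painting",
    modifiers=("dramatic lighting", "highly detailed", "4k"),
))
# prints: a queen standing on a castle wall, in the style of
# a fantasy oil painting, dramatic lighting, highly detailed, 4k
```

The point is the register: like early Google searches, you feed the model the important descriptive language rather than a conversational sentence.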
[00:19:15.220] - Nathan Wrigley
Yeah. This is utterly mind blowing.
[00:19:18.200] - Josh Dailey
Yeah, I think that the Google analogy is really quite interesting, because you think about how SEO has improved over time because people were trying to game it over and over and over, and Google itself got smarter: the way that the algorithm was written, the way that the data was being taken in. And now there's a lot less even of an attempt at, say, gaming that system, and more just kind of, what is the end user experience? What's the best end user experience that we can give? And so we just need to be clear, we need to be concise, we need to be right about how we're writing. And I think AI is going to have a similar journey that it has to go on, where in these early days stuff might look different or is more confusing, but there are already some advances in what that looks like. Every day Aaron's sending over some kind of crazy new innovation around this stuff, and we're thinking, how do we integrate this into what we're doing? But a lot of times the limitation is either compute power, or it's, like Aaron said, the learning curve around what it takes for somebody to be able to appropriately communicate with the tool.
[00:20:39.550] - Nathan Wrigley
Yeah, I guess over time the humans will become better, just for themselves, at judging what the output is like: okay, I tried writing it this way, and that's really a poor experience, I didn't really like the aesthetic of what I got out, whereas when I tried it this way, yeah, that's really interesting. On the wordpress.org repo, you mention that there are tools, and you mention three, which people may have heard of, and these are other engines which can do this same kind of work. You mention DALL-E, which is probably the most likely one that people have heard of. There's another one called Midjourney, which, to my understanding, is more into creating pieces of artwork, so the kind of stuff you might see in a kind of high-end gallery. And then there's one called Stable Diffusion, which, if memory serves, is soon to be available on desktop computers and things like that as well. What's the one that you're using called? Is it one of those three, the engine that you're using, or is it a different one?
[00:21:40.110] - Aaron Edwards
Yeah, so we're using the Stable Diffusion model. That's the only model that's been released publicly. And, in fact, with Midjourney itself, most people believe that they're actually using Stable Diffusion as kind of the underlying model, some version of that, with just some customisation to make the output more beautiful or more artistic. So that's what we're using: Stable Diffusion.
[00:22:09.550] - Nathan Wrigley
Yeah, thank you. I remember seeing somebody post a picture from Midjourney, in fact a whole variety of pictures from Midjourney. This was probably going back about two months now, so not that long ago. And I was literally speechless. I was seeing things which, as far as I was concerned, and I made the sort of slightly glib analogy earlier that computers will never be able to make art, well, having looked at this stuff, I genuinely was caught between, no, this has to be, sorry, not a computer, this has to have been done by a human. And of course, you read the article and you realise that not only was it not done by a human, but it was done in fractions of a second with very little input. It was just told to get on with it, and a few seconds later, there it is. But also I detected something which I didn't anticipate, which was kind of artistry. I expected it to be, in the same way that you expect a computer like Data out of Star Trek to be, this sort of automaton who doesn't have much creativity and is bound by the constraints of what he's been taught, and he can only do binary, basically.
[00:23:16.700] - Nathan Wrigley
But I was looking at images and I thought, wow, there's real artistry there. There are, like, subtle things going on, like the hair being blown in a certain direction. And in this case, there was a picture of a lady standing in what looked like some sort of Tolkien-esque land with a great big moon in the background. And she was standing there and the wind was blowing her clothing, and it was just phenomenal. And the attention to detail was the thing that took me aback. It genuinely looked like somebody had poured hours and hours and hours, days, weeks and so on into it, but I was sort of drawn up short by the shock of figuring out, okay, no, it was a couple of seconds. It's remarkable. It must take you aback as well when you see this stuff.
[00:24:00.270] - Aaron Edwards
[00:24:02.110] - Nathan Wrigley
Yeah. Anything to add, Josh?
[00:24:06.110] - Josh Dailey
Yeah, we've spent a lot of time playing with it, and using it every time, it's kind of a stunning reality, right? You're sitting there and you type in a few words, and then it comes out with something and you're just mind-blown every time. And you're like, oh, I've got to share this one too. And I've got to share this one too. So people get tired of your social media real fast, because it's always another image where you're like, look at this one.
[00:24:35.670] - Nathan Wrigley
Can you imagine a future where the stick man drawing is suddenly back in vogue? Because people are just sick and tired of extraordinarily good drawings; they just want a stick man, a really plain, boring drawing, again. But yeah, okay, so we've got some kind of impression of how it works. So now we know that there's a computer doing this, and we've got this idea that, over time, although it's a very small slice of time, it builds up based upon what it's been taught, and so on and so forth. But you've obviously implemented this into our favourite CMS, WordPress. How does that work? So we've got the Imajinn plugin by Infinite Uploads; go and look for that, the URL will be in the show notes, like I mentioned. We install the plugin. One of you, take us through the sort of experience of how the workflow works, what we're doing in the UI, and how we get these images into our posts and pages and media library.
[00:25:33.230] - Josh Dailey
So essentially, if you go into the repository and you install the plugin, it's a very seamless experience getting it set up and activated. You're able to create your account just by putting in your email address, and you're kind of ready to roll. And we wanted to give people the opportunity to really get a sense of it. So even though the computing costs can get really expensive really fast, we wanted to make sure that everybody had a chance to play around with it, get that sense. But you go in, you just start putting prompts in, and it will generate four images at the same time. And we have different variations of those images to help you be able to see what kind of output you could get. But one of the things that I think is important to note when you're playing with it for the first time: we saw that it was hard for people, like we had already talked about, to understand how prompts work. So we actually added another way for you to summon the genie, which will come in and give a better prompt based off of your prompt.
[00:26:55.340] - Josh Dailey
So you can click the button and it will expand on that prompt. And then you could select the prompt to help you create better images right out of the gate. And if you get an image that you really like, and it has some variation that you wanted to do, you can actually then go in and draw over the top of that image to select a different space, like you would in Photoshop or in Canva or whatever tool you would normally use to edit an image with. You can select out an area, it will mask it, and then you can tell the prompt to just rewrite in that one space. So you can actually edit your image in some way, and have another image over the top of that come into place that's going to be based off of the pixels around it.
[00:27:50.550] - Nathan Wrigley
Can I just check, Josh, just quickly: so if I type some text in, there's also an opportunity within the plugin to look at what you're recommending as possible improvements upon that text. Is that what you were mentioning a moment ago?
[00:28:10.510] - Josh Dailey
Yeah. So you have your image that's been generated, and you go, I actually wanted there to be a cat in this spot of the image. So you could select that space of the image, add a cat over the sky here, and it will use the pixels around it to determine the style, and then reinsert what you've asked for, essentially.
[00:28:47.770] - Nathan Wrigley
It can be an iterative process. So what you're saying is that you can just chuck out four images and then run with the best one, select the bit that you like best and then sort of modify and rinse and repeat if you like. Yeah. That's phenomenal.
[00:29:02.580] - Aaron Edwards
Exactly, yeah. So we've extended it beyond just enter a description and then generate pictures. You have different steps that you can do to modify that further, whether it's generating additional variations that follow the same style as the one you like, or whether it's using the touch-up tool that he's talking about to paint out a part of the image and adjust it. For example, I had a picture of a queen and she wasn't wearing a crown. So I painted out the top of her head and then added "crown" to the text. So it generated different variations with crowns on her head.
[00:29:38.140] - Nathan Wrigley
So the technology knows that the area that you've painted out, that's the focus for the crown. That's where we want the crown.
[00:29:45.060] - Aaron Edwards
[00:29:45.610] - Nathan Wrigley
And if you wanted, top left, a sun, or bottom right, I don't know, a car: it knows that, okay, we need a car in that bit, we need a crown on that bit, we need a sun over there. That's just phenomenal, isn't it? Yeah.
[00:30:00.080] - Aaron Edwards
And that can be really useful, because if you want a picture that has a lot of different objects in it, like, say, okay, I want a boy flying a kite on the left, and I want a girl sitting reading a book on the right, and I want whatever it is: you couldn't put those all in the initial prompt and expect them all to show up.
[00:30:21.880] - Nathan Wrigley
[00:30:22.950] - Aaron Edwards
Because it's only good at really drawing one main subject or idea at a time. So you can start with that, and then use our touch-up tool to paint out a section and then add in more features.
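The touch-up idea, a mask marking which pixels to regenerate while the surrounding pixels steer the result, can be toy-modelled like this. Here a simple neighbourhood average stands in for the diffusion model that would actually repaint the masked region from the new prompt:

```python
# Toy inpainting: the mask says which pixels to repaint; masked
# pixels are filled from the unmasked pixels around them. A real
# diffusion inpainter regenerates the region conditioned on both
# the prompt and the surrounding pixels; the averaging here is
# just a stand-in to show the mask mechanics.

def touch_up(image, mask):
    """image: 2D grid of floats; mask: 2D grid of bools (True = repaint)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue                      # keep unmasked pixels as-is
            neighbours = [
                image[ny][nx]
                for ny in (y - 1, y, y + 1)
                for nx in (x - 1, x, x + 1)
                if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]
            ]
            if neighbours:                    # fill from the surroundings
                out[y][x] = sum(neighbours) / len(neighbours)
    return out

image = [[1.0, 1.0, 1.0],
         [1.0, 9.0, 1.0],      # a stray bright pixel to paint out
         [1.0, 1.0, 1.0]]
mask = [[False, False, False],
        [False, True,  False],
        [False, False, False]]
print(touch_up(image, mask)[1][1])  # prints 1.0
```

This is why the crown example works: the repainted area is anchored to the pixels around it, so the new content matches the existing style.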
[00:30:36.400] - Nathan Wrigley
Basically, can you do the same thing, but in reverse? Let's say, for example, that a crown appeared and you wished to not have a crown. Is it possible to sort of highlight things? Well, maybe not highlight, but is there a process for saying, actually, can we take a second pass at this, but this time without the crown? Do you have to do that with language, "remove crown", or whatever it may be?
[00:30:58.610] - Aaron Edwards
Yeah, it works really well for that. Especially, I've used it a lot for removing watermarks. Sometimes with pictures you generate, you can tell that there's, like, a watermark there, because it's been trained on, let's say, stock photos that have watermarks. So you might see text that doesn't really make sense if you see it there. So you can just use the touch-up tool to select that area and then just change the prompt to "plain background" or something like that.
[00:31:24.990] - Nathan Wrigley
[00:31:26.170] - Aaron Edwards
And then it erases it pretty well.
[00:31:28.520] - Nathan Wrigley
Okay, so let's imagine that we've got our first picture, we've got these four variations, and we've iterated it a little bit, and we've finally come up with the one that we like. What's the process there? Do we then click a button and download it to the media library? Can we immediately insert it into a post? Can we generate it within a post? Further questions might be: do we get to keep all of the variations that we created, even though none of them initially were what we wanted? Do we still get ownership of those? So just talk us through, once we've gone through the process of creating stuff, how do we sort of own them and implement them and put them into our pages and posts?
[00:32:10.890] - Aaron Edwards
Well, basically, when you see that grid of image results pop up, there's a save button on each. And if you just click that, then it will actually pass it back to our AI, where we run it through another algorithm that upscales it to a high resolution using a completely different AI model. And then it saves it into your media library with no watermark or anything. And the cool thing about that is it also saves the original prompt you used with it. So you have that forever, in the alt text or description in your media library. So when you add the image later, it's there.
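The save flow Aaron describes, upscale first, then store the image with the prompt kept as alt text, might be sketched like this. The nearest-neighbour upscale and the dict-based "media library" are stand-ins for the separate upscaling model and WordPress's real media handling, not the plugin's actual code:

```python
# Sketch of the save step: on Save, the chosen image is upscaled
# (here by naive nearest-neighbour pixel doubling, in place of the
# dedicated upscaling AI) and stored alongside its original prompt,
# which the plugin keeps as the attachment's alt text.

def upscale(image, factor=2):
    """Nearest-neighbour upscale of a 2D pixel grid."""
    return [
        [px for px in row for _ in range(factor)]   # widen each row
        for row in image
        for _ in range(factor)                      # repeat each row
    ]

def save_to_library(library, image, prompt):
    """Store the upscaled image with its prompt preserved as alt text."""
    library.append({
        "pixels": upscale(image),
        "alt_text": prompt,      # the generation prompt travels with the image
    })

library = []
save_to_library(library, [[1, 2], [3, 4]], "a queen wearing a crown")
print(library[0]["alt_text"])   # prints: a queen wearing a crown
```

Keeping the prompt as metadata is the useful design choice here: months later, the media library still records exactly how each image was generated.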
[00:32:46.730] - Nathan Wrigley
[00:32:47.910] - Aaron Edwards
And then you can also click the Insert button. So, Josh didn't mention this, but basically our plugin is a Gutenberg block, but it also works in other editors, because we expose it under just the regular media tab in WordPress. And so there's actually a page where you can generate images and save them to your library for use in any page builder or theme, if you're not a Gutenberg fan. But if you're in Gutenberg, you can just click the Insert button and it does the same thing: saving it to your library, upscaling it, and then inserting it as an image block into your post or page that you're editing at that moment.
[00:33:25.840] - Nathan Wrigley
Okay. So we have to deliberately select one that we like, we click Save, we send it back to you, you make it a better version, strip out watermarks, stick it in the media library, and that all works inside of a block. But if you're not a user of Gutenberg, I don't know, Elementor or Beaver Builder or whatever it may be, there is an interface within the media library, I'm presuming one of the tabs across the top, where it'll say Imajinn or something like that, and you can go and generate this stuff and save it directly to the media library, and go to your Elementor page and insert it that way.
[00:33:59.630] - Aaron Edwards
Yes, it's currently a separate page under Media.
[00:34:03.250] - Josh Dailey
I would just say that the most wild part of all of this, the thing I still can't wrap my brain around (and Nathan, you and I are alike, I still get mind-blown every time I see this stuff), is the attribution. You don't have to worry about attribution, because it is a one-of-a-kind image every time. And that's what I keep trying to figure out: where is this image coming from? And it's coming from nowhere. It's its own image. And if you're using it in your post, or you're using it as a featured image, or you're using it even for your social media or whatever, there's no attribution to give, because it doesn't connect back to anybody. So you can use these images wherever you want. You can give them away, you could sell them, you could do whatever you want with them, because they're completely yours, owned by you.
[00:35:05.170] - Nathan Wrigley
Okay, so you have ownership of them. No, let me rephrase that. You don't need to worry about using them, but let's say, for example, that somebody came and used the image that you had created through Imagine and they put it on their website. Do you have any copyright ownership yourself of it that you know of? As a person who has paid to have that go through the AI, be put into your media library, do you know if you've got a leg to stand on there, or does it just go into the public domain as soon as it's created and is out there on the web?
[00:35:42.350] - Aaron Edwards
Well, you created it, so you can license it however you want.
[00:35:46.700] - Aaron Edwards
Even though they're generated on our API, you're the one generating them. And the actual licence that we give you from our API is basically public domain. But it's not public, it's private to you, so you're able to relicense it however you want. You can make it Creative Commons, or you can just make it completely yours, your copyright, that no one's allowed to use.
[00:36:12.230] - Nathan Wrigley
Do you have a sort of cloud setup here? In other words, if I wanted to save a whole bunch of them, but not necessarily to my media library, do you take care of any of that? Like an archive of stuff that I've created in Imagine? I don't know that you do, but I'm just curious about that.
[00:36:28.670] - Aaron Edwards
Yeah, whenever you generate a prompt, it actually gets saved in history. So in the sidebar in the editor, you actually see a history of all your generations and all the variations that it created, and we save those in our cloud. And so at any point in time, you can just click load and load those back up into the editor, so that you can save them to your media library, or edit the prompt, or use the touch-up feature to adjust them.
[00:36:56.010] - Nathan Wrigley
And is that, to use your words, saved in your cloud forever? Words like ad infinitum are obviously a bit ridiculous, but do you have a time period for which you'll maintain that? Is it, you've got a year to download it? Or, thus far, do you intend to just keep them for all time?
[00:37:12.650] - Aaron Edwards
We don't currently. We might have to put a limit at some point, but it's a great opportunity to plug our other product, Infinite Uploads.
[00:37:20.550] - Nathan Wrigley
Come on, you took the bait. Well done.
[00:37:24.590] - Aaron Edwards
Which allows you to connect your WordPress media to the cloud so you have infinite storage. And so if you have that active, then when you save it to your library, it just goes straight to the cloud.
[00:37:34.850] - Nathan Wrigley
Exactly. There you go. And that's actually a really, genuinely cool add-on service, isn't it? The fact that if you are creating loads of these (me personally, using this service, I'd be creating a couple of dozen a month, tops, probably), but if I was a major website, something like TechCrunch, where they're producing dozens and dozens every single day, the storage of that would start to add up in the end, especially if you're iterating through it because you want to get to the perfect one. And so having a nice cloud backup with something like Infinite Uploads would be really cool. I'm just curious, one thing that occurs to me is that we all have different constraints about the dimensions and so on that we want these images to be. We might want them to be small or large, or letterbox, or, I don't know, portrait, landscape and so on. Are there options to do that at the point of creation? In other words, can I say, I just want to see square ones, please? Or 16 by 9, whatever it might be. Can I define all of that at the point of creation, rather than having to tweak it later on?
[00:38:40.070] - Aaron Edwards
Yeah, we default to square images, but we provide an option for like a wide format or tall format. Two by three or three by two.
[00:38:50.490] - Aaron Edwards
The actual AI model was trained on square images, so that gives you the best results. So, for example, if you're generating, say, a tall image, and it's like a portrait of a person's face that's very closely zoomed in, then what can happen is, since the AI is trained on square images, it'll draw the face, but then it kind of loses (it's called coherence) it loses track of what it's drawing, and it starts adding another face on top of it, with two sets of eyes or two mouths. Like, I drew a portrait of Donald Trump just for fun, and it made a little mini Trump floating above his head. It was pretty funny.
[00:39:31.190] - Nathan Wrigley
So, yeah, when the dictionary was designed, you know, I can't remember who it was, some guy in the UK just put together this dictionary, Dr Johnson, I believe his name was. He sort of had this notion that it'd be used for all these highfalutin purposes. And of course, everybody presented with their first dictionary just starts to look up rude words, don't they? You can imagine the sort of fun you could have here, creating all sorts of anarchic things and deliberately trying to trip up the AI to create comedy. And I imagine it would actually be quite useful for creating hysterically weird photographs. I've not explored that. Yeah, like the double eyes that you.
[00:40:13.810] - Aaron Edwards
Mentioned. Famous people that there's a lot of training data for, you can generate really cool stuff of celebrities or politicians.
[00:40:25.770] - Nathan Wrigley
That's a perfect segue, then, into the constraints that might be around this, because obviously celebrities might be one example. Perhaps a particular celebrity wouldn't take very kindly to you making images where they've got extra pairs of eyes, or their head is distended, or whatever it may be. And of course, I'm sure that we can all imagine scenarios where this could be used for all sorts of purposes that really nobody wants to see on the internet. I don't need to develop that any more; we can all imagine what that might mean. Does the AI take any of that into account? In other words, if I was to go in and deliberately try to create images of that nature, are there guardrails, are there things which make sure that the content that is spat out is, to use a word, I don't know, the word that's coming into my head is wholesome, but I'm sure you know what I mean.
[00:41:22.600] - Aaron Edwards
Right, well, that's always the challenge with AI. So the original, most popular tool for this image stuff is DALL-E, which is produced by OpenAI, and they're kind of famously very restrictive in what they allow you to do. They don't have an API or anything like that, and they're very restrictive about what you can enter in and what kind of output you can generate. And then you have other places like Google. They have their own tool called Imagen, I think, and they don't even let their employees publish images created with it without specific approval, because they're just afraid of all the implications. So we've definitely taken a much more libertarian approach. We have some filters on the input and the output of what is generated. So we're both filtering the prompt that you enter, to try to remove adult content or hateful content, and then on the output, the images generated are actually run through another AI that tries to check for adult content or violent, hateful stuff, and it will actually block it and return an error if it generates an image that appears to be that. So we do have some guardrails in place, but ultimately I think we've definitely taken a more libertarian approach, as I said.
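The two-stage guardrail Aaron outlines (filter the prompt on the way in, classify the generated image on the way out) can be sketched as below. The blocklist and the one-line classifier are crude stand-ins for illustration; the real service presumably uses trained models, and none of these names are Imagine's actual API:

```python
# Minimal sketch of an input/output moderation pipeline, as described:
# stage 1 screens the text prompt, stage 2 screens the generated image.

BLOCKED_TERMS = {"gore", "hateful"}  # illustrative keyword filter only

def filter_prompt(prompt: str) -> str:
    """Stage 1: reject prompts that trip the input filter."""
    if any(term in prompt.lower() for term in BLOCKED_TERMS):
        raise ValueError("prompt rejected by input filter")
    return prompt

def classify_image(image: bytes) -> bool:
    """Stage 2: stand-in for the second AI that inspects generated pixels.
    Returns True when the image is judged safe."""
    return image != b"unsafe"

def generate(prompt: str, model) -> bytes:
    """Run the full pipeline: filter, generate, then classify the output."""
    image = model(filter_prompt(prompt))
    if not classify_image(image):
        raise ValueError("generated image blocked by output filter")
    return image

# A benign prompt passes both stages with a dummy model.
image = generate("a lighthouse at dusk", lambda p: b"safe-image")
```

The design point is that the output check runs even when the prompt looked innocent, since a clean prompt can still yield an image the service does not want to return.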
[00:42:50.580] - Aaron Edwards
And it's more about the licence restrictions that we have. We kind of leave it on the user. It's not something that's automated, where you're just going to allow a bot to go free and do this; it takes human interaction. So it's no different in my mind than Photoshop. Anyone can create the worst images in Photoshop, right, and then publish them. So it's kind of the same thing. It's just another tool, and it's up to you, ultimately, to make sure you abide by our use restrictions, and by applicable laws and morality and all that.
[00:43:22.450] - Nathan Wrigley
Okay, so just an interesting point at this moment: if you go to the plugin repo, there is an FAQ section, as there always is, at the bottom. And the last of the FAQs is, what restrictions are there? And you can see on there that you've obviously given this a lot of thought. There's all sorts of categories; there's probably about 20 or more bullet points about things that you really shouldn't be doing. That leads me to believe, sorry, not leads me to believe, it leads me to ask: do you find yourselves, so you and Josh, are you in any way culpable for what the AI produces? Because it's done on your infrastructure and your hardware, and delivered to other people. Let's say, for example, that I type in a phrase and I am suddenly shown an image which shocks me to the core. It's really troubling, and I can't get over it. And I decide, well, you know what? Those guys at Imagine, they are to blame. They supplied this image to me. Have you had to give that any thought, and protect yourself against that?
[00:44:25.270] - Aaron Edwards
I have. Josh, do you have any input?
[00:44:30.310] - Josh Dailey
No. I'm interested to hear what you say about that.
[00:44:34.890] - Aaron Edwards
I think in general I view this whole area as just another tool for artists. You can see all these newspaper articles about people being scared to death when photography was invented, that it was going to kill all art, and all those kinds of things. And they thought the same thing when Photoshop was invented, right? Oh, you can edit it and make a picture look real that's not real. You could put someone's head on a different person's body, whatever it is. So to my mind it's no different than that. It's just another iteration of technology, enabling artists to create things more easily with more powerful tools. So is Photoshop culpable when someone uses it to generate harmful content? I don't think so. I don't think anyone would argue that. And I'd argue the same way here, except in our case, at least, since it's a cloud service, we do have legal restrictions that we put on people for what they can generate using our service. So we do have a way to try to protect ourselves legally, and hopefully help protect the world from it being used in harmful ways.
[00:45:54.130] - Nathan Wrigley
Yeah. Okay. Thank you. A lot of the images that I have seen have been, how to describe it, fairly whimsical. So this comes from all the different AIs, the DALL-Es and the Stable Diffusions and the Midjourneys and so on. And they seem to have this sort of quality about them, this sort of light, and a lot of them look almost like fantasy, which initially led me to believe that that was really all it was capable of. But more recently, I've been finding a lot of images which look to be of real humans and so on. And so I just want to speak about that, really. Is it confined to a particular style? Can it do, I don't know, a Monet style, and can it do a photorealistic style? And I'm really dredging the barrels of my own knowledge here, I don't really know a lot about art, but you get the point. Can it do a whole bunch of different styles? And can you request that style in a text prompt? So, for example, if I include the keyword Monet, am I going to receive things back that look of that style?
[00:47:01.910] - Josh Dailey
Yeah, I can try to dive at this one a little bit. So, yes to that, in terms of the types of prompts. And that was one of the things that we wanted to expand people's toolset and creativity with, by giving a whole set of prompts. So we have three different dropdowns where you could select anything from an artist's name, to a style type, to the medium that it came from. So whether that's acrylic, or watercolour, or photorealistic, all of those things, you have different dropdowns to help guide you through that. But one thing we have noticed is there are styles that it excels at. That whimsical style is one; paintings is another, sometimes the really historic-looking type of paintings that you would see if you were travelling around different monasteries or castles or whatever. That style type, it flourishes in that space. When you start getting into photorealistic, and this is where some of the questions come in about how this is going to work out for blog posts and in the future, if you go at it and you're like, well, I'm going to create a photorealistic stock image of a human, there's usually some kind of deformation that takes place, or something doesn't look just right when it's a photo.
[00:48:43.200] - Josh Dailey
When you're trying to replicate a photo, it often doesn't look just right. There's something that looks a little off, humans specifically, but other stuff it seems to work well with, like technology, or a car, or something like that. But again, this is like the very beginnings of this technology, and it's super interesting to see some of the more elaborate pieces. And our minds are still grappling with how we give this type of functionality to people who might have a use case for it. So an example of that is, Aaron trained it on himself, uploaded 20 images of himself, and he can create any image of himself that he wants, in any style. And it does an amazing job, actually. You're sitting there going, he would have had to hire somebody to do this. So there's some of that kind of stuff where it's still not affordable to do that for everybody. But it is an interesting technology and something that we're looking at: how could we implement this? What use cases does it make sense for? So for blogs, I think of people writing short stories, poetry; this is a fantastic way of getting some kind of image up there, if you're using WordPress as your publishing platform.
[00:50:22.840] - Nathan Wrigley
Yeah. So a couple of questions coming from that. The first one is, how to describe it, the defects, the deformation I think was the word that you used, with humans. Is that deliberate? Is that a deliberate attempt to make it so that you can't just have a perfectly accurate rendition of a human? Or is that just a byproduct of something? That's the first question.
[00:50:45.860] - Aaron Edwards
No, it's not deliberate. Okay. Part of it is that humans, by their nature, are very good at this. If you think about it, if you look at pictures of, say, dogs, you won't be able to tell one dog from another very easily. They all look the same to you.
[00:51:03.520] - Aaron Edwards
But our human brain is trained to really recognise faces.
[00:51:09.110] - Aaron Edwards
In a photo of a face, if there's any kind of small thing that's a little bit off, it screams at us; it's called the uncanny valley, that concept. It's the same thing with the AI. So you might try to generate a photorealistic human face, and you just know that there's something off. You might not be able to place it.
[00:51:31.550] - Aaron Edwards
And also, it really hates fingers and arms and legs. As it's drawing those, I think the AI kind of loses track. You end up with too many fingers, or fingers coming out of weird parts of your body, stuff like that.
[00:51:48.820] - Nathan Wrigley
Oh, of course. That speaks to the whole process that's gone on in the background. It's just doing adjacent pixels, so it doesn't really know where the finger is going to end up when it begins. So you get these weird banana-like fingers, where the finger drifts off in the wrong direction. Yeah, it's fascinating.
[00:52:06.020] - Nathan Wrigley
My understanding was that's one of its.
[00:52:07.660] - Aaron Edwards
Biggest weaknesses, yes, with full body humans. You can do close-up portraits of a human face that look really realistic, because then it doesn't lose track as much. But if you're trying to do, like, a stock photo of people at a business meeting or something like that, then they're all going to look horribly deformed.
[00:52:30.190] - Nathan Wrigley
Limitations still abound. The other question that I was going to ask is, let's say, for example, I'm a brand and I've got a team of graphic designers, and over the years we've come up with this colour palette and we've come up with this feel. You've got that Nike feel and you've got that Coke feel. Maybe not in the implementation you've got, because it sounds like it's more generic, but can you train it to be on brand for you? In other words, so that it always gives you that Coke red, or it gives you that kind of feel. I'm struggling to put it into words, but do you understand what I mean? Can it be trained uniquely, so that what you get is different from what everybody else gets?
[00:53:16.190] - Aaron Edwards
Right. Currently it can't be trained. But, for example, once you find a style you like, that goes along in your prompts, so the text that you're prompting includes "a red background" or whatever it may be. You could save that style that you're using for all your prompts, and just continue with it for different subjects.
[00:53:41.080] - Aaron Edwards
That's something that we're looking into: the ability to save a style that you really love, so you can kind of keep that certain style through all the featured images that you're generating, or whatever.
[00:53:52.540] - Nathan Wrigley
Yeah, I feel that for brands that would be a really important thing, because they spend a great deal of money getting people to recognise their brand. And you just know at Christmas, when the Coke ad is two seconds in, there's just something about it. You have no idea that it's Coke, but you know it's Coke. Nobody's presented the Coke logo, nobody's presented the drink. You just think, this is a Coke ad, and you're two seconds in. The polar bears. Yeah.
[00:54:19.780] - Aaron Edwards
Polar bears, yeah.
[00:54:22.100] - Nathan Wrigley
Or the train, or Father Christmas. Exactly. Yeah. Anyway, so that was just another thought. Okay. A couple of times you've alluded to the fact that obviously this is done inside of WordPress, but it's shooting things back over to your infrastructure. We know that computing in the cloud, albeit much cheaper than it ever was, is not free. So there's a cost burden which you must bear, and in order to make these images, at some point people are going to have to dip into their pockets. So let's talk about pricing now. How does it work? Lay out the basics of the pricing. I don't know, can you get, like, a plan where you buy certain credits, or is it a one-time fee? How does it all hang together?
[00:55:02.770] - Aaron Edwards
Yeah, sure. Well, first off, just to talk about the computing: we can't run this on your own WordPress hosting server. It takes something like a $20,000 graphics card to be able to generate these in a decent amount of time. So it's not cheap; it uses some pretty expensive resources. So we have to run that in the cloud. That's why, when you submit your prompt, it actually goes to our API, and then we run it on these super-powerful cloud servers that have really expensive graphics cards, like Nvidia's Tesla line, that are designed for running these AI models efficiently. So there definitely are some costs that we have to pass on to users. Currently, we do it as a credit system. Basically, we simplified it: every time that you click Generate and it creates the four images, that uses one of your credits. So our plans are essentially just based on that, and you can subscribe to however many credits you need. And to get started, we currently give you ten credits for free. All you do is enter your email address to connect to our cloud within the Imagine block itself, and then you're immediately connected and you get your free credits.
[00:56:23.500] - Aaron Edwards
With no credit card or anything like that, you can try out the service and generate a bunch of images and try the different features, see if you like it. And then from there, when you run out of credits, you can upgrade to a paid plan, depending on how many credits you need.
[00:56:41.470] - Nathan Wrigley
So we're recording this in late 2022. What is the current pricing per credit?
[00:56:50.610] - Aaron Edwards
So currently the lowest plan is $9 a month, and I believe that is for 25 credits, I think. Okay, so that's 100 images, basically, that you could create.
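The arithmetic behind that quote checks out: one credit covers one click of Generate, which produces four images, so the $9 entry plan's 25 credits yields 100 images, or 9 cents per image. As a quick sketch (prices as quoted in late 2022; the function name is just illustrative):

```python
# Credit arithmetic for the described pricing: 1 credit = 1 Generate click
# = 4 images, so a plan's image count is credits * 4.

IMAGES_PER_CREDIT = 4

def plan_summary(price_usd: float, credits: int) -> dict:
    """Return total images and effective cost per image for a plan."""
    images = credits * IMAGES_PER_CREDIT
    return {"images": images, "cost_per_image": price_usd / images}

entry = plan_summary(9.0, 25)
# The entry plan works out to 100 images at $0.09 each.
```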
[00:57:08.630] - Nathan Wrigley
If I submit one credit, I get four images back, and then there's this process where I pick one of them and it gets scaled up. Does that consume more credits, or is that part of the bundle that got me the first four?
[00:57:19.370] - Aaron Edwards
No, that's part of the bundle. You can save any of the images that you've created, as much as you want.
[00:57:30.910] - Aaron Edwards
Those credits work across all your websites too, so we don't limit you on the number of WordPress sites you have.
[00:57:37.330] - Aaron Edwards
So say you buy a bundle of credits and then you can sign into a thousand WordPress sites and your clients can be generating images as much as they want.
[00:57:45.080] - Nathan Wrigley
Yeah. So the client thing, is there a mechanism? Would they have to go through your account, or do you have (I don't know, maybe there is already) or are there plans for, like, a client version of it? So, in other words, I've got 50 websites; each of them wants to sign up, but I'm going to sign them up and then distribute their tokens, their credits, to them.
[00:58:04.810] - Aaron Edwards
Currently, you would just sign them all up under your account. But as people need it, we'll probably add a budget system, where within our control panel you could say, okay, this site is allowed this many, so that your clients don't use up all your credits.
[00:58:23.890] - Nathan Wrigley
I've got to say, I find this area absolutely fascinating, and, although we haven't really touched on it, a little bit scary at the same time. There's the whole argument about whether or not it's going to make it difficult for people in graphic design to continue in that same vein, or whether or not it's just going to create thousands of images which we can't tell from the real thing. The whole fake news, did he really do that, I have absolutely no idea. So that's the slight other side of the seesaw, just before the end. But totally fascinating. I must admit, this has got my ears completely pricked up. It really does seem like this is here to stay, and you've got in right at the beginning. So bravo for pulling this off. I think you are probably the first to do this. I could be wrong, maybe there are other plugins, but certainly, from my point of view, it feels like you're the first guys that have come along and offered this. Well done.
[00:59:24.720] - Josh Dailey
Well, thanks for having us on today to kind of talk through it.
[00:59:29.750] - Nathan Wrigley
Yeah, of course. Where can we find you apart from going to the repo? Are there any Twitter handles you want to drop or email addresses or maybe the Infinite Uploads website or something like that? Just feel free, one at a time to just tell us where we can get in touch.
[00:59:46.170] - Aaron Edwards
Yeah, infiniteuploads.com is our website, and we have Twitter at @infiniteuploads. But you can follow me; I'm always tweeting about the development and the new stuff that we're doing, and as I play around with this AI stuff. You can find me on Twitter at @uglyrobotdev.
[01:00:05.490] - Nathan Wrigley
I like that.
[01:00:08.130] - Aaron Edwards
That's actually the name of our parent company, Ugly Robot, and we're building products under that.
[01:00:14.280] - Nathan Wrigley
Okay, thanks. And Josh.
[01:00:15.690] - Josh Dailey
And my Twitter is @joshdailey. And the Infinite Uploads website has all three of our plugins on there: the information about them, the support, and everything else is there. So: Infinite Uploads, Big File Uploads, and Imagine.
[01:00:41.150] - Nathan Wrigley
Thank you very much, Josh and Aaron. Thanks for chatting to us today. I really appreciate it.
[01:00:46.020] - Aaron Edwards
Yeah, thanks for having us.
[01:00:47.570] - Josh Dailey
Yes, fantastic. Thank you.
[01:00:50.050] - Nathan Wrigley
Well, I hope that you enjoyed that. It was absolutely fascinating chatting to both Josh and Aaron about how Imagine creates AI images. If you got through that episode and you weren't utterly flabbergasted by how this technology works, well, I don't understand you. It was completely amazing to me. I really was somewhat humbled in the presence of such amazing technology. I thoroughly enjoyed that, and I hope that you did too. If you've got any comments, if you thought it was interesting, if you think the sky is falling in, Chicken Little style, and this is the end of all things, let us know in the comments on the website. Search for episode number 314. If you think that this is the best thing since sliced bread, well, you can go to the website and tell us that as well.
[01:01:36.110] - Nathan Wrigley
The WP Builds podcast was brought to you today by GoDaddy Pro. GoDaddy Pro, the home of managed WordPress hosting that includes free domain, SSL and 24/7 support. Bundle that with The Hub by GoDaddy Pro to unlock more free benefits to manage multiple sites in one place, invoice clients and get 30% off new purchases. You can find out more by going to go.me/wpbuilds. And again, sincere thanks to GoDaddy Pro for helping us keep the lights on at the WP Builds podcast.
[01:02:13.050] - Nathan Wrigley
One last quick plug for the Page Builder Summit, pagebuildersummit.com 20th to the 24 February. Go there now, hit that pink button and let us know that you want to be involved in the Summit in the next few weeks. I really would appreciate your attendance. There's lots of good stuff for you to see. Okay, that's it. That's all we've got for you this week. I hope that you stay safe and have a good week coming up. We'll be back on Monday for the this Week in WordPress show and back next Thursday for a chat with David Waumsley and I. So, as I said, stay safe. Bye bye for now. And here comes some cheesy music.