nostalgebraist:

IMO, generative ML models are overhyped right now.

(Generative ML models are things like GPT, PaLM, DALLE-2, Imagen, etc. – the models that make what people call “AI generated” content.)

I find myself almost wanting to “short” them, in the financial sense.  Maybe I should?  Like, on prediction markets, or just in personal bets.

----

These models tend to have a large “impressiveness-to-usefulness gap.”

Remember the GPT-2 staged release?  OpenAI claimed they were worried about people using it to automatically generate disinformation, and things like that.  And other people worried about this too.

But when it finally did get released, even the scary full GPT-2 just … wasn’t very useful for anything?

It was really cool, and did get used for lots of internet art-and/or-comedy projects.

But those are exceptions that prove the rule, because they always hinge on “hey, look at this thing that an ‘AI’ made!”  If you stop caring whether the text was machine-generated or not, and just ask whether it works on its own for some practical purpose – then the magic disappears.

Likewise for GPT-3.  For the amount of buzz it got when it came out, it’s pretty striking how little effect it’s had on the world in the two years since then.  There are a bunch of GPT-3 startups, but unless you use one of their relatively niche products (you probably don’t), none of the text you read in your everyday life was written by GPT-3.  Except the stuff where the point is that it was written by GPT-3.

(And even some of those startups are doing the “hey, an AI made this!” approach, like AI Dungeon.  In OpenAI’s 2021 blog post about GPT-3 apps, one of the two examples they chose to highlight was Fable Studio, a company making AI for VR characters – an application where “wow, this AI is so advanced!” can do a lot of the work.

…and where is Fable Studio now?  They’ve, uh, pivoted to NFTs.  Specifically AI for NFTs, and more specifically a behavioral, non-linguistic kind of AI inspired by The Sims.  “GPT-3 is frustratingly useless” is a core part of their pitch for the new project.)

And now people are freaking out about the idea that DALLE-2 will replace human artists, and it just feels like the same story again.

The pictures are pretty.  They’re technically competent by human standards, as GPT-3 output often is.  Yet they’re rarely actually good by human standards: I haven’t seen a single one that would make me click “follow” if it had been posted by a human artist on tumblr.

And as with GPT-3, it’s virtually impossible to tell the system what to create in the fine-grained way you would expect when collaborating with another human.  Mostly, it just makes a competent version of whatever thing it randomly happens to end up making.

----

Most of the impressiveness-to-usefulness gap is about that last point.  Collaboration is crucial in creative work, and these models can only collaborate with us in a very primitive way.

You can’t impose a house style on them.  You can’t tell them “this is a good start, now here’s what I want in the next draft.”  (You kinda can with GPT-3 now, but that raises the meta version of the issue: can you rely on it to respond to your feedback in a predictable way?)

You can’t ask them to make 500 different video game objects that all look the way the objects in your game are supposed to look.  They can only make 500 objects that look convincingly like they’re from some video game.

If models like DALLE-2 could be told “this is a great art style, now hold it fixed,” and then told to generate many pictures in the same style, then IMO they would be much closer to doing the work of human artists.

But it’s hard to do this kind of thing with generative models without spoiling their magic.

The models get their magic by learning from immense amounts of data, mostly scraped from the internet.  What they are good at is imitating the structure of information that already exists ubiquitously on the internet.

When we “tell them to do things,” what we’re really doing is filling in a piece of some kind of information-structure that can be found all over the internet.  When you prompt DALLE-2 or Imagen, you’re saying something like “make a picture that would have this alt-text,” or “make a picture that would have this caption on Instagram.”

You get such coarse-grained control because real stuff-on-the-internet is only coarsely predictable from its constituent pieces.

And what’s on the internet is simply what it is.  You can’t just go out and tell millions of Instagrammers to write captions that are more informative in some specific way.  If people are making the data with you in mind, the data isn’t big enough to be relevant.

So with something like varying the style and content independently, there are two basic approaches:

- Reify “style” as some specific thing about the image that can be computed automatically, as in the style transfer literature.  This works reliably, but loses the magic: you can control “style” very precisely, but at the cost of using a limited, hardcoded notion of “style” that can’t make itself less limited by learning from massive data.

- Use the model’s innate conditioning mechanism, adding “in the style of Salvador Dalí” or whatever to your prompt.  This gets you a much more nuanced, impressive concept of “style” that gets only more so with more data … except that you can’t put anything in that wasn’t in the data to begin with.  You get the information-structures that the Instagrammers (et al.) gave you, and you get only that, and no more.
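(To make the first approach concrete: the classic style transfer trick is to reify “style” as the Gram matrix of a CNN layer’s feature maps – the channel-to-channel correlations, with spatial layout thrown away.  A minimal NumPy sketch of that idea, with random arrays standing in for real network activations:

```python
import numpy as np

def gram_matrix(features):
    """Reify "style" as channel-to-channel correlations.

    features: array of shape (channels, height, width), e.g. one layer's
    activations from a pretrained CNN.  The Gram matrix discards spatial
    layout and keeps only which channels co-activate -- a hardcoded,
    automatically computable stand-in for "style".
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def style_loss(gen_features, ref_features):
    """Mean squared difference between Gram matrices: small when the
    generated image's channel correlations match the reference's."""
    diff = gram_matrix(gen_features) - gram_matrix(ref_features)
    return float(np.mean(diff ** 2))

# Toy example: random "activations" standing in for real CNN features.
rng = np.random.default_rng(0)
ref = rng.standard_normal((8, 16, 16))
print(style_loss(ref, ref))   # identical features, so the loss is 0.0
```

In the real thing you’d take `features` from a fixed pretrained network and minimize `style_loss` over the generated image’s pixels.  You can control this notion of “style” very precisely – but “style” is now whatever that one formula happens to capture, and no amount of data will make it capture more.)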

I think some narrow versions of the problem will get resolved over time, like the one that is the topic of the Scott-Vitor bet.

But the “Instagram bound,” so to speak, seems like a fundamental limit.  We might overcome it, but only by doing something fundamentally different from anything you’ve seen yet.
