Takeaways from GPT-3, CLIP, and DALL·E

January 2021 (original Twitter thread)

Some takeaways from OpenAI's impressive recent progress, including GPT-3, CLIP, and DALL·E:

1) The raw power of dataset design.

These models aren't radically new in their architecture or training algorithm

Instead, their impressive quality is largely due to careful training at scale of existing models on large, diverse datasets that OpenAI designed and collected.

Why does diverse data matter? Robustness.

Can't generalize out-of-domain? You might be able to make most things in-domain by training on the internet

But this power comes w/ a price: the internet has some extremely dark corners (and these datasets have been kept private)

As Shreya Shankar puts it, the "data-ing" is often more important than the modeling

And OpenAI put *painstaking* effort into the data-ing for these models.

2) The promise of "impact teams."

Teams that are moderately large, well-resourced, and laser-focused on an ambitious objective can accomplish a lot

This is hard in academia, not just because of resources but also incentives—academia often doesn't assign credit well to members of large teams

3) Soul-searching for academic researchers?

A lot of people around me are asking: what can I do in my PhD that will matter?

Chris Manning has a useful observation—we don't expect AeroAstro PhD students to build the next airliner

I'm also optimistic here. I think there is a lot of opportunity for impact in academia, including advancing:

  • efficiency

  • equity

  • domain-agnosticity

  • safety + alignment

  • evaluation

  • theory

…and many other as-of-yet undiscovered phenomena!

Companies don't want to release these models, both because of genuine safety concerns and, potentially, because of their bottom line

Models are locked behind APIs, datasets are kept internal, and the public may only get to see a polished (but restricted) demo + blog post

Limiting API access has safety benefits, but could also be an extra advantage to the well-connected: established researchers or those with large Twitter followings

Even when papers are published, important details are missing (e.g. key details of GPT-3's architecture or the data collection process)

It's becoming increasingly hard to study/improve these methods—just as they're edging closer and closer to widespread productionization.

Ultimately, no one lab can do this alone: we need smart new frameworks and mutual trust to overcome coordination challenges and ensure positive outcomes for society