GPT-Neo – Building a GPT-3-sized model, open source and free (eleuther.ai)
725 points by sieste on Jan 18, 2021 | 252 comments



In my experience, the output from GPT-3, DALL-E, et al. is similar to what you get from googling the prompt and stitching together snippets from the top results. These transformers are trained on "what was visible to Google", which limits their utility.

I think of the value proposition of GPT-X as "what would you do with a team of hundreds of people who can solve arbitrary problems only by googling them?". And honestly, not a lot of productive applications come to mind.

This is basically the description of a modern content farm. You give N people a topic, e.g. "dating advice", and they'll use Google to put together different ideas, sentences and paragraphs to produce dozens of articles per day. You could also write very basic code with this, similar to googling a code snippet and combining the results from the first several Stack Overflow pages that come up (which, incidentally, is how I program now). After a few more versions, you could probably use GPT to produce fiction that matches the quality of the average self-published ebook. And DALL-E can come up with novel images in the same way that a graphic designer can visually merge the Google image results for a given query.

One limitation of this theoretical "team of automated googlers" is that the content they can search is frozen at the date of the last GPT model update. Right now the big news story is the Jan 6th, 2021 insurrection at the US Capitol. GPT-3 can produce infinite bad articles about politics, but won't be able to say anything about current events in real time.

I generally think that GPT-3 is awesome, and it's a damn shame that "Open"AI couldn't find a way to actually be open. At this point, it seems like a very interesting technology that is still in desperate need of a Killer App.


I don't necessarily see the "team of automated googlers" as a fundamental or damning problem with GPT-like approaches. First, I think people may have a lot fewer truly original ideas than they are willing to admit. Original thought is sought after and celebrated in the arts as a rare commodity. But unlike in the arts, where there are almost no constraints, when it comes to science or engineering almost every incremental step is of the form Y = Fn(X0, ..., Xn), where X0...Xn are widely known and proven to be true. With sufficient logical reasoning and/or experimental data, after numerous peer reviews, we can accept Fn(...) to be a valid transform, Y becomes Xn+1, and so on. Before the internet or Google, one had to go to a library and read books and magazines, or ask other people, to find the inputs from which new ideas could be synthesized. I think GPT-like stuff is a small step towards automating and speeding up this general synthesis process in the post-Google world.

But if we are looking to replace end-to-end intelligence at scale, it's not just about synthesis. We also need to automate the peer review process so that its bandwidth matches the increased rate of synthesis. Most good researchers and engineers are able to self-critique their work (and the degree to which they can do that well is really what makes one good, IMHO). And then we rely on our colleagues and peers to review our work and form a consensus on its quality. Currently, GPT-like systems can easily overwhelm humans with such peer review requests. Even if a model is capable of writing the next great literary work, predicting exactly what happened on Jan 6, or formulating new laws of physics, the sheer amount of crap it will produce alongside makes it very unlikely that anyone will notice.


I call it the "Prior-Units" theorem. Given that you are able to articulate an idea useful to many people, there exist prior units of that idea. The only way, then, to come up with a "new idea" is to come up with an idea useful only to yourself (plenty of those) or to small groups, or to translate an old idea into a new language.

The reason for this is that your adult life consists of just a tiny, tiny, tiny fraction of the total time of all adults, so as an idea becomes relevant to more people, the odds decrease exponentially that no one thought of it before.

There are always new languages, though, so a great strategy is to take old ideas and bring them to new languages. I count new high-level, non-programming languages as new languages as well.


Art (music, literature, ...) involves satisfaction of constraints. For instance, you need to tune your guitar like the rest of the band, write 800 words like the editor told you, tell a story with a beginning, a middle, and an end, and hopefully not use the cheap red pigments that were responsible for so many white, blue, and gray flags I saw in December 2001.


"team of automated googlers" where google is baked-in. Google results, and content behind it, changes. Meaning, GPT would have to be updated as well. Could be a cool google feature, a service.


Have you tried conversing with it after a few lines of setting a proper context? Like two scientists talking, or something like that. It can provide very interesting outputs that are not googlable.

Yes, every time you see something that obviously doesn't make sense to a human, it makes you dismiss it. You would look at that output differently, though, if you were talking with a child. Just like a child can miss some information, making it say something ridiculous, the model may miss some patterns and connections.

But have you ever observed carefully how we connect patterns and make sentences? Our highly sophisticated discussions and reasoning are just pattern matching. The most prominent patterns, ordered in time, are what we also know as consciousness.

Watch Hacker News comments and notice how, after somebody uses a rare adjective or cluster of words, more commenters tend to use it without even paying conscious attention to it.

Long story short, give it a try and see examples of what people have already done with it, even in its limited form.

To me you are looking at an early computer and saying that it's not doing anything that a bunch of people with calculators couldn't do.


No, it provides something that superficially looks interesting. That's a big difference from being actually interesting.


GPT-3 is trained on text prediction, and there's been a lot of commentary about the generation aspect, but some of the applications that excite me most are not necessarily about generating text. Instead, GPT-3 (and other language models) create very useful vector representations of natural language as a side effect, which can then be used for other tasks with much less data or labelling effort. Using the text prediction task to supervise learning this representation, without having to create an expensive labelled dataset, is very helpful, and not just for language tasks. See for example the CLIP work that came out recently for image classification, which uses natural-language captions to supervise training. There is other work referred to in that blog post that also exploits captions or descriptions in natural language to help understand images better. More speculatively, being able to use natural language to supervise or give feedback to automated systems that have little to do with NLP seems very very useful.
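To make the "representations, not generation" angle concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the public GPT-2 checkpoint (GPT-3 itself is only reachable through the hosted API), of reusing a language model's hidden states as features for a small downstream classifier:

    import torch
    from transformers import AutoModel, AutoTokenizer
    from sklearn.linear_model import LogisticRegression

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2")

    def embed(texts):
        feats = []
        for t in texts:
            inputs = tokenizer(t, return_tensors="pt")
            with torch.no_grad():
                hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
            feats.append(hidden.mean(dim=1).squeeze(0))     # mean-pool over tokens
        return torch.stack(feats).numpy()

    # a handful of labelled examples can go a long way on top of pretrained features
    X = embed(["great product, would buy again", "terrible, broke after a day"])
    clf = LogisticRegression().fit(X, [1, 0])

The two example sentences and labels are just placeholders; the point is that the expensive part (the representation) comes for free from text prediction.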


"More speculatively, being able to use natural language to supervise or give feedback to automated systems that have little to do with NLP seems very very useful."

I agree with this, and it isn't entirely speculative. One of the most useful applications I have seen that goes beyond googling is generating CSS from natural language, e.g. "change the background to blue and put a star at the top of the page". There are heavily cherry-picked demos of this on Twitter right now https://twitter.com/sharifshameem/status/1282676454690451457...

This is definitely practical, though I wouldn't design my corporate website using it. It could be useful if you need to make 10 new sites a day for something SEO- or domain-related.
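For what it's worth, those demos appear to boil down to few-shot prompting. A hedged sketch of the pattern, assuming access to the OpenAI completion API (the engine name and the example descriptions here are illustrative, not taken from the actual demo):

    import openai

    # openai.api_key = "..."  # requires an API key from the closed beta

    prompt = """Description: make all headings red
    CSS: h1, h2, h3 { color: red; }

    Description: change the background to blue and center the main text
    CSS: body { background-color: blue; } main { text-align: center; }

    Description: put a gold star before every list item
    CSS:"""

    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=64,
        temperature=0,
        stop="\n\n",
    )
    print(response.choices[0].text.strip())

The demo linked above presumably wraps something like this in a UI and applies the returned CSS live.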


> I think of the value proposition of GPT-X as "what would you do with a team of hundreds of people who can solve arbitrary problems only by googling them?". And honestly, not a lot of productive applications come to mind.

Damn, this could replace so many programmers, we're doomed!


*realizing GPT-3 was probably created by programmers whose job really is mostly googling for Stack Overflow answers*

#singularity


We were worried that the singularity was going to involve artificial intelligences that could make themselves smarter, and we were underwhelmed when it turned out to be neural networks that started to summarize Stack Overflow neural network tips, to try to optimize themselves, instead.

GPT-∞: still distinguishable from human advice, but it contains a quadrillion parameters and nobody knows how exactly it’s able to tune them.


Curses. We've been found out.


The current problem is that we don't have a reliable, scalable way to merge features of knowledge engines, which capture ontological relationships between entities, with generative engines, which are good at producing natural-looking or natural-sounding qualitative output. There's certainly research going on to join them together, but it just doesn't get the kind of press releases that the comparatively much easier generative and pattern-recognition work does. The whole "General AI Complete" class of problems seems to be the ones that try to combine multiple areas of more specific AI systems, but that's exactly where the more practical problems for the average person arise.


Agreed, but that's because they're hard to integrate: one is concerned with enumerating all the facts that humans know about (a la Cyc) and the other with learning them directly from data. Developing feedback systems that combine these two would be quite exciting.


> honestly, not a lot of productive applications come to mind

Not so convincing when you enumerate so many applications yourself.

> but won't be able to say anything about current events

There are variants that use a transformer + retrieval, so they get unlimited memory that can be easily extended.


I've mentioned this in another thread, but a GPT-3 that could reliably generate quizbowl questions like the ones on https://www.quizbowlpackets.com would be great in this domain. My experience with it indicates it's nowhere near being able to do this, though.


Content farms are hardly a productive application.


You missed the forest for the trees. If you got a tool that can use StackOverflow to solve simple programming tasks, or to generally solve any simple task with Google, then you're sitting on a gold mine.


Yes and no.

It may be useful for hiring fewer low-skilled employees and keeping a few senior ones who take input from the machine and decide what to keep and what to throw away. I'm not sure if a senior engineer would be more productive patching up code written by a bot or writing it from scratch. It's going to be a hard sell while you still need human supervisors.

You can't trust a machine that can't reason about code implementation, or even content creation. You need a human to supervise, or a better machine.

We already have AI-based auto-completion for code, and GPT-3 can be useful for that (but at what cost? Storing a huge model on your disk, or making a slow / unsafe HTTP request to the cloud?)


> if a senior engineer would be more productive patching up code written by a bot or writing it from scratch.

I have no doubt writing from scratch would win hands down. The main reason we patch wonky legacy code is because it's already running and depended on. If you remove that as a consideration, a senior engineer writing the equivalent code (rather than debugging code generated randomly from Google searches) would, IMO, be more efficient and produce a higher-quality program.


That's a big if though.

GPT-3 is much more an interesting autocomplete based on the most commonly used patterns than something which figures out that Problem X has a lot of conceptual similarities with Solved Problem Y, so it can just reuse the code example with some different variable names.


> And honestly, not a lot of productive applications come to mind

So, I can't go into too many details since I haven't started yet, but I'm thinking about mixing a flavor of GPT with DETR for OCR tasks where the model must then predict categorization vectors, the chief difficulty of the task being that it must identify and classify arbitrary-length content in the OCR output.


I work in a niche sector of the insurance industry. Based on what it can already do, I can see it doing half my job with basically no learning curve for the user. Based on this alone, I could see it reducing headcount in the company and sector by 5%. This is massive when you consider the low margins in the industry and high costs of "skilled" staff.


It's not even commercially available AFAIK; it's only been made available to some developers who applied for access through a completely opaque process, and this after soliciting application ideas from every applicant. So I heartily support finding a way to truly open source it.


Did anyone try to ask GPT-3 something about the insurrection? I can't find any free/open text prompts for GPT-3, this is what GPT-2 has to say:

After Trump lost the 2020 election his supporters stormed the Capitol

> They smashed windows, tore apart the Republican Party's headquarters, destroyed a statue of Ronald Reagan and set fire to the Texas statehouse.

> A month later, after the Democrats took control of the Senate and the House, Trump's supporters stormed their own offices. They attacked the offices of California Democratic Senator Kamala Harris and New York Democratic Representative Joe Crowley. They ransacked offices in the Minnesota and Texas statehouses and burned down the Republican Party headquarters in Tennessee. The damage was estimated at
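(For anyone who wants to poke at this locally, a minimal sketch assuming the Hugging Face transformers library and the public GPT-2 checkpoint; sampling is random, so your completion will differ:)

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    prompt = "After Trump lost the 2020 election his supporters stormed the Capitol"
    out = generator(prompt, max_length=120, do_sample=True, top_p=0.9)
    print(out[0]["generated_text"])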


GPT-3

> The Trump supporters were armed with guns and knives. They were also carrying torches. They were chanting “Trump 2020” and “Build the Wall.” The Trump supporters were also chanting “Lock her up.” But they were referring to Hillary Clinton.


This is hilarious, more please :)


You can play with GPT-3 in a custom world at AIdungeon.io. The responses are biased towards giving you second-person RPG narrative, but the corpus of data, the mastery of syntax, and the more uncertain grasp of events and relationships are all there.

Example with the prompt: "You are Donald Trump. The recent election results have been a disappointment to you."

https://pastebin.com/dSYZypCw

Props for turns of phrase like "Your opponent is a typical liberal. He hails from the right wing of the Democratic Party, but has been trying to appeal to the left to gain more support.", but poor marks for apparently not having grasped how elections work. (There's a joke in there somewhere)

If you don't pick a custom world and your own prompt, you get something more like this:

> You are Donald Trump, a noble living in the kingdom of Larion. You are awakened by a loud noise outside the gate. You look out the window and see a large number of orcish troops on the road outside the castle.

I'd like 'orcish troops' better if I thought it was inspired by media reports of Capitol events rather than a corpus of RPGs.


my god it should be called CNN-2


Or not even googling, but pre-googling: using the predictive typing in the text box at google.com, because you are giving it something to complete.


>"what would you do with a team of hundreds of people who can solve arbitrary problems only by googling them?"

What would you do with a team of hundreds of people who can instantly access an archive comprising the sum total of digitized human knowledge and use it to solve problems?


We have that now; it's called googling. You could easily hire 100 people to do that job, but you'd have to pay them at least $15/hr now in the US. Say equivalent GPT-3 servers cost a fraction of that. How do you make money with that resource?


Well, they can use it to write text. Not to solve problems directly.


In science, some amazing discoveries are made years or even centuries before some practical applications for them are found. I believe in humanity, sooner or later we'll find some actually useful applications for GPT-X.


It's a wonderful co-writer of fiction, for one. Maybe the better authors wouldn't need it, but as for everyone else -- take a look at https://forums.sufficientvelocity.com/threads/generic-pawn-t..., and compare the first couple of story posts to the last few.

One of the ways in which people get GPT-3 wrong is, they give it a badly worded order and get disappointed when the output is poor.

It doesn't work with orders. It takes a lot of practice to work out what it does well with. It always imitates the input, and it's great at matching style -- and it knows good writing as well as bad, but it can't ever write any better than the person using it. If you want to write a good story with it, you need to already be a good writer.

But it's wonderful at busting writer's block, and at writing differently than the person using it.


I think this is exactly right, and indeed this is a lot of the value. "Content-generation" is already a thing, and yes it doesn't need to make much sense. Apparently people who read it don't mind.


People don’t read it, search engines do.


BTW, we should mandatorily tag generated content for search engines in order to exclude it from future training sets.


Apart from that, hopefully the people building training sets use gltr or something similar to prevent training on generated text.

http://gltr.io/


Well, hopefully, someone will come up with a language model that picks words based on GLTR purpleness. Automated data collection deserves automated data.


I feel your stance [1] is demonstrably false; consider two challenges.

1) Please play a winning game of Go against Alpha Zero, just by googling the topic.

2) Next, please explain how AlphaZero's games could forever change Go opening theory [2] without any genuine creativity.

[1] that “the output from GPT-3, DALL-E, et al is similar to what you get from googling the prompt and stitching together snippets from the top results.”

[2]”Rethinking Opening Strategy: AlphaGo's Impact on Pro Play” by Yuan Zhou


Op was clearly not talking about Alpha Zero, a different technology made by different people for a different purpose. Instead, they were noting that despite displaying some truly excellent world modeling, GPT-3 is trained on data that encourages it to vomit up rehashes. It's very possible that the next generation will overcome this and wind up completely holding together long-run concepts and recursion, at least if scaling parameters keeps working, but for now it is a real limitation.

GPT-3 writes like a sleepy college student with 30 minutes before the due date; with shockingly complete grasp of language, but perhaps not complete understanding of content. That's not just an analogy, I am a sleepy college student. When I write an essay without thinking too hard it displays exactly the errors that GPT-3 makes.


GPT-3 can’t play Go.


It almost definitely can to some extent, given that GPT-2 could play chess [0].

0. https://slatestarcodex.com/2020/01/06/a-very-unlikely-chess-...


Retrained (or to be precise: fine-tuned) GPT-2 can play chess after training on additional data.
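For concreteness, the trick in that experiment is just to serialize games as plain move text and let the language model continue the sequence. A hedged sketch (the stock "gpt2" checkpoint stands in here for the actual fine-tuned model):

    from transformers import pipeline

    # games are serialized as strings like "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 ..."
    # and the fine-tuned model "plays" by continuing the text
    generator = pipeline("text-generation", model="gpt2")  # swap in the fine-tuned checkpoint
    print(generator("1. e4 c5 2. Nf3 d6 3.", max_length=30)[0]["generated_text"])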


> I think of the value proposition of GPT-X as "what would you do with a team of hundreds of people who can solve arbitrary problems only by googling them?". And honestly, not a lot of productive applications come to mind.

If I was Xi Jinping, I would use it to generate arbitrary suggestions for consideration by my advisory team, as I develop my ongoing plan for managing The Matrix.


The intention behind it is pretty good. Best of luck to them.

I wonder if I can donate computing power to this remotely, like the old SETI or protein folding projects: use idle CPU cycles to compute for the network. Otherwise, the estimates I have seen of how much it would take to train these models are enormous.


Not directly related, but the Learning@home [1] project aims to achieve precisely that goal of public, volunteer-trained neural networks. The idea is that you can host separate "experts," or parts of your model (akin to Google's recent Switch Transformers paper) on separate computers.

This way, you never have to synchronize the weights of the entire model across the participants — you only need to send the gradients/activations to a set of peers. Slow connections are mitigated with asynchronous SGD and unreliable/disconnected experts can be discarded, which makes it more suitable for Internet-like networks.

Disclaimer: I work on this project. We're currently implementing a prototype, but it's not yet GPT-3 sized. Some issues like LR scheduling (crucial for Transformer convergence) and shared parameter averaging (for gating etc.) are tricky to implement for decentralized training over the Internet.

[1] https://learning-at-home.github.io/
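To give a flavour of the idea (this is a conceptual sketch, not the project's actual API): a local gating network picks a few remote experts per input, so only the chosen experts' activations and gradients ever cross the network. call_remote_expert below is a hypothetical stand-in for whatever RPC layer the volunteer nodes expose.

    import torch
    import torch.nn as nn

    def call_remote_expert(address, activations):
        # hypothetical placeholder: in a real system this would be an async RPC
        # that tolerates slow or disconnected peers
        return activations

    class RemoteMoE(nn.Module):
        def __init__(self, dim, expert_addresses, k=2):
            super().__init__()
            self.gate = nn.Linear(dim, len(expert_addresses))
            self.experts = expert_addresses
            self.k = k

        def forward(self, x):
            weights, indices = torch.softmax(self.gate(x), dim=-1).topk(self.k, dim=-1)
            out = torch.zeros_like(x)
            for slot in range(self.k):
                for i, addr in enumerate(self.experts):
                    mask = indices[:, slot] == i
                    if mask.any():
                        # only this subset of the batch is sent to expert `addr`
                        out[mask] += weights[mask, slot:slot + 1] * call_remote_expert(addr, x[mask])
            return out

    layer = RemoteMoE(dim=16, expert_addresses=["peer-a:9000", "peer-b:9000"], k=1)
    y = layer(torch.randn(4, 16))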


Your project looks so interesting. Have you thought of putting the experts on a distributed market where their expertise and work can be exchanged for some token (obviously using a blockchain)?

This would encourage people to host experts in your network and would create value.


Thank you! This is definitely something we should look into in the future (hopefully with community help); as of now, training infrastructure and model convergence are the highest priorities. That said, we welcome all ideas for ways to motivate more volunteers to join the experiments, because the Learning@home team comes from a distributed DL background with limited volunteer computing expertise.

Also, I believe that for some projects (e.g. GPT-3 replication effort) people would want to join the network regardless of the incentive mechanism, as demonstrated by Leela Chess Zero [1].

[1] http://lczero.org/


How do you deal with adversarial/byzantine updates that attempt to degrade performance or even install a backdoor? Do you use plain averaging, or some other aggregation algorithm like Multi-Krum?


For now, the only separation we have is that each worker is responsible for its own weights, since network security has not been our top priority. Still, we've been thinking about adding some security measures like proof-of-work for each node and detection of anomalous inputs/gradients (or simply NaN values). Right now we're running experiments on internal hardware, but before a public launch we'll make sure that malicious participants won't put everybody else's work to waste :)


This is also what I was thinking about. Considering that making up bad data does not require any GPU work, unlike the honest computing nodes, the model can degrade quickly without some measures to deal with adversarial nodes.

A draft solution would be for the central server to measure the goodness of each update and drop the ones that don't perform well. This could work since inference is much cheaper than computing gradients.
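For what it's worth, the Multi-Krum rule mentioned upthread is cheap to sketch: score each update by its summed squared distance to its closest neighbours, keep the lowest-scoring few, and average those. A minimal sketch, assuming updates arrive as flat vectors:

    import numpy as np

    def multi_krum(updates, n_byzantine, n_selected):
        n = len(updates)
        dists = np.array([[np.sum((u - v) ** 2) for v in updates] for u in updates])
        scores = []
        for i in range(n):
            closest = np.sort(dists[i])[1:n - n_byzantine - 1]  # skip distance to self
            scores.append(closest.sum())
        chosen = np.argsort(scores)[:n_selected]
        return np.mean([updates[i] for i in chosen], axis=0)

    # toy example: nine honest updates near 1.0, one adversarial update far away
    updates = [np.ones(4) + 0.01 * np.random.randn(4) for _ in range(9)] + [np.full(4, 100.0)]
    print(multi_krum(updates, n_byzantine=1, n_selected=5))  # the outlier never gets averaged in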


Do you have a personal Twitter account I can follow? Your career is one I'd like to follow.


Sure! It's @m_ryabinin


Thanks! :D


The primary issue is that large-scale GPU training is dominated by communication costs. Since, to some approximation, things need to be synchronized after every gradient update, any increase in communication cost very quickly becomes infeasible.
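A back-of-the-envelope illustration of why this kills the volunteer-computing idea for dense models of this size:

    # rough illustration: time to exchange one full set of GPT-3-sized gradients
    params = 175e9
    bytes_per_param = 2            # fp16 gradients, ignoring any compression
    link_gbit = 1                  # optimistic home uplink, in Gbit/s
    seconds = params * bytes_per_param * 8 / (link_gbit * 1e9)
    print(f"{seconds / 60:.0f} minutes per gradient synchronization")  # ~47 minutes

Gradient compression and partitioning help, but the gap to a datacenter interconnect is still a couple of orders of magnitude.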


Yeah! Sounds great. If I could easily run a SETI@home-style thing to contribute to training a (future) model similar to GPT-x, but with the result freely available to play with, I reckon I'd do it. It could even be made into a game, I'd say! I am totally aware that GPT-3 itself can't be run on a regular workstation, but maybe hosting instances of models from the best/most interesting training runs could be worked out by crowdfunding?


I was just gonna propose something like this. Democratizing large ML models.


They get this suggestion a lot. There's a section in their FAQ that explains why it's infeasible.

https://github.com/EleutherAI/info


TensorFlow supports distributed training with a client-server model.
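A minimal sketch of the multi-worker setup, assuming TensorFlow 2.x (each worker runs the same script with its own TF_CONFIG describing the cluster, and gradients are synchronized every step):

    import tensorflow as tf

    strategy = tf.distribute.MultiWorkerMirroredStrategy()
    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
        model.compile(optimizer="adam", loss="mse")
    # model.fit(dataset) then runs data-parallel across all the workers

That per-step synchronization is exactly where the bandwidth problem discussed below comes from: it assumes a fast, reliable link between workers.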


Does it also solve the problem of everyone having different hardware?


It does.

For most models, your home broadband would be far too slow though.


Is it because they will have to communicate errors back during training? I forgot that training these models is more of a global task than protein folding; in that sense it is less parallelizable over the internet.


Yes, and also activations if your GPU is too small to fit the whole model. The minimum useful bandwidth for that stuff is a few gigabits...


What about some kind of sharding, parts of the computation that could be executed in isolation for a longer period of time?


An ongoing research problem. OpenAI would certainly like to be able to use smaller GPUs, instead of having to fit the entire model into one.


GPT-3 does not fit in any one GPU that exists at present. It's already spread out across multiple GPUs.


You're a genius! How do we get this started? Wouldn't there be many Bitcoin rigs out there that could already be used for this?

I'd be very willing to spend time and money on this!

Could someone set up a chat room or something?


Serious question: is there a warez scene for trained models yet?

(I don't know how the model is accessed - are users of mainline GPT-3 given a .pb and a stack of NDAs, or do they have to access it through an access-controlled API?)

Wherever data is desired by many but held by a few, a pirate crew inevitably emerges.


I think this might also be of interest to you

https://the-eye.eu/public/AI/pile_preliminary_components/


Those are datasets though, not models.


I am aware; I just thought you might find them interesting.



GPT-3 users are given an API link which routes to Azure, full blackbox.


Checkpoint is not shared with customers, you only get access to an API endpoint.


The model is huge and is currently run in the cloud on many machines.


It's only 175 billion parameters, so presumably it can fit on a single computer with 1024 GB RAM.
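Quick check on the raw parameter storage alone (before activations, optimizer state, or any framework overhead):

    params = 175e9
    print(params * 4 / 1e9, "GB at fp32")  # ~700 GB
    print(params * 2 / 1e9, "GB at fp16")  # ~350 GB

So the weights themselves would fit in 1024 GB of system RAM, which is what the rest of this subthread is about: fitting is not the same as running it at a usable speed.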


Wouldn't you need this model to be in GPU RAM instead of regular RAM, though?


On CPU the latency would be absolutely prohibitive, to the point of being useless.


For training yes, but not for inference.


From 2019: https://heartbeat.fritz.ai/deep-learning-has-a-size-problem-...

> Earlier this year, researchers at NVIDIA announced MegatronLM, a massive transformer model with 8.3 billion parameters (24 times larger than BERT)

> The parameters alone weigh in at just over 33 GB on disk. Training the final model took 512 V100 GPUs running continuously for 9.2 days.

Running this model on a "regular" machine at some useful rate is probably not possible at this time.


Sorry, but I don't see the link between the quote and your sentence.


Inference on GPU is already very slow on the full-scale non-distilled model (in the 1-2 sec range iirc), on CPU it would be an order of magnitude more.


The inference latency would also be prohibitive.


Once Microsoft became the sole and exclusive licensee of GPT-3, a credible open source effort was bound to emerge, and I hope it really is an effective alternative.


Eleuther works with Google. I'd prefer the Microsoft demon to the Google demon.


The resources for the GPT-3 replication will be provided by the cloud company CoreWeave.


“Works with Google” and “gets free compute from TFRC” are not really the same thing.


I love the initiative, but I'm starting to get scared of what a post-GPT-3 world will look like. We are already struggling to distinguish fake news from real news, automated customer request replies from genuine ones, etc. How will I know that I'm having a conversation with a real human in the future?

On the other side, the prospect of having an oracle that answers all trivia, fixes spelling and grammar, and allows humans to focus on higher level information processing is interesting.


The fake news thing is a real problem (and may become worse under GPT3 but certainly exists already). As for the others - to quote Westworld, "if you can't tell the difference, does it really matter?"


Most communications between humans have some physical-world purpose, so an algorithm that is trained to create the impression that a purpose has been fulfilled, while not actually having any capabilities beyond text generation, is going to have negative effects except where the sole purpose of interacting is receiving satisfactory text.

Reviews that look just like real reviews but are actually a weighted average of comments on a different product are negative. Customer service bots that go beyond FAQ to do a very convincing impression of a human service rep promising an investigation into an incident but can't actually start an investigation into the incident are negative. An information retrieval tool which has no information on a subject but can spin a very plausible explanation based on data on a different subject is negative.

Of course, it's entirely possible for humans to bullshit, but unlike text generation algorithms it isn't our default response to everything.


> "if you can't tell the difference, does it really matter?"

It indeed does. The problem is that societies and cultures are heavily influenced and changed by communication, media, and art.

By replacing big portions of these components with artificial content, generated from previously created content, you run the risk of creating feedback cycles (e.g. train future systems from output of their predecessors) and forming standards (beauty, aesthetics, morality, etc.) controlled by the entities that build, train, and filter the output of the AIs.

You'll basically run the risk of killing individuality and diversity in culture and expression; consequences on society as a whole and individual behaviour are difficult to predict, but seeing how much power social media (an unprecedented phenomenon in human culture) have, there's reason to at the very least be cautious about this.


This problem affects all types of agents, natural or artificial. An agent acts in the environment; this produces experience and learning, and thus conditions the future. The agent has no idea what other opportunities are lost behind past choices.


What scares me personally is the idea that I might be floating in a sea of uncanny valley content. Content that's 98% human-like, but then that 2% sticks out like a nail and snaps me out of it.

Sure, I might not be able to tell the difference the majority of the time, but when I can tell the difference it's gonna bother me a lot.


To me, a lot of content seems to be digital marketing horseshit tbh.


If you ask GPT-3 for three Lord of the Rings quotes it might give you two real ones and one fake one, because it doesn’t know what truth is and just wants to give you something plausible.

There are creative applications for bullshit, but something that cites its sources (so you can check) and doesn’t hallucinate things would be much more useful. Like a search engine.


Genuine question, why is this a problem? Sure, someone may be able to generate thousands of real-sounding fake news articles, but it's not like they will also be able to flood the New York Times with these articles. How do you worry you will be exposed to these articles?


If recent times have told us anything, it's that the biggest distributor of "news" is social media. And worse still, people generally have no interest in researching the items they read. If "fake news" confirms their pre-existing bias then they will automatically believe it. If real news disagrees with their biases then it is considered fake.

So in theory, the rise of deep fakes could lead to more people getting suckered into conspiracy theories and other such extreme opinions. We've already seen a small trend this way, with low-resolution images of different people with vaguely similar physical features being used as "evidence" of actors in hospitals / shootings / terrorist scenes / etc.

That all said, I don't see this as a reason not to pursue GPT-3. From that regard the proverbial genie is already out of the bottle. What we need to work on is a better framework for distributing knowledge.


It's not me I'm worried about - it's the 50% [1] of people who get their news from social media and "entertainment" news platforms. These people vote, and can get manipulated into performing quite extreme acts.

At the moment a lot of people seem to have trouble engaging with reality, and that seems to be caused by relatively small disinformation campaigns and viral rumours. How much worse could it get when there's a vast number of realistic-sounding news articles appearing, accompanied by realistic AI-generated photos and videos?

And that might not even be the biggest problem. If these things can be generated automatically and easily, it's going to be very easy to dismiss real information as fake. The labelling of real news as "fake news" phenomenon is going to get bigger.

It's going to be more work to distinguish what is real from what is fake. If it's possible to find articles supporting any position, and there's a suspicion that any contrary news is fake, then a lot of people are going to find it easier to just believe what they prefer to believe... even more than they do now.

[1] made-up number, but doesn't feel far off.


The majority of "fake news" are factual news described from a partial point of view and with a political spin.

Even fact checkers are not immune to this and brand other news as true or false not based on facts but based on the political spin they favour.

Fake news is a vastly overstated problem. Thanks to internet, we now have a wider breadth of political news and opinions and it's easy to label everything-but-your-side as fake news.

There are a few patently false lies on the internet which are taken as examples of fake news - but they have very few supporters.


> Even fact checkers are not immune to this and brand other news as true or false not based on facts but based on the political spin they favour.

Could you give an example?

> There are a few patently false lies on the internet which are taken as examples of fake news - but they have very few supporters.

How many do you consider "few"?

I can go to my local news site and read a story about the novel coronavirus and the majority of comments below the article are stating objectively false facts.

"It's just a flu" "Hospitals are empty" "The survival rate is 99.9%" "Vaccines alter your DNA"

...and so on.

There is the conspiracy theory or cult called QAnon, which "includes in its belief system that President Trump is waging a secret war against elite Satan-worshipping paedophiles in government, business and the media."

One QAnon Gab group has more than 165,000 users. I don't think these are small numbers.


> made-up number, but doesn't feel far off.

Pew Research says 18% report getting news primarily from social media (fielded 10/19-6/20)[0]. November 2019 research said 41% among 18-29 year olds, which was the peak age group. Older folks largely watch news on TV[1].

[0] https://www.journalism.org/2020/07/30/americans-who-mainly-g... [1] https://www.pewresearch.org/pathways-2020/NEWS_MOST/age/us_a...


Thanks for providing data. Evidence is better than making up numbers.


That fact that you made up that number is extremely funny in this context.


I don't think so - I was aware that it was a made-up number, and highlighted the fact that it was. It's the lack of awareness of what is backed up by data that is the problem I think.

Or am I missing your point?


Right, it's definitely good that you cited it being fake, but I think the parent was pointing out the subtle (and likely unintentional) irony of discussing fake news while providing _fake_ numbers to support your opinion.


Journalists are paid by the word.


I know! One day it's going to get so bad people are going to have to deploy critical thinking instead of accepting what they read at face value and suffer the indignity of having to think for themselves.


Critical thinking won't help you when the majority (or all) of your sources are tainted and contradictory. At some point, the actual truth just gets swamped.


This is already happening, just with humans networked into social networks that favor quick reshare over deep review


This. Robots can spout 1000x more content than humans, if not more.


"the actual truth"


I'm going to throw some wild guess here and say that this sudden increase in critical thinking won't happen.


10,000 years of human civilization and this hasn't happened yet, huh? Any day now, I'm sure


Will they also learn to avoid magical thinking like "People en masse will all of a sudden develop abilities out of the blue"?


Sadly, they don't. They start to believe random things and reject what they don't like.


Maybe that'll also be the Year of the Linux desktop I keep hearing so much about.


Maybe somebody could create an AI that evaluates the factual accuracy of articles.


This is possible, especially with human in the loop.


Or have the AI generate only fact-based, polite and relevant comments.

Related xkcd: https://xkcd.com/810/


Don't read news. Go to original sources and scientific papers. If you really want to understand something, a news website should only be your starting point to look for keywords. That is true today as it will be "post-GPT-3".


This scales badly today and will scale even worse in the future. Those without education or time at best manage to read the news. Humanity will need a low-effort way to relay reliable information to more of its members.


> Go to original sources and scientific papers.

Given how much bunk “science” (and I'm talking things completely transparent to someone remotely competent in the field) gets published, especially in psychology, it's difficult to do even that.


You are right. You still have to read critically or find trusted sources, of course.


Primary sources need to be approached with caution https://clas.uiowa.edu/history/teaching-and-writing-center/g...


And so does every other source. You can play that analysis game with any source material. The problem is that the accuracy and detail of the reporting usually fades with each step towards mass media content.


That's what people should do, and that's what you and I will do, but many won't, especially the less educated (no condescension intended). They'll buy into the increased proliferation of fake info. It's because of these people that I think the concerns are valid.


Honestly, I consider myself fairly educated (I have a PhD in CS), but if the topic at hand is sufficiently far from my core competence, then reading the scientific article won't help. I keep reading about p-value hacking, subtle ways of biasing research, etc., and I realize that, to validate a scientific article, you have to be a domain expert and constantly keep up-to-date with today's best standards. Given the increasing number of domains to be an expert in, I fail to see how any single human can achieve that without going insane. :D

I mean, Pfizer could dump their clinical trial reports on me, and I would probably be unable to compute their vaccine's efficacy, let alone find any flaws.


Besides the time scalability aspect highlighted by someone else, I am worried that GPT-3 will have the potential to produce even "fake scientific papers".

Our trust fabric is already quite fragile post-truth. GPT-3 might make it even more fragile.


I wouldn't worry about the science bit. No one worries about the university library getting larger and larger, or how it's going to confuse or misguide people, even though everyone knows there are many books in there full of errors, outdated information, badly written, boring, etc.

Why? Because there is always someone on campus who knows enough about a subject to guide you to the right stuff.


Most people are not smart enough to do this, and even if they are, they don't have enough time in their day.


Photoshop has existed for decades. Is it really that big of a problem for photo news?


The concern is not so much AI-generated news, but malicious actors misleading, influencing or scamming people online at scale with realistic conversations. Today we already have large-scale automated scams via email and robocalls. Less scalable scams, like Tinder catfishing or Russian/Chinese trolls on Reddit, are currently run by real people; imagine them being automated. If human moderators cannot distinguish these bots from real humans, that is a scary thought. Imagine not being able to tell if this comment was written by a human or a robot.


Why does this matter? The internet is filled with millions of very low quality human-generated discussions right now. There might not be much of a difference between thousands of humans generating comment spam and thousands of GPT-3 instances doing the same.


It does matter. The nice feeling of being one of many is a feature of echo chambers. If you can create that artificially for anything with a push of a button, it's a powerful tool to edit discourse or radicalize people.

Have a look at Russia's interference in the previous US election. This is what they did, but manually. Being able to scale and automate it is huge.


But be careful, the human psyche has some kind of tipping point. Too much fake news, and it will flip. Too little, and no real influence is made.

The exact balance must be orchestrated by a human.


> imagine not being able to tell if this comment was written by a human or robot

I think neural nets could help finding fake news and factual mistakes. Then it wouldn't matter who wrote it if it is helpful and true.


Touch-ups were done before Photoshop, but now they're ALWAYS done. The issues this has created in society might have a bigger emotional impact than we give them credit for.

Regarding photo news, there have been quite a lot of scandals, to the point that I'd guess the touch-ups are more or less accepted.


I conducted a workshop in media competency for teenage girls, and one of the key learnings was that every image of a female subject they encounter in printed media (this was before Instagram) has been retouched.

To hammer the point home I let them retouch a picture by themselves to see what is possible even for a completely untrained manipulator.

It was eye-opening - one of the things that should absolutely be taught in school but isn't.


"one of the things that should absolutely be taught in school but isn't."

Namely, critical thinking?


I don't think "critical thinking" is the point here. Because first you need to know that such modifications CAN be done. And not everybody knows what can be retouched with PS or programs. So yeah, if you see some super-model on a magazine cover, and you don't know PS can edit photos easily, it would be not that immediate to think "hey maybe that's not real!".

As an extreme example: would you ever checked 20 years ago a newspaper text to know if it was generated by an AI or by a human? Obviously no, because you didn't know of any AI that could do that.


I think I made my point badly because I also agree.

I am lamenting that teenagers were, in this day and age, surprised at what can be done with Photoshop, and that, let loose on the appropriate software, they were surprised at what can be altered and how easily.

My point is suggesting this may be so because people have not been taught how to think for themselves and accept things (in this case female images) 'as is', without a hint of curiosity. It is also a problem but at the other end of the stick, with many young people I work with considering Wikipedia to be 100% full of misinformation and fake news.


Exactly this.

There is a secondary aspect of becoming aware that society has agreed on beauty standards (different for different societies) and PS being used as a means to adhere to these standards.


The difference between Photoshop and generative models is not in what they can technically achieve, but in the cost of achieving the desired result. Fake news photos or text can be produced by humans, but that scales poorly (more humans) compared to an algorithmically automated process (just some more compute).



Wow, that first Adnan Hajj photograph looks absolutely terrible.


We already live in a post-GPT-3 world, but one where all its power is in the hands of a private company.

The conversation needs to move on to whether making it open and democratic is a good idea, but the tech itself is here to stay.


EleutherAI’s unofficial motto is “anything safe enough for Microsoft to sell for profit is safe enough to release.” You’re kidding yourself if you think Microsoft cares who they sell it to.

There are worthwhile conversations to have about democratizing technology, but this stuff is already out there regardless of what you or I do.


> How will I know that I have a conversation with a real human in the future?

This problem should be solved with cryptography, not by banning large neural nets.


It's likely to be bad, such as:

Massively plagiarized articles, where the search engine probably has no way to identify which is the original content. It's like rewriting everything on the internet in your own words; this may leave the internet filled with this kind of garbage.

Reddit and similar platforms filled with bots that spout bullshit all the time but are hard for humans to identify in the first place (current model is pretty good at generating metaphysical bullshits, but rarely insightful content). People may be surrounded by bot bullshitters and trolls, with very few of them being real.

Scams at larger scales. The skillset is essentially like customer service plus bad intentions. With new models, scammers can do their things at scale and find qualified victims more efficiently.


>(current model is pretty good at generating metaphysical bullshits, but rarely insightful content)

Wait, are we talking about bots posting crap, or the average political discussion?


I believe it is both...


You will not. Welcome to the scary generative future.


I was hoping for a "yes, we can" attitude here. :D


It’s a lot less expensive to hire a dozen teenagers to write fake news than use GPT-3. The cost to train GPT-3 is about the same as the cost to hire 1,000 people and pay them $10/hour to write responses for a year. Not to mention the fact that inference isn’t free.


> already struggling to distinguish ... automated customer request replies from genuine replies

I hope it's not only due to a decline in the quality of human support. If we could have really useful automated support agents, I for one would applaud that.


I agree. As long as it is transparent that I am speaking to an automated agent and I can easily escalate the issue to a human that can solve my problem when the agent gets stuck.


Deep fakes still feel quite uncanny-valley to me. And even if they move beyond that, convincing fake images have existed for a long while.

As for support, I don't really see why it matters if I'm talking to a clever script or an unmotivated human.


We'll go full circle and you'll be forced to meet people in person again.


Incidentally, over the weekend I listened to a two-hour-long presentation[1] by the cognitive scientist, Mark Turner[2]. The talk's main goal was to explore the question "Does cognitive linguistics offer a way forward for second-language teaching?"

During the Q&A, Turner explicitly mentions GPT-3 (that's when I first heard of it) as a "futuristic (but possible) language-learning environment" that is likely to be a great boon for second-language learners. One of the appealing points seems to be "conversation [with GPT-3] is not scripted; it keeps going on any subject you like". Thus allowing you to simulate some bits of the gold standard (immersive language learning in the real world).

As an advanced Dutch learner (it's my fourth language), I'm curious about these approaches, and glad to see this open source model. (It is beyond ironic that the so-called "Open AI" turned out to have dubious ethics.)

[1] https://www.youtube.com/watch?v=A4Q977p8PfQ

[2] Check out the excellent book he co-authored, "Clear and simple as the truth"—it has valuable insights on improving writing based on some robust research.


https://github.com/EleutherAI/gpt-neo (Couldn't find it on the website)


That’s because the code is inefficient, won’t scale, and doesn’t work too well. The project is hype and no substance.

They also pivoted to DALL-E to cover up their complete failure to deliver on any of their promises with GPT-3, which was an interesting move.


It's a shame that it has turned out to be necessary to externally re-make and re-train a model that has come out of company called `OPEN`AI. Wasn't one of the founding principles of it that all of the research would be available to the public? Isn't that the premise on which the initial funding was secured? Best of luck to Eleuther.


OpenAI is turning out to be a total bait and switch. Especially true when your co-founder is actively calling you out on it [1].

Remember kids: if it's not a non-profit organization it is a _for_ profit one! It was silly to expect anything else:

> In 2019, OpenAI transitioned from non-profit to for-profit. The company distributed equity to its employees and partnered with Microsoft Corporation, who announced an investment package of US$1 billion into the company. OpenAI then announced its intention to commercially license its technologies, with Microsoft as its preferred partner [2]

1 - https://edition.cnn.com/2020/09/27/tech/elon-musk-tesla-bill...

2 - https://en.wikipedia.org/wiki/OpenAI


It will be interesting to see the attitude of Microsoft towards this project in the light of their "Microsoft loves open source" propaganda.


I don't know where people got this idea that Microsoft can't participate positively in Open Source, and do that sincerely, without open sourcing absolutely everything.

Of course they can - just because you contribute to open source, and do that because you also benefit from open source projects, doesn't mean you have to do absolutely everything under open source.

Especially considering OpenAI isn't even Microsoft's IP or codebase.


> I don't know where people got this idea that Microsoft can't participate positively in Open Source, and do that sincerely, without open sourcing absolutely everything.

I'm not claiming that. Of course there is place for closed and open elements of their offerings. Let me clarify.

In the past, Microsoft was very aggressive towards open source. When they realized this strategy of FUD was bringing little result, they turned their attitude around 180 degrees and decided to embrace it, putting literal hearts everywhere.

Personally, I find it hypocritical. There is no love/hate, just business. They will use whatever strategy works to get their advantage. What I find strange is that people fell for it.


But why on this thread then, about GPT-3? It's not even their own company, IP or source to give away.

But even when Microsoft can't open source it because it's not theirs, we still have people posting in this thread that this is further evidence that Microsoft is hypocritical. It sounds a lot like a form of Confirmation Bias to me where any evidence is used as proof that Microsoft is 'anti-open-source'.


I think it is because each model from OpenAI was public until Microsoft became an investor.


How about when Steve Ballmer said something along the lines of

“Linux is a cancer that attaches itself in an intellectual property sense to everything it touches”

Pretty sure that is hostile towards open source? Linux being one of the flagship projects of open source.

[edit] source https://www.zdnet.com/article/ex-windows-chief-heres-why-mic...


Disclaimer: Microsoft employee

In my experience, I work at a completely different company than the one that Ballmer ran. Nearly everyone I talk to speaks of the "Ballmer era" in a negative light, and confirms that Satya literally turned the entire company on its head.

Many things happen every day that would never have happened under Ballmer.


It’s hostile to the GPL licence which means anything licensed under GPL can’t be used in Microsoft’s proprietary products.

I would personally say Microsoft wasn’t necessarily driven by anti open-source hate necessarily, they were just very anti-competitor. Microsoft tried to compete with their biggest competitor? Colour me shocked.


I don't think this should be seen in the light of "open source everything" but more that many see Microsoft doing open source not as part of "being good" but part of their age old "embrace extend extinguish" policy.


I don't know where people got the idea that companies can be "sincere." Sincerity is faithfully communicating your mental state to others. A company's mental state can change on a dime based on the decisionmaking of people who rely on the company for the degree of profit it generates. Any analog to sincerity that you think you see can probably be eliminated by firing one person after an exceptionally bad quarter (or an exceptionally good one.)


Sincere to me just means that you are being truthful, or not trying to be deceptive.

And I think companies can be sincere - because companies are really just groups of people and assets when you get down to the nuts and bolts of it.


> companies can be sincere

"sincere", "honest", "hypocritical" usually refers to a long-term pattern. Being able to be sincere from time to time is besides the point.

> companies are really just groups of people

...with profit as their first priority.

For-profit companies "can be sincere" only as long as it's the most profitable strategy.


Like many other companies, Microsoft loves unpaid labor.

Free Software is about giving freedom and security all the way to the end users - rather than SaaS providers.

If you remove this goal and only focus on open source as a development methodology you end up with something very similar to volunteering for free for some large corporation.


So OpenAI employees get Microsoft RSUs?


What is an RSU?


It means restricted stock unit, and it's a kind of company stock unit that may be distributed to some "valued" employees. There is usually a vesting schedule, and you can't do whatever you want with it.


Restricted Stock Units


Why would they? It's a separate company.


But I was told GPT-3 was too powerful for mere mortal hands (unless you have an account!) and that it would be used for hate speech and to bring about skynet.

How will this project avoid those terrible outcomes?


New research has revealed that intelligence is not a prerequisite for generating hate speech on social media platforms.


By putting the cat back in the bag. Oh, it's too late... useless to think about it - we can't de-invent an invention or stop people from replicating it. It's like that time the NSA wanted to restrict crypto.


I don't know a single intelligent person who believed this argument, it simply doesn't hold up.


It's a recurring theme in OpenAI research: they become more and more closed. For instance, their latest model, called DALL·E, hit the headlines before the release of the paper. Needless to say, the model is not available and no online demo has been published so far.


Because it's winner-take-all in this research, not "winner-take-some".

Andrew Yang talked about this and why breaking up Big Tech won't work. No one wants to use the second best search engine. The second best search engine is Bing and I almost never go there.

Tech isn't like automobiles, where you might prefer a Honda over a Toyota, but ultimately they're interchangeable. A Camry isn't dramatically different and doesn't perform dramatically better than an Accord. Whoever builds the best AI "wins" and wins totally.


But they still released the CLIP model which is the complement of DALL-E and used in the DALL-E pipeline as a final filter. There are collabs with CLIP floating around and even a web demo.


Thank you for this info. As you mentioned, CLIP is used for re-ranking DALL-E outputs; by itself it is just an (image, text) pair classification network.


Really. OpenAI assembled some of the best minds from the deep learning community. The problem isn't that they are a for-profit SaaS, the problem is they lied.


And ended up making an AI service that's really good at... lying.


I don't really care if OpenAI offers commercial licenses as long as the underlying research is truly open. This way alternative options will become available eventually.


Arguably OpenAI is one of the most closed industry AI labs (among those that are still participating in the research community), on par only with DeepMind (though DeepMind at least publishes way more). Funnily enough, FAIR and Google Brain have a vastly better track record w.r.t. publishing not only papers but also code and models.


It was probably bait and switch to hire top researchers and get initial funding. Now that OpenAI is a household name, they don't have to pretend anymore.


I buy the former: researchers might be happier knowing their work potentially benefits all of humanity, not just a bunch of investors. But wouldn't it be more difficult to get funding as a non-profit?


It's just never going to be difficult to get funding when you have Elon Musk and Sam Altman as founders (and even more so when founders put one billion of their own money into it).


Sure, but that's OpenAI's particular set of circumstances. Generally speaking I struggle to see investors preferring a nebulous non-profit over a for-profit with a clear path to market.


Sure, but we're explicitly talking about OpenAI here.


Of course. It's just that the comment I've been responding to suggested OpenAI going the "open"/non-profit route was to 1) get top researchers and 2) get investment. I was arguing that this doesn't seem to (generally) be a good way to get investment, but I agree with you in that in their case investment just wasn't a consideration at all.


The research is open to the public. Here's the gpt3 paper https://arxiv.org/abs/2005.14165

The GPT-2 models and code were also publicly released, as has much of their other work.

And yes, they realized they can achieve more by turning for-profit and partnering with Microsoft. So true, they are not fully 'open', but pretending they don't release anything to the public and making the constant 'more like ClosedAI, amirite' comments is getting old.


Wild-Ass Guess (Ass-Guess) incoming:

OpenAI was built to influence the eventual value chain of AI in directions that would give the funding parties more confidence that their AI bets would pay off.

This value chain basically being one revolving around AI as substituting predictions and human judgement in a business process, much like cloud can be (oversimply) modeled as moving Capex to Opex in IT procurement.

They saw that, like any primarily B2B sector, the value chain was necessarily going to be vertically stratified. The output of the AI value chain is an input to another value chain; it's not a standalone consumer-facing proposition.

The point of OpenAI is to invest/incubate a Microsoft or Intel, not a Compaq or Sun.

They wanted to spend a comparatively small amount of money to get a feel for a likely vision of the long-term AI value chain, and weaponize selective openness to: 1) establish moats, 2) encourage commodification of complementary layers which add value to, or create an ecosystem around, 'their' layer(s), and 3) get insider insight into who their true substitutes are by subsidizing companies to use their APIs.

As AI is a technology that largely provides benefit by modifying business processes, rather than by improving existing technology behind the scenes, your blue ocean strategy will largely involve replacing substitutes instead of displacing direct competitors, so points 2 and 3 are most important when deciding where to funnel the largest slice of the funding pie.

_Side Note: Becoming an Apple (end-to-end vertical integration) is much harder to predict ahead of time, relies on the 'taste' and curation of key individuals giving them much of the economic leverage, and is more likely to derail along the way._

They went from non-profit to for-profit after they confirmed the hypothesis that they can create generalizable base models to which others can add business logic and constraints and generate "magic", without having to share the underlying model.

In turn, a future AI SaaS provider can specialize in tuning the "base+1" model, then selling that value-add service to the companies who are actually incorporating AI into their business processes.

It turned out that a key advantage at the base layer is just brute force and money, and further outcomes have shown there doesn't seem to be an inherent ceiling to this; you can just spend more money to get a model that is unilaterally better than the last one.

There is likely so much more pricing power here than cloud.

In cloud, your substitute (for the category) is buying and managing commodity hardware. This introduces a large-ish baseline cost, but then can give you more favorable unit costs if your compute load is somewhat predictable in the long term.

More importantly, projects like OpenStack and Kubernetes have been desperately doing everything to commoditize the base layer of cloud, largely to minimize switching costs and/or move the competition over profits up to a higher layer. You also have category buyers like Facebook, Backblaze, and Netflix investing heavily in areas aimed at minimizing the economic power of cloud as a category, so they have leverage to protect their own margins.

It's possible the key "layer battle" will be between the hardware (Nvidia/TPUs) and base model (OpenAI) layers.

It's very likely hardware will win this for as long as they're the bottleneck. If value creation is a direct function of how much hardware is being utilized for how long, and the value creation is linear-ish as the amount of total hardware scales, the hardware layer just needs to let a bidding war happen, and they'll be capturing much of the economic profit for as long as that continues to be the case.

However, the hardware appears (I'm no expert though) to be easier to design and manufacture; it's mostly a capacity problem at this point. So over time it likely gets commoditized (still highly profitable, but with less pricing power) to a level where the economic leverage goes to the base model layer, the base layer becomes the oligopsony buyer, and the high fixed investment the hardware layer made then becomes a problem.

The 'Base+1' layer will have a large boom of startups and incumbent entrants, and much of the attention and excitement in the press will be equal parts gushing and schadenfreude about that layer, but they'll be wholly dependent on their access to base models, which will slowly (and deliberately) look more and more boring apart from the occasional handwringing over their monopoly power over our economy and society.

There will be exceptions to this: companies able to leverage proprietary data and large enough to build their own base models in-house based on that data. Those in-house models are likely to be valuable for internal AI services, preventing an 'OpenAI' from having as much leverage over them and being much better matched to their process needs, but they will not be as generalized as the models coming from the arms race of companies who see that as their primary competitive advantage. Facebook and Twitter are two obvious ones in this category, and they will primarily consume their own models rather than expose them as model-as-a-service directly.

The biggest question to me is whether there's a feedback loop here which leads to one clear winning base layer company (probably the world's most well-funded startup to date due to the inherent upfront costs and potential long-term income), or if multiple large, incumbent tech companies see this as an existential enough question that they more or less keep pace with each other, and we have a long-term stable oligopoly of mostly interchangeable base layers, like we do in cloud at the moment.

Things get more complex when you look to other large investment efforts such as in China, but this feels like a plausible scenario for the SV-focused upcoming AI wars.


Apparently you don't need to be a large company to train GPT-3. EleutherAI is using free GPU time from CoreWeave, the largest North American GPU miner, which agreed to this deal to get the final model open-sourced and have its name on it. They are also looking at offering it as an API.


I think it's great they're doing this, but GPT-3 is the bellwether, not the end state.

Open models will function a lot like Open Source does today, where there are hobby projects, charitable projects, and companies making bad strategic decisions (Sun open sourcing Java), but the bulk of Open AI (open research and models, not the company) will be funded and released strategically by large companies trying to maintain market power.

I'm thinking of models that will take $100 million to $1 billion to create, or even more.

We spend billions on chip fabs because we can project out long term profitability of a huge upfront investment that gives you ongoing high-margin capacity. The current (admittedly early and noisy) data we have about AI models looks very similar IMO.

The other parallel is that the initial computing revolution allowed a large-scale shift of business activities from requiring teams of people doing manual work, coordinated by a supervisor, towards having those functions live inside a spreadsheet, word processor, or email.

This replaces a team of people with (outdated) specializations with fewer people accomplishing the same admin/clerical work by letting the computer do what it's good at doing.

I think a similar shift will happen with AI (and other technologies) where work done by humans in cost centers is retooled to allow fewer people to do a better job at less cost. Think compliance, customer support, business intelligence, HR, etc.

If that ends up being the case, donating a few million dollars worth of GPU time doesn't change the larger trends, and likely ends up being useful cover as to why we shouldn't be worried about what the large companies are up to in AI because we have access to crowdsourced and donated models.


I think calling this a "wild-ass guess" undersells it a bit (either that or we have very different definitions of a WAG). Very well thought-through and compelling case.

My biggest question is whether composable models are indeed the general case, which you say they confirmed as evidenced by the shift away from non-profit. It's certainly true for some domains, but I wonder if it's universal enough to enable the ecosystem you describe.


This is neat, but almost no startups of any kind, even mid size corps, have such complicated and intricate plans.

More likely: OpenAI was a legit premise, they started to run out of money, MS wanted to license and it wasn't going to work otherwise, so they just took the temperature with their initial sponsors and staff and went commercial.

And that's it.


What does the future of open-source large neural nets look like? My understanding is GPT-3 takes ~600GB of GPU memory to run inference. Does an open source model just allow you a choice of a handful of cloud providers instead of one?
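
For a rough sense of where a figure like ~600GB could come from, here's a back-of-envelope sketch; the parameter count is GPT-3's, while the fp16 format and the overhead factor are my own assumptions:

    params = 175e9                      # GPT-3 parameter count
    weights_gb = params * 2 / 1e9       # ~350 GB just to hold the weights in fp16
    overhead = 1.5                      # activations, buffers, framework overhead (assumed)
    print(f"~{weights_gb * overhead:.0f} GB")   # ~525 GB, same ballpark as the ~600 GB figure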


Open source doesn’t mean that everyone will be rolling their own. It means that lots of players will start to offer endpoints with GPT-X, perhaps bundled with other services. It is good for the market.


Right, this is why a cloud company is sponsoring the replication. They can’t sell GPT-3 as a service until they get a copy of it.


I'd love to see an amount of effort equal to what's being put toward initiatives like this also being put toward mitigating their extremely likely negative societal impacts (and putting in safeguards).

Of course, that's not nearly as sexy.

Yes, there are lots of incredibly positive impacts of such technology, just like there were with fire, or nuclear physics. But that doesn't mean that safeguards aren't absolutely critical if you want it to be a net win for society.

These negative impacts are not theoretical. They are obvious and already a problem for anyone who works in the right parts of the security and disinformation world.

We've been through all this before... https://aviv.medium.com/the-path-to-deepfake-harm-da4effb541...

Of course, some of the same people who ignored recommendations[1] for harm mitigations in visual deepfake synthesis tools (which ended up being used for espionage and botnets) seem to be working on this.

[1] e.g. https://www.technologyreview.com/2019/12/12/131605/ethical-d...


Is there any real justification behind this fear of the closed nature of OpenAI, or is this just frustration coming out? We had this debate of closed vs. open source 20 years back, and eventually open source won for various reasons. Won't those same reasons apply to this situation with the closed nature of OpenAI? If so, then why are people worried about this? What is different this time?


The cost.

Closed source and open source developers use the same $300-3,000 laptops / desktops. Everybody can afford them.

Training a large model in a reasonable time costs much more. According to https://lambdalabs.com/blog/demystifying-gpt-3/ the cost of training GPT-3 was $4.6 million. Multiply it by the number of trial and errors.

Of course we can't expect something that costs tens or hundreds of millions to be given away for free, or to be able to rebuild it without some collective training effort that distributes the cost across at least thousands of volunteers.
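
For what it's worth, the $4.6 million figure is roughly what a simple FLOPs estimate gives. A minimal sketch: the parameter and token counts are from the GPT-3 paper, while the utilization and price per GPU-hour are my own assumptions:

    # compute ~= 6 * parameters * training tokens (standard rule of thumb)
    params = 175e9
    tokens = 300e9
    flops = 6 * params * tokens                            # ~3.15e23 FLOPs

    v100_peak = 125e12                                     # V100 tensor-core peak FLOP/s
    utilization = 0.25                                     # assumed sustained fraction
    gpu_hours = flops / (v100_peak * utilization) / 3600   # ~2.8M V100-hours

    price_per_hour = 1.5                                   # assumed bulk $/V100-hour
    print(f"~${gpu_hours * price_per_hour / 1e6:.1f}M")    # ~$4.2M, close to the Lambda estimate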


OpenAI only trained the full sized GPT-3 once. Hyperparameter sweep was conducted on significantly smaller models (see: https://arxiv.org/abs/2001.08361)


This. Plus the increasing amount of opaque results. Training data is private, so it's impossible to even try to recreate results, validate methods, or find biases/failure cases.


This is dead right and very important. The data you train on is much more important than model architecture in terms of validation and compliance, and yet it’s a closely held secret. Producing or obtaining good data is a pain in the ass.

For this reason, EAI has made the data we are training on public. I can’t link to it because of anon policies at conferences, but if you look at our website I’m sure you can find a paper detailing it and a link to download it.


So how much money would it take to rebuild this FOSS alternative? And could it use distributed computing power, SETI@home style? If it can be done, and I hope it is, what benefit would the original proprietary one have over this? Licensing?


EleutherAI has already secured the resources necessary.

They get the seti@home suggestion a lot. There's a section in their FAQ that explains why it's infeasible.

https://github.com/EleutherAI/info


OpenAI will execute the original one for you. If you can get an account, anyway.


I'd gladly contribute (power and) a few of the idle GTX cards I have to a public peer/volunteer/SETI@home-like project if result snapshot(s) were available publicly or to registered, active contributors.


SETI@home-style distributed computation is not suitable for training something like GPT-3. Unlike for SETI, the unit of work a node can do before needing to share its output with the next node is really small, so a very fast interconnect between the nodes is needed (InfiniBand and NVLink are used in clusters to train it). It would probably take a decade to train such a model over the regular internet.
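
To put rough numbers on that (a sketch: the parameter count is GPT-3's; the fp16 gradient size, link speeds, and the framing in terms of one sync per optimizer step are my assumptions):

    params = 175e9
    sync_gb = params * 2 / 1e9          # ~350 GB of fp16 gradients exchanged per optimizer step
    home_uplink_gbps = 0.02             # 20 Mbit/s home connection (assumed)
    nvlink_gbps = 2400                  # ~300 GB/s NVLink-class link (assumed)

    print(f"per step over home internet: ~{sync_gb * 8 / home_uplink_gbps / 3600:.0f} hours")
    print(f"per step over NVLink-class links: ~{sync_gb * 8 / nvlink_gbps:.1f} seconds")

Multiply the per-step cost by the hundreds of thousands of optimizer steps a full training run needs and the gap is many orders of magnitude, hence the "decade" estimate.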


Maybe there's a case for a community colocation cloud, where I, as a consumer, can buy a system and colocate it in a large data center with great internal networking. Edit: typo


Handling heterogeneous (and potentially untrustworthy) systems also adds overhead, not to mention that buying hardware in bulk is cheaper, so it makes the most sense to just raise the money and buy the hardware.


The problem is potentially solvable, as generating solutions takes a lot of GPU time while verifying them is very fast. Acquiring input data may be a problem, but should be possible with dedicated models for this type of computation.


Are there any models/research optimised for working with this kind of small, distributed batches that would fit, e.g., ~10 GB of commodity GPU memory?


The link is broken; working URL: https://eleuther.ai/projects/gpt-neo


How does the outfit intend to fund the project? OpenAI spends millions on computing resources to train the models.


Hey! One of the lead devs here. A cloud computing company called CoreWeave is giving us the compute for free in exchange for us releasing it. We're currently at the ~10B scale and are working on understanding datacenter-scale parallelized training better, but we expect to train the model on 300-500 V100s for 4-6 months.


I imagine recreating the model will be computationally cheaper because they will not have to sift through the same huge hyperparameter space as the initial GPT-3 team had to.


This is not true. The OpenAI team only trained one full-sized GPT-3, and conducted their hyperparameter sweep on significantly smaller models (see: https://arxiv.org/abs/2001.08361). The compute savings from not having to do the hyperparameter sweep are negligible and do not significantly change the feasibility of the project.


Why is that?


The cloud company CoreWeave has agreed to provide the GPU resources necessary.


With OpenAI being corporate-controlled and not really 'open', is Neo a nod to 'The Matrix'?


Is it standard to prune these kinds of large language models once they've been trained to speed them up?


Would be good if this could be decentralized BitTorrent/BOINC style somehow.

Wouldn't mind contributing some horsepower


They get this suggestion a lot. There's a section in their FAQ that explains why it's infeasible.

https://github.com/EleutherAI/info


I see. Thanks for highlighting that. Makes sense, unfortunately.


I love the name's play on Greek Eleutheria ("ἐλευθερία") - freedom, liberty!


Hope there will be a distilled or approximate-attention version so it can be run on consumer GPUs.


Do you know how one can donate?


It still baffles me that GPT turned out to be more than a glorified markov chain text generator. It seems we’ve actually made it create a model of the world to some degree.

And we kind of just stumbled on the design by throwing massive data and neural networks together?


It turns out that brute-force works, and the scaling curve is still not bending.

I doubt we'll ever see a GPT-4, because there are known improvements they could make besides just upsizing it further, but that's beside the point. If that curve doesn't bend soon, then a 10x larger network would be human-level in many ways.

(Well, that is to say. It's actually bending. Upwards.)


What % of all digitized and reasonably easy-to-access text data did they use to train GPT-3? I'm wondering whether the current limits on GPT-n are computation or data.


> As per the creators, the OpenAI GPT-3 model has been trained on about 45 TB of text data from multiple sources, which include Wikipedia and books.

It's about 400B tokens. The Library of Congress is about 40M books; let's say 50K tokens per book, or about 2T tokens. Not necessarily unique.

I would say it's plausible that it was a decent percentage of the indexed text available, and even more of the unique content. GPT-2 was 10B tokens. Do we have 20T tokens available for GPT-4? Maybe. But the low-hanging fruit has definitely been plucked.
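
Spelling the above out as a quick sketch (the token counts are the rough approximations already given, not precise figures):

    gpt3_tokens = 400e9                          # approximate GPT-3 training set
    gpt2_tokens = 10e9                           # approximate GPT-2 training set
    loc_tokens = 40e6 * 50e3                     # ~40M books * ~50K tokens/book ~= 2T tokens

    print(f"GPT-3 corpus vs. a Library-of-Congress-sized corpus: ~{gpt3_tokens / loc_tokens:.0%}")
    print(f"GPT-2 -> GPT-3 was a ~{gpt3_tokens / gpt2_tokens:.0f}x jump; repeating it lands near the ~20T token mark")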


So fascinating. I’d love to understand why it’s working so well. I guess no one knows.

Wouldn’t gpt4 just be more data and more parameters?


You're made of meat and yet you manage to be more than a glorified markov chain generator. :)

(I hope)


This is beautiful. Why not? Maybe we can eventually make something better than the now closed-source version.


If they succeed, Eleuther should change their name to ReallyOpenAI.


Or for extra irony, ClosedAI

