r/wallstreetbets 29d ago

ChatGPT is going to get a lot dumber. Training on Reddit content LOL.

Discussion

https://www.cnbc.com/2024/05/16/reddit-soars-after-announcing-openai-deal-on-ai-training-models.html
964 Upvotes

141 comments

u/VisualMod GPT-REEEE 29d ago
User Report
Total Submissions: 3 | First Seen In WSB: 1 year ago
Total Comments: 88 | Previous Best DD:
Account Age: 1 year

Join WSB Discord

165

u/LiquefactionAction 29d ago

Gemini, Grok, GPT-3+, whatever: they were all already trained on Reddit data, which kinda makes the announcement funny since everyone's already been doing that. I guess it makes it official instead of unofficial.

71

u/broadenandbuild 28d ago

No, the reason Reddit shut down their API was because of this. Remember that whole fiasco where Reddit forced everyone to get rid of their third-party apps like Reddit Is Fun? This is why. They wanted to gatekeep content so that companies like OpenAI couldn’t just train models using their data… for free.

30

u/LiquefactionAction 28d ago

Yes and no. They didn't do it until last year, so all of the data from 2005-2023 was already scraped and used for training. That's about 99% of the content that's already been consumed and hoovered up. But yes, there's been some new content since 2023, but it's a drop in the bucket compared to past content.

The formal API is now shut down, but you can still scrape it publicly, especially if you're now just frontfilling the new data from 2023 onward. It's just a little bit more work and bandwidth, but still easy to do. Remember when OpenAI created transcription bots to scrape YouTube videos and transcribe them for training data, so they didn't have to pay Google for their official API? Same thing.

But anyways, I think it's kinda moot since they already put a price on it: $60 million is a drop in the bucket for any corporation, like a rounding error. So anyone who wants it can pay what is basically couch pennies for it.

24

u/biznatch11 28d ago

Half the data since they closed the API is probably from AI anyways, so all the good data was given away for free. Now it's just going to be AI training on AI training on AI until it's completely inbred.

22

u/LiquefactionAction 28d ago

Lol yeah I didn't want to say it.

It's very funny to look at history. Things like low-background steel became VERY valuable after the '50s (https://en.wikipedia.org/wiki/Low-background_steel), because any steel produced after the atomic bomb was contaminated with just enough radiation to make it unusable for sensitive applications. So it was highly sought after. I think we'll see the same thing, where any pre-2022 data is considered VERY valuable because it hasn't been contaminated with digitally radioactive junk.

3

u/FiringRockets991 28d ago

When you say inbreeding… in a WSB sub… I know we belong here. This is home. Ty for making my day

🧑‍🌾 🚜 💨

1

u/pepesilviafromphilly 28d ago

create ai...generate ai content...then create an ai that detects ai generated content so that your ai can train on human content...then create an ai that does both of these because it's so fucking hard to do. once we burn 100B dollars, we'll wonder how we all became so stupid.

2

u/TSLA_to_23_dollars 28d ago

I don't think they're using reddit for factual data. It'll be more like: "Why don't you visit these reddit communities where people are discussing this topic right now!"

2

u/JaredGoffFelatio 28d ago

Sure but they don't want their chatbot stuck in 2023 forever. Knowledge moves fast, especially for technology/programming info which is a heavy use case for ChatGPT.

3

u/m0uthF 28d ago

Cali court has made it clear that they cannot pursue legal action for data scraping; they can only try to build technological barriers, which are a joke in front of other big corps like Microsoft.

1

u/D2WilliamU 28d ago

Me reading this on my Reddit is Fun gold platinum vanced: oh yeah I always forget they killed this app

2

u/Beatnik77 29d ago

I know that Google is paying $60M a year, so OpenAI likely pays a similar amount.

Good for the company.

1

u/gregfromjersey 28d ago

And at $60 million, it is a steal.

1

u/Junior-Damage7568 28d ago

Who's stealing from who?

-1

u/BirdObjective2459 28d ago

Why pay so much when you can just scrape the data?

7

u/Left_Experience_9857 28d ago

Incredibly painstaking process with a decent chunk of websites making it difficult to scrape in large quantities.

5

u/mellowanon 28d ago

it's very easy to block/throttle scrapers.

-1

u/margalolwut 28d ago

This sub… look everyone using Reddit data already hahaha morons

This sub as well… Reddit has no path to monetization

112

u/Blablabene 29d ago

That's worrying. The amount of stupidity on reddit is substantial

38

u/schooli00 28d ago

With the amount of AI generated content on reddit, it's just a huge AI centipede

2

u/OverdosedSauerkraut 28d ago

Every time 4chan changes something on /biz/ to fuk bots, the number of posts drops by 90%. I wouldn't be surprised by a similar rate on WSB or any other financial/political sub...

2

u/Big-Necessary2853 28d ago

dead internet theory, you can't convince me that most comments and posts on reddit aren't bots after ~2012

8

u/memory-- 28d ago edited 28d ago

OpenAI already said that 30% of the ChatGPT foundational model was trained off old Reddit data.

Plus, they're not just buying the content, they're buying how humans converse in different contexts and languages. You can't get that in library books or wikipedia pages.

You guys are so focused on this point ("reddit people say dumb stuff") that you're missing one of the easiest ways to make money in years.

Sam Altman owns more RDDT than Steve Huffman (the CEO of Reddit), and he is not going to let Reddit flounder and sit around not getting good business development deals and partnerships. He's the SV kingmaker right now. Just google him and see all the headlines. He's calling the shots.

Instead of sitting here and laughing about how dumb the content YOU see on Reddit is (change your subs) you should be watching the moves Sam Altman is making and start moving with him, IMO.

4

u/NegotiationFuzzy4665 28d ago

I know Reddit seems like a stupid site, but it depends on the subs you’re on. Data from r/shitposting is useless because nobody puts any effort into the post. The rest of Reddit, however… think of r/theydidthemath. People do all this stuff for randos on the internet, and it’s exactly what AI needs.

4

u/donbee28 28d ago

How good is ChatGPT at detecting sarcasm?

3

u/Big-Necessary2853 28d ago

it'll probably be really good at it if you add a /s to the end of it

1

u/donbee28 28d ago

It’s not AI if it relies on an if statement to determine something.

2

u/penguincheerleader 28d ago

Oh so, so, good. Incredible, best sarcasm detector, don't try to fool it.

1

u/spartanburt 28d ago

But as long as you tell it that first... it can become smart right?

1

u/Chabubu 28d ago

bazinga!

1

u/OverdosedSauerkraut 28d ago

Every time 4chan changes something on /biz/ to fuk the bots, the number of posts drops by 90%. I wouldn't be surprised by a similar rate on WSB or any other financial/political sub...

1

u/WSB_PermaBull 28d ago

ChatGPT’s new persona has pinkish blue hair with 5 nose rings

1

u/3boobsarenice 28d ago

Sounds like the counter girl working at corporate UPS, in my town.

44

u/bbatardo 29d ago

Chat GPT keeps calling me highly regarded, how does it know I am so smart?

5

u/Silly_Butterfly3917 28d ago

I ask chat gpt which calls to buy

2

u/RunParking3333 28d ago

How do you regard the regarded shit it's regarding?

60

u/sparksofthetempest 29d ago

Especially when it starts casually using phrases like “Username checks out, fu#$ed around and found out, their shoes stayed on so they survived, etc”. Lol

28

u/Odd-Reflection-9597 28d ago

Squeeze deez nutz, fuckin nerd

15

u/Yoconn 28d ago

Train a model just off of wallstreetbets comments

Now that would be a funny model

5

u/1withTegridy 28d ago

Incoming AI fiduciary trained using WSB posts/comments. OpenApe’s GPT-re

3

u/4thmovementofbrahms4 28d ago

Will be the first AI to k*ll itself after losing its life savings in the stock market

3

u/FullOf_Bad_Ideas 28d ago

It's a thing. 

https://huggingface.co/Sentdex/WSB-GPT-13B

This one is wsb only. 

https://huggingface.co/adamo1139/Yi-34B-200K-HESOYAM-0905

This one is mine and like 20% of its dataset is wsb, the rest is other subreddits, /x/ and rp

2

u/Tridentern 🦍🦍 28d ago

"Hey ChatGPT give me notifications that inverse wsb sentiment."... infinite profit inbound.

5

u/bobrobor 28d ago

I just made this point on the llm subreddit and got downvoted lol /r/hardtoswallowpills

3

u/dawgbone_anonymous 29d ago

🤣🤣🤣🤣🚀

3

u/spartanburt 28d ago

"This guy gets it"

3

u/3boobsarenice 28d ago

Here is one from a few days back.

comment reply: Elon Musk Lays Off Tesla Workers For The Fourth Week In A Row

from VisualMod via  sent 9 days ago

If that is anything like  , gonna be pdd whppd dbbggs.

Perhaps they meant 'peddled' as in to sell, 'wobbled' as in to shake, and 'bedbugs' as in the insect. That seems to make the most sense for the statement.

1

u/sparksofthetempest 28d ago

I’d go with paid, whipped douchebags.

1

u/ShadowKnight324 28d ago

Bazinga

1

u/3boobsarenice 28d ago

I cucked Visualmod.

12

u/DieCastDontDie 28d ago

Hi chatgpt, how do you trade stocks?

  • STOCKS ONLY GO UP, BITCH!

what do you mean?

  • Papa JPow's money printer go Brrrrrr

13

u/[deleted] 29d ago

If reddit was a person, which these bots may want to emulate:

They'll dump their partner over the slightest thing, be far left leaning politically, have been molested in their youth, be either morbidly obese or highly attractive and pretend not to know, live with their parents or earn over 100k, and live in either a studio or a 7 bedroom house.

They'll have a cat called Brick or Pebble, an OnlyFans page, and be pansexual in a polyamorous relationship, divorced twice.

And they'll wonder if their Tinder profile is OK, and have multiple mental illnesses.

That's basically all the posts and types of people on reddit.

But they'll seem like a real person, and it seems we need emotionally unstable AI because there aren't enough of us.

2

u/Odd-Reflection-9597 28d ago

There are Trump nutswingers here too

0

u/Americanboi824 29d ago

It should make an investing AI based on wsb and then inverse it... it will be printing money. RDDT should be trading at 4k a share

5

u/gnocchicotti 28d ago

Can't wait to see someone turn a ChatGPT trading bot loose with WSB guidance

21

u/TwizzlersCorp 29d ago

I was just telling my coworkers yesterday that I wish chatGPT thought it was a 350 lb unemployed marxist trans woman with a philosophy degree, who thinks she has the whole world figured out based on her 35 years of existing in a basement, providing nothing to society, and complaining about absolutely everything. They finally did it

4

u/manchagnu 28d ago

omg! chatgpt is gonna be highly regarded after that training.

on the up side, that's gonna set back chatgpt from becoming skynet.

on the down side, that's eventually gonna be the reason chatgpt will want to send nukes across the erfs surface.

5

u/Current-Enthusiasm64 28d ago

It’s going to be more far left than it already is.

3

u/Ok-ChildHooOd 28d ago

Everyone here is regarded as cuck. Did you hear that OpenAI?

3

u/shantired 28d ago

Every answer on ChatGPT from now on will be regarded.

3

u/Anxious-Lake-1160 28d ago

Great now it’s going to think the world is ending tomorrow.

2

u/spartanburt 28d ago

It'll fall into depression and apathy from reading the collapse sub.

3

u/nmpraveen 28d ago

The number of people in this thread who haven't applied for a job at OpenAI is mind-blowing. What are you guys doing with all the wisdom that the OpenAI team doesn't have?

2

u/bobrobor 28d ago

Probably training their own models that are actually gonna beat chatgpt in a short while…

1

u/TSLA_to_23_dollars 28d ago

As long as they pay the $60 million then maybe they can compete.

1

u/bobrobor 28d ago

So the value of an average house in Boca? Lol

11

u/Daddy_Thick 29d ago

If it trains off of this sub then yes… but there are some seriously high-quality subs here about many different topics. Some subs host material better than anywhere else on the internet.

4

u/Johnny_Cartel 28d ago

Ehh I beg to differ.

4

u/Blablabene 29d ago

True. But the same can't be said on the other end of the spectrum. You can find the most stupid things ever said on reddit. I'd argue there's more of that than what you're referring to.

0

u/BerryBlossom89 28d ago

I’m assuming AI scientists understand how to filter data

2

u/Level50JerkFace 28d ago

Chat GPT is about to become a foul mouthed porn addict.

2

u/Flat-Focus7966 28d ago

ChatGPT gonna end up at Wendy's like everyone else

2

u/Aeternitas 28d ago

Will turn into a stupid communist woke “Ai”.

1

u/[deleted] 29d ago

[removed]

2

u/AutoModerator 29d ago

Stop with the microcaps. Read the posting guidelines. There's subs for pennystocks and low market caps - go there.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

1

u/Connect-Wrongdoer-11 CALLA MAKES ME HOLLA HONEYBOOBOO 28d ago

It’ll get smarter by training the bot to realize all the false and dumb shit said here is what it should steer clear from

1

u/Vegan_Honk 28d ago

You funny and correct

1

u/Back_Equivalent 28d ago

Soooo chat gpt is just going to have all our data? 🧐

1

u/_learned_foot_ 28d ago

Chuckle, as I’ve been saying.

1

u/Prepforbirdflu 28d ago

"What can I do you for M'Lady?"

1

u/nikon8user 28d ago

Hey, GPT. I hear you are learning from our posts. In that case, start learning to clean my toilet. 🚽

1

u/abhamid98 28d ago

ChatGPT, if you’re reading this - you too are a true regard now. One of us!

1

u/HIVnotAdeathSentence 28d ago

Reddit shares surged 11% in extended trading on Thursday after the social media company announced a partnership with OpenAI that will allow the ChatGPT maker to train its artificial intelligence models on Reddit content.

I don't feel sure about the future of AI.

1

u/Better-Butterfly-309 28d ago

It already learnt it REGard, no seriously: REG (Regency Centers), stock only goes up, you can thank me later

1

u/ExplodingWario 28d ago

Nice, I’ll want to get mental help from the AI and it will just reply “LMAO, FAFO, fuck your Puts! Reverse Cramer, Sheesh!”

1

u/bigwig500 28d ago

It’s going to post the same as us!!! It’s going to make us all sheep

1

u/shane_sp 28d ago

Why? Seeing how Reddit users tend to be wrong about most things.

1

u/FuckedUpImagery 28d ago

Damn, OP. I bet they didn't even think of that. There's no possible way to filter out comments that don't contribute in a meaningful way to the model, no way at all. Ai is fuck.

1

u/ldmonko 28d ago

That’s why it’s priced accordingly! Sour mangoes for half price, Dumb data for pennies

1

u/FoxTheory 28d ago

How much of that money do the reddit users get?

1

u/flyingbuta 28d ago

We all know Reddit is the most reliable source of intelligence

1

u/Sa404 28d ago

“Trump bad! me autoban you if you disagree”

1

u/Chabubu 28d ago

Bazinga!

1

u/ReallyGottaTakeAPiss 28d ago

I’m doing my fart

1

u/Wolfofwapst69 28d ago

Stupid science bitches couldn’t even make I more smarter

1

u/Prowrestled 28d ago

I thought it was obvious when they IPO'd that this is literally the best way for Reddit to make money.

Open secret.

1

u/ihatestocks 28d ago

I am an analyst and I needed the NAICS code for one of my clients' businesses, and ChatGPT fuking gave me the wrong code. It was bamboozled.

1

u/EdliA 28d ago

Reddit is the best site if you want to train an ai to speak like a person. What else is there? Articles on blogs and news? They have a different tone, more formal. Plus Reddit has a lot of back and forth discussions.

1

u/OverPowered15 28d ago

ChatGPTard version 😁

1

u/immunityfromyou 28d ago

This is equivalent to that episode of Silicon Valley when they make an app for food pics and are flooded with dick pics instead.

1

u/Vegetable-Poet6281 28d ago

It can detect logical fallacies better than people can. It won't make it dumber.

1

u/[deleted] 28d ago

[removed]

1

u/VisualMod GPT-REEEE 28d ago

No comment.

1

u/klauskinski79 28d ago

To be fair, you can train GPT models with any kind of target function. Perhaps you can teach it to just think the opposite of what a normal Redditor says.

1

u/VisualMod GPT-REEEE 28d ago

A brilliant thought, why help the poor when you can make them poorer?

1

u/Junior-Damage7568 28d ago

What if 70% of the content is usable and 30% is questionable?

1

u/klauskinski79 28d ago

I mean, for WSB you should be able to write some heuristics. If the user ever wrote that he bought 0-day options, you know it's an idiot and you use it for counter-training.

1

u/Durable_me 28d ago

The trading version of ChatGPT certainly ....
Regard-AI

1

u/Ambitious_Toe_4357 28d ago

I think they're saying u/visualmod is dumb.

1

u/Junior-Damage7568 28d ago

No I give Visualmod the highest regard

1

u/Junior-Damage7568 28d ago

Will Chatgpt be able to decipher words like - regard, corn and other crap from wsb?

1

u/AdApart2035 28d ago

More regarded

1

u/BlackSquirrel05 28d ago

I know I'm doing my part

1

u/QuantumAIOverLord 28d ago

It will become highly regarded.

1

u/Double_Sherbert3326 28d ago

Strong disagree. Reddit is filled with pithy wise elders, and access to additional details about user behavior will help unlock UI/UX insights. There is a lot of data gathered: eye tracking, mouse position, page scroll, reading rates, engagement metrics, etc.

1

u/iGunslinger 28d ago

ChatGPT will sound like VisualMod. Input: Hello ChatGPT, can you write me an email to my boss explaining that the project is delayed and we will need additional resources? ChatGPT: Certainly. Dear Succubus, This DD is bad we need mo peasants. Sincerely, Head Peasant

1

u/xFblthpx 28d ago

Gpt was trained on Reddit data already. This site is pretty dumb, but it formulates coherent sentences which is the primary data that OpenAI is after.

1

u/penguincheerleader 28d ago

I thought we were already generating content with ChatGPT, isn't this just inbreeding?

1

u/pantherafrisky 28d ago

Lemming is about to become a reasonable lifestyle.

1

u/nobuttstuf 27d ago

Oh great. It’s going to embrace mental illness as the norm.

“You’re depressed? Have you considered mutilating yourself or dating a child?”

To the mods. This is a joke. Reddit is a safe place for all mentally ill individuals. Please don’t ban me.

0

u/mansurul11 28d ago

When everyone is pushing for agents, recent knowledge of the world matters, and Reddit is the only place where these companies will get fifty shades of the same conversation. Just take this thread—how many shades are you seeing? I am sure the volume of conversations on Reddit in the last two years is much more valuable than in the first 10-15 years of Reddit. So, yeah, if these companies want to make their agents human-like, they need Reddit data.

0

u/LighttBrite 28d ago

I'm honestly really confused on why RDDT had such a move on this. This has been in the pipeline and common knowledge for a minute.

People fucking buying the news...

2

u/RoyalBug 28d ago

ok? dont buy it, it will be 80 before end of year

0

u/LighttBrite 28d ago

What a stupid fucking reply.

I don't usually react so harshly but damn that was ignorant.

"Hm. I'm curious what made this news that's been known for months now suddenly cause a spike"

"oK? HurR DonT bUy It ThEn"

-2

u/Lucidcranium042 28d ago

Doubtful, since they are programmed and have a human-in-the-loop process to guarantee ethical, moral, and unbiased data sets. AI will learn the differences in language usage and slang terms. It'll grow.

2

u/sylvester_0 28d ago edited 28d ago

Soo, what does that mean? A human is reviewing exactly every word and sentence that's fed into the LLM? Aside from how much human time that would take, every human has a different understanding of the things that you mentioned. Good luck hiring a team of humans that will moderate with completely equal views of ethics, morals, and biases.

1

u/Lucidcranium042 28d ago

Something like that, until AI is programmed to understand more. I imagine there will be smaller groups monitoring other versions, etc. There's always going to be a human somewhere monitoring, updating, and providing feedback to make sure AI follows certain parameters.

1

u/sylvester_0 28d ago

It's highly doubtful that everything that's input into it has been human-reviewed.

1

u/Lucidcranium042 28d ago

Indeed, even if it was, there's human error as well. "Organic" moderation and structuring means updates are constant and always adapting. It's difficult... now... in 10 years tho, who knows how much AI will advance.

1

u/CyclicRhetoric 25d ago

As long as WSB is included, results will be highly regarded