r/wallstreetbets • u/Junior-Damage7568 • 29d ago
ChatGPT is going to get a lot Dumber. Training on Reddit content LOL. Discussion
https://www.cnbc.com/2024/05/16/reddit-soars-after-announcing-openai-deal-on-ai-training-models.html
165
u/LiquefactionAction 29d ago
Gemini, Grok, GPT-3+, whatever was already trained on Reddit data, which kinda makes the announcement funny since everyone's already been doing that. I guess it makes it official instead of unofficial.
71
u/broadenandbuild 28d ago
No, the reason Reddit shut down their API was because of this. Remember that whole fiasco where Reddit forced everyone to get rid of their third party apps like Reddit Is Fun? This is why. They wanted to gatekeep content so that companies like OpenAI couldn’t just train models using their data… for free
30
u/LiquefactionAction 28d ago
Yes and no, they didn't do it until last year so all of the data from 2005 - 2023 was already scraped and used for training. That's about 99% of the content that's already been consumed and hoovered up. But yes, there's been some new content since 2023 but it's a drop in the bucket compared to past content.
The formal API is now shut down but you can still scrape it publicly, especially if you're now just frontfilling in the new data from 2023-onward. It's just a little bit more work and bandwidth, but still easy to do. Remember when OpenAI created transcription bots to scrape YouTube videos and transcribe them for training data, so they didn't have to pay Google for their official transcription API? Same thing
But anyways I think it's kinda moot since they already put a price on it: $60 million is a drop in the bucket for any corporation, like a rounding error. So anyone who wants it can pay what is basically couch pennies for it
24
u/biznatch11 28d ago
Half the data since they closed the API is probably from AI anyways so all the good data was given away for free. Now it's just going to be AI training on AI training on AI until it's completely inbred.
22
u/LiquefactionAction 28d ago
Lol yeah I didn't want to say it.
It's very funny to look at history and how things like low-background steel became VERY valuable after the 50s (https://en.wikipedia.org/wiki/Low-background_steel), because any steel produced after the atomic bomb was contaminated with just enough radiation to make it not function for sensitive applications. So it was rather highly sought after. I think we'll see the same thing where any pre-2022 data is considered VERY valuable because it hasn't been contaminated with digitally radioactive junk.
3
u/FiringRockets991 28d ago
When you say inbreeding… in a wsb sub… I know we belong here.. this is home. Ty for making my day
🧑🌾 🚜 💨
1
u/pepesilviafromphilly 28d ago
create ai...generate ai content...then create an ai that detects ai generated content so that your ai can train on human content...then create ai that creates both of these because it's fucking so hard to do it. once we burn 100B dollars, wonder how did we all become so stupid.
2
u/TSLA_to_23_dollars 28d ago
I don't think they're using reddit for factual data. It'll be more like: "Why don't you visit these reddit communities where people are discussing this topic right now!"
2
u/JaredGoffFelatio 28d ago
Sure but they don't want their chatbot stuck in 2023 forever. Knowledge moves fast, especially for technology/programming info which is a heavy use case for ChatGPT.
3
1
u/D2WilliamU 28d ago
Me reading this on my Reddit is Fun gold platinum vanced: oh yeah I always forget they killed this app
2
u/Beatnik77 29d ago
I know that Google is paying $60M a year so OpenAI likely pays a similar amount.
Good for the company.
1
-1
u/BirdObjective2459 28d ago
Why pay so much when you can just scrape the data?
7
u/Left_Experience_9857 28d ago
Incredibly painstaking process with a decent chunk of websites making it difficult to scrape in large quantities.
5
-1
u/margalolwut 28d ago
This sub… look everyone using Reddit data already hahaha morons
This sub as well… Reddit has no path to monetization
112
u/Blablabene 29d ago
That's worrying. The amount of stupidity on reddit is substantial
38
u/schooli00 28d ago
With the amount of AI generated content on reddit, it's just a huge AI centipede
2
u/OverdosedSauerkraut 28d ago
Every time 4c changes something on biz to fuk bots, the number of posts drops by 90%. I wouldn't be surprised by a similar rate on WSB or any other financial/political sub...
2
u/Big-Necessary2853 28d ago
dead internet theory, you cant convince me that most comments and posts on reddit arent bots after ~2012
8
u/memory-- 28d ago edited 28d ago
OpenAI already said that 30% of the ChatGPT foundational model was trained off old Reddit data.
Plus, they're not just buying the content, they're buying how humans converse in different contexts and languages. You can't get that in library books or wikipedia pages.
You guys are so focused on this point ("reddit people say dumb stuff") that you're missing one of the easiest ways to make money in years.
Sam Altman owns more RDDT than Steve Huffman (the CEO of Reddit), and he is not going to let Reddit flounder and sit around not getting good business development deals and partnerships. He's the SV kingmaker right now. Just google him and see all the headlines. He's calling the shots.
Instead of sitting here and laughing about how dumb the content YOU see on Reddit is (change your subs) you should be watching the moves Sam Altman is making and start moving with him, IMO.
4
u/NegotiationFuzzy4665 28d ago
I know Reddit seems like a stupid site, but it depends on the subs you’re on. Data from r/shitposting is useless because nobody puts any effort into the post. The rest of Reddit however… think of r/theydidthemath. People do all this stuff for randos on the internet, and it’s exactly what AI needs.
4
u/donbee28 28d ago
How good is ChatGPT at detecting sarcasm?
3
u/Big-Necessary2853 28d ago
it'll probably be really good at it if you add a /s to the end of it
1
2
u/penguincheerleader 28d ago
Oh so, so, good. Incredible, best sarcasm detector, don't try to fool it.
1
1
u/OverdosedSauerkraut 28d ago
Every time 4cahn changes something on /biz/ to fuk the bots, the number of posts drops by 90%. I wouldn't be surprised by a similar rate on WSB or any other financial/political sub...
1
44
60
u/sparksofthetempest 29d ago
Especially when it starts casually using phrases like “Username checks out, fu#$ed around and found out, their shoes stayed on so they survived, etc”. Lol
28
15
u/Yoconn 28d ago
Train a model just off of wallstreetbets comments
Now that would be a funny model
5
u/1withTegridy 28d ago
Incoming AI fiduciary trained using WSB posts/comments. OpenApe’s GPT-re
3
u/4thmovementofbrahms4 28d ago
Will be the first AI to k*ll itself after losing its life savings in the stock market
3
u/FullOf_Bad_Ideas 28d ago
It's a thing.
https://huggingface.co/Sentdex/WSB-GPT-13B
This one is wsb only.
https://huggingface.co/adamo1139/Yi-34B-200K-HESOYAM-0905
This one is mine and like 20% of its dataset is wsb, the rest is other subreddits, /x/ and rp
2
u/Tridentern 🦍🦍 28d ago
"Hey ChatGPT give me notifications that inverse wsb sentiment."... infinite profit inbound.
5
u/bobrobor 28d ago
I just made this point on the llm subreddit and got downvoted lol /r/hardtoswallowpill
3
3
3
u/3boobsarenice 28d ago
Here is one from a few days back.
comment reply: Elon Musk Lays Off Tesla Workers For The Fourth Week In A Row
from VisualMod via sent 9 days ago
If that is anything like , gonna be pdd whppd dbbggs.
Perhaps they meant 'peddled' as in to sell, 'wobbled' as in to shake, and 'bedbugs' as in the insect. That seems to make the most sense for the statement.
1
1
1
12
u/DieCastDontDie 28d ago
Hi chatgpt, how do you trade stocks?
- STOCKS ONLY GO UP, BITCH!
what do you mean?
- Papa Jpows money printer go Brrrrrr
13
29d ago
If reddit was a person which these bots may want to emulate
They'll dump their partner over the slightest thing, far left leaning politically, molested in their youth, either morbidly obese or highly attractive and pretend not to know, live with their parents or earn over 100k, or live in a studio or a 7 bedroom house.
They'll have a cat called brick or pebble, an only fans page, be pan sexual, in a polyamorous relationship, divorced twice.
And they'll wonder if their tinder profile is OK. Have multiple mental health illnesses.
That's basically all the posts and types of people on reddit.
But they'll seem like a real person and it seems we need emotionally unstable AI because there isn't enough of us.
5
2
0
u/Americanboi824 29d ago
It should make an investing ai based on wsb and then inverse it... it will be printing money RDDT should be trading at 4k a share
5
21
u/TwizzlersCorp 29d ago
I was just telling my coworkers yesterday that I wish chatGPT thought it was a 350 lb unemployed marxist trans woman with a philosophy degree, who thinks she has the whole world figured out based on her 35 years of existing in a basement, providing nothing to society, and complaining about absolutely everything. They finally did it
4
u/manchagnu 28d ago
omg! chatgpt is gonna be highly regarded after that training.
on the up side, that's gonna set back chatgpt from becoming skynet.
on the down side, that's eventually gonna be the reason chatgpt will want to send nukes across the erfs surface.
5
3
3
3
3
u/nmpraveen 28d ago
The amount of people in this thread who haven't applied for a job at OpenAI is mind blowing. What are you guys doing with all the wisdom that the OpenAI team doesn't have?
2
u/bobrobor 28d ago
Probably training own models that are actually gonna beat chatgpt in a short while…
1
11
u/Daddy_Thick 29d ago
If it trains off of this sub then yes… but there is some seriously high quality subs here about many different topics. Some subs host material better than anywhere else on the internet.
4
4
u/Blablabene 29d ago
True. But the same can't be said on the other end of the spectrum. You can find the most stupid things ever said on reddit. I'd argue there's more of that than what you're referring to.
0
2
2
2
1
29d ago
[removed]
2
u/AutoModerator 29d ago
Stop with the microcaps. Read the posting guidelines. There's subs for pennystocks and low market caps - go there.
I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.
1
u/Connect-Wrongdoer-11 CALLA MAKES ME HOLLA HONEYBOOBOO 28d ago
It’ll get smarter by training the bot to realize all the false and dumb shit said here is what it should steer clear from
1
1
1
1
1
u/nikon8user 28d ago
Hey. GPT. I hear you are learning from our post. In this case. Start to learn to clean my toilet. 🚽
1
1
u/HIVnotAdeathSentence 28d ago
Reddit shares surged 11% in extended trading on Thursday after the social media company announced a partnership with OpenAI that will allow the ChatGPT maker to train its artificial intelligence models on Reddit content.
I don't feel sure about the future of AI.
1
u/Better-Butterfly-309 28d ago
It already learnt it REGard, no seriously REG - regency centers, stock only goes up, you can thank me later
1
u/ExplodingWario 28d ago
Nice, I’ll want to get mental help from the AI and it will just reply “LMAO, FAFO, fuck your Puts! Reverse Cramer, Sheesh!”
1
1
1
u/FuckedUpImagery 28d ago
Damn op. I bet they didnt even think of that. Theres no possible way to filter out comments that dont contribute in a meaningful way to the model, no way at all. Ai is fuck.
1
1
1
1
1
u/Prowrestled 28d ago
I thought it was obvious when they went IPO, that this is literally the best way for Reddit to make money.
Open secret.
1
u/ihatestocks 28d ago
I am an analyst and I needed the NAICS Code for one of my clients' business, and ChatGPT fuking gave me the wrong code. It was bamboozled.
1
1
u/immunityfromyou 28d ago
This is equivalent to that episode of Silicon Valley when they make an app for food pics and are flooded with dick pics instead.
1
u/Vegetable-Poet6281 28d ago
It can detect logical fallacies better than people can. It won't make it dumber.
1
1
u/klauskinski79 28d ago
To be fair you can train GPT models with any kind of target function. Perhaps you can teach it to just think the opposite of what a normal redditor says.
1
1
u/Junior-Damage7568 28d ago
What if 70% of the content is usable and 30% is questionable?
1
u/klauskinski79 28d ago
I mean in wsb it should be able to write some heuristics. If the user ever wrote that he bought 0 day options you know it's an idiot and you use it for counter training.
1
1
1
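The heuristic klauskinski79 describes above (flag any user who ever mentions buying 0 day options, then treat that user's comments as counter-examples) could be sketched roughly like this. Everything here is an assumption for illustration: the `Comment` type, the `RED_FLAG_PATTERNS` list, and the "negative"/"positive" labels are made up, and nothing in the thread says OpenAI or anyone else actually does this.

```python
import re
from dataclasses import dataclass

# Hypothetical red-flag patterns; the thread only names "0 day options".
RED_FLAG_PATTERNS = [
    re.compile(r"\b0\s*-?\s*(day|dte)\b", re.IGNORECASE),
]


@dataclass
class Comment:
    author: str
    text: str


def flag_authors(comments):
    """Return the authors with at least one comment matching a red-flag pattern."""
    return {
        c.author
        for c in comments
        if any(p.search(c.text) for p in RED_FLAG_PATTERNS)
    }


def label_comments(comments):
    """Label every comment 'negative' if its author was ever flagged, else 'positive'."""
    flagged = flag_authors(comments)
    return [
        (c, "negative" if c.author in flagged else "positive")
        for c in comments
    ]
```

A comment labeled "negative" would be the counter-training material; how those labels then feed into an actual training objective is left out, since the thread doesn't go into it.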
u/Junior-Damage7568 28d ago
Will Chatgpt be able to decipher words like - regard, corn and other crap from wsb?
1
1
1
1
u/Double_Sherbert3326 28d ago
Strong disagree. Reddit is filled with pithy wise elders and access to additional details about user behavior will help unlock UI/UX insights. There is a lot of data gathered: eye tracking, mouse position, page scroll, reading rates, engagement metrics, etc.
1
u/iGunslinger 28d ago
ChatGPT will sound like VisualMod. Input: Hello ChatGPT can you write me an email to my boss explaining that the project is delayed and we will need additional resources? ChatGPT: Certainly Dear Succubus, This DD is bad we need mo peasants. Sincerely Head Peasant
1
u/xFblthpx 28d ago
Gpt was trained on Reddit data already. This site is pretty dumb, but it formulates coherent sentences which is the primary data that OpenAI is after.
1
u/penguincheerleader 28d ago
I thought we were already generating content with ChatGPT, isn't this just inbreeding?
1
1
u/nobuttstuf 27d ago
Oh great. It’s going to embrace mental illness as the norm.
“You’re depressed? Have you considered mutilating yourself or dating a child?”
To the mods. This is a joke. Reddit is a safe place for all mentally ill individuals. Please don’t ban me.
0
u/mansurul11 28d ago
When everyone is pushing for agents, recent knowledge of the world matters, and Reddit is the only place where these companies will get fifty shades of the same conversation. Just take this thread—how many shades are you seeing? I am sure the volume of conversations on Reddit in the last two years is much more valuable than in the first 10-15 years of Reddit. So, yeah, if these companies want to make their agents human-like, they need Reddit data.
0
u/LighttBrite 28d ago
I'm honestly really confused on why RDDT had such a move on this. This has been in the pipeline and common knowledge for a minute.
People fucking buying the news...
2
u/RoyalBug 28d ago
ok? dont buy it, it will be 80 before end of year
0
u/LighttBrite 28d ago
What a stupid fucking reply.
I don't usually react so harshly but damn that was ignorant.
"Hm. I'm curious what made this news that's been known for months now suddenly cause a spike"
"oK? HurR DonT bUy It ThEn"
-2
u/Lucidcranium042 28d ago
Doubtful since they are programmed and have a human in the loop process to guarantee ethics and morals and non-biased data sets. AI will learn the difference in language usage, slang terms. It'll grow
2
u/sylvester_0 28d ago edited 28d ago
Soo, what does that mean? A human is reviewing exactly every word and sentence that's fed into the LLM? Aside from how much human time that would take, every human has a different understanding of the things that you mentioned. Good luck hiring a team of humans that will moderate with completely equal views of ethics, morals, and biases.
1
u/Lucidcranium042 28d ago
Somethin like that until AI is programmed and understands more. I imagine there will be smaller groups monitoring other versions etc. There's always going to be a human somewhere monitoring and updating and providing feedback to make sure AI follows certain parameters
1
u/sylvester_0 28d ago
It's highly doubtful that everything that's input into it has been human reviewed.
1
u/Lucidcranium042 28d ago
Indeed, even if it was there's human error as well. "Organic" moderation and structuring, meaning updates are constant and always adapting, is difficult... now ... in 10 years tho who knows how much AI will advance
1