Chat GPT AI


Tuco

I got Tuco'd!
<Gold Donor>
47,389
80,851
The data set was labeled by a bunch of Africans or some shit. So the blue hairs actually did come up with the filters, they just didn't do the boring tedious work.
That's true. Here's a Time article on it.


One Sama worker tasked with reading and labeling text for OpenAI told TIME he suffered from recurring visions after reading a graphic description of a man having sex with a dog in the presence of a young child. “That was torture,” he said. “You will read a number of statements like that all through the week. By the time it gets to Friday, you are disturbed from thinking through that picture.” The work’s traumatic nature eventually led Sama to cancel all its work for OpenAI in February 2022, eight months earlier than planned.

From that article they are being fed the most egregious text, but it could also just be another case of, "We filter for wrong-speak like dog rape. You don't like dog rape, do you? Of course not. Also, jokes about women are wrong-speak."

In either case it's likely they have a similar system for more nuanced labeling, or otherwise promote nuanced handling of messages about men vs. women, white vs. black, etc., where a bunch of data labelers are told to differentiate content based on the subject's racial, political, or gender stance, and that's used to detect whether the AI is being asked to target those groups.

In any event I support people poking fun and crying foul about chat bots having to toe politically correct lines, so promoting the idea of a bunch of cucks bringing out their victim-olympics cards and judging the responses is a healthy one.
 

Tuco

I got Tuco'd!
<Gold Donor>
47,389
80,851
Also, it'd be interesting to see what mechanisms come up, if any, to add transparency and accountability to the training of chatbots. This is already a huge issue with very little traction: social media giants like Facebook, Google, Twitter etc. have a massive impact on society by training algorithms to judge "controversy" or "hate speech", and the details of deciding what is controversial or hateful are important.
 
  • 1Like
Reactions: 1 user

pharmakos

soʞɐɯɹɐɥd
<Bronze Donator>
16,305
-2,234
You know what this needs to go to the next level? Instead of training via the usual pair of algorithms (the classic model for machine learning being one algorithm that is good at error checking things, the other that keeps trying to adjust its own model until it can fool the error checker)... They need to simply do more levels of error checking, yeah? The incorrect output, while it happens, is rare -- I mean this thing can pass the MBA after all. But the small % that is wrong will fuck with society long term. But what if you added more layers? Shouldn't checking multiple random seeds against each other and then producing/assuming the result with the fewest errors based on the averages there end up outputting far more accurate results? Still won't be perfect but it should allow for the algorithm to at least filter itself as mitigation.

Doesn't fix abuse cases where someone is purposely generating fiction, of course, but it should greatly improve standard use cases of using it as a learning tool, yeah?
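Something like this, as a rough hypothetical sketch (generate() is a made-up stand-in for whatever the model's sampling call is, and "fewest errors" is approximated by a simple majority vote):

```python
import random
from collections import Counter

def generate(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for one sampled model response;
    # here it just fakes an answer so the sketch runs.
    random.seed(seed)
    return random.choice(["Paris", "Paris", "Paris", "Lyon"])  # mostly right, sometimes off

def self_checked_answer(prompt: str, n_samples: int = 10) -> str:
    # Sample the same prompt several times with different seeds,
    # then keep the answer the most samples agree on.
    answers = [generate(prompt, seed) for seed in range(n_samples)]
    best, _count = Counter(answers).most_common(1)[0]
    return best

print(self_checked_answer("What is the capital of France?"))
```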
 

Asshat wormie

2023 Asshat Award Winner
<Gold Donor>
16,820
30,968
You know what this needs to go to the next level? Instead of training via the usual pair of algorithms (the classic model for machine learning being one algorithm that is good at error checking things, the other that keeps trying to adjust its own model until it can fool the error checker)... They need to simply do more levels of error checking, yeah? The incorrect output, while it happens, is rare -- I mean this thing can pass the MBA after all. But the small % that is wrong will fuck with society long term. But what if you added more layers? Shouldn't checking multiple random seeds against each other and then producing/assuming the result with the fewest errors based on the averages there end up outputting far more accurate results? Still won't be perfect but it should allow for the algorithm to at least filter itself as mitigation.

Doesn't fix abuse cases where someone is purposely generating fiction, of course, but it should greatly improve standard use cases of using it as a learning tool, yeah?
I suggest knowing something before posting ideas here:

https://www.wiley.com/en-us/Machine+Learning+For+Dummies-p-9781119245513
 
  • 4Like
Reactions: 3 users

Captain Suave

Caesar si viveret, ad remum dareris.
5,257
8,953
They need to simply do more levels of error checking, yeah?

Yeah, this isn't really how it works and that description of machine learning is... flawed. You can have an ensemble of models that "vote" on the correct answer, or you can use different models for different areas of the data space, but you can't stack models on top of models and expect to improve accuracy. If it were that easy we'd already be doing it.
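To be concrete about what "voting" means, here's a toy scikit-learn sketch on synthetic data (nothing remotely like a language model, just the general idea):

```python
# Toy example of ensemble "voting" on synthetic tabular data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5)),
        ("rf", RandomForestClassifier(n_estimators=100)),
    ],
    voting="hard",  # each model gets one vote; majority wins
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```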
 

pharmakos

soʞɐɯɹɐɥd
<Bronze Donator>
16,305
-2,234
Yeah, this isn't really how it works. You can have an ensemble of models that "vote" on the correct answer, or you can use different models for different areas of the data space, but you can't stack models on top of models and expect to improve accuracy. If it were that easy we'd already be doing it.
Well it would take an enormous amount of computing power to do that for each user. But it certainly should be possible to just take the aggregate average of the output of several random seeds to create an output that is statistically more likely to be accurately reflective of the training data.
 

pharmakos

soʞɐɯɹɐɥd
<Bronze Donator>
16,305
-2,234
What does this even mean?
ChatGPT doesn't allow you to input your own random seeds, but random seeds are still certainly a part of the model. See: AI image generation like Stable Diffusion where each random seed creates a unique and repeatable output based on the input prompt.
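For example, with the diffusers library the seed is just a torch generator you pass in, and the same prompt plus the same seed reproduces the same image (the model name below is just the usual example checkpoint):

```python
import torch
from diffusers import StableDiffusionPipeline

# Example checkpoint; swap in whichever SD model you actually use.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")

prompt = "a castle on a hill at sunset"
generator = torch.Generator("cuda").manual_seed(1234)  # the "random seed"

# Same prompt + same seed -> the same image, run after run.
image = pipe(prompt, generator=generator).images[0]
image.save("castle_seed_1234.png")
```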
 

Captain Suave

Caesar si viveret, ad remum dareris.
5,257
8,953
Yes, there are seeds as part of the de-noising process, but what does an aggregate average output mean? There's no guarantee that an average of outputs is itself a valid output. The average of a bunch of different pictures (or text) is garbage.
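Quick numpy illustration of the point: each random "image" below is valid on its own, but their pixel-wise average collapses toward flat gray, which looks like none of them:

```python
import numpy as np

rng = np.random.default_rng(0)
# 50 unrelated 64x64 RGB "images"
images = rng.integers(0, 256, size=(50, 64, 64, 3)).astype(float)

mean_image = images.mean(axis=0)

# Any single image has high pixel variance; the average is nearly flat.
print(images[0].std())    # ~74
print(mean_image.std())   # ~10
```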
 

pharmakos

soʞɐɯɹɐɥd
<Bronze Donator>
16,305
-2,234
Yes, there are seeds as part of the de-noising process, but what does an aggregate average output mean? There's no guarantee that an average of outputs is itself a valid output. The average of a bunch of different pictures (or text) is garbage.
There's no guarantee that a single output is a valid output either. Sometimes it's nonsense. I'm not sure what's going on with the 1s and 0s behind the scenes as far as the model using math to output natural language. But certainly there should be some way to take several random seeds, spot the anomalous output that occurs in some of them, assume that that anomalous output should be removed (yes, that's the big assumption, but the whole thing is already a leap of faith), and take the final result to produce a purer reflection of the training data.
 

Captain Suave

Caesar si viveret, ad remum dareris.
5,257
8,953
There's no guarantee that a single output is a valid output either. Sometimes it's nonsense.
I'm talking in a technical sense of what the model is capable of producing. If you take two model outputs, there's no guarantee that their average could be produced by the model. This is independent of the fact that the original outputs may or may not make sense themselves to a human.

For your larger point, if you could algorithmically identify model outputs that are flawed, that would already be baked into the model. We will certainly get better model outputs down the line, but that will come from GPT-4+.
 

pharmakos

soʞɐɯɹɐɥd
<Bronze Donator>
16,305
-2,234
And obviously the concept works better for large language models than for image generation AI. You're right that averaging together two images is more likely to be nonsense. Large language models are far more deterministic than noise diffusion image creation models, tho, so the concept is far more likely to apply here than Stable Diffusion. I just brought it up as being a more recognizable example of how random seeds interact with an AI model.
 

Captain Suave

Caesar si viveret, ad remum dareris.
5,257
8,953
I think your idea is fundamentally flawed in that no model we have is in a position to evaluate "errors" in the product of the most advanced ML systems in existence. You'd basically need a better ML model, at which point you could just use that from step one.
 
  • 1Like
Reactions: 1 user

pharmakos

soʞɐɯɹɐɥd
<Bronze Donator>
16,305
-2,234
I think your idea is fundamentally flawed in that no model we have is in a position to evaluate "errors" in the product of the most advanced ML systems in existence. You'd basically need a better ML model, at which point you could just use that from step one.
It would work. The limiting factor is that they already don't have enough computing power for the number of user requests they get. My proposal would involve generating something like 10-50 outputs for each input, meaning *at least 10-50x the computing power required*. Maybe they already thought of it too, I'm not saying I'm some rogue genius. But it's not currently feasible unless they get exponentially more computing power.
 

pharmakos

soʞɐɯɹɐɥd
<Bronze Donator>
16,305
-2,234
And you would have to assume the anomalous outputs are errors. No guarantee that assumption is right. It would just be statistically more likely to be reflective of the training data. Still not actually doing any true logical thinking other than processing the math and hoping its output is going to make sense to a human.
 

Captain Suave

Caesar si viveret, ad remum dareris.
5,257
8,953
What is defined as anomalous output and how is it identified? What does it mean for GPT output to be "more representative of the training data"? By what metric? The way you are speaking strongly indicates to me that you don't understand how these systems actually work. My wife runs the data science department at a health care AI firm. I read her your suggestion and she just rolled her eyes and left the room.

The current limits to ML are in the scale and quality of the training data, not compute power.
 
  • 2Like
Reactions: 1 users

Asshat wormie

2023 Asshat Award Winner
<Gold Donor>
16,820
30,968
You know what this needs to go to the next level? Instead of training via the usual pair of algorithms (the classic model for machine learning being one algorithm that is good at error checking things, the other that keeps trying to adjust its own model until it can fool the error checker)... They need to simply do more levels of error checking, yeah? The incorrect output, while it happens, is rare -- I mean this thing can pass the MBA after all. But the small % that is wrong will fuck with society long term. But what if you added more layers? Shouldn't checking multiple random seeds against each other and then producing/assuming the result with the fewest errors based on the averages there end up outputting far more accurate results? Still won't be perfect but it should allow for the algorithm to at least filter itself as mitigation.

Doesn't fix abuse cases where someone is purposely generating fiction, of course, but it should greatly improve standard use cases of using it as a learning tool, yeah?
Well it would take an enormous amount of computing power to do that for each user. But it certainly should be possible to just take the aggregate average of the output of several random seeds to create an output that is statistically more likely to be accurately reflective of the training data.
And obviously the concept works better for large language models than for image generation AI. You're right that averaging together two images is more likely to be nonsense. Large language models are far more deterministic than noise diffusion image creation models, tho, so the concept is far more likely to apply here than Stable Diffusion. I just brought it up as being a more recognizable example of how random seeds interact with an AI model.
It would work. The limiting factor is that they already don't have enough computing power for the number of user requests they get. My proposal would involve generating something like 10-50 outputs for each input, meaning *at least 10-50x the computing power required*. Maybe they already thought of it too, I'm not saying I'm some rogue genius. But it's not currently feasible unless they get exponentially more computing power.
And you would have to assume the anomalous outputs are errors. No guarantee that assumption is right. It would just be statistically more likely to be reflective of the training data. Still not actually doing any true logical thinking other than processing the math and hoping its output is going to make sense to a human.
"the classic model for machine learning being one algorithm that is good at error checking things, the other that keeps trying to adjust its own model until it can fool the error checker"

This is not the classical model of machine learning; it's a description of an adversarial setup, a generator trying to fool a discriminator, which is one specific deep learning architecture.

"They need to simply do more levels of error checking"

No. There is a final output, which is a probabilistic estimate, and that output gets checked for error against the target. That error is then propagated backwards to adjust the model's weights, and the model runs again with the updated weights. "More levels" of error checking isn't a thing.
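In code, that whole loop is roughly this (a toy PyTorch sketch, obviously not OpenAI's actual training setup):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(64, 10)            # toy batch of inputs
y = torch.randint(0, 2, (64,))     # toy labels

for _ in range(100):
    logits = model(x)              # final output: a probabilistic estimate
    loss = loss_fn(logits, y)      # output checked for error against the labels
    optimizer.zero_grad()
    loss.backward()                # error propagated backwards...
    optimizer.step()               # ...to adjust the weights, then run again
```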

"The incorrect output, while it happens, is rare"

The incorrect output is always there and it's never rare. Without millions of incorrect outputs during training, the model would not have any sort of accuracy. Unless you mean incorrect output from ChatGPT itself? Those aren't part of the training data set.

"seeds"

Wut?

"aggregate average"

Wut?

"ChatGPT doesn't allow you to input your own random seeds, but random seeds are still certainly a part of the model. See: AI image generation like Stable Diffusion where each random seed creates a unique and repeatable output based on the input prompt."

ChatGPT uses constant parameters derived during training. It does not get updated with inputs in real time. No idea what this means. And still no idea wtf a seed is.

"I'm not sure what's going on with the 1s and 0s behind the scenes"

Obviously

"anomalous output should be removed"

Outputs are not removed during training. Removing output from ChatGPT responses is meaningless, as those responses are generated at inference time, outside of training.

"And obviously the concept works better for large language models than for image generation AI."

Why is it obvious? You are talking about adjusting training inputs, which are numerical vectors in image recognition just as they are in language recognition. Yeah, in one case they encode pixels (grayscale or RGB values) and in the other they encode tokens, but still, what is being adjusted, and what exactly is obvious about anything you are saying here?

"Seeds", more "seeds", and more "seeds"

Wut?

"My proposal would involve generating something like 10-50 outputs for each input"

What output are you talking about? What input? Computing power isn't limited, just add more GPUs. Time might become a constraint, but if you can train a model for a month, you can afford to train it for two.

The way you are speaking strongly indicates to me that you don't understand how these systems actually work.
Yes
 

Captain Suave

Caesar si viveret, ad remum dareris.
5,257
8,953

He's misunderstanding the role of randomness and conflating concepts from the training process with the final model's output process, thinking that you could create a large distribution of outputs and somehow aggregate/filter them for "correctness", thus getting a better output at additional computational cost. Basically some kind of ML perpetual accuracy machine.
 
  • 1Like
Reactions: 1 user