You know what this needs to go to the next level? Instead of training via the usual pair of algorithms (the classic model for machine learning being one algorithm that is good at error checking things, the other that keeps trying to adjust its own model until it can fool the error checker)... They need to simply do more levels of error checking, yeah? The incorrect output, while it happens, is rare -- I mean this thing can pass an MBA exam, after all. But the small % that is wrong will fuck with society long term. So what if you added more layers? Shouldn't checking multiple random seeds against each other and then going with whichever result the majority of them agree on produce far more accurate output? Still won't be perfect, but it should at least let the algorithm filter itself as a mitigation. Something like the sketch below.
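Roughly what I mean, as a toy sketch (`model_answer` is a made-up stand-in for a seeded model call, not any real API):

```python
import random
from collections import Counter

def model_answer(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for a seeded model call: fakes an answer
    that is right most of the time and wrong occasionally."""
    rng = random.Random(seed)
    return "right answer" if rng.random() > 0.1 else "wrong answer"

def self_filtered_answer(prompt: str, n_samples: int = 10) -> tuple[str, float]:
    # Sample once per seed, then keep the most common answer across seeds.
    # Baked-in assumption: the anomalous outputs are the errors.
    answers = [model_answer(prompt, seed) for seed in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n_samples  # answer plus agreement fraction

print(self_filtered_answer("some factual question"))
```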
Doesn't fix abuse cases where someone is deliberately generating fiction, of course, but it should greatly improve the standard use case of treating it as a learning tool, yeah?
Well, it would take an enormous amount of computing power to do that for every user. But it should certainly be possible to take the aggregate average of the outputs from several random seeds and produce a result that is statistically more likely to accurately reflect the training data.
And obviously the concept works better for large language models than for image generation AI. You're right that averaging together two images is more likely to produce nonsense. Large language models are far more deterministic than noise-diffusion image models, though, so the concept is far more likely to apply here than to Stable Diffusion. I only brought it up as a more recognizable example of how random seeds interact with an AI model.
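To make "seed" concrete, here's a toy sketch of seeded sampling (not any real model's code): the same seed replays the same random choices, while a different seed can give different output.

```python
import random

def sample_tokens(seed: int, n: int = 5) -> list[str]:
    # Weighted random choice standing in for sampling from a model's
    # next-token probability distribution.
    rng = random.Random(seed)
    return [rng.choices(["cat", "dog", "bird"], weights=[0.6, 0.3, 0.1])[0]
            for _ in range(n)]

print(sample_tokens(42))  # repeatable: same seed, same sequence
print(sample_tokens(42))  # identical to the line above
print(sample_tokens(7))   # a different seed can give a different sequence
```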
It would work. The limiting factor is that they already don't have enough computing power for the number of user requests they get. My proposal would involve generating something like 10-50 outputs for each input, meaning *at least 10-50x the computing power required*. Maybe they already thought of it too; I'm not claiming to be some rogue genius. But it's not currently feasible unless they get an order of magnitude more computing power.
And you would have to assume the anomalous outputs are errors. There's no guarantee that assumption is right; the result would just be statistically more likely to reflect the training data. It's still not doing any true logical thinking, just processing the math and hoping its output makes sense to a human.
"the classic model for machine learning being one algorithm that is good at error checking things, the other that keeps trying to adjust its own model until it can fool the error checker"
This is not the classical model of machine learning. It's a rough description of a GAN (generative adversarial network), one specific deep learning architecture.
"They need to simply do more levels of error checking"
No. There is a final output, which is a probabilistic estimate, and that output gets checked against a target to measure the error. That error is then propagated backwards to adjust the model's weights, and the model runs again with the updated weights. "More levels" of error checking isn't a thing.
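In toy form, the loop is just this (a single-weight example, not anyone's actual training code):

```python
import numpy as np

# Fit y = w * x to some data by gradient descent on one weight.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # true relationship is w = 2
w = 0.0                          # the model's adjustable weight

for step in range(100):
    pred = w * x                        # 1. produce output
    error = np.mean((pred - y) ** 2)    # 2. check that output for error
    grad = np.mean(2 * (pred - y) * x)  # 3. propagate the error backwards...
    w -= 0.01 * grad                    # 4. ...to adjust the WEIGHT, then run again

print(w)  # converges toward 2.0
```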
"The incorrect output, while it happens, is rare"
The incorrect output is always there, and it's never rare. Without millions of incorrect outputs during training, the model would not have any sort of accuracy. Unless you mean incorrect output from ChatGPT itself? Those aren't part of the training data set.
"seeds"
Wut?
"aggregate average"
Wut?
"ChatGPT doesn't allow you to input your own random seeds, but random seeds are still certainly a part of the model. See: AI image generation like Stable Diffusion where each random seed creates a unique and repeatable output based on the input prompt."
ChatGPT uses constant parameters derived during training. It does not get updated with inputs in real time. No idea what this means. And still no idea wtf a seed is.
"I'm not sure what's going on with the 1s and 0s behind the scenes"
Obviously
"anomalous output should be removed"
Outputs are not removed during training. Removing outputs from ChatGPT's responses is meaningless, as that happens outside of training.
"And obviously the concept works better for large language models than for image generation AI."
Why is it obvious? You are talking about adjusting training inputs, which are numerical vectors in image recognition just as they are in language recognition. Yeah, in one case they encode token IDs and in the other grayscale or RGB pixel values, but still: what is being adjusted, and what exactly is obvious about anything you are saying here?
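Both kinds of input reduce to arrays of numbers before the model ever sees them; a toy illustration:

```python
import numpy as np

# A grayscale image is already a numeric array of pixel intensities.
image = np.random.rand(28, 28)

# Text has to be mapped to numbers first, e.g. token IDs from a toy vocabulary.
vocab = {"the": 0, "cat": 1, "sat": 2}
text = np.array([vocab[w] for w in "the cat sat".split()])

print(image.shape, text)  # (28, 28) [0 1 2]
```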
"Seeds", more "seeds", and more "seeds"
Wut?
"My proposal would involve generating something like 10-50 outputs for each input"
What output are you talking about? What input? Computing power isn't the limit; just add more GPUs. Time might become a constraint, but if you are training a model for a month, you can afford to train for two months.
The way you are speaking strongly indicates to me that you don't understand how these systems actually work.
Yes