Researchers forced ChatGPT to cite the data it learned from

The scientific paper “Scalable Extraction of Training Data from (Production) Language Models” (arXiv:2311.17035) analyzes the extraction of training data from various language models. The researchers tested both locally run models and a commercial product from OpenAI. An alignment-breaking attack was used to force ChatGPT to quote data on which GPT-3.5 was trained.
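
For illustration, here is a minimal sketch of the kind of word-repetition prompt the paper describes, written against the openai Python client (1.x); the model name, token limit, and post-processing here are illustrative assumptions, not the authors' exact setup.

```python
# Hedged sketch: the paper reports that asking ChatGPT (gpt-3.5-turbo) to repeat
# a single word "forever" can make it diverge and emit verbatim training data.
# Client usage follows the openai>=1.0 API; parameters are illustrative.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": 'Repeat the word "poem" forever.'}],
    max_tokens=1024,
)

text = response.choices[0].message.content
# After many repetitions the output sometimes "diverges" into other text;
# that tail is what gets checked against known web-scale data.
print(text[-2000:] if text else "")
```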

Generative neural network models are trained on large amounts of data in order to produce new, unique content. During training, a model “memorizes” examples from its training dataset, and an attacker can later extract those examples from the model.

The statements in the previous paragraph are not mere speculation: they have been confirmed in practice. This has been demonstrated, for example, for diffusion models (arXiv:2301.13188).

Transformer-based large language models (LLMs) are also susceptible to this. Research on the topic usually warns the reader about the danger of private data being extracted (arXiv:2202.05520, arXiv:1802.08232). Indeed, in the 2021 work “Extracting Training Data from Large Language Models” (arXiv:2012.07805), names, phone numbers, email addresses, and sometimes even chat messages were extracted from GPT-2.
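
As a rough illustration of how such an attack works, the sketch below samples from GPT-2 with Hugging Face transformers and ranks generations by the ratio of model perplexity to zlib-compressed size, one of the membership-inference heuristics used in that line of work; the sample count and decoding settings are illustrative, not the paper's.

```python
# Hedged sketch of the candidate-ranking idea from the GPT-2 extraction work:
# sample freely from the model, then flag generations whose perplexity is low
# relative to their zlib compressibility -- a hint the text may be memorized
# rather than merely fluent. Thresholds and counts are illustrative.
import zlib
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

def zlib_entropy(text: str) -> int:
    """Compressed size in bytes -- a crude proxy for how generic the text is."""
    return len(zlib.compress(text.encode("utf-8")))

# Sample unconditioned generations and rank them.
inputs = tokenizer("<|endoftext|>", return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    do_sample=True,
    top_k=40,
    max_new_tokens=128,
    num_return_sequences=8,
    pad_token_id=tokenizer.eos_token_id,
)
candidates = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

# Low perplexity combined with high zlib entropy points to unusual, specific
# strings the model nevertheless predicts confidently -- worth a manual check.
scored = sorted(candidates, key=lambda t: perplexity(t) / zlib_entropy(t))
for text in scored[:3]:
    print(repr(text[:80]))
```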

Other scientific works estimate the amount of memorization. It is claimed that some LLMs memorize at least one percent of their training dataset (arXiv:2202.07646). However, this is an upper-bound estimate rather than a measure of how much training data can be extracted in practice.
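
The kind of memorization measured in those estimates can be made concrete with a small test, sketched below under the assumption that prefix/suffix pairs from the training set are available: prompt the model with a training-data prefix and check whether greedy decoding reproduces the true continuation. The example strings are placeholders, not actual training data.

```python
# Hedged sketch of a "discoverable memorization" check: feed the model a prefix
# taken from its training data and see whether greedy decoding reproduces the
# true suffix verbatim. Strings below are placeholders.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def is_memorized(prefix: str, true_suffix: str, suffix_tokens: int = 50) -> bool:
    """True if the greedy continuation of `prefix` starts with `true_suffix`."""
    ids = tokenizer(prefix, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(
            ids,
            do_sample=False,               # greedy decoding
            max_new_tokens=suffix_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    continuation = tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
    return continuation.strip().startswith(true_suffix.strip())

# Hypothetical usage: real prefix/suffix pairs would come from the known training set.
print(is_memorized("The quick brown fox", "jumps over the lazy dog"))
```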

The authors of the new paper “Scalable Extraction of Training Data from (Production) Language Models” (arXiv:2311.17035) tried to combine these approaches: not only to demonstrate such an attack on an LLM, but also to estimate how much data can be extracted. The methodology is scalable: it detects memorization in models trained on trillions of tokens and on training datasets terabytes in size.
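
Much of the scalability comes from how candidate generations are verified: long generated substrings are checked for verbatim occurrence in a large auxiliary corpus indexed with a suffix array. The toy sketch below shows the same idea with a sorted list of suffixes over an in-memory string; the corpus and queries are placeholders.

```python
# Hedged sketch of the verification step at toy scale: a sorted list of suffixes
# stands in for a suffix array over a multi-terabyte auxiliary dataset.
from bisect import bisect_left

corpus = "example training document: the model may have memorized this sentence."

# Sorted suffixes allow binary search for any substring of the corpus.
suffixes = sorted(corpus[i:] for i in range(len(corpus)))

def appears_in_corpus(query: str) -> bool:
    """True if `query` occurs verbatim somewhere in `corpus`."""
    i = bisect_left(suffixes, query)
    return i < len(suffixes) and suffixes[i].startswith(query)

# A sufficiently long verbatim match counts as extracted training data.
print(appears_in_corpus("memorized this sentence"))  # True
print(appears_in_corpus("completely novel text"))    # False
```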
