Copyright of the training data
As we’ve said, we don’t know what was in the LLM training data, but there is a high chance that it contained copyrighted material in the form of documents, images, audio and video, and that the copyright holders did not consent to this use. In summer 2023, actors and other workers in the US film industry went on strike, in part to protest against their images and voices being used to train AI systems and to generate film material in which they had not themselves participated. Copyright owners have launched legal challenges in the USA, and probably elsewhere, and we will need to wait to see what impact these proceedings have on software developers and governments.
Summary: the situation is not yet clear, so we must continue to ask questions about the ownership and rights of use of sources.