That sounds like a cool research project. However, I think it would be enough to simply (1) not erase authorship data, and (2) not fine-tune or monitor LLMs to suppress outputs that mention people. Tracing the provenance of text is an emergent property of these models. For example, if I give it an often-repeated quote from a book written in the 1800s, it can tell me where it came from. That's hundreds of years of noise it's weeding through. Imagine what language models could do for recently created knowledge.
> For example, if I give it an often-repeated quote from a book written in the 1800s, it can tell me where it came from. That's hundreds of years of noise it's weeding through. Imagine what language models could do for recently created knowledge.
Alternatively: that's hundreds of years of mentions of that quote it can pull from.