The only way to include a book in a training dataset for LLMs without violating copyright law is to contact the rights holder and buy a license to do so. Buying an ebook license off Amazon isn't enough for this, and creating a digital copy from a physical copy for your commercial use is also against the law. A good rule of thumb: if it would be illegal for a company to distribute the digital file to its employees for training, it's definitely illegal to train an AI that the company will own on it.
It is widely accepted in many jurisdictions that different types of use fall under different copyright rules. This is especially true for video and music. In many places you can't just take a DVD/Blu-ray copy of a movie and screen it in a movie theatre, or copy a CD and play it on the radio.
I see no reason why training an AI should be treated like a human reading a book, especially when it is done repeatedly, and even more so when the copies were acquired illegally, e.g. via torrents.