We already do tree searches: see beam search and “best of” search. It's arguable whether that counts as a “clever” tree search, but it's not entirely unguided either, since you prune the tree based on scores like perplexity, which measures how probable/plausible the model rates a branch as it stands so far (lower perplexity means the model finds the branch more likely).
In beam search you might keep the top n branches at each token generation step. Best of is in a sense the same, but you take many steps at a time using regular sampling before pruning.
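
To make the comparison concrete, here is a rough sketch of both strategies against a toy model. Everything in it is made up for illustration: `next_token_logprobs` is a hypothetical stand-in for a real language model, the three-token vocabulary and the "repeats are less likely" rule are arbitrary, and the function names are mine rather than any library's API.

```python
import math
import random

VOCAB = ["a", "b", "c"]  # toy vocabulary, purely illustrative


def next_token_logprobs(prefix):
    """Hypothetical stand-in for a language model: log P(token | prefix)."""
    # Arbitrary toy rule: repeating the previous token is less likely.
    weights = [1.0 if (prefix and tok == prefix[-1]) else 3.0 for tok in VOCAB]
    total = sum(weights)
    return {tok: math.log(w / total) for tok, w in zip(VOCAB, weights)}


def beam_search(beam_width, length):
    """Expand every branch by one token, then prune to the top beam_width."""
    beams = [((), 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for tok, logprob in next_token_logprobs(seq).items():
                candidates.append((seq + (tok,), score + logprob))
        # Pruning happens at every token step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def sample_sequence(length, rng):
    """Regular sampling: draw each token from the model's distribution."""
    seq, score = (), 0.0
    for _ in range(length):
        logprobs = next_token_logprobs(seq)
        tokens = list(logprobs)
        probs = [math.exp(logprobs[t]) for t in tokens]
        tok = rng.choices(tokens, weights=probs)[0]
        seq, score = seq + (tok,), score + logprobs[tok]
    return seq, score


def best_of(n, length, rng):
    """Sample n branches for many steps, then prune once, keeping the best."""
    samples = [sample_sequence(length, rng) for _ in range(n)]
    return max(samples, key=lambda s: s[1])


def perplexity(score, length):
    """Perplexity from a cumulative log-probability; lower is more plausible."""
    return math.exp(-score / length)
```

Calling `beam_search(2, 4)` and `best_of(8, 4, random.Random(0))` returns `(sequence, score)` pairs you can compare with `perplexity`. The only structural difference between the two is where the pruning happens: inside the per-token loop for beam search, once at the end for best of.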