Funny enough "learning characters/words in the context of the vocabulary they are in" is exactly what NLP machine learning models use to learn "rich" word/text representations based on the "distribution hypothesis" which states that the choice of words in the same context share a common meaning.