Your approach is correct. The graph that you've built is a DAG. This means that, you can solve for the lower vertices before solving for the upper ones. This is exactly how a typical Dynamic Programming solution works. If for solving a problem, we can see the relationship b/w this problem and other similar but smaller ones, we solve the smaller problems first and use these solutions to build up a solution for the larger case.
EDIT: A naive implementation of your idea can take upto O(n3m) time where m is the no. of words in dictionary. Using a fancy data structure like a trie or suffix tree or an automaton can speed it upto O(n^2). Can we better that?