Jack Dermody


Text Clustering Four Ways

Again, results will change on each run, but a cursory examination suggests that NNMF does a better job on the data set than k-means. We haven't formally evaluated the results in this tutorial, but a look across the four sets of results suggests that NNMF is well suited to text clustering, while k-means in its three variants gives good but somewhat varied results. In a real-world application we might care more about how useful the final clusters are than about how pure they are; usefulness is an evaluation criterion that is specific to each context.
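For readers who do want a rough number to go with the cursory examination, cluster purity is simple to compute when each document carries a known label. The Python sketch below is not part of the tutorial's code; the function name `cluster_purity` and the toy cluster ids and labels are hypothetical and only illustrate the calculation. It assumes one cluster id and one ground-truth label per document.

```python
from collections import Counter


def cluster_purity(cluster_ids, labels):
    """Return the fraction of documents whose label matches the
    majority label of the cluster they were assigned to.

    Assumption: cluster_ids[i] and labels[i] describe the same document.
    """
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    if not labels:
        return 0.0

    # Count how often each label occurs inside each cluster.
    label_counts_by_cluster = {}
    for cluster_id, label in zip(cluster_ids, labels):
        label_counts_by_cluster.setdefault(cluster_id, Counter())[label] += 1

    # For every cluster, keep only the count of its most common label.
    majority_total = sum(
        counts.most_common(1)[0][1]
        for counts in label_counts_by_cluster.values()
    )
    return majority_total / len(labels)


# Hypothetical toy data: six documents assigned to two clusters.
example_clusters = [0, 0, 0, 1, 1, 1]
example_labels = ["sports", "sports", "politics", "politics", "politics", "science"]
print(cluster_purity(example_clusters, example_labels))
```

Purity rewards many small clusters (one document per cluster scores a perfect 1.0), so it is best compared across runs that use the same number of clusters, and it says nothing about whether the clusters are useful for a given application.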