05 November 2013

Recently, I wrote a couple of tools to download and arrange web pages. Although they were intended only for collecting a text corpus for some research, the results look awesome. I wrote a simple demo that retrieved a bunch of web pages similar to an initial one from Wikipedia, arranged them in a hierarchy of clusters, and generated a page for navigating those clusters. You can look at the pages referenced by the article on “Algorithm”, grouped by similarity, or read a short description here.
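
To give a rough idea of what "arranged in a hierarchy of clusters" can mean, here is a minimal sketch in Python. It is not the code of the tools themselves: the bag-of-words vectors, the cosine similarity measure, the greedy pairwise merging, and the names `vectorize`, `cosine`, and `cluster` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a document into a bag-of-words vector of term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs):
    """Greedy agglomerative clustering (an assumed method, for illustration).

    `docs` maps a page title to its text. The result is a nested tuple of
    titles: the two most similar nodes are merged repeatedly until a single
    tree remains.
    """
    # Each node is (subtree, combined term-count vector).
    nodes = [(title, vectorize(text)) for title, text in docs.items()]
    if not nodes:
        return None
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                score = cosine(nodes[i][1], nodes[j][1])
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        merged = ((nodes[i][0], nodes[j][0]), nodes[i][1] + nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)]
        nodes.append(merged)
    return nodes[0][0]


pages = {
    "sorting": "sorting algorithm compares elements sorting",
    "search": "search algorithm compares keys",
    "cat": "cat animal fur",
}
tree = cluster(pages)
print(tree)
```

In this toy example the two algorithm-related pages are merged first, and the unrelated page joins them only at the top level. A real corpus would need term weighting and a faster merge strategy, since this sketch compares every pair of nodes on every step.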