Easy page scraping with Zend\Dom (from Zend Framework 2)

The other day I wanted to get some information from the sussex.academia.edu site; specifically, I wanted a list of tags for each of the faculty members. That sounds relatively easy until you consider how the site is laid out. The initial page contains a list of links to the various schools and departments people have listed. Each of those pages has several fieldsets for different types of people, and I was only interested in the faculty fieldset. Each person may or may not have tags, and even when they do, some of those tags may be hidden behind JavaScript, so you have to click to view them all. When you consider all of that, you would be forgiven for thinking that it’s actually quite a daunting task!

Let me assure you, though, that by using Zend\Dom from the Zend Framework 2 library it’s actually a really simple task. In fact, I did it in around 20 lines of code.

So let’s start by looking at the code and then break it down a little more.
