Selectors are one of the most important parts of your scraper. Well-written selectors keep it fast and efficient. When a website’s layout changes, your selectors have to change with it, and in a well-designed scraping setup they are the only thing that has to change. In this post I will dig into CSS selectors and XPath and share some tips for writing effective, fast selectors for your web scraper.
CSS selectors are widely used by frontend developers to attach CSS properties to HTML elements. In a web scraper we can use them to navigate the structure of an HTML file. If you are new to scraping and already familiar with CSS, I suggest you prefer CSS selectors over XPath, though in some cases you will have to use XPath.
XPath is a specification created to help you navigate any XML document, so you can also use it when parsing an HTML file. Almost every HTML parsing or web scraping library supports XPath. It is a more robust and powerful way to locate elements than CSS selectors.
The CSS selectors you will need most often:
Element that has the id x.
#x
Elements that have the class x.
.x
Element that is a direct or indirect descendant of x.
x y
Elements that are x or y.
x, y
Element that is immediately preceded by x.
div + p
Element is a direct child of x.
div > p
All sibling elements that are preceded by x.
p ~ ul
Element x that has a y attribute.
x[y]
Element x whose y attribute is “z”.
x[y="z"]
Elements that are the last child of their parent.
x:last-child
Elements that have no children (text counts as a child here).
x:empty
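Python's standard library ships no CSS selector engine, so the following is only a rough sketch that mimics a few of the selectors above with `xml.etree.ElementTree` paths and two small helper functions. The sample markup, the helper names `adjacent_sibling` and `general_siblings`, and the choice of ElementTree are all assumptions made for illustration; in a real scraper you would normally pass the CSS string straight to your parsing library.

```python
import xml.etree.ElementTree as ET

# Well-formed sample markup, assumed for this sketch. Real pages are often
# not valid XML and need a dedicated HTML parser.
DOC = """
<body>
  <div id="main">
    <p>child of div</p>
    <section><p>grandchild of div</p></section>
  </div>
  <p>right after div</p>
  <ul><li>one</li></ul>
  <p>later sibling</p>
</body>
"""
root = ET.fromstring(DOC)


def adjacent_sibling(parent, first, second):
    """Mimic CSS 'first + second' among the direct children of parent."""
    kids = list(parent)
    return [b for a, b in zip(kids, kids[1:])
            if a.tag == first and b.tag == second]


def general_siblings(parent, first, second):
    """Mimic CSS 'first ~ second' among the direct children of parent."""
    matches, seen_first = [], False
    for child in parent:
        if seen_first and child.tag == second:
            matches.append(child)
        if child.tag == first:
            seen_first = True
    return matches


# CSS "div p": every <p> nested inside a <div>, at any depth.
descendant_p = [p.text for p in root.findall(".//div//p")]
# CSS "div > p": only <p> elements that are direct children of a <div>.
child_p = [p.text for p in root.findall(".//div/p")]
# CSS "div + p": the <p> that immediately follows a <div>.
next_p = [p.text for p in adjacent_sibling(root, "div", "p")]
# CSS "p ~ ul": every <ul> that comes after a sibling <p>.
later_ul = [ul.tag for ul in general_siblings(root, "p", "ul")]
# CSS "#main": the element whose id attribute is "main".
main = root.find(".//*[@id='main']")

print(descendant_p, child_p, next_p, later_ul, main.tag)
```

The contrast between `descendant_p` and `child_p` is the one that matters most in practice: the descendant form also picks up the paragraph nested inside the section, while the child form does not.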
The XPath expressions you will need most often:
Start searching from the root node.
/
Search the whole document, at any depth.
//
Element x whose id is y.
//x[@id='y']
Elements x whose class is y.
//x[@class='y']
Elements that are x or y, searched in the whole document.
//H1 | //H2
Elements x whose y attribute is z.
//x[@y='z']
Element z that is a direct child of y, where y is a direct child of x.
//x/y/z
The text in x.
//x/text()
The Nth y element that is a child of x.
//x/y[N]
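To try these expressions without installing anything, here is a small sketch using Python's built-in `xml.etree.ElementTree`. It supports only a subset of XPath: there is no `|` union and no `text()` step, so those two are emulated in plain Python. The sample page and the variable names are assumptions for illustration, and paths are written relative to the root element (`.//` instead of `//`).

```python
import xml.etree.ElementTree as ET

# Well-formed sample page, assumed for this sketch.
PAGE = """
<html>
  <body>
    <h1 id="title">Fruit</h1>
    <h2 class="sub">In stock</h2>
    <ul>
      <li><a href="/apple">Apple</a></li>
      <li><a href="/banana">Banana</a></li>
      <li>Cherry</li>
    </ul>
  </body>
</html>
"""
root = ET.fromstring(PAGE)  # root is the <html> element

# //h1[@id='title']: element selected by id.
title = root.find(".//h1[@id='title']")
# //h2[@class='sub']: matches the whole attribute value exactly.
sub = root.find(".//h2[@class='sub']")
# //a[@href]: every <a> that has an href attribute.
links = root.findall(".//a[@href]")
# //a[@href='/banana']: attribute with a specific value.
banana = root.find(".//a[@href='/banana']")
# /html/body/ul/li: a chain of direct children, relative to <html>.
items = root.findall("./body/ul/li")
# //ul/li[3]: the third <li> child of a <ul>.
third = root.find(".//ul/li[3]")
# //h1 | //h2: no union operator in ElementTree, so join two searches.
headings = root.findall(".//h1") + root.findall(".//h2")
# //h1/text(): no text() step in ElementTree, so read the .text attribute.
title_text = title.text

print(title_text, sub.text, [a.text for a in links], third.text)
```

Libraries with full XPath support accept the expressions from the comments directly, so the emulation steps are only needed with the standard library.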
- Be as specific as necessary, but keep your selectors as short as you can.
- Know the HTML structure of the website thoroughly. Take time to go over it.
- Maintain your selectors. If the layout changes, you will probably need to change your code.
- Write your selectors yourself. Try to avoid generator tools.
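One way to make that maintenance cheap is to keep every selector in a single mapping, so a layout change touches only that mapping and not the extraction logic. The sketch below assumes well-formed markup and uses the standard library's `xml.etree.ElementTree`; the names `SELECTORS`, `NEW_SELECTORS` and `extract`, along with both sample layouts, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical selector map: field name -> ElementTree path.
SELECTORS = {
    "name": ".//h1[@class='product-name']",
    "price": ".//span[@class='price']",
}


def extract(markup, selectors):
    """Return one record; a field is None when its selector matches nothing."""
    root = ET.fromstring(markup)
    record = {}
    for field, path in selectors.items():
        node = root.find(path)
        has_text = node is not None and node.text
        record[field] = node.text.strip() if has_text else None
    return record


# Two made-up versions of the same product page.
OLD_LAYOUT = ("<div><h1 class='product-name'>Kettle</h1>"
              "<span class='price'>19.90</span></div>")
NEW_LAYOUT = ("<div><h2 class='title'>Kettle</h2>"
              "<p><b class='amount'>19.90</b></p></div>")

# After the redesign only the selector map needs an update.
NEW_SELECTORS = {
    "name": ".//h2[@class='title']",
    "price": ".//b[@class='amount']",
}

print(extract(OLD_LAYOUT, SELECTORS))
print(extract(NEW_LAYOUT, SELECTORS))      # stale selectors yield None
print(extract(NEW_LAYOUT, NEW_SELECTORS))
```

Returning `None` for a field instead of raising makes a stale selector easy to spot in the scraped output, which is usually the first sign that the layout has changed.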
Figuring out and testing your selectors can take a lot of time, especially on a large project. If you are not afraid of messy CSS selectors or XPath, or you simply don’t want to spend time writing your own, you can use one of the amazing tools below to make your job easier. These tools generate CSS selectors and XPath expressions for you. Be aware that they don’t necessarily produce the most readable or most efficient code. They also sometimes generate wrong strings that don’t select what you need.
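To see why a generated selector deserves a second look, the sketch below compares a long positional path of the kind a generator might produce with a short hand-written one, before and after a small layout change. Both pages, both paths and the helper `first_text` are made up for illustration, and the markup is assumed to be well-formed so that the standard library's `xml.etree.ElementTree` can parse it.

```python
import xml.etree.ElementTree as ET

# The same page before and after a banner is added (made-up markup).
BEFORE = ("<html><body><div>"
          "<div><ul><li>ad</li></ul></div>"
          "<div><span class='price'>19.90</span></div>"
          "</div></body></html>")
AFTER = ("<html><body><div>"
         "<div class='banner'>Sale</div>"
         "<div><ul><li>ad</li></ul></div>"
         "<div><span class='price'>19.90</span></div>"
         "</div></body></html>")

# Illustrative positional path, similar in style to generated selectors.
GENERATED = "./body/div/div[2]/span"
# Short hand-written path that relies on the class instead of positions.
HANDWRITTEN = ".//span[@class='price']"


def first_text(markup, path):
    """Return the text of the first match, or None if nothing matches."""
    node = ET.fromstring(markup).find(path)
    return None if node is None else node.text


for label, page in (("before", BEFORE), ("after", AFTER)):
    print(label, first_text(page, GENERATED), first_text(page, HANDWRITTEN))
```

Running a check like this against a saved copy of the page is a quick way to test any selector, generated or hand-written, before it goes into your scraper.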