Figuring out selectors is a necessary part of building any scraper. That's why it's important to develop a process that lets you come up with selectors as quickly as possible. My process is simple: I open the inspector in my browser and find the element I need. I highly suggest learning CSS if you don't know it yet, because it will help you figure out selectors. But if you're looking for a simpler solution, you can just right-click the element in the inspector and copy the selector your browser generates. That's it. I show you how to do it in the video above.
Scrapy shell is the #1 productivity tool while building a scraper. It helps you debug your spider on the spot when something unexpected happens, and it lets you test selectors without running the whole spider. It's pretty cool. I tell you more about Scrapy shell in the video and in this post: How To Debug Your Spider
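A quick sketch of what a typical shell session looks like (the URL and selector here are placeholders, not from a real project):

```
$ scrapy shell 'https://example.com'
...
>>> response.status                          # inspect the response on the spot
>>> response.css('h1::text').get()           # try a selector without running the spider
>>> fetch('https://example.com/other-page')  # load another page in the same session
>>> view(response)                           # open the downloaded page in your browser
```

`fetch()` and `view()` are built-in shell helpers, so you can hop between pages and eyeball exactly what Scrapy downloaded.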
Scrapy caching is another excellent way to save time while developing your spider. A real-life spider can take several minutes to finish, and I really don't want to wait that long every time I test whether it works. The Scrapy cache saves every HTML page your spider scrapes, so on the next run it fetches data from the saved files instead. This makes testing quicker because Scrapy doesn't have to request the actual pages again.
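Enabling the cache is just a few lines in your project's settings.py. The values below are one reasonable sketch; `HTTPCACHE_*` are Scrapy's built-in HTTP cache settings:

```python
# settings.py — enable Scrapy's built-in HTTP cache for development.
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 0  # 0 means cached responses never expire
HTTPCACHE_DIR = 'httpcache'    # stored under the project's .scrapy directory
HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
```

Remember to disable it (or clear the cache directory) before a production run, or you'll keep scraping stale pages.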
I always use item loaders and input/output processors whenever I can. In my experience, they make the code more readable and modular. In the video I show you the processors I use most. Read more about item loaders and processors here.
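To show what processors actually do to your data, here are simplified stand-ins for Scrapy's `MapCompose` and `TakeFirst` written in plain Python. These are illustrations of the behavior, not Scrapy's real implementations:

```python
# Simplified stand-ins illustrating how Scrapy-style processors behave.

def map_compose(*funcs):
    """Like MapCompose: apply each function to every value, dropping None results."""
    def processor(values):
        for func in funcs:
            next_values = []
            for value in values:
                result = func(value)
                if result is not None:
                    next_values.append(result)
            values = next_values
        return values
    return processor


def take_first(values):
    """Like TakeFirst: return the first non-empty value."""
    for value in values:
        if value is not None and value != '':
            return value


# An input processor cleans every extracted value...
clean = map_compose(str.strip, str.title)
cleaned = clean(['  alice smith ', '', ' bob jones  '])
print(cleaned)              # ['Alice Smith', '', 'Bob Jones']

# ...and an output processor collapses the list down to the final field value.
print(take_first(cleaned))  # Alice Smith
```

This is exactly why loaders keep spiders readable: the cleanup logic lives on the item field, not scattered through your parse callbacks.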