Cache HTML content

By default, caching is turned off. In many scraping jobs, we need to tweak the parsing code and scrape the site again. In that situation, caching the HTML content from the first run means the pages do not have to be downloaded a second time, which is especially helpful for big scrapes.
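
To make the idea concrete, here is a minimal, stdlib-only sketch of an on-disk HTML cache: each URL is hashed to a file name, and a page is fetched only when no cached copy exists. This is an illustration of the general technique, not scrapex's implementation; `HtmlFileCache`, `load` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real HTTP request.

```python
import hashlib
import os
import tempfile


class HtmlFileCache:
    """Hypothetical on-disk cache: one file per URL, named by a hash of the URL."""

    def __init__(self, location):
        self.location = location
        os.makedirs(location, exist_ok=True)

    def _path(self, url):
        # Hash the URL so any URL maps to a safe, fixed-length file name.
        key = hashlib.sha1(url.encode("utf-8")).hexdigest()
        return os.path.join(self.location, key + ".html")

    def get(self, url):
        path = self._path(url)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                return f.read()
        return None

    def set(self, url, html):
        with open(self._path(url), "w", encoding="utf-8") as f:
            f.write(html)


def load(url, fetch, cache):
    """Return the cached HTML if present; otherwise fetch it and store it."""
    html = cache.get(url)
    if html is None:
        html = fetch(url)
        cache.set(url, html)
    return html


calls = []


def fake_fetch(url):
    # Stand-in for a real download; records every "network" call.
    calls.append(url)
    return "<html><body>hello</body></html>"


cache = HtmlFileCache(tempfile.mkdtemp())
load("http://example.com/page", fake_fetch, cache)  # first run: downloaded and cached
load("http://example.com/page", fake_fetch, cache)  # re-run: served from disk
print(len(calls))
```

The second `load` call reads the page from the cache directory instead of calling `fake_fetch` again, which is exactly the saving that matters when you re-run a large scrape after changing only the parsing code.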

Enable cache

>>> import os
>>> from scrapex import Scraper
>>> s = Scraper(use_cache=True)  # cache pages for every request made by this scraper
>>> doc = s.load('')
>>> print(os.listdir(s.cache.location))  # list the files in the cache directory

Disable cache at request level

>>> doc = s.load('', use_cache=False)  # bypass the cache for this request only

Disable cache at scraper level

>>> import os
>>> from scrapex import Scraper
>>> s = Scraper(use_cache=False)  # same as the default: no caching for this scraper
>>> doc = s.load('')
>>> print(os.listdir(s.cache.location))
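
The two levels of control can be pictured as a default plus an override: the scraper-level `use_cache` applies to every request, and a request-level `use_cache` argument, when given, takes precedence for that one call. The sketch below illustrates that precedence with a hypothetical `MiniScraper` class and an in-memory dictionary instead of a cache directory; it is not scrapex's code. It also assumes that a request made with `use_cache=False` neither reads from nor writes to the cache, which may differ from scrapex's actual behaviour.

```python
class MiniScraper:
    """Hypothetical illustration of scraper-level vs request-level cache settings."""

    def __init__(self, use_cache=False):
        # Scraper-level default, applied to every request.
        self.use_cache = use_cache
        self.network_requests = 0
        self._store = {}

    def load(self, url, use_cache=None):
        # A request-level value, when given, overrides the scraper-level default.
        effective = self.use_cache if use_cache is None else use_cache
        if effective and url in self._store:
            return self._store[url]
        self.network_requests += 1
        html = f"<html>{url}</html>"  # stand-in for a real download
        if effective:
            self._store[url] = html
        return html


s = MiniScraper(use_cache=True)
s.load("http://example.com")                   # downloaded, then cached
s.load("http://example.com")                   # served from the cache
s.load("http://example.com", use_cache=False)  # request-level override: downloaded again
s.load("http://example.com")                   # cache still used for later requests

t = MiniScraper(use_cache=False)
t.load("http://example.com")
t.load("http://example.com")                   # no cache: downloaded both times
print(s.network_requests, t.network_requests)
```

Treating `None` as "not specified" is what lets a single call opt out while the scraper keeps caching for everything else.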