Extract data from multiple websites with the Web Scraper! The Web Scraper is a tool that pulls data from the URLs you specify, making it easier to collect information from various sites. To get started, follow the setup steps below.
Setup
- Click Settings > Integrations.
- Type "Website (Next-Generation Scraper)" in the search bar and click on it.
- Click Connect to start the setup.
- Enter the URLs of the websites you want to scrape. To add more URLs, click Add Item and enter each additional URL.
- Optional: If you want to scrape more data, check the box next to “Enable Deep Crawling.” This option allows the scraper to collect data from not only the selected page but also all related child pages linked to it. With deep crawling enabled, you can access data from multiple layers of linked pages. If you disable this option, the scraper will only collect data from the initial page.
- After adding your URLs, click Save. The Web Scraper will reindex automatically every 24 hours to keep the content up-to-date.
Web Scraper Capabilities
Capabilities | Limits |
Maximum number of URLs a single Forethought user can add | Unlimited |
Maximum depth the web scraper will drill down for each added URL (when deep crawling is enabled) | Unlimited |
Maximum depth the web scraper will drill down for each added URL (when deep crawling is disabled) | 1 |
Maximum number of pages each URL can scan | Unlimited |
Total maximum number of pages that one Forethought account can scan | Unlimited |
Frequently Asked Questions (FAQs)
Q: What happens when you disable deep crawling?
A: The scraper collects data only from the initial page. It does not crawl deeper into linked child pages.
Q: How many URLs can you add manually?
A: There is no limit on the number of URLs you can add to the next-gen scraper.
Q: What’s the maximum number of URLs it can index?
A: There's no maximum number.
Q: What does depth mean in web scraping?
A: Depth in web scraping refers to how many levels deep the scraper will navigate through a website's structure. For example, if you set the depth to 3, the scraper will perform the following steps:
- First Level: It starts at the main URL you provide.
- Second Level: From the first page, it collects links to other pages and scrapes data from those linked pages.
- Third Level: It then takes links from the second-level pages and scrapes data from those pages as well.
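To make the levels above concrete, here is a minimal sketch of a depth-limited crawl in Python. It is an illustration only and not how the Forethought scraper is implemented: the `crawl` function, the `max_depth` parameter, and the `SITE` link map are all hypothetical, and the link map stands in for fetching and parsing real pages.

```python
from collections import deque


def crawl(start_url, links, max_depth):
    """Return {url: level} for every page reached within max_depth levels.

    `links` is a hypothetical in-memory map of page -> linked pages.
    It replaces real HTTP requests so the sketch runs without a network.
    """
    levels = {start_url: 1}  # the URL you provide is the first level
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if levels[url] >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for child in links.get(url, []):
            if child not in levels:  # skip pages that were already visited
                levels[child] = levels[url] + 1
                queue.append(child)
    return levels


# Hypothetical site structure, used only for illustration.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/1"],
    "https://example.com/a/1": ["https://example.com/a/1/x"],
}

for page, level in crawl("https://example.com/", SITE, max_depth=3).items():
    print(level, page)
```

With `max_depth=3`, the sketch visits the start page, the two pages it links to, and the one page linked from those, but stops before the fourth-level page. With `max_depth=1`, it visits only the start page, which mirrors the behavior described for deep crawling being disabled.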