Getting Started with Web Scraper

Extract data from multiple websites using Web Scraper! Web Scraper is a tool that pulls data from the URLs you specify, making it easier to collect information from various sites. To get started with the Web Scraper, follow the setup steps below.

Setup

  1. Click Settings > Integrations.
  2. Type "Website (Next-Generation Scraper)" in the search bar and click on it.
  3. Click Connect to start the setup.
  4. Enter the URLs of the websites you want to scrape. To add more URLs, click Add Item and enter each additional URL.
  5. Optional: If you want to scrape more data, check the box next to “Enable Deep Crawling.” This option allows the scraper to collect data from not only the selected page but also all related child pages linked to it.

    With deep crawling enabled, you can access data from multiple layers of linked pages. If you disable this option, the scraper will only collect data from the initial page.

  6. After adding your URLs, click Save. The Web Scraper reindexes automatically every 24 hours to keep the content up to date.

Web Scraper Capabilities

Capabilities | Limits
Maximum number of URLs a single Forethought user can add | Unlimited
Maximum depth the web scraper will drill down for each added URL (when deep crawling is enabled) | Unlimited
Maximum depth the web scraper will drill down for each added URL (when deep crawling is disabled) | 1
Maximum number of pages each URL can scan | Unlimited
Total maximum number of pages that one Forethought account can scan | Unlimited

Frequently Asked Questions (FAQs)

Q: What happens when you disable deep crawling?

A: The scraper will only collect data from the initial page. It won’t crawl deeper into linked child pages.

Q: How many URLs can you add manually?

A: There is no limit on the number of URLs you can add to the next-gen scraper.

Q: What’s the maximum number of URLs it can index?

A: There's no maximum number.

Q: What does depth mean in web scraping?

A: Depth in web scraping refers to how many levels deep the scraper will navigate through a website's structure. For example, if you set the depth to 3, the scraper will perform the following steps:

  1. First Level: It starts at the main URL you provide.
  2. Second Level: From the first page, it collects links to other pages and scrapes data from those linked pages.
  3. Third Level: It then takes links from the second-level pages and scrapes data from those pages as well.
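
The three levels above can be sketched as a small breadth-first crawl. This is only an illustration of the idea of depth, not how the Web Scraper is implemented internally. The `crawl` function, the `get_links` helper, and the `SITE` link map are all hypothetical names invented for this example, and the pages are an in-memory stand-in for a real website.

```python
from collections import deque

def crawl(start_url, get_links, max_depth=None):
    """Breadth-first crawl. Returns {url: level}, where level 1 is the start URL.

    max_depth=None means no depth limit; max_depth=1 visits only the start URL.
    """
    levels = {start_url: 1}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        level = levels[url]
        if max_depth is not None and level >= max_depth:
            continue  # depth limit reached: do not follow links from this page
        for link in get_links(url):
            if link not in levels:  # skip pages that were already discovered
                levels[link] = level + 1
                queue.append(link)
    return levels

# Hypothetical site structure: each page maps to the pages it links to.
SITE = {
    "https://example.com/": ["https://example.com/docs", "https://example.com/blog"],
    "https://example.com/docs": ["https://example.com/docs/setup", "https://example.com/"],
    "https://example.com/blog": ["https://example.com/blog/post-1"],
    "https://example.com/docs/setup": ["https://example.com/docs/setup/advanced"],
}

def get_links(url):
    # Stand-in for fetching a page and extracting its links.
    return SITE.get(url, [])

pages = crawl("https://example.com/", get_links, max_depth=3)
for url, level in pages.items():
    print(level, url)
```

In this sketch, a depth of 3 reaches the start page, the two pages it links to, and the pages those link to, but not `https://example.com/docs/setup/advanced`, which sits one level deeper. Passing `max_depth=1` mirrors the behavior described for disabled deep crawling, where only the initial page is collected.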

Support

  • Need help?

    Click here to submit a support request. We are here to assist you.

  • Business hours

    Monday to Friday 8am - 5pm PST excluding US holidays

  • Contact us

    support@forethought.ai