Expand Your Chatbot's Knowledge with the Web Scraper

Do you want to enrich your chatbot's knowledge with information from your website? In this article, you'll learn how to do this effortlessly using the Web Scraper.

In addition to adding your domain knowledge, you might want to incorporate more information into your chatbot. The Web Scraper helps by "scraping" and integrating all relevant information from specific websites into your chatbot's knowledge. This eliminates the need for manual data entry and simplifies the maintenance of your chatbot.

In this article, you'll learn more about what exactly a Web Scraper is and how it works (technically).
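As a rough illustration of what "scraping" means in general, the sketch below pulls the visible text out of an HTML page using only Python's standard library. It is not the Web Scraper's actual implementation, which this article does not describe; the class name `VisibleTextExtractor`, the function `extract_text`, and the sample HTML are all made up for this example.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text that is not inside <script> or <style> elements."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as a single string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page content, used only to demonstrate the function.
sample_html = """
<html><head><title>Pricing</title><style>p { color: red; }</style></head>
<body><h1>Plans</h1><p>Start for free.</p><script>var x = 1;</script></body></html>
"""
print(extract_text(sample_html))
```

The point of the sketch is only that a scraper keeps the readable text of a page and discards markup, styling, and scripts, so that the text can then be used as knowledge.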

 

Check out this video:

1. Accessing the Web Scraper

To get started, create a new chatbot or open an existing one where you want to include information from a specific website. Go to 'Sources' to find the Web Scraper.

2. Adding the website

Once in the Web Scraper environment, you can easily add and manage your website. Enter the URL of the website from which you want to extract information and click on 'Start Scraping'.

If your website has a sitemap, we recommend adding it as a URL. This article provides more information on what a sitemap is and how to add it.
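To get a feel for what a sitemap contains before adding it, the sketch below lists the page URLs in a sitemap that follows the sitemaps.org XML format. It assumes you already have the sitemap's XML as a string; the function name `list_sitemap_urls` and the `www.example.com` URLs are placeholders, and nothing here reflects how the Web Scraper itself reads sitemaps.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return the page URLs listed in the <loc> elements of a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content, used only to demonstrate the function.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/pricing</loc></url>
</urlset>"""

for url in list_sitemap_urls(sample_sitemap):
    print(url)
```

If the list is much shorter than the number of pages you expect on your site, the sitemap may be incomplete.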

The scraping status and the time of the last scrape are displayed.

For large websites, it may take some time to scrape the entire site.

    Note: If scraping completes very quickly (within a few seconds), this may indicate that only a very small part of the website was scraped, because the website is either not scrapeable or difficult to scrape. Feel free to contact our support team to discuss whether and how the website can be scraped.

    Please also read this article regarding scraping the sitemap of your website. 

    Scraping a page / single link

    Scraping a single link is not possible. The scraper uses the entire domain. For example, it is not possible to scrape www.watermelon.ai/pricing, but it is possible to scrape www.watermelon.ai. 
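The difference between a page URL and its domain can be shown with a short sketch. It uses Python's standard `urllib.parse` module to reduce a URL to its host name; the helper name `domain_of` is made up for this example, and the sketch says nothing about how the Web Scraper processes the URL you enter.

```python
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the host name of a URL, or an empty string if none is found."""
    # urlsplit only recognises the host when a scheme or "//" is present,
    # so prepend "//" for inputs such as "www.watermelon.ai/pricing".
    if "//" not in url:
        url = "//" + url
    return urlsplit(url).hostname or ""


print(domain_of("www.watermelon.ai/pricing"))
print(domain_of("https://www.watermelon.ai/pricing"))
```

Both calls yield `www.watermelon.ai`, which is the form of URL the scraper works with.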

    If you wish to scrape a single page, it is best to convert that page into a PDF document and then add it via the Document Scraper (see this article). To convert a page into a PDF, navigate to the page and save it as a PDF (via the right-click menu).

     

    3. Scrape again

    When the content on the website changes, click 'Scrape' again to have the website rescraped.

    4. Removing a scraped URL

    If you want to remove a specific URL from the Web Scraper, simply click on the designated button next to the URL.

    Note: Removing a URL will also erase all the knowledge the chatbot acquired from that specific website.

     

    5. Testing the chatbot with website knowledge

    After scraping is complete, test your chatbot with the acquired knowledge from the scraped website. This can be done in the interactive tester.

    Important: If the information on your website conflicts with manually added instructions, the chatbot will use both sources interchangeably, which can result in different answers to the same question.

    If the Web Scraper results are not as desired, please feel free to contact support at support@watermelon.ai. We will be happy to help you!