Add Sitemaps to the Web Scraper

In this article, you will learn what a sitemap is and how to add it to the Web Scraper.

What is a sitemap?

A sitemap is a kind of map of your website. It lists which pages exist and how they relate to each other. A sitemap helps the Web Scraper understand the structure of your website, which leads to better responses from the chatbot.

The URL of the sitemap

We recommend adding the URL of the sitemap instead of the general URL of your website. For most websites, the sitemap URL is the website's URL followed by /sitemap.xml. For example, the sitemap for Watermelon's website is https://watermelon.ai/sitemap.xml.
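As a minimal sketch of this convention, the following Python snippet builds the usual sitemap URL from any page URL on a website. The function name default_sitemap_url is only illustrative, and the result is a guess at where the sitemap usually lives, not a guarantee that one exists at that address.

```python
from urllib.parse import urlsplit, urlunsplit


def default_sitemap_url(site_url: str) -> str:
    """Return the conventional sitemap location for a website URL.

    Assumption: the sitemap sits at /sitemap.xml in the site root,
    which is common but not guaranteed.
    """
    parts = urlsplit(site_url)
    # Keep only the scheme and host; drop any path, query, or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/sitemap.xml", "", ""))


print(default_sitemap_url("https://watermelon.ai"))
# prints: https://watermelon.ai/sitemap.xml
```

Opening the resulting URL in a browser is a quick way to check whether a sitemap is actually there.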

If you can't find your sitemap at this URL, check with your website builder whether a sitemap exists and, if so, at which URL it can be accessed.

Sitemap index

Your website may also have a sitemap index. This is a kind of table of contents that points to the sitemaps below it. It tells crawlers where all the other sitemaps are.

You can recognize this by the following:

[Screenshot: example of a sitemap index]

The Web Scraper cannot scrape a sitemap index itself. If your website has a sitemap index, you can choose up to 3 URLs from that index to be scraped. Choose the most relevant URLs, such as the ones for product pages.
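A sitemap index can also be recognized from its XML: in the sitemaps.org protocol, the root element of an index is sitemapindex, while a regular sitemap uses urlset. The sketch below, using only the Python standard library, checks the root element and lists the sitemaps an index points to, so you can pick up to 3 of them. The example.com URLs and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical sitemap index, used only to demonstrate the functions.
EXAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-products.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-blog.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-authors.xml</loc></sitemap>
</sitemapindex>"""


def is_sitemap_index(xml_text: str) -> bool:
    """Return True if the XML document is a sitemap index."""
    root = ET.fromstring(xml_text)
    return root.tag == f"{SITEMAP_NS}sitemapindex"


def list_child_sitemaps(xml_text: str) -> list[str]:
    """Return the sitemap URLs listed in a sitemap index."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


if is_sitemap_index(EXAMPLE_INDEX):
    # Print every candidate; choosing the most relevant 3 is a manual step.
    for url in list_child_sitemaps(EXAMPLE_INDEX):
        print(url)
```

The script only lists the candidates. Which ones are most relevant, for example a sitemap of product pages, is still your own judgment.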

No sitemap?

If your website doesn't have a sitemap, we recommend creating one. Your website builder can do this, but there are also online tools that can help. One tool we recommend is XML-Sitemaps, which lets you crawl up to 500 pages for free.

Enter your website's URL and click 'Start'. The tool will now crawl your website. Once this is done, click 'View sitemap details' and then 'View full XML sitemap'. Copy the URL that opens, up to and including "/sitemap.xml" (remove "?view=1"), and add it to the Web Scraper.
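If you prefer not to trim the URL by hand, the same clean-up can be sketched in a few lines of Python. The URL below is a made-up placeholder, not the real address the tool generates, and strip_query is just an illustrative name.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Remove the query string (such as ?view=1) and any fragment from a URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical URL standing in for the one the sitemap tool opens.
print(strip_query("https://example.com/generated/sitemap.xml?view=1"))
# prints: https://example.com/generated/sitemap.xml
```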

Note: The sitemap created with this tool is only available for one week. Once scraped, the knowledge is added to your chatbot. However, if you want to scrape again after a week, you must create a new sitemap as described above and replace the 'old' sitemap in the Web Scraper.

If the Web Scraper results are not as desired, please feel free to contact support at support@watermelon.ai. We will be happy to help you!