ansys.tools.meilisearch.scraper

Module for scraping web pages.

Classes

WebScraper: Provides for scraping web pages and checking if responses are successful.

Functions

get_temp_file_name: Get the name of a temporary file with the given extension (.txt by default).

ansys.tools.meilisearch.scraper.get_temp_file_name(ext='.txt'): Get the name of a temporary file with the given extension (.txt by default).
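This reference does not say how the file name is produced. As an illustration only, a helper with the same signature could be built on the standard library's `tempfile` module. `sketch_temp_file_name` is a hypothetical stand-in, not the library's implementation, and it assumes the caller is responsible for deleting the file.

```python
import os
import tempfile


def sketch_temp_file_name(ext: str = ".txt") -> str:
    """Hypothetical stand-in for get_temp_file_name; not the library's code.

    Creates an empty temporary file whose name ends with `ext` and returns
    its path. The file is kept on disk, so the caller must remove it.
    """
    # delete=False keeps the file after the handle is closed on exiting `with`.
    with tempfile.NamedTemporaryFile(suffix=ext, delete=False) as handle:
        return handle.name
```

For example, `sketch_temp_file_name(".json")` returns a path ending in `.json`; call `os.remove()` on it when done.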

class ansys.tools.meilisearch.scraper.WebScraper(meilisearch_host_url=None, meilisearch_api_key=None)

Provides for scraping web pages and checking if responses are successful.

Parameters:

meilisearch_host_url (str or None, default: None): URL of the Meilisearch host.
meilisearch_api_key (str or None, default: None): Admin API key for the Meilisearch host.
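The class summary mentions checking whether responses are successful but does not define success. A common convention, assumed here, is to treat any HTTP 2xx status code as a success. The helper `is_successful_status` is hypothetical and is not part of this module; it only illustrates that kind of check.

```python
def is_successful_status(status_code: int) -> bool:
    """Hypothetical helper: True for 2xx HTTP status codes (assumed convention)."""
    # 2xx covers 200 OK, 201 Created, 202 Accepted, 204 No Content, and so on.
    return 200 <= status_code < 300
```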

scrape_url(url, index_uid, template=None, stop_urls=None, verbose=False)

Scrape the web page at a URL using the active Meilisearch host.

This method generates a single unique name for a single URL.

Parameters:

url (str): URL of the web page to scrape.
index_uid (str): Unique name of the Meilisearch index.
template (str or None, default: None): Template file for rendering.
verbose (bool, default: False): Whether to print the output from scraping the URL.
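Because `index_uid` names a Meilisearch index, it has to satisfy Meilisearch's naming rules. According to the Meilisearch documentation (confirm against the server version you run), an index UID may contain only ASCII letters, digits, hyphens, and underscores, and is limited to 400 bytes. The pre-check below is a hypothetical helper, `is_valid_index_uid`, that a caller might run before calling `scrape_url`; it is not part of this module.

```python
import re

# Assumed Meilisearch rules: only [A-Za-z0-9_-], at most 400 bytes.
_INDEX_UID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")
_MAX_INDEX_UID_BYTES = 400


def is_valid_index_uid(index_uid: str) -> bool:
    """Hypothetical pre-check for the index_uid argument of scrape_url."""
    return (
        _INDEX_UID_PATTERN.fullmatch(index_uid) is not None
        and len(index_uid.encode("utf-8")) <= _MAX_INDEX_UID_BYTES
    )
```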

Returns:

scrape_from_directory(path, template=None, verbose=False)

Scrape the URLs for all web pages in a directory using the active Meilisearch host.

This method generates a unique index identifier for each URL in the directory.

Parameters:

path (str): Path to the directory containing the URLs to scrape.
verbose (bool, default: False): Whether to print the output from scraping the URLs.
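The rule this method uses to generate a unique index identifier for each URL is not documented here. One possible scheme, shown purely as an assumption, is to turn the URL's host and path into a slug of allowed characters and append a short hash of the full URL so that URLs with the same slug still get distinct identifiers. `index_uid_from_url` is hypothetical and is not the library's naming rule.

```python
import hashlib
import re
from urllib.parse import urlparse


def index_uid_from_url(url: str, max_len: int = 400) -> str:
    """Hypothetical scheme: derive a Meilisearch-safe index UID from a URL.

    Not the library's actual rule; it only illustrates producing a valid,
    deterministic, per-URL identifier.
    """
    parsed = urlparse(url)
    # Collapse every run of characters outside [A-Za-z0-9_-] into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "-", parsed.netloc + parsed.path).strip("-")
    # A short hash of the full URL keeps UIDs distinct when slugs collide.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    # Leave room for "-" plus the digest so the result stays within max_len.
    base = slug[: max_len - len(digest) - 1] or "index"
    return f"{base}-{digest}"
```

For `https://docs.pyansys.com/version/stable/`, the result starts with `docs-pyansys-com-version-stable-` followed by eight hexadecimal characters.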

Returns:

dict: Dictionary whose keys are the unique index IDs and whose values are the number of hits for each URL.
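Since the return value maps index IDs to hit counts, a caller can use it to spot URLs whose scrape produced nothing. The sketch below separates populated indexes from empty ones. The helper `split_by_hits` is hypothetical, and the sample dictionary is invented data shaped like a `scrape_from_directory` result, not real output.

```python
from __future__ import annotations


def split_by_hits(results: dict[str, int]) -> tuple[list[str], list[str]]:
    """Hypothetical helper: separate index IDs with hits from those without."""
    populated = sorted(uid for uid, hits in results.items() if hits > 0)
    empty = sorted(uid for uid, hits in results.items() if hits == 0)
    return populated, empty


# Invented sample data shaped like a scrape_from_directory() return value.
sample = {"docs-a": 120, "docs-b": 0, "docs-c": 37}
populated, empty = split_by_hits(sample)
print(populated)  # ['docs-a', 'docs-c']
print(empty)  # ['docs-b']
```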

Raises: