ansys.tools.meilisearch.scraper#

Module for scraping web pages.

Classes#

WebScraper

Provides for scraping web pages and checking if responses are successful.

Functions#

get_temp_file_name([ext])

Get the name of a temporary file, which has a .txt extension by default.

Module Contents#

ansys.tools.meilisearch.scraper.get_temp_file_name(ext='.txt')#

Get the name of a temporary file, which has a .txt extension by default.
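
A minimal usage sketch; the returned value is assumed to be the name of a temporary file carrying the requested extension:

>>> from ansys.tools.meilisearch.scraper import get_temp_file_name
>>> temp_file = get_temp_file_name()              # temporary file name with the default .txt extension
>>> temp_json = get_temp_file_name(ext=".json")   # assumed to accept other extensions as well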

class ansys.tools.meilisearch.scraper.WebScraper(meilisearch_host_url=None, meilisearch_api_key=None)#

Bases: ansys.tools.meilisearch.client.BaseClient

Provides for scraping web pages and checking if responses are successful.

Parameters:
meilisearch_host_url : str or None, default: None

URL of the Meilisearch host.

meilisearch_api_key : str or None, default: None

API key (admin) of the Meilisearch host.
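
For example, a construction sketch; the host URL and API key below are placeholders, not real credentials:

>>> from ansys.tools.meilisearch.scraper import WebScraper
>>> scraper = WebScraper(
...     meilisearch_host_url="http://localhost:7700",  # placeholder Meilisearch host
...     meilisearch_api_key="<admin-api-key>",         # placeholder admin API key
... )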

scrape_url(url, index_uid, template=None, stop_urls=None, verbose=False)#

Scrape the web page at a URL using the active Meilisearch host.

This method generates a single unique index name for the given URL.

Parameters:
url : str

URL for the web page to scrape.

index_uid : str

Unique name of the Meilisearch index.

template : str, default: None

Template file for rendering.

verbose : bool, default: False

Whether to print the output from scraping the URL.

Returns:
int

Number of hits from the scraped web page.
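
A usage sketch, assuming scraper is a WebScraper instance constructed as shown above; the URL and index UID are placeholders:

>>> hits = scraper.scrape_url(
...     "https://example.com/docs/index.html",  # placeholder URL of the page to scrape
...     index_uid="example-docs",               # placeholder index UID
...     verbose=True,
... )
>>> print(hits)  # number of hits reported for the page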

scrape_from_directory(path, template=None, verbose=False)#

Scrape the URLs for all web pages in a directory using the active Meilisearch host.

This method generates a unique index identifier for each URL in the directory.

Parameters:
path : str

Path to the directory containing the URLs to scrape.

verbose : bool, default: False

Whether to print the output of scraping the URLs.

Returns:
dict

Dictionary where keys are unique IDs of indexes and values are the number of hits for each URL.

Raises:
FileNotFoundError

If the specified path does not exist.
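
A usage sketch, assuming scraper is a WebScraper instance and that "urls" is a placeholder path to a directory containing the URLs to scrape:

>>> results = scraper.scrape_from_directory("urls", verbose=True)
>>> for index_uid, hit_count in results.items():
...     print(index_uid, hit_count)  # number of hits for each index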