Scrape and upload documents or a website
You use PyMeilisearch to scrape documents or a website and upload the content to Meilisearch. When you start this command-line interface (CLI) tool, you supply the template to use for scraping content, the Meilisearch index that identifies the content, and the format type and location of the source documents.
Note
You must declare two environment variables before using PyMeilisearch:
MEILISEARCH_HOST_URL
: Registry endpoint for Meilisearch.
MEILISEARCH_API_KEY
: API key (admin) for creating indexes in the search registry.
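For example, in a POSIX shell such as bash or zsh you can export both variables and check that neither is empty. This is a minimal sketch: the URL assumes a Meilisearch instance running locally on its default port 7700, and the key is a placeholder that you must replace with your own admin API key.

```shell
#!/bin/sh
# Placeholder values (assumptions): a local Meilisearch instance on its
# default port 7700 and a dummy admin key. Replace both with your own.
export MEILISEARCH_HOST_URL="http://localhost:7700"
export MEILISEARCH_API_KEY="replace-with-your-admin-api-key"

# Stop with an error message if either variable is unset or empty.
: "${MEILISEARCH_HOST_URL:?MEILISEARCH_HOST_URL is not set}"
: "${MEILISEARCH_API_KEY:?MEILISEARCH_API_KEY is not set}"

echo "Using Meilisearch at $MEILISEARCH_HOST_URL"
```

Type these lines directly in your terminal, or source them from a file (for example, . ./meili-env.sh, where the file name is hypothetical). Variables exported by a script that runs in its own process do not persist in your session after the script exits.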
Start PyMeilisearch
To start PyMeilisearch, open a command prompt or terminal and run the pymeilisearch command followed by the desired subcommand and options.
Here is the general syntax for the pymeilisearch command:
$ pymeilisearch <subcommand> [options] [arguments]
Subcommands
The pymeilisearch command supports these subcommands:
upload
: Upload documents or a website to Meilisearch.
version
: Get the current version of PyMeilisearch.
Upload documents or a website
The upload subcommand uploads documents or a website to Meilisearch, creating indexes on the Meilisearch instance.
Here is the general syntax for the upload subcommand:
$ pymeilisearch upload --template <template> --index <index> <source> <location> [options]
This subcommand takes several required arguments and supports additional options that you can use depending on your requirements.
Required arguments
--template <template>
: Name of the template to use or the path to the template file. Available templates are sphinx_pydata and default. The Meilisearch scraper tool, docs-scraper, requires a configuration file to know what content to scrape. For an example of a configuration file, see Set your Config File in the README for this tool’s GitHub repository.
--index <index>
: Name of the Meilisearch index to use for identifying the content.
<source>
: Format type for the documents to upload. It can be html, url, or github.
<location>
: Location of the documents or website to upload.
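As a sketch, the following POSIX shell script assembles an upload command for Sphinx HTML output that has already been built locally, assuming that the html source type takes a local build directory as its location. The index name my-project-docs and the directory doc/_build/html are hypothetical. The RUN variable is a convention of this script only, not a PyMeilisearch feature: by default the script just prints the command it would run.

```shell
#!/bin/sh
# Hypothetical values: replace the index name and build directory with your own.
TEMPLATE="sphinx_pydata"
INDEX="my-project-docs"
SOURCE="html"
LOCATION="doc/_build/html"

# Store the argument list in the positional parameters so that values
# containing spaces are passed through unchanged.
set -- upload --template "$TEMPLATE" --index "$INDEX" "$SOURCE" "$LOCATION"

# Dry run by default: show the command. Set RUN=1 to actually execute it.
echo "pymeilisearch $*"
if [ "${RUN:-0}" = "1" ]; then
    pymeilisearch "$@"
fi
```

Keeping the arguments in "$@" rather than in a single string avoids word-splitting problems if the location path contains spaces.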
Options
--cname <cname>
: CNAME that hosts the documents. Supplying a CNAME is optional but recommended when scraping documents on localhost.
--port <port>
: Localhost port to connect to. The default is 8000.
--orgs <orgs>
: One or more GitHub organizations to scrape public GitHub pages from.
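To pass options, append them after the positional arguments, as in the general syntax for the upload subcommand. The sketch below uses placeholder values: docs.example.com is a hypothetical CNAME, the index name and build directory are hypothetical, and 8000 simply restates the documented default port. It only prints the command unless RUN=1 is set, which is a convention of this script rather than of PyMeilisearch.

```shell
#!/bin/sh
# Hypothetical values: replace with your own host, index, and build directory.
CNAME="docs.example.com"
PORT="8000"   # documented default for --port
INDEX="my-project-docs"
LOCATION="doc/_build/html"

# Positional arguments first, then the optional flags.
set -- upload --template sphinx_pydata --index "$INDEX" html "$LOCATION" \
    --cname "$CNAME" --port "$PORT"

# Dry run unless RUN=1 is set in the environment.
echo "pymeilisearch $*"
if [ "${RUN:-0}" = "1" ]; then
    pymeilisearch "$@"
fi
```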
Get the PyMeilisearch version
The version subcommand displays the version of your PyMeilisearch installation:
$ pymeilisearch version