Crawler and Searchengine

Installation prerequisites

Two services work together: prismeai-crawler and prismeai-searchengine. If you wish to use one of them, you must install the other as well.

They need access to:

  • An Elasticsearch instance, which can be the same one used for the core deployment
  • A Redis instance with the JSON and SEARCH modules. Your existing Redis instance might not include them (check with redis-cli info modules). If needed, the redis/redis-stack-server image includes all required modules and is already packaged as a Helm chart here
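
The module check mentioned above can be scripted. The following is a minimal sketch, assuming the modules report themselves under the names `search` and `ReJSON` in the `info modules` output, as they do in the redis-stack-server image; the function name `has_required_modules` is hypothetical.

```shell
# Reads the output of `redis-cli info modules` on stdin and succeeds
# only if both the search and the JSON module are listed.
# Assumption: the modules are named "search" and "ReJSON".
has_required_modules() {
  modules_info=$(cat)
  printf '%s\n' "$modules_info" | grep -qi 'name=search' &&
    printf '%s\n' "$modules_info" | grep -qi 'name=rejson'
}

# Usage (requires redis-cli and a reachable Redis instance):
# redis-cli -u "$REDIS_URL" info modules | has_required_modules && echo "modules OK"
```

If the function fails, the Redis instance is probably missing one of the two modules.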

Environment variables

  • REDIS_URL=redis://localhost:6379
  • ELASTIC_SEARCH_URL=localhost

ELASTIC_SEARCH_URL can be set to an empty string (''), in which case no webpage content is saved and search is therefore disabled.
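
For a local run, the variables could be exported in the shell before starting the services. This is only a sketch using the example values listed above; how the variables are actually injected depends on your deployment (Helm values, container environment, and so on).

```shell
# Example values for a local setup; adjust to your own deployment.
export REDIS_URL='redis://localhost:6379'
export ELASTIC_SEARCH_URL='localhost'

# To crawl without storing page content (search disabled), use instead:
# export ELASTIC_SEARCH_URL=''
```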

Microservice testing

Once you have configured and started the microservice (following the generic guide), you can verify that everything is in order.

  1. Create a searchengine:
curl --location 'http://localhost:8000/monitor/searchengine/test/test' \
--header 'Content-Type: application/json' \
--data '{
    "websites": [
        "https://docs.eda.prisme.ai/en/workspaces/"
    ]
}'

If successful, the response should contain a complete searchengine object, including an id field.

  2. After a few seconds, look at the crawl history:
curl --location --request GET 'http://localhost:8000/monitor/searchengine/test/test/stats' \
--header 'Content-Type: application/json' \
--data '{
    "urls":  ["http://quotes.toscrape.com"]
}'

The fields metrics.indexed_pages and metrics.pending_requests should be greater than 0, and pages already indexed should appear in crawl_history.

  3. Try a search:

curl --location 'http://localhost:8000/search/test/test' \
--header 'Content-Type: application/json' \
--data '{
    "query": "workspace"
}'

In the response, a results array should list one or more pages of the https://docs.eda.prisme.ai documentation dealing with workspaces.
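
The checks in steps 2 and 3 can also be automated. The sketch below assumes that jq is installed and that the responses are JSON objects exposing the fields named above (metrics.indexed_pages and results); the function names are hypothetical.

```shell
# Succeeds if the stats response on stdin reports at least one indexed page.
stats_show_indexed_pages() {
  jq -e '.metrics.indexed_pages > 0' >/dev/null
}

# Succeeds if the search response on stdin has a non-empty "results" field.
search_has_results() {
  jq -e '(.results | length) > 0' >/dev/null
}

# Usage: add -s to the curl commands from steps 2 and 3 and pipe their output:
# curl -s ... | stats_show_indexed_pages && echo "crawl OK"
# curl -s ... | search_has_results && echo "search OK"
```

Both functions exit with a non-zero status when the expected field is missing or empty, so they can be used directly in a deployment smoke test.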

Congratulations, your service is up and running!