Crawler and Searchengine
Installation prerequisites
These two services work together: prismeai-crawler
and prismeai-searchengine
. If you wish to use one of them, you must install the other as well.
They need access to:
- An Elasticsearch instance; it can be the same as the one used for the core deployment
- A Redis instance; it can also be the same as the one used for the core deployment
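If you do not reuse the instances from the core deployment, you can run standalone ones for testing. The following Docker commands are only an illustrative sketch: the image names, versions and disabled security are assumptions for a local test setup, not Prisme.ai requirements.

# Local Redis and Elasticsearch for testing only (illustrative sketch)
docker run -d --name crawler-redis -p 6379:6379 redis:7
docker run -d --name crawler-elasticsearch -p 9200:9200 \
  -e discovery.type=single-node \
  -e xpack.security.enabled=false \
  docker.elastic.co/elasticsearch/elasticsearch:8.14.0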
Environment variables
- REDIS_URL=redis://localhost:6379
- ELASTIC_SEARCH_URL=localhost
ELASTIC_SEARCH_URL may be set to an empty string '', in which case no webpage content is saved, which disables searches.
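For example, when both services run on the same host as Redis and Elasticsearch, the environment could be set as follows before starting them (a sketch; adapt the values to your deployment):

# Example environment for prismeai-crawler and prismeai-searchengine (illustrative values)
export REDIS_URL=redis://localhost:6379
export ELASTIC_SEARCH_URL=localhost
# export ELASTIC_SEARCH_URL=''    # would disable page storage and search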
Microservice testing
Once you have configured and started the microservices (following the generic guide), you can verify that everything is in order.
- Create a searchengine:
curl --location 'http://localhost:8000/monitor/searchengine/test/test' \
  --header 'Content-Type: application/json' \
  --data '{ "websites": [ "https://docs.eda.prisme.ai/en/workspaces/" ] }'
If successful, a complete searchengine object including an id field should be received.
- After a few seconds, look at the crawl history:
curl --location --request GET 'http://localhost:8000/monitor/searchengine/test/test/stats' \
  --header 'Content-Type: application/json' \
  --data '{ "urls": ["http://quotes.toscrape.com"] }'
The fields metrics.indexed_pages and metrics.pending_requests should be greater than 0, and the pages already indexed should appear in crawl_history.
- Try a search:
curl --location 'http://localhost:8000/search/test/test' \
  --header 'Content-Type: application/json' \
  --data '{ "query": "workspace" }'
In the response, the results array should list one or more pages of the https://docs.eda.prisme.ai documentation dealing with workspaces.
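The three checks above can also be run in one go. The following shell sketch assumes the service listens on localhost:8000 as in the examples, and that jq is installed to extract the fields mentioned at each step:

#!/bin/sh
# Sketch: run the three verification calls in sequence (assumes jq is available)
BASE=http://localhost:8000

# 1. Create the test searchengine and print its id
curl -s --location "$BASE/monitor/searchengine/test/test" \
  --header 'Content-Type: application/json' \
  --data '{ "websites": [ "https://docs.eda.prisme.ai/en/workspaces/" ] }' | jq '.id'

# 2. Wait a little, then inspect the crawl metrics
sleep 10
curl -s --location --request GET "$BASE/monitor/searchengine/test/test/stats" \
  --header 'Content-Type: application/json' \
  --data '{ "urls": ["http://quotes.toscrape.com"] }' | jq '.metrics'

# 3. Search and list the matching pages
curl -s --location "$BASE/search/test/test" \
  --header 'Content-Type: application/json' \
  --data '{ "query": "workspace" }' | jq '.results'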
Congratulations, your service is up and running!