Crawler and Searchengine
Installation prerequisites
These two services work together: prismeai-crawler and prismeai-searchengine. If you wish to use one of them, you must also install the other.

They need access to:
- An ElasticSearch instance; it can be the same as the one used for the core deployment
- A Redis instance; it can also be the same as the one used for the core deployment
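Before going further, you may want to confirm that both backends are reachable from the machine that will run the services. The commands below are only a sketch: they assume Redis answers on its default port 6379 and ElasticSearch on its default port 9200, which may differ in your deployment.

```sh
# Quick reachability checks (sketch; adjust hosts and ports to your deployment).
redis-cli -u redis://localhost:6379 ping    # expected output: PONG
curl --silent http://localhost:9200         # expected output: ElasticSearch cluster info JSON
```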
Environment variables
- REDIS_URL=redis://localhost:6379
- ELASTIC_SEARCH_URL=localhost
ELASTIC_SEARCH_URL may be set to an empty string '', in which case no webpage content is saved, which deactivates searches.
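As a minimal illustration, these variables could be exported in the shell before starting the two services. The start command at the end is only a placeholder and depends on how you deploy (Docker, Kubernetes, local process, ...).

```sh
# Minimal sketch of a local configuration; adjust hosts and ports to your setup.
export REDIS_URL="redis://localhost:6379"

# The same ElasticSearch instance as the core deployment is fine.
# Setting this to '' skips saving webpage content and deactivates searches.
export ELASTIC_SEARCH_URL="localhost"

# Start the services with your usual deployment method (placeholder command):
# docker compose up prismeai-crawler prismeai-searchengine
```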
Microservice testing
Once you have configured and started the microservices (following the generic guide), you can verify that everything is in order with the steps below; a combined verification script is sketched after them.
- Create a searchengine:

  ```sh
  curl --location 'http://localhost:8000/monitor/searchengine/test/test' \
    --header 'Content-Type: application/json' \
    --data '{ "websites": [ "https://docs.eda.prisme.ai/en/workspaces/" ] }'
  ```

  If successful, a complete searchengine object including an id field should be received.
- After a few seconds, look at the crawl history:

  ```sh
  curl --location --request GET 'http://localhost:8000/monitor/searchengine/test/test/stats' \
    --header 'Content-Type: application/json' \
    --data '{ "urls": ["http://quotes.toscrape.com"] }'
  ```

  The fields metrics.indexed_pages and metrics.pending_requests should be greater than 0, and pages already indexed should appear in crawl_history.
- Try a search:

  ```sh
  curl --location 'http://localhost:8000/search/test/test' \
    --header 'Content-Type: application/json' \
    --data '{ "query": "workspace" }'
  ```

  In the answer, a results table should indicate one or more pages of the https://docs.eda.prisme.ai documentation dealing with workspaces.
Congratulations, your service is up and running!
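For reference, the three checks above could be chained into a single script. This is only a sketch: it assumes the API is reachable on http://localhost:8000 as in the examples, and that jq is installed for pretty-printing the JSON responses (jq is not required by the services themselves).

```sh
#!/usr/bin/env bash
# Sketch of an end-to-end check for prismeai-crawler / prismeai-searchengine.
# Assumes the API listens on localhost:8000 and that `jq` is available.
set -euo pipefail

BASE_URL="http://localhost:8000"

# 1. Create a searchengine pointed at the documentation site.
curl --silent --location "$BASE_URL/monitor/searchengine/test/test" \
  --header 'Content-Type: application/json' \
  --data '{ "websites": [ "https://docs.eda.prisme.ai/en/workspaces/" ] }' | jq .

# 2. Give the crawler some time to index a few pages, then check the stats.
sleep 30
curl --silent --location --request GET "$BASE_URL/monitor/searchengine/test/test/stats" \
  --header 'Content-Type: application/json' \
  --data '{ "urls": ["http://quotes.toscrape.com"] }' | jq '.metrics'

# 3. Run a search and list the matching results.
curl --silent --location "$BASE_URL/search/test/test" \
  --header 'Content-Type: application/json' \
  --data '{ "query": "workspace" }' | jq '.results'
```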