This repo is intended for educational purposes. It contains the steps needed to get the OpenSearch Learning to Rank Plugin working locally with a dockerized version of OpenSearch.
The best way to understand LTR is to read the official docs.
Project install:
pyenv local 3.12
poetry env use 3.12
poetry install
Prerequisites:
- Docker
- opensearch-plugin CLI tool
brew install opensearch
NOTE: To use the OpenSearch image with a custom plugin, you must first create a Dockerfile. See
- OpenSearch version: 2.18.0
- LTR Plugin: v2.18.0 (compatible with OS 2.18.0). See plugin release history on GitHub.
Run:
docker compose up
In a new terminal, check that the cluster is up:
curl http://localhost:9200
Then run the indexing script:
./bin/index.sh
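The cluster check can also be scripted. The following is a minimal sketch, assuming the root endpoint returns the usual JSON document with a `version.number` field; the helper names are hypothetical and not part of this repo.

```python
import json
import urllib.request

EXPECTED_VERSION = "2.18.0"  # OpenSearch version stated in this README


def fetch_root(url: str = "http://localhost:9200") -> dict:
    """GET the cluster's root endpoint. Requires the cluster to be running."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def cluster_version(root_response: dict) -> str:
    """Extract the version number from a root endpoint response."""
    return root_response["version"]["number"]


def version_matches(root_response: dict, expected: str = EXPECTED_VERSION) -> bool:
    """True when the running cluster matches the version the plugin targets."""
    return cluster_version(root_response) == expected
```

With the cluster up, `version_matches(fetch_root())` should tell you whether the running image matches the version the LTR plugin was built for.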
Search for a movie to confirm it worked:
curl -X GET "http://localhost:9200/tmdb/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "title": "First"
    }
  }
}'
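The same search can be issued from Python. This is a sketch using only the standard library; `build_match_query` and `search` are hypothetical helpers, not files in this repo. It sends the body with POST, which the `_search` endpoint accepts as well as GET.

```python
import json
import urllib.request


def build_match_query(field: str, text: str) -> dict:
    """Build the same match query body used in the curl command above."""
    return {"query": {"match": {field: text}}}


def search(index: str, body: dict, host: str = "http://localhost:9200") -> dict:
    """Send a search request. Requires the cluster to be running."""
    req = urllib.request.Request(
        f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example, `search("tmdb", build_match_query("title", "First"))` mirrors the curl call.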
Set up the default feature store index:
curl -X PUT "http://localhost:9200/_ltr?pretty=true" -H 'Content-Type: application/json'
Create a feature set called moviefeatureset:
curl -X PUT "http://localhost:9200/_ltr/_featureset/moviefeatureset?pretty=true" -H 'Content-Type: application/json' -d '@featureset.json'
Run curl "http://localhost:9200/_ltr/_featureset?pretty=true" to see the features registered in the featureset.
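The contents of featureset.json are not shown here. As a rough sketch of what a feature set body looks like in the LTR plugin's format, the snippet below builds two mustache-templated match features. The feature names title_query and overview_query come from this README; the `overview` field name and the `keywords` parameter name are assumptions.

```python
import json


def match_feature(name: str, field: str) -> dict:
    """One feature: a match query on `field`, templated on a `keywords` param."""
    return {
        "name": name,
        "params": ["keywords"],
        "template_language": "mustache",
        "template": {"match": {field: "{{keywords}}"}},
    }


def build_featureset(features: list) -> dict:
    """Wrap a list of features in the body expected by the featureset endpoint."""
    return {"featureset": {"features": features}}


featureset = build_featureset([
    match_feature("title_query", "title"),
    match_feature("overview_query", "overview"),  # field name is an assumption
])
print(json.dumps(featureset, indent=2))
```

Only features listed in the feature set are computed at logging time, which is why an unregistered feature never shows up in the logs.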
First, run a Python script to generate an example query with SLTR feature logging:
python save_query.py "First Blood"
Then, run the query:
curl -X GET "http://localhost:9200/tmdb/_search?pretty=true" \
-H 'Content-Type: application/json' \
-d @example-ltr.json
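For orientation, here is a sketch of what an SLTR feature logging query generally looks like. It is not the actual output of save_query.py: the log name, the named query label, and the `keywords` parameter are assumptions. The `sltr` filter names the feature set, and the `ltr_log` extension asks the plugin to log feature values for that named query.

```python
def build_logging_query(
    keywords: str,
    featureset: str = "moviefeatureset",
    log_name: str = "log_entry",              # hypothetical log name
    named_query: str = "logged_featureset",   # hypothetical label linking sltr to the log spec
) -> dict:
    """Build a search body that matches on title and logs feature values."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": keywords}}],
                # The sltr filter does not change scoring; it only marks
                # which feature set to evaluate for each hit.
                "filter": [
                    {
                        "sltr": {
                            "_name": named_query,
                            "featureset": featureset,
                            "params": {"keywords": keywords},
                        }
                    }
                ],
            }
        },
        "ext": {
            "ltr_log": {
                "log_specs": {"name": log_name, "named_query": named_query}
            }
        },
    }


query = build_logging_query("First Blood")
```

The `named_query` value in the log spec must match the `_name` on the `sltr` filter, otherwise nothing is logged.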
Run the feature logging job:
poetry shell
python train/log_features.py
Things to note:
- Logged feature values in "_ltrlog"
- Feature logging score returned for title_query
- Feature log result returned for overview_query with no score. This is because we originally indexed a document without a description field.
- Nothing at all logged for year_released. This is because it was never registered as a feature of interest in the featureset.
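The points above can be illustrated with a small parser for a single search hit. This is a sketch, assuming the logged values appear under `fields._ltrlog` keyed by the log name; the sample hit is made-up illustrative data, not output from this repo.

```python
def logged_features(hit: dict, log_name: str) -> dict:
    """Map feature name -> logged value (None when no score was logged)."""
    for entry in hit.get("fields", {}).get("_ltrlog", []):
        if log_name in entry:
            return {f["name"]: f.get("value") for f in entry[log_name]}
    return {}


# Made-up hit for illustration only: title_query has a score,
# overview_query was logged without one.
sample_hit = {
    "_id": "1",
    "fields": {
        "_ltrlog": [
            {
                "log_entry": [
                    {"name": "title_query", "value": 1.5},
                    {"name": "overview_query"},
                ]
            }
        ]
    },
}

features = logged_features(sample_hit, "log_entry")
```

Here `features` has an entry for overview_query whose value is `None`, and no key at all for year_released, since that feature is not in the feature set.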
Train the model:
poetry shell
python train/train.py
Inspect the model at train/model/model.json.
Upload the trained model to the featureset:
curl -X POST "http://localhost:9200/_ltr/_featureset/moviefeatureset/_createmodel" \
-H 'Content-Type: application/json' \
-d @train/model/os_model.json
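The contents of train/model/os_model.json are not shown here. As a sketch of the general shape the `_createmodel` endpoint expects, the helper below wraps a model definition in a named model body. The model name, the `model/linear` type, and the weights are illustrative assumptions; the repo's training script may produce a different model type.

```python
import json


def build_model_body(name: str, model_type: str, definition) -> dict:
    """Wrap a model definition in the body posted to _createmodel.

    The definition is serialized to a JSON string, following the plugin's
    documented examples.
    """
    return {
        "model": {
            "name": name,
            "model": {
                "type": model_type,
                "definition": json.dumps(definition),
            },
        }
    }


# Hypothetical linear model with made-up weights, for illustration only.
body = build_model_body(
    name="my_model",
    model_type="model/linear",
    definition={"title_query": 0.3, "overview_query": 0.5},
)
```

The name given here is what a later `sltr` rescore query refers to in its `model` field.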
Generate an example rescore query:
poetry shell
python save_query.py "First Blood" --rescore
Then, run the query:
curl -X GET "http://localhost:9200/tmdb/_search?pretty=true" \
-H 'Content-Type: application/json' \
-d @example-ltr-rescore.json
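A rescore query typically runs a cheap base query first and then re-ranks the top hits with the uploaded model. The sketch below shows that general shape; it is not the actual output of save_query.py, and the model name, window size, and `keywords` parameter are assumptions.

```python
def build_rescore_query(keywords: str, model: str, window_size: int = 100) -> dict:
    """Match on title, then rescore the top `window_size` hits with an LTR model."""
    return {
        "query": {"match": {"title": keywords}},
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": keywords},
                        "model": model,  # name used when the model was uploaded
                    }
                }
            },
        },
    }


rescore = build_rescore_query("First Blood", model="my_model")  # "my_model" is hypothetical
```

Keeping `window_size` small limits how many documents the model has to score, at the cost of never promoting hits that the base query ranked below the window.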