OpenSearch Learning-to-Rank POC

This repo is intended for educational purposes. It contains the steps needed to get the OpenSearch Learning to Rank plugin working locally with a dockerized version of OpenSearch.

The best way to understand LTR is to read the official docs (see Resources).

Install

Project install:

pyenv local 3.12
poetry env use 3.12
poetry install

Requirements

Docker
opensearch-plugin CLI tool (brew install opensearch)

1. Set up OpenSearch in Docker

NOTE: To use the OpenSearch image with a custom plugin, you must first create a Dockerfile. See:

Working with plugins (OpenSearch)
Installing (LTR official docs)

Versions used:

OpenSearch version: 2.18.0
LTR plugin v2.18.0 (compatible with OpenSearch 2.18.0). See the plugin release history on GitHub.

Run:

docker compose up
Then, in a new terminal, check that the cluster is up:

curl http://localhost:9200

2. Create index mapping and bulk index data

./bin/index.sh

Search for a movie to confirm it worked:

curl -X GET "http://localhost:9200/tmdb/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match": {
            "title": "First"
        }
    }
}'
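The same check can be scripted. The following is a minimal sketch using only the Python standard library; the helper names (build_match_query, build_search_request) are made up for illustration and are not part of this repo. It assumes the cluster from step 1 is listening on http://localhost:9200 without authentication.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster from step 1


def build_match_query(field, text):
    """Return a query DSL body that matches `text` against `field`."""
    return {"query": {"match": {field: text}}}


def build_search_request(index, body):
    """Build (but do not send) a search request for the given index."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_search?pretty=true",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


request = build_search_request("tmdb", build_match_query("title", "First"))

# To actually send it (requires the cluster to be running):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```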

3. Set up an LTR feature store

Set up the default feature store index:

curl -X PUT "http://localhost:9200/_ltr?pretty=true" -H 'Content-Type: application/json'

Create a feature set called moviefeatureset:

curl -X PUT "http://localhost:9200/_ltr/_featureset/moviefeatureset?pretty=true" -H 'Content-Type: application/json' -d '@featureset.json'

Run curl http://localhost:9200/_ltr/_featureset?pretty=true to see registered features in the featureset.
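For orientation, a featureset body generally has the shape sketched below: a list of named features, each holding a mustache-templated query. This is only an illustration of the format. The actual contents of featureset.json in this repo may differ, and the field names (title, overview) and the keywords parameter are assumptions.

```python
import json


def build_featureset(features, params=("keywords",)):
    """Build a featureset body from a {feature_name: query_template} mapping.

    Each template is ordinary query DSL with mustache placeholders such as
    {{keywords}} that are filled in at query time.
    """
    return {
        "featureset": {
            "features": [
                {
                    "name": name,
                    "params": list(params),
                    "template_language": "mustache",
                    "template": template,
                }
                for name, template in features.items()
            ]
        }
    }


# Assumed feature definitions, for illustration only.
featureset = build_featureset(
    {
        "title_query": {"match": {"title": "{{keywords}}"}},
        "overview_query": {"match": {"overview": "{{keywords}}"}},
    }
)
print(json.dumps(featureset, indent=2))
```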

4. Run a query to get logged features

First, run a Python script to generate an example query with SLTR feature logging:

python save_query.py "First Blood"

Then, run the query:

curl -X GET "http://localhost:9200/tmdb/_search?pretty=true" \
  -H 'Content-Type: application/json' \
  -d @example-ltr.json
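A feature logging query typically combines a normal query, an sltr filter that names the featureset, and an ext.ltr_log section that asks for the feature values to be returned. The sketch below shows that general shape. It is not the code in save_query.py, and the function name, the log name log_entry1, and the keywords parameter are assumptions.

```python
import json


def build_logging_query(keywords, featureset="moviefeatureset"):
    """Build a search body that logs feature values for each hit."""
    return {
        "query": {
            "bool": {
                # Retrieves and scores the candidate documents.
                "must": [{"match": {"title": keywords}}],
                # The sltr clause sits in a filter so it does not change scores;
                # _name lets the log spec below refer to it.
                "filter": [
                    {
                        "sltr": {
                            "_name": "logged_featureset",
                            "featureset": featureset,
                            "params": {"keywords": keywords},
                        }
                    }
                ],
            }
        },
        "ext": {
            "ltr_log": {
                "log_specs": {
                    "name": "log_entry1",
                    "named_query": "logged_featureset",
                }
            }
        },
    }


logging_query = build_logging_query("First Blood")
print(json.dumps(logging_query, indent=2))
```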

Run the feature logging job:

poetry shell
python train/log_features.py

Things to note:

Logged feature values appear under "_ltrlog" in each hit.
A feature score is logged for title_query.
An entry is logged for overview_query, but with no score. This is because we originally indexed a document without a description field.
Nothing at all is logged for year_released. This is because it was never registered as a feature of interest in the featureset.
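The points above can be seen by pulling the logged values out of a single hit. This is a sketch, not the code in train/log_features.py. The sample hit is hand-written to mirror the response shape (the score is made up), and the log name log_entry1 is an assumption that must match the name used in the logging query.

```python
def extract_logged_features(hit, log_name="log_entry1"):
    """Return {feature_name: value or None} for one search hit.

    A feature that matched nothing is logged with a name but no value,
    so it maps to None here.
    """
    for entry in hit.get("fields", {}).get("_ltrlog", []):
        if log_name in entry:
            return {f["name"]: f.get("value") for f in entry[log_name]}
    return {}


# Hand-written sample hit; values are illustrative, not real output.
sample_hit = {
    "_id": "1368",
    "fields": {
        "_ltrlog": [
            {
                "log_entry1": [
                    {"name": "title_query", "value": 9.5},
                    {"name": "overview_query"},
                ]
            }
        ]
    },
}

features = extract_logged_features(sample_hit)
print(features)
```

Here title_query has a value, overview_query is present but maps to None, and year_released does not appear at all.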

5. Train an XGBoost model

poetry shell
python train/train.py

Inspect the model at train/model/model.json.
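Training a ranking model needs the logged features paired with relevance grades and grouped by query. One common text representation is the LibSVM-style row with a qid field. The sketch below shows that conversion. It is an assumption about the data preparation, not a description of what train/train.py does, and the grade and feature values are made up.

```python
def to_ranking_row(grade, qid, features, feature_order):
    """Format one judged document as 'grade qid:N 1:v1 2:v2 ...'.

    Features with no logged value are written as 0.0, which is one
    possible choice for handling missing values.
    """
    parts = [str(grade), f"qid:{qid}"]
    for index, name in enumerate(feature_order, start=1):
        value = features.get(name)
        parts.append(f"{index}:{0.0 if value is None else value}")
    return " ".join(parts)


feature_order = ["title_query", "overview_query"]
row = to_ranking_row(
    grade=4,
    qid=1,
    features={"title_query": 9.5, "overview_query": None},
    feature_order=feature_order,
)
print(row)
```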

6. Upload the model to OpenSearch

curl -X POST "http://localhost:9200/_ltr/_featureset/moviefeatureset/_createmodel" \
  -H 'Content-Type: application/json' \
  -d @train/model/os_model.json
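The upload body wraps the trained model in an envelope that names it and declares its type, with the model definition embedded as a JSON string. The sketch below illustrates that envelope for an XGBoost model. The model name and the single-tree dump are invented for illustration, and the real os_model.json in this repo may be produced differently.

```python
import json


def build_model_payload(name, xgboost_trees):
    """Wrap an XGBoost JSON tree dump in the body expected by _createmodel."""
    return {
        "model": {
            "name": name,
            "model": {
                "type": "model/xgboost+json",
                # The definition is the tree dump serialized to a string.
                "definition": json.dumps(xgboost_trees),
            },
        }
    }


# Made-up one-tree dump that splits on the title_query feature.
example_trees = [
    {
        "nodeid": 0,
        "depth": 0,
        "split": "title_query",
        "split_condition": 5.0,
        "yes": 1,
        "no": 2,
        "missing": 1,
        "children": [
            {"nodeid": 1, "leaf": -0.5},
            {"nodeid": 2, "leaf": 0.5},
        ],
    }
]

payload = build_model_payload("my_xgboost_model", example_trees)
print(json.dumps(payload, indent=2))
```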

7. Run a query with the model

poetry shell
python save_query.py "First Blood" --rescore

Then, run the query:

curl -X GET "http://localhost:9200/tmdb/_search?pretty=true" \
  -H 'Content-Type: application/json' \
  -d @example-ltr-rescore.json
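A rescore query runs a cheap base query first, then applies the LTR model only to the top window_size hits. The sketch below shows the general shape of such a body. It is not the output of save_query.py, and the model name, window size, and keywords parameter are assumptions.

```python
import json


def build_rescore_query(keywords, model_name, window_size=100):
    """Build a search body that rescores the top hits with an LTR model."""
    return {
        # Base query: selects and initially ranks candidates.
        "query": {"match": {"title": keywords}},
        "rescore": {
            # Only this many top hits are passed through the model.
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": keywords},
                        "model": model_name,
                    }
                }
            },
        },
    }


rescore_query = build_rescore_query("First Blood", "my_xgboost_model")
print(json.dumps(rescore_query, indent=2))
```

Keeping the window small limits how many documents the model has to score per request, at the cost of never promoting documents the base query ranked below the window.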

Resources

Elasticsearch Learning to Rank: the documentation
Learning to Rank for Amazon OpenSearch Service (AWS)
Working with plugins (OpenSearch)
Example LTR judgement list for movies
