OpenSearch Learning-to-Rank POC

This repo is intended for educational purposes. It contains the steps needed to get the OpenSearch Learning to Rank plugin working locally with a dockerized version of OpenSearch.

The best way to understand LTR is to read the official docs here.

Install

Project install:

pyenv local 3.12
poetry env use 3.12
poetry install

Requirements

1. Set up OpenSearch in Docker

NOTE: To use the OpenSearch image with a custom plugin, you must first create a Dockerfile. See

Run:

  • docker compose up
  • curl http://localhost:9200 in a new terminal to check the cluster

2. Create index mapping and bulk index data

./bin/index.sh

Search for a movie to confirm it worked:

curl -X GET "http://localhost:9200/tmdb/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match": {
            "title": "First"
        }
    }
}'

3. Set up an LTR feature store

Initialize the default feature store index:

curl -X PUT "http://localhost:9200/_ltr?pretty=true" -H 'Content-Type: application/json'

Create a feature set called moviefeatureset:

curl -X PUT "http://localhost:9200/_ltr/_featureset/moviefeatureset?pretty=true" -H 'Content-Type: application/json' -d '@featureset.json'

Run curl http://localhost:9200/_ltr/_featureset?pretty=true to see registered features in the featureset.
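The contents of featureset.json are not shown in this README. As a rough sketch, assuming the plugin's documented feature set format (a list of named, mustache-templated queries), a request body for moviefeatureset could be generated as follows. The feature names title_query and overview_query come from the notes in step 4; the keywords parameter and the title and overview field names are assumptions, not taken from the repo.

```python
import json


def build_featureset(features):
    """Wrap (feature_name, field) pairs in an LTR feature set request body.

    Each feature is a mustache-templated match query against one field.
    The "keywords" parameter name is an assumption for illustration.
    """
    return {
        "featureset": {
            "features": [
                {
                    "name": name,
                    "params": ["keywords"],
                    "template_language": "mustache",
                    "template": {"match": {field: "{{keywords}}"}},
                }
                for name, field in features
            ]
        }
    }


if __name__ == "__main__":
    # Field names here are hypothetical; adjust them to the tmdb mapping.
    body = build_featureset(
        [("title_query", "title"), ("overview_query", "overview")]
    )
    print(json.dumps(body, indent=2))
```

Redirecting the printed JSON to a file gives something that can be passed to the curl command above with -d '@featureset.json'.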

4. Run a query to get logged features

First, run a Python script to generate an example query with SLTR feature logging:

python save_query.py "First Blood"

Then, run the query:

curl -X GET "http://localhost:9200/tmdb/_search?pretty=true" \
  -H 'Content-Type: application/json' \
  -d @example-ltr.json
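The exact body that save_query.py writes to example-ltr.json is not shown here. The sketch below shows what an SLTR logging query generally looks like under the plugin's documented format: a named sltr filter that points at the feature set, plus an ext.ltr_log section that asks for that named query to be logged. The function name, the log_entry and logged_featureset labels, and the keywords parameter are assumptions for illustration.

```python
import json


def build_logging_query(keywords, featureset="moviefeatureset"):
    """Build a search body that logs feature values for each hit.

    The sltr clause sits in a filter so it does not change the score;
    ext.ltr_log refers to it by its _name.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": keywords}}],
                "filter": [
                    {
                        "sltr": {
                            "_name": "logged_featureset",  # assumed label
                            "featureset": featureset,
                            "params": {"keywords": keywords},  # assumed param
                        }
                    }
                ],
            }
        },
        "ext": {
            "ltr_log": {
                "log_specs": {
                    "name": "log_entry",  # assumed label
                    "named_query": "logged_featureset",
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_logging_query("First Blood"), indent=2))
```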

Run the feature logging job:

poetry shell
python train/log_features.py

Things to note:

  • Logged feature values in "_ltrlog"
  • Feature logging score returned for title_query
  • A feature log entry is returned for overview_query, but with no score. This is because we originally indexed a document without a description field.
  • Nothing at all logged for year_released. This is because it was never registered as a feature of interest in the featureset.
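The notes above can be made concrete with a small helper that flattens the "_ltrlog" field of a hit into a feature dictionary. This is a sketch, not the code in train/log_features.py: it assumes the logged entries are a list of {"name": ..., "value": ...} objects under a log name (log_entry here is hypothetical), and that a feature without a value should default to 0.0. The sample hit uses made-up values.

```python
def extract_features(hit, log_name="log_entry", missing=0.0):
    """Return {feature_name: value} for one search hit.

    Features that were logged without a score (for example because the
    document lacks the field) get the `missing` default.
    """
    features = {}
    for entry in hit.get("fields", {}).get("_ltrlog", []):
        for logged in entry.get(log_name, []):
            features[logged["name"]] = logged.get("value", missing)
    return features


if __name__ == "__main__":
    # Illustrative hit: title_query has a score, overview_query does not,
    # and year_released is absent because it is not in the feature set.
    sample_hit = {
        "fields": {
            "_ltrlog": [
                {
                    "log_entry": [
                        {"name": "title_query", "value": 9.5},
                        {"name": "overview_query"},
                    ]
                }
            ]
        }
    }
    print(extract_features(sample_hit))
    # {'title_query': 9.5, 'overview_query': 0.0}
```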

5. Train an XGBoost model

poetry shell
python train/train.py

Inspect the model at train/model/model.json.

6. Upload the model to OpenSearch

curl -X POST "http://localhost:9200/_ltr/_featureset/moviefeatureset/_createmodel" \
  -H 'Content-Type: application/json' \
  -d @train/model/os_model.json
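How train/model/os_model.json is produced is not shown here. As a sketch, assuming the plugin's documented upload format for XGBoost models (a model name, the type model/xgboost+json, and the tree definition serialized as a string), the payload could be assembled like this. The model name movie_model and the single toy tree are hypothetical.

```python
import json


def build_model_payload(name, trees, model_type="model/xgboost+json"):
    """Wrap a list of tree dicts in the body expected by _createmodel.

    The definition is passed as a JSON string rather than a nested object.
    """
    return {
        "model": {
            "name": name,
            "model": {
                "type": model_type,
                "definition": json.dumps(trees),
            },
        }
    }


if __name__ == "__main__":
    # A toy one-split tree, for illustration only (not a trained model).
    toy_trees = [
        {
            "nodeid": 0,
            "depth": 0,
            "split": "title_query",
            "split_condition": 5.0,
            "yes": 1,
            "no": 2,
            "missing": 1,
            "children": [
                {"nodeid": 1, "leaf": -0.2},
                {"nodeid": 2, "leaf": 0.4},
            ],
        }
    ]
    print(json.dumps(build_model_payload("movie_model", toy_trees), indent=2))
```

The split names in the trees need to match feature names registered in the feature set, otherwise the plugin cannot map splits to features.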

7. Run a query with the model

poetry shell
python save_query.py "First Blood" --rescore

Then, run the query:

curl -X GET "http://localhost:9200/tmdb/_search?pretty=true" \
  -H 'Content-Type: application/json' \
  -d @example-ltr-rescore.json
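The body written to example-ltr-rescore.json is not shown here. A rescoring query generally follows the shape below, assuming the plugin's documented format: a cheap base query retrieves candidates, and an sltr clause inside rescore re-ranks the top window_size hits with the uploaded model. The model name, the keywords parameter, and the window size of 100 are assumptions.

```python
import json


def build_rescore_query(keywords, model, window_size=100):
    """Build a search body that re-ranks the top hits with an LTR model."""
    return {
        # Base query: selects and initially ranks the candidates.
        "query": {"match": {"title": keywords}},
        # Rescore phase: the model scores only the top `window_size` hits.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": keywords},  # assumed param
                        "model": model,
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    # "movie_model" is a hypothetical name; use the name uploaded in step 6.
    print(json.dumps(build_rescore_query("First Blood", "movie_model"), indent=2))
```

Keeping the model in the rescore phase limits the cost of model evaluation to a fixed number of hits per shard.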

Resources