Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Op asset specificity #3841

Merged
merged 24 commits into from
Jan 6, 2025
Merged

Op asset specificity #3841

merged 24 commits into from
Jan 6, 2025

Conversation

jsnoble
Copy link
Member

@jsnoble jsnoble commented Nov 25, 2024

  • jobs will throw if there are name collisions and the op name has no asset identifier
  • can now specify op/reader/apis names as op-name@assetName:v1.4.0 or op-name@assetName

@godber
Copy link
Member

godber commented Nov 27, 2024

When you're back Jared, you'll need to rebase this and bump the version to 2.10.0 since we did a 2.9.0 release today.

@jsnoble
Copy link
Member Author

jsnoble commented Dec 3, 2024

To test this out, setup a running teraslice instance, and use teraslice-cli to upload assets

teraslice-cli assets deploy localhost terascope/[email protected]
teraslice-cli assets deploy localhost terascope/[email protected]

Sample Job that should fail because we don't know which asset to use for the op, please adjust index and any other settings for your setup

{
    "name": "test",
    "lifecycle": "once",
    "workers": 1,
    "analytics": true,
    "assets": [
        "elasticsearch:4.0.2",
        "elasticsearch:4.0.5"
    ],
    "operations": [
        {
            "_op": "elasticsearch_reader",
            "index": "ts_test_example-1000",
            "size": 10000,
            "date_field_name": "created",
            "preserve_id": true
        },
        {
            "_op": "elasticsearch_bulk",
            "index": "op_asset_version_test",
            "preserve_id": true,
            "size": 10000
        }
    ]
}

Sample job that should pass, please make the same updates to index and any other changes neccessary

{
    "name": "test",
    "lifecycle": "once",
    "workers": 1,
    "analytics": true,
    "assets": [
        "elasticsearch:4.0.2",
        "elasticsearch:4.0.5"
    ],
    "operations": [
        {
            "_op": "elasticsearch_reader@elasticsearch:4.0.2",
            "index": "ts_test_example-1000",
            "size": 10000,
            "date_field_name": "created",
            "preserve_id": true
        },
        {
            "_op": "elasticsearch_bulk@elasticsearch:4.0.5",
            "index": "op_asset_version_test",
            "preserve_id": true,
            "size": 10000
        }
    ]
}

@ciorg
Copy link
Member

ciorg commented Dec 3, 2024

Here's my test

job config:

"assets": [
        "common_processors:0.16.0",
        "standard:1.2.0"
    ],
    "operations": [
        {
            "_op": "data_generator",
            "size": 1
        },
        {
            "_op": "filter",
            "field": "ip",
            "value": "0.0.0.0"
        },
        {
            "_op": "noop"
        }
    ]

When I tried to run with this config I got the error Multiple assets containing the same processor name _op: "filter", please specify which specific asset to use on its name. Added 1filter@common_processors:0.16.0 and the job updated and ran.

I also tried the job will multiple filter ops using both assets and it worked as expected. I also tried just adding the asset name with no version, but it requires the version, which makes sense.

@jsnoble
Copy link
Member Author

jsnoble commented Dec 6, 2024

Example of real job that can work with the elasticsearch-assets

{
    "name": "test",
    "lifecycle": "once",
    "workers": 1,
    "analytics": true,
    "assets": [
        "elasticsearch:4.0.2",
        "elasticsearch:4.0.5"
    ],
    "apis": [
        {
            "_name": "elasticsearch_sender_api@elasticsearch:4.0.5",
            "index": "op_asset_version_test",
            "preserve_id": true,
            "size": 10000
        },
        {
            "_name": "elasticsearch_reader_api@elasticsearch:4.0.2",
            "index": "ts_test_example-1000",
            "size": 10000,
            "date_field_name": "created",
            "preserve_id": true
        }
    ],
    "operations": [
        {
            "_op": "elasticsearch_reader@elasticsearch:4.0.2",
            "api_name": "elasticsearch_reader_api@elasticsearch:4.0.2",
            "index": "ts_test_example-1000",
            "size": 10000,
            "date_field_name": "created",
            "preserve_id": true
        },
        {
            "_op": "elasticsearch_bulk@elasticsearch:4.0.5",
            "api_name": "elasticsearch_sender_api@elasticsearch:4.0.5",
            "index": "op_asset_version_test",
            "preserve_id": true,
            "size": 10000
        }
    ]
}

Copy link
Member

@godber godber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider some of my comments inline, I think I found the parseName tests, I didn't expect them included inside another specfile.

Please try and document the extent of the changes you have made in the documentation. What all is now possible? What scenarios are prohibited? What are tags?

@godber
Copy link
Member

godber commented Dec 9, 2024

All in all I feel better about these changes after reading the PR. I think we will play with these changes in the office tomorrow.

Tomorrow when we're in the office, @sotojn and @busma13 , lets pull this down, build and run it, try and test some jobs.

@busma13
Copy link
Contributor

busma13 commented Dec 10, 2024

I was testing this and it looks like using a plain apiName with no @ or using apiName:tag is now broken. I think adjustNamesByMapping() is using an undefined assetIdentifier when trying to look up the asset hash.

Copy link
Contributor

@busma13 busma13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. There were just a few typos, etc I noticed.

docs/jobs/configuration.md Show resolved Hide resolved
e2e/test/cases/assets/simple-spec.ts Show resolved Hide resolved
e2e/test/cases/assets/simple-spec.ts Outdated Show resolved Hide resolved
e2e/test/cases/assets/simple-spec.ts Outdated Show resolved Hide resolved
packages/job-components/src/job-validator.ts Outdated Show resolved Hide resolved
@jsnoble jsnoble merged commit 9d918ce into master Jan 6, 2025
46 checks passed
@jsnoble jsnoble deleted the op-asset-specificity branch January 6, 2025 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants