Op asset specificity #3841

Merged
merged 24 commits into from
Jan 6, 2025
Commits
18cd5f3
load operaitons that are versioned with set assets resolves #3837
jsnoble Nov 20, 2024
b6f3c03
more operations fixes, throw if multiple processors are found
jsnoble Nov 22, 2024
2983a65
fix collisions tests, remove long deprecated terasliceOpPath
jsnoble Nov 25, 2024
3375fb5
add tests for asset versioning with naming collisions
jsnoble Nov 25, 2024
fdf7cfd
fix tests
jsnoble Nov 25, 2024
4dad7d9
remove double await
jsnoble Nov 25, 2024
9e53448
fix testHarness to work with new operation loading
jsnoble Nov 26, 2024
99af228
Merge branch 'master' into op-asset-specificity
jsnoble Nov 26, 2024
eedf446
release: (minor) [email protected]
jsnoble Nov 26, 2024
89890f1
merge master
jsnoble Dec 2, 2024
975b7d4
release: (minor) [email protected]
jsnoble Dec 2, 2024
00be786
make versioning backwards compatible with validateJobs functions on s…
jsnoble Dec 2, 2024
bd3b203
remove duplicate jobValidation checks
jsnoble Dec 3, 2024
1ee5c6b
fix lint issues
jsnoble Dec 3, 2024
2b38e98
fix auto api name creation issue, add more tests
jsnoble Dec 6, 2024
d6603e4
Merge branch 'master' into op-asset-specificity
jsnoble Dec 6, 2024
3b33182
bump: (patch) @terascope/[email protected], @terascope/[email protected]
jsnoble Dec 6, 2024
7029a33
merge master
jsnoble Dec 6, 2024
a41b8dd
fix tests and lint errors
jsnoble Dec 6, 2024
9e6fcab
fix test
jsnoble Dec 6, 2024
d8c6811
merge from master
jsnoble Jan 2, 2025
d0057fb
fix bug with asset apis names and docs
jsnoble Jan 3, 2025
ccac491
fix docs and put check in place when to alter api names
jsnoble Jan 6, 2025
435f3f0
fix docs and typos
jsnoble Jan 6, 2025
157 changes: 156 additions & 1 deletion docs/jobs/configuration.md
@@ -40,7 +40,7 @@ The first operation in the [operations](#operations) list, reads from a particul

| Configuration | Description | Type | Notes |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | -------- |
| `_name` | The `_name` property is required, and it is required to be unqiue but can be suffixed with a identifier by using the format `"example:0"`, anything after the `:` is stripped out when searching for the file or folder. | `String` | required |
| `_name` | The `_name` property is required and must match the name of an api inside the asset. You may create several instances of an api and differentiate them with a tag suffix in the format `"example:someTag"`; set `api_name: "example:someTag"` on an operation to reference a specific instance. | `String` | required |

## Examples

@@ -98,4 +98,159 @@ The first operation in the [operations](#operations) list, reads from a particul
]
}
```
<!--Example with multiple apis-->

```json
{
"name": "test",
"lifecycle": "once",
"workers": 1,
"analytics": true,
"assets": [
"elasticsearch:4.0.2"
],
"apis": [
{
"_name": "elasticsearch_reader_api:foo",
"index": "ts_test_example-1000",
"size": 10000,
"date_field_name": "created",
"preserve_id": true
},
{
"_name": "elasticsearch_reader_api:bar",
"index": "some_other_index",
"size": 20000,
"date_field_name": "_ingest",
"preserve_id": false
}
],
"operations": [
{
"_op": "elasticsearch_reader",
"api_name": "elasticsearch_reader_api:foo",
"index": "ts_test_example-1000",
"size": 10000,
"date_field_name": "created",
"preserve_id": true
},
{
"_op": "example_op",
"api_name": "elasticsearch_reader_api:bar"
}
]
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
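Because each operation refers to an api instance only by its `api_name` string, a typo there is easy to miss. A minimal client-side check, assuming only the rule from the table above (an operation's `api_name` should equal the `_name` of an entry in `apis`), might look like this; `findUnknownApiNames` and the simplified config types are hypothetical and not part of Teraslice:

```typescript
// Simplified, hypothetical shapes for the parts of a job config this check reads.
interface ApiConfig {
    _name: string;
    [key: string]: unknown;
}

interface OpConfig {
    _op: string;
    api_name?: string;
    [key: string]: unknown;
}

interface JobLike {
    apis?: ApiConfig[];
    operations: OpConfig[];
}

// Returns every api_name used by an operation that does not match a declared api _name.
function findUnknownApiNames(job: JobLike): string[] {
    const declared = new Set((job.apis ?? []).map((api) => api._name));
    return job.operations
        .map((op) => op.api_name)
        .filter((name): name is string => name !== undefined && !declared.has(name));
}

const job: JobLike = {
    apis: [
        { _name: 'elasticsearch_reader_api:foo' },
        { _name: 'elasticsearch_reader_api:bar' },
    ],
    operations: [
        { _op: 'elasticsearch_reader', api_name: 'elasticsearch_reader_api:foo' },
        // "baz" was never declared under apis, so this reference is flagged
        { _op: 'example_op', api_name: 'elasticsearch_reader_api:baz' },
    ],
};

console.log(findUnknownApiNames(job)); // [ 'elasticsearch_reader_api:baz' ]
```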

### Operation and api name collisions
When a job uses multiple assets, we check whether any api or operation specified on the job exists in more than one of those assets. If so, we throw an error on job submission, because we cannot tell which asset to load it from.

To choose the asset, append the `@` symbol and a valid asset identifier to the operation or api name. The asset identifier must match the asset exactly as it is listed in the job's `assets` array.
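The collision rule can be sketched as follows. This is not Teraslice's implementation: `findAmbiguousNames`, the `AssetContents` shape, and the asset names `asset_a:1.0.0` and `asset_b:2.0.0` are hypothetical, and the sketch assumes you already know which operation names each listed asset provides.

```typescript
// Hypothetical view of what each listed asset provides, keyed by the
// identifier exactly as it appears in the job's `assets` array.
type AssetContents = Record<string, string[]>;

// Returns the requested names that more than one asset provides and that
// were not pinned to a single asset with an `@` identifier.
function findAmbiguousNames(requested: string[], assets: AssetContents): string[] {
    const ambiguous: string[] = [];
    for (const name of requested) {
        if (name.includes('@')) continue; // already pinned to one asset
        const providers = Object.entries(assets).filter(([, names]) => names.includes(name));
        if (providers.length > 1) ambiguous.push(name);
    }
    return ambiguous;
}

// Made-up asset contents: both assets provide `filter`.
const assets: AssetContents = {
    'asset_a:1.0.0': ['filter', 'noop'],
    'asset_b:2.0.0': ['filter', 'data_generator'],
};

console.log(findAmbiguousNames(['data_generator', 'filter', 'noop'], assets)); // [ 'filter' ]
console.log(findAmbiguousNames(['data_generator', 'filter@asset_b:2.0.0'], assets)); // []
```

Names that already carry an `@` identifier are skipped here; checking that the identifier itself is valid is a separate step.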

## Examples

### Naming conventions

For example, if a job lists an asset as `standard` and you need to pin the operation `filter` to it, the operation name must be:

- `filter@standard`

If you list the asset as `standard:3.2.0`:

- `filter@standard:3.2.0`

If you list the asset by its asset hash `2ab55a02723c304b2b74a7819942b4920e4ee6a9`:

- `filter@2ab55a02723c304b2b74a7819942b4920e4ee6a9`
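The naming convention above amounts to splitting the name at the `@` and comparing the identifier with the job's `assets` entries. The sketch below illustrates that for operation names only; `parseOpName` and `checkIdentifier` are hypothetical helpers, not Teraslice APIs, and they do not cover the extra api tags.

```typescript
interface ParsedName {
    name: string;
    assetIdentifier?: string;
}

// Hypothetical parser: splits "filter@standard:3.2.0" at the first "@" into
// the operation name and the asset identifier.
function parseOpName(fullName: string): ParsedName {
    const at = fullName.indexOf('@');
    if (at === -1) return { name: fullName };
    return { name: fullName.slice(0, at), assetIdentifier: fullName.slice(at + 1) };
}

// Hypothetical check: the identifier must match an entry of the job's
// `assets` array exactly (name, name:version, or asset hash as listed).
function checkIdentifier(fullName: string, jobAssets: string[]): void {
    const { assetIdentifier } = parseOpName(fullName);
    if (assetIdentifier !== undefined && !jobAssets.includes(assetIdentifier)) {
        throw new Error(`${fullName}: "${assetIdentifier}" is not listed in the job's assets`);
    }
}

const jobAssets = ['standard:3.2.0'];
console.log(parseOpName('filter@standard:3.2.0')); // { name: 'filter', assetIdentifier: 'standard:3.2.0' }
checkIdentifier('filter@standard:3.2.0', jobAssets); // passes

try {
    checkIdentifier('filter@standard', jobAssets);
} catch (err) {
    console.log((err as Error).message); // filter@standard: "standard" is not listed in the job's assets
}
```

Splitting at the first `@` keeps any `:` in the identifier (such as a version) intact.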


You can list the same asset more than once in a job, at different versions, as long as each operation and api name identifies the asset version it should be loaded from. The elasticsearch example job below uses two versions of the elasticsearch asset this way.

Api names follow the same convention as operations; the only difference is that apis also allow an additional tag, as described in the [apis](#apis) section:

- `some_api:someAsset:1.1.0:foo`


### Jobs
In this example job, two assets both provide an operation with the same `_op` name, `filter`. The job uses the naming convention above to load `filter` from the `common_processors:0.16.0` asset.

<!--DOCUSAURUS_CODE_TABS-->
<!-- job with an operation name that is shared between the two assets -->
``` json
{
"name": "test filter",
"lifecycle": "once",
"workers": 1,
"analytics": true,
"assets": [
"common_processors:0.16.0",
"standard:1.2.0"
],
"operations": [
{
"_op": "data_generator",
"size": 1
},
{
"_op": "filter@common_processors:0.16.0",
"field": "ip",
"value": "0.0.0.0"
},
{
"_op": "noop"
}
]
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
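When jobs are built in code, for example in tests, the same pinning can be applied to an existing operations list. This mirrors what the e2e spec in this PR does for `drop_property`; `pinOperation` is a hypothetical helper, not a Teraslice API.

```typescript
interface OpConfig {
    _op: string;
    [key: string]: unknown;
}

// Hypothetical helper: returns a copy of `operations` in which every operation
// named `opName` is pinned to `assetIdentifier` using the `name@identifier` form.
// The input array and its objects are left unmodified.
function pinOperation(operations: OpConfig[], opName: string, assetIdentifier: string): OpConfig[] {
    return operations.map((op) => (
        op._op === opName ? { ...op, _op: `${opName}@${assetIdentifier}` } : op
    ));
}

const operations: OpConfig[] = [
    { _op: 'data_generator', size: 1 },
    { _op: 'filter', field: 'ip', value: '0.0.0.0' },
    { _op: 'noop' },
];

console.log(pinOperation(operations, 'filter', 'common_processors:0.16.0')[1]._op);
// filter@common_processors:0.16.0
```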

In this example job, two versions of the elasticsearch asset both provide operations and apis with the same names. The job shows how to select which asset version to use for each api and operation.

<!--DOCUSAURUS_CODE_TABS-->
<!--Elasticsearch job with two different versions sharing both the operation and api -->
``` json
{
"name": "test",
"lifecycle": "once",
"workers": 1,
"analytics": true,
"assets": [
"elasticsearch:4.0.2",
"elasticsearch:4.0.5"
],
"apis": [
{
"_name": "elasticsearch_sender_api@elasticsearch:4.0.5",
"index": "op_asset_version_test",
"preserve_id": true,
"size": 10000
},
{
"_name": "elasticsearch_reader_api@elasticsearch:4.0.2",
"index": "ts_test_example-1000",
"size": 10000,
"date_field_name": "created",
"preserve_id": true
}
],
"operations": [
{
"_op": "elasticsearch_reader@elasticsearch:4.0.2",
"api_name": "elasticsearch_reader_api@elasticsearch:4.0.2",
"index": "ts_test_example-1000",
"size": 10000,
"date_field_name": "created",
"preserve_id": true
},
{
"_op": "elasticsearch_bulk@elasticsearch:4.0.5",
"api_name": "elasticsearch_sender_api@elasticsearch:4.0.5",
"index": "op_asset_version_test",
"preserve_id": true,
"size": 10000
}
]
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
2 changes: 1 addition & 1 deletion e2e/package.json
@@ -43,7 +43,7 @@
"ms": "~2.1.3"
},
"devDependencies": {
"@terascope/types": "~1.3.1",
"@terascope/types": "~1.3.2",
"bunyan": "~1.8.15",
"elasticsearch-store": "~1.6.0",
"fs-extra": "~11.2.0",
73 changes: 71 additions & 2 deletions e2e/test/cases/assets/simple-spec.ts
@@ -1,7 +1,7 @@
import 'jest-extended';
import fs from 'node:fs';
import os from 'os';
import path from 'path';
import os from 'node:os';
import path from 'node:path';
import decompress from 'decompress';
import archiver from 'archiver';
import {
@@ -72,6 +72,24 @@ describe('assets', () => {
{ blocking: true }
);

/*
{
"name": "ex1",
"version": "0.0.1",
"node_version": 18,
"platform": false,
"arch": false
}

{
"name": "ex1",
"version": "0.1.1",
"node_version": 18,
"platform": false,
"arch": false
}

*/
const newerAssetPath = 'test/fixtures/assets/example_asset_1updated.zip';
const fileStream = fs.createReadStream(newerAssetPath);
// the asset on this job already points to 'ex1' so it should use the latest available asset
@@ -134,6 +152,57 @@ describe('assets', () => {

await ex.stop({ blocking: true });
});

it('will throw if there are naming conflicts', async () => {
const jobSpec = terasliceHarness.newJob('generator-asset');
// Set resource constraints on workers within CI
if (TEST_PLATFORM === 'kubernetes' || TEST_PLATFORM === 'kubernetesV2') {
jobSpec.resources_requests_cpu = 0.1;
}
// list both versions of ex1 without pinning any operation to one of them;
// an op found in both versions is ambiguous, so submission should be rejected
jobSpec.assets = ['ex1:0.0.1', 'ex1:0.1.1', 'standard', 'elasticsearch'];

await expect(terasliceHarness.submitAndStart(jobSpec)).rejects.toThrow();
});

it('will not throw if there are naming conflicts but you use asset identifiers', async () => {
const jobSpec = terasliceHarness.newJob('generator-asset');
// Set resource constraints on workers within CI
if (TEST_PLATFORM === 'kubernetes' || TEST_PLATFORM === 'kubernetesV2') {
jobSpec.resources_requests_cpu = 0.1;
}
// list both versions of ex1 again, but pin drop_property to ex1:0.1.1
// with an asset identifier so the job can be submitted
jobSpec.assets = ['ex1:0.0.1', 'ex1:0.1.1', 'standard', 'elasticsearch'];
jobSpec.operations = jobSpec.operations.map((op) => {
if (op._op === 'drop_property') {
return {
...op,
_op: 'drop_property@ex1:0.1.1'
};
}
return op;
});
const { workers } = jobSpec;

const assetResponse = await terasliceHarness.teraslice.assets.getAsset('ex1', '0.0.1');
const assetId = assetResponse[0].id;

const ex = await terasliceHarness.submitAndStart(jobSpec);

const waitResponse = await terasliceHarness.forWorkersJoined(
ex.id(),
workers as number,
25
);
expect(waitResponse).toEqual(workers);

const execution = await ex.config();
expect(execution.assets[0]).toEqual(assetId);

await ex.stop({ blocking: true });
});
});

describe('s3 asset storage', () => {
2 changes: 1 addition & 1 deletion packages/data-mate/package.json
@@ -31,7 +31,7 @@
},
"dependencies": {
"@terascope/data-types": "~1.6.0",
"@terascope/types": "~1.3.1",
"@terascope/types": "~1.3.2",
"@terascope/utils": "~1.6.0",
"@types/validator": "~13.12.2",
"awesome-phonenumber": "~7.2.0",
2 changes: 1 addition & 1 deletion packages/data-types/package.json
@@ -27,7 +27,7 @@
"test:watch": "ts-scripts test --watch . --"
},
"dependencies": {
"@terascope/types": "~1.3.1",
"@terascope/types": "~1.3.2",
"@terascope/utils": "~1.6.0",
"graphql": "~16.9.0",
"yargs": "~17.7.2"
2 changes: 1 addition & 1 deletion packages/elasticsearch-api/package.json
@@ -24,7 +24,7 @@
"test:watch": "TEST_RESTRAINED_ELASTICSEARCH='true' ts-scripts test --watch . --"
},
"dependencies": {
"@terascope/types": "~1.3.1",
"@terascope/types": "~1.3.2",
"@terascope/utils": "~1.6.0",
"bluebird": "~3.7.2",
"setimmediate": "~1.0.5"
2 changes: 1 addition & 1 deletion packages/elasticsearch-store/package.json
@@ -32,7 +32,7 @@
"dependencies": {
"@terascope/data-mate": "~1.6.0",
"@terascope/data-types": "~1.6.0",
"@terascope/types": "~1.3.1",
"@terascope/types": "~1.3.2",
"@terascope/utils": "~1.6.0",
"ajv": "~8.17.1",
"ajv-formats": "~3.0.1",
3 changes: 2 additions & 1 deletion packages/job-components/package.json
@@ -32,14 +32,15 @@
"test:watch": "ts-scripts test --watch . --"
},
"dependencies": {
"@terascope/types": "~1.3.1",
"@terascope/types": "~1.3.2",
"@terascope/utils": "~1.6.0",
"convict": "~6.2.4",
"convict-format-with-moment": "~6.2.0",
"convict-format-with-validator": "~6.2.0",
"datemath-parser": "~1.0.6",
"import-meta-resolve": "~4.1.0",
"prom-client": "~15.1.3",
"semver": "~7.6.3",
"uuid": "~11.0.3"
},
"devDependencies": {
2 changes: 1 addition & 1 deletion packages/job-components/src/execution-context/base.ts
@@ -43,8 +43,8 @@ export default class BaseExecutionContext<T extends OperationLifeCycle> {
this.events.on('execution:add-to-lifecycle', this._handlers['execution:add-to-lifecycle']);

const executionConfig = cloneDeep(config.executionConfig);

this._loader = new OperationLoader({
terasliceOpPath: config.terasliceOpPath,
assetPath: config.context.sysconfig.teraslice.assets_directory,
});

@@ -9,7 +9,6 @@ import { APICore, OperationAPIType } from '../operations/index.js';
export interface ExecutionContextConfig {
context: Context;
executionConfig: ExecutionConfig;
terasliceOpPath?: string;
assetIds?: string[];
}
