Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MB-59421: add validation for vector field aliases #1903

Merged
merged 7 commits into from
Dec 14, 2023

Conversation

moshaad7
Copy link
Contributor

@moshaad7 moshaad7 commented Nov 10, 2023

vector field Aliases (fields with same name and type as vector) must have same value for dimensions and similarity, respectively.

mapping/mapping_vectors.go Outdated Show resolved Hide resolved
@Thejas-bhat
Copy link
Member

i believe i follow the changes your trying to make over here. basically, the argument fieldAliasCtx map[string]*FieldMapping is passed all over document mappings that are there and then while validating a vector field, there is a check happening as to whether the same name is present in the map (due to aliasing) and then use the config of that entry to validate the provided similarity value etc. for this field. (please correct me if i'm wrong over here)

there is a cache object being passed in the documentMapping.Validate(cache) [basically the signature before the PR] which is also maintained throughout the parsing of the document mapping. Now, my suggestion over here is if this cache object can be leveraged to do the validations.

roughly speaking, my thinking revolves around possiblity of introducing a new field in the registry.Cache say "VectorCache" which maps field.Name -> &vecFieldConfig{simlarity, dims ... }

and during parsing perhaps what can be done is

if field.Type == "vector" {
	err := cache.ValidateVecConfig(field.Name, {field.Similarity, field.Dims})
	if err != nil {
		return err
	}
}

and inside the cache, you check that if there is an entry corresponding to the field.Name in the VectorCache, and then validate the passed config value with what's there in the cache entry. if the entry is not present already, just set the value.

the main reason why i'm suggesting this is because, i feel like the changes would be isolated to the VectorCache component and avoid change all the function signatures over here.

please let me know what you all think about this.

- If there are vector field Aliases
  (fields with same name and type as vector)
  across defaultMapping and typeMapping, Then,
  their dimensions and similarity value must
  be same
@moshaad7 moshaad7 force-pushed the validate_field_alias branch from 3e6ef06 to 4023370 Compare December 14, 2023 12:54
@moshaad7
Copy link
Contributor Author

i believe i follow the changes your trying to make over here. basically, the argument fieldAliasCtx map[string]*FieldMapping is passed all over document mappings that are there and then while validating a vector field, there is a check happening as to whether the same name is present in the map (due to aliasing) and then use the config of that entry to validate the provided similarity value etc. for this field. (please correct me if i'm wrong over here)

there is a cache object being passed in the documentMapping.Validate(cache) [basically the signature before the PR] which is also maintained throughout the parsing of the document mapping. Now, my suggestion over here is if this cache object can be leveraged to do the validations.

roughly speaking, my thinking revolves around possiblity of introducing a new field in the registry.Cache say "VectorCache" which maps field.Name -> &vecFieldConfig{simlarity, dims ... }

and during parsing perhaps what can be done is

if field.Type == "vector" {
	err := cache.ValidateVecConfig(field.Name, {field.Similarity, field.Dims})
	if err != nil {
		return err
	}
}

and inside the cache, you check that if there is an entry corresponding to the field.Name in the VectorCache, and then validate the passed config value with what's there in the cache entry. if the entry is not present already, just set the value.

the main reason why i'm suggesting this is because, i feel like the changes would be isolated to the VectorCache component and avoid change all the function signatures over here.

please let me know what you all think about this.

@Thejas-bhat ,
It won't be appropriate to use this cache object.
As, this cache is meant to load the relevant analysis pipeline objects.

For example, If in the index mapping, we specify "en" analyzer.
Then, during the Validation phase, while traversing the mapping, we will encounter the use of "en" and we call the AnalyzerNamed method of the cache. This will go to the analyzer registries and construct an instance of type Analyzer which can support "en" analyzer features.

And that way, after the completion of Validation, we will populate our cache with all the relevant analysis related objects (analyzers, tokenizers, .... )

Now,
If you look at theIndexMapping interface

type IndexMapping interface {
        ...
	DateTimeParserNamed(name string) analysis.DateTimeParser
	AnalyzerNamed(name string) analysis.Analyzer
}

During index and Search operations, we will call AnalyzerNamed method of the indexMapping to get the cached Analyzer object.


That's the whole use of this Registry.Cache.

I don't think fieldAliasCtx should go into this, because, we don't want to keep this object after Validation is over.

@abhinavdangeti also agreed to this.

@moshaad7 moshaad7 changed the title add validation for vector field aliases MB-59421: add validation for vector field aliases Dec 14, 2023
@abhinavdangeti abhinavdangeti merged commit b0f4cb4 into unstable Dec 14, 2023
9 checks passed
@abhinavdangeti abhinavdangeti deleted the validate_field_alias branch December 14, 2023 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants