Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow outer Solution Modifiers for SPARQL inputs #126

Closed
lu-pl opened this issue Oct 30, 2024 · 8 comments · May be fixed by #129
Closed

Disallow outer Solution Modifiers for SPARQL inputs #126

lu-pl opened this issue Oct 30, 2024 · 8 comments · May be fixed by #129
Assignees
Labels
enhancement New feature or request

Comments

@lu-pl
Copy link
Contributor

lu-pl commented Oct 30, 2024

rdfproxy.SPARQLModelAdapter implements pagination by dynamically modifying the supplied SPARQL query.

For ungrouped models, OFFSET and LIMIT modifiers are injected, for grouped models a subquery (based on the initial query) is generated and injected into the initial query.

E.g. for

select ?parent ?child ?name
where {
    values (?parent ?child ?name) {
        ('x' 'c' 'foo')
        ('y' 'd' UNDEF)
        ('y' 'e' UNDEF)
        ('z' UNDEF UNDEF)
    }
}

and a group_by definition in the respective model specifying "parent" as the grouping value, the following query gets generated:

select ?parent ?child ?name
where {
    values (?parent ?child ?name) {
        ('x' 'c' 'foo')
        ('y' 'd' UNDEF)
        ('y' 'e' UNDEF)
        ('z' UNDEF UNDEF)
    }

  {
    select distinct ?parent
    where {
      values (?parent ?child ?name) {
        ('x' 'c' 'foo')
        ('y' 'd' UNDEF)
        ('y' 'e' UNDEF)
        ('z' UNDEF UNDEF)
      }
    }
    order by ?parent
    limit 2
    offset 0
  }
}

See #114.

This raises the question though, how to handle input queries that themselves define solution modifiers.
rdfproxy currently simply does not handle that case, passing a query with solution modifiers is undefined behavior and will almost certainly lead to unintended effects. So the case urgently MUST be handled.

The easiest and probably best solution to that problem is to simply disallow solution modifiers for SPARQL inputs.

First of all, I cannot really think of a reason to use SPARQL queries with solution modifiers in rdfproxy. LIMIT and OFFSET are handled by rdfproxy pagination and GROUP BY is handled by rdfproxy's mapping and grouping mechanism.

Secondly, proper handling of that case would come with significant overhead and difficulties. Apart from the lack of proper SPARQL query analysis facilities for Python, one would also need to think about and decide, what cases of solution modifiers in an inbound query could even be meaningful in the context of rdfproxy.

@lu-pl
Copy link
Contributor Author

lu-pl commented Oct 30, 2024

Possible predicate function for checking a query for solution modifiers:

def query_has_solution_modifiers(query: str) -> bool:
    """Predicate for checking if a SPARQL query has a solution modifier."""
    pattern = r"}[^}]*\w+$"
    result = re.search(pattern, query)
    return bool(result)

The regex matches any word characters after (and including) the last '}'. So this is simply for checking the existence of a solution modifier, not for cleanly extracting possible solution modifiers.

@lu-pl lu-pl self-assigned this Oct 30, 2024
@lu-pl lu-pl mentioned this issue Oct 30, 2024
3 tasks
@lu-pl
Copy link
Contributor Author

lu-pl commented Oct 30, 2024

Possible predicate function for checking a query for solution modifiers:

def query_has_solution_modifiers(query: str) -> bool:
    """Predicate for checking if a SPARQL query has a solution modifier."""
    pattern = r"}[^}]*\w+$"
    result = re.search(pattern, query)
    return bool(result)

The regex matches any word characters after (and including) the last '}'. So this is simply for checking the existence of a solution modifier, not to cleanly extract possible solution modifiers.

Note: Handling of nested queries with solution modifiers is much more complicated and very likely not even necessary.

lu-pl added a commit that referenced this issue Oct 31, 2024
@lu-pl
Copy link
Contributor Author

lu-pl commented Nov 1, 2024

Possible predicate function for checking a query for solution modifiers:

def query_has_solution_modifiers(query: str) -> bool:
    """Predicate for checking if a SPARQL query has a solution modifier."""
    pattern = r"}[^}]*\w+$"
    result = re.search(pattern, query)
    return bool(result)

The regex matches any word characters after (and including) the last '}'. So this is simply for checking the existence of a solution modifier, not for cleanly extracting possible solution modifiers.

Consider (?<=})\s+([^}]*)\Z.

The combination of positive lookbehind and absolute anchor is able to match the solution modifier of the outermost SPARQL clause regardless of wether multiline mode is active. Also the group captures the solution modifier more cleanly.

@lu-pl lu-pl added this to the Query checking milestone Nov 7, 2024
@lu-pl lu-pl added the enhancement New feature or request label Nov 7, 2024
@kevinstadler kevinstadler removed this from the Query checking milestone Dec 18, 2024
@lu-pl
Copy link
Contributor Author

lu-pl commented Jan 21, 2025

Actually, there is a very trivial solution for allowing incoming queries with solution modifiers, i.e. simply providing a very thin SPARQL wrapper in case a solution modifier is detected.

E.g. a query with a solution modifier

select ?s
where {
    ?s ?p ?o .
}
limit 10

can be thinly wrapped like

select ?s
where {
  {
    select ?s
    where {
      ?s ?p ?o .
    }
    limit 10
  }
}
limit 2
offset 0

LIMIT/OFFSET in that case represent RDFProxy query modifications, which could then be applicable without problems.

I think a warning should be emitted if a query with solution modifiers is detected though, there simply is no/rarely need for passing queries with solution modifiers to RDFProxy.

@lu-pl
Copy link
Contributor Author

lu-pl commented Jan 24, 2025

Actually, there is a very trivial solution for allowing incoming queries with solution modifiers, i.e. simply providing a very thin SPARQL wrapper in case a solution modifier is detected.

E.g. a query with a solution modifier

select ?s
where {
?s ?p ?o .
}
limit 10

can be thinly wrapped like

select ?s
where {
{
select ?s
where {
?s ?p ?o .
}
limit 10
}
}
limit 2
offset 0

LIMIT/OFFSET in that case represent RDFProxy query modifications, which could then be applicable without problems.

I think a warning should be emitted if a query with solution modifiers is detected though, there simply is no/rarely need for passing queries with solution modifiers to RDFProxy.

This solution is only applicable for ungrouped models, grouped models significantly more complicated and I am currently having trouble even imagining a potential sane use case for allowing solution modifiers for grouped models.

@lu-pl
Copy link
Contributor Author

lu-pl commented Jan 24, 2025

Actually, I am in favor of generally disallowing solution modifiers. All that can be done with solution modifiers can be done with RDFProxy parameters and elaborately nesting and wrapping queries if solution modifiers are detected very likely would lead to unexpected results for users.

@lu-pl lu-pl changed the title Disallow Solution Modifiers for SPARQL inputs Handle or disallow Solution Modifiers for SPARQL inputs Jan 24, 2025
@lu-pl
Copy link
Contributor Author

lu-pl commented Jan 24, 2025

Given a grouped model, for a simple query with solution modifier like

select ?x ?y
where {
  values (?x ?y) {
    (1 2)
    (1 3)
    (2 4)
    (3 5)
  }
}
limit 2

the following steps are necessary for handling the query:

  1. thinly wrap the query
  2. construct the subquery based on the initial query with the solution modifiers removed
  3. inject this subquery into the wrapped query

So for the query above, the following query would need to get generated:

select ?x ?y
where {
  {
    select ?x ?y
    where {
      values (?x ?y) {
	(1 2)
	(1 3)
	(2 4)
	(3 5)
      }
    }
    limit 2
  }
  # injected
  {
    select distinct ?x
    where {
      values (?x ?y) {
	(1 2)
	(1 3)
	(2 4)
	(3 5)
      }
    }
    order by ?x limit 2 offset 0
  }
}

Again, I wonder if generally disallowing outer solution modifiers would not be the better option here.

@lu-pl lu-pl changed the title Handle or disallow Solution Modifiers for SPARQL inputs Disallow outer Solution Modifiers for SPARQL inputs Jan 27, 2025
@lu-pl
Copy link
Contributor Author

lu-pl commented Jan 27, 2025

Possible handling of outer-level SPARQL solution modifiers is a separate issue, see #206.

For now, outer solution modifiers should be disallowed.

lu-pl added a commit that referenced this issue Jan 27, 2025
The change introduces a check_query callable which runs an extensible
compose pipeline of query checkers.

Note regarding QueryParseException: This custom exception is intended
to be a thin wrapper around a pyparsing ParseException that RDFLib
raises.
This avoids introducing pyparsing as a dependency just to be able to
test against this exception. I feel like RDFLib should not raise a
pyparsing exception but provide a thin wrapper itself.
See RDFLib/rdflib#3057.

The check_query function runs in SPARQLModelAdapter to enable fast
failures on inapplicable queries. Note that this somewhat couples
QueryConstructor to SPARQLModelAdapter; QueryConstructor should be
marked private for this reason.

Closes #116. Closes #126.
lu-pl added a commit that referenced this issue Jan 27, 2025
The change introduces a check_query callable which runs an extensible
compose pipeline of query checkers.

Note regarding QueryParseException: This custom exception is intended
to be a thin wrapper around a pyparsing ParseException that RDFLib
raises.
This avoids introducing pyparsing as a dependency just to be able to
test against this exception. I feel like RDFLib should not raise a
pyparsing exception but provide a thin wrapper itself.
See RDFLib/rdflib#3057.

The check_query function runs in SPARQLModelAdapter to enable fast
failures on inapplicable queries. Note that this somewhat couples
QueryConstructor to SPARQLModelAdapter; QueryConstructor should be
marked private for this reason.

Closes #116. Closes #126.
lu-pl added a commit that referenced this issue Jan 28, 2025
The change introduces a check_query callable which runs an extensible
compose pipeline of query checkers.

Note regarding QueryParseException: This custom exception is intended
to be a thin wrapper around a pyparsing ParseException that RDFLib
raises. This avoids introducing pyparsing as a dependency just to be able to
test against this exception. I feel like RDFLib should not raise a
pyparsing exception but provide a thin wrapper itself.
See RDFLib/rdflib#3057.

The check_query function runs in SPARQLModelAdapter to enable fast
failures on inapplicable queries. Note that this somewhat couples
QueryConstructor to SPARQLModelAdapter; QueryConstructor should be
marked private for this reason.

Closes #116. Closes #126.
lu-pl added a commit that referenced this issue Jan 29, 2025
The change introduces a check_query callable which runs an extensible
compose pipeline of query checkers.

Note regarding QueryParseException: This custom exception is intended
to be a thin wrapper around a pyparsing ParseException that RDFLib
raises. This avoids introducing pyparsing as a dependency just to be able to
test against this exception. I feel like RDFLib should not raise a
pyparsing exception but provide a thin wrapper itself.
See RDFLib/rdflib#3057.

The check_query function runs in SPARQLModelAdapter to enable fast
failures on inapplicable queries. Note that this somewhat couples
QueryConstructor to SPARQLModelAdapter; QueryConstructor should be
marked private for this reason.

Closes #116. Closes #126.
lu-pl added a commit that referenced this issue Jan 29, 2025
The change introduces a check_query callable which runs an extensible
compose pipeline of query checkers.

Note regarding QueryParseException: This custom exception is intended
to be a thin wrapper around a pyparsing ParseException that RDFLib
raises. This avoids introducing pyparsing as a dependency just to be able to
test against this exception. I feel like RDFLib should not raise a
pyparsing exception but provide a thin wrapper itself.
See RDFLib/rdflib#3057.

The check_query function runs in SPARQLModelAdapter to enable fast
failures on inapplicable queries. Note that this somewhat couples
QueryConstructor to SPARQLModelAdapter; QueryConstructor should be
marked private for this reason.

Possible handling of queries with outer-level solution modifiers is
discussed in issue #206.

Closes #116. Closes #126.
@lu-pl lu-pl closed this as completed in d25b5af Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants