Adaptation of sparql_dataframe to Wikidata #10
Hello, try passing
You can see in the unit tests that queries against Wikidata should work fine with
```
HTTPError                                 Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\sparql_dataframe\sparql_dataframe.py in get_sparql_dataframe(endpoint, query, post)
C:\ProgramData\Anaconda3\lib\site-packages\SPARQLWrapper\Wrapper.py in query(self)
C:\ProgramData\Anaconda3\lib\site-packages\SPARQLWrapper\Wrapper.py in _query(self)
C:\ProgramData\Anaconda3\lib\site-packages\SPARQLWrapper\Wrapper.py in _query(self)
C:\ProgramData\Anaconda3\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
C:\ProgramData\Anaconda3\lib\urllib\request.py in open(self, fullurl, data, timeout)
C:\ProgramData\Anaconda3\lib\urllib\request.py in http_response(self, request, response)
C:\ProgramData\Anaconda3\lib\urllib\request.py in error(self, proto, *args)
C:\ProgramData\Anaconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
C:\ProgramData\Anaconda3\lib\urllib\request.py in http_error_default(self, req, fp, code, msg, hdrs)

HTTPError: HTTP Error 403: Forbidden
```
I think that's an error returned by the actual Wikidata SPARQL endpoint. It aggressively rate limits.
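A 403 from the Wikidata endpoint is often about the client identifying itself poorly: Wikimedia asks tools to send a descriptive User-Agent naming the tool and a contact, which is what the `agent=` argument in the issue's own `SPARQLWrapper(endpoint_url, agent=user_agent)` call does. A minimal standard-library sketch of attaching such a header (the User-Agent string and contact URL here are made-up placeholders; the request is built but never sent, so this runs offline):

```python
import urllib.request

# Hypothetical User-Agent; per the Wikimedia User-Agent policy it should
# name your tool/version and include a way to contact you.
USER_AGENT = "MySparqlTool/0.1 (https://example.org/contact)"

endpoint = "https://query.wikidata.org/sparql"

# Build the request with the header attached; nothing is sent here.
req = urllib.request.Request(endpoint, headers={"User-Agent": USER_AGENT})

# urllib normalizes header names, hence the "User-agent" capitalization.
print(req.get_header("User-agent"))  # → MySparqlTool/0.1 (https://example.org/contact)
```

With `SPARQLWrapper` itself, passing the same string via `agent=` achieves the equivalent.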
How would you read a query saved in a separate file? Thanks for your help!
If your queries are saved in a text file, then you would just read them in like any other text file in Python and save them to a string. Here's a tutorial on reading and writing files in Python: https://realpython.com/read-write-files-python/#reading-and-writing-opened-files
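In other words, the file is just text: read it into one string and pass that string where the inline query went. A small self-contained sketch (the `countries.rq` file name and the query are illustrative; a temporary directory is used so the example cleans up after itself):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical query file; any text file holding SPARQL works the same way.
    query_path = Path(tmp) / "countries.rq"
    query_path.write_text(
        "SELECT ?item ?itemLabel WHERE { ?item wdt:P31 wd:Q6256. }",
        encoding="utf-8",
    )

    # Read the whole file back into one string, then hand that string to
    # sparql_dataframe.get() exactly as you would an inline query.
    query = query_path.read_text(encoding="utf-8")

print(query.startswith("SELECT"))  # → True
```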
This works:
Just had the same issue querying Wikidata. My first thought was that it might be caused by a version change (SPARQLWrapper was installed in version 2.0.0). It now already contains this functionality. Nevertheless, thanks for creating this lib, which made it directly into the wrapper!
Hello,
I am trying to extract dataframes from Wikidata queries.
For instance, this code from an example in Wikidata works to extract a dictionary of countries:
```python
# pip install sparqlwrapper
# https://rdflib.github.io/sparqlwrapper/
import sys

import sparql_dataframe
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint_url = "https://query.wikidata.org/sparql"

query = """#Countries
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P31 wd:Q6256.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}"""


def get_results(endpoint_url, query):
    user_agent = "WDQS-example Python/%s.%s" % (sys.version_info[0], sys.version_info[1])
    # TODO adjust user agent; see https://w.wiki/CX6
    sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()


results = get_results(endpoint_url, query)

for result in results["results"]["bindings"]:
    print(result)
```
But when I do this:

```python
df = sparql_dataframe.get(endpoint_url, query)
```
I receive this error:
```
C:\ProgramData\Anaconda3\lib\site-packages\SPARQLWrapper\Wrapper.py:1315: RuntimeWarning: Format requested was CSV, but XML (application/sparql-results+xml;charset=utf-8) has been returned by the endpoint
  warnings.warn(message % (requested.upper(), format_name, mime), RuntimeWarning)

AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>
----> 1 df = sparql_dataframe.get(endpoint_url, query)

C:\ProgramData\Anaconda3\lib\site-packages\sparql_dataframe\sparql_dataframe.py in get_sparql_dataframe(endpoint, query, post)
     28     sparql.setReturnFormat(CSV)
     29     results = sparql.query().convert()
---> 30     _csv = StringIO(results.decode('utf-8'))
     31     return pd.read_csv(_csv, sep=",")

AttributeError: 'Document' object has no attribute 'decode'
```
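The traceback shows what went wrong: `sparql_dataframe` asked for CSV, the endpoint answered with XML, and the resulting parsed `Document` object has no `.decode()`. One way to sidestep the format mismatch is to request JSON (as the working `get_results` example above already does) and flatten `results["results"]["bindings"]` yourself. A sketch of that flattening, with a made-up `results` dict that mimics the shape of Wikidata's JSON response:

```python
# Made-up sample mimicking Wikidata's SPARQL JSON results; in practice you
# would use the real output of get_results() above.
results = {
    "results": {
        "bindings": [
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q142"},
             "itemLabel": {"type": "literal", "value": "France"}},
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q183"},
             "itemLabel": {"type": "literal", "value": "Germany"}},
        ]
    }
}


def bindings_to_rows(results):
    """Keep only each variable's 'value'; one plain dict per solution row."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


rows = bindings_to_rows(results)
print(rows[0]["itemLabel"])  # → France
```

The resulting list of dicts drops straight into pandas via `pd.DataFrame(rows)`, giving the dataframe the CSV path was supposed to produce.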