This repository has been archived by the owner on Dec 14, 2023. It is now read-only.
stories_public/list: empty story_tags when more than 100 rows requested #729
Prep:
>>> import mediacloud.api, json, datetime as dt
>>> import pprint
>>> pp = pprint.PrettyPrinter(indent=4)
>>> mc = mediacloud.api.MediaCloud('YOUR_KEY')
>>> tag_sets_id = mediacloud.tags.TAG_SET_NYT_THEMES_VERSION
>>> q = '*'
>>> fq = mc.dates_as_query_clause(dt.date(2020,8,20), dt.date(2020,8,24))

99 stories:
>>> pp.pprint(mc.storyList('robot', mc.dates_as_query_clause(dt.date(2020,8,2), dt.date(2020,8,3)), rows=99)[0])
{   'ap_syndicated': False,
    'collect_date': '2020-03-09 18:44:54.488650',
    'feeds': None,
    'guid': 'https://www.sinembargo.mx/08-02-2020/3727176',
    'language': 'es',
    'media_id': 105831,
    'media_name': 'sinembargo.mx',
    'media_url': 'http://sinembargo.mx/#spider',
    'metadata': {   'date_guess_method': {   'stories_id': 1543287159,
                                             'tag': 'guess_by_unknown',
                                             'tag_set': 'date_guess_method',
                                             'tag_sets_id': 508,
                                             'tags_id': 50741492},
                    'extractor_version': {   'stories_id': 1543287159,
                                             'tag': 'readability-lxml-0.7',
                                             'tag_set': 'extractor_version',
                                             'tag_sets_id': 1354,
                                             'tags_id': 81092444},
                    'geocoder_version': None,
                    'nyt_themes_version': None},
    'processed_stories_id': 1950370689,
    'publish_date': '2020-08-02 00:00:00',
    'stories_id': 1543287159,
    'story_tags': [   {   'stories_id': 1543287159,
                          'tag': 'guess_by_unknown',
                          'tag_set': 'date_guess_method',
                          'tag_sets_id': 508,
                          'tags_id': 50741492},
                      {   'stories_id': 1543287159,
                          'tag': 'readability-lxml-0.7',
                          'tag_set': 'extractor_version',
                          'tag_sets_id': 1354,
                          'tags_id': 81092444}],
    'title': 'Penaut, el robot que alimenta a personas en cuarentena por '
             'Coronavirus en un hotel de China',
    'url': 'https://www.sinembargo.mx/08-02-2020/3727176',
    'word_count': None}

100 stories:
>>> pp.pprint(mc.storyList('robot', mc.dates_as_query_clause(dt.date(2020,8,2), dt.date(2020,8,3)), rows=100)[0])
{   'ap_syndicated': False,
    'collect_date': '2020-03-09 18:44:54.488650',
    'feeds': None,
    'guid': 'https://www.sinembargo.mx/08-02-2020/3727176',
    'language': 'es',
    'media_id': 105831,
    'media_name': 'sinembargo.mx',
    'media_url': 'http://sinembargo.mx/#spider',
    'metadata': {   'date_guess_method': {   'stories_id': 1543287159,
                                             'tag': 'guess_by_unknown',
                                             'tag_set': 'date_guess_method',
                                             'tag_sets_id': 508,
                                             'tags_id': 50741492},
                    'extractor_version': {   'stories_id': 1543287159,
                                             'tag': 'readability-lxml-0.7',
                                             'tag_set': 'extractor_version',
                                             'tag_sets_id': 1354,
                                             'tags_id': 81092444},
                    'geocoder_version': None,
                    'nyt_themes_version': None},
    'processed_stories_id': 1950370689,
    'publish_date': '2020-08-02 00:00:00',
    'stories_id': 1543287159,
    'story_tags': [   {   'stories_id': 1543287159,
                          'tag': 'guess_by_unknown',
                          'tag_set': 'date_guess_method',
                          'tag_sets_id': 508,
                          'tags_id': 50741492},
                      {   'stories_id': 1543287159,
                          'tag': 'readability-lxml-0.7',
                          'tag_set': 'extractor_version',
                          'tag_sets_id': 1354,
                          'tags_id': 81092444}],
    'title': 'Penaut, el robot que alimenta a personas en cuarentena por '
             'Coronavirus en un hotel de China',
    'url': 'https://www.sinembargo.mx/08-02-2020/3727176',
    'word_count': None}

101 stories:
>>> pp.pprint(mc.storyList('robot', mc.dates_as_query_clause(dt.date(2020,8,2), dt.date(2020,8,3)), rows=101)[0])
{   'ap_syndicated': False,
    'collect_date': '2020-03-09 18:44:54.488650',
    'feeds': None,
    'guid': 'https://www.sinembargo.mx/08-02-2020/3727176',
    'language': 'es',
    'media_id': 105831,
    'media_name': 'sinembargo.mx',
    'media_url': 'http://sinembargo.mx/#spider',
    'metadata': {   'date_guess_method': None,
                    'extractor_version': None,
                    'geocoder_version': None,
                    'nyt_themes_version': None},
    'processed_stories_id': 1950370689,
    'publish_date': '2020-08-02 00:00:00',
    'stories_id': 1543287159,
    'story_tags': [],
    'title': 'Penaut, el robot que alimenta a personas en cuarentena por '
             'Coronavirus en un hotel de China',
    'url': 'https://www.sinembargo.mx/08-02-2020/3727176',
    'word_count': None}
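To turn the three calls above into a repeatable check, here is a minimal sketch. It assumes a hypothetical `fetch(rows)` callable that wraps a call such as `mc.storyList('robot', mc.dates_as_query_clause(dt.date(2020,8,2), dt.date(2020,8,3)), rows=rows)` and returns the list of story dicts; the helper name `rows_with_missing_tags` is also hypothetical. For each `rows` value it looks up the same story and records whether that story came back with an empty `story_tags` list.

```python
from typing import Callable, Dict, Iterable, List

Story = Dict[str, object]


def rows_with_missing_tags(
    fetch: Callable[[int], List[Story]],
    stories_id: int,
    rows_values: Iterable[int],
) -> List[int]:
    """Return the `rows` values for which `stories_id` comes back without story_tags.

    `fetch` is a hypothetical wrapper around the API call, e.g.
    lambda rows: mc.storyList('robot', date_clause, rows=rows)
    """
    missing = []
    for rows in rows_values:
        stories = fetch(rows)
        # Find the story we are tracking in this page of results.
        match = next((s for s in stories if s.get("stories_id") == stories_id), None)
        if match is None:
            continue  # story not in this page, so there is nothing to compare
        if not match.get("story_tags"):
            missing.append(rows)
    return missing
```

Called with `stories_id=1543287159` and `rows_values=[99, 100, 101]`, the behavior shown above would correspond to a result of `[101]`.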
I think this is the relevant code: backend/apps/webapp-api/src/perl/MediaWords/Controller/Api/V2/StoriesBase.pm, lines 279 to 302 (at commit 12a0c0e).
(Moved from #725.)
More confusingly, requesting a page of more than 100 rows seems to make story_tags disappear from the results.
This code returns a story from media 105831 with story_tags on it:
But this call, with rows=200, returns the same story with NO story_tags on it:
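Until this is fixed server-side, one possible client-side workaround is to never request more than 100 rows per call and page through the results instead. This is a hedged sketch: it assumes a hypothetical `fetch_page(last_processed_stories_id, rows)` callable (for example a wrapper around `mc.storyList(..., last_processed_stories_id=..., rows=...)`; check that parameter name against your client version), and it assumes results are ordered by `processed_stories_id`, so the last story's `processed_stories_id` can serve as the cursor for the next page. The 100-row cap reflects what was observed in this issue, not a documented API limit.

```python
from typing import Callable, Dict, Iterator, List

Story = Dict[str, object]

# Largest page size observed in this issue that still returned story_tags.
MAX_SAFE_ROWS = 100


def iter_stories(
    fetch_page: Callable[[int, int], List[Story]],
    total_rows: int,
    page_size: int = MAX_SAFE_ROWS,
) -> Iterator[Story]:
    """Yield up to `total_rows` stories, requesting at most `page_size` rows per call."""
    page_size = min(page_size, MAX_SAFE_ROWS)
    last_id = 0  # assumed cursor: processed_stories_id of the last story seen
    remaining = total_rows
    while remaining > 0:
        page = fetch_page(last_id, min(page_size, remaining))[:remaining]
        if not page:
            break  # no more results
        yield from page
        remaining -= len(page)
        last_id = int(page[-1]["processed_stories_id"])
```

The design choice is simply to trade one large request for several small ones; if the server-side limit turns out to be different from 100, only `MAX_SAFE_ROWS` needs to change.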