How JSON is Being Written + Manually Stopping Collection #8

Open

taylorbreannaray opened this issue Dec 11, 2020 · 7 comments

taylorbreannaray commented Dec 11, 2020

Hi @KonradIT,

First off, thanks for your help before with the prior issue I opened. Second, thank you for developing this unofficial API in the first place—it has been of great use to me!

Now, to get to my point, I am still utilizing your experiment 02_multiple_hashtags.py. I have run into a couple of issues along the way and was wondering if you had any advice/clarification that you could provide:

  1. When the posts and links are being written to JSON files, a closing square bracket (i.e., "]") is always written at arbitrary places in each of the files. This, in turn, produces a JSON decoding error claiming that the end of the file was expected (even though it clearly isn't the end yet). For now, I have been going into the files myself and removing these brackets so that the contents decode as proper JSON (a rough sketch of that cleanup is below).

  2. When I manually stop the data collection before it has retrieved all possible posts, it often writes more objects to the Links file than to the corresponding Posts file, leaving an uneven ratio of links to posts. Is there a way I can ensure that it stops writing at the same count for each?

This might just be something I have to fix myself if I am going to stop the script at random times. Not sure though.
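
The cleanup I mentioned under point 1 looks roughly like this (just a sketch; it assumes each stray "]" sits on its own line, which is how they show up in my files):

```python
import json

def clean_json_file(path):
    """Drop stray ']' lines written before the real end of the file."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()

    # Keep the final closing bracket; drop any ']' that appears early.
    body = [line for line in lines[:-1] if line.strip() != "]"]
    body.append(lines[-1])

    data = json.loads("".join(body))  # still raises if anything else is malformed
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    return data
```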

Thanks!

@KonradIT (Owner)

Re: 1

I've redone the script so that it now outputs to a CSV file; I found this approach much better for dumping the data into a SQLite db (https://pypi.org/project/csv-to-sqlite/).
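
Roughly the idea (a sketch, not the script's actual code; the records and column names here are made up):

```python
import csv

# Stand-in records and columns -- the real script derives these
# from the Parler API responses.
posts = [
    {"id": "1", "username": "alice", "body": "hello", "created_at": "2020-12-11"},
]
fields = ["id", "username", "body", "created_at"]

with open("posts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    for post in posts:
        writer.writerow({k: post.get(k, "") for k in fields})
```

The resulting CSV can then be loaded with something like `csv-to-sqlite -f posts.csv -o posts.sqlite` (check the package's docs for the exact flags).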

Re: 2

This can be fixed with a signal handler, such as the one I used here: https://github.com/KonradIT/parler-py-api/blob/master/experiments/00_suggested_hashtags.py#L26
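
The pattern there is roughly this (a minimal sketch, not the exact code at that line):

```python
import csv
import signal
import sys

out = open("posts.csv", "w", newline="", encoding="utf-8")
writer = csv.writer(out)

def stop(signum, frame):
    # Flush and close before exiting, so the last written row is
    # complete and the posts/links files stay in step.
    out.flush()
    out.close()
    sys.exit(0)

signal.signal(signal.SIGINT, stop)  # Ctrl+C now runs stop() instead of raising

while True:
    writer.writerow(["..."])  # the fetch-and-write loop goes here
```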

@KonradIT (Owner)

The script has now been adjusted to write to a CSV file; please pull from origin and let me know if it works.

KonradIT commented Jan 1, 2021

Hi @taylorbreannaray, can you confirm the fixes work for your use case?

@taylorbreannaray (Author)

Hi @KonradIT,

So sorry for taking a while to get back to you on that! Yes, the modifications you made seemed to work just fine back then. However, I am revisiting the API and want to collect more now that Parler is back up. I essentially didn't change anything besides the hashtags being collected from (to reflect the latest data), yet the files aren't being written like they once were.

I also noticed that the JST cookie seems to expire within minutes - is that going to be a problem for collecting data?

KonradIT commented Mar 5, 2021

Hi, my JST+MST keypair hasn't expired yet. In fact, I've been collecting QAnon-related messages for the past few days with no modifications to the library, and I haven't gotten any unauthorized error responses.

@taylorbreannaray (Author)

Hmm... interesting. I never had this issue when using your API before, either. However, now when I look at my JST and MST values for Parler in Chrome, the JST is set to expire 5 minutes after it was created. The MST isn't a problem, as it is set to expire 2 months after creation. Is this a Chrome thing?

KonradIT commented Mar 5, 2021

Ah, yes, my API will refresh the JST using the MST value if it has expired.
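
The general pattern is something like this (a rough sketch, not the library's actual internals; the refresh endpoint and cookie names here are hypothetical):

```python
import requests

def get_with_refresh(session, url, tokens):
    """Retry a request once after refreshing the short-lived JST via the MST."""
    cookies = {"jst": tokens["jst"], "mst": tokens["mst"]}
    resp = session.get(url, cookies=cookies)
    if resp.status_code == 401:  # JST expired
        # Hypothetical refresh call: present the long-lived MST and
        # receive a fresh JST cookie back, then retry the request.
        refreshed = session.get("https://api.example.com/refresh",
                                cookies={"mst": tokens["mst"]})
        tokens["jst"] = refreshed.cookies.get("jst", tokens["jst"])
        resp = session.get(url, cookies={"jst": tokens["jst"],
                                         "mst": tokens["mst"]})
    return resp
```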
