
Merge pull request #43 from andrlik/master
Mastodon support from @andrlik. Closes #42
tommeagher authored Feb 12, 2018
2 parents 1519bc9 + 467443f commit 80d51cd
Showing 4 changed files with 142 additions and 63 deletions.
47 changes: 33 additions & 14 deletions README.md
@@ -7,20 +7,22 @@ This project should work in the latest releases of Python 2.7 and Python 3. By d
## Setup

1. Clone this repo
2. Create a Twitter account that you will post to.
3. Sign into https://dev.twitter.com/apps with the same login and create an application. Make sure that your application has read and write permissions to make POST requests.
4. Make a copy of the `local_settings_example.py` file and name it `local_settings.py`
5. Take the consumer key (and secret) and access token (and secret) from your Twitter application and paste them into the appropriate spots in `local_settings.py`.
2. Make a copy of the `local_settings_example.py` file and name it `local_settings.py`
3. If posting to Twitter, create a Twitter account that you will post to.
4. Sign into https://dev.twitter.com/apps with the same login and create an application. Make sure that your application has read and write permissions to make POST requests.
5. Set `ENABLE_TWITTER_SOURCES` and `ENABLE_TWITTER_POSTING` to `True`. Take the consumer key (and secret) and access token (and secret) from your Twitter application and paste them into the appropriate spots in `local_settings.py`.
6. In `local_settings.py`, add the handle of the Twitter user you want your _ebooks account to be based on to `TWITTER_SOURCE_ACCOUNTS`. To make your tweets go live, change the `DEBUG` variable to `False`. (A sample sketch of these settings follows this list.)
7. Create an account at Heroku, if you don't already have one. [Install the Heroku toolbelt](https://devcenter.heroku.com/articles/quickstart#step-2-install-the-heroku-toolbelt) and set your Heroku login on the command line.
8. Type the command `heroku create` to generate the _ebooks Python app on the platform that you can schedule.
9. The only Python requirement for this script is [python-twitter](https://github.com/bear/python-twitter), the `pip install` of which is handled by Heroku automatically.
9. `git commit -am 'updated the local_settings.py'`
10. `git push heroku master`
11. Test your upload by typing `heroku run worker`. You should either get a response that says "3, no, sorry, not this time" or a message with the body of your post. If you get the latter, check your _ebooks Twitter account to see if it worked.
12. Now it's time to configure the scheduler. `heroku addons:create scheduler:standard`
13. Once that runs, type `heroku addons:open scheduler`. This will open up a browser window where you can adjust the time interval for the script to run. The scheduled command should be `python ebooks.py`. I recommend setting it at one hour.
14. Sit back and enjoy the fruits of your labor.
7. If you also want to include Mastodon as a source, set `ENABLE_MASTODON_SOURCES` to `True` (and `ENABLE_MASTODON_POSTING` to `True` if you want to post there). You'll need to create a Mastodon account to post to on an instance like [botsin.space](https://botsin.space).
8. After creating the Mastodon account, open a Python prompt in your project directory and follow the [directions below](#mastodon-setup). Update your `local_settings.py` file with the filenames of the generated client secret and user credential secret files.
9. Create an account at Heroku, if you don't already have one. [Install the Heroku toolbelt](https://devcenter.heroku.com/articles/quickstart#step-2-install-the-heroku-toolbelt) and set your Heroku login on the command line.
10. Type the command `heroku create` to generate the _ebooks Python app on the platform that you can schedule.
11. The only Python requirements for this script are [python-twitter](https://github.com/bear/python-twitter), Mastodon.py, and BeautifulSoup; Heroku handles the `pip install` of these automatically.
12. `git commit -am 'updated the local_settings.py'`
13. `git push heroku master`
14. Test your upload by typing `heroku run worker`. You should either get a response that says "3, no, sorry, not this time" or a message with the body of your post. If you get the latter, check your _ebooks Twitter account to see if it worked.
15. Now it's time to configure the scheduler. `heroku addons:create scheduler:standard`
16. Once that runs, type `heroku addons:open scheduler`. This will open up a browser window where you can adjust the time interval for the script to run. The scheduled command should be `python ebooks.py`. I recommend setting it at one hour.
17. Sit back and enjoy the fruits of your labor.
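
For reference, here is a minimal sketch of the `local_settings.py` values touched in steps 5-8. The keys, URLs, and filenames are placeholders, not working values:

```python
# Twitter (steps 5-6)
ENABLE_TWITTER_SOURCES = True      # fetch tweets from TWITTER_SOURCE_ACCOUNTS
ENABLE_TWITTER_POSTING = True      # post the generated status to Twitter
MY_CONSUMER_KEY = 'your consumer key'
MY_CONSUMER_SECRET = 'your consumer secret'
MY_ACCESS_TOKEN_KEY = 'your access token key'
MY_ACCESS_TOKEN_SECRET = 'your access token secret'
TWITTER_SOURCE_ACCOUNTS = ["some_account"]   # the account(s) to imitate
DEBUG = True                       # flip to False to post live

# Mastodon (steps 7-8)
ENABLE_MASTODON_SOURCES = False
ENABLE_MASTODON_POSTING = False
MASTODON_API_BASE_URL = "https://botsin.space"   # your instance
CLIENT_CRED_FILENAME = 'clientcred.secret'       # whatever filename you passed to create_app
USER_ACCESS_FILENAME = 'usercred.secret'         # whatever filename you passed to log_in
MASTODON_SOURCE_ACCOUNTS = ["@[email protected]"]
```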


## Configuring
@@ -72,8 +74,25 @@ After that, commit the change and `git push heroku master`. Then run the command

If you want to avoid hitting the Twitter API and instead want to use a static text file, you can do that. First, create a text file containing a Python list of quote-wrapped tweets. Then set the `STATIC_TEST` variable to `True`. Finally, specify the name of the text file using the `TEST_SOURCE` variable in `local_settings.py`.
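
For example, a hypothetical `TEST_SOURCE` file (say, `test_tweets.txt`) could contain a single line like this; the included `testcorpus.txt` follows the same format:

```python
["Just setting up my ebooks bot", "Another sample tweet to seed the Markov chain", "One more short sample tweet"]
```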

## Mastodon Setup

You only need to do this once!

```python
>>> from mastodon import Mastodon
>>> Mastodon.create_app('pytooterapp', api_base_url='YOUR INSTANCE URL', to_file='YOUR_FILENAME_HERE')
```

Then, create a user credential file. NOTE: Your bot has to follow your source account.

```python
>>> mastodon = Mastodon(client_id='YOUR_FILENAME_HERE', api_base_url='YOUR INSTANCE URL')
>>> mastodon.log_in('[email protected]', 'incrediblygoodpassword', to_file='YOUR USER FILENAME HERE')
```

Commit those two files to your repository and you can toot away.
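
If you want to sanity-check the credentials before wiring them into the bot, a quick test from the same Python prompt might look like the following (the filenames and instance URL are the placeholders from above); this mirrors how `ebooks.py` connects and toots:

```python
>>> from mastodon import Mastodon
>>> mastodon = Mastodon(client_id='YOUR_FILENAME_HERE',
...                     access_token='YOUR USER FILENAME HERE',
...                     api_base_url='YOUR INSTANCE URL')
>>> mastodon.toot('test toot from my ebooks bot')
```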

## Credit
As I said, this is based almost entirely on [@harrisj's](https://twitter.com/harrisj) [iron_ebooks](https://github.com/harrisj/iron_ebooks/). He created it in Ruby, and I wanted to port it to Python. All the credit goes to him. As a result, all of the blame for the clunky implementation in Python falls on me.

Many thanks to the [many folks who have contributed](CONTRIBUTORS.md) to the development of this project since it was open sourced in 2013. If you see ways to improve the code, please fork it and send a [pull request](https://github.com/tommeagher/heroku_ebooks/pulls), or [file an issue](https://github.com/tommeagher/heroku_ebooks/issues) for me, and I'll address it.
140 changes: 94 additions & 46 deletions ebooks.py
@@ -2,6 +2,7 @@
import re
import sys
import twitter
from mastodon import Mastodon
import markov
from bs4 import BeautifulSoup
try:
@@ -16,11 +17,15 @@
from local_settings import *


def connect():
return twitter.Api(consumer_key=MY_CONSUMER_KEY,
def connect(type='twitter'):
if type == 'twitter':
return twitter.Api(consumer_key=MY_CONSUMER_KEY,
consumer_secret=MY_CONSUMER_SECRET,
access_token_key=MY_ACCESS_TOKEN_KEY,
access_token_secret=MY_ACCESS_TOKEN_SECRET)
elif type == 'mastodon':
return Mastodon(client_id=CLIENT_CRED_FILENAME, api_base_url=MASTODON_API_BASE_URL, access_token=USER_ACCESS_FILENAME)
return None


def entity(text):
@@ -34,6 +39,8 @@ def entity(text):
pass
else:
guess = text[1:-1]
if guess == "apos":
guess = "lsquo"
numero = n2c[guess]
try:
text = chr(numero)
@@ -42,17 +49,18 @@
return text


def filter_tweet(tweet):
tweet.text = re.sub(r'\b(RT|MT) .+', '', tweet.text) # take out anything after RT or MT
tweet.text = re.sub(r'(\#|@|(h\/t)|(http))\S+', '', tweet.text) # Take out URLs, hashtags, hts, etc.
tweet.text = re.sub('\s+', ' ', tweet.text) # collapse consecutive whitespace into single spaces.
tweet.text = re.sub(r'\"|\(|\)', '', tweet.text) # take out quotes.
tweet.text = re.sub(r'\s+\(?(via|says)\s@\w+\)?', '', tweet.text) # remove attribution
htmlsents = re.findall(r'&\w+;', tweet.text)
def filter_status(text):
text = re.sub(r'\b(RT|MT) .+', '', text) # take out anything after RT or MT
text = re.sub(r'(\#|@|(h\/t)|(http))\S+', '', text) # Take out URLs, hashtags, hts, etc.
text = re.sub('\s+', ' ', text) # collapse consecutive whitespace into single spaces.
text = re.sub(r'\"|\(|\)', '', text) # take out quotes.
text = re.sub(r'\s+\(?(via|says)\s@\w+\)?', '', text) # remove attribution
text = re.sub(r'<[^>]*>', '', text)  # strip out HTML tags from Mastodon posts
htmlsents = re.findall(r'&\w+;', text)
for item in htmlsents:
tweet.text = tweet.text.replace(item, entity(item))
tweet.text = re.sub(r'\xe9', 'e', tweet.text) # take out accented e
return tweet.text
text = text.replace(item, entity(item))
text = re.sub(r'\xe9', 'e', text) # take out accented e
return text


def scrape_page(src_url, web_context, web_attributes):
@@ -96,7 +104,7 @@ def grab_tweets(api, max_id=None):
if user_tweets:
max_id = user_tweets[-1].id - 1
for tweet in user_tweets:
tweet.text = filter_tweet(tweet)
tweet.text = filter_status(tweet.text)
if re.search(SOURCE_EXCLUDE, tweet.text):
continue
if tweet.text:
@@ -105,6 +113,20 @@
pass
return source_tweets, max_id

def grab_toots(api, account_id=None,max_id=None):
if account_id:
source_toots = []
user_toots = api.account_statuses(account_id)
max_id = user_toots[len(user_toots)-1]['id']-1
for toot in user_toots:
if toot['in_reply_to_id'] or toot['reblog']:
pass #skip this one
else:
toot['content'] = filter_status(toot['content'])
if len(toot['content']) != 0:
source_toots.append(toot['content'])
return source_toots, max_id

if __name__ == "__main__":
order = ORDER
guess = 0
@@ -116,18 +138,18 @@ def grab_tweets(api, max_id=None):
sys.exit()
else:
api = connect()
source_tweets = []
source_statuses = []
if STATIC_TEST:
file = TEST_SOURCE
print(">>> Generating from {0}".format(file))
string_list = open(file).readlines()
for item in string_list:
source_tweets += item.split(",")
source_statuses += item.split(",")
if SCRAPE_URL:
source_tweets += scrape_page(SRC_URL, WEB_CONTEXT, WEB_ATTRIBUTES)
if SOURCE_ACCOUNTS and len(SOURCE_ACCOUNTS[0]) > 0:
source_statuses += scrape_page(SRC_URL, WEB_CONTEXT, WEB_ATTRIBUTES)
if ENABLE_TWITTER_SOURCES and TWITTER_SOURCE_ACCOUNTS and len(TWITTER_SOURCE_ACCOUNTS[0]) > 0:
twitter_tweets = []
for handle in SOURCE_ACCOUNTS:
for handle in TWITTER_SOURCE_ACCOUNTS:
user = handle
handle_stats = api.GetUser(screen_name=user)
status_count = handle_stats.statuses_count
@@ -141,53 +163,79 @@ def grab_tweets(api, max_id=None):
print("Error fetching tweets from Twitter. Aborting.")
sys.exit()
else:
source_tweets += twitter_tweets
source_statuses += twitter_tweets
if ENABLE_MASTODON_SOURCES and len(MASTODON_SOURCE_ACCOUNTS) > 0:
source_toots = []
mastoapi = connect(type='mastodon')
max_id=None
for handle in MASTODON_SOURCE_ACCOUNTS:
accounts = mastoapi.account_search(handle)
if len(accounts) != 1:
pass # Ambiguous search
else:
account_id = accounts[0]['id']
num_toots = accounts[0]['statuses_count']
if num_toots < 3200:
my_range = int((num_toots/200)+1)
else:
my_range = 17
for x in range(my_range)[1:]:
source_toots_iter, max_id = grab_toots(mastoapi,account_id, max_id=max_id)
source_toots += source_toots_iter
print("{0} toots found from {1}".format(len(source_toots), handle))
if len(source_toots) == 0:
print("Error fetching toots for %s. Aborting." % handle)
sys.exit()
source_statuses += source_toots
if len(source_statuses) == 0:
print("No statuses found!")
sys.exit()
mine = markov.MarkovChainer(order)
for tweet in source_tweets:
if not re.search('([\.\!\?\"\']$)', tweet):
tweet += "."
mine.add_text(tweet)

for status in source_statuses:
if not re.search('([\.\!\?\"\']$)', status):
status += "."
mine.add_text(status)
for x in range(0, 10):
ebook_tweet = mine.generate_sentence()
ebook_status = mine.generate_sentence()

# randomly drop the last word, as Horse_ebooks appears to do.
if random.randint(0, 4) == 0 and re.search(r'(in|to|from|for|with|by|our|of|your|around|under|beyond)\s\w+$', ebook_tweet) is not None:
if random.randint(0, 4) == 0 and re.search(r'(in|to|from|for|with|by|our|of|your|around|under|beyond)\s\w+$', ebook_status) is not None:
print("Losing last word randomly")
ebook_tweet = re.sub(r'\s\w+.$', '', ebook_tweet)
print(ebook_tweet)
ebook_status = re.sub(r'\s\w+.$', '', ebook_status)
print(ebook_status)

# if a tweet is very short, this will randomly add a second sentence to it.
if ebook_tweet is not None and len(ebook_tweet) < 40:
if ebook_status is not None and len(ebook_status) < 40:
rando = random.randint(0, 10)
if rando == 0 or rando == 7:
print("Short tweet. Adding another sentence randomly")
newer_tweet = mine.generate_sentence()
if newer_tweet is not None:
ebook_tweet += " " + mine.generate_sentence()
newer_status = mine.generate_sentence()
if newer_status is not None:
ebook_status += " " + mine.generate_sentence()
else:
ebook_tweet = ebook_tweet
ebook_status = ebook_status
elif rando == 1:
# say something crazy/prophetic in all caps
print("ALL THE THINGS")
ebook_tweet = ebook_tweet.upper()
ebook_status = ebook_status.upper()

# throw out tweets that match anything from the source account.
if ebook_tweet is not None and len(ebook_tweet) < 110:
for tweet in source_tweets:
if ebook_tweet[:-1] not in tweet:
if ebook_status is not None and len(ebook_status) < 210:
for status in source_statuses:
if ebook_status[:-1] not in status:
continue
else:
print("TOO SIMILAR: " + ebook_tweet)
print("TOO SIMILAR: " + ebook_status)
sys.exit()

if not DEBUG:
status = api.PostUpdate(ebook_tweet)
print(status.text.encode('utf-8'))
else:
print(ebook_tweet)

elif not ebook_tweet:
print("Tweet is empty, sorry.")
if ENABLE_TWITTER_POSTING:
status = api.PostUpdate(ebook_status)
if ENABLE_MASTODON_POSTING:
status = mastoapi.toot(ebook_status)
print(ebook_status)

elif not ebook_status:
print("Status is empty, sorry.")
else:
print("TOO LONG: " + ebook_tweet)
print("TOO LONG: " + ebook_status)
17 changes: 14 additions & 3 deletions local_settings_example.py
@@ -2,14 +2,25 @@
Local Settings for a heroku_ebooks account.
'''

# Twitter API configuration
# Configuration for Twitter API
ENABLE_TWITTER_SOURCES = True # Fetch twitter statuses as source
ENABLE_TWITTER_POSTING = True # Tweet resulting status?
MY_CONSUMER_KEY = 'Your Twitter API Consumer Key'
MY_CONSUMER_SECRET = 'Your Consumer Secret Key'
MY_ACCESS_TOKEN_KEY = 'Your Twitter API Access Token Key'
MY_ACCESS_TOKEN_SECRET = 'Your Access Token Secret'

# Sources (Twitter, local text file or a web page)
# Configuration for Mastodon API
ENABLE_MASTODON_SOURCES = False # Fetch mastodon statuses as a source?
ENABLE_MASTODON_POSTING = False # Toot resulting status?
MASTODON_API_BASE_URL = "" # an instance url like https://botsin.space
CLIENT_CRED_FILENAME = '' # the MASTODON client secret file you created for this project
USER_ACCESS_FILENAME = '' # The MASTODON user credential file you created at installation.

# Sources (Twitter, Mastodon, local text file or a web page)
SOURCE_ACCOUNTS = [""] # A list of comma-separated, quote-enclosed Twitter handles of account that you'll generate tweets based on. It should look like ["account1", "account2"]. If you want just one account, no comma needed.
TWITTER_SOURCE_ACCOUNTS = [""] # A list of comma-separated, quote-enclosed Twitter handles of accounts that you'll generate tweets based on. It should look like ["account1", "account2"]. If you want just one account, no comma needed.
MASTODON_SOURCE_ACCOUNTS = [""] # A list, e.g. ["@[email protected]"]
SOURCE_EXCLUDE = r'^$' # Source tweets that match this regexp will not be added to the Markov chain. You might want to filter out inappropriate words for example.
STATIC_TEST = False # Set this to True if you want to test Markov generation from a static file instead of the API.
TEST_SOURCE = ".txt" # The name of a text file of a string-ified list for testing. To avoid unnecessarily hitting Twitter API. You can use the included testcorpus.txt, if needed.
@@ -22,4 +33,4 @@
ORDER = 2 # How closely do you want this to hew to sensical? 2 is low and 4 is high.

DEBUG = True # Set this to False to start Tweeting live
TWEET_ACCOUNT = "" # The name of the account you're tweeting to.
1 change: 1 addition & 0 deletions requirements.txt
@@ -1,2 +1,3 @@
python-twitter
Mastodon.py
beautifulsoup4
