Refactor project to use single problems page

Remove PhantomJS / JavaScript code
Simplify and rename Python script
Update README
jxu · Jul 28, 2020 · commit 3d381a0
1 parent 474b57d

Showing 4 changed files with 61 additions and 157 deletions.
diff --git a/README.md b/README.md
@@ -1,26 +1,27 @@
 Project Euler Offline
 =====================
-All Project Euler problems, with MathJax and images, as a single PDF. Additional text files are provided. Get the releases [here](https://github.com/wxv/project-euler-offline/releases).
+All Project Euler problems, with MathJax and images, as a single PDF. Additional text files are provided. [Get the releases here.](https://github.com/wxv/project-euler-offline/releases)
 
 Please report any inaccuracies or give feedback. Thanks.
 
 Inspired by [Kyle Keen's original Local Euler](http://kmkeen.com/local-euler/2008-07-16-07-33-00.html).
 
 Installation and Usage
 ----------------------
+
+Note: previously, PhantomJS was used to download each problem individually as a PDF, and PyPDF2 was used to combine them into a single file.
+
+Now, use "Print to File" on https://projecteuler.net/show=all in Firefox (with Headers and Footers disabled in the print options). This is simpler, produces a smaller PDF, and does not rely on the discontinued PhantomJS. The Python script that downloads extra files is functionally unchanged.
+
 Requirements:
-- PhantomJS (`apt install phantomjs`)
-- Node modules system, webpage (`npm install system webpage`)
-- Python 3 and  PyPDF2, BeautifulSoup, lxml, Pillow (`pip install beautifulsoup4 lxml pypdf2 pillow`)
+- Python 3 and BeautifulSoup, lxml, Pillow, requests (`pip install beautifulsoup4 lxml pillow requests`)
   - BeautifulSoup and Pillow are only required for downloading extra text and images (animated GIF only).
 
-My usage process (replace 1 and 628 with whatever range you like):
+My usage process:
 
-    phantomjs capture.js 1 628
-    python3 combine.py 1 628
-    // Optional: download solutions from https://github.com/luckytoilet/projecteuler-solutions
+    mkdir render
+    # Save render/problems.pdf with Firefox as above
+    python3 download_extra.py
     cd render
-    zip problems problems.pdf *.txt *.gif
+    zip problems.zip problems.pdf *.txt *.gif
 
-Since each page is independent, it is possible to run multiple processes of
-`capture.js` at once, each downloading a certain range.
diff --git a/capture.js b/capture.js
diff --git a/combine.py b/combine.py
diff --git a/download_extra.py b/download_extra.py
@@ -0,0 +1,49 @@
+import sys
+from os import sep
+# Not async for now to keep rate of requests low
+from bs4 import BeautifulSoup
+import requests
+from os.path import basename
+from PIL import Image
+from io import BytesIO
+
+
+RENDER_DIR = "render"
+SITE_MAIN = "https://projecteuler.net/"
+
+
+def download_extra(url):
+    """Finds if available a .txt attachment or animated .gif and downloads it
+    to RENDER_DIR
+    """
+    content = requests.get(url).content
+    soup = BeautifulSoup(content, "lxml")
+    for a in soup.find_all('a', href=True):
+        href = a["href"]
+        if href.endswith(".txt"):
+            print("Writing", href)
+            r = requests.get(SITE_MAIN + href)
+            with open(RENDER_DIR + sep + basename(href), 'wb') as f:
+                f.write(r.content)
+
+    for img in soup.find_all("img"):
+        img_src = img.get("src", "")
+
+        # Skip images without a src, non-GIFs and spacer.gif
+        if not img_src.endswith(".gif") or img_src.endswith("spacer.gif"):
+            continue
+
+        r = requests.get(SITE_MAIN + img_src)
+        # Only write animated GIFs
+        if getattr(Image.open(BytesIO(r.content)), "is_animated", False):
+            print("Writing", img_src)
+            with open(RENDER_DIR + sep + basename(img_src), 'wb') as f:
+                f.write(r.content)
+
+
+def main():
+    download_extra("https://projecteuler.net/show=all")
+
+
+if __name__ == "__main__":
+    main()
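
The link and image selection in `download_extra` can be exercised without touching the network. The following is a minimal sketch, not part of this commit: it uses only the standard library (`html.parser` instead of BeautifulSoup), and the names `ExtraLinkParser`, `find_extra_urls`, and the sample paths are hypothetical. It also assumes that resolving each `href`/`src` with `urljoin` is acceptable in place of the string concatenation used in the script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

SITE_MAIN = "https://projecteuler.net/"


class ExtraLinkParser(HTMLParser):
    """Hypothetical helper: collects .txt links and candidate .gif images."""

    def __init__(self):
        super().__init__()
        self.txt_urls = []
        self.gif_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            href = attrs.get("href")
            if href and href.endswith(".txt"):
                # urljoin handles both relative and root-relative paths
                self.txt_urls.append(urljoin(SITE_MAIN, href))
        elif tag == "img":
            src = attrs.get("src")
            # Skip images without a src, non-GIFs and spacer.gif
            if src and src.endswith(".gif") and not src.endswith("spacer.gif"):
                self.gif_urls.append(urljoin(SITE_MAIN, src))


def find_extra_urls(html):
    """Return (txt_urls, gif_urls) found in an HTML string."""
    parser = ExtraLinkParser()
    parser.feed(html)
    return parser.txt_urls, parser.gif_urls


# Made-up sample markup; the paths are illustrative only.
sample = (
    '<a href="project/resources/example_words.txt">words</a>'
    '<a href="about">About</a>'
    '<img src="images/spacer.gif">'
    '<img src="project/images/example_anim.gif">'
    '<img alt="no source">'
)
txt_urls, gif_urls = find_extra_urls(sample)
print(txt_urls)
print(gif_urls)
```

The GIF URLs returned here are only candidates: the script still has to download each one and check whether it is animated before writing it to `render`.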