Refactor project to use single problems page
Remove PhantomJS / javascript code
Simplify and rename python script
Update README
jxu committed Jul 28, 2020
1 parent 474b57d commit 3d381a0
Showing 4 changed files with 61 additions and 157 deletions.
23 changes: 12 additions & 11 deletions README.md
@@ -1,26 +1,27 @@
Project Euler Offline
=====================
All Project Euler problems, with MathJax and images, as a single PDF. Additional text files are provided. Get the releases [here](https://github.com/wxv/project-euler-offline/releases).
All Project Euler problems, with MathJax and images, as a single PDF. Additional text files are provided. [Get the releases here.](https://github.com/wxv/project-euler-offline/releases)

Please report any inaccuracies or give feedback. Thanks.

Inspired by [Kyle Keen's original Local Euler](http://kmkeen.com/local-euler/2008-07-16-07-33-00.html).

Installation and Usage
----------------------

Note: PhantomJS was previously used to download each problem individually as a PDF, and PyPDF2 was used to combine all problems into one file.

Now, use "Print to File" on https://projecteuler.net/show=all in Firefox (with Headers and Footers turned off in the print options). This is simpler, produces a smaller PDF, and does not rely on the discontinued PhantomJS. The Python script that downloads the extra files is functionally unchanged.

Requirements:
- PhantomJS (`apt install phantomjs`)
- Node modules system, webpage (`npm install system webpage`)
- Python 3 and PyPDF2, BeautifulSoup, lxml, Pillow (`pip install beautifulsoup4 lxml pypdf2 pillow`)
- Python 3 and BeautifulSoup, lxml, Pillow (`pip install beautifulsoup4 lxml pillow`)
- BeautifulSoup and Pillow are only required for downloading extra text and images (animated GIF only).

My usage process (replace 1 and 628 with whatever range you like):
My usage process:

phantomjs capture.js 1 628
python3 combine.py 1 628
// Optional: download solutions from https://github.com/luckytoilet/projecteuler-solutions
mkdir render
# Save render/problems.pdf with Firefox as above
python3 download_extra.py
cd render
zip problems problems.pdf *.txt *.gif
zip problems.zip problems.pdf *.txt *.gif

Since each page is independent, it is possible to run multiple processes of
`capture.js` at once, each downloading a certain range.
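The final packaging step of the usage process above could also be done from Python instead of the `zip` command. This is a minimal sketch, assuming the `render` directory layout described in the README; the helper name `package_render` is hypothetical and not part of this commit.

```python
import zipfile
from pathlib import Path


def package_render(render_dir="render", archive_name="problems.zip"):
    """Zip problems.pdf plus any .txt and .gif files found in render_dir.

    Hypothetical helper, not part of the commit; it mirrors
    `zip problems.zip problems.pdf *.txt *.gif` run inside render/.
    """
    render = Path(render_dir)
    members = [render / "problems.pdf"]
    members += sorted(render.glob("*.txt")) + sorted(render.glob("*.gif"))

    archive_path = render / archive_name
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in members:
            if path.is_file():
                # Store files without the render/ prefix, as `cd render` would.
                zf.write(path, arcname=path.name)
    return archive_path
```

Calling `package_render()` after `download_extra.py` has run would write `render/problems.zip`; missing files are skipped rather than raising an error, which differs slightly from the shell command.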
78 changes: 0 additions & 78 deletions capture.js

This file was deleted.

68 changes: 0 additions & 68 deletions combine.py

This file was deleted.

49 changes: 49 additions & 0 deletions download_extra.py
@@ -0,0 +1,49 @@
import sys
from os import sep
# Not async for now to keep rate of requests low
from bs4 import BeautifulSoup
import requests
from os.path import basename
from PIL import Image
from io import BytesIO


RENDER_DIR = "render"
SITE_MAIN = "https://projecteuler.net/"


def download_extra(url):
    """Finds if available a .txt attachment or animated .gif and downloads it
    to RENDER_DIR
    """
    content = requests.get(url).content
    soup = BeautifulSoup(content, "lxml")
    for a in soup.find_all('a', href=True):
        href = a["href"]
        if href.endswith(".txt"):
            print("Writing", href)
            r = requests.get(SITE_MAIN + href)
            with open(RENDER_DIR + sep + basename(href), 'wb') as f:
                f.write(r.content)

    for img in soup.find_all("img"):
        img_src = img["src"]

        # Skip non-GIFs and spacer.gif
        if not img_src.endswith(".gif") or img_src.endswith("spacer.gif"):
            continue

        r = requests.get(SITE_MAIN + img_src)
        # Only write animated GIFs
        if Image.open(BytesIO(r.content)).is_animated:
            print("Writing", img_src)
            with open(RENDER_DIR + sep + basename(img_src), 'wb') as f:
                f.write(r.content)


def main():
    download_extra("https://projecteuler.net/show=all")


if __name__ == "__main__":
    main()
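
`download_extra` builds request URLs with `SITE_MAIN + href`, which works as long as every link on the page is site-relative and has no leading slash. A possible more defensive variant, sketched here with the standard library's `urllib.parse.urljoin`, also pulls the GIF filter out as a pure function. The names `resolve` and `is_candidate_gif` are hypothetical and not part of this commit, and the paths in the usage lines are made-up examples.

```python
from urllib.parse import urljoin

SITE_MAIN = "https://projecteuler.net/"


def resolve(href, base=SITE_MAIN):
    """Return an absolute URL for href (hypothetical helper).

    Relative links are resolved against base; absolute links pass through.
    """
    return urljoin(base, href)


def is_candidate_gif(img_src):
    """Mirror the script's filter: keep .gif sources except spacer.gif."""
    return img_src.endswith(".gif") and not img_src.endswith("spacer.gif")


# Example paths below are illustrative, not real site resources.
print(resolve("resources/example.txt"))
print(resolve("https://example.com/a.txt"))
print(is_candidate_gif("images/spacer.gif"))
```

Whether the extra robustness is worth it depends on how the site's markup evolves; the concatenation in the commit is adequate for links of the form the script currently sees.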
