puppeteer-ruby freezes with multiple threads. #340

Open
ronakjain90 opened this issue Oct 30, 2024 · 3 comments

Comments

@ronakjain90

Step To Reproduce / Observed behavior

Hi @YusukeIwaki, thanks for this open-source project. We are currently using it in production, and I have noticed the following issue.

When running in parallel threads, puppeteer-ruby (with Chrome or Firefox) times out very frequently:

  1. above a certain thread count
  2. after processing a certain number of requests
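To make the second symptom measurable, one option is to time each batch and keep a running count of requests processed so far. The sketch below is a hypothetical helper (`BatchTimer` is not part of puppeteer-ruby); it uses only Ruby's monotonic clock and knows nothing about the browser.

```ruby
# Hypothetical helper: times each batch with a monotonic clock and keeps a
# running total of requests, so a slowdown after N requests shows up as a
# rising per-batch duration.
class BatchTimer
  Sample = Struct.new(:batch, :requests_so_far, :seconds)

  attr_reader :samples

  def initialize
    @samples = []
    @total_requests = 0
  end

  # Runs the block once, records how long it took, and returns its value.
  # +requests+ is the number of requests the batch issued.
  def measure(requests)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @total_requests += requests
    @samples << Sample.new(@samples.size + 1, @total_requests, elapsed)
    result
  end

  # One human-readable line per recorded batch.
  def report
    @samples.map do |s|
      format('batch %d: %d requests so far, %.2fs', s.batch, s.requests_so_far, s.seconds)
    end
  end
end
```

For example, wrapping each run of the reproduction script below as `timer.measure(THREAD_COUNT) { make_threads(browser_instance) }` and printing `timer.report` at the end would show whether per-batch time grows with the total request count.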

Puppeteer for Node.js does not have the same issue. I created a small script that does exactly the same work in both puppeteer-ruby and Node.js Puppeteer. The Ruby version:

```ruby
require 'base64'
require 'puppeteer'
require 'benchmark-memory'

# Launch one shared browser process; every worker thread connects to it
# over its WebSocket endpoint.
browser_instance = Puppeteer.launch(
  product: 'firefox',
  channel: 'firefox',
  headless: true,
  executable_path: 'Firefox Nightly.app/Contents/MacOS/firefox',
)

puts browser_instance.ws_endpoint

THREAD_COUNT = 20

# One unit of work: connect, open an incognito context and a page, render
# static HTML, take a screenshot, then disconnect (the shared browser
# process stays alive).
def create_thread(browser_instance, i)
  browser = Puppeteer.connect(browser_ws_endpoint: browser_instance.ws_endpoint)
  context = browser.create_incognito_browser_context
  page = context.new_page
  page.set_content("<html> <body> <h1> ok</h1></body></html>")
  Base64.strict_encode64(page.screenshot)
  puts "processed from thread #{i}"
  # page.close
  # context.close
  browser.disconnect
  true
rescue Puppeteer::TimeoutError => e
  puts "NAVIGATION_TIMEOUT: #{e}"
end

# Start exactly THREAD_COUNT threads and wait for all of them to finish.
def make_threads(browser_instance)
  threads = Array.new(THREAD_COUNT) do |i|
    Thread.new { create_thread(browser_instance, i) }
  end

  threads.each(&:join)
end

# Seven identical runs, reported and compared by benchmark-memory.
Benchmark.memory do |x|
  %w[1st 2nd 3rd 4th 5th 6th 7th].each_with_index do |ordinal, n|
    x.report("try #{n + 1}") do
      puts "#{ordinal} Run"
      make_threads(browser_instance)
      nil
    end
  end

  x.compare!
end
```
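In this script, `make_threads` calls `join` on every thread without a time limit, so a single hung thread blocks the whole run. As a diagnostic, a hypothetical `join_with_deadline` helper can rely on `Thread#join(limit)`, which returns `nil` when the thread is still alive after `limit` seconds, to report which workers are stuck instead of freezing. It only reports them; it does not kill or repair them.

```ruby
# Hypothetical helper: joins every thread against one shared deadline and
# returns the indexes of threads still running when the deadline passed.
def join_with_deadline(threads, timeout_seconds)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_seconds
  stuck = []
  threads.each_with_index do |thread, i|
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # Thread#join(limit) returns nil if the thread is still alive after
    # +limit+ seconds; a limit of 0 checks without waiting. If the thread
    # died with an exception, join re-raises it here.
    stuck << i if thread.join([remaining, 0].max).nil?
  end
  stuck
end
```

Inside `make_threads`, the final join could become `stuck = join_with_deadline(threads, 60)`, printing `stuck` when it is non-empty; the 60-second budget is an arbitrary example value.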

The same steps with Puppeteer for Node.js:

```javascript
const puppeteer = require('puppeteer-core');

(async () => {
  const THREAD_SIZE = 100;

  const browser = await puppeteer.launch({
    browser: 'firefox',
    product: 'firefox',
    headless: true,
    executablePath: 'Firefox Nightly.app/Contents/MacOS/firefox',
    protocol: 'cdp', // 'webDriverBiDi',
  });

  console.log(browser.wsEndpoint());

  // One unit of work: connect to the shared browser, open a page in a new
  // browser context, render static HTML, screenshot it, then disconnect.
  const createScreenshot = async (i) => {
    const browserInstance = await puppeteer.connect({ browserWSEndpoint: browser.wsEndpoint() });
    const context = await browserInstance.createBrowserContext();
    const page = await context.newPage();
    await page.setContent("<html> <body> <h1> ok</h1></body></html>");
    await page.screenshot({ path: `screenshot_${i}.jpg` });
    // await context.close();
    await browserInstance.disconnect();
    return i;
  };

  // Three rounds of THREAD_SIZE concurrent screenshots each.
  for (let round = 0; round < 3; round++) {
    const tasks = Array.from({ length: THREAD_SIZE }, (_, i) =>
      createScreenshot(round * THREAD_SIZE + i));
    const result = await Promise.all(tasks);
    console.log(result);
    console.log(`Completed ${THREAD_SIZE} Screenshots`);
  }

  console.log("DONE!");
})();
```

The Node.js and Ruby scripts follow the same workflow so they can be compared directly. Even so, puppeteer-ruby hangs when THREAD_COUNT is set to 50 or 100, and does not complete the whole script when THREAD_COUNT is set to 25. I also notice degraded performance beyond 100 runs.
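Since the hang appears as THREAD_COUNT grows, one mitigation worth trying (not confirmed to address the root cause here) is to cap how many connections are open at once while still processing the same total number of requests. A minimal sketch, assuming a fixed-size pool of Ruby threads pulling jobs from a `Queue`; `run_bounded` is a hypothetical name, not a puppeteer-ruby API.

```ruby
# Hypothetical helper: processes +items+ with at most +concurrency+ worker
# threads alive at once and returns the worker's results in input order.
def run_bounded(items, concurrency, &worker)
  raise ArgumentError, 'concurrency must be positive' unless concurrency.positive?

  queue = Queue.new
  items.each_with_index { |item, index| queue << [index, item] }
  queue.close # once drained, Queue#pop returns nil and workers exit

  results = Array.new(items.size)
  lock = Mutex.new

  workers = Array.new([concurrency, items.size].min) do
    Thread.new do
      while (job = queue.pop)
        index, item = job
        value = worker.call(item)
        lock.synchronize { results[index] = value }
      end
    end
  end

  # Thread#join re-raises any exception that escaped a worker.
  workers.each(&:join)
  results
end
```

For instance, `run_bounded((0...100).to_a, 10) { |i| create_thread(browser_instance, i) }` would run 100 screenshot jobs with at most 10 concurrent connections; the limit of 10 is an arbitrary example value.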

Expected behavior

puppeteer-ruby should not freeze

Environment

Ubuntu 22 / macOS

Output of `ruby --version`: ruby-3.3.4

@ronakjain90 changed the title from "Firefox Parallel requests." to "puppeteer-ruby freezes with multiple threads." on Oct 30, 2024
@bufordtaylor

Just chiming in here. I found that `browser.disconnect` leaves the browser instance running, whereas `browser.close` closes the application entirely. Might help.

@ronakjain90 (author)

Actually, that's intended. I don't want to open and close the browser application for every screenshot.
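If the browser process is meant to stay up, what is left to release per request is whatever the request created inside it. In the reproduction script, `page.close` and `context.close` are commented out, and `browser.disconnect` is skipped whenever a `Puppeteer::TimeoutError` is raised. The sketch below wraps one request in `ensure` so cleanup runs on every path. `with_page` is a hypothetical helper, and it assumes only the `create_incognito_browser_context`, `new_page`, `close`, and `disconnect` calls already used in the script; whether the unclosed contexts contribute to the slowdown is not established in this thread.

```ruby
# Hypothetical helper: runs one request against a shared browser process and
# always releases the page, the context, and the connection it created,
# whether the block returns normally or raises.
# +connect+ is any callable that returns a connected browser object.
def with_page(connect)
  browser = connect.call
  context = nil
  page = nil
  begin
    context = browser.create_incognito_browser_context
    page = context.new_page
    yield page
  ensure
    # Close in reverse order of creation. The rescue modifiers swallow
    # StandardError so a failed close does not skip the remaining steps.
    page&.close rescue nil
    context&.close rescue nil
    browser.disconnect
  end
end
```

In `create_thread`, the body could become `with_page(-> { Puppeteer.connect(browser_ws_endpoint: browser_instance.ws_endpoint) }) { |page| ... }`, with the `set_content` and `screenshot` steps inside the block.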

@ryanckulp

Friendly nudge for @YusukeIwaki: happy to do another sponsorship if it helps with this upgrade.

Labels: None yet
Projects: None yet
Development: No branches or pull requests
3 participants