From 5f83f3aa5b9a0734915429c9ccee83fea46035f2 Mon Sep 17 00:00:00 2001
From: Christophe Simonis
Date: Thu, 6 Jun 2024 14:40:36 +0200
Subject: [PATCH] [IMP] snippets.convert_html_columns: a batch processing story
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

TLDR: RTFM

Once upon a time, in a countryside farm in Belgium...

At first, the upgrade of databases was straightforward. But, as time
passed, the size of the databases grew, and some CPU-intensive
computations took so much time that a solution needed to be found.
Fortunately, the Python standard library has the perfect module for
this task: `concurrent.futures`.

Then, Python 3.10 appeared, and `ProcessPoolExecutor` started to
sometimes hang for no apparent reason. Soon, our hero found out he
wasn't the only one to suffer from this issue[^1]. Unfortunately, the
proposed solution looked overkill. Still, it revealed that the issue
had already been known[^2] for a few years. Although an official patch
wasn't ready to be committed, the discussion about its legitimacy[^3]
led our hero to a nicer solution.

By default, `ProcessPoolExecutor.map` submits elements one by one to
the pool: with the default *chunksize* of 1, every element becomes a
separate task. This is pretty inefficient when there are a lot of
elements to process. This can be changed by using a large value for
the *chunksize* argument. Who would have thought that a bigger chunk
size would solve a performance issue? As always, the answer was in the
documentation[^4].
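A minimal sketch of the idea, outside of the upgrade scripts. `convert`,
`chunks` and `convert_all` are hypothetical names for illustration:
`convert` stands in for the real `Convertor` instance, and synthetic rows
replace `cr.fetchall()`. `chunks` only approximates the batching that
`ProcessPoolExecutor.map` performs internally; it is not the standard
library's code. Requires Python 3.8+ (walrus operator).

```python
import itertools
from concurrent.futures import ProcessPoolExecutor


def convert(row):
    # Hypothetical stand-in for the real `Convertor` instance: it rewrites one
    # row's HTML. It lives at module level so worker processes can unpickle it.
    row_id, html = row
    html = html.replace("<b>", "<strong>").replace("</b>", "</strong>")
    return {"id": row_id, "html": html}


def chunks(items, chunksize):
    # Approximation of the batching ProcessPoolExecutor.map does: the inputs
    # are grouped into tuples of at most `chunksize` elements, and each tuple
    # is sent to a worker as ONE task.
    it = iter(items)
    while chunk := tuple(itertools.islice(it, chunksize)):
        yield chunk


def convert_all(rows, chunksize=1000):
    with ProcessPoolExecutor() as executor:
        # Results still come back one per input row, in input order;
        # only the transfer to the workers is batched.
        return list(executor.map(convert, rows, chunksize=chunksize))


if __name__ == "__main__":
    rows = [(i, f"<p><b>row {i}</b></p>") for i in range(10_000)]
    print(sum(1 for _ in chunks(rows, 1)))     # 10000 tasks (default chunksize=1)
    print(sum(1 for _ in chunks(rows, 1000)))  # 10 tasks
    results = convert_all(rows)
    print(len(results), results[0])
```

The trade-off: bigger chunks mean far fewer tasks to pickle, queue and
track, at the cost of coarser load balancing between workers.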
[^1]: https://stackoverflow.com/questions/74633896/processpoolexecutor-using-map-hang-on-large-load
[^2]: https://github.com/python/cpython/issues/74028
[^3]: https://github.com/python/cpython/pull/114975#pullrequestreview-1867070041
[^4]: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map

closes odoo/upgrade-util#94

Signed-off-by: Nicolas Seinlet (nse)
---
 src/util/snippets.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/snippets.py b/src/util/snippets.py
index a12ed7b2..6f100c5a 100644
--- a/src/util/snippets.py
+++ b/src/util/snippets.py
@@ -279,7 +279,7 @@ def convert_html_columns(cr, table, columns, converter_callback, where_column="I
         convert = Convertor(converters, converter_callback)
         for query in util.log_progress(split_queries, logger=_logger, qualifier=f"{table} updates"):
             cr.execute(query)
-            for data in executor.map(convert, cr.fetchall()):
+            for data in executor.map(convert, cr.fetchall(), chunksize=1000):
                 if "id" in data:
                     cr.execute(update_query, data)