Performance Improvement for Writing Large Datasets #6

Open

michaelforsterivoflow opened this issue Jan 16, 2025 · 10 comments

@michaelforsterivoflow

Hi synchronal,

First of all, I want to express my sincere appreciation for the work you have done on Exceed. The library is incredibly powerful and offers a unique set of features that make it stand out in the field. It has been a valuable tool for our projects, and we are truly grateful for your efforts.

However, we have encountered a performance issue when writing large datasets. While Exceed excels in many areas, it becomes quite slow when processing large amounts of data. In comparison, Elixlsx handles the same task much more efficiently, which has led us to consider using it as an alternative in scenarios where performance is critical.

We understand that optimizing performance for large datasets can be a complex task, but we believe that this improvement would significantly enhance the usability of Exceed in a broader range of applications.

Steps to Reproduce:
Create a dataset with 100,000 records, each with 10 columns.
Write this dataset to an Excel file using Exceed.
Observe the time it takes to complete the operation.

Expected Behavior: The operation should complete in a reasonable amount of time, comparable to Elixlsx.

Actual Behavior: The operation takes significantly longer than expected, making it impractical for real-world use cases with large datasets.

Exceed log:
[info] Batch size 1 completed in 14ms, rate: 71.43 rows/sec
[info] Benchmarking with batch size: 10
[info] Batch size 10 completed in 8ms, rate: 1250.0 rows/sec
[info] Benchmarking with batch size: 100
[info] Batch size 100 completed in 64ms, rate: 1562.5 rows/sec
[info] Benchmarking with batch size: 1000
[info] Batch size 1000 completed in 652ms, rate: 1533.74 rows/sec
[info] Benchmarking with batch size: 10000
[info] Batch size 10000 completed in 6401ms, rate: 1562.26 rows/sec
[info] Benchmarking with batch size: 20000
[info] Batch size 20000 completed in 12949ms, rate: 1544.52 rows/sec
[
{1, 14, 71.43},
{10, 8, 1250.0},
{100, 64, 1562.5},
{1000, 652, 1533.74},
{10000, 6401, 1562.26},
{20000, 12949, 1544.52}
]

Elixlsx log:
[info] Processed 684771 rows at 5939.81 rows/sec

Best regards,

Michal Forster

@sax
Member

sax commented Jan 16, 2025

Hi Michal,

Thank you so much for the positive feedback!

An option was recently added to the stream! function to disable internal buffering/chunking, which we saw was slowing down processing on a project we've been working on recently. If you set buffer: false, does that solve your problem?

I honestly don't remember why buffering was added in the first place, but maybe something changed further down the stack in Stream or in zip that we are running into by chunking the output.

If that doesn't solve the problem for you, let me know. If it does, I'll release a new version with buffering defaulting to false, so that people can have a better default experience.
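For reference, a minimal sketch of passing the option through (the worksheet contents here are illustrative; rows stands in for any enumerable of rows):

Exceed.Workbook.new("Creator Name")
|> Exceed.Workbook.add_worksheet(Exceed.Worksheet.new("Sheet Name", ["A", "B"], rows))
|> Exceed.stream!(buffer: false)
|> Stream.into(File.stream!("workbook.xlsx"))
|> Stream.run()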

@sax
Member

sax commented Jan 16, 2025

Actually, before you retest: I have a fix that seems to run even faster than disabling the buffering. I'm on my phone at the moment, but I should be able to push it later today once my laptop has an internet connection.

@michaelforsterivoflow
Author

Hey @sax,
sounds great!

I retested with buffer: false. With only 2 columns it was great, but when I added 10 and then 17 columns it slowed down again!

2 columns
12:32:49.588 [info] Benchmarking with batch size: 1
12:32:49.620 [info] Batch size 1 completed in 32ms, rate: 31.25 rows/sec
12:32:49.620 [info] Benchmarking with batch size: 10
12:32:49.622 [info] Batch size 10 completed in 2ms, rate: 5.0e3 rows/sec
12:32:49.622 [info] Benchmarking with batch size: 100
12:32:49.629 [info] Batch size 100 completed in 7ms, rate: 14285.71 rows/sec
12:32:49.629 [info] Benchmarking with batch size: 1000
12:32:49.682 [info] Batch size 1000 completed in 53ms, rate: 18867.92 rows/sec
12:32:49.682 [info] Benchmarking with batch size: 10000
12:32:50.103 [info] Batch size 10000 completed in 420ms, rate: 23809.52 rows/sec
12:32:50.103 [info] Benchmarking with batch size: 20000
12:32:51.147 [info] Batch size 20000 completed in 1044ms, rate: 19157.09 rows/sec
12:32:51.147 [info] Benchmarking with batch size: 50000
12:32:53.306 [info] Batch size 50000 completed in 2160ms, rate: 23148.15 rows/sec
12:32:53.306 [info] Benchmarking with batch size: 100000
12:32:57.484 [info] Batch size 100000 completed in 4177ms, rate: 23940.63 rows/sec
12:32:57.484 [info] Benchmarking with batch size: 200000
12:33:05.900 [info] Batch size 200000 completed in 8416ms, rate: 23764.26 rows/sec
12:33:05.900 [info] Benchmarking with batch size: 500000
12:33:27.450 [info] Batch size 500000 completed in 21551ms, rate: 23200.78 rows/sec
12:33:27.450 [info] Benchmarking with batch size: 1000000
12:34:09.111 [info] Batch size 1000000 completed in 41660ms, rate: 24003.84 rows/sec
[
{1, 32, 31.25},
{10, 2, 5000.0},
{100, 7, 14285.71},
{1000, 53, 18867.92},
{10000, 420, 23809.52},
{20000, 1044, 19157.09},
{50000, 2160, 23148.15},
{100000, 4177, 23940.63},
{200000, 8416, 23764.26},
{500000, 21551, 23200.78},
{1000000, 41660, 24003.84}
]

17 columns
12:35:09.842 [info] Benchmarking with batch size: 1
12:35:09.848 [info] Batch size 1 completed in 5ms, rate: 200.0 rows/sec
12:35:09.848 [info] Benchmarking with batch size: 10
12:35:09.852 [info] Batch size 10 completed in 4ms, rate: 2.5e3 rows/sec
12:35:09.852 [info] Benchmarking with batch size: 100
12:35:09.873 [info] Batch size 100 completed in 22ms, rate: 4545.45 rows/sec
12:35:09.873 [info] Benchmarking with batch size: 1000
12:35:10.069 [info] Batch size 1000 completed in 196ms, rate: 5102.04 rows/sec
12:35:10.069 [info] Benchmarking with batch size: 10000
12:35:12.135 [info] Batch size 10000 completed in 2066ms, rate: 4840.27 rows/sec
12:35:12.135 [info] Benchmarking with batch size: 20000
12:35:16.724 [info] Batch size 20000 completed in 4589ms, rate: 4358.25 rows/sec
12:35:16.724 [info] Benchmarking with batch size: 50000
12:35:26.467 [info] Batch size 50000 completed in 9743ms, rate: 5131.89 rows/sec
12:35:26.467 [info] Benchmarking with batch size: 100000
12:35:45.365 [info] Batch size 100000 completed in 18898ms, rate: 5291.57 rows/sec
12:35:45.365 [info] Benchmarking with batch size: 200000
12:36:23.825 [info] Batch size 200000 completed in 38459ms, rate: 5200.34 rows/sec
12:36:23.825 [info] Benchmarking with batch size: 500000

Benchmark Code:

def benchmark_excel_generation(report_name) do
  batch_sizes = [1, 10, 100, 1000, 10_000, 20_000]

  Enum.map(batch_sizes, fn batch_size ->
    Logger.info("Benchmarking with batch size: #{batch_size}")

    start_time = System.monotonic_time(:millisecond)

    Repo.transaction(fn ->
      limit = batch_size + 1

      rows =
        Stream.unfold(1, fn
          ^limit -> nil
          row_count -> {for(c <- 1..17, do: "Row #{row_count} Cell #{c}"), row_count + 1}
        end)

      # Pass options as a keyword list
      opts = [buffer: false]

      Exceed.Workbook.new(report_name)
      |> Exceed.Workbook.add_worksheet(
        Exceed.Worksheet.new("Sheet Name", Enum.map(1..17, &"Heading #{&1}"), rows)
      )
      |> Exceed.stream!(opts)
      |> Stream.into(File.stream!("test_exceed_#{batch_size}.xlsx"))
      |> Stream.run()
    end)

    duration = System.monotonic_time(:millisecond) - start_time
    rate_per_row = Float.round(batch_size / (duration / 1000), 2)

    Logger.info("Batch size #{batch_size} completed in #{duration}ms, rate: #{rate_per_row} rows/sec")

    {batch_size, duration, rate_per_row}
  end)
end

@sax
Member

sax commented Jan 17, 2025

I finally got to a place where I could push my changes. Please try with the latest commit on main.

I combined some of your benchmark code with what I was using on my flight yesterday:

defmodule Test do
  require Logger

  def buffered(column_count \\ 10, row_count \\ 100_000) do
    file = File.stream!("/tmp/workbook.xlsx")
    headers = headers(column_count)
    stream = stream(column_count, row_count)

    benchmark(column_count, row_count, fn ->
      Exceed.Workbook.new("Creator Name")
      |> Exceed.Workbook.add_worksheet(Exceed.Worksheet.new("Sheet Name", headers, stream))
      |> Exceed.stream!()
      |> Stream.into(file)
      |> Stream.run()
    end)
  end

  def unbuffered(column_count \\ 10, row_count \\ 100_000) do
    file = File.stream!("/tmp/workbook.xlsx")
    headers = headers(column_count)
    stream = stream(column_count, row_count)

    benchmark(column_count, row_count, fn ->
      Exceed.Workbook.new("Creator Name")
      |> Exceed.Workbook.add_worksheet(Exceed.Worksheet.new("Sheet Name", headers, stream))
      |> Exceed.stream!(buffer: false)
      |> Stream.into(file)
      |> Stream.run()
    end)
  end

  defp headers(column_count) do
    Stream.iterate(1, &(&1 + 1))
    |> Stream.map(&"Header #{&1}")
    |> Enum.take(column_count)
  end

  defp benchmark(column_count, batch_size, fun) do
    {duration, _} = :timer.tc(fun, :millisecond)

    rate_per_row = Float.round(batch_size / (duration / 1_000), 2)
    Logger.info("Batch size #{column_count}*#{batch_size} completed in #{duration}ms, rate: #{rate_per_row} rows/sec")
  end

  def stream(column_count, row_count) do
    Stream.iterate(1, &(&1 + 1))
    |> Stream.chunk_every(column_count)
    |> Stream.take(row_count)
  end
end

I'm finding that with unbuffered output on my latest commit:

...> Test.unbuffered(10, 100_000)
Batch size 10*100000 completed in 9602ms, rate: 10414.5 rows/sec
Batch size 10*100000 completed in 9689ms, rate: 10320.98 rows/sec

...> Test.unbuffered(10, 1_000_000)
Batch size 10*1000000 completed in 98556ms, rate: 10146.52 rows/sec

With buffered output:

...> Test.buffered(10, 100_000)
Batch size 10*100000 completed in 2806ms, rate: 35637.92 rows/sec

...> Test.buffered(10, 1_000_000)
Batch size 10*1000000 completed in 28500ms, rate: 35087.72 rows/sec

So at the very least it's linear over row count.

However when I increase the column count it degrades:

...> Test.buffered(15, 100_000)
Batch size 15*100000 completed in 4212ms, rate: 23741.69 rows/sec

...> Test.buffered(20, 100_000)
Batch size 20*100000 completed in 5806ms, rate: 17223.56 rows/sec

I think because of the XML generation there will be some expected performance cost to increasing the column count, but it's surprising to me that there is such a dramatic impact. I'll see if I can get better tracing into where the slowdown is happening, whether it's in the XML generation or in the zipping.
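One rough way to bound where the time goes, without touching any Exceed internals, is to force the pipeline at different depths: timing the row stream alone versus the full workbook stream approximates how much of the wall time belongs to XML generation plus zipping. A sketch (the Probe name is invented; it reuses Test.stream/2 from the module above):

defmodule Probe do
  require Logger

  # Times a zero-arity function and logs the wall time in milliseconds.
  def time(label, fun) do
    {us, _} = :timer.tc(fun)
    Logger.info("#{label}: #{div(us, 1000)}ms")
  end
end

rows = Test.stream(10, 100_000)

# Cost of producing the raw rows only.
Probe.time("rows only", fn -> Stream.run(rows) end)

# Cost of the full pipeline (XML generation + zipping), minus file IO.
Probe.time("full pipeline", fn ->
  Exceed.Workbook.new("Creator Name")
  |> Exceed.Workbook.add_worksheet(
    Exceed.Worksheet.new("Sheet Name", Enum.map(1..10, &"Header #{&1}"), rows)
  )
  |> Exceed.stream!()
  |> Stream.run()
end)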

@sax
Member

sax commented Jan 18, 2025

I think that this might be hitting the limit of what I can improve without rewriting the XML generation, for which I'm currently relying on the XmlStream library. I'd be very happy to be proven wrong, if you can find ways to rewrite the chunking in Exceed.File or the stream transformations or enum operations in Exceed.Worksheet.

 mix profile.tprof -r benchmark/bench.exs -e "Benchmark.buffered(10)"
#                                                             CALLS      %     TIME µS/CALL
...
anonymous fn/1 in XmlStream.flatten/1                       9400362   1.01   281219    0.03
anonymous fn/3 in Stream.do_transform_result/6             10500545   1.05   292758    0.03
:zlib.deflate_nif/4                                            6648   1.13   316552   47.62
XmlStream.Printer.encode_name/1                             6300352   1.26   350775    0.06
XmlStream.Printer.Ugly.print/2                              5200188   1.35   376638    0.07
XmlStream.Printer.validate_name!/1                          6300352   1.35   377735    0.06
Stream.do_transform_result/6                               14702975   1.51   420087    0.03
anonymous fn/3 in Enumerable.List.reduce/3                 14602966   1.55   431782    0.03
Stream.do_transform_step/2                                 14702975   1.55   433052    0.03
Enumerable.impl_for!/1                                      8400424   1.68   469505    0.06
anonymous fn/3 in Stream.flat_map/2                         9400394   1.80   502917    0.05
Enumerable.reduce/3                                         8400415   1.99   555679    0.07
:lists.reverse/1                                           20103276   2.05   571744    0.03
:erlang.iolist_size/1                                      14705412   2.10   584959    0.04
anonymous fn/2 in Exceed.File.buffer/1                     14700602   3.13   874676    0.06
Stream.do_transform_inner_list/7                           15803118   3.31   924784    0.06
anonymous fn/4 in Stream.chunk_while_fun/2                 15700602   3.37   941275    0.06
XmlStream.Printer.escape_binary/1                          16369780   3.51   979294    0.06
anonymous fn/7 in Stream.do_transform_inner_enum/7         41108158   4.22  1177340    0.03
anonymous fn/3 in Stream.do_transform/7                    41108159   4.31  1204344    0.03
Stream.do_transform_user/6                                 33606155   4.78  1332956    0.04
Stream.do_transform/5                                      27303592   6.68  1863117    0.07
Stream.do_transform_each/3                                 41108159   7.97  2223174    0.05
Enumerable.List.reduce/3                                   58605061   8.95  2499299    0.04
Stream.do_transform_inner_enum/7                           45308352  10.48  2925620    0.06
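Reading that profile, most of the time sits in generic Stream plumbing (the Stream.do_transform* variants and Enumerable.List.reduce) rather than in any single transform. As a generic illustration of one lever (not Exceed code, and not necessarily applicable here), chunking items before per-element transforms amortizes the per-item stream dispatch cost:

# Per-element: every item passes through each stream stage individually.
{slow_us, _} =
  :timer.tc(fn ->
    1..1_000_000
    |> Stream.map(&Integer.to_string/1)
    |> Stream.map(&[&1, "\n"])
    |> Stream.run()
  end)

# Chunked: each stream step handles 1_000 items eagerly via Enum.
{fast_us, _} =
  :timer.tc(fn ->
    1..1_000_000
    |> Stream.chunk_every(1_000)
    |> Stream.map(fn chunk -> Enum.map(chunk, &[Integer.to_string(&1), "\n"]) end)
    |> Stream.run()
  end)

IO.puts("per-element: #{div(slow_us, 1000)}ms, chunked: #{div(fast_us, 1000)}ms")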

@sax
Member

sax commented Jan 19, 2025

I just released v0.7.0 with my updates for buffered output. I also shipped some benchmark code in /benchmark, and I'll fiddle with those in some spare time to see if I can get clearer numbers comparing Exceed to Elixlsx.

I'll leave this issue open until you have a chance to test, and I'm also happy to leave it open as a place for discussion if you can think of other ways to make things faster for spreadsheets with more columns.

@michaelforsterivoflow
Author

Hi @sax,

I'm impressed by your exceptional engagement and outstanding work on developing Exceed. Your ability to solve complex problems and find innovative solutions is truly remarkable, and I'm grateful for the opportunity to learn from you and work with you. I hope that we can work together to make Exceed an even better tool, and I look forward to your help and support in this endeavor.

I've run some tests to compare the performance of Exceed and Elixlsx, and I wanted to share the results with you. Here they are:

Excel generation comparison with 100,000 database records:

Elixlsx Version (Reference):
Total time: 3942ms
Average rate: 25367.83 rows/sec
File size: 7.3 MB

Exceed Version:
Total time: 15129ms
Average rate: 6609.82 rows/sec
File size: 7.73 MB

Comparison: 3.838x slower than Elixlsx

I'm surprised to see that Exceed is so much slower than Elixlsx.

However, I've also run additional tests using 1,000,000 database records. Here are the results:

Excel generation comparison with 1,000,000 database records:

Elixlsx Version (Reference):
Total time: 102579ms
Average rate: 9748.58 rows/sec
File size: 73.93 MB

Exceed Version:
Total time: 105207ms
Average rate: 9505.07 rows/sec
File size: 77.59 MB

Comparison: 1.026x slower than Elixlsx

It's interesting to see that Exceed is only slightly slower than Elixlsx in this run. This suggests that some factor disproportionately affects Exceed's performance on smaller datasets. I think it would be worthwhile to investigate further to determine the cause of these performance differences.

Thank you for your time and support. I look forward to hearing from you and working together to find a solution.

@sax
Member

sax commented Jan 20, 2025

If there is a way to measure memory allocation as well, that's the true measure of why you'd want a tool like Exceed on large datasets: with Elixlsx you need to load the entire dataset and generate the XML files in memory.

I'm pretty sure benchee will measure memory. I just need some time and a better workspace to knock it out... I'm mostly on my phone this week, with only short periods of internet connection on my laptop.
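If it helps, a minimal sketch with Benchee (reusing the Test module from earlier; memory_time turns on Benchee's memory measurement):

Benchee.run(
  %{
    "exceed buffered" => fn -> Test.buffered(10, 100_000) end,
    "exceed unbuffered" => fn -> Test.unbuffered(10, 100_000) end
  },
  time: 10,
  memory_time: 2
)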

Elixlsx writes most strings as shared strings, using an ETS table to track them if memory serves. Exceed does not, in the interest of less complex code and because the shared strings file needs to precede all worksheets in the zipped xlsx file. That accounts for some, if not all, of the file size difference.

I am also surprised at how much slower it is than Elixlsx, though. My suspicion is that it's the sheer number of stream transformations, between iterating over rows, transforming to XML, then transforming to zip. It would be helpful if I could find a way to measure each part of it; tprof and the like just give function names, and a stream is a stream is a stream.
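One hypothetical workaround (the StageTap name and helper are invented for illustration) is to wrap an individual stage so the time spent in its transform accumulates into a counter, which at least separates the cost of one transform from the rest of the pipeline:

defmodule StageTap do
  # Wraps a Stream.map stage so the time spent in its transform function
  # accumulates (in microseconds) into a :counters ref.
  def timed_map(stream, counter, fun) do
    Stream.map(stream, fn item ->
      started = System.monotonic_time(:microsecond)
      result = fun.(item)
      :counters.add(counter, 1, System.monotonic_time(:microsecond) - started)
      result
    end)
  end
end

counter = :counters.new(1, [])

1..100_000
|> StageTap.timed_map(counter, &Integer.to_string/1)
|> Stream.run()

IO.puts("stage time: #{div(:counters.get(counter, 1), 1000)}ms")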

@michaelforsterivoflow
Author

Hey @sax,
I think you are right. For creating the XML it would probably be much faster with another library. Maybe this one: https://hexdocs.pm/saxy/readme.html#benchmarking
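For a sense of what that could look like, here is a sketch of building a row with Saxy's encoder (element/3, characters/1, and encode!/2 are Saxy's XML-building API; the inline-string row shape is illustrative, not Exceed's actual output):

import Saxy.XML

# One worksheet row with a single inline-string cell.
row =
  element("row", [{"r", "1"}], [
    element("c", [{"t", "inlineStr"}], [
      element("is", [], [element("t", [], [characters("Cell 1")])])
    ])
  ])

xml = Saxy.encode!(row, version: "1.0")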

@sax
Member

sax commented Jan 20, 2025

Another thing I was thinking of trying is to move the zipping to another process. I have no clue what scope of changes that would require.
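As a very rough sketch of the idea (names invented; this deflates a raw stream rather than producing a valid xlsx zip, so it only illustrates the process handoff), a writer process could own the :zlib stream and compress chunks as the generator mails them over, overlapping XML generation with compression:

defmodule OffloadZip do
  # Spawns a writer that owns the zlib stream and the output file;
  # callers send {:chunk, iodata} messages and finally :done.
  def start(path) do
    spawn_link(fn ->
      z = :zlib.open()
      :ok = :zlib.deflateInit(z)
      {:ok, file} = File.open(path, [:write, :binary])
      loop(z, file)
    end)
  end

  defp loop(z, file) do
    receive do
      {:chunk, iodata} ->
        IO.binwrite(file, :zlib.deflate(z, iodata))
        loop(z, file)

      :done ->
        IO.binwrite(file, :zlib.deflate(z, [], :finish))
        :zlib.deflateEnd(z)
        :zlib.close(z)
        File.close(file)
    end
  end
end

# Usage: the XML-generating stream feeds the writer without blocking on compression.
writer = OffloadZip.start("/tmp/raw.deflate")
Enum.each(1..1_000, fn i -> send(writer, {:chunk, "row #{i}\n"}) end)
send(writer, :done)

A real version would also need backpressure (acks or a bounded mailbox) so the XML generator can't outrun the compressor.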
