
Performance in "SaveAs" Operation

Jan 5, 2011 at 8:11 PM

It seems that spreadsheets above a certain size (records x fields = total cells) experience a dramatic slowdown in performance. I have tested a variety of content and tried to remove all external factors from the equation. I have traced the slowdown to a single step in the process: the "SaveAs" call that writes the workbook to a file or stream. Regardless of whether file or stream output is chosen, this step slows dramatically once a "tipping point" in size is reached. For example, a dataset with 140K cells (8,280 records x 17 fields) required 4:04 to display in the browser: 12 seconds to retrieve the data, 13 seconds to load the workbook, 3:33 to "SaveAs" to a stream, and 6 seconds to flush to the browser. The same type of content at half that size required only 57 seconds to "SaveAs". At double that size the process simply stalled and never finished (I shut it down after it had been grinding for 12 minutes). In contrast to the 4:04 output time, the "simple HTML flush" case required only 18 seconds to display in the browser (and the double-size case ran successfully in 27 seconds).

What can be done to make the "SaveAs" routine run more efficiently? The observed behavior implies that system resources are being maxed out such that the process slows and/or stalls as resource demands approach 100%. In a previous post, you mentioned that the SDK was the source of the inefficiency - is that still the case here?
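For reference, here is a minimal C# sketch of how I am isolating and timing the SaveAs step (this is not my actual VB.NET routine; the sheet name, cell contents, and loop bounds are illustrative only, and the workbook is built in ClosedXML directly rather than from a dataset):

using System;
using System.Diagnostics;
using System.IO;
using ClosedXML.Excel;

class SaveAsTiming
{
    static void Main()
    {
        var workbook = new XLWorkbook();
        var sheet = workbook.Worksheets.Add("Data");

        // Roughly 140K cells: 8,280 records x 17 fields, as described above.
        for (int row = 1; row <= 8280; row++)
            for (int col = 1; col <= 17; col++)
                sheet.Cell(row, col).Value = "cell " + row + "," + col;

        // Only this step is timed; data retrieval and workbook load are excluded.
        var timer = Stopwatch.StartNew();
        using (var stream = new MemoryStream())
        {
            workbook.SaveAs(stream);
        }
        timer.Stop();

        Console.WriteLine("SaveAs took {0:F1} seconds", timer.Elapsed.TotalSeconds);
    }
}

Timed this way, everything except SaveAs completes quickly; it is only the SaveAs call that balloons as the cell count grows.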

Coordinator
Jan 5, 2011 at 8:43 PM

Is it possible for you to create a piece of code that generates something similar to what you're working with? (no actual data of course)

This would help me pinpoint the issue because on my end I can save 200K cells in 95 seconds. I still need to work on the performance but I'm not getting the times you're describing.

Thanks,

Jan 5, 2011 at 9:29 PM

My direct email is gfraser@cri2000.com. If you send me your email, I can attach a text file with my VB.NET code that processes any dataset into an Excel file and then outputs it to the browser via a file or a stream. That code could easily be converted to C# or another language, and any "real" data could be run through it.

The performance issue might be related to something other than just content size - that could explain the difference in our results. One obvious difference is the hardware we are running on - CPU, RAM, other running processes, etc. Another potential difference is the "features" we apply while creating the file - formatting, styles, alignment, freeze panes, etc. If you run some "real" data through my routine, you may get some insight into factors other than pure size.
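To be concrete about those features, they are along these lines (a rough C# sketch, not my production code; the header range, freeze row, and full-sheet column adjustment are just examples of the kind of thing that gets applied):

using ClosedXML.Excel;

static class ReportFormatting
{
    // Per-sheet "features" of the kind mentioned above: formatting, alignment,
    // freeze panes, and column-width adjustment over the whole sheet.
    public static void ApplyReportFeatures(IXLWorksheet sheet, int headerColumns)
    {
        // Bold, centered header row.
        var header = sheet.Range(1, 1, 1, headerColumns);
        header.Style.Font.Bold = true;
        header.Style.Alignment.Horizontal = XLAlignmentHorizontalValues.Center;

        // Keep the header visible while scrolling.
        sheet.SheetView.FreezeRows(1);

        // Size every column to its contents - the full-sheet variant.
        sheet.Columns().AdjustToContents();
    }
}

Any of these could change the save cost independently of raw cell count, which is why I suspect they may matter as much as size does.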

Coordinator
Jan 5, 2011 at 11:40 PM

Thanks for the sample code. I found some bottlenecks when saving the files. I'll make the corrections and they'll be in the next release (I'm close to finishing the implementation of Tables).

-Manuel

 

Coordinator
Jan 9, 2011 at 4:13 AM

v0.40 has substantial performance improvements when saving files. I still have a few things to do to improve performance, but I wanted to release this version because of the improvements already made. To give you an idea, I can now save 200K cells in 23 seconds, whereas before it took me 95 seconds. The code you gave me now saves the 140K cells in 10 seconds.

One thing I wanted to mention is that your code adjusts the column widths based on the entire worksheet. IMO this is not necessary because users usually only care about the top rows. When a worksheet has 100K rows, users won't care if on row 15,131 there's a column that doesn't fit neatly. I'd recommend using ".AdjustToContents(1, 500)", as 500 rows should be enough for most cases.
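In code terms, something like this (sketch only; "ws" stands for whatever worksheet variable your routine uses):

// Adjust column widths from the first 500 rows only,
// instead of scanning the entire worksheet:
ws.Columns().AdjustToContents(1, 500);

// rather than the full-sheet version:
// ws.Columns().AdjustToContents();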

Let me know if you find anything,

-Manuel

Jan 10, 2011 at 5:28 PM

Great improvement in speed! Also, your suggestion to limit "AdjustToContents" to the first 500 rows is a great trade-off. It took 58 seconds to download a typical "sales forecast" file with all rows adjusted, 38 seconds with no rows adjusted, and 44 seconds with only the first 500 rows adjusted. So I saved 14 seconds and got a result that is likely equivalent to the full adjustment - very nice!

Coordinator
Jan 14, 2011 at 11:46 PM
Edited Jan 15, 2011 at 4:30 AM

I've improved the performance even further. For 200K cells the average save time is now 14 seconds and the average load time is 13 seconds.

Mar 16, 2011 at 2:39 AM

I'm having similar issues with performance. I am saving a workbook with about 100K cells and it's taking 12 minutes on a 4 GHz Q9650 (overclocked). There is a lot of parsing and other work going on in my program; however, most of the time is taken by the Worksheet.SaveAs function.

The 100K cells are spread across a couple of worksheets - maybe saving multiple worksheets is causing the slowdown? Also, some of the worksheets are all equations - maybe saving about 34,000 equations is causing the slowdown?
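For context, the formula sheets are populated roughly like this (a simplified C# sketch, not my actual code; the "Inputs" sheet reference and the formulas themselves are made up, but each FormulaA1 assignment creates one of the ~34,000 formula cells that go through SaveAs):

using ClosedXML.Excel;

static class FormulaSheetBuilder
{
    public static void FillFormulaSheet(IXLWorksheet sheet, int rows)
    {
        for (int row = 2; row <= rows + 1; row++)
        {
            // Each cell holds a formula rather than a literal value.
            sheet.Cell(row, 1).FormulaA1 = "Inputs!A" + row + "*Inputs!B" + row;
            sheet.Cell(row, 2).FormulaA1 = "SUM(A2:A" + row + ")";
        }
    }
}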

 

I'm using the latest version as of 3/15/2011.

Coordinator
Mar 16, 2011 at 4:40 PM

nvman90,

The formulas are causing the slowdown. I'll work on it (it shouldn't be hard to fix).

Thanks for the feedback,

Mar 28, 2011 at 10:37 PM

Thanks for fixing this! Now the same program takes about 15 seconds to run instead of 12 minutes. :-)