This project has moved. For the latest updates, please go here.

AdjustToContent is very slow

Nov 11, 2014 at 9:25 AM
Calling this function on a 9M xlsx file takes about 2 minutes time. In contrast, saving the file only takes several seconds.

My profiling shows the hotspot is Bitmap creation in FontBaseExtensions.GetWidth function (or maybe GraghUtils.MeasureString?). Maybe a quick fix is to cache the bitmap object (or the graphics object instead) in MeasureString to avoid creating them every time.

Here's my profiling result:
Image
Image
Coordinator
Nov 11, 2014 at 2:37 PM
There are threading issues. Limit the number of rows to adjust to 50. Will a user really care if cell 19,874 isn't adjusted? Nope.
Nov 11, 2014 at 3:32 PM
You are right. I somehow missed the overload.
Thanks.
Feb 15, 2015 at 3:00 PM
Edited Feb 16, 2015 at 7:05 PM
@MDeLeon, can you elaborate on the threading issues?
I can't find no threading in ClosedXml.

I have the same issue with AdjustToContents beeing slow.
We generate a quite large excel sheet. ~20k rows and 50 columns.
For most of our columns the data is quite uniform and adjusting only the top few 100 would be enough, but some of our columns are quite sparse. Mostly empty, but some rows have long strings. If we don't get any strings in the top x rows the width adjustment of that column would be way of.

I'm considering to invest some time to create a pull request with a few performance improvements in this area.
I have a few options that I'm considering.
  1. Move the creation of the graphics object from GraphicsUtils.MeasureString to XLColumn.AdjustToContents and pass the graphics along with the font cache
  2. Create a dictionary from <IXLFontBase, string> to SizeF to cache the size of strings to speed up adjustment when there are lots of duplicates.
  3. Sort the columns by simple string character count and measure the top x% to make it more likely to find the longest columns.
However before I start out I'm curios to hear about the threading issues to make sure I don't mess anything up.

By the way, I can't find any contributor guide lines. Anything special to think about more than to make sure the code looks tidy and the tests are green?
Feb 16, 2015 at 7:04 PM
I have made some preliminary work on this in a branch: https://closedxml.codeplex.com/SourceControl/network/forks/AlbinSunnanbo/closedxml?branch=PerformanceImprovements
In a simple example I have cut the times for AdjustToContents down to about one third of the original time. Now AjustToContents is faster than the XLWorkBook.SaveAs method for an example file with 10 columns, 20k rows.
I have a few other hot spots I want see if they could be improved, both in AdjustToContents and in Save, before I wrap up a pull request.
Feb 16, 2015 at 7:54 PM
I suppose the thread issues are the ones fixed in commit https://closedxml.codeplex.com/SourceControl/changeset/9e618a7ce0b6ef822f190be3da76704949f2eee9
Then my branch should have no issues.