Memory Usage

Mar 4, 2011 at 2:28 PM

This project makes dealing with excel files quite trivial, excellent job on this.  I've only been using this for a few weeks and I'm already in love with it.  I do seem to be running into a recurring issue with memory usage and currently debugging it myself now as it's becoming too big of a problem to ignore (pun intended).  Memory usage seems to climb above 1GB during usage before the server puts a kibosh on it.  In my opinion 1GB is pretty excessive for what's eventually being spit out as an excel document that's only a few MB.  At first glance it seems to XLBorder, XLFont, XLAlignment, XLAddress, and XLStyle seem to be eating up the most memory as each are taking up several million instances and well over a 100MB in memory.  Before I get shoulder deep in debugging memory usage/profiling and experimenting with it to see if I can somehow minimize some of the overhead here has anyone else run into a problem like this before that can offer any tips or tricks of their own? 


In general a single excel doc is created with a summary on the first sheet containing around 50,000 - 100,000 cells and then drawn out into several subsequent sheets that are segments of the initial summarized sheet.  These sheets are also all pretty and nicely stylized as well, I'm thinking this may be a primary culprit.



Mar 4, 2011 at 6:35 PM

Interesting. Would you be able to make a test code that creates a sample file with dummy data and the styles as you use them?

Thanks for the feedback.

Mar 6, 2011 at 4:37 AM

I ran some tests and while I didn't get the memory usage you're describing, ClosedXML does consume a lot more memory than expected.  I just ran the profiler for the memory allocation and I admit to be stumped. Profiling for memory usage is not as straightforward as profiling for performance.

To everyone who reads this: Can you recommend any good articles or books that explain how to use the profiler's information and reduce the memory footprint?

Thanks for the help.

Mar 8, 2011 at 4:17 PM

Wholeheartedly agree there on the point of profiling memory usage.  Other than finding glaring problems of resources not being released soon enough or memory leaks, problems are hard to spot.  To me it looks like maybe there are more instances being created than needed, the numbers do seem excessive when comparing to the number of possible cells.  But that's an opinion coming from someone that's only vaguely familiar with the Microsoft Office SDK for OpenXml so this might be expected behavior.  I've somewhat dropped this to a lower priority myself as I learned when talking to users about the problem and stepping through it that it was actually only coming up when they were putting bad parameters to begin with.  They were inadvertently pulling down far more data then they actually intended - I'm just protecting them a bit better from shooting themselves in the foot from the front end now.  Unfortunately I can't really provide a data set sample to work with primarily because of time constraints but also because the info is a little sensitive.  I think I can at least show you the code snippet that involves just the ClosedXml piece to give anyone rough idea, though I don't expect it's anything far from the norm:

Mar 15, 2011 at 7:20 PM

I am also encountering the same issue. Memory usage is climbing to the 1-1.5 GB Range. I'm creating a fairly simple spreadsheet with data in the tens of thousands of rows.

Although I haven't done comprehensive testing, like memory profiling, I have noticed that the largest spike in memory usage occurred when I invoke IXLColumns.AdjustToContents(int). I can guess why. You probably need to walk through all the data in the column, calculate the size, to be able to figure out what the appropriate size for the column should be.

I haven't looked at the source, but if you are storing all the values in order to determine which is the largest (and therefore have the appropriate value for adjusting the column as a whole), that may lead to the problem. If my hypothesis is correct, then it would be a better approach to simply keep the max value, and not store the value of every cell in memory.

Of course I am just speculating. You've done a great job with the library. It's a great pleasure to use. Let me know if I can do anything to help fix this problem.

Mar 15, 2011 at 7:36 PM

AdjustToContents only persists the max value, not the entire range. One thing you could do is limit the number of rows that are used to calculate the column width.

I will take a stab at memory utilization after fixing pending issues. If you know a good article on .Net memory optimizations, please let me know.

Mar 15, 2011 at 8:42 PM

Yes, you are right. After doing some more testing on my code, I realized I was wrong. To test, I removed the call to AdjustToContents, and put in hard coded column values. There was little change in memory utilization.

Most of the memory utilization is done while populating the large amounts of data on one of the worksheets, and persisting that out to an xlsx file.

I wish I could point you in the right direction as far as memory profiling is concerned, but doing .net as part of the day job. Don't have vast experience in it. I will do the same as you. Ask google. Don't have any existing knowledge on the matter.