
Copy Worksheet is very very slow

Jun 15, 2014 at 4:33 PM
I have a process that creates about 200 sheets. In the end, I want them all in one workbook. (long story, I know it's not the best design, but I'm stuck with it).

Originally I tried doing just that: one workbook, adding all the sheets to it, but I kept running out of memory. So I decided to create a separate workbook for each sheet and then combine them all into one at the end. I thought this would be more efficient, but copying a sheet takes far longer than creating it, by roughly a factor of 10, which makes no sense to me. I would have expected the copy to be faster than the create.

Any ideas on this? Due to memory constraints, I think my approach of separate sheets and combining them is the way to go, but the copy is so dreadfully slow.
        // Target workbook that will end up holding all 200 sheets.
        XLWorkbook wb = new XLWorkbook(XLEventTracking.Disabled);
        foreach (SheetPart mySheetPart in mySheetParts)
        {
            // Open the single-sheet workbook that was created earlier.
            XLWorkbook wbSource = new XLWorkbook(mySheetPart.FileName, XLEventTracking.Disabled);
            foreach (IXLWorksheet copySheet in wbSource.Worksheets)
            {
                // This copy is the slow part.
                copySheet.CopyTo(wb, copySheet.Name);
            }
            wbSource.Dispose();
            File.Delete(mySheetPart.FileName);
        }
        wb.SaveAs(FileName + "Final.xlsx");
Coordinator
Jun 18, 2014 at 9:39 PM
I'd say compile to x64 and create the file directly without the two steps.

Another thing that doesn't add up: why would it run out of memory when you're creating a file with 200 sheets, but not when you're copying those same 200 sheets into one file?
Jun 18, 2014 at 9:59 PM

I compile with ‘Any CPU’, and I’ve verified that it is not running as 32-bit (Task Manager does not show ‘*32’ next to it). So would compiling my app as x64 really make a difference?

Through Task Manager, I’ve seen memory hit 11 GB on this task during the creation of the sheets. The resulting file is about 23 MB (which I know is compressed), so 11 GB seems large.

A GREAT solution would be if you could tell ClosedXML to keep files on disk, rather than in memory.

Coordinator
Jun 18, 2014 at 10:04 PM
Can't do that, there are so many scenarios where you don't have access to a disk.

Back to your problem, if you can copy 200 sheets into a single file then you should be able to create the file one sheet at a time. You're probably leaving objects in memory.
Jun 18, 2014 at 10:19 PM
Originally, I opened one workbook, wrote everything to it, and then did a SaveAs at the end. But it would never finish; it kept running out of memory.

The change I made was to put each section into its own { } block: I create the workbook, do a SaveAs right before the }, and then dispose it. Now it runs fine and has no memory issues.
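
Roughly, the shape of what I'm doing now looks like this (a sketch only; FillSection and SheetName are placeholders for my real code, and the using block stands in for my { } plus Dispose):

        foreach (SheetPart mySheetPart in mySheetParts)
        {
            // Each section gets its own short-lived workbook, saved to disk
            // and disposed before the next one is built.
            using (var wbPart = new XLWorkbook(XLEventTracking.Disabled))
            {
                var ws = wbPart.AddWorksheet(mySheetPart.SheetName); // placeholder sheet name
                FillSection(ws, mySheetPart);                        // placeholder for the real fill logic
                wbPart.SaveAs(mySheetPart.FileName);
            }
        }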

Building the sheets takes 60 to 80 minutes depending on a few variables, and that step now uses very little memory.
Merging the sheets takes 2 to 3 hours and uses a LOT of memory, and I am worried that as the business grows it could become an issue. I really need to find a better way to merge the sheets.

At least now I can get the job done, and the user is happy with the end results; they are just not happy with the time.

These worksheets are really ridiculous in size, but people making 3 times my salary are making those decisions, so they are smarter than me, right? <sarcasm>

Coordinator
Jun 18, 2014 at 10:29 PM
I can't get past the fact that either way you end up with a huge file you need to save; what happens before shouldn't be an issue. In the case where you copy the 200 sheets into a single file, you still have a big file that needs to be saved, just as if you had written the file in one shot.

Are you sure you're disposing objects when you write everything to it?
Jun 18, 2014 at 10:51 PM
Other than putting the code block inside a { } and calling .Dispose() right before the }, I'm not sure what else I can do.

Coordinator
Jun 18, 2014 at 11:05 PM
Have you tried something like this?
        private static void Main(string[] args)
        {
            var wb = new XLWorkbook();
            foreach (var sheetNum in Enumerable.Range(1, 200))
            {
                CreateSheet(wb, sheetNum);
            }
            wb.SaveAs("saved.xlsx");
        }

        private static void CreateSheet(XLWorkbook wb, Int32 sheetNum)
        {
            using (var ws = wb.AddWorksheet("Sheet " + sheetNum))
            {
                // Fill sheet
            }
        }
Jun 18, 2014 at 11:09 PM
Instead of having the using in a separate CreateSheet method, I had the using inline. However, unless I am mistaken, the closing } of that using block should trigger the same cleanup and garbage collection whether the using is inline or in a separate method.


Coordinator
Jun 19, 2014 at 2:59 PM
Unfortunately you're going to have to go with SAX: http://blogs.msdn.com/b/brian_jones/archive/2010/06/22/writing-large-excel-files-with-the-open-xml-sdk.aspx
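
In case it helps, here's a minimal sketch of that SAX-style approach with the Open XML SDK's OpenXmlWriter (the sheet name, row count, and cell values are just placeholders, not taken from the linked post):

        // Requires the DocumentFormat.OpenXml, DocumentFormat.OpenXml.Packaging
        // and DocumentFormat.OpenXml.Spreadsheet namespaces.
        private static void WriteLargeFile(string path)
        {
            using (var doc = SpreadsheetDocument.Create(path, SpreadsheetDocumentType.Workbook))
            {
                var workbookPart = doc.AddWorkbookPart();
                var worksheetPart = workbookPart.AddNewPart<WorksheetPart>();

                // Stream the worksheet XML row by row instead of building it all in memory.
                using (var writer = OpenXmlWriter.Create(worksheetPart))
                {
                    writer.WriteStartElement(new Worksheet());
                    writer.WriteStartElement(new SheetData());
                    for (var r = 1; r <= 100000; r++) // placeholder row count
                    {
                        writer.WriteStartElement(new Row());
                        writer.WriteElement(new Cell
                        {
                            DataType = CellValues.Number,
                            CellValue = new CellValue(r.ToString())
                        });
                        writer.WriteEndElement(); // Row
                    }
                    writer.WriteEndElement(); // SheetData
                    writer.WriteEndElement(); // Worksheet
                }

                // The workbook part still needs a <sheets> entry pointing at the worksheet part.
                using (var writer = OpenXmlWriter.Create(workbookPart))
                {
                    writer.WriteStartElement(new Workbook());
                    writer.WriteStartElement(new Sheets());
                    writer.WriteElement(new Sheet
                    {
                        Name = "Sheet1", // placeholder name
                        SheetId = 1U,
                        Id = workbookPart.GetIdOfPart(worksheetPart)
                    });
                    writer.WriteEndElement(); // Sheets
                    writer.WriteEndElement(); // Workbook
                }
            }
        }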

ClosedXML isn't the right tool for this job.

Sorry =(
Jun 19, 2014 at 3:04 PM
I will read up on this. If I can just replace the merge I am good.


Jun 19, 2014 at 4:54 PM

Do you have any idea why the CopyTo function takes so much time and memory? Is there anything I could do to cut down on either the processing time or the amount of memory used? I am sure it is keeping the entire workbook in memory. It would be nice if there were a way to ‘close’ a sheet to disk and take it out of memory.

Coordinator
Jun 19, 2014 at 5:46 PM
Yes, it does keep the entire workbook in memory before saving it to disk, which is why I find it so odd that it works one way but not the other; either way the entire workbook is kept in memory. You can only stream data to disk as you create it if you can make assumptions about the workbook. That's something ClosedXML can't do, which is why you should go with SAX.