Big files, too slow parsing

Oct 9, 2012 at 12:49 PM
Edited Oct 9, 2012 at 12:55 PM

I have file, that contains for about 50k rows, each contains 2 columns: string (9 characters) and hyperlink url.

I am going through all rows and extracting data to list<SomeObj>

 

            try
            {
                var excel = new XLWorkbook(filePath);

                var worksheet = excel.Worksheet(1);

                for (int i = 2; i <= worksheet.LastRowUsed().RowNumber(); i++)
                {
                    try
                    {
                        string phone = worksheet.Cell(i, 1).Value.ToString();
                        
                        var hyperlink = worksheet.Cell(i, 2).Value as XLHyperlink;
                        string url = string.Empty;
                        if (hyperlink != null && hyperlink.IsExternal)
                            url = hyperlink.ExternalAddress.ToString();
                        
                        resDataList.Add(new SomeMyObj(phone, url));
                    }
                    catch (Exception ex)
                    {
                        throw new Exception("row " + i.ToString() + " " + ex.Message);
                    }
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show("Error while reading data from file " + fi.FullName + ": " + ex.Message, "Error", MessageBoxButton.OK,
                                MessageBoxImage.Error);
            }

 

 

It works without exceptions or memory leaks, but it takes too much time: 5min on i5 4 cores 3.4Ghz/12GB/SSD and 10-15min on older notebook.

Is there a way to make it faster?

Coordinator
Oct 9, 2012 at 3:59 PM

You're calculating the last row used for every row. Cache that value:

int lastRow = worksheet.LastRowUsed().RowNumber()
for
(int i = 2; i <= lastRow; i++)

I would write it this way (a little cleaner IMO):

            var excel = new XLWorkbook(filePath);
            var worksheet = excel.Worksheet(1);
 
            int lastRow = worksheet.LastRowUsed().RowNumber();
            foreach (var row in worksheet.Range(2, 1, lastRow, 2).Rows())
            {
                try
                {
                    var phone = row.Cell(1).GetString();
                    var hyperlink = row.Cell(2).GetHyperlink();
                    var url = hyperlink == null ? String.Empty : hyperlink.ExternalAddress.ToString();
                    resDataList.Add(new SomeMyObj(phone, url));
                }
                catch (Exception ex)
                {
                    throw new Exception("row " + row.RowNumber() + " " + ex.Message);
                }
            }
Oct 9, 2012 at 4:10 PM

Oh, thank you. I'm new to C#, thought that it calculates only before first iteration.