Ok, I found what the problem was. Chances are slim, that someone would run into a similar issue, but since it took me quite a while to find the cause of the issue (and with quite a bit of googling), I'll post it here also with the hope that the topic will
be indexed by Google and might be useful for other developers should they face a similar problem.
My application had to parse xlsx files generated by another application. Initially I was using OLEDB for extracting xlsx file data, but then I noticed that for some files it failed throwing an exception "External table is not in the expected format.",
which normally means that the format of the file is not correct.
I checked files and found that those which OLEDB could parse were twice as big in size as those which it couldn't, while when opened in Excel (Excel could load both group of files with no problems) the amount of data contained was around the same. This looked
strange to me, of course. I opened internal XML files and saw that in "smaller" files some metadata was missing, for instance - in the worksheet document of the "bigger" files cells had their addresses explicitly added in cell attributes - similar to what
Then I found that xlsx files were generated by
SpreadsheetGear component in that other application. The application (i.e. the SG component) was generating "smaller" files always. The reason why some of them ended up being twice bigger is that those "bigger" files were opened by users in Excel and then
I contacted SpreadsheetGear support and they confirmed that they know about this issue. Indeed, SG component by default "optimizes" generated xlsx files by omitting certain metadata like explicit cell addresses. Reading such files is no problem for Excel
(which is the most probable scenario) but OLEDB will fail. Starting from version 5 there is a workaround in SG API added especially to be able to generate non-optimized and wellformed from OLEDB point of view xlsx files - i.e. files which would "look" exactly
like they would've been saved by Excel.
For those who might be interested, here's the line that should be added in SG code to make it generate non-optimized files - it's in bold in the code snippet below:
SpreadsheetGear.IWorkbookSet workbookSet = SpreadsheetGear.Factory.GetWorkbookSet();
// next line below makes SG generate non-optimized xlsx files
workbookSet.Experimental = "OleDbOpenXmlWorkaround";
SpreadsheetGear.IWorkbook workbook = workbookSet.Workbooks.Open(@"C:\myWorkbook.xlsx");
While developers of that other application were fixing their xlsx generation part, I took some time to see if there are other solutions to this problem. I found that "smaller" (i.e. "optimized") xlsx files could be opened and parsed with OpenXML SDK.
But instead of using pure OpenXML SDK for parsing I decided to use ClosedXML as it makes the process a lot easier. My idea was that given that ClosedXML is built on top of OpenXML SDK, it would have no problem reading these files.
In reality, though, for some reason ClosedXML fails throwing NullReferenceException mentioned in my first post. Once I try to feed wellformed xlsx application - the same code works fine.
I am a bit surprised that the library using OpenXML would fail reading files which are in general readable by OpenXML, but at least now it's clear what causes the exception upon XLWorkbook instantiation.