Bijective base-26

Aug 8, 2014 at 9:31 AM
Hi, FWIW, i've written a much less featured xlsx reader/writer for a specific purpose in our code base, then i found your project and had a look at the code - after some fun writing different implementations we came across the concept of the bijective base-26 counting system. Perhaps this might be helpful to share the following code which i think is an elegant and possibly most optimal way from a performance point of view of doing the 1-based column number to column reference conversion -
    public static class ExcelCellRefs
    {
        public const Int32 MaxRows = 1048576;
        public const Int32 MaxCols = 16384;

        /// <summary>
        /// 1-based column number -> column reference e.g. 1 -> "A", 16384 => "XFD"
        /// </summary>
        public static Dictionary<String, Int32> ColRefToNum { get; private set; }

        /// <summary>
        /// Column reference -> 1-based column number e.g. "A" -> 1, "XFD" -> 16384
        /// </summary>
        public static Dictionary<Int32, String> ColNumToRef { get; private set; }

        static ExcelCellRefs()
        {
            var colRefs = GenerateColRefs(MaxCols).ToArray();
            var tuples = Enumerable.Range(1, MaxCols).
                Select(c => new {Num = c, Ref = colRefs[c - 1]}).
                ToArray();
            ColRefToNum = tuples.ToDictionary(t => t.Ref, t => t.Num);
            ColNumToRef = tuples.ToDictionary(t => t.Num, t => t.Ref);
        }

        /// <summary>
        /// Generates Excel column reference names
        /// e.g. the last column name for GenerateColRefs(16384) is "XFD"
        /// </summary>
        public static IEnumerable<String> GenerateColRefs(Int32 columnCount)
        {
            var names = new List<String>();
            var i = 1;
            while (i <= columnCount)
            {
                names.Add(ToBijectiveBase26(i));
                i++;
            }
            return names;
        }

        /// <summary>
        /// No concept of 0 in the bijective base-26 counting system, 
        /// it's a 26-adic counting system and not a pure base-26 numbering
        /// (Thank you Internet for explaining that to me as i would have otherwise
        /// had no idea...)
        /// </summary>
        /// <param name="n">The 1-based column number to convert</param>
        /// <returns>The column name as used in Excel e.g. 16384 is "XFD"</returns>
        public static String ToBijectiveBase26(Int32 n)
        {
            var chars = new List<char>();
            while (n > 0)
            {
                --n;
                chars.Insert(0, (char)('A' + n % 26));
                n /= 26;
            }
            return String.Join(String.Empty, chars);
        }

        /// <summary>
        /// No concept of 0 in the bijective base-26 counting system, 
        /// it's a 26-adic counting system and not a pure base-26 numbering
        /// (Thank you Internet for explaining that to me as i would have otherwise
        /// had no idea...)
        /// </summary>
        /// <param name="colRefName">The column name as used in Excel to convert</param>
        /// <returns>The 1-based column number e.g. "XFD" will be 16384</returns>
        public static Int32 FromBijectiveBase26(String colRefName)
        {
            var chars = colRefName.ToCharArray();
            var result = 0;
            var power = 0;
            for (var i = chars.Length - 1; i >= 0; i--)
            {
                result += ((chars[i] - 'A') + 1) * (Int32)Math.Pow(26, power);
                power++;
            }
            return result;
        }

        /// <summary>
        /// Parses an Excel cell ref name into the 1-based row and column numbers
        /// </summary>
        /// <param name="cellRef">The cell ref name e.g. XFD123</param>
        /// <param name="rowNumber">The 1-based row number e.g. XFD123 => 123</param>
        /// <param name="columnNumber">The 1-based col number e.g. XFD123 => 16384</param>
        /// <returns>true if the cell ref name is valid, false otherwise</returns>
        public static Boolean ParseCellRef(String cellRef, out Int32 rowNumber, out Int32 columnNumber)
        {
            var cellRefLen = cellRef.Length;
            if (cellRefLen < 2)
            {
                rowNumber = columnNumber = -1;
                return false;
            }

            // using reg ex here proved to be too slow hence the hand-rolled parsing
            var index = 0;
            while (!(cellRef[index] >= 48 && cellRef[index] <= 57)) // fast !isdigit detection
                index++;

            var colRef = cellRef.Substring(0, index);
            var rowNum = cellRef.Substring(index, cellRefLen - index);

            columnNumber = ColRefToNum[colRef];
            if (columnNumber > MaxCols)
            {
                rowNumber = columnNumber = -1;
                return false;
            }

            var rowNumParsed = Int32.TryParse(rowNum, out rowNumber);
            if (!(rowNumParsed && rowNumber <= MaxRows))
            {
                rowNumber = columnNumber = -1;
                return false;
            }

            return true;
        }
    }
Coordinator
Aug 8, 2014 at 5:19 PM
Edited Aug 8, 2014 at 5:24 PM
First time I see p-adic numbers being used outside pure mathematics =)

Ran some tests and the results are...

XLHelper.GetColumnLetterFromNumber is slightly better than ToBijectiveBase26.
FromBijectiveBase26 is slightly better than XLHelper.GetColumnNumberFromLetter.

In this case "slightly better" really means "it doesn't matter" because the difference is a whooping 1 second (if that) for every 10 million operations.

Each test ran 10M times.

ToBijectiveBase26:

Test 1 random column (1-26)
Took: 2.2978524 secs

Test 2 random columns (27-702)
Took: 2.9170196 secs

Test 3 random columns (703-16384)
Took: 3.6836017 secs

XLHelper.GetColumnLetterFromNumber:

Test 1 random column (1-26)
Took: 0.8154559 secs

Test 2 random columns (27-702)
Took: 1.2372901 secs

Test 3 random columns (703-16384)
Took: 1.8447462 secs



FromBijectiveBase26:

Test 1 random column (A-Z)
Took: 0.3456822 secs

Test 2 random columns (AA-ZZ)
Took: 0.5361384 secs

Test 3 random columns (AAA-XFD)
Took: 1.1249966 secs

XLHelper.GetColumnNumberFromLetter

Test 1 random column (A-Z)
Took: 1.3690976 secs

Test 2 random columns (AA-ZZ)
Took: 1.5478759 secs

Test 3 random columns (AAA-XFD)
Took: 1.488285 secs


I rewrote your ToBijectiveBase26 function to be more performant:
        public static String ToBijectiveBase26(Int32 n)
        {
            var ret = String.Empty;
            while (n > 0)
            {
                --n;
                ret = (char)('A' + n % 26) + ret;
                n /= 26;
            }
            return ret;
        }
(This stupid editor keeps replacing the plus sign with +)

NEW ToBijectiveBase26:

Test 1 column (1-26)
Took: 0.7168028 secs

Test 2 columns (27-702)
Took: 1.1513819 secs

Test 3 columns (703-16384)
Took: 1.7939328 secs


In the interest of having "the best" we'll use FromBijectiveBase26 and the new ToBijectiveBase26.

Thanks!
Coordinator
Aug 8, 2014 at 5:45 PM
Ran another test and GetColumnNumberFromLetter is actually an order of magnitude faster if you take out the checks that FromBijectiveBase26 doesn't perform...

Test 1 column (A-Z)
Took: 0.0390483 secs

Test 2 columns (AA-ZZ)
Took: 0.0819978 secs

Test 3 columns (AAA-XFD)
Took: 0.1435579 secs

That said, it still falls under the "it doesn't matter" category =)

So we'll keep using GetColumnNumberFromLetter and implement the new ToBijectiveBase26.