Abbreviating Huge and Minuscule Numbers with Math and JavaScript

The Problem

Let’s say we want to write an algorithm for formatting very large and very small numbers using the standard SI decimal prefixes (because we really, really do!). We want something that will take numbers and create the abbreviated output, as below:

  1. 123784692876928714 → ‘123.78P’
  2. 1237846924 → ‘1.24G’
  3. 0.0000000000000002342 → ‘234.20a’
  4. 0.0000000002342 → ‘234.20p’
  5. 0.00000002342 → ‘23.42n’
  6. 0.000012345 → ‘12.35µ’
  7. 1234 → ‘1.23k’

I’m sure there are quite a few ways to do this. What if we could write something using a few math concepts in about eight lines of code? Would you be more interested in learning about the math behind it? Let’s get started. First, let’s take a look at this algorithm and the lookup table it uses:

The Proposed Solution

// This is the list of standard SI unit prefixes
var symbols =  {
  '-8': 'y',
  '-7': 'z',
  '-6': 'a',
  '-5': 'f',
  '-4': 'p',
  '-3': 'n',
  '-2': 'µ',
  '-1': 'm',
   '0':  '',
   '1': 'k',
   '2': 'M',
   '3': 'G',
   '4': 'T',
   '5': 'P',
   '6': 'E',
   '7': 'Z',
   '8': 'Y',
   '9': 'H'  // Though not official, 'hella' is hella big → 10^(9*3) or 10^27
};

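function formatNumber(val, decimalPlaces) {
  // val must be positive: the logarithm of 0 or a negative number is undefined
  var exponent = Math.log(val) / Math.log(10);   // base 10 logarithm of val
  var magnitudeExp = Math.floor(exponent);       // power of ten of the leading digit
  // Step back to the nearest multiple of 3 at or below magnitudeExp (0, -1, or -2).
  // The double modulo keeps the remainder non-negative when magnitudeExp is negative.
  var adjustment = -(((magnitudeExp % 3) + 3) % 3);
  var significand = val / Math.pow(10, magnitudeExp + adjustment);
  // Flooring works for both signs: a magnitudeExp of -5 floors to index -2, the 'µ' entry.
  var index = Math.floor(magnitudeExp / 3);
  return significand.toFixed(decimalPlaces) + symbols['' + index];
}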

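That’s it! That’s all there is to the algorithm! With the lookup table above, it covers values from 10^{-24} up to, but not including, 10^{30}: whole numbers of up to 30 digits, and small numbers with up to 23 zeros between the decimal point and the first significant digit. If that’s not good enough, an index and symbol for the missing range should be added in the lookup table. The value must also be positive, since the logarithm of zero or of a negative number is not defined.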
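A quick way to sanity-check a formatter like this is to run it over the examples from the top of the article. The sketch below is not the listing above: it is a compact stand-in under the hypothetical names siSymbols and abbreviate, and it assumes the prefix index is always found by flooring, for large and small numbers alike. The inputs are written in exponent notation only to avoid miscounting zeros.

```javascript
// Hypothetical names: siSymbols and abbreviate are illustrative, not from the article.
var siSymbols = {
  '-8': 'y', '-7': 'z', '-6': 'a', '-5': 'f', '-4': 'p', '-3': 'n',
  '-2': 'µ', '-1': 'm', '0': '', '1': 'k', '2': 'M', '3': 'G',
  '4': 'T', '5': 'P', '6': 'E', '7': 'Z', '8': 'Y', '9': 'H'
};

function abbreviate(val, decimalPlaces) {
  // Power of ten of the leading digit, e.g. 3 for 1234 and -8 for 2.342e-8
  var magnitudeExp = Math.floor(Math.log(val) / Math.log(10));
  // Which group of three digits the number falls in (assumption: floor for both signs)
  var index = Math.floor(magnitudeExp / 3);
  // Divide by 1000^index so that one to three digits remain before the decimal point
  var scaled = val / Math.pow(10, 3 * index);
  return scaled.toFixed(decimalPlaces) + siSymbols['' + index];
}

console.log(abbreviate(123784692876928714, 2)); // 123.78P
console.log(abbreviate(1237846924, 2));         // 1.24G
console.log(abbreviate(1234, 2));               // 1.23k
console.log(abbreviate(2.342e-8, 2));           // 23.42n
console.log(abbreviate(2.342e-10, 2));          // 234.20p
console.log(abbreviate(2.342e-16, 2));          // 234.20a
```

Inputs that sit exactly on a rounding boundary (12.345 shown to two places, say) can land on either side because of binary floating point, so they make poor test cases.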

If math like this is foreign, a little head scratching might be in order. The math is actually pretty simple. There are some interesting things about number bases and logarithms here.

Number Bases

Number bases? As a programmer, one might be familiar with a few number bases, like decimal, hexadecimal, binary, and, perhaps, octal. A number base signifies how many symbols can be used to represent one digit. For instance, in decimal, or base 10, there are 10 symbols that can represent a digit. These digits are 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. In binary, two symbols are used: 0 and 1. In hexadecimal, 16 symbols are used: 0-9 and A-F. In octal, the digits 0-7 are used, though octal is rare these days.

Each digit in a number in a certain base has a certain value, which is related to the number base. Each number position, starting with the ones place in the 0th position (the rightmost side) and continuing to the last or nth position, has a value which is a power of the base. Here are examples in a few number bases.

Decimal

1,234,567

Position:        6          5          4          3          2          1          0
Digit value:     1          2          3          4          5          6          7
Place value:     10^6       10^5       10^4       10^3       10^2       10^1       10^0
Position value:  1,000,000  200,000    30,000     4,000      500        60         7

To get the value of the number in decimal, simply add the values up:

\displaystyle 1,000,000 + 200,000 + 30,000 + 4,000 + 500 + 60 + 7 = 1,234,567

So each position has a value of the place times the value of the digit. In decimal, there is the ones place, the tens place, the hundreds place, the thousands place, the ten thousands place, and so on. Each digit happens to have the value that we normally give it, because this is the system we normally use and are used to.

What about other systems? Let us take a look.

Binary
10101010
Position:        7    6    5    4    3    2    1    0
Digit value:     1    0    1    0    1    0    1    0
Place value:     2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
Position value:  128  0    32   0    8    0    2    0

To get the value of 10101010 in decimal, simply add the values up:

\displaystyle 128 + 0 + 32 + 0 + 8 + 0 + 2 + 0 = 170

Again, the value of the position is the value of the digit times the place value. Binary only has two digits and they still have the same values as the decimal versions of those digits. What about a system with more symbols than ours?

Hexadecimal

DEAD

“Wait, what!? That doesn’t even look like a number!”

In hexadecimal, it’s a perfectly valid number. The digits A-F are used in addition to digits 0-9. In fact, the digit A comes right after 9 and it has a value of 10. The digits B-F have the values 11-15, respectively.

Position:        3                2                1                0
Digit value:     D = 13           E = 14           A = 10           D = 13
Place value:     16^3             16^2             16^1             16^0
Position value:  13 \times 4,096  14 \times 256    10 \times 16     13 \times 1

To get the value of the number in decimal, simply add the values up:

\displaystyle 53,248 + 3,584 + 160 + 13 = 57,005

Summary

Whew! What a whirlwind tour of number systems! This is all of the basic knowledge needed to understand what comes next. This covered the basics of number systems:

  • Each number system in base b uses b digits.
  • Each digit is in a position, starting from the rightmost digit at 0 and increasing with each digit moving leftward.
  • Each digit in a number system has a value from 0 to b - 1 .
  • Each position, p , has a place value, or weight, which is b^{p} .
  • A digit with value, v , in a position, p , has a position value of v \times b^{p}
  • A number’s value is the sum of its digit values times their weights.

This also means that every number is actually a polynomial in disguise:

\displaystyle number = \sum^{places - 1}_{i=0}{digit_{i} \times b^{i}}

The number of digits, or places, in a number indicates the degree of the polynomial: a number with a given count of places is a polynomial of degree one less than that count.
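The polynomial view translates almost directly into code. Here is a minimal sketch, assuming a hypothetical helper named positionalValue that takes the digits as a string and handles bases up to 16. It sums digit value times base^position, the same way the tables above do by hand.

```javascript
// Hypothetical helper: sums digitValue * base^position for each digit in the string.
function positionalValue(digits, base) {
  var digitSymbols = '0123456789ABCDEF';
  var total = 0;
  for (var i = 0; i < digits.length; i++) {
    var position = digits.length - 1 - i;   // the rightmost digit is position 0
    var digitValue = digitSymbols.indexOf(digits.charAt(i).toUpperCase());
    if (digitValue < 0 || digitValue >= base) {
      throw new Error('Invalid digit "' + digits.charAt(i) + '" for base ' + base);
    }
    total += digitValue * Math.pow(base, position);
  }
  return total;
}

console.log(positionalValue('1234567', 10)); // 1234567
console.log(positionalValue('10101010', 2)); // 170
console.log(positionalValue('DEAD', 16));    // 57005
```

JavaScript’s built-in parseInt(string, radix) does the same job; the loop is only there to make each term of the sum visible.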

Logarithms

What is a logarithm? A logarithm is the inverse of exponentiation: it answers the question “what power must the base be raised to in order to get this number?” Normally, it is written \log or \ln ; the second means the base is \mathrm{e} (Euler’s number). It is a function that looks like this when graphed:

[Figure: graph of y = \ln x]

This graph conveys a few useful facts about the logarithm. The logarithm of a value is always less than the value: if the identity line y = x were drawn on the above graph of y = \ln x , the two graphs would never intersect. \ln x is zero only at x = 1 . \ln x has no value at x = 0 ; it drops toward negative infinity as x approaches zero. The heart of a logarithm lies in the mathematics defining it.

The base b logarithm gives the value of x in the equation:

\displaystyle b^x = n

The logarithm in base b of a number n is written:

\displaystyle x = \log_{b} n

In the case of JavaScript’s Math.log(), it is a natural logarithm. It will solve for x in this equation:

\displaystyle \mathrm{e}^x = n

The base 10 logarithm, \log_{10} , can be found by taking \ln n / \ln 10 . Generally, the base b logarithm can be found by \log_{b} n = \frac{\log n}{\log b} , where \log is a logarithm in any base available for you to use. Most likely it will be \ln , since it is so useful in math.
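As a small illustration of the change of base rule, here is a sketch with a hypothetical helper called logBase. It is built only on Math.log, so the results are floating-point approximations and may differ from the exact answer in the last digit or so.

```javascript
// Hypothetical helper: logarithm of n in an arbitrary base via the change of base rule.
function logBase(base, n) {
  return Math.log(n) / Math.log(base);
}

console.log(logBase(10, 1234)); // roughly 3.0913
console.log(logBase(2, 8));     // 3, or a value extremely close to it
console.log(logBase(16, 4096)); // 3, or a value extremely close to it
```

Newer JavaScript engines (ES2015 and later) also provide Math.log10 and Math.log2 directly.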

Now why is a logarithm useful?

Logarithms are useful for numerous reasons:

  • They can turn multiplication into addition. For instance, a^{x} \times a^{y} = a^{x+y} , which is the same fact as \log_{a} (m \times n) = \log_{a} m + \log_{a} n . Anyone can take the shortcut and mentally add up the exponents, but a computer can’t spot it on its own. However, it can be programmed to use the properties of logarithms to simplify parts of a calculation, without having to compute the full values of repeated exponentiation until the final step. This also keeps the numbers smaller until the last step.
  • If one wants to encode a certain number of states as digits in a system, one would like to know how many digits that many states would take up:
    • In base 2, \log_{2} 2 = 1 . We can use 1 bit to represent two states. We can use 3 bits to represent 8 states (000, 001, 010, 011, 100, 101, 110, 111), \log_{2} 8 = 3 .
    • In base 10, \log_{10} 10 = 1 . This means we can use one decimal digit to represent 10 states, since the range 0-9 encompasses ten digits. We can represent 1000 states with 3 digits (0-999), \log_{10} 1000 = 3 .
    • Generally, digitsForNStates_{b} = \lceil \log_{b} N \rceil where N is the number of states to represent in base b . The ceiling rounds up when N is not an exact power of b .
  • In general, 1 + \lfloor \log_{b} n \rfloor gives us the number of digits used to represent a number n in the base b number system. For example 1 + \log_{2} 256 = 9 , which makes sense since we can represent 256 values from 0 through 255 with 8 bits. We cannot represent 256 itself with 8 bits, but we can with 9.
  • In physics and information theory, a logarithm is related to a value called the entropy of the system.
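The digit-count rule from the list above can be sketched in a few lines. The helper name digitCount is hypothetical, and the two comparisons after the logarithm are an added safeguard (not part of the rule itself) for cases where floating-point rounding puts the logarithm a hair on the wrong side of a whole number.

```javascript
// Hypothetical helper: number of digits needed to write the positive integer n in a base.
function digitCount(n, base) {
  // The rule itself: 1 + floor(log_base(n))
  var estimate = 1 + Math.floor(Math.log(n) / Math.log(base));
  // Safeguard against rounding error in the logarithm:
  // an integer with d digits satisfies base^(d-1) <= n < base^d
  if (Math.pow(base, estimate) <= n) {
    estimate += 1;   // the logarithm came out slightly too low
  } else if (Math.pow(base, estimate - 1) > n) {
    estimate -= 1;   // the logarithm came out slightly too high
  }
  return estimate;
}

console.log(digitCount(255, 2));   // 8
console.log(digitCount(256, 2));   // 9
console.log(digitCount(999, 10));  // 3
console.log(digitCount(1000, 10)); // 4
```

For whole numbers, n.toString(base).length gives the same answer and makes a handy cross-check.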

The primary reason a logarithm is useful in this case is that it yields an exponent that encodes a lot about a number. Here’s an example with a number in decimal:

\displaystyle 1,234 = 1 \times 10^{3} + 2 \times 10^{2} + 3 \times 10^{1} + 4 \times 10^{0}

This could also be written as:

\displaystyle 1.234 \times 10^{3}

In this case, 1.234 is the significand and 10^{3} is the magnitude.

Taking the base 10 log of 1,234 yields an interesting number. This number will be between 3 and 4. Why is it between 3 and 4? Well, \log_{10} 1,000 = 3 and \log_{10} 10,000 = 4 , and 1,234 is somewhere between those two numbers. Likewise, \log_{10} 1,000,000,000 = 9 , so for any number n between 1,000,000,000 and 10,000,000,000 (not including 10,000,000,000), \log_{10} n will be between 9 and 10 (not including 10).

This is also a valid relationship:

\displaystyle 1,234 = 10^{(3 + fraction)}

Now why is it 3 and a fraction? Taking the integer part of \log_{10} 1,234 gives three. Taking 10^{fraction} gives 1.234. This means 10^{3} \times 10^{fraction} = 1,234 . So the logarithm on its own tells us both the magnitude of the number and its most significant digits, and this works for any positive number.

Here is every identity and step used to formally derive these quantities:

  1. value = significand \times magnitude
  2. significand = 10^{significandExp}
  3. magnitude = 10^{magnitudeExp}
  4. value = 10^{significandExp} \times 10^{magnitudeExp}
  5. \log_{10} value = \log_{10} 10^{magnitudeExp} + \log_{10} 10^{significandExp}
  6. \begin{array}[b]{l}      \log_{10} 10^{magnitudeExp} + \log_{10} 10^{significandExp}\\      =\log_{10} 10^{(magnitudeExp + significandExp)}    \end{array}
  7. \log_{10} value = \log_{10} 10^{(magnitudeExp + significandExp)}
  8. exponent = \log_{10} value
  9. exponent = \log_{10} 10^{(magnitudeExp + significandExp)}
  10. exponent = magnitudeExp + significandExp
  11. value = 10^{exponent}
  12. value = 10^{magnitudeExp + significandExp}

By definition, magnitudeExp is an integer. The floor operation on the \log_{10} yields magnitudeExp . Also by definition, significandExp must be greater than or equal to zero and less than one. But why? A decimal significand is at least 1 and less than 10. If the significand were zero, the whole number would be zero. If the significand were less than 1, or were 10 or more, then the magnitude was wrong. These numbers map to values of 10^{x} , where 0 \leq x < 1 .
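To see the split in action, here is a small sketch that pulls the exponent apart into its integer and fractional pieces and then rebuilds the significand from the fraction. The function name splitExponent and the shape of the returned object are made up for illustration.

```javascript
// Hypothetical helper: splits log10(value) into an integer part and a fractional part.
function splitExponent(value) {
  var exponent = Math.log(value) / Math.log(10);
  var magnitudeExp = Math.floor(exponent);         // integer part
  var significandExp = exponent - magnitudeExp;    // fraction, 0 <= significandExp < 1
  return {
    exponent: exponent,
    magnitudeExp: magnitudeExp,
    significandExp: significandExp,
    significand: Math.pow(10, significandExp)      // 10^fraction recovers the digits
  };
}

var parts = splitExponent(1234);
console.log(parts.magnitudeExp);   // 3
console.log(parts.significandExp); // roughly 0.0913
console.log(parts.significand);    // roughly 1.234
```

Flooring (rather than truncating) matters for numbers below 1: for 0.05 the exponent is about -1.301, which splits into -2 and roughly 0.699, and 10^0.699 is roughly 5.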

And of course, there’s the most straightforward way to get the significand:

  1. value = significand \times magnitude
  2. magnitude = 10^{magnitudeExp}
  3. value = significand \times 10^{magnitudeExp}
  4. significand = \frac{value}{10^{magnitudeExp}}

This method, just using exponentiation and division, is probably more accurate than using the significandExp in most cases, since finding the exponent using the Math.log function is just an approximation. Using it to extrapolate the significant digits can lead to errors, especially with very large or small numbers.
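The approximation can bite right at the powers of ten. Math.log(1000) / Math.log(10) can come out just under 3 in double-precision arithmetic (2.9999999999999996 is a commonly reported result), and flooring that gives 2 instead of 3. The sketch below, with the hypothetical helpers safeMagnitudeExp and significandOf, shows one possible guard: nudge the floored exponent by one step if the value falls outside the range that exponent implies. It assumes Math.pow(10, k) is exact for the small whole-number exponents involved.

```javascript
// Hypothetical helper: floor(log10(value)) with a one-step correction for rounding error.
function safeMagnitudeExp(value) {
  var magnitudeExp = Math.floor(Math.log(value) / Math.log(10));
  if (value >= Math.pow(10, magnitudeExp + 1)) {
    magnitudeExp += 1;   // the logarithm came out slightly too low
  } else if (value < Math.pow(10, magnitudeExp)) {
    magnitudeExp -= 1;   // the logarithm came out slightly too high
  }
  return magnitudeExp;
}

// The division method from the steps above: significand = value / 10^magnitudeExp
function significandOf(value) {
  return value / Math.pow(10, safeMagnitudeExp(value));
}

console.log(safeMagnitudeExp(999));  // 2
console.log(safeMagnitudeExp(1000)); // 3
console.log(significandOf(1234));    // roughly 1.234
```

Engines that support ES2015 also offer Math.log10, which may be worth trying in place of the division by Math.log(10).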

Back to the algorithm

The first few lines should be pretty obvious, given the explanation above:


var exponent = Math.log(val) / Math.log(10);
var magnitudeExp = Math.floor(exponent);

The next line figures out the adjustment:


var adjustment = -(((magnitudeExp % 3) + 3) % 3);
  • The adjustment is how many digits away from a grouping of 3 digits the number would be, given the magnitudeExp. It is always 0, -1, or -2, and adding it to magnitudeExp steps back down to the nearest multiple of 3.
  • The same rule applies whether the number has a whole number part (magnitudeExp is zero or greater) or just a fractional component (magnitudeExp is less than zero). For 12,345, magnitudeExp is 4 and the adjustment is -1, which lands on 10^{3} . For 0.000012345, magnitudeExp is -5 and the adjustment is -1, which lands on 10^{-6} .
  • JavaScript’s % operator takes the sign of its left operand, so -5 % 3 is -2 rather than 1. Adding 3 and taking the remainder again yields the non-negative remainder for either sign.

If scientific notation were all that was required, the significand and magnitudeExp would satisfy the problem. This algorithm is not actually converting to scientific notation: it should not show only one leading digit and a few decimal places. It should group digits into a maximum of three leading digits, so a number from 1 up to (but not including) 1,000 is always displayed before the suffix. It should be the equivalent of dividing the number by 1,000, 1,000,000, and so forth (these numbers have counts of zeros which are multiples of three). This is where subtracting magnitudeExp % 3 (“magnitudeExp modulo 3”, the remainder of dividing by 3) comes from. One caveat: the remainder has to be the non-negative one, and JavaScript’s % returns a negative remainder when magnitudeExp is negative. That makes sure the abbreviations are only placeholders for the thousands, millions, billions, and so forth (or the thousandths, millionths, and so on) and never a placeholder for, say, ten thousand or a hundred million.
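To make the grouping concrete, here is a sketch that maps a handful of magnitudeExp values to the power of ten that actually gets divided out. The helper name groupExponent is hypothetical; it uses the add-3-and-take-the-remainder-again idiom so that negative exponents get a non-negative remainder.

```javascript
// Hypothetical helper: nearest multiple of 3 at or below magnitudeExp.
function groupExponent(magnitudeExp) {
  var remainder = ((magnitudeExp % 3) + 3) % 3;   // always 0, 1, or 2, even for negatives
  return magnitudeExp - remainder;
}

// Print the grouping for exponents on both sides of zero
for (var m = -6; m <= 6; m++) {
  console.log('magnitudeExp ' + m + ' -> divide by 10^' + groupExponent(m));
}

console.log(groupExponent(4));  // 3
console.log(groupExponent(-5)); // -6
```

So 12,345 (magnitudeExp 4) is divided by 10^3 to give 12.345, and 0.000012345 (magnitudeExp -5) is divided by 10^-6 to give 12.345 as well.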

This code takes the adjustment into account when figuring out the significand to make sure that there is a group of at most 3 digits in the significand:


var significand = val / Math.pow(10, magnitudeExp + adjustment);

This code makes sure the suffix appended to the number matches up with the grouping of three most significant digits:


var index = Math.floor(magnitudeExp / 3);
return significand.toFixed(decimalPlaces) + symbols[''+index];

Whether the logarithm of the number is positive or negative, we take the floor of magnitudeExp / 3. Rounding toward negative infinity picks the prefix at or below the number’s magnitude: a magnitudeExp of 4 gives index 1 ('k'), and a magnitudeExp of -5 gives index -2 ('µ'). Taking the ceiling for negative exponents would give -1 ('m'), a prefix larger than the number itself. This makes sure we get the right integer for the index in the suffix lookup table.

That integer part of \frac{magnitudeExp}{3} is actually the 1000, 1000000, 1000000000, and so forth in disguise. In fact Math.pow(10, 3 * Math.floor(magnitudeExp / 3)) gives those values. This compresses the table of symbols so we don’t have something like the following, but rather the look-up table in the topmost code listing:


{
  ...
  -1: 'm', -2: 'm', -3: 'm',
   0:  '',  1:  '',  2:  '',
   3: 'k',  4: 'k',  5: 'k',
  ...
}
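The compression can be checked with a short loop. This sketch assumes a trimmed, hypothetical copy of the lookup table (fewPrefixes) and a hypothetical helper prefixIndex; it shows three neighboring magnitudeExp values collapsing onto one index, along with the scale that index stands for.

```javascript
// Hypothetical, trimmed copy of the lookup table for illustration only.
var fewPrefixes = { '-2': 'µ', '-1': 'm', '0': '', '1': 'k', '2': 'M' };

// Hypothetical helper: the table index for a given power of ten.
function prefixIndex(magnitudeExp) {
  return Math.floor(magnitudeExp / 3);
}

for (var m = -6; m <= 8; m++) {
  var index = prefixIndex(m);
  var scale = Math.pow(10, 3 * index);   // 0.001, 1, 1000, 1000000, ... (approximately, for negatives)
  console.log('10^' + m + ' -> index ' + index +
              ', prefix "' + fewPrefixes['' + index] + '", scale ' + scale);
}

console.log(prefixIndex(-1), prefixIndex(-2), prefixIndex(-3)); // -1 -1 -1
console.log(prefixIndex(3), prefixIndex(4), prefixIndex(5));    // 1 1 1
```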

I suppose that the algorithm embodies a lot of knowledge about number systems, bases, and logarithms; but it’s very elegant. 😀 I haven’t tested how efficient it is compared to other methods, but, given the flexibility it has from the way it’s derived, it’s probably a decent trade-off.
