Floating Point Representation

1. Senior Member
Join Date
Dec 1969
Posts
11,334

## Floating Point Representation

I was having a problem a while back computing a sum of floats, and luckily ran across this article today. In it, it states:

"The problem is that some decimal numbers, like 0.1, have no exact floating-point representation."

*** is up with that??? I thought 0.1 IS the floating point number... am I missing something that's going on underneath the sheets?

This explains a lot now why a programmer here passes back to me a float as 12000, when it really means 120.00 (and I just divide by 100). Does anybody know the reason?

2. Senior Member
Join Date
Dec 1969
Posts
96,118

## RE: Floating Point Representation

The *internal* format of a floating point number is a *BINARY* floating point representation.

That is, it is a binary number with a power-of-2 exponent!

120.0 is 1.20e+2 in *decimal* floating point notation. In binary floating point it is...ummm...let me get my fingers and toes out here...

    01111000.0 e +0000

or, in "normalized" form:

    0.1111000 e +0111

(Normalized form, the way I'm writing it here, puts the binary point [*not* decimal point!] before the first "1" bit of the number. The zero and the binary point don't really appear there. I just threw them in to make it clearer. Strictly speaking, IEEE 754 puts the binary point just *after* the first "1" bit and doesn't store that bit at all, but the idea is the same.)

Now...let's go back to decimal floating point for a bit.

How do you represent 1/3 (one-third, the fraction) using, say, 15 decimal digits? Hmmm???

    0.333333333333333 E + 00

right?

*BUT* is that the *exact* representation???? OF COURSE NOT!

In actuality, 1/3 has an *infinite* number of repeating digits, yes?

Okay...now how do you represent 1/10th in *binary* floating point? With 53 bits of precision? [A "double" floating point number has 53 bits of "mantissa" precision (52 stored, plus one implied leading bit), 11 bits for the binary "exponent", and 1 sign bit...total 64 bits, 8 bytes.]

Gonna make me do the math are you... Okay...

1/16 + 1/32 + 1/256 + 1/512 + ...oh the heck with this:

Binary representation of 1/10 is 0.0001100110011001100110011001100110011001100110011001101

I don't think you'll be surprised to find that this representation is just as incomplete as 0.333333333333333 is for 1/3 in decimal.

Incidentally, normalizing that number we would get:

    0.1100110011001100110011001100110011001100110011001101 e - 00000000011

If you care, by this point.

So...does this help? Or hurt?

Remember, there's only 53 bits there. So 0.1 is represented with reasonable accuracy. The relative error is at most 1 divided by 2^53, right?

But what about 123456.1??? Hey, it takes *17 bits* to represent 123456.0, so that leaves a *LOT* fewer bits to represent the 0.1 as best as can be done. Meaning that now the error in the fractional part can be as much as 1 divided by 2^36.

Hokay?

And, incidentally, I converted 1/10 to binary this way:

    <HTML><BODY>
    <%
    n = 0.1
    p = 0.5      ' value of the current bit: 1/2, 1/4, 1/8, ...
    sum = 0.0    ' total of the bits accepted so far
    str = "0."

    For i = 1 to 55
        If (sum+p) > n Then
            ' this bit would overshoot n, so it is a 0
            str = str & "0"
        Else
            str = str & "1"
            sum = sum + p
        End If
        p = p / 2
    Next
    %>
    Binary representation of 1/10 is <% = str %>
    </BODY></HTML>
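For anyone who would rather not run ASP, the same facts can be checked with a short Python sketch. Python is not used anywhere in this thread; it is just a convenient way to peek inside a double. The sketch prints the exact decimal value stored for 0.1, its hex form, and the gap between neighboring doubles near 0.1 and near 123456.1. It assumes Python 3.9 or later, because that is when `math.ulp` was added.

```python
import math
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, with no rounding,
# so this is the number the machine really holds when you write 0.1.
exact_tenth = Decimal(0.1)
print(exact_tenth)

# float.hex() shows the significand as hex digits plus a power-of-2 exponent.
tenth_hex = (0.1).hex()
print(tenth_hex)

# math.ulp(x) is the gap between x and the next larger double.
gap_small = math.ulp(0.1)        # spacing of doubles near 0.1
gap_large = math.ulp(123456.1)   # spacing of doubles near 123456.1
print(gap_small == 2.0 ** -56, gap_large == 2.0 ** -36)
```

The gap near 123456.1 is 2^20 (about a million) times larger than the gap near 0.1, which is the "fewer bits left over for the fraction" effect.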

3. Senior Member
Join Date
Dec 1969
Posts
96,118

## Just for the fun of it...

Here's a version of that ASP page that will show you the binary representation of numbers in the range of roughly 1/1,000,000 to 1,000,000. When the numbers get too small, it puts too many zeroes in right after the binary point to give a usable result. And if the number is too large it will just return all "1" bits. So it's more a toy than the "real thing", but at least it should give you an idea of how things work.

Try your own values in that range for the initial value of "sn":

*******************************

    <HTML><BODY>

    <%
    sn = "12345.1"
    n = CDbl(sn)
    p = 1024*1024 ' up to 1 million
    sum = 0.0
    str = ""
    started = False   ' becomes True at the first 1 bit (suppresses leading zeroes)

    For i = 1 to 55
        If (sum+p) > n Then
            If started Then str = str & "0"
        Else
            str = str & "1"
            sum = sum + p
            started = True
        End If
        If p = 1 Then
            ' just finished the units bit: emit the binary point
            If Len(str) = 0 Then str="0." Else str=str & "."
            started = True
        End If
        p = p / 2
    Next
    %>
    Binary representation of <% = sn %> is <% = str %>
    </BODY></HTML>
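As a rough alternative without the range limits, here is a Python sketch (not a language used in this thread) that gets the same kind of output from exact integer arithmetic. The function name `to_binary` and its `max_frac_bits` parameter are made up for illustration. It relies on `float.as_integer_ratio()`, which returns the stored value as an exact fraction whose denominator is a power of two.

```python
def to_binary(x: float, max_frac_bits: int = 1100) -> str:
    """Exact binary expansion of a non-negative, finite double (hypothetical helper)."""
    if x < 0:
        raise ValueError("this sketch only handles non-negative values")
    num, den = x.as_integer_ratio()   # exact; den is a power of two
    whole, rem = divmod(num, den)     # integer part and leftover numerator
    frac_bits = []
    while rem and len(frac_bits) < max_frac_bits:
        rem *= 2                      # shift the next fractional bit into place
        bit, rem = divmod(rem, den)
        frac_bits.append(str(bit))
    return f"{whole:b}." + ("".join(frac_bits) or "0")


print(to_binary(120.0))
print(to_binary(0.1))
print(to_binary(12345.1))
```

Because a double's denominator is always a power of two, the loop always terminates on its own for ordinary values; the bit cap is only a safety net.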

4. Senior Member
Join Date
Dec 1969
Posts
11,334

## RE: Floating Point Representation

I bought Wrox's Beginning C++ last night (I like the author Ivor Horton, who also wrote the Java book I have), and I started seeing how he was doing it... conceptually, in a float, numbers may look like this:

    00000000000000123
    123.0000000000000

and so on (don't quote me on the length of the bits here). He says the decimal point just 'floats' along the bits.

So it seems floats are a PITA -- I think a processor manufacturer a few years back had major problems with handling floating point numbers.

Probably explains why the C guy returns me floating point numbers as integers (as I stated in my first post). I think (?) C++ has a function that does that for you -- from what I read last night.

Makes much more sense now, why you never use conditions based on floating point numbers, such as:

    float f = 0.1;

    do {
        // something
    } while (f == .1); // or something

since we may see .1 == .1, but as you explained, it could be two different binary numbers, correct? Which would make sense why you never check for equality between floating point numbers.

**** Bill, you need to write a book on this stuff. You explain it better than some of these authors.

BTW -- this Ivor Horton guy reminds me of you. Kinda looks like you too. For instance, he started talking about decimals, and HAD to interject that decimal is from Latin decimalis (sp?), which meant tithe (or tax at the time), meaning the Romans only had to pay 10%, which is why we're based on a system of 10. Then goes on about "those were the days..."

If anyone can squeeze that into a C++ book, I'm sure you could also ;)

Thanks for the explanation!
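To make the equality point concrete, here is a small Python sketch (Python is not used in this thread; it stands in for the C-style snippet). The helper `as_float32` is hypothetical: it round-trips a value through 4-byte storage with the standard `struct` module, imitating what `float f = 0.1;` does in C before `f` is compared against the double constant `.1`. The second half shows the usual alternative of comparing with a tolerance via `math.isclose`.

```python
import math
import struct


def as_float32(x: float) -> float:
    """Round x to single precision and widen it back to a double (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


f = as_float32(0.1)
print(f == 0.1)                 # the single- and double-precision roundings of 0.1 differ
print(as_float32(0.5) == 0.5)   # 0.5 is a power of two, so it survives unchanged

a = 0.1 + 0.2
print(a == 0.3)                 # exact comparison of two separately rounded values
print(math.isclose(a, 0.3))     # comparison within a relative tolerance (default 1e-09)
```

The right tolerance depends on the application; the default used by `math.isclose` is only a starting point.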

5. Senior Member
Join Date
Dec 1969
Posts
11,334

## BTW - Here's the link

That explains what the problem is, and how to get around it:

http://developer.java.sun.com/developer/JDCTechTips/2001/tt0807.html

If anyone's interested.

6. Senior Member
Join Date
Dec 1969
Posts
96,118

## You can do it in VBS, too...

> Probably explains why the C guy returns me floating point
> numbers as integers (as I stated in my first post). I think
> (?) C++ has a function that does that for you

Ummm...you can't *REALLY* return floating point numbers as integers. How would you return 0.0000000071282 as an integer??? Or 3.14E+75???

But, yes, you can return *currency* as an integer number of pennies, and lots of systems do that, rather than mess with the vagaries of floating point numbers. I might point out that the normal range of (32-bit) integers is roughly +/- two billion. Make that two billion pennies, and you can see that "scaled integers" used for money break down past \$20,000,000.00 -- so fine for your personal finances, but many corporations will bust right through that and the government....well, the government could scale the other way and use integers to keep track of millions of dollars, maybe.

ANYWAY...

You can easily scale in VBS just as well as in C/C++:

    <%
    amount = 3456.78 ' $3,456.78
    scaledCurrency = CLng( 100 * amount ) ' CLng rounds to the nearest whole number
    %>

Much the same as doing this in C/C++ (just be sure to round, the way CLng does, because a plain (long) cast truncates):

    #include <math.h> /* lround */

    double amount = 3456.78;
    long scaledCurrency = lround(amount * 100.0);
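The round-versus-truncate detail matters more than it looks. Here is a minimal Python sketch of the scaling step (Python is not used in this thread, and the name `to_pennies` is made up for illustration), with one amount where truncating loses a penny, plus the 32-bit ceiling worked out in dollars and cents.

```python
def to_pennies(amount: float) -> int:
    """Scale a dollar amount to whole pennies, rounding to the nearest penny."""
    return round(amount * 100)


scaled = 1.15 * 100          # slightly below 115 because 1.15 is not exact in binary
print(int(scaled))           # truncation drops the fraction
print(to_pennies(1.15))      # rounding recovers the intended value
print(to_pennies(3456.78))

# Largest signed 32-bit integer, split into dollars and leftover pennies.
INT32_MAX = 2**31 - 1
dollars, cents = divmod(INT32_MAX, 100)
print(dollars, cents)
```

Once the value is an integer number of pennies, additions and subtractions are exact as long as they stay in range.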

7. Senior Member
Join Date
Dec 1969
Posts
96,118

## Shows same 'bug' my code did!

It shows that 0.1 is represented as:

    1001100110011001100110011001100110011001100110011010

Actually, there is an understood 1 at the front of that...that's how normalized numbers work in IEEE floating point, so it's really (if we put in the "binary point"):

    0.00011001100110011001100110011001100110011001100110011010

But the important thing is to look at the tail of that number:

    11010

OOPS! As we've discovered, 0.1 is an infinitely repeating pattern in binary floating point. The pattern being 1100. So shouldn't the end of that be

    11001

instead of 11010 ???

Well, this is *EXACTLY* where the "bug" in exact representation of decimals lies! Remember, somebody had to write a program that converted the STRING form "0.1" to internal floating point. And at *SOME* point, when you have a number that needs more bits [such as an infinite number] to represent it than the hardware has available, you have to ROUND the ending bits.

And 1100110011001100 rounded to 5 bits is...you guessed it...11010

And it is really that rounding error that gets us in trouble!

We're only off by at most half a bit, yes? But if you multiply that number (0.1) by 100, say, now you are off by 100 *times* that much! And unless the rounding of the final result happens to hide it, the difference between 10.0 (which *can* be represented exactly in floating point) and (100 * 0.1) becomes evident.
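The size of that rounding error can be measured exactly. This is a small Python sketch (not a language used in this thread) that uses the standard `fractions` module to compare the stored double against the true 1/10, expresses the error in units of the last stored bit, and then tries two products to see when the error survives the final rounding.

```python
from fractions import Fraction

stored = Fraction(0.1)             # exact value of the double nearest to 0.1
error = stored - Fraction(1, 10)   # positive: the tail was rounded up
last_bit = Fraction(1, 2**56)      # weight of the last stored bit of 0.1

print(error)             # the error as an exact fraction
print(error / last_bit)  # the same error measured in "bits"

# The result of a multiplication is rounded again, which may hide or expose the error.
print(3 * 0.1 == 0.3)
print(100 * 0.1 == 10.0)
```

So the error starts out below half a bit, and whether it shows up later depends on how each result happens to round.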

8. Senior Member
Join Date
Dec 1969
Posts
11,334

## Ah, clear now

I guess in my mind I kept thinking decimal... I mean, I know .33333 goes on forever, but I saw .1 as .1000000000000 and so forth, and didn't get that.

Very dangerous observation also... I found it very interesting that 56 * .01 = 5.6000000000000005

Equally as interesting is this statement:

>>In other words, 0.09375 is exactly representable, and 0.1 is not

It's hard to think binary at times... seems .1 would be much easier, but now that you show me that odd little 'bit' on the end, that makes perfect sense!

Oh, and you were right... we only use the int type to pass money back and forth.

Thanks!
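One way to test a decimal literal for exact representability is to compare the double it produces with the literal's true value. A minimal Python sketch (Python is not used in this thread, and the name `is_exact` is made up for illustration):

```python
from fractions import Fraction


def is_exact(text: str) -> bool:
    """True if the decimal string survives conversion to a double unchanged."""
    return Fraction(float(text)) == Fraction(text)


print(is_exact("0.09375"))  # 3/32, i.e. 1/16 + 1/32
print(is_exact("0.1"))
print(is_exact("0.5"))
```

A decimal fraction passes only when its denominator, in lowest terms, is a power of two small enough to fit in the 53 bits available.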

9. Senior Member
Join Date
Dec 1969
Posts
11,334

## Irk, make that 56 * .1

Long day so far.

10. Senior Member
Join Date
Dec 1969
Posts
96,118

## One more fun example...

Try this ASP page:

******** productVersusSum.asp ***********

    <HTML><BODY>

    <%
    onetenth = 0.1
    sum = 0

    For i = 1 TO 10000
        sum = sum + onetenth
    Next

    product = 10000 * onetenth
    %>

    0.1 added to itself 10,000 times totals <% = sum %><P>
    10,000 multiplied by 0.1 gives <% = product %><P>
    The difference in the two is <% = ( product - sum ) %><P>
    The difference between 1,000 and the product is <% = ( 1000.0 - product ) %><P>
    The difference between 1,000 and the sum is <% = ( 1000.0 - sum ) %><P>
    </BODY></HTML>

***************************************

Why does 10,000 times 0.1 give the exact answer? Because, even in binary floating point, when you get a final answer you have to round it. And that product, rounded to the requisite 53 bits, just *HAPPENS* to be exactly right!
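The same experiment as a Python sketch, for anyone without an ASP server handy (Python is not used in this thread). It adds one thing the page does not have: `math.fsum`, a standard-library function that tracks the lost low-order bits while summing and rounds only once at the end.

```python
import math

one_tenth = 0.1

total = 0.0
for _ in range(10000):
    total += one_tenth          # rounds after every single addition

product = 10000 * one_tenth     # rounds once
careful = math.fsum([one_tenth] * 10000)   # also rounds once

print("sum      :", total)
print("product  :", product)
print("fsum     :", careful)
print("sum error:", total - 1000.0)
```

The running sum drifts because it rounds 10,000 times; the product and `fsum` each round once, and that single rounding happens to land exactly on 1000.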
