Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Pandas Float Precision - Apparently Identical Numbers Showing as Not Equal

Hopefully a very simple solution to this. I have tried the solutions for two similar questions on SO, but these haven't worked for me.

Essentially I have a process which evaluates whether the numbers in two columns of a dataframe are equal. For the vast majority of rows the result looks correct. However, in a very small number of cases, numbers that appear identical to 6 decimal places compare as not equal.

Clearly this is down to how my numbers are stored versus what I am seeing. But bizarrely, the data source of these numbers only stores them to 6 decimal places, and increasing display.precision has no effect: I still only see 6 decimal places.

a=df[df['Timestamp']=='2018-03-04 22:29:57']['Limit'].copy()

b=df[df['Timestamp']=='2018-03-04 22:29:57']['Quote'].copy()

pd.options.display.precision
Out[152]: 10

a
Out[153]: 
15571027   25.850000
Name: Limit, dtype: float64

b
Out[154]: 
15571027   25.850000
Name: Quote, dtype: float64

a==b
Out[155]: 
15571027    False
dtype: bool

a-b
Out[156]: 
15571027   -0.000000
dtype: float64

b>a
Out[157]: 
15571027    True
dtype: bool

I am hoping some kind soul might be able to suggest the next logical steps I could try here - clearly b is greater than a, but 1) I cannot display this, and 2) I would ultimately like to create boolean comparisons which I know will be accurate to the same precision as I am displaying.

Many thanks in advance!
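
For anyone landing here with the same symptoms, here is a minimal sketch of the two things asked for above: showing the digits that are actually stored, and comparing at the displayed precision. The values are hypothetical stand-ins for the 'Limit' and 'Quote' entries (the real data is not available), built so that b is the next representable float64 above 25.85.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two values in the question:
# b is one float64 step above a, so they differ in the last bit only.
a = pd.Series([25.85], index=[15571027], name="Limit")
b = pd.Series([np.nextafter(25.85, np.inf)], index=[15571027], name="Quote")

# 1) Show what is stored. 17 significant digits are enough to tell
#    any two distinct float64 values apart.
full_a = a.map(lambda x: format(x, ".17g"))
full_b = b.map(lambda x: format(x, ".17g"))
print(full_a.iloc[0], full_b.iloc[0])

# Scientific notation makes the tiny difference visible instead of -0.000000.
print((a - b).map(lambda x: format(x, ".3e")).iloc[0])

# 2) Compare at the precision that is displayed (6 decimal places).
equal_exact = a == b                          # exact comparison of stored bits
equal_rounded = a.round(6) == b.round(6)      # round both sides first
equal_close = pd.Series(                      # or use an absolute tolerance
    np.isclose(a.to_numpy(), b.to_numpy(), rtol=0, atol=1e-6),
    index=a.index,
)
print(equal_exact.iloc[0], equal_rounded.iloc[0], equal_close.iloc[0])
```

Rounding and np.isclose are not quite the same test: rounding can still separate two values that sit either side of a rounding boundary, while a tolerance only looks at the size of the gap. Which one fits depends on what "equal" should mean for the data.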



1 Answer


OK, so I found the root of the problem.

At the start of the process, all of these numbers were identical as floats. By the end, a handful of them had somehow been altered as floats, even though the way they were displayed had not changed.

The culprit was joining them via a large SQLite operation within pandas. There was too much data to attempt a merge, so the data was imported as tables, joined as needed, and written back out as a dataframe. Even though no numbers were actually altered (i.e. by division, aggregation, etc.), it was during this process that the stored representation of some of the floats was altered.
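
One way to track down which rows were affected is to flag values that differ as stored floats but agree once rounded to the displayed precision. This is only a sketch on made-up data; the frame below is hypothetical and simply reuses the 'Limit' and 'Quote' column names from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 matches exactly, row 1 differs by one float64
# step (looks identical at 6 dp), row 2 is a genuine difference.
df = pd.DataFrame({
    "Limit": [25.85, 25.85, 10.5],
    "Quote": [25.85, np.nextafter(25.85, np.inf), 10.75],
})

differs = df["Limit"] != df["Quote"]
same_at_6dp = df["Limit"].round(6) == df["Quote"].round(6)

# Rows that display the same at 6 dp but are stored as different floats.
suspect = df[differs & same_at_6dp]
print(suspect.index.tolist())  # prints [1]

# Size of the hidden difference, in scientific notation.
print((suspect["Quote"] - suspect["Limit"]).map("{:.3e}".format))
```

Running a check like this before and after each step of a pipeline should narrow down where the stored values start to drift.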

Thanks @Artyom Akselrod for the education: rounding the output in the SELECT statement sorted the problem.
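
The actual query is not shown here, so the following is only a sketch of what rounding in the SELECT statement might look like. The table names (limits, quotes), the column names (limit_value, quote_value) and the join key (id) are all hypothetical, and the data is made up so that one pair of values differs in the last bit.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical tables standing in for the real data.
limits = pd.DataFrame({"id": [1, 2], "limit_value": [25.85, 10.5]})
quotes = pd.DataFrame({
    "id": [1, 2],
    "quote_value": [np.nextafter(25.85, np.inf), 10.5],
})

conn = sqlite3.connect(":memory:")
limits.to_sql("limits", conn, index=False)
quotes.to_sql("quotes", conn, index=False)

# ROUND(x, 6) is applied to both columns inside the SELECT, so both
# come back from SQLite normalised to 6 decimal places.
query = """
SELECT l.id,
       ROUND(l.limit_value, 6) AS limit_value,
       ROUND(q.quote_value, 6) AS quote_value
FROM limits AS l
JOIN quotes AS q ON q.id = l.id
ORDER BY l.id
"""
joined = pd.read_sql_query(query, conn)
conn.close()

# After rounding on both sides, the exact comparison agrees with the display.
matches = joined["limit_value"] == joined["quote_value"]
print(matches.tolist())
```

Rounding both columns the same way matters more than the exact number of places: the comparison is then made on values that went through an identical normalisation step.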

