libera/commonlisp - IRC Chatlog
9:03:13
_death
ieee754-2008 has a section 5.12, "Details of conversion between floating-point data and external character sequences". I'm not an expert on floats, but skimming it, they talk about "correct rounding", which 2.1.12 defines as the rounding determined by the applicable rounding direction
9:06:38
beach
Oh, but that's only valid when the result is exactly in the middle between two values.
9:09:11
beach
So if the next higher floating-point number is strictly closer to 7.78, then that next higher number should be returned.
9:11:27
_death
the definition of roundTiesToEven does not refer only to ties, it includes the notion of nearest
9:13:01
_death
JeromeLon: btw duckduckgo gave me https://irem.univ-reunion.fr/IMG/pdf/ieee-754-2008.pdf
9:46:29
lisp123
Is there a curated list of hall of fame posts from c.l.l (information / discussion)?
9:49:26
lisp123
flip214: I was thinking more broadly. I read it from time to time and there's a lot of wisdom in there
9:50:20
_death
there are too many.. just get the c.l.l archive and a good news reader (say gnus), sit back, relax, and spend a couple of months getting up to date
9:53:14
lisp123
I'm also trying to download all of CMU AI Repository at the moment, will put it on GitHub once done
9:56:51
JeromeLon
Sorry, I should have been clearer: the next single float was closer to the correct result
9:57:43
JeromeLon
PARSE-FLOAT is doing (+ 7.0 0.78) as its last step, losing accuracy in the addition
9:59:32
JeromeLon
7.7800002 is closer than 7.7799997, but maybe that's how IEEE 754 addition is specified?
10:03:58
pjb
JeromeLon: (loop for *read-default-float-format* in '(short-float single-float double-float long-float) collect (+ 7 (read-from-string "0.78"))) #| --> (7.7799997 7.7799997 7.78D0 7.78D0) |#
10:04:40
pjb
JeromeLon: if you're into high precision floating point computations, you should put (setf *read-default-float-format* 'double-float) in your rc file.
10:04:46
beach
JeromeLon: And it is entirely possible that the addition is rounded correctly while still giving that result.
10:14:55
JeromeLon
Also, in C, the float addition rounds up: https://onlinegdb.com/pLSxkTMR-o
10:16:29
pjb
The point is that 7.0 = 111000000000000000000000e-21, 0.78 = 111110001111010111000011e-24 = 000110001111010111000010e-21, and 7.0 + 0.78 = 111110001111010111000010e-21 = 7.7799997
10:17:26
_death
JeromeLon: you know the single float is converted to double when passed to printf right?
10:21:23
JeromeLon
pjb: "000110001111010111000010e-21" this looks truncated instead of rounded. 000110001111010111000011e-21 is closer
10:22:11
_death
JeromeLon: but you didn't give it 0.78, which is not representable as a floating point value
10:23:34
JeromeLon
_death: 0.78 can be represented as either .07799997 or 0.07800002, I believe the debate is about which one is more correct.
10:25:54
JeromeLon
0.78 can only be represented as 0.7799997 or 0.7800002 when we are missing 3 digits in single float.
10:28:55
_death
would it be fair to say that you want to ask why 0.78 gets converted to 0.7799997 and not 0.7800002
10:31:20
JeromeLon
_death: no (because it's not converted to that). I'll do a nice diagram and post back.
10:42:14
JeromeLon
Legend: when summing 7 and 0.779999971389770507812500 (which is the single-float representation of 0.78), the 2 possible roundings (7.779999732971191406250 and 7.780000209808349609375) are equidistant from the unrounded value.
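The tie described above can be checked exactly in standard Common Lisp, because INTEGER-DECODE-FLOAT and RATIONAL expose a float's exact value. This is a minimal sketch, assuming IEEE 754 single-floats, round-to-nearest-even arithmetic, and a correctly rounding reader (as in SBCL). The names EXACT-PARTS and TIE-DISTANCES are made up for illustration.

```lisp
;; Hypothetical helper: return the integer significand and binary exponent,
;; so that the float equals significand * 2^exponent exactly.
(defun exact-parts (x)
  (multiple-value-bind (significand exponent) (integer-decode-float x)
    (list significand exponent)))

(exact-parts 0.78f0)             ; 0.78 read directly as a single-float
(exact-parts (+ 7.0f0 0.78f0))   ; sum of the two separately rounded parts
(exact-parts 7.78f0)             ; 7.78 read directly as a single-float

;; Hypothetical helper: distance from the exact (unrounded) sum to the two
;; candidate single-floats. RATIONAL converts without any rounding.
(defun tie-distances ()
  (let ((exact-sum (+ 7 (rational 0.78f0)))
        (lower (rational (+ 7.0f0 0.78f0)))
        (upper (rational 7.78f0)))
    (list (- exact-sum lower) (- upper exact-sum))))

(tie-distances)
```

Under the stated assumptions the two distances returned by TIE-DISTANCES should be equal, which is the tie that round-to-nearest-even resolves towards the even significand.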
10:45:37
_death
I didn't check the actual float value 0.78 gets converted to; only took the values discussed
10:49:50
JeromeLon
so the conclusion is that PARSE-FLOAT doing a sum (integer part + decimal part) can introduce an error when the rounding of the decimal part changes in turn the rounding of the sum.
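One way to avoid the double rounding is to build the exact value as a rational and convert to a float only once. The sketch below is not taken from parse-float or parse-number. PARSE-DECIMAL-ONCE is a hypothetical name, it handles only unsigned "digits.digits" input, and it assumes that FLOAT applied to a ratio is correctly rounded, which is implementation-dependent (it appears to hold in SBCL).

```lisp
(defun parse-decimal-once (string &optional (prototype 1f0))
  "Parse an unsigned decimal STRING such as \"7.78\" with a single rounding.
PROTOTYPE selects the float format of the result."
  (let* ((dot (position #\. string))
         (int-part (subseq string 0 dot))              ; whole string if no dot
         (frac-part (if dot (subseq string (1+ dot)) ""))
         ;; All digits as one exact integer, e.g. "7.78" -> 778.
         (numerator (parse-integer (concatenate 'string int-part frac-part)))
         ;; Exact power of ten to divide by, e.g. 100 for two decimals.
         (denominator (expt 10 (length frac-part))))
    ;; The division is exact (a rational); FLOAT is the only rounding step.
    (float (/ numerator denominator) prototype)))

;; Compare with summing the separately rounded integer and fractional parts:
(list (parse-decimal-once "7.78")
      (+ 7.0f0 0.78f0))
```

Passing 1d0 as the prototype gives a double-float result instead.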
10:55:03
_death
(loop for x in (list (parse-float:parse-float "0.78") (+ 7.0 0.78) 0.78) collect (decimals:format-decimal-number x :round-magnitude -24))
12:20:49
lisp123
Actually on that, can I compare two uninterned symbols for equality via (equal (symbol-name '#:me) (symbol-name '#:you))?
12:23:53
beach
The names are strings, so you can use whatever comparison on strings that works for you.
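A small illustration of comparing uninterned symbols by name, using only standard functions. COMPARE-UNINTERNED is a made-up name for the example.

```lisp
(defun compare-uninterned ()
  (let ((a (make-symbol "ME"))
        (b (make-symbol "ME")))
    (list (eq a b)                                   ; two distinct symbol objects
          (string= (symbol-name a) (symbol-name b))  ; same name, case-sensitive
          (string= a b)                              ; STRING= also accepts symbols
          (string-equal a (make-symbol "me")))))     ; case-insensitive comparison

(compare-uninterned)   ; → (NIL T T T)
```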
13:01:48
saltrocklamp[m]
<_death> "salt rock lamp: it's not bummer,..." <- that was quick! you found the issue?
13:02:49
saltrocklamp[m]
i see, very small fix: https://github.com/sharplispers/parse-number/pull/10
13:03:59
saltrocklamp[m]
has anyone here used `check-it` before https://github.com/DalekBaldwin/check-it? something like `parse-number` would be a great place for property-based testing
15:28:16
saltrocklamp[m]
https://bpa.st/LRTQ does anyone see an obvious reason why the lisp version of my code (with sbcl) is not only ~5x slower than the python version of my code, but also isn't giving the right answers? the correct output numbers should be something like `249.9008` and `288.6628`
15:32:30
Bike
the latter of which you could deal with by reusing a preallocated string instead of allocating a new one each time read-line is called; maybe python is smart enough to do that for the "for line in" construct
15:37:25
Krystof
I think python defaults to double float; SBCL definitely defaults to single-float. That might be enough to explain the different answers
15:39:24
Krystof
well, actually: your wrapper around parse-number returns (values 0.0 nil) on a parse failure, but your check tests for the primary value being null
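The pasted code is not in the log, so the following only illustrates the kind of mismatch described: a wrapper that signals failure through its second value, and a caller that looks only at the first. PARSE-OR-ZERO and PARSE-OR-NIL are hypothetical names, and PARSE-INTEGER stands in for the real number parser.

```lisp
;; Hypothetical stand-in for the wrapper: on failure it returns
;; (values 0.0 nil), so the primary value is never NIL.
(defun parse-or-zero (string)
  (handler-case (values (float (parse-integer string)) t)
    (parse-error () (values 0.0 nil))))

;; Testing only the primary value misses the failure: this is false
;; for bad input, and the bogus 0.0 would be used as data.
(null (parse-or-zero "oops"))

;; Checking the secondary value catches it.
(defun parse-or-nil (string)
  (multiple-value-bind (value ok) (parse-or-zero string)
    (and ok value)))
```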
15:40:37
saltrocklamp[m]
that said, i had a previous version of this that used READ + TYPEP to "parse" floats (returning NIL if it didn't actually read a float), and i think the answer was wrong there too, even when i set *READ-DEFAULT-FLOAT-FORMAT* to double-float
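For reference, a READ + TYPEP style parser might look roughly like the sketch below. It is a guess at the approach described, not the original code, and READ-FLOAT-OR-NIL is a hypothetical name.

```lisp
(defun read-float-or-nil (string)
  "Read one object from STRING; return it if it is a float, else NIL."
  (let ((*read-eval* nil)                             ; disable #. evaluation
        (*read-default-float-format* 'double-float))  ; "3.25" reads as a double
    ;; IGNORE-ERRORS turns reader errors (e.g. a stray \")\") into NIL;
    ;; the NIL NIL arguments make end-of-input return NIL as well.
    (let ((object (ignore-errors (read-from-string string nil nil))))
      (and (floatp object) object))))

(read-float-or-nil "3.25")   ; a double-float equal to 3.25d0
(read-float-or-nil "7")      ; NIL, because 7 reads as an integer
(read-float-or-nil "abc")    ; NIL, because abc reads as a symbol
```

Note that the reader still interns a symbol for every non-numeric token it meets, which is part of why it does more work than a dedicated number parser.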
15:43:46
saltrocklamp[m]
and i will try re-using the string, i assume you mean i should `setq`/`setf` it instead of using a step-form in `do`?
16:30:13
saltrocklamp[m]
that's much nicer pjb , i was wondering if there was a tidy loop version. Bike , wouldn't that cause problems if the current line is shorter than the previous line?
16:34:24
saltrocklamp[m]
this is a great demo of advanced `loop`ing. setting `*read-default-float-format*` did fix the accuracy, but it's still ~7.5 seconds while the python version is ~1.5
16:35:15
saltrocklamp[m]
hard to tell which calls are slow, as opposed to just frequent. let me try the deterministic profile
16:40:13
saltrocklamp[m]
yep it looks like `read` is really the culprit, 0.000002 seconds per call at 2909618 calls, that's ~5.8 seconds spent on just `read`ing
16:46:45
saltrocklamp[m]
i'm open to suggestions for how to fix this.. i admit i'm disappointed, i was expecting lisp to at least be comparable to python
17:16:58
saltrocklamp[m]
weird that there's `parse-integer` in the standard but not `parse-float` - i saw some discussion about it above, maybe sicl will turn out to be the fast implementation :)
17:18:41
Krystof
read is a full parser; parse-number is presumably a limited tokenizer; parse-float will be even more limited
17:19:13
Krystof
I would also be a bit suspicious of the deterministic profiler; the overhead is substantial and subtracting the overhead off is not necessarily 100% correct
17:25:59
pjb
saltrocklamp[m]: now, reading in lisp involves decoding octet sequences from files, into text, sequences of characters.
17:26:36
pjb
saltrocklamp[m]: in C, and python does as C does, one only processes the octets, and almost never decodes them into actual characters.
17:27:26
pjb
saltrocklamp[m]: if you want to attain the same I/O performance, you must read octets in CL as well.
17:28:41
saltrocklamp[m]
python 3 strings are sequences of unicode code points, by default the input encoding is utf-8. so it's definitely doing full string parsing in my example (although i could probably make the python version faster by dropping down to use raw bytes)
17:29:23
saltrocklamp[m]
and in fact i think the internal storage is utf-16 or something like that, so not only is it parsing utf-8 but it's also converting it to another format
17:31:42
saltrocklamp[m]
i'm definitely open to suggestions though, maybe this is a missing piece in the library ecosystem
17:34:14
pjb
this is an intermediate solution: we still convert to string, but assuming pure ascii input (this could also be done with :external-format :us-ascii, but it is highly implementation dependent whether it's possible to set the external format of *standard-input*).
17:35:06
pjb
we could avoid converting to characters (which in sbcl take 32-bit each), by processing the octets directly. The float parsing function would have to be changed to use vectors of octets instead of strings.
17:35:18
saltrocklamp[m]
is this slurping the entire thing into memory? i forgot to mention this earlier, but one of the other requirements was to assume that the data is "huge" to the point where it can't be reasonably loaded all at once
17:36:00
pjb
saltrocklamp[m]: then yes, looping on reading a buffer with read-sequence, and processing the octets instead of converting to string would be best.
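A rough sketch of that loop, using only standard functions. To stay short it sums non-negative decimal integers rather than parsing floats, and SUM-INTEGERS-FROM-OCTETS is a made-up name. The parsing state lives outside the buffer loop, so a number split across two buffer fills is still handled, and memory use stays at one buffer regardless of input size.

```lisp
(defun sum-integers-from-octets (stream &key (buffer-size 65536))
  "Sum the non-negative decimal integers found in the binary STREAM,
reading fixed-size chunks and never building a string."
  (let ((buffer (make-array buffer-size :element-type '(unsigned-byte 8)))
        (sum 0)
        (acc 0)            ; value of the number currently being read
        (in-number nil))   ; true while we are inside a run of digits
    (loop for end = (read-sequence buffer stream)   ; count of octets read
          until (zerop end)                         ; zero means end of file
          do (loop for i from 0 below end
                   for octet = (aref buffer i)
                   do (cond ((<= 48 octet 57)       ; ASCII codes for 0 to 9
                             (setf acc (+ (* acc 10) (- octet 48))
                                   in-number t))
                            (in-number              ; any other octet ends a number
                             (incf sum acc)
                             (setf acc 0
                                   in-number nil)))))
    ;; Account for a final number not followed by a newline.
    (when in-number (incf sum acc))
    sum))

;; Usage, assuming a file named numbers.txt exists:
;; (with-open-file (in "numbers.txt" :element-type '(unsigned-byte 8))
;;   (sum-integers-from-octets in))
```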
17:36:28
saltrocklamp[m]
i am curious what python, lua, and nim are doing that makes this so much more efficient than whatever sbcl and ccl are doing
17:36:58
pjb
saltrocklamp[m]: you can use https://github.com/informatimago/lisp/blob/master/common-lisp/cesarum/ascii.lisp#L382 to help processing octets of ascii codes.
17:37:41
pjb
Notably, if you want to split the lines on newlines: https://github.com/informatimago/lisp/blob/master/common-lisp/cesarum/ascii.lisp#L584
17:39:56
saltrocklamp[m]
i think i was confused before. you are talking about the number parsing part?
17:40:00
pjb
which, for a file that contains mostly 10, 43, 45, 46, and 48-57, lets you avoid a lot of processing…
17:41:49
saltrocklamp[m]
`for line in sys.stdin` in python iterates over true unicode strings, not byte sequences. but it would make sense if e.g. `float(s)` operated on the underlying bytes of the string `s`
17:43:16
saltrocklamp[m]
your `contents-from-stream` function seems to implement Bike's suggestion to re-use the string
17:53:30
saltrocklamp[m]
it's not utf-8 internally in cpython at least, they use some wider encoding in order to do string lookups in constant time
17:54:22
saltrocklamp[m]
i'll have to spend some time reading these code snippets, my understanding of how "streams" work in lisp is hazy still
17:55:10
saltrocklamp[m]
i do wonder about the memory allocation, i am trying to look at the generated c code from the nim version to see if that's what nim does
20:25:13
James`
Anybody know how I can write a defmacro within a defun? Similar to this post on SO, but for defmacros
20:26:12
Bike
you can use macro-function instead of symbol-function there, but you probably don't want to actually do this
20:31:23
James`
Now I want to automate the name generation, so I can write each test like a reader macro #T(...) and have the names automatically created for each
20:32:01
James`
So I just write the basic test, and I extract from it which function I'm testing and a counter for that function (stored in a hash table) to then build the deftest form
20:37:43
Bike
can you just do like, (defmacro my-deftest (&body body) `(deftest ,(generate-a-name) ,@body)), or what am i missing here
20:49:34
James`
It doesn't seem to create the test macro (e.g. I do my-test-1231 and I get an error 'No test ....)
20:49:54
James`
Whereas if I manually do it, it works. I think it may have something to do with interning symbols...
20:50:39
James`
(defun generate-a-name () (incf *counter*) (make-symbol (concatenate 'string "my-test-123" (write-to-string *counter*))))
20:51:58
Bike
i'm not sure i understand your goal here. You want nameless tests except you actually do need to know the name?
20:53:31
Bike
okay, well how about you store the names somewhere. do (defvar *test-names* nil) and then (defmacro my-deftest (&body body) (let ((name (generate-a-name))) `(progn (push ',name *test-names*) (deftest ,name ,@body))))
20:57:54
James`
Thanks, I will play around with it, don't want to take up more of your time. I faced the same issue, 'no test ....', after running it, but I gotta figure this out for myself. Sounds like it's possible to do it the way you say and the way I thought initially, so it's something else
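One possible cause of the 'no test' error, offered as an assumption since the log does not confirm it: MAKE-SYMBOL returns a fresh uninterned symbol, and the name "my-test-123..." is lower-case, so typing my-test-1231 at the REPL reads a different symbol (the interned MY-TEST-1231). The sketch below uses INTERN with an upper-case name instead. TOY-DEFTEST and *TOY-TESTS* are hypothetical stand-ins for a real test framework.

```lisp
;; The name generator runs at macroexpansion time, so it must also be
;; defined at compile time when this code is compiled from a file.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defvar *counter* 0)
  (defun generate-a-name ()
    ;; INTERN with an upper-case name yields the same symbol the reader
    ;; produces for my-test-1, my-test-2, ... under the default readtable.
    (intern (format nil "MY-TEST-~D" (incf *counter*)))))

(defvar *test-names* nil)
(defvar *toy-tests* (make-hash-table :test #'eq))

;; Hypothetical stand-in for a framework's DEFTEST: store the body as a
;; function under the test's name.
(defmacro toy-deftest (name &body body)
  `(setf (gethash ',name *toy-tests*) (lambda () ,@body)))

(defmacro my-deftest (&body body)
  (let ((name (generate-a-name)))
    `(progn (push ',name *test-names*)
            (toy-deftest ,name ,@body))))

(my-deftest (= 2 (+ 1 1)))

;; Run the most recently defined test by looking up its recorded name.
(funcall (gethash (first *test-names*) *toy-tests*))
```

Because the counter is bumped during macroexpansion, the numbers may skip if an implementation expands the form more than once, so it is safer to look names up in *test-names* than to guess them.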