libera/commonlisp - IRC Chatlog

6:59:36 dickbar__ Just a remark: "7.78" should be printed as 7.78000000... (infinite number of zeros). You will hear from me when I'm done checking :-)

7:00:28 beach I believe that's right. So that suggests that it was parsed incorrectly.

7:03:39 dickbar__ Still checking:-)

7:04:09 jackdaniel "7.78" as a string should be printed as "7.78" or, aesthetically, as 7.78 - but it is not a number ;)

7:09:51 nckx_ ** NICK nckx

7:22:52 saltrocklamp[m] does anyone here use the `parse-number` library? it seems to trigger some kind of deep internal error in both ccl and sbcl when attempting to parse a string with a literal tab character in it. e.g. in ccl i get "Error: The value NIL is not of the expected type UNSIGNED-BYTE. While executing: (:INTERNAL CCL::BAD-SEQUENCE-INTERVAL CCL::CHECK-SEQUENCE-BOUNDS)"

7:42:44 semz not for me

7:43:16 semz do you have an example string?

7:43:37 semz saltrocklamp[m]: sorry, forgot to highlight

7:47:20 saltrocklamp[m] semz: just `(string #\tab)`

7:48:04 semz ah, now I'm getting it too

7:48:25 saltrocklamp[m] bummer. i guess it's a bug

8:00:10 kpoeck ::notify cracauer static-vectors loads fine in clasp

8:00:10 Colleen kpoeck: Got it. I'll let cracauer know as soon as possible.

8:26:12 JeromeLon oh no, I missed one of my favorite discussions on binary numbers. I think a good answer was: 7.78 is 111.1100011110101110000101000111101011100001010001111010111000010100011110101110000101000111101011100001... in binary, which is truncated as specified by IEEE754 single precision to 111.110001111010111000010, which is exactly 7.7799997 in decimal.

8:28:58 _death saltrocklamp[m]: it's not bummer, it's an opportunity (submitted a PR)

8:30:47 beach JeromeLon: But the next higher single float is closer to 7.78 than the truncated one I think. So that means that PARSE-FLOAT truncates rather than returning the best approximation. Right?
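
The difference between truncating and rounding to nearest can be checked with exact rational arithmetic. The following is a minimal sketch in Python, not the PARSE-FLOAT code under discussion; the helper name `to_single` is made up for illustration, and it only handles positive values in the normal single-float range.

```python
from fractions import Fraction
import math

def to_single(x, mode="nearest"):
    """Map a positive rational to a binary32 value (normal range only).

    mode="nearest" rounds to nearest with ties to even; mode="truncate" chops.
    The result is returned as an exact Fraction.
    """
    x = Fraction(x)
    # Find e such that 2**e <= x < 2**(e+1).
    e, y = 0, x
    while y >= 2:
        y /= 2
        e += 1
    while y < 1:
        y *= 2
        e -= 1
    ulp = Fraction(2) ** (e - 23)  # spacing of singles in [2**e, 2**(e+1))
    steps = x / ulp
    # round() on a Fraction rounds half to even; math.floor() truncates.
    n = round(steps) if mode == "nearest" else math.floor(steps)
    return n * ulp

x = Fraction("7.78")
lo = to_single(x, "truncate")  # 16315842 * 2**-21
hi = to_single(x, "nearest")   # 16315843 * 2**-21
print(float(lo), float(hi))
print("next higher single is closer:", abs(x - hi) < abs(x - lo))
```

Here 7.78 sits at 16315842.56 steps of 2**-21, so the next higher single is strictly closer and no tie-breaking rule is involved.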

8:32:25 jackdaniel if ieee754 specifies truncating then it should truncate (well, not that the common lisp standard stipulates that its floats behave exactly like ieee754)

8:33:00 jackdaniel marcoxa works on a spec for common lisp floats, it was mentioned during last (or one before last) els

8:33:11 beach Does IEEE 754 specify that it should truncate when a decimal representation is converted to a single float?

8:33:12 jackdaniel s/on a spec/on a better spec/

8:33:41 jackdaniel I don't know, I'm repeating what JeromeLon said "which is truncated as specified by IEEE754 single precision"

8:34:13 beach But how is the original binary number obtained, and why was it truncated?

8:34:18 beach There is no reason for that.

8:34:33 beach That's just a choice by PARSE-FLOAT it seems.

8:34:58 beach Whereas there are papers published on how to obtain the closest float for a particular decimal representation.

8:36:04 JeromeLon IEEE 754 specifies several rounding methods. Does it recommend one of them? I don't know actually

8:36:22 beach JeromeLon: But isn't that beside the point?

8:36:53 beach JeromeLon: There is no reason for PARSE-FLOAT to generate the original value that you showed.

8:37:02 JeromeLon beach: yes, I agree with you, OC should ponder the source of the number 7.78. If it's a currency, it should not be parsed as a float.

8:37:23 beach That's again a different issue.

8:38:00 beach The best action on the part of PARSE-FLOAT would, in my opinion, be to return the best approximation. Not the truncation.

8:39:10 beach And this opinion has nothing to do with IEEE truncation or rounding, nor anything about currency. Just how PARSE-FLOAT has decided to do it.

8:40:21 beach JeromeLon: But, your remark certainly explains the observation by opcode.

8:41:45 JeromeLon beach: I agree that the best approximation would be more correct. I disagree that it has nothing to do with IEEE 754 truncation or rounding. If IEEE 754 specifies clearly which rounding should be done, then PARSE-FLOAT should just do that.

8:42:09 JeromeLon I can't find which rounding is recommended, though.

8:45:34 beach JeromeLon: Where does IEEE say what rounding should be used when a decimal representation is converted to an IEEE float?

8:46:29 beach Notice "when a decimal representation is converted", not "when a higher-precision binary representation is converted".

8:49:55 JeromeLon beach: from wikipedia: "754-2008 requires correctly rounded base conversion between decimal and binary floating point within a range which depends on the format"

8:50:24 JeromeLon but I can't find an online version of 754-2008 to check what "correctly" means.

8:50:28 beach In that case, truncation is not the right thing to do here.

8:51:31 JeromeLon I agree. And it means that whatever 754 defines as the correct rounding is what PARSE-FLOAT should be doing

8:58:13 beach As far as I can tell, IEEE rounding has to do with the result of operations between floating-point numbers.

8:58:21 beach But this is not such an operation.

9:01:55 beach The IEEE standard says "they shall use correct rounding" for the conversion, which sounds to me like "they shall return the closest floating-point number".

9:02:24 jdz_ ** NICK jdz

9:03:13 _death ieee754-2008 has a section 5.12 Details of conversion between floating-point data and external character sequences.. I'm not an expert on floats, but skimming it they talk about "correct rounding" which 2.1.12 defines as the rounding determined by the applicable rounding direction

9:05:53 beach Hmm.

9:06:08 _death 4.3.3 says roundTiesToEven should likely be the default

9:06:38 beach Oh, but that's only valid when the result is exactly in the middle between two values.

9:06:42 JeromeLon which means round to closest and tie to even

9:07:10 beach Yeah.

9:08:57 _death right, so I'm guessing it also implies 4.3.1 round to nearest

9:09:11 beach So if the next higher floating-point number is strictly closer to 7.78, then that next higher number should be returned.

9:11:27 _death the definition of roundTiesToEven does not refer only to ties, it includes the notion of nearest

9:13:01 _death JeromeLon: btw duckduckgo gave me https://irem.univ-reunion.fr/IMG/pdf/ieee-754-2008.pdf

9:46:29 lisp123 Is there a curated list of hall of fame posts from c.l.l (information / discussion)?

9:48:44 flip214 lisp123: Naggum posts are available at https://www.xach.com/naggum/articles/

9:49:26 lisp123 flip214: I was thinking more broadly. I read it from time to time and there's a lot of wisdom in there

9:50:20 _death there are too many.. just get the c.l.l archive and a good news reader (say gnus), sit back, relax, and spend a couple of months getting up to date

9:51:27 lisp123 _death: Good idea. Will see if I can put into my kindle..

9:52:05 _death more reading material here https://github.com/death/gnus-friendly-archives

9:52:47 lisp123 Awesome, thanks!

9:53:14 lisp123 I'm also trying to download all of CMU AI Repository at the moment, will put it on GitHub once done

9:55:14 JeromeLon SBCL: (+ 7 0.78) => 7.7799997

9:55:43 flip214 (+ 7d0 0.78d0)

9:55:52 _death JeromeLon: try (read-from-string "7.78") though

9:56:51 JeromeLon Sorry, I should have been clearer: the next single float was closer to the correct result

9:57:43 JeromeLon PARSE-FLOAT is doing (+ 7.0 0.78) as its last step, losing accuracy in the addition

9:59:32 JeromeLon 7.7800002 is closer than 7.7799997, but maybe that's how IEEE 754 addition is specified?

10:03:58 pjb JeromeLon: (loop for *read-default-float-format* in '(short-float single-float double-float long-float) collect (+ 7 (read-from-string "0.78"))) #| --> (7.7799997 7.7799997 7.78D0 7.78D0) |#

10:04:12 beach JeromeLon: I see now. Yes, that's the wrong algorithm for PARSE-FLOAT to use.

10:04:40 pjb JeromeLon: if you're into high precision floating point computations, you should put (setf *read-default-float-format* 'double-float) in your rc file.

10:04:46 beach JeromeLon: And it is entirely possible that the addition is rounded correctly while still giving that result.

10:04:53 pjb JeromeLon: or explicitly set it in your programs.

10:05:23 beach pjb: That's not what this is about. It is about a single observation by opcode.

10:07:00 _death https://plaster.tymoon.eu/view/2674#2674

10:12:06 JeromeLon _death: I would claim that :nearest is incorrect.

10:13:19 _death well, first we need to figure out the exact value of single-float 0.78

10:14:55 JeromeLon Also, in C, the float addition results in rounding up: https://onlinegdb.com/pLSxkTMR-o

10:16:29 pjb The point is that 7.0 = 111000000000000000000000e-21 0.78 = 111110001111010111000011e-24 = 000110001111010111000010e-21 and 7.0 + 0.78 = 111110001111010111000010e-21 = 7.7799997

10:17:26 _death JeromeLon: you know the single float is converted to double when passed to printf right?

10:17:45 _death JeromeLon: then %f rounds it back

10:17:54 JeromeLon _death: No, I didn't know, ok this test is wrong.

10:18:37 pjb JeromeLon: in C, floating-point operations can convert to extended float first…

10:21:23 JeromeLon pjb: "000110001111010111000010e-21" this looks truncated instead of rounded. 000110001111010111000011e-21 is closer

10:21:39 _death JeromeLon: closer to what?

10:21:48 JeromeLon to 0.78

10:22:11 _death JeromeLon: but you didn't give it 0.78, which is not representable as a floating point value

10:23:34 JeromeLon _death: 0.78 can be represented as either .07799997 or 0.07800002, I believe the debate is about which one is more correct.

10:24:12 JeromeLon Sorry, I meant 0.7799997 or 0.7800002

10:24:15 _death so, let's first agree that the addition is irrelevant

10:24:43 _death (in this case)

10:25:06 JeromeLon I disagree, but it's because I did not express myself clearly

10:25:54 _death (= (+ 7.0 0.7799997) (+ 7.0 0.7800002)) ==> NIL

10:25:54 JeromeLon 0.78 can only be represented as 0.7799997 or 0.7800002 when we are missing 3 digits in single float.

10:27:55 robin_ ** NICK robin

10:28:55 _death would it be fair to say that you want to ask why 0.78 gets converted to 0.7799997 and not 0.7800002

10:31:20 JeromeLon _death: no (because it's not converted to that). I'll do a nice diagram and post back.

10:34:10 JeromeLon OMG it's a tie!

10:40:33 JeromeLon https://pastebin.com/TmrN9krC

10:42:14 JeromeLon Legent: when summing 7 0.779999971389770507812500 (which is the single-float representation of 0.78), the 2 possible roundings (7.779999732971191406250 and 7.780000209808349609375) have the same distance to non-rounded value.

10:42:18 JeromeLon *legend

10:43:03 _death ok, and the first is even

10:45:37 _death I didn't check the actual float value 0.78 gets converted to; only took the values discussed

10:49:50 JeromeLon so the conclusion is that PARSE-FLOAT doing a sum (integer part + decimal part) can introduce an error when the rounding of the decimal part changes in turn the rounding of the sum.
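
The tie and the resulting double-rounding error can be reproduced with exact rationals. This is a minimal sketch in Python that models "round the fractional part, then add 7, then round again"; it is an illustration of the conclusion above, not the actual PARSE-FLOAT source, and `nearest_single` is a made-up helper that only covers positive values in the normal single-float range.

```python
from fractions import Fraction

def nearest_single(x):
    """Round a positive rational to the nearest binary32 value, ties to even."""
    x = Fraction(x)
    # Find e such that 2**e <= x < 2**(e+1).
    e, y = 0, x
    while y >= 2:
        y /= 2
        e += 1
    while y < 1:
        y *= 2
        e -= 1
    ulp = Fraction(2) ** (e - 23)
    return round(x / ulp) * ulp  # round() on a Fraction rounds half to even

frac = nearest_single(Fraction("0.78"))    # 13086228 * 2**-24
exact_sum = 7 + frac                       # exact, not yet rounded
summed = nearest_single(exact_sum)         # second rounding, as after an addition
direct = nearest_single(Fraction("7.78"))  # single rounding of the decimal

ulp = Fraction(1, 2**21)                   # spacing of singles between 4 and 8
print("position of exact sum in ulps:", float(exact_sum / ulp))
print("sum of parts:", float(summed))
print("direct      :", float(direct))
print("differ by one ulp:", direct - summed == ulp)
```

The exact sum lands at 16315842.5 ulps, exactly halfway between two singles, so ties-to-even picks the lower, even one, while rounding 7.78 directly picks the upper one.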

10:54:35 _death correct

10:55:03 _death (loop for x in (list (parse-float:parse-float "0.78") (+ 7.0 0.78) 0.78) collect (decimals:format-decimal-number x :round-magnitude -24))

10:56:11 _death I mean 7.78 as the last element

10:57:10 _death (and what's passed to parse-float..)

12:00:21 rotateq this is why i like symbolic computing and exact arithmetic :)

12:07:18 Odin-FOO ** NICK Odin-

12:20:49 lisp123 Actually on that, can I compare two uninterned symbols for equality via (equal (symbol-name #:me) (symbol-name #:you))?

12:21:03 lisp123 I presume EQL doesn't work for them?

12:21:15 lisp123 (sorry should be #:me and #:me to give T in the above example)

12:23:53 beach The names are strings, so you can use whatever comparison on strings that works for you.

12:25:14 Xach They are also string designators, so you can use string= directly also.

12:25:27 pjb lisp123: (string= '#:me 'me) #| --> t |#

12:31:51 lisp123 pjb: Thanks

12:32:04 lisp123 beach & Xach: thanks too

13:01:48 saltrocklamp[m] <_death> "salt rock lamp: it's not bummer,..." <- that was quick! you found the issue?

13:02:49 saltrocklamp[m] i see, very small fix: https://github.com/sharplispers/parse-number/pull/10

13:03:59 saltrocklamp[m] has anyone here used `check-it` before https://github.com/DalekBaldwin/check-it? something like `parse-number` would be a great place for property-based testing

13:54:50 Nilby just add a few more to the 1d308 problems floating point has caused

15:28:16 saltrocklamp[m] https://bpa.st/LRTQ does anyone see an obvious reason why the lisp version of my code (with sbcl) is not only ~5x slower than the python version of my code, but also isn't giving the right answers? the correct output numbers should be something like `249.9008` and `288.6628`

15:28:41 saltrocklamp[m] any recommendations for a profiler would be appreciated too

15:30:32 Bike sbcl has two built in profilers http://sbcl.org/manual/#Profiling

15:32:03 Bike if i had to guess, slow points could be parse-number and read-line

15:32:30 Bike the latter of which you could deal with by reusing a preallocated string instead of allocating a new one each time read-line is called; maybe python is smart enough to do that for the "for line in" construct

15:37:25 Krystof I think python defaults to double float; SBCL definitely defaults to single-float. That might be enough to explain the different answers

15:39:24 Krystof well, actually: your wrapper around parse-number returns (values 0.0 nil) on a parse failure, but your check tests for the primary value being null

15:39:39 saltrocklamp[m] oop, that was from an old version

15:39:41 Krystof so you don't handle invalid lines correctly in your Lisp version
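
The mismatch Krystof describes can be sketched as follows. The original code lives in the linked paste and is not reproduced here, so this is a hypothetical Python analogue: `parse_or_default` stands in for a wrapper that returns a default value plus a success flag (like `(values 0.0 nil)`), and the two list comprehensions stand in for a caller that checks the wrong thing and one that checks the flag.

```python
def parse_or_default(text):
    """Return (value, ok); on failure the value is 0.0 and ok is False."""
    try:
        return float(text), True
    except ValueError:
        return 0.0, False

lines = ["1.5", "oops", "2.5"]

# Buggy check: tests the primary value for "null", but it is never None,
# so the invalid line silently contributes 0.0.
buggy = [v for v, _ in map(parse_or_default, lines) if v is not None]

# Fixed check: tests the success flag instead.
fixed = [v for v, ok in map(parse_or_default, lines) if ok]

print(buggy)
print(fixed)
```

With the buggy check, every invalid line is counted as a zero, which skews any sum or average computed from the values.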

15:39:48 saltrocklamp[m] yeah let me try fixing

15:40:37 saltrocklamp[m] that said, i had a previous version of this that used READ + TYPEP to "parse" floats (returning NIL if it didn't actually read a float), and i think the answer was wrong there too, even when i set *READ-DEFAULT-FLOAT-FORMAT* to double-float

15:43:12 saltrocklamp[m] Bike: i'll try `sb-sprof`, that looks nice and easy

15:43:46 saltrocklamp[m] and i will try re-using the string, i assume you mean i should `setq`/`setf` it instead of using a step-form in `do`?

16:27:07 saltrocklamp[m] urgh, i ended up having to rewrite this with `prog`

16:27:32 Bike by reusing the string i was thinking more like read-sequence

16:27:48 pjb saltrocklamp[m]: I'd use loop instead: https://termbin.com/4snu

16:30:13 saltrocklamp[m] that's much nicer pjb , i was wondering if there was a tidy loop version. Bike , wouldn't that cause problems if the current line is shorter than the previous line?

16:30:57 Bike getting it right is more involved, yeah

16:34:24 saltrocklamp[m] this is a great demo of advanced `loop`ing. setting `*read-default-float-format*` did fix the accuracy, but it's still ~7.5 seconds while the python version is ~1.5

16:35:15 saltrocklamp[m] hard to tell which calls are slow, as opposed to just frequent. let me try the deterministic profile

16:40:13 saltrocklamp[m] yep it looks like `read` is really the culprit, 0.000002 seconds per call at 2909618 calls, that's ~5.8 seconds spent on just `read`ing

16:46:45 saltrocklamp[m] i'm open to suggestions for how to fix this.. i admit i'm disappointed, i was expecting lisp to at least be comparable to python

16:52:18 Bike how's parse-number compared to read? i'd expect parse-number to be faster

17:13:44 saltrocklamp[m] not appreciably faster in some of the tests i ran, but i can try again

17:14:01 Bike hm. well that sucks.

17:14:08 saltrocklamp[m] i'm not sure if it's possible to write the `loop` version using it

17:16:58 saltrocklamp[m] weird that there's `parse-integer` in the standard but not `parse-float` - i saw some discussion about it above, maybe sicl will turn out to be the fast implementation :)

17:17:47 Krystof the more generic your parsing thing, the slower it is likely to be

17:18:11 Krystof I'd start by trying to use parse-float, though I don't know how optimized it is

17:18:41 Krystof read is a full parser; parse-number is presumably a limited tokenizer; parse-float will be even more limited

17:19:13 Krystof I would also be a bit suspicious of the deterministic profiler; the overhead is substantial and subtracting the overhead off is not necessarily 100% correct

17:25:10 saltrocklamp[m] does sbcl have `parse-float`?

17:25:42 saltrocklamp[m] i thought only lispworks had it

17:25:49 Catie It's a library, loadable through quicklisp

17:25:59 pjb saltrocklamp[m]: now, reading in lisp involves decoding octet sequences from files, into text, sequences of characters.

17:26:15 saltrocklamp[m] oh, the library. i was wondering if it would be faster than `parse-number`

17:26:19 saltrocklamp[m] i can try it

17:26:36 pjb saltrocklamp[m]: in C, and python does like C, one only processes the octets, and almost never decode them into actual characters.

17:26:49 pjb saltrocklamp[m]: so called, "utf-8 octet sequences"…

17:27:04 pjb saltrocklamp[m]: that's where a lot of time (and memory) is spent when reading in CL.

17:27:26 pjb saltrocklamp[m]: if you want to attain the same I/O performance, you must read octets in CL as well.

17:28:41 saltrocklamp[m] python 3 strings are sequences of unicode code points, by default the input encoding is utf-8. so it's definitely doing full string parsing in my example (although i could probably make the python version faster by dropping down to use raw bytes)

17:29:23 saltrocklamp[m] and in fact i think the internal storage is utf-16 or something like that, so not only is it parsing utf-8 but it's also converting it to another format

17:29:41 saltrocklamp[m] however it is written in c, and i'm sure it's been heavily optimized

17:29:57 saltrocklamp[m] not sure if that's what you meant, or something else?

17:30:13 saltrocklamp[m] (in python 2, strings were raw octet/byte sequences)

17:31:42 saltrocklamp[m] i'm definitely open to suggestions though, maybe this is a missing piece in the library ecosystem

17:33:06 pjb saltrocklamp[m]: https://termbin.com/r555

17:34:13 saltrocklamp[m] wow, you just wrote all that?

17:34:14 pjb this is an intermediate solution: we still convert to string, but assuming pure ascii input (this could also be done with :external-format :us-ascii, but it is highly implementation dependent whether it's possible to set the external format of *standard-input*).

17:34:25 pjb saltrocklamp[m]: no, copy-and-paste from my libraries.

17:35:06 pjb we could avoid converting to characters (which in sbcl take 32-bit each), by processing the octets directly. The float parsing function would have to be changed to use vectors of octets instead of strings.

17:35:18 saltrocklamp[m] is this slurping the entire thing into memory? i forgot to mention this earlier, but one of the other requirements was to assume that the data is "huge" to the point where it can't be reasonably loaded all at once

17:35:20 pjb eg. testing for 48 instead of #\0 etc.

17:35:47 saltrocklamp[m] i see, hm. i wonder if that's what `parse-number` is doing

17:36:00 pjb saltrocklamp[m]: then yes, looping on reading a buffer with read-sequence, and processing the octets instead of converting to string would be best.
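
The shape of that approach (read fixed-size chunks of octets, split on byte 10, carry any partial line over to the next chunk) can be sketched as below. This is a minimal sketch in Python rather than Lisp, and it is not pjb's code: the function names, the chunk size, and the choice of computing a mean are all assumptions made for illustration, since the actual program is only available through the linked paste. It also never holds more than one chunk plus one partial line in memory, which matters for the "huge input" requirement.

```python
import io

def iter_lines_chunked(stream, chunk_size=65536):
    """Yield lines as bytes (newline removed) from a binary stream,
    reading fixed-size chunks instead of one line at a time."""
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts = (tail + chunk).split(b"\n")
        tail = parts.pop()  # last piece may be an incomplete line
        yield from parts
    if tail:
        yield tail

def parse_float_or_none(octets):
    """Parse ASCII octets as a float without decoding to str first."""
    try:
        return float(octets)  # float() accepts bytes directly
    except ValueError:
        return None

def mean_of_valid(stream, chunk_size=65536):
    """Mean of all lines that parse as floats; None if there are none."""
    total, count = 0.0, 0
    for line in iter_lines_chunked(stream, chunk_size):
        value = parse_float_or_none(line)
        if value is not None:
            total += value
            count += 1
    return total / count if count else None

data = io.BytesIO(b"1.5\nbad\n2.5\n\n4.0")
print(mean_of_valid(data, chunk_size=4))
```

Carrying the partial line forward is what sidesteps the problem raised earlier about a reused buffer when the current line is shorter than the previous one: line boundaries are found inside the buffer rather than assumed to match it.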

17:36:28 saltrocklamp[m] i am curious what python, lua, and nim are doing that make this so much more efficient than whatever sbcl and ccl are doing

17:36:58 pjb saltrocklamp[m]: you can use https://github.com/informatimago/lisp/blob/master/common-lisp/cesarum/ascii.lisp#L382 to help processing octets of ascii codes.

17:37:41 pjb Notably, if you want to split the lines on newlines: https://github.com/informatimago/lisp/blob/master/common-lisp/cesarum/ascii.lisp#L584

17:38:36 pjb saltrocklamp[m]: I told you: they process octets, instead of characters.

17:39:56 saltrocklamp[m] i think i was confused before. you are talking about the number parsing part?

17:40:00 pjb which, for a file that contains mostly 10, 43, 45, 46, and 48-57, let you avoid a lot of processing…

17:41:49 saltrocklamp[m] `for line in sys.stdin` in python iterates over true unicode strings, not byte sequences. but it would make sense if e.g. `float(s)` operated on the underlying bytes of the string `s`

17:43:16 saltrocklamp[m] your `contents-from-stream` function seems to implement Bike's suggestion to re-use the string

17:47:40 pjb AFAIK, python keeps the string as a utf-8 sequence.

17:53:30 saltrocklamp[m] it's not utf-8 internally in cpython at least, they use some wider encoding in order to do string lookups in constant time

17:54:22 saltrocklamp[m] i'll have to spend some time reading these code snippets, my understanding of how "streams" work in lisp is hazy still

17:55:10 saltrocklamp[m] i do wonder about the memory allocation, i am trying to look at the generated c code from the nim version to see if that's what nim does

18:00:08 saltrocklamp[m] found it, but holy moly that's a complicated c program