freenode/#lisp - IRC Chatlog

14:16:11 Murii what does 'aref' stand for when using with vectors?

14:16:20 Xach Murii: array reference.

14:16:29 Murii but it's a vector :O

14:16:49 Shinmera vectors are arrays.

14:16:50 Xach A foolish consistency is the hobgoblin of little minds.

14:17:07 Murii Shinmera, in a way you can consider it

14:17:20 Shinmera No, it's literally defined that way.

14:17:21 Murii Anyway, how can I get the length of a vector?

14:17:29 Shinmera by using LENGTH

14:17:56 Xach And also with array-dimensions! But that's more oblique.

14:19:13 Murii Shinmera, LENGTH also works on lists right?

14:19:28 Shinmera it works on sequences. lists and vectors are sequences.

14:20:32 jackdaniel fun fact - strings are vectors (hence sequences) too
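
The points made so far can be tried at a REPL. This is a minimal sketch using only standard Common Lisp functions; the variable name `*v*` is made up for the example.

```lisp
;; A vector is a one-dimensional array, so AREF indexes it directly.
(defparameter *v* (vector 10 20 30))

(aref *v* 1)              ; => 20
(length *v*)              ; => 3, LENGTH works on any sequence
(array-dimensions *v*)    ; => (3), a list with one entry per dimension
(length '(a b c))         ; => 3, lists are sequences too

;; Strings are vectors of characters, hence arrays and sequences.
(typep "hello" 'vector)   ; => T
(aref "hello" 0)          ; => #\h
(length "hello")          ; => 5
```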

14:21:10 Murii jackdaniel, well they are in any language

14:21:23 jackdaniel are they?

14:21:26 pjb Murii: nope.

14:21:37 pjb Murii: eg. in scheme, strings are not vectors.

14:22:19 pjb Murii: also, if your language supports unicode, it may not be the best choice to make strings vectors.

14:23:01 jackdaniel technically string is a set of symbols from a specified alphabet, no vector required

14:23:12 jackdaniel tfu, not set

14:23:13 jackdaniel sequence

14:23:18 pjb Since the properties of unicode strings are so strange, vectors of glyphs, vectors of code-points, or vectors of characters are all inconvenient. Some argue that even having characters is inconvenient, when you use unicode.

14:23:38 jackdaniel ("tfu" means in polish "bleh", not some kind of "fu" derivative)

15:03:34 hajovonta pjb: why? a sequence is just a predefined order for the elements

15:03:45 hajovonta and a string is a predefined order for the characters.

15:04:06 Shinmera not... really.

15:04:27 Shinmera things get ~weird~ when you consider unicode.

15:05:04 hajovonta the representation of the characters is a different matter

15:05:12 hajovonta this is my - naive - thought :)

15:06:08 Bike with unicode it's nontrivial to decide what the elements of a string are.

15:07:19 TMA hajovonta: if you restrict yourself to English the intuition is fine. but for more complex languages you soon run full speed into a wall with it

15:08:14 hajovonta why? I can't imagine an example

15:09:48 Shinmera Often times a unicode string will be taken as a vector of code points. Unicode allows you to compose characters through multiple individual code points, where the order does not necessarily matter. Thus, as a representation of a "character string" different vectors represent the same thing, but are no longer identical.

15:10:13 TMA because the properties of sequences no longer hold -- if you concatenate sequences you would probably assume that the resulting length will be the sum of lengths

15:10:39 dlowe docstrings - word wrapped or not?

15:10:59 Shinmera dlowe: I do summary not word wrapped, rest word wrapped.

15:11:13 Shinmera Or rather, summary single line, rest word wrapped.

15:12:01 hajovonta we have several strange characters in our language (Hungarian) like á, é, í, ú, and ö, ő, ü, ű

15:12:35 hajovonta but, when I write "árvíz" and concatenate "tükör" then "árvíztükör" is a sequence of 10 characters

15:12:48 hajovonta is that not always so?

15:12:50 Shinmera No

15:13:16 Shinmera Unicode has code points for just the ticks, so you could write a+tick, as two code points, which would be a string of length 2 in most implementations.

15:13:27 Shinmera even though it's one character.
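
A small sketch of the "a+tick" case, built from explicit code points so it does not depend on how the source file is encoded. It assumes an implementation whose characters cover Unicode code points (CHAR-CODE-LIMIT above #x301), which holds for most current implementations but is not required by the standard. The variable names are made up for the example.

```lisp
;; U+00E1 is the precomposed letter "a with acute".
(defparameter *precomposed* (string (code-char #xE1)))

;; The same letter written as "a" followed by U+0301, the combining acute accent.
(defparameter *combined*
  (coerce (list #\a (code-char #x301)) 'string))

(length *precomposed*)               ; => 1
(length *combined*)                  ; => 2
(string= *precomposed* *combined*)   ; => NIL, although both display as the same letter
```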

15:13:39 hajovonta ok, so it's an implementation thing that causes the problems

15:13:44 TMA hajovonta: say that you concatenate 'hajov' and 'onta' in a CV syllabic script ... the first has 3 characters (ha-jo-v), the second has 3 (o-n-ta); but the concatenation has 5, not 6 (ha-jo-vo-n-ta)

15:14:17 Shinmera hajovonta: Not really. There's just different ways of looking at the same thing.

15:15:22 hajovonta but it doesn't matter how many code points make up a character. We can just count the characters, can't we?

15:15:58 hajovonta or is it problematic to get to the character count from the code points ?

15:16:31 Shinmera Well for instance looking up a single character in a string becomes O(n)

15:17:58 TMA hajovonta: why yes. sometimes. in other times what constitutes a character is hard to tell. ... IIRC, in Spanish ll is a single character though two codepoints. ch is a single character in Czech

15:18:15 dlowe the notion of a "character" may be antiquated at this point.

15:18:38 hajovonta yes, we also have double-character characters, like gy, ny, ty, dz, ...

15:19:07 hajovonta but we don't have keys for those on Hungarian layout keyboards.

15:19:15 hajovonta we use two keypresses.

15:19:35 TMA hajovonta: so when you see the string "chill", you do not know how many characters you have

15:20:20 TMA hajovonta: it's 5 code points but some indeterminate number of characters

15:20:35 dlowe generally, when you want to know the "length" of the string, you want to know a) how many bytes does it take up or b) how many pixels will it take to render this string, both of which have satisfactory answers.

15:21:28 dlowe A perverse mind might want to know it in order to find the maximum valid index of a character.

15:21:29 TMA counting characters does not help in determining either

15:21:40 hajovonta dlowe: I'm usually not interested in any of the two answers :)

15:22:02 dlowe so why do you need the length of a string then?

15:23:03 hajovonta 15:23:02 - jackdaniel: technically string is a sequence of symbols from a specified alphabet, no vector required

15:23:34 hajovonta I was just asking why can't a string be a sequence of characters

15:24:04 hajovonta based on jackdaniel's definition, a string is a sequence of symbols from a specified alphabet

15:24:08 jackdaniel characters are symbols of "text" alphabet, and that's what usually languages implement

15:24:16 pjb hajovonta: unicode strings are decomposed into glyphs, which are defined by a sequence of code-points of variable length.

15:24:29 hajovonta I agree that the number of characters can be different when using different alphabets

15:24:30 dlowe well, in CL a string is exactly that - a sequence (vector) of characters

15:24:46 pjb hajovonta: the notion of string as a sequence of character would imply that a character is an object of variable length.

15:25:09 hajovonta pjb: yes, but that's an implementation problem

15:25:26 pjb hajovonta: no implementation implements characters this way, because it makes for very complex objects, compared to the usual single byte for C char, or 32-bit word for unicode code-point.

15:25:29 jackdaniel pjb: code-points of variable length may constitute one alphabet element

15:25:33 hajovonta I think the "sequence of characters" as a theoretical definition is pretty good

15:25:38 pjb code-points are of fixed length.

15:25:51 pjb It's glyphs that are made of a variable number of combining code-points.

15:26:21 jackdaniel (what I meant is that one glyph is an element of the alphabet, not its code-points, so it is irrelevant to the definition)

15:26:29 pjb There's also the problem of normalization, such as the various unicode representation of á, or the problem of ligatures.

15:26:50 jackdaniel so I think that was very in point, that it is implementation detail

15:26:58 pjb jackdaniel: ok, you're implementor. I dare you to implement ecl characters as variable-length sequences of code points…

15:27:24 pjb This is probably what I'd do if I was a CL implementor, but I'm not yet.

15:27:36 jackdaniel again - irrelevant to hajovonta's question :(

15:27:39 pjb But I'd make sure to read the unicode standard through, first.

15:28:02 jackdaniel happily they were implemented that way before I took over

15:28:13 pjb So basically, implementations store strings as vectors of code-points instead.

15:28:21 pjb But code-points are not characters!

15:28:45 hajovonta a character is a symbol _in a given alphabet_

15:29:00 pjb hajovonta: by the way, even without going full unicode, just with ascii, you have the distinction between characters or ASCII control codes.

15:29:14 jackdaniel yes, and when you say (elt *string* 18) it won't take 18-th code-point, but 18-th character

15:29:34 pjb hajovonta: usually implementations make strings vectors of characters, with virtual characters corresponding to ascii control codes, which has no meaning.

15:29:55 pjb hajovonta: who said this given alphabet is finite?

15:30:39 jackdaniel right, the only important thing is that the sequence is finite, alphabet may have infinite number of possible symbols

15:30:43 pjb hajovonta: so far, even with unicode, it's finite (and way bigger than 2^21), but just let the user combine the code-points without limiting the number of combinations, and you get an infinite number of characters!

15:31:28 pjb hajovonta: in any case, your question is irrelevant: it could indeed (and should IMO) be done that way, but it is not done in practice by implementations!

15:32:09 TMA स्कृ

15:32:10 dlowe I believe there's an O(1) access guarantee for CHAR.

15:33:00 hajovonta pjb: I don't see how it is a problem to have an infinite number of characters...

15:33:15 hajovonta pjb: yes, we agree

15:33:20 dlowe which makes it impossible to both support the notion of a unicode character and the CL spec.

15:34:13 hajovonta dlowe: I think it's possible, but it's impractical. (?)

15:34:43 pjb hajovonta: (length (concatenate 'string "árvíz" "tükör")) -> 14 ; !!!

15:35:28 pjb dlowe: AFAIK, characters can be more complex than a fixnum.

15:35:46 hajovonta CL-USER> (length (concatenate 'string "árvíz" "tükör"))

15:35:46 hajovonta 10

15:35:57 pjb hajovonta: you see the problem.
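
One plausible reading of the 14 versus 10 discrepancy, offered as an assumption since the log does not say so, is that pjb's text arrived in decomposed form (each accent as a separate combining code point) while hajovonta's arrived precomposed. The sketch below builds both spellings from explicit code points; `from-codes` is a hypothetical helper, and the code assumes characters cover Unicode code points as in most current implementations.

```lisp
(defun from-codes (&rest codes)
  "Build a string from code points (hypothetical helper for this example)."
  (map 'string #'code-char codes))

;; Precomposed spelling: every accented letter is a single code point.
(defparameter *arviz-1* (from-codes #xE1 #x72 #x76 #xED #x7A))   ; arviz with accents
(defparameter *tukor-1* (from-codes #x74 #xFC #x6B #xF6 #x72))   ; tukor with accents

;; Decomposed spelling: base letter followed by U+0301 (acute) or U+0308 (diaeresis).
(defparameter *arviz-2* (from-codes #x61 #x301 #x72 #x76 #x69 #x301 #x7A))
(defparameter *tukor-2* (from-codes #x74 #x75 #x308 #x6B #x6F #x308 #x72))

(length (concatenate 'string *arviz-1* *tukor-1*))   ; => 10
(length (concatenate 'string *arviz-2* *tukor-2*))   ; => 14
```

Both concatenations display as the same ten letters; only the number of stored elements differs.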

15:36:05 dlowe pjb: sure, but you'll need some way to access them without decomposing them on the fly.

15:36:19 dlowe you could do it with a separate decomposing index vector, I guess.

15:36:33 pjb No, don't store code-points, store characters!

15:36:54 pjb CL says vector of character, not vector of code-points or codes…

15:37:17 pjb The problem is more in the lisp reader, that now has to convert and normalize unicode.

15:37:25 hajovonta but a character can be anything, like "djshfkjdhskh". In a hypothetical alphabet, this can be one character.

15:37:37 pjb well, not the reader properly, the external-format handling…

15:37:50 pjb hajovonta: well, unicode has rules.

15:38:11 pjb basically, IIRC, you can have up to ten combining code-points following a non-combining code-points.

15:38:50 pjb So you could consider using bigints to encode them, after normalization.

15:39:32 pjb This would have the advantage, that you could represent most common characters as fixnums.

16:21:29 slondr ()

16:30:45 flip214 pjb: fixnums are awfully large... even the 32bit character on 64bit machines hurts, if you need to store some larger text body

16:32:31 foom Yea, there's really no point in storing text as a vector of characters.

16:33:00 foom That's a bogus representation, only used for historical reasons.

16:35:59 pjb Yes, there's foom's argument.

16:36:10 pjb If you have large anything, you need to consider your own data structures and algorithms.

16:36:28 pjb And indeed, the sequence of character representation of large body of text is not often the best one.

16:36:32 Shinmera You can either be flexible or efficient.

16:36:52 pjb See for example, lisp source code: it's read and not represented as strings, but instead as sexps!

16:37:22 foom The problem is that it's not the best representation for a small body of text either, except where "best" is defined as "works within existing standard".

16:37:29 pjb If you had to read wikipedia, probably you'd start by storing words instead of characters… And then perhaps you'd even try to store relationships inferred from the sentences…

16:38:51 foom What you really want is to store the text as utf8, and provide APIs to iterate over encoded bytes, codepoints, grapheme clusters, words, etc.

19:13:36 nowhereman ** NICK Guest94287

20:06:12 Xach Ahh, time to build with the sbcl prerelease and see what breaks!

20:06:44 fe[nl]ix luis: around ?

20:54:38 emaczen Is there a common condition for not being able to find a package?

20:55:00 phoe clhs package-error

20:55:00 specbot http://www.lispworks.com/reference/HyperSpec/Body/e_pkg_er.htm

21:10:01 pierpa Peter Norvig just made a pdf of PAIP available for free. No more excuses for not studying it.

21:12:11 _death awesome

21:13:19 sjl nice, where at?

21:13:20 _death https://github.com/norvig/paip-lisp

21:13:30 pierpa there :)

21:13:52 sjl I already have it from the ACM archive, but really nice that it's free for everyone now

21:14:18 stacksmith Posted 10 minutes ago. That's what I call being 'in the loop'.

21:14:51 pierpa I call it being subscribed to the repository :)

21:15:13 Xach Wowww!

21:15:17 Xach That is fantastico

21:15:25 pierpa He says: "The .txt version has a lot of errors; I got it from the default Save as other / ...Text menu item in Acrobat. An automated tool could rejoin the lines that end in hyphens, and perhaps find missing spaces, as in programmingpractices and anunfortunate. Other errors would require significant human labor to clean up."

21:16:51 Xach pierpa: it never occurred to me to subscribe to the repo; why would it ever change? but that's awesome.

21:17:06 stacksmith ditto.

21:17:12 pierpa yes, I subscribed just in case

21:18:13 _death pdf doesn't look as nice as the book ;).. but good to have it searchable

21:19:31 pierpa maybe it's a defect that can be fixed? bad fonts?

21:19:36 Xach Hmm

21:21:43 sjl looks like it's partially OCR'ed from a scan, and the scanned words replaced with text in some font

21:29:11 pierpa what about setting up a group of volunteers for fixing the .txt? split the file in small chunks, distribute the chunks to volunteers, etc...

21:31:04 Xach That would be pretty cool.

21:33:02 _death good thing it has page feeds.. so you can have one-page-per-day thing

21:43:18 pierpa "Elsevier has reverted the copyright on the book to the author (me, Peter Norvig), so we are now free to do with it what we want. Robert Smith, @tarballs-are-good, is interested in putting in some work towards this end."

21:45:37 Xach Awesome!

21:47:21 rme that is so good

21:49:11 Xach pierpa: Where's that text?

21:49:34 pierpa github issues

21:50:00 Xach Cool.

21:50:31 pierpa https://github.com/norvig/paip-lisp/issues/3

21:52:41 Xach Hmm, it feels like the sbcl prerelease is slower than the previous release.

21:56:39 mishoo_ heh, found that exact PDF years ago (via thepiratebay, iirc). typographical quality isn't great :-/ but the content is gold

21:56:53 scymtym Xach: anything in particular?

21:57:23 Xach scymtym: just a feel. when it finishes i'll have a better idea of whether the feeling is correct.

21:57:51 pierpa sjl: the copy you have from ACM is better quality?

21:58:06 sjl pierpa: yes, let me compare a page

21:58:16 pierpa hmmm

22:00:54 sjl https://imgur.com/a/POWHI

22:01:20 phoe has anyone compared the PDFs available from libgen?

22:01:44 mishoo_ sjl: the left side is evidently better

22:02:10 _death you can "save as text" and have some probabilistic code to merge ;)

22:03:42 sjl the ACM's is a scan of the book. It's OCRed and searchable/copyable, but they didn't actually replace the image of the scan with text like the version in the repo

22:03:52 sjl mishoo_: yeah, the left side is the ACM scan

22:04:12 pierpa yes, clearly better

22:04:15 sjl the ACM version also has a table of contents in the PDF

22:04:16 mishoo_ oh, as in, bitmap... :-/

22:09:15 scymtym Xach: ok, please let us know if anything shows up

22:36:37 Xach I miss boinkmarks.

22:43:47 AeroNotix has anyone ever made a reader macro to emulate erlang's binary pattern matching?

22:54:22 Xach That's palatino, isn't it?

22:57:35 AeroNotix Xach: what's palatino?

23:00:01 pjb a font.

23:00:28 pjb AeroNotix: https://en.wikipedia.org/wiki/Palatino

23:00:55 pjb You should know: https://www.dafont.com/aero2.font

23:05:15 AeroNotix pjb: is that the fast and the furious font?

23:05:17 AeroNotix the aero2 one

23:05:26 AeroNotix palatino is very nice

23:06:49 pjb I'm not sure; there's http://www.allmoviefonts.com

23:07:09 pjb http://www.allmoviefonts.com/?s=furious

23:15:31 AeroNotix thx

23:22:08 jasom if # is a non-terminating macro character why does sbcl print :foo#bar as :|FOO#BAR| and slime highlight :foo#bar as different words?

23:23:28 pjb Because it's not terminating, it doesn't terminate the foo#bar token.

23:24:02 jasom pjb: my question is why sbcl puts spurious || around it and slime highlights it incorrectly. I agree it doesn't terminate the token

23:24:07 pjb If it was terminating, say, like ', then in foo'bar the quote terminates the foo token, and then a further read will read 'bar ( (quote bar) ).

23:24:18 Shinmera Slime highlights a bunch of things incorrectly.

23:24:24 Shinmera So it's just buggy.

23:24:31 pjb jasom: the printing is in part implementation dependent, and in part directed by the *print-…* variables.

23:24:46 pjb check *print-readably* and *print-escape* in particular.

23:25:03 jasom pjb: I'm not saying sbcl is doing something wrong, but the fact that it chooses to escape tokens with # in them makes me wonder if I ought to do so in my code

23:26:40 pjb clall -r '(prin1-to-string (read-from-string ":foo#bar"))'

23:26:40 pjb Armed Bear Common Lisp --> ":FOO#BAR"

23:26:40 pjb Clozure Common Lisp --> ":FOO\\#BAR"

23:26:45 pjb CLISP --> ":FOO#BAR"

23:26:45 pjb ECL --> ":|FOO#BAR|"

23:26:45 pjb SBCL --> ":|FOO#BAR|"

23:26:57 aeth It's probably just being safe because reader macros use #

23:27:05 aeth I guess?

23:27:17 pjb Yes.

23:27:21 Shinmera jasom: 22.1.3.3 seems to imply that it's allowed to do this, even if there's no strict need to.

23:27:29 jasom Shinmera: I agree

23:27:40 pjb You can (setf *print-escape* nil)

23:28:02 pjb well, it doesn't seem to change.

23:28:06 jasom pjb: I know how print-escape works. I was merely expressing a concern that the sbcl devs know something I don't with regard to internal # in symbol names

23:28:53 pjb jasom: theoretically, they would have to check in the read table whether a character is a terminating macro character or not. Instead if you systematically escape, you can print faster!

23:28:54 jasom pjb: PRINT and friends bind *print-escape*

23:29:31 pjb right. we have to use write-to-string.

23:30:08 pjb jasom: so I would say it's mostly a speed optimization.

23:30:31 pjb It's worth it, eg. in the case of ccl, since it uses plists for readtables…

23:31:05 jasom pjb: that makes no sense because it has to check all the alphabetic characters which are much more common

23:31:32 pjb well, to be sure, read the source.

23:31:52 jasom perhaps it escapes all macro characters, regardless of position?

23:32:23 jasom that would make sense because # is the only non-terminating macro character in the standard readtable
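
A short sketch of what the exchange above establishes, using only standard functions. Reading is done under WITH-STANDARD-IO-SYNTAX so the result does not depend on local readtable changes; the variable name `*sym*` is made up for the example. How an escaped symbol is printed (bare, with a backslash, or with vertical bars) varies by implementation, as the clall output shows, so no escaped output is asserted here.

```lisp
;; # is a non-terminating macro character, so it does not end the token.
(defparameter *sym*
  (with-standard-io-syntax (read-from-string ":foo#bar")))

(symbol-name *sym*)   ; => "FOO#BAR"
(keywordp *sym*)      ; => T

;; The bar-escaped spelling reads back as the very same symbol.
(eq *sym*
    (with-standard-io-syntax (read-from-string ":|FOO#BAR|")))   ; => T

;; PRIN1 binds *PRINT-ESCAPE* to true, so setting the variable does not affect it.
;; WRITE-TO-STRING with :ESCAPE NIL prints only the characters of the name.
(write-to-string *sym* :escape nil :readably nil :case :upcase)  ; => "FOO#BAR"
```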

23:51:14 energizer What's the difference between nil and 'nil?

23:52:27 Shinmera One is the form NIL and one is the form (QUOTE NIL)

23:52:39 sjl The first reads as the symbol nil, the second reads as the list (quote nil). When evaluated they result in the same thing, because nil is special and evaluates to itself.

23:53:34 stacksmith What's interesting here is 'nil evaluates to nil too!

23:54:09 Shinmera But not ''nil.

23:55:00 stacksmith But it looks trickier, because nil is also a symbol - as well as type null.

23:55:23 energizer Shinmera: (eval ''nil) is giving me nil i think

23:55:56 stacksmith (typep nil 'symbol) => T

23:56:02 sjl energizer: that's because ''nil is evaluated before it gets passed to eval

23:56:09 sjl and then eval evaluates it AGAIN

23:56:23 sjl (eval (read-from-string "''nil"))

23:58:33 energizer interesting

23:59:56 pjb (defpackage "MY-NULL" (:export "NIL") (:use)) (defconstant my-null:nil 0) (let ((*package* (find-package "MY-NULL"))) (list (eval (read-from-string "'nil")) (eval (read-from-string "nil")))) --> (my-null:nil 0)

0:00:06 pjb energizer: it depends on *package*!

0:00:28 pjb energizer: on the other hand: (let ((*package* (find-package "MY-NULL"))) (list (eval (read-from-string "'()")) (eval (read-from-string "()")))) --> (nil nil)

0:00:28 stacksmith Nil is special: it's kind of like a keyword - a subtype of symbol that evaluates to itself. It is also considered a list with 0 items (listp nil) => t

0:00:45 pjb energizer: but in this case, it depends on *readtable* where, the reader macros for ' and ( are defined.

0:01:07 pjb energizer: you could change those reader macro to read something else than CL:QUOTE and CL:NIL.

0:01:28 energizer ok

0:01:29 stacksmith Nil is a list, but it is not a cons: (consp nil) => nil
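
The claims about nil, 'nil and ''nil can be checked directly. This sketch assumes the standard readtable and a current package that uses COMMON-LISP (such as CL-USER), since, as pjb notes, the result of reading "nil" depends on both.

```lisp
;; 'nil is read as the list (QUOTE NIL); evaluating it yields NIL itself.
(equal (read-from-string "'nil") '(quote nil))   ; => T
(eq nil 'nil)                                    ; => T

;; ''nil evaluates to the list (QUOTE NIL), not to NIL.
(equal ''nil '(quote nil))                       ; => T
(equal (eval (read-from-string "''nil"))
       '(quote nil))                             ; => T, evaluated once
(eval ''nil)                                     ; => NIL, evaluated twice

;; NIL is a symbol and a list, but not a cons.
(symbolp nil)                                    ; => T
(listp nil)                                      ; => T
(consp nil)                                      ; => NIL
```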

0:02:11 energizer Shinmera: what's a form?

0:03:54 stacksmith form n. 1. any object meant to be evaluated. 2. a symbol, a compound form, or a self-evaluating object. 3. (for an operator, as in ``<<operator>> form'') a compound form having that operator as its first element. ``A quote form is a constant form.''

0:05:35 stacksmith energizer: try #clnoobs - it's a better place for basic questions about CL.

0:05:41 energizer thanks

1:16:53 sysfault_ ** NICK sysfaukt

1:16:57 sysfaukt ** NICK sysfault

1:56:39 ``Erik_ ** NICK ``Erik

2:01:04 stylewarning pierpa: I'd like to find a way to recreate the latex source

2:01:41 stylewarning I’m the one who helped get copyright reverted, and it looks like Elsevier might have lost the source

2:08:00 pierpa aaaargh!

2:08:18 pierpa but thank you for helping this!

2:08:52 pierpa stylewarning: I suppose you already asked PN?