freenode/#lisp - IRC Chatlog

7:11:55 fiddlerwoaroof Is there a generic way to get a value for CONCATENATE's first argument that matches the type of the first of the sequences you're concatenating?

7:12:44 shka fiddlerwoaroof: type-of?

7:13:01 fiddlerwoaroof That doesn't work

7:13:29 fiddlerwoaroof e.g. something like (concatenate (type-of seq1) seq1 seq2 seq3)

7:13:30 shka specialized vectors?

7:14:03 shka why it does not work?

7:14:14 fiddlerwoaroof No, the issue is that the type it gives is too specialized: it includes the length of seq1

7:14:25 shka right

7:14:37 shka well, it is though!

7:14:52 shka depends how specialized your type is supposed to be

7:15:19 fiddlerwoaroof in the case of vectors, it would have to be (vector (+ (length seq1) (length seq2)...))

7:15:20 shka if just list or vector, it is easy to do

7:15:33 shka i see

7:15:47 shka you will have to do it manually

7:15:49 fiddlerwoaroof Obviously, I can just do an ad-hoc typecase or something, but that's ugly

7:16:40 fiddlerwoaroof Hmm, I guess I could just make my new method include the result-type

7:16:47 aeth Looks like the perfect use for this library: https://github.com/markcox80/specialization-store/

7:16:49 fiddlerwoaroof And pass it up a level

7:17:12 aeth just have type-based dispatch

7:17:22 aeth no typecase

7:19:02 shka aeth: won't do

7:19:08 aeth shka: why not?

7:19:38 shka because variable length of arguments, and he wants to have size in the type

7:19:48 shka imho it is overcomplicated

7:19:53 aeth no, the problem is that he gets the size in type-of

7:20:05 aeth specialization-store can dispatch on e.g. simple-vector

7:20:53 aeth (defspecialization foo ((v1 simple-vector) (v2 simple-vector)) (concatenate 'simple-vector v1 v2))

7:21:12 aeth Well, you probably want that inlined because it's trivial and unchanging and the inline will probably be more efficient.

7:21:18 fiddlerwoaroof Yeah, if I were going to do dispatch, I'd just use generic functions :)

7:21:24 shka (defun foo (seq &rest more-seq))?

7:21:32 fiddlerwoaroof speed doesn't matter too much, and there's a reasonable mapping from class -> result-type

7:21:57 aeth fiddlerwoaroof: but CLOS generics do not dispatch on types and the classes aren't (afaik) portable, so that's not useful for numbers or sequences, hence specialization-store

7:22:18 fiddlerwoaroof I only really need vector/list/etc.

7:22:23 fiddlerwoaroof And those are portable, afaik

7:22:33 aeth Actually, just noticed, in my foo example, you could probably macroify it to generate all the functions

7:23:04 fiddlerwoaroof e.g. LIST is a class: http://www.lispworks.com/documentation/HyperSpec/Body/t_list.htm#list

7:23:25 fiddlerwoaroof As is VECTOR: http://www.lispworks.com/documentation/HyperSpec/Body/t_seq.htm#sequence

7:24:26 aeth If they have to match in type (but you'd have to generate a name instead of foo): `(defspecialization (foo :inline t) ((sequence-1 ,sequence-type) (sequence-2 ,sequence-type)) (concatenate ',sequence-type sequence-1 sequence-2))

7:24:36 aeth Surprisingly concise and elegant in specialization-store.

7:24:43 fiddlerwoaroof But, it makes more sense to me, just to have a simple function that goes from thing -> symbol

7:24:44 aeth Then you'd just run that over a list of sequence types

7:25:05 fiddlerwoaroof e.g. (concatenate (sequence-type-like seq1) seq1 seq2 seq3...)

7:25:48 aeth fiddlerwoaroof: all you need to do is replace the length with a wildcard, actually.

7:26:31 aeth like (simple-vector 3) -> (simple-vector *) and (simple-array single-float (3)) -> (simple-array single-float (*))

7:26:55 aeth So you can parse it in type-of

7:27:34 fiddlerwoaroof Yeah, on sbcl there is the added wrinkle that it's possible for something besides LIST and VECTOR to extend SEQUENCE

7:27:38 aeth in the latter case it's the caaddr and in the former case it's just the cadr

7:27:53 aeth well

7:27:55 fiddlerwoaroof Because sbcl implements the extensible sequences proposal

7:28:05 aeth extensible sequences is what should be the case.

7:28:43 aeth lazy sequences, immutable sequences, very niche sequences like perhaps ordered hash tables or whatever.

7:29:02 fiddlerwoaroof Yeah, I have a package of my own that allows for some of this

7:29:27 fiddlerwoaroof e.g. rss feeds where you can map/reduce the entries

7:29:29 fiddlerwoaroof etc.

7:36:18 aeth if you don't need to support custom sequences, though, I wouldn't.

7:38:15 aeth I think for 1D simple-arrays it's always going to be either the cadr (simple-vector) or the caaddr (everything else) so that's just three cases: list, simple-vector, simple-array

7:38:26 aeth (1D simple-array)

7:39:42 aeth And then I think it's the caddr for the other kinds of vectors (vector whatever-type size)

7:39:51 aeth So actually 4 cases

7:42:30 fiddlerwoaroof Cool

7:42:57 fiddlerwoaroof I probably should put this into my utility library, because it's something I've often wanted

7:44:31 aeth whoops, there's also bit vectors

7:46:19 aeth Something like this: (let ((seq (make-array 2 :element-type 'character :adjustable t))) (etypecase seq (list 'list) (simple-bit-vector '(simple-bit-vector *)) (bit-vector '(bit-vector *)) (simple-vector '(simple-vector *)) (simple-array `(simple-array ,(cadr (type-of seq)) (*))) (vector `(vector ,(cadr (type-of seq)) *))))

7:47:28 aeth the cadr will get the type for simple-array and (not simple) vector, with vector having to come last because of all of the special vectors

7:49:12 fiddlerwoaroof there's also simple-string

7:49:19 fiddlerwoaroof and string

7:56:23 aeth Is it non-portable what you get for those?

7:56:46 fiddlerwoaroof Well, there's a specification

7:56:53 aeth (type-of "hello") => (simple-array character (5)) ; in SBCL with *print-case* as :downcase

7:57:09 fiddlerwoaroof So, I'm going to go the prescriptive way: you do a typecase and put the most specific types first

7:57:19 aeth you should probably handle the strings just in case though

7:57:27 fiddlerwoaroof So that the output of the function can be as interesting as possible

7:59:34 fiddlerwoaroof You could even support extensible sequences by falling back to a generic function call
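
A minimal sketch of the SEQUENCE-TYPE-LIKE helper discussed above, in Common Lisp. The name comes from fiddlerwoaroof's example; the body is an assumption, not code anyone posted. It follows the "most specific types first" rule and uses ARRAY-ELEMENT-TYPE rather than picking apart the output of TYPE-OF, since the exact shape of that output is implementation-dependent.

```lisp
;; Hypothetical helper: the name is from the chat, the body is illustrative.
(defun sequence-type-like (seq)
  "Return a result type for CONCATENATE that matches SEQ but omits its length."
  (etypecase seq
    (list 'list)
    ;; Most specific vector types first, so the general clauses come last.
    (simple-bit-vector 'simple-bit-vector)
    (bit-vector 'bit-vector)
    (simple-string 'simple-string)
    (string 'string)
    (simple-vector 'simple-vector)
    ;; Remaining specialized one-dimensional simple arrays.
    ((simple-array * (*))
     `(simple-array ,(array-element-type seq) (*)))
    ;; Everything else that is a vector (adjustable, fill-pointer, displaced).
    (vector `(vector ,(array-element-type seq)))))
```

With that in place, (concatenate (sequence-type-like seq1) seq1 seq2 seq3) works for lists and the standard vector types. Extensible sequences, as on SBCL, would need an extra fallback clause, for example a generic function call as suggested above.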

8:02:02 aeth Although I think the macro on top of specialization-store implementation quickly becomes more concise than the typecase approach

8:13:10 fiddlerwoaroof ugh, paste.lisp.org is gone?

8:15:00 aeth unfortunately, yes

8:15:22 aeth github gists or gitlab snippets are probably the way to go for now

8:15:24 fiddlerwoaroof What happened?

8:16:17 aeth well, someone finally noticed a few months ago that almost every post to paste.lisp.org was spam because the captcha was ridiculously weak, like 012354 in a monospace font you can probably parse automatically.

8:16:34 aeth It was like that for a while.

8:16:52 fiddlerwoaroof ugh

8:16:57 aeth Pretty much every no-registration thing is full of spam, and a lot of registration things, too. e.g. just about any mediawiki has to deal with tons of spam accounts spamming things

8:17:19 fiddlerwoaroof I wonder how much work it would be to plug in a decent captcha

8:17:38 aeth what I think it should do is require bot-based authentication since the main use is for IRC

8:18:04 aeth so /msg some-bot please and it'll give me a URL that creates a session, no registration required, but if I spam they'll trace it to my IRC nick. And it's too complicated for general spam bots to bother with

8:19:00 aeth I think that's simple enough that I could write it in a weekend or so, depending on what system the site uses

8:23:24 aeth e.g. a link to a page called (format t "~9,'0x~%" (random #x1000000000)) or something

8:58:31 knobo1 ** NICK knobo

9:29:10 Shinmera In the meantime there's http://plaster.tymoon.eu for pasting.

9:29:30 Shinmera Beside the brazillion other paste services of course

10:00:37 francogrex cffi question (also some sqlite)

10:01:37 francogrex https://codepaste.net/56qjnv in that snippet, on line 6 when I (cffi:foreign-free buf) which I suppose I should, the results are not maintained anymore, when I remove that, I have what I need. but

10:01:45 francogrex I am afraid of memory leaks

10:01:48 francogrex how come?

10:05:25 flip214 francogrex: you do know about clsql-sqlite3, right?

10:16:04 francogrex flip214: yes but it doesn't have the capacity to create_function

10:16:14 francogrex it should be done in C

10:16:55 francogrex hence using ffis

10:20:59 francogrex sqliteFree() also exists...

11:06:26 pjb` ** NICK pjb

13:39:59 phoe https://pastebin.com/vkcHLRm3

13:40:12 phoe Can someone tell me why Qtools's readtable leaks into LPARALLEL?

13:41:53 Shinmera The only guess I have is if you didn't use in-readtable somewhere and rather setf the readtable. But if I remember correctly ASDF should take care of even that.

13:42:24 phoe No SETF *READTABLE* in my code.

13:42:58 phoe Also, (asdf:operate 'asdf:load-op :lparallel :force t) succeeds, and afterwards my system loads properly.

13:43:26 phoe I am now tempted to open up a very clean Lisp environment, try loading Qtools, and then LPARALLEL.

13:46:54 Shinmera What's stopping you

13:47:06 phoe I'm working.

13:47:14 phoe :(

13:49:36 Xach_ new asdf broke sly, but sly is already fixed, phew.

13:51:11 mfiano Xach_: I'm still waiting for him to get to this https://github.com/joaotavora/sly/issues/135

13:52:26 Xach_ interesting.

15:11:09 papachan hi, is there something like tensorflow in CL ?

15:28:33 Xach_ papachan: what is tensorflow?

15:28:59 Fade it's google's machine learning library

15:34:04 Xach_ mgl is a machine learning library for lisp

15:34:23 papachan Xach_ ah ok will take a look at it. thanks

17:31:57 osune Hoi, I want to dispatch by strings of the form "/cmd foo bar" where "cmd" can be any string. Currently I dispatch via cond and predicates returned by a cl-ppcre match. Is there a better solution? E.g. In Python I would probably use a dict([('/cmd', cmd_handler_fn)]) to match and dispatch. The cond solution makes me uneasy because, if I understood correctly, CLHS says it evaluates each test-form until one returns true. While the Python

17:31:57 osune solution would be just a hash lookup. I'm sure I could do the same in CL with a hash table using equal as test. But I lack the experience to tell what the better style would be in CL or if I'm missing something entirely.

17:33:30 beach A hash table sounds fine in CL as well.

17:33:32 jmercouris osune: You are trying to take a prefix to a string and use that as a funcall?

17:33:45 jmercouris osune: As beach said, you could also use a hash table

17:33:58 warweasle osune: The hash would work fine. You could extend the repl (meta-circular evaluator). Or you can use a stack to collect arguments and then evaluate when you have them all.

17:34:33 jmercouris osune: Forget my first message, I understood what you were trying to do but express it improperly

17:34:59 jmercouris What is wrong with me today, "Forget my first message, I understood what you were trying to do but expressed it improperly"

17:35:22 jmercouris Speaking many languages is a double edged sword it seems

17:36:08 osune jmercouris: I think you are assuming a more complicated problem. Think more of an IRC bot. Sorry, english isn't my native language either; so you meant I worded it improperly ?

17:36:10 warweasle 10 PRINT CHR$(205.5+RND(1)); : GOTO 10

17:37:02 osune beach: jmercouris: thanks. I'll go with the hash table then.
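
A rough sketch of the hash-table dispatch osune settles on, assuming commands look like "/cmd foo bar" and that each handler takes the remaining text as one string. All names here (*COMMANDS*, REGISTER-COMMAND, SPLIT-COMMAND, DISPATCH) are made up for illustration.

```lisp
;; Hypothetical names throughout; only the EQUAL hash table idea is from the chat.
(defvar *commands* (make-hash-table :test #'equal)
  "Maps a command name such as \"/cmd\" to a handler function.")

(defun register-command (name handler)
  "Add or replace the handler for NAME in the central table."
  (setf (gethash name *commands*) handler))

(defun split-command (line)
  "Split LINE at the first space into the command name and the rest."
  (let ((space (position #\Space line)))
    (if space
        (values (subseq line 0 space) (subseq line (1+ space)))
        (values line ""))))

(defun dispatch (line)
  "Look up the handler for LINE's command and call it with the remaining text."
  (multiple-value-bind (name rest) (split-command line)
    (let ((handler (gethash name *commands*)))
      (if handler
          (funcall handler rest)
          :unknown-command))))

(register-command "/echo" (lambda (args) args))
(register-command "/shout" #'string-upcase)
```

Here (dispatch "/shout hi") calls STRING-UPCASE on "hi", and an unregistered command returns :UNKNOWN-COMMAND. Handlers can be added or removed in one place with REGISTER-COMMAND and REMHASH, which is the "central place" property osune asks for.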

17:37:21 tfb osune: if you have a finite number of commands you know at compile-time (and if you don't want to do some incremental-reading thing) then interning them & CASE / ECASE might be enough

17:37:55 tfb osune: CASE can be fast (can be implemented with a hash table, say)

17:39:43 osune tfb: It's finite. But currently I don't know how many. So I would prefer a solution where I can in a central "place" easily add/remove dispatchers I guess. I'll take a look at the CASE variant.

17:39:52 warweasle tfb: There's an old article which compares lists with hash tables. Lists are faster for around 8-12 entries, after which hashes are faster for lookups.

17:41:13 tfb warweasle: Somewhere I have some code which does this benchmark for the implementation at compile time and then changes representation dynamically...

17:41:39 rpg warweasle: I have heard rumors of implementations natively doing the optimization that tfb refers to.

17:42:15 warweasle There are so many cool things hidden on people's hard drives.

17:42:24 tfb rpg: pretty sure Genera did that (it made noises which sounded like it did anyway)

17:43:02 tfb warweasle: well, I think the main cool thing is that in Lisp you *can* do tricks like this, because you always have the language to play with

17:43:52 jmercouris osune: I think you worded it just fine, english is one of my first languages, that's not why I'm having these issues :D

17:45:27 rpg tfb: I have a vague notion that Allegro does this, but could be a faulty memory.

17:46:06 rpg osune: many lisp implementations offer extensions to the standard for hash-tables that contain strings only.

17:48:29 osune rpg: is this an "out-of-the-box" thing, if I use (declare ...) forms? Or something I should look up in the sbcl manual?

17:48:53 Bike an equal test should be fine, no?

17:49:05 tfb rpg: I wouldn't be surprised if they did

17:49:43 rpg Bike: My impression is that EQUAL hash tables can be inefficient, because you can't exploit features of the data items in the hash function (or if you do, you end up needing some type checking at run time)

17:50:16 rpg But yes, EQUAL is what you want for portable code -- but note that this is case-sensitive matching. If you want to match FoO and foo, you need EQUALP

17:50:26 Shinmera Surely dispatching a command from a user facing side does not need to be terribly efficient in any case

17:50:37 Shinmera Even a list would be fine.

17:50:58 Bike that's what i was thinking.

17:51:02 Shinmera Beware the optimisation creep.

17:51:11 rpg true

17:51:12 Bike though i'm not sure how much more efficient a more specific string= test would be anyway.

17:51:25 Bike saves you a type check, i guess. hash function would probably be about the same.

17:51:35 rpg depends a lot on how many table entries you have.

17:52:26 tfb rpg: well, really, the test is how fast you can search compared to how fast your users can type

17:52:27 Shinmera Now here's a question about data structures that actually requires performance: if you have a set of strings and would like to do fast fuzzy matching (find all strings for which a given string is a substring), what kind of data structure would be optimal for this?

17:52:52 osune rpg: no fuzzy command-strings in MY interface! :D

17:53:23 rpg Shinmera: I think it depends a lot on the nature of the strings. This problem arises in DNA/RNA sequencing, and I think there's an extensive literature on that. But those are long strings.

17:53:42 Shinmera My context is an editor where you suggest completion for symbols.

17:53:51 Shinmera So they're shorter strings, but lots of them.

17:54:55 warweasle Shinmera: If there is a "strcmp" function which returns (-1, 0, 1) then you could create your own tree from sexps.

17:55:22 Shinmera warweasle: The string to match is not known ahead of time

17:55:29 Shinmera Otherwise the problem would be trivial.

17:55:41 Bike bit vectors, like agrep. fo sho.......

17:56:05 osune Shinmera: would I dispatch on the list then with (funcall (find ...)) ?

17:56:10 warweasle Shinmera: Oh, are we into state machines then?

17:56:40 Shinmera warweasle: I don't know, I'm just asking. Given a set of strings, what data structure would be optimal to retrieve the subset that contains a certain other string.

17:56:44 m00natic suffix tries

17:56:57 Shinmera *the subset of strings that are a superstring of another string

17:57:16 m00natic https://en.wikipedia.org/wiki/Suffix_trie

17:57:28 warweasle SMUG?

17:57:47 Shinmera osune: If your command list is an alist, you would do (funcall (cdr (assoc command command-list :test #'string-equal)) ..)

17:58:29 osune Shinmera: thanks for the extra bit of information!

17:59:25 Shinmera m00natic: Oh, nice. I'll take a look, thanks for the hint

18:00:21 Shinmera So you would compute a suffix-trie for each entry in the set and then just scan through them all?

18:01:17 Shinmera I'm not sure if that would actually make it fast at all though, as the strings each are relatively short.

18:02:03 m00natic @Shinmera, you'd use a single suffix tree (also look at the radix tree variant) and put all prespecified words in it - then use a single search

18:02:43 Shinmera I see.

18:02:48 Shinmera I'll look into this some more.

18:02:49 m00natic and the scan is linear on the searched word

18:03:36 m00natic (or it could be made such with proper implementation)
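
To make m00natic's suggestion concrete, here is a small sketch of a single suffix trie shared by all words, in Common Lisp. It is an uncompressed trie, not the radix/suffix tree variant mentioned above, so it uses far more memory than a proper implementation would; the names TRIE-NODE, INSERT-WORD and FIND-SUPERSTRINGS are invented for this example.

```lisp
;; Hypothetical, uncompressed suffix trie; a real one would compress edges.
(defstruct trie-node
  (children (make-hash-table :test #'eql)) ; character -> child node
  (words '()))                             ; words containing the path to this node

(defun insert-word (root word)
  "Insert every suffix of WORD into ROOT, tagging each visited node with WORD."
  (loop for start from 0 below (length word)
        do (loop with node = root
                 for i from start below (length word)
                 for ch = (char word i)
                 do (setf node (or (gethash ch (trie-node-children node))
                                   (setf (gethash ch (trie-node-children node))
                                         (make-trie-node))))
                    (pushnew word (trie-node-words node) :test #'string=))))

(defun find-superstrings (root query)
  "Return the stored words that contain QUERY as a substring.
Takes one step per character of QUERY. An empty QUERY returns NIL here."
  (loop with node = root
        for ch across query
        do (setf node (gethash ch (trie-node-children node)))
           (unless node (return '()))
        finally (return (trie-node-words node))))

(defparameter *symbols-trie*
  (let ((root (make-trie-node)))
    (dolist (word '("defun" "defmacro" "funcall" "mapcar") root)
      (insert-word root word))))
```

With this table, (find-superstrings *symbols-trie* "fun") returns "defun" and "funcall" in some order. The lookup cost depends only on the length of the query, which is the "linear on the searched word" property; the precomputation is done once and can be extended as new symbols appear.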

18:08:39 sjl Shinmera: https://github.com/jhawthorn/fzy/blob/master/ALGORITHM.md might also be interesting

18:08:58 sjl though it sounds like in your case you can do some precomputation on the data to be searched

18:09:27 Shinmera Right. The set of symbols should not change frequently, and typically only ever grows.

18:09:32 Shinmera So precomputation should be possible.

18:13:07 SaganMan ** NICK SaganFestivus

18:15:18 Shinmera While I'm asking off-topic datastructure questions: any suggestions for edit-distance algorithms for natural language? Levenshtein distance is awful for that, since by its metrics "pear" is closer to "apple" than "apple trees" even though you'd highly likely want the latter to score better.

18:15:25 sjl some of these fuzzy finders cache results too though, so they may still have helpful things for you

18:16:53 Shinmera Well, edit distance would be the wrong term for what I want, I suppose. More like similarity scoring.

18:21:34 pjb Shinmera: sounds like the vector representation of words used in machine learning algorithms would be what you want.

18:22:18 pjb Shinmera: have a look at https://www.tensorflow.org/tutorials/word2vec

18:22:34 rpg Shinmera: I think there *are* such algorithms, but it's a tricky problem because, for example, you want to know about pronunciation (so that "too", "to" and "two" are similar). Maybe google for something in the information retrieval literature....

18:22:43 Shinmera pjb: Thanks

18:23:18 rpg If you were going to do a vector representation, you might as well go ahead and train a NN.... But you would need a source of training data for closeness....

18:23:21 Shinmera rpg: Right, IR should have something about that.

18:23:57 Shinmera Context is commands in a bot with a list of "did you mean" suggestions when you type an unknown command.

18:24:22 Shinmera the commands are strings that are very natural-language-y

18:24:24 rpg Shinmera: Like "fuck" for bash....

18:28:32 jmercouris Shinmera: Similarity scoring? Perhaps you mean like word trees

18:29:01 Shinmera Not really, no

18:29:23 Shinmera Users are likely to: typo, use a synonym, or forget other parts of the full command (substring)

18:29:37 jmercouris Shinmera: Ah, then you need to construct models of the typing paths

18:29:39 Shinmera So ideally it should be able to find "similar" word sequences based on that.

18:29:50 jmercouris Shinmera: markov chains would / could be a good starting point

18:29:59 jmercouris Shinmera: or automata with gates

18:30:52 jmercouris if you are interested, I can explain more, it will require your own implementation though, there does not exist a library for this

18:31:21 Shinmera Sure. Though maybe another time, I need to leave in a bit.

18:31:48 jmercouris ok, feel free to ping me, I have a lot of experience in this domain

18:31:53 Shinmera Great!

18:32:18 _death Shinmera: there are also classic algorithms like soundex and friends

18:33:11 pjb Nice NLP user interfaces should be able to ask questions back to the user to clarify requests where some ambiguity exists.

18:34:34 jmercouris pjb: Another cool thing is using the context of the words around it for guessing, not just what they typed

18:35:06 pjb Indeed, you can use the history of the user interactions (including a user model) to clarify things.

18:35:56 beach pjb: I think that's right, and such a model would include the kind of keyboard layout used.

18:37:06 jmercouris beach: yes, definitely need a per keyboard training, ideally per user training

18:37:24 jmercouris it doesn't have to be a very complex or expensive model though, there are very lightweight ones you can embed into a user distributed application

18:37:27 beach pjb: And a model of me would include the fact that I always invert the h and e in the. In fact, I have a global emacs abbrev to change "teh" to "the"

18:38:01 jmercouris beach: C-t

18:38:12 beach jmercouris: You don't get it.

18:38:24 beach jmercouris: I see it way too late.

18:38:25 jmercouris teh -> C-t = the

18:38:31 beach You don't get it.

18:38:32 jmercouris Ah, okay lol

18:39:22 beach It is not as though I do t <lesses, is that right, yes it is> e <now this then, maybe> h <ah, not right, lets do a C-t>

18:40:25 beach In fact, given the bad typing of some people here, they could learn from what I do.

18:41:34 beach It is more like "here is teh phrase I want to type <oh, dear, I inverted the e and the h a few words ago>

18:42:08 Shinmera Things would be much easier if people just didn't make mistakes to begin with :^)

18:42:10 beach So auto-correcting from teh to the saves a lot of time, and a lot of burden on the reader.

18:48:10 jackdaniel teh mistakes are unavoidable

18:48:28 beach No, tehy are not.

18:48:29 jackdaniel (and yes, I know I made 2)

18:48:54 cgay I count 3.