freenode/#lisp - IRC Chatlog
7:11:55
fiddlerwoaroof
Is there a generic way to get a value for CONCATENATE's first argument that matches the type of the first of the sequences you're concatenating?
7:14:14
fiddlerwoaroof
No, the issue is that the type it gives is too specialized: it includes the length of seq1
7:15:19
fiddlerwoaroof
in the case of vectors, it would have to be (vector (+ (length seq1) (length seq2)...))
7:16:47
aeth
Looks like the perfect use for this library: https://github.com/markcox80/specialization-store/
7:20:53
aeth
(defspecialization foo ((v1 simple-vector) (v2 simple-vector)) (concatenate 'simple-vector v1 v2))
7:21:12
aeth
Well, you probably want that inlined because it's trivial and unchanging and the inline will probably be more efficient.
7:21:32
fiddlerwoaroof
speed doesn't matter too much, and there's a reasonable mapping from class -> result-type
7:21:57
aeth
fiddlerwoaroof: but CLOS generics do not dispatch on types and the classes aren't (afaik) portable, so that's not useful for numbers or sequences, hence specialization-store
7:22:33
aeth
Actually, just noticed, in my foo example, you could probably macroify it to generate all the functions
7:23:04
fiddlerwoaroof
e.g. LIST is a class: http://www.lispworks.com/documentation/HyperSpec/Body/t_list.htm#list
7:23:25
fiddlerwoaroof
As is VECTOR: http://www.lispworks.com/documentation/HyperSpec/Body/t_seq.htm#sequence
7:24:26
aeth
If they have to match in type (but you'd have to generate a name instead of foo): `(defspecialization (foo :inline t) ((sequence-1 ,sequence-type) (sequence-2 ,sequence-type)) (concatenate ',sequence-type sequence-1 sequence-2))
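Without specialization-store, the same "macroify it" idea can be sketched in standard CL with one inlinable DEFUN per sequence type. This is a stand-in, not specialization-store's API: nothing here dispatches on argument types automatically, and the macro and function names are hypothetical.

```lisp
(defmacro define-typed-concatenate (name sequence-type)
  "Define NAME as an inlinable two-argument CONCATENATE whose arguments and
result are all of SEQUENCE-TYPE. Plain DEFUN; no specialization-store."
  `(progn
     (declaim (inline ,name))
     (defun ,name (sequence-1 sequence-2)
       (declare (type ,sequence-type sequence-1 sequence-2))
       ;; The type must be quoted in the expansion, or it would be
       ;; evaluated as a variable.
       (concatenate ',sequence-type sequence-1 sequence-2))))

;; One function per sequence type; the names are illustrative only.
(define-typed-concatenate concatenate-simple-vectors simple-vector)
(define-typed-concatenate concatenate-strings string)
(define-typed-concatenate concatenate-lists list)
```

Here the caller picks the right function by name. Per aeth's point, choosing among such specializations from the argument types is what specialization-store adds.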
7:24:43
fiddlerwoaroof
But, it makes more sense to me, just to have a simple function that goes from thing -> symbol
7:26:31
aeth
like (simple-vector 3) -> (simple-vector *) and (simple-array single-float (3)) -> (simple-array single-float (*))
7:27:34
fiddlerwoaroof
Yeah, on sbcl there is the added wrinkle that it's possible for something besides LIST and VECTOR to extend SEQUENCE
7:28:43
aeth
lazy sequences, immutable sequences, very niche sequences like perhaps ordered hash tables or whatever.
7:38:15
aeth
I think for 1D simple-arrays it's always going to be either the cadr (simple-vector) or the caaddr (everything else) so that's just three cases: list, simple-vector, simple-array
7:39:42
aeth
And then I think it's the caddr for the other kinds of vectors (vector whatever-type size)
7:42:57
fiddlerwoaroof
I probably should put this into my utility library, because it's something I've often wanted
7:46:19
aeth
Something like this: (let ((seq (make-array 2 :element-type 'character :adjustable t))) (etypecase seq (list 'list) (simple-bit-vector '(simple-bit-vector *)) (bit-vector '(bit-vector *)) (simple-vector '(simple-vector *)) (simple-array `(simple-array ,(cadr (type-of seq)) (*))) (vector `(vector ,(cadr (type-of seq)) *))))
7:47:28
aeth
the cadr will get the type for simple-array and (not simple) vector, with vector having to come last because of all of the special vectors
7:56:53
aeth
(type-of "hello") => (simple-array character (5)) ; in SBCL with *print-case* as :downcase
7:57:09
fiddlerwoaroof
So, I'm going to go the prescriptive way: you do a typecase and put the most specific types first
7:59:34
fiddlerwoaroof
You could even support extensible sequences by falling back to a generic function call
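The typecase-with-a-fallback approach from the last few messages might look like the sketch below; CONCATENATE-RESULT-TYPE and SEQUENCE-RESULT-TYPE are hypothetical names. It reads the element type with ARRAY-ELEMENT-TYPE instead of taking the CADR of TYPE-OF, because TYPE-OF's output is implementation-dependent (SBCL, for instance, describes an adjustable vector with an AND type whose CADR is not the element type).

```lisp
(defgeneric sequence-result-type (seq)
  (:documentation "Extension point for sequence types the TYPECASE does not
know about, such as SBCL's user-defined extensible sequences."))

(defmethod sequence-result-type ((seq sequence))
  ;; Conservative default: have CONCATENATE build a plain list.
  'list)

(defun concatenate-result-type (seq)
  "Return a length-free type specifier for CONCATENATE's first argument that
matches the representation of SEQ. Signals an error for non-sequences."
  (typecase seq
    (list 'list)
    ;; Most specific types first: SIMPLE-BIT-VECTOR and SIMPLE-VECTOR are
    ;; subtypes of (SIMPLE-ARRAY * (*)), and every BIT-VECTOR is a VECTOR.
    (simple-bit-vector 'simple-bit-vector)
    (bit-vector 'bit-vector)
    (simple-vector 'simple-vector)
    ((simple-array * (*)) `(simple-array ,(array-element-type seq) (*)))
    (vector `(vector ,(array-element-type seq)))
    ;; Anything else goes through the generic function.
    (t (sequence-result-type seq))))
```

Usage: `(concatenate (concatenate-result-type "ab") "ab" "cd")` yields a character string. How an SBCL extensible sequence class makes CONCATENATE accept it as a result type is SBCL-specific and not shown here.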
8:02:02
aeth
Although I think the macro on top of specialization-store implementation quickly becomes more concise than the typecase approach
8:16:17
aeth
well, someone finally noticed a few months ago that almost every post to paste.lisp.org was spam because the captcha was ridiculously weak, like 012354 in a monospace font you can probably parse automatically.
8:16:57
aeth
Pretty much every no-registration thing is full of spam, and a lot of registration things, too. e.g. just about any mediawiki has to deal with tons of spam accounts spamming things
8:17:38
aeth
what I think it should do is require bot-based authentication since the main use is for IRC
8:18:04
aeth
so /msg some-bot please and it'll give me a URL that creates a session, no registration required, but if I spam they'll trace it to my IRC nick. And it's too complicated for general spam bots to bother with
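The "/msg some-bot please" flow amounts to issuing an unguessable, one-time token tied to a nick. A minimal sketch, assuming an in-memory table and a hypothetical base URL. CL's RANDOM is not a cryptographically secure generator, so a real service would need an OS-provided source of randomness instead.

```lisp
(defvar *token-random-state* (make-random-state t)
  "Freshly seeded state. Note: CL:RANDOM is NOT a cryptographic generator.")

(defvar *sessions* (make-hash-table :test 'equal)
  "Hypothetical in-memory store: one-time token -> IRC nick.")

(defun issue-session-url (nick &key (base-url "https://paste.example/session/"))
  "Called when NICK privately messages the bot; returns a one-time URL tied to NICK."
  ;; 128 random bits, printed in base 36.
  (let ((token (format nil "~36R" (random (expt 2 128) *token-random-state*))))
    (setf (gethash token *sessions*) nick)
    (concatenate 'string base-url token)))

(defun redeem-session-token (token)
  "Return the nick TOKEN was issued to and invalidate the token, or NIL if unknown."
  (let ((nick (gethash token *sessions*)))
    (when nick
      (remhash token *sessions*))
    nick))
```

Because every session maps back to a nick, spam from a session can be traced to whoever requested it, which is the deterrent aeth describes.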
8:19:00
aeth
I think that's simple enough that I could write it in a weekend or so, depending on what system the site uses
10:01:37
francogrex
https://codepaste.net/56qjnv in that snippet, on line 6, when I call (cffi:foreign-free buf), which I suppose I should, the results are no longer preserved; when I remove that call, I get what I need. but
13:41:53
Shinmera
The only guess I have is if you didn't use in-readtable somewhere and rather setf the readtable. But if I remember correctly ASDF should take care of even that.
13:42:58
phoe
Also, (asdf:operate 'asdf:load-op :lparallel :force t) succeeds, and afterwards my system loads properly.
13:43:26
phoe
I am now tempted to open up a very clean Lisp environment, try loading Qtools, and then LPARALLEL.
13:51:11
mfiano
Xach_: I'm still waiting for him to get to this https://github.com/joaotavora/sly/issues/135
17:31:57
osune
Hoi, I want to dispatch on strings of the form "/cmd foo bar", where "cmd" can be any string. Currently I dispatch via COND and predicates returned by a cl-ppcre match. Is there a better solution? E.g. in Python I would probably use a dict([('/cmd', cmd_handler_fn)]) to match and dispatch. The COND solution makes me uneasy because, if I understood correctly, CLHS says it evaluates each test-form until one returns true, while the Python solution would be just a hash lookup. I'm sure I could do the same in CL with a hash table using EQUAL as the test, but I lack the experience to tell what the better style would be in CL, or whether I'm missing something entirely.
17:33:58
warweasle
osune: The hash would work fine. You could extend the REPL (meta-circular evaluator). Or you can use a stack to collect arguments and then evaluate when you have them all.
17:34:33
jmercouris
osune: Forget my first message, I understood what you were trying to do but express it improperly
17:34:59
jmercouris
What is wrong with me today, "Forget my first message, I understood what you were trying to do but expressed it improperly"
17:36:08
osune
jmercouris: I think you are assuming a more complicated problem. Think more of an IRC bot. Sorry, English isn't my native language either; so you mean I worded it improperly?
17:37:21
tfb
osune: if you have a finite number of commands you know at compile-time (and if you don't want to do some incremental-reading thing) then interning them & CASE / ECASE might be enough
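tfb's CASE idea can be sketched without interning anything users type: FIND-SYMBOL only returns keywords that already exist (here, those read from the CASE clauses), so arbitrary input cannot grow the keyword package. The command names and handlers are illustrative, and STRING-UPCASE makes the match case-insensitive.

```lisp
(defun command-keyword (command)
  "Map \"/echo\" to :ECHO, but only if that keyword already exists.
FIND-SYMBOL never interns, so unknown input simply yields NIL."
  (and (> (length command) 1)
       (char= (char command 0) #\/)
       (values (find-symbol (string-upcase (subseq command 1)) :keyword))))

(defun dispatch-with-case (command args)
  "Dispatch COMMAND (e.g. \"/echo\") with the argument string ARGS."
  (case (command-keyword command)
    (:echo args)
    (:shout (string-upcase args))
    (t (format nil "Unknown command: ~A" command))))
```

Adding a command means adding a CASE clause and recompiling, which fits the "known at compile-time" condition tfb mentions.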
17:39:43
osune
tfb: It's finite, but currently I don't know how many. So I would prefer a solution where I can easily add/remove dispatchers in one central "place", I guess. I'll take a look at the CASE variant.
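For a command set that changes at run time, the hash-table version osune describes keeps the registry in one place. A sketch with hypothetical function names, assuming the command name ends at the first space:

```lisp
(defvar *command-handlers* (make-hash-table :test 'equal)
  "Maps a command name such as \"/cmd\" to a one-argument handler.")

(defun register-command (name handler)
  (setf (gethash name *command-handlers*) handler))

(defun unregister-command (name)
  (remhash name *command-handlers*))

(defun dispatch-command (line)
  "Split LINE at the first space; call the handler for the first word
with the rest of the line (or \"\" if there is none)."
  (let* ((space (position #\Space line))
         (name (subseq line 0 space))
         (args (if space (subseq line (1+ space)) ""))
         (handler (gethash name *command-handlers*)))
    (if handler
        (funcall handler args)
        (format nil "Unknown command: ~A" name))))
```

Usage: `(register-command "/echo" #'identity)` then `(dispatch-command "/echo foo bar")` returns "foo bar". With `:test 'equalp` instead of `'equal`, string keys match case-insensitively.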
17:39:52
warweasle
tfb: There's an old article which compares lists with hash tables. Lists are faster for around 8-12 entries, after which hashes are faster for lookups.
17:41:13
tfb
warweasle: Somewhere I have some code which does this benchmark for the implementation at compile time and then changes representation dynamically...
17:41:39
rpg
warweasle: I have heard rumors of implementations natively doing the optimization that tfb refers to.
17:43:02
tfb
warweasle: well, I think the main cool thing is that in Lisp you *can* do tricks like this, because you always have the language to play with
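The representation-switching trick can be sketched as a map that starts as an alist and migrates to an EQUAL hash table once it grows. The crossover of 10 entries is a placeholder loosely based on the 8-12 figure quoted above, not a measurement, and the struct and function names are hypothetical.

```lisp
(defparameter *alist-limit* 10
  "Placeholder crossover point; benchmark it on your implementation.")

(defstruct adaptive-map
  (alist '())   ; used while the map is small
  (table nil))  ; EQUAL hash table once the map has grown

(defun adaptive-get (key map)
  "Return the value stored under KEY in MAP, or NIL."
  (let ((table (adaptive-map-table map)))
    (if table
        (values (gethash key table))
        (cdr (assoc key (adaptive-map-alist map) :test #'equal)))))

(defun adaptive-put (key value map)
  "Store VALUE under KEY, migrating MAP to a hash table past *ALIST-LIMIT* entries."
  (let ((table (adaptive-map-table map)))
    (if table
        (setf (gethash key table) value)
        (let ((entry (assoc key (adaptive-map-alist map) :test #'equal)))
          (if entry
              (setf (cdr entry) value)
              (push (cons key value) (adaptive-map-alist map)))
          ;; Switch representation once the alist grows past the limit.
          (when (> (length (adaptive-map-alist map)) *alist-limit*)
            (let ((new-table (make-hash-table :test 'equal)))
              (loop for (k . v) in (adaptive-map-alist map)
                    do (setf (gethash k new-table) v))
              (setf (adaptive-map-table map) new-table
                    (adaptive-map-alist map) nil))))))
  value)
```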
17:43:52
jmercouris
osune: I think you worded it just fine, english is one of my first languages, that's not why I'm having these issues :D
17:46:06
rpg
osune: many lisp implementations offer extensions to the standard for hash-tables that contain strings only.
17:48:29
osune
rpg: is this an "out-of-the-box" thing if I use (declare ...) forms? Or something I should look up in the SBCL manual?
17:49:43
rpg
Bike: My impression is that EQUAL hash tables can be inefficient, because you can't exploit features of the data items in the hash function (or if you do, you end up needing some type checking at run time)
17:50:16
rpg
But yes, EQUAL is what you want for portable code -- but note that this is case-sensitive matching. If you want to match FoO and foo, you need EQUALP
17:50:26
Shinmera
Surely dispatching a command from a user facing side does not need to be terribly efficient in any case
17:51:12
Bike
though i'm not sure how much more efficient a more specific string= test would be anyway.
17:52:26
tfb
rpg: well, really, the test is how fast you can search compared to how fast your users can type
17:52:27
Shinmera
Now here's a question about data structures that actually requires performance: if you have a set of strings and would like to do fast fuzzy matching (find all strings for which a given string is a substring), what kind of data structure would be optimal for this?
17:53:23
rpg
Shinmera: I think it depends a lot on the nature of the strings. This problem arises in DNA/RNA sequencing, and I think there's an extensive literature on that. But those are long strings.
17:54:55
warweasle
Shinmera: If there is a "strcmp" function which returns (-1, 0, 1) then you could create your own tree from sexps.
17:56:40
Shinmera
warweasle: I don't know, I'm just asking. Given a set of strings, what data structure would be optimal to retrieve the subset that contains a certain other string.
17:57:47
Shinmera
osune: If your command list is an alist, you would do (funcall (cdr (assoc command command-list :test #'string-equal)) ..)
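Shinmera's alist one-liner assumes the command exists: for an unknown command, (cdr (assoc ...)) is NIL, and FUNCALL of NIL signals an error. A sketch with that case handled, keeping the same `:test #'string-equal` so "/ECHO" matches "/echo"; the command names and handlers are illustrative.

```lisp
(defparameter *command-list*
  (list (cons "/echo" (lambda (args) args))
        (cons "/shout" (lambda (args) (string-upcase args))))
  "Alist of (command-name . handler).")

(defun dispatch-alist (command args)
  "Call the handler for COMMAND with ARGS, or report an unknown command."
  (let ((entry (assoc command *command-list* :test #'string-equal)))
    (if entry
        (funcall (cdr entry) args)
        (format nil "Unknown command: ~A" command))))
```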
18:00:21
Shinmera
So you would compute a suffix-trie for each entry in the set and then just scan through them all?
18:01:17
Shinmera
I'm not sure if that would actually make it fast at all though, as the strings each are relatively short.
18:02:03
m00natic
@Shinmera, you'd use a single suffix tree (also look at the radix tree variant) and put all prespecified words in it - then use a single search
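The idea of indexing every suffix of every word once and running a single search can be sketched with a sorted suffix array in place of a suffix tree. This is a swapped-in structure: simpler to build, and all suffixes that start with the query form one contiguous run found by binary search. The function names are hypothetical.

```lisp
(defun build-suffix-index (words)
  "Return a vector of (SUFFIX . WORD) conses, one per suffix of each word,
sorted by SUFFIX."
  (let ((entries '()))
    (dolist (word words)
      (dotimes (i (length word))
        (push (cons (subseq word i) word) entries)))
    (sort (coerce entries 'vector) #'string< :key #'car)))

(defun string-prefix-p (prefix string)
  "True if STRING starts with PREFIX."
  (and (<= (length prefix) (length string))
       (string= prefix string :end2 (length prefix))))

(defun words-containing (query index)
  "Return the distinct words in INDEX that contain QUERY as a substring."
  (let ((lo 0)
        (hi (length index))
        (result '()))
    ;; Binary search for the first suffix that is not STRING< QUERY.
    (loop while (< lo hi)
          do (let ((mid (floor (+ lo hi) 2)))
               (if (string< (car (aref index mid)) query)
                   (setf lo (1+ mid))
                   (setf hi mid))))
    ;; Suffixes starting with QUERY form one contiguous run beginning at LO.
    (loop for i from lo below (length index)
          while (string-prefix-p query (car (aref index i)))
          do (pushnew (cdr (aref index i)) result :test #'string=))
    (nreverse result)))
```

Since Shinmera's set mostly grows, the index could be rebuilt occasionally rather than updated in place.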
18:08:39
sjl
Shinmera: https://github.com/jhawthorn/fzy/blob/master/ALGORITHM.md might also be interesting
18:08:58
sjl
though it sounds like in your case you can do some precomputation on the data to be searched
18:09:27
Shinmera
Right. The set of symbols should not change frequently, and typically only ever grows.
18:15:18
Shinmera
While I'm asking off-topic datastructure questions: any suggestions for edit-distance algorithms for natural language? Levenshtein distance is awful for that, since by its metrics "pear" is closer to "apple" than "apple trees" even though you'd highly likely want the latter to score better.
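To make the complaint concrete, here is the textbook dynamic-programming Levenshtein distance, keeping two rows of the table at a time. On Shinmera's example it gives 4 for "pear" vs "apple" but 6 for "apple trees" vs "apple", because every extra character of the longer phrase costs one edit.

```lisp
(defun levenshtein (a b)
  "Minimum number of single-character insertions, deletions and
substitutions needed to turn string A into string B."
  (let* ((n (length b))
         (prev (make-array (1+ n)))
         (curr (make-array (1+ n))))
    ;; Row 0: turning the empty prefix of A into the first J chars of B.
    (dotimes (j (1+ n))
      (setf (aref prev j) j))
    (dotimes (i (length a))
      (setf (aref curr 0) (1+ i))
      (dotimes (j n)
        (setf (aref curr (1+ j))
              (min (1+ (aref prev (1+ j)))          ; delete A[i]
                   (1+ (aref curr j))               ; insert B[j]
                   (+ (aref prev j)                 ; substitute or match
                      (if (char= (char a i) (char b j)) 0 1)))))
      (rotatef prev curr))
    (aref prev n)))
```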
18:15:25
sjl
some of these fuzzy finders cache results too though, so they may still have helpful things for you
18:16:53
Shinmera
Well, edit distance would be the wrong term for what I want, I suppose. More like similarity scoring.
18:21:34
pjb
Shinmera: sounds like the vector representation of words used in machine learning algorithms would be what you want.
18:22:34
rpg
Shinmera: I think there *are* such algorithms, but it's a tricky problem because, for example, you want to know about pronunciation (so that "too", "to" and "two" are similar). Maybe google for something in the information retrieval literature....
18:23:18
rpg
If you were going to do a vector representation, you might as well go ahead and train a NN.... But you would need a source of training data for closeness....
18:23:57
Shinmera
Context is commands in a bot with a list of "did you mean" suggestions when you type an unknown command.
18:29:23
Shinmera
Users are likely to: typo, use a synonym, or forget other parts of the full command (substring)
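For the "did you mean" list, one option is to score candidates on substring containment, which covers forgotten parts of a command, plus shared whole words (Jaccard overlap of the word sets), which ranks "apple trees" above "pear" for the input "apple". This is a swapped-in heuristic, not an established library; typos would still need an edit-distance term and synonyms a synonym table. All names are hypothetical.

```lisp
(defun split-words (string)
  "Split STRING on spaces and hyphens into a list of non-empty, downcased words."
  (let ((words '()) (start 0))
    (loop for i from 0 to (length string)
          do (when (or (= i (length string))
                       (member (char string i) '(#\Space #\-)))
               (when (< start i)
                 (push (string-downcase (subseq string start i)) words))
               (setf start (1+ i))))
    (nreverse words)))

(defun word-overlap (a b)
  "Jaccard similarity of the word sets of A and B: shared / total, in [0, 1]."
  (let* ((wa (remove-duplicates (split-words a) :test #'string=))
         (wb (remove-duplicates (split-words b) :test #'string=))
         (all (union wa wb :test #'string=)))
    (if all
        (/ (length (intersection wa wb :test #'string=)) (length all))
        0)))

(defun did-you-mean (input commands &key (limit 3))
  "Return up to LIMIT of COMMANDS with a positive score, best first."
  (flet ((score (command)
           ;; 1 point if INPUT appears inside COMMAND, plus the word overlap.
           (+ (if (search (string-downcase input) (string-downcase command)) 1 0)
              (word-overlap input command))))
    (let* ((scored (mapcar (lambda (command) (cons (score command) command))
                           commands))
           (hits (stable-sort (remove-if-not #'plusp scored :key #'car)
                              #'> :key #'car)))
      (mapcar #'cdr (subseq hits 0 (min limit (length hits)))))))
```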
18:30:52
jmercouris
if you are interested, I can explain more, it will require your own implementation though, there does not exist a library for this
18:33:11
pjb
Nice NLP user interfaces should be able to ask questions back to the user to clarify requests where some ambiguity exist.
18:34:34
jmercouris
pjb: Another cool thing is using the context of the words around it for guessing, not just what they typed
18:35:06
pjb
Indeed, you can use the history of the user interactions (including a user model) to clarify things.
18:35:56
beach
pjb: I think that's right, and such a model would include the kind of keyboard layout that is used.
18:37:24
jmercouris
it doesn't have to be a very complex or expensive model though, there are very lightweight ones you can embed into a user distributed application
18:37:27
beach
pjb: And a model of me would include the fact that I always invert the h and e in "the". In fact, I have a global Emacs abbrev to change "teh" to "the"
18:39:22
beach
It is not as though I do t <let's see, is that right, yes it is> e <now this then, maybe> h <ah, not right, let's do a C-t>
18:41:34
beach
It is more like "here is teh phrase I want to type" <oh, dear, I inverted the e and the h a few words ago>
18:42:10
beach
So auto-correcting from teh to the saves a lot of time, and a lot of burden on the reader.