freenode/#lisp - IRC Chatlog
11:34:29
beach
Speaking of data structures, I am collecting different versions of the standard binary search algorithm as published in various text books. So if you have a text book on algorithms that contains a version of binary search, I would like a copy of it (the algorithm, not the book) and the name of the book.
11:35:15
beach
The one in Aho, Hopcroft, and Ullman takes twice the time that it should, for instance.
11:41:08
dim
beach: would you be game to review what PostgreSQL is doing in that area? It's C code, but well commented and with references to original papers and algos when they've been implemented from papers
11:43:05
shka_
beach: binary search limited to simply ordered vectors, or also pointerless trees?
11:52:52
flip214
beach: is that interesting to you too? https://www.pvk.ca/Blog/2012/07/30/binary-search-is-a-pathological-case-for-caches/ I guess you already read that, though.
12:58:56
dim
mmm, what are the subtleties of #+pgloader-image and such wrt compiling code to a binary image?
12:59:51
dim
I've been using that to have interactive error handling in the REPL but backtraces when /usr/bin/pgloader is run, and now I realize it doesn't work like I want, or at all
13:07:14
beach
dim: Sure, I'll look at it if you point me to the source code. I have programmed in C so I should be able to understand it.
13:08:42
beach
dim: Otherwise, it is quite simple. If there are two tests per iteration, the first one testing for equality, then it is wrong.
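For reference, a minimal Common Lisp sketch of the fast variant beach is describing, with the equality test deferred out of the loop (the function and parameter names are illustrative):

    ;; One comparison per iteration; equality is tested once, after
    ;; the loop, instead of on every iteration.
    (defun binary-search (item vector &key (lessp #'<))
      "Return the position of ITEM in VECTOR, or NIL if it is absent.
    VECTOR must be sorted in non-decreasing order under LESSP."
      (let ((lo 0)
            (hi (length vector)))
        ;; Invariant: elements before LO are less than ITEM; elements
        ;; at or after HI are greater than or equal to ITEM.
        (loop while (< lo hi)
              do (let ((mid (floor (+ lo hi) 2)))
                   (if (funcall lessp (aref vector mid) item)
                       (setf lo (1+ mid))
                       (setf hi mid))))
        ;; Single deferred equality test.
        (when (and (< lo (length vector))
                   (not (funcall lessp item (aref vector lo))))
          lo)))

    ;; (binary-search 7 #(1 3 5 7 9)) => 3
    ;; (binary-search 4 #(1 3 5 7 9)) => NIL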
13:09:19
dim
the file would be https://github.com/postgres/postgres/blob/master/src/backend/access/nbtree/nbtree.c
13:12:20
dim
I thought that searching in a binary tree would count as binary search; sorry, realizing the mistake now
13:13:27
dim
Postgres just uses bsearch() and has its own sorting facilities (with abbreviated keys in some cases), but I don't think it has its own binary search actually
13:15:09
dim
have a look at this one for the fun of it: https://github.com/postgres/postgres/blob/97c39498e5ca9208d3de5a443a2282923619bf91/src/include/nodes/pg_list.h
13:17:25
dim
the history of it makes it a little better, I think: the PostgreSQL optimizer is known to have been written in Lisp in the 80s, but then they switched to C like the rest of the system, and they wrote that pg_list.h implementation to help with the porting
13:17:37
beach
Oh, and they even included the stuff that Common Lisp had to include for historical reasons, i.e. that the CAR and the CDR of NIL are NIL.
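For anyone unfamiliar with the detail: in Common Lisp the empty list NIL is a valid argument to both accessors, so list-walking code needs no separate end-of-list case. At the REPL:

    (car nil)        ; => NIL
    (cdr nil)        ; => NIL
    (cdr (cdr '(1))) ; => NIL, not an error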
13:19:22
dim
by then it was a university project by Stonebraker and mainly used to host PhD theses, up to 1995
13:21:18
dim
ok I'm not sure why yet, but it seems that #+pgloader-image isn't defined in the image I'm using, for some reason
13:22:25
dim
I'm doing (push :pgloader-image *features*) in src/hooks.lisp, which I don't load with ASDF but manually on the command line (using --load) when building the image; but maybe that feature is needed at compile time? read time?
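The answer is read time: #+pgloader-image is a reader conditional, so :pgloader-image must already be on *features* whenever a file containing the conditional is read, which for ASDF-compiled files means at compile time. A minimal sketch of one way to arrange that, assuming the push lives in a file loaded before everything that uses the conditional:

    ;; Reader conditionals are resolved when the source is READ, so the
    ;; feature must be present both when this file is compiled and when
    ;; its fasl is loaded later.
    (eval-when (:compile-toplevel :load-toplevel :execute)
      (pushnew :pgloader-image *features*))

Note that fasls compiled before the feature was pushed keep their old reading of the conditionals; ASDF does not recompile just because *features* changed, so stale fasls have to be rebuilt.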
14:31:45
Demosthenex
so i was tinkering with reading some machine generated text (https://bpaste.net/show/eb139faf9503) i wanted to throw into a table for searching. i could do this with regexp only, but given i have a variety of outputs to look at i was considering a lexer and parser. i found alexa, but cl-yacc seems to be all over. sound like the right tool? what's the right cl-yacc home?
14:36:47
beach
Demosthenex: There are plenty of parser libraries for Common Lisp. One is esrap, which is maintained by scymtym. He might know more about the subject.
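As a taste of what esrap rules look like, a sketch for "key: value" lines (the grammar is invented for illustration, not tied to Demosthenex's data):

    (ql:quickload "esrap")

    (esrap:defrule whitespace (+ #\space)
      (:constant nil))

    (esrap:defrule key (+ (alphanumericp character))
      (:text t))

    (esrap:defrule value (+ (graphic-char-p character))
      (:text t))

    ;; Parse one "key: value" line into a (KEY . VALUE) cons.
    (esrap:defrule key-value (and key #\: (? whitespace) value)
      (:destructure (k colon ws v)
        (declare (ignore colon ws))
        (cons k v)))

    ;; (esrap:parse 'key-value "hostname: example.org")
    ;; => ("hostname" . "example.org")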
14:37:39
Demosthenex
beach: yeah, i was reading about a few on cliki, but all the links to google code are dead
14:41:36
shka_
Demosthenex: also i find monadic parser combinators to be interesting and easy to use
14:42:20
Demosthenex
i haven't used a lexer/parser since cs classes in school years ago. i just thought that, given the regularity of my input data, a regexp for each one was a waste of time if lex/parse could do it faster.
14:57:26
LdBeth
Demosthenex: actually, for the kind of text you've shown, you'd be better off using awk-like tools rather than a parser generator
15:07:51
Demosthenex
LdBeth: that's an interesting assertion. i generally have used shell tools like awk to pull out that data, i was trying to make it a bit more formal/flexible to load into postgres, and since i was already using postmodern ;]
15:11:03
Demosthenex
i think the key difference is i've used awk for line operations instead of longer paragraph records and data extraction.
15:18:10
dim
Load the data into PostgreSQL, then process it in SQL; that's very flexible and powerful, and you get advanced, fast support for data processing
15:19:20
Demosthenex
dim: i only ever insert finished data into sql... do you have a link to an example?
15:22:35
dim
use COPY to load the data quickly, then process it, I think I have many examples of that on my blog
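A minimal sketch of that flow from CL via Postmodern (the connection parameters, file name, and table names are all invented, and a destination table records(name text, value text) is assumed to exist):

    (ql:quickload "postmodern")

    (postmodern:with-connection '("mydb" "me" "secret" "localhost")
      ;; Staging table: one raw input line per row, no parsing at load time.
      (postmodern:execute "CREATE TABLE IF NOT EXISTS raw_lines (line text)")
      ;; COPY is the fast bulk-load path (server-side file here; the
      ;; client-side COPY streaming protocol works too, see below).
      (postmodern:execute "COPY raw_lines FROM '/tmp/input.txt'")
      ;; Then extract fields in SQL, e.g. with regexps over the raw lines:
      (postmodern:execute
       "INSERT INTO records (name, value)
        SELECT substring(line FROM '^(\\w+):'),
               substring(line FROM '^\\w+:\\s*(.*)$')
          FROM raw_lines
         WHERE line ~ '^\\w+:'"))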
15:23:35
Demosthenex
well, i understood loading the raw text in. i just hadn't considered parsing the text to extract useful information inside sql... i was leaning on lisp for that
15:29:01
Demosthenex
i hope in your book you say "using multiline regexps to try and match data is a bad idea" ;]
15:35:54
Demosthenex
hrm. i see the GROUP BY on a replaced string, but that's not the actual record extraction. you're extracting from xml
15:39:31
dim
I've been doing that a lot, and I thought I had the idea covered better on the blog; I have lots of articles where I thought I would explain that, but it turns out I did the work to prepare the data and that's usually not the topic of the blog post
15:39:57
dim
if you want to see something very involved, maybe you'll like https://tapoueh.org/blog/2017/09/on-json-and-sql/
15:48:45
dim
anyway, either filter on the fly with CL and use Postmodern support for the COPY streaming protocol (see cl-postgres:db-write-row and friends and an example at https://github.com/dimitri/pubnames/blob/master/pubnames.lisp); or just load a CSV-like input file and then process in SQL, that's the easiest way to do it, mainly because then the input data and their transformations are well defined
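The COPY streaming calls dim mentions look roughly like this (the database credentials and the pubs table are invented for the sketch):

    (ql:quickload "cl-postgres")

    ;; Stream rows over the COPY protocol, assuming a table like
    ;;   CREATE TABLE pubs (name text, lon double precision, lat double precision)
    (let* ((conn (cl-postgres:open-database "mydb" "me" "secret" "localhost"))
           (writer (cl-postgres:open-db-writer conn "pubs" '("name" "lon" "lat"))))
      (unwind-protect
           (progn
             ;; Each call sends one row down the COPY stream.
             (cl-postgres:db-write-row writer (list "The Crown" -0.12d0 51.51d0))
             (cl-postgres:db-write-row writer (list "The Anchor" -0.09d0 51.50d0)))
        ;; Ends the COPY and makes the rows visible.
        (cl-postgres:close-db-writer writer)
        (cl-postgres:close-database conn)))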
16:03:13
Lycurgus
looking for a van Wijngaarden grammar based parser in cl, will check log for any replies, ty in advance
16:57:12
Demosthenex
dim: the filter and such sounds applicable to single-row records; i'm having to collate data from a multiline record and transpose it into a single row for insertion, and i don't think i can do that during a COPY
16:58:23
dim
that's what I'm doing in the pubnames example, where the input format is XML key/value with a single k/v per input line, and I'm using several values to form a single row in the COPY stream
16:58:49
dim
if you process in CL then just call db-write-row when you have all the tuple information
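A sketch of that collation step (the blank-line-separated key/value record format and the field names are invented): accumulate pairs until a record boundary, then emit one complete row, e.g. through the writer from the snippet above.

    ;; Collate multiline records into single rows; EMIT receives one list
    ;; per record, e.g. (lambda (row) (cl-postgres:db-write-row writer row)).
    (defun map-records (stream emit)
      (let ((record '()))
        (flet ((flush ()
                 (when record
                   (funcall emit
                            (list (cdr (assoc "name" record :test #'string=))
                                  (cdr (assoc "lon" record :test #'string=))
                                  (cdr (assoc "lat" record :test #'string=))))
                   (setf record '()))))
          (loop for line = (read-line stream nil)
                while line
                do (let ((colon (position #\: line)))
                     (if colon
                         (push (cons (subseq line 0 colon)
                                     (string-trim " " (subseq line (1+ colon))))
                               record)
                         ;; A line without "key:" (e.g. blank) ends the record.
                         (flush)))
                finally (flush)))))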