freenode/#lisp - IRC Chatlog
21:07:37
vms14
I have no idea how to convert a string to a list of symbols, I just take the first element
21:09:48
aeth
vms14: So the problem is that you're reading in a line into a string, and you're turning "Hello world" into |Hello world| instead of (HELLO WORLD)
21:11:17
vms14
I'm trying to parse input; I just want to read every symbol from the input until the user presses Enter
21:13:37
aeth
vms14: One thing you could do, and it's quite a hack, is (read-from-string (read-line))
21:13:53
vms14
the thing is, usually I won't know how many symbols there will be, and the delimiter is the newline
21:14:04
aeth
vms14: the second value in read-from-string is where it left off so you can loop on that second value
21:14:25
pjb
vms14: or: (loop for element = (extract-one-item (read-line)) until (eof-element-p element) collect element)
21:15:03
pjb
vms14: using READ or READ-FROM-STRING, you allow input to do whatever it wants with your lisp image, by default.
21:15:11
aeth
vms14: (read-from-string (read-line)) for a line "hello world" will return (values HELLO 6) (the reader also consumes the space that ends the token)
21:16:30
aeth
vms14: you can then do (read-from-string line nil nil :start 6), where line holds that same string, to get (values WORLD 11)
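A minimal sketch of the loop aeth describes, feeding the second value of read-from-string back in as :start until the string is exhausted. The function name read-all-from-string is hypothetical, and binding *read-eval* to NIL only blocks #. evaluation; it does not address the other concerns about the reader raised in this conversation.

```lisp
;; Hypothetical helper: read every object found on one line of text.
(defun read-all-from-string (line)
  (let ((*read-eval* nil)               ; refuse #. read-time evaluation
        (eof (gensym "EOF")))           ; unique end-of-input marker
    (loop with position = 0
          for (object next) = (multiple-value-list
                               (read-from-string line nil eof :start position))
          until (eq object eof)
          collect object
          ;; NEXT is the index where the reader stopped; resume from there.
          do (setf position next))))

;; (read-all-from-string "hello world")  ; => (HELLO WORLD)
```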
21:16:53
pjb
So you would want to bind *read-eval* to NIL. But other reader macros can be problematic: (read-from-string "#8931289312839012*") for example, could DoS your system by trying to allocate all its RAM. (or just signal a condition, depending on the implementation).
21:17:19
pjb
another thing is that reading symbols will intern them, so if there's a loop, the input could fill your memory with useless symbols.
21:17:47
pjb
So you might want to intern the symbols in a throw away package that you can delete-package when you're done.
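A sketch of the throwaway-package idea, assuming the caller only needs the names of the symbols that were read. The function name read-names-in-scratch-package is hypothetical. Symbols typed with an explicit package prefix (for example cl:car) still resolve to their own package, so this limits interning of unqualified symbols only.

```lisp
;; Hypothetical helper: read objects from LINE, interning any new symbols
;; in a temporary package that is deleted before returning.
(defun read-names-in-scratch-package (line)
  (let ((scratch (make-package (gensym "SCRATCH") :use '())))
    (unwind-protect
         (let ((*package* scratch)
               (*read-eval* nil))       ; #. now signals a READER-ERROR
           (with-input-from-string (in line)
             ;; The stream itself serves as a unique end-of-file marker.
             (loop for object = (read in nil in)
                   until (eq object in)
                   ;; Return symbol names, since the symbols lose their
                   ;; home package when the scratch package is deleted.
                   collect (if (symbolp object)
                               (symbol-name object)
                               object))))
      (delete-package scratch))))

;; (read-names-in-scratch-package "hello world 42")  ; => ("HELLO" "WORLD" 42)
```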
21:17:47
aeth
vms14: The "correct" (safe) way to do things is to parse the string, perhaps with cl-ppcre
21:18:15
aeth
By the time you add in the validation pjb is talking about, the parse solution probably becomes more concise than the elegant solution that pjb and I both said simultaneously
21:18:57
pjb
vms14: (split-sequence #\space (read-line) :remove-empty-subseqs t) is usually all you need.
21:19:36
aeth
Which to use is debatable. split-sequence is a smaller dependency, but if you're doing additional parsing, you might be using cl-ppcre anyway
21:19:40
pjb
(ql:quickload :split-sequence) (use-package :split-sequence) (with-input-from-string (*standard-input* "Hello world! How do you do?") (split-sequence #\space (read-line) :remove-empty-subseqs t)) #| --> ("Hello" "world!" "How" "do" "you" "do?") ; 27 |#
21:21:23
aeth
If you wanted "absolutely 0" overhead, you can get that. Well, not quite 0, you'd have to track start and end positions for each substring. String/sequence functions take in start and end so you can just work like that.
21:22:50
pjb
(com.informatimago.common-lisp.cesarum.array:positions #\space "Hello world! How do you do?") #| --> (5 12 16 19 23) |#
21:24:58
pjb
(let ((string "Hello world! How do you do?")) (loop :for start := 0 :then (1+ end) :for end :in (com.informatimago.common-lisp.cesarum.array:positions #\space string) :collect (cons start end) :into result :finally (return (nconc result (list (cons start (length string))))))) #| --> ((0 . 5) (6 . 12) (13 . 16) (17 . 19) (20 . 23) (24 . 27)) |#
21:25:07
aeth
You could store positions in an array with the :element-type alexandria:array-index, which will probably round up to fixnum or "unsigned fixnum" (it will show up as some strange looking unsigned-byte size like (unsigned-byte 62)) or (in 64-bit implementations) (unsigned-byte 64)
21:25:43
pjb
And then you can use (foo string :start (car pos) :end (cdr pos)) with most sequence functions to process the substrings. Or (subseq string (car pos) (cdr pos)) when you need to extract it.
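The position-based approach can also be sketched with standard functions only. The name token-bounds is hypothetical; it returns (start . end) conses like the ones above, skipping runs of the delimiter, and the bounds can then be passed as :start and :end to ordinary sequence functions.

```lisp
;; Hypothetical helper: (start . end) bounds of each delimiter-separated token.
(defun token-bounds (string &optional (delimiter #\Space))
  (let ((bounds '())
        (end 0))
    (loop
      ;; Find the first non-delimiter character at or after END.
      (let ((start (position-if (lambda (c) (char/= c delimiter))
                                string :start end)))
        (unless start
          (return (nreverse bounds)))
        ;; The token runs to the next delimiter or to the end of the string.
        (setf end (or (position delimiter string :start start)
                      (length string)))
        (push (cons start end) bounds)))))

;; Use the bounds without copying, or copy with SUBSEQ when needed.
(let ((string "Hello world! How do you do?"))
  (loop for (start . end) in (token-bounds string)
        collect (list (count #\o string :start start :end end)
                      (subseq string start end))))
```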
21:28:18
aeth
You could also do that as two vectors or two lists, one for start position and one for end position. (I think to make the vector, the best solution would be to walk the string twice, first to get the length for the allocated vectors and then to set the elements)
21:28:21
pjb
vms14: Notice that displaced arrays just abstract those (car pos) (cdr pos) bounds. So instead of subseq, you can use (make-array (- (cdr pos) (car pos)) :element-type (array-element-type string) :displaced-to string :displaced-index-offset (car pos))
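A small sketch of the displaced-array alternative, wrapped in a hypothetical function named displaced-substring. The result shares storage with the original string, so later changes to the original are visible through it.

```lisp
;; Hypothetical helper: a view onto STRING between START and END, no copying.
(defun displaced-substring (string start end)
  (make-array (- end start)
              :element-type (array-element-type string)
              :displaced-to string
              :displaced-index-offset start))

;; COPY-SEQ gives a fresh string that is safe to modify.
(let* ((string (copy-seq "Hello world!"))
       (view (displaced-substring string 6 11)))
  (setf (char string 6) #\W)            ; the change shows up in VIEW too
  (list (string= view "World") (length view)))
```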
21:31:21
aeth
the alternative is to allocate a list or vector of positions, or, as I recently noticed, two sequences instead of one
21:34:57
aeth
splitting isn't the standard way to think about things, the standard way to think about things is with positions, which is why every built-in (and every well-behaved library) has start/end or start1/end1/start2/end2
21:36:55
aeth
the easiest no-library way to do it is probably read-line and do position tracking, but read-char will probably be the most efficient solution
21:38:38
aeth
Thinking about lists can be done with splitting without a library, but only in one direction, splitting the front parts off and keeping the tail.
21:39:27
pjb
Depending on the size of the string and the substrings, displaced arrays may spare a lot of RAM. However, if the substrings are short, then subseq will be more efficient both in time and space (e.g. on a 64-bit system, we can assume that strings up to 8 or 16 bytes (2-4 Unicode characters) are better created fresh than represented as (list* string start end) or displaced arrays).
21:42:07
aeth
vms14 might not need a subseq/displacement at all, if it's about determining what to do based on user commands.
21:43:26
pjb
But don't write the state machine by hand! Write a state machine compiler from a high level description!
21:44:07
vms14
yeah, I want to make a transpiler to c, starting with easy stuff like create a variable, output the value, etc
21:45:56
pjb
vms14: or you may have a look at: https://github.com/informatimago/lisp/tree/master/common-lisp/html-generator
21:46:09
pjb
Have a look at https://github.com/informatimago/lisp/blob/master/common-lisp/html-generator/html-generators-in-lisp.txt
21:46:23
aeth
I have a partially complete GLSL generator so I can already essentially transpile to C if I spent a few weeks on it. Very similar syntax.
21:46:42
aeth
Generally, people avoid the parsing problem altogether when generating another language and just work directly in s-expressions
21:48:42
aeth
vms14: the problem is that 90% of the cases where you'd need parsers in other languages, people just avoid them altogether in Lisps and start with s-expressions, so there's probably less work on parsers than you might expect
21:51:39
aeth
vms14: Almost every "transpiler" in Common Lisp starts with s-expressions. If you don't want to start with s-expressions, you should probably act like you're doing the exact same thing as the normal transpilers and use this as the intermediate format.
21:52:42
vms14
what I had is a function wrapping the input from read-line with parens using concatenate 'string xD
21:52:48
aeth
Lisp itself was written in this way. m-expressions were the next step. https://en.wikipedia.org/wiki/M-expression
21:56:56
aeth
This sort of thing in Lisp is always done in at least two stages, where the first stage parses to s-expressions and the last stage turns a direct (or near-direct) s-expression mapping into strings, e.g. (:+ 1 2 3) into "(1 + 2) + 3"
22:00:43
aeth
In fact, + is probably one of the harder ones. Mostly you just go (:foo 1 2 3) to "foo(1, 2, 3)" with the only real difficulty being the way to generate the names (e.g. does foo-bar become "fooBar"?)
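A toy sketch of that last stage, turning keyword-headed s-expressions into C-like strings. The names c-name and emit-c are hypothetical, the operator set is limited to four binary operators, and mapping foo-bar to foo_bar is just one possible answer to the naming question. Infix results are always parenthesized, so (:+ 1 2 3) comes out as "((1 + 2) + 3)".

```lisp
;; Hypothetical naming rule: downcase and replace hyphens with underscores.
(defun c-name (symbol)
  (substitute #\_ #\- (string-downcase (symbol-name symbol))))

;; Hypothetical emitter for integers, variable names, infix operators,
;; and function calls. Operators need at least one argument.
(defun emit-c (form)
  (cond ((integerp form) (format nil "~D" form))
        ((symbolp form) (c-name form))
        ((member (first form) '(:+ :- :* :/))
         (let ((operator (symbol-name (first form))))
           ;; Fold left so that n-ary operators become nested binary ones.
           (reduce (lambda (left right)
                     (format nil "(~A ~A ~A)" left operator right))
                   (mapcar #'emit-c (rest form)))))
        (t (format nil "~A(~{~A~^, ~})"
                   (c-name (first form))
                   (mapcar #'emit-c (rest form))))))

;; (emit-c '(:foo 1 2 3))  ; => "foo(1, 2, 3)"
;; (emit-c '(:+ 1 2 3))    ; => "((1 + 2) + 3)"
```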
22:51:25
grewal
vms14: Wouldn't (read-from-string (read-line)) do what you want (read-delimited-list #\Newline) to do?
23:04:05
pjb
vms14: you could make read-delimited-list #\newline work. For this, you need to copy the character syntax from #\) to #\newline.
23:06:55
pjb
Theoretically. It still doesn't work :-( (let ((*readtable* (copy-readtable))) (set-syntax-from-char #\newline (character ")")) (with-input-from-string (*standard-input* (format nil "hello world~%How do you do~%")) (values (read-delimited-list #\newline) (read-delimited-list #\newline)))) #| ERROR: Unexpected end of file on #<string-input-stream :closed #x3020025DED1D> |#
23:26:28
vms14
and there are more things I'm missing about format, I need to practice a bit with things like ~:* and so on
23:35:59
vms14
I shouldn't be coding yet, but I want to get used to lisp, and the best way is coding
23:36:51
vms14
grewal: I mean I should be reading and doing test stuff and wait a bit to make this program
23:38:03
vms14
also I still think the On Lisp book should teach me nice things, but I need to understand lisp better before this book, or I'll miss some important stuff
23:49:30
pjb
vms14: loop is nice because it's versatile. Instead of having loops for, while, until, etc, loop does everything. (loop :while … :do …) (loop :do … :until …) (loop :for i :from 0 :to 10 :do …) and other variants: (loop :while … :do … :until …) (loop :do … :while … :do …) etc.
23:50:45
pjb
vms14: note that the :finally clause is jumped to as soon as one terminating clause is validated. So (loop … :until … :do … :finally …) doesn't evaluate :do when the :until condition is true.
23:54:09
aeth
What makes LOOP good for reading is that its behavior for :for ... := ... is different from DO's behavior when you do not have an iteration step. With LOOP, it will do the thing initially and then repeat it, with DO it will only do it once so you wind up having to repeat yourself twice (once for the initial value and once for the step) unless you abstract over this with a custom macro.
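The difference can be sketched with two hypothetical functions that collect all lines from a stream. In the LOOP version the read form is written once; in the DO version the same form appears as both the initial value and the step.

```lisp
;; LOOP re-evaluates the := form on every iteration.
(defun read-lines-with-loop (stream)
  (loop for line = (read-line stream nil)
        while line
        collect line))

;; DO needs the read form twice: once as init, once as step.
(defun read-lines-with-do (stream)
  (do ((line (read-line stream nil) (read-line stream nil))
       (lines '()))
      ((null line) (nreverse lines))
    (push line lines)))

;; Both return ("foo 42" "bar 43") for this input.
(with-input-from-string (in (format nil "foo 42~%bar 43~%"))
  (read-lines-with-loop in))
```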
23:55:39
aeth
So even if you're primarily using DO and/or DO* in your coding style, this is one of those good exceptions where you should use LOOP
23:56:40
aeth
(correction for the nitpickers, you repeat yourself once, which is writing the same code twice, you don't "repeat yourself twice")
23:58:57
pjb
And it's safe: (with-input-from-string (input " #.(delete-file \"~/.bashrc\")") (read-token-list input)) #| --> ("#.(delete-file" "\"~/.bashrc\")") |#
23:59:27
aeth
vms14: Imo, you shouldn't think in terms of "list of atoms being read from input". That's eval()-style behavior (CL's EVAL is different, and eval("1 + 1") in other languages is closer to (eval (read-from-string "(+ 1 1)")) in CL)
23:59:45
aeth
vms14: You should be thinking in terms of what kind of syntax you want to support, and parsing that syntax.
0:00:16
aeth
None of this that we've been talking about is strictly necessary with a sufficiently restrictive syntax
0:01:53
aeth
e.g. you could require the user write things like "foo 42\nbar 43\n" (replace \n with newlines in your head; IRC is limited to one-line-per-message) in which case you don't technically need any intermediate strings.
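A sketch of what such a restrictive syntax could look like in code, assuming each line is exactly one command word, one space, and an integer. The function name parse-command and the two commands are placeholders taken from the example line. No substring is allocated: string= and parse-integer both accept bounds, and the reader is never involved.

```lisp
;; Hypothetical parser for lines of the form "<command> <integer>".
(defun parse-command (line)
  (let ((space (position #\Space line)))
    (unless space
      (error "Expected \"<command> <integer>\", got ~S" line))
    ;; PARSE-INTEGER signals an error if the rest is not an integer.
    (let ((value (parse-integer line :start (1+ space))))
      ;; Compare only the part of LINE before the space.
      (cond ((string= line "foo" :end1 space) (list :foo value))
            ((string= line "bar" :end1 space) (list :bar value))
            (t (error "Unknown command in ~S" line))))))

;; (parse-command "foo 42")  ; => (:FOO 42)
```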
0:07:19
pjb
and macros wouldn't be unsafe (and worse, unhygienic) if CL's macros weren't so powerful.
0:10:39
aeth
I guess my point is that for untrusted user input you don't want power, so you wind up having to write your own functionality (or use a library). Shortcuts here are bad.
0:12:22
aeth
read-line vs. read-char is up to you (unless you *need* to not hang, then you have to use read-char-no-hang)