freenode/#mezzano - IRC Chatlog
17:48:02
ebrasca
froggey: I don't like recursive locks. Can you make some function that fixes this automatically?
18:20:39
froggey
close-tcp-connection is trying to acquire a connection lock that is already held. you can look through the backtrace for calls to call-with-mutex to see what other functions are holding locks
18:21:28
ebrasca
Does mezzano.supervisor:condition-wait-for return some non-nil value when it works?
18:23:10
froggey
it returns a non-nil value if the predicate returns non-nil. it returns nil if the timeout expires
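A minimal sketch of acting on that return value. The calling convention shown here (condition variable, mutex, and timeout in the first form, followed by predicate forms) and the connection-cvar / connection-lock / connection-state accessors are assumptions for illustration, not the verified mezzano.supervisor interface:

    ;; Wait up to 5 seconds for the connection to leave :syn-received.
    ;; NIL means the timeout expired; anything else means the predicate
    ;; returned true before the deadline.
    (if (mezzano.supervisor:condition-wait-for
            ((connection-cvar connection)   ; hypothetical accessor
             (connection-lock connection)   ; hypothetical accessor
             5)
          (not (eql (connection-state connection) :syn-received)))
        (handle-established connection)     ; hypothetical follow-up
        (error "Timed out waiting for the connection state to change"))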
18:29:23
froggey
ok, there are two problems here: 1) the recursive locking error. 2) the unexpected timeout
18:39:02
froggey
close-tcp-connection wants to take the connection lock, but can't because it has already been taken by a function higher up on the call stack
18:39:45
froggey
you can identify the function that originally took the lock by looking for call-with-mutex in the backtrace
18:41:07
froggey
in your case it is %tcp4-receive, but the same bug is also present in tcp-connect as I just found out
18:42:42
froggey
(mezzano.network.tcp:tcp-stream-connect "10.10.10.10" 1) was enough for me to reproduce
18:43:04
froggey
close-tcp-connection doesn't need to do anything fancy with the lock, it does all its work with the lock held
18:43:24
froggey
it could be changed to not take the lock, and to require that it is called with the lock already held
18:43:57
froggey
it is only ever called from tcp-connect and the close method, so that's a viable solution
18:46:07
froggey
this kind of fix won't work everywhere, and it's not something we want to do for public interfaces. we don't want arbitrary code having to take internal locks
19:01:03
froggey
yeah, simpler to just change close-tcp-connection so that it must be called with the connection lock already held
19:04:26
froggey
like I said, just change close-tcp-connection so it is the caller's responsibility to take the lock
19:04:57
froggey
tcp-connect would be correct with this change and the close method would have to be modified to take the lock around the call
19:10:05
froggey
changing close-tcp-connection to put locking responsibility on the caller, on the other hand, is fine and is what I would do
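A rough Common Lisp sketch of that shape of fix. This is a toy stand-in, not the actual tcp.lisp code: the struct, slot names, and the supervisor calls (make-mutex, with-mutex) are assumptions used only to show the split between an internal function that expects the lock and a public entry point that takes it:

    (defstruct connection
      (lock (mezzano.supervisor:make-mutex "connection lock"))
      (state :established))

    ;; Internal: must be called with the connection lock already held.
    ;; It no longer acquires the lock itself, so callers that already
    ;; hold it (tcp-connect, the receive path) cannot recurse on it.
    (defun close-tcp-connection (connection)
      (setf (connection-state connection) :closed))

    ;; Public entry point (what the CLOSE method would do): take the
    ;; lock around the call, as described above.
    (defun close-connection-from-outside (connection)
      (mezzano.supervisor:with-mutex ((connection-lock connection))
        (close-tcp-connection connection)))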
19:20:00
ebrasca
I think I can remove this comment ";; FIXME: This is temporary fix for recursive locking in tcp-listen" from tcp.lisp
20:20:00
froggey
the second error is because you're blocking (via condition-wait-for) in %tcp4-receive
20:20:18
froggey
the network stack uses a serial queue for packet processing, it only processes one packet at a time
20:21:21
froggey
your call to condition-wait-for is waiting for either a timeout or the connection state to change
20:21:56
froggey
but the connection state is never going to change because the network stack can't process packets at that point
20:25:09
froggey
you couldn't dispatch it on the network's serial queue, that would have the same problem. the task would be called, block, and prevent other packets from being processed
20:26:44
froggey
so you only want to call mailbox-send/whatever when the connection state changes away from :syn-received, right?
20:28:54
froggey
then you don't need to use condition-wait-for at all, you can just do the work you want to do right away
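A sketch of that non-blocking approach. Only the idea is from the discussion; the structs, accessors, and the mezzano.sync mailbox calls and their argument order are assumptions to check against the real code:

    (defstruct listener
      (mailbox (mezzano.sync:make-mailbox)))  ; assumed constructor

    (defstruct connection
      (state :syn-received)
      listener)

    ;; Runs on the packet-processing path, exactly where the state change
    ;; happens, so nothing has to block waiting for it.
    (defun transition-connection-state (connection new-state)
      (let ((old-state (connection-state connection)))
        (setf (connection-state connection) new-state)
        ;; Handshake finished: hand the connection to the listener now
        ;; instead of waiting for it elsewhere with condition-wait-for.
        (when (and (eql old-state :syn-received)
                   (eql new-state :established))
          (mezzano.sync:mailbox-send connection
                                     (listener-mailbox (connection-listener connection))))))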
20:43:00
froggey
I'm not sure that'll do what you want. I think it'll be a soft limit at best, not a hard limit on the number of outstanding connections
20:47:47
froggey
consider what happens if a client starts 500 connections, sending only the initial syn packet. none of those connections end up in the mailbox, so they don't count towards the limit
20:50:42
froggey
the current system is a hard limit, though it does the rest of the syn/ack sequence in the wrong place (in tcp-listen instead of in the main body of the tcp stack)
20:51:51
froggey
ok, under your system when does a connection get added to the mailbox? immediately after the syn is received or after the syn/ack sequence is completed?
20:55:53
froggey
"you" as in your proposed system, yes? not the current system? the current system adds them to the mailbox immediately after the syn is received
20:56:32
p_l
ebrasca: it's a trick where you encode all the state information necessary to reconstruct the state machine in data that you are guaranteed to get back in the SYN/ACK answer
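A small self-contained sketch of that trick (essentially a SYN cookie): derive the initial sequence number of the syn/ack from the connection 4-tuple and a secret, store nothing, and validate the peer's final ack by recomputing it. hash-32 below is a toy stand-in, and none of this is Mezzano code:

    (defun hash-32 (&rest values)
      ;; Toy mixing function; a real implementation would use a keyed
      ;; cryptographic hash and fold in a coarse timestamp.
      (ldb (byte 32 0) (sxhash values)))

    (defun make-syn-cookie (src-ip src-port dst-ip dst-port secret)
      ;; Used as the ISN of the syn/ack, so no per-connection state is kept.
      (hash-32 src-ip src-port dst-ip dst-port secret))

    (defun ack-matches-cookie-p (ack-number src-ip src-port dst-ip dst-port secret)
      ;; The final ack acknowledges ISN + 1; recomputing the cookie is
      ;; enough to validate the handshake.
      (= ack-number
         (ldb (byte 32 0)
              (1+ (make-syn-cookie src-ip src-port dst-ip dst-port secret)))))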
20:57:06
froggey
if you wait until after the handshake then they don't count towards the mailbox's capacity while the handshake is in progress, that's why I said it was a soft limit
21:00:40
froggey
remember that you don't need to use the mailbox's capacity feature to implement this. you can maintain your own more accurate backlog count in the listener object
21:06:46
froggey
I only used it because it was a simple & easy way to implement the backlog functionality at the time
21:07:21
froggey
if you can't implement the proper behaviour with it, then it's fine to do something else
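A sketch of keeping that count in the listener instead of relying on the mailbox's capacity, so connections are counted from the moment the syn arrives (the hard-limit behaviour discussed above). The struct, slot names, and supervisor mutex calls are assumptions, not the real tcp.lisp:

    (defstruct tcp-listener
      (backlog 0)          ; counted from syn until accepted or dropped
      (max-backlog 128)
      (lock (mezzano.supervisor:make-mutex "listener backlog")))

    ;; Called when a syn arrives; returns true if the connection may
    ;; proceed.  Half-open connections occupy a slot immediately, so a
    ;; flood of bare syns is refused here rather than slipping past a
    ;; mailbox that only counts completed handshakes.
    (defun try-reserve-backlog-slot (listener)
      (mezzano.supervisor:with-mutex ((tcp-listener-lock listener))
        (when (< (tcp-listener-backlog listener) (tcp-listener-max-backlog listener))
          (incf (tcp-listener-backlog listener))
          t)))

    ;; Called when the connection is handed to the application or dropped.
    (defun release-backlog-slot (listener)
      (mezzano.supervisor:with-mutex ((tcp-listener-lock listener))
        (decf (tcp-listener-backlog listener))))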
21:31:15
froggey
it's not a bad idea or a good idea on its own, but the way you want to use it here seems like a bad idea to me
21:31:43
froggey
because it means the backlog is treated as a soft limit rather than a hard limit, but maybe that's ok
21:37:57
ebrasca
Like creating the connection, sending the syn/ack back, and adding it to the mailbox when the connection is ready.
21:42:55
froggey
yes, that's another problem: https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
21:43:49
froggey
I don't think it's a big problem though, in the worst case a connection is dropped when it could have been accepted