freenode/#mezzano - IRC Chatlog
17:48:02
ebrasca
froggey: I don't like recursive locks. Can you make some function that fixes this automatically?
18:20:39
froggey
close-tcp-connection is trying to acquire a connection lock that is already held. you can look through the backtrace for calls to call-with-mutex to see what other functions are holding locks
18:21:28
ebrasca
Does mezzano.supervisor:condition-wait-for return some non-nil value when it works?
18:23:10
froggey
it returns a non-nil value if the predicate returns non-nil. it returns nil if the timeout expires
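A minimal sketch of acting on that return value. The calling convention shown here (condition variable, mutex, and timeout in the first form, followed by predicate forms) and the connection-cvar / connection-lock / connection-state accessors are assumptions for illustration, not the verified mezzano.supervisor interface:

    ;; Wait up to 5 seconds for the connection to leave :syn-received.
    ;; NIL means the timeout expired; anything else means the predicate
    ;; returned true before the deadline.
    (if (mezzano.supervisor:condition-wait-for
            ((connection-cvar connection)   ; hypothetical accessor
             (connection-lock connection)   ; hypothetical accessor
             5)
          (not (eql (connection-state connection) :syn-received)))
        (handle-established connection)     ; hypothetical follow-up
        (error "Timed out waiting for the connection state to change"))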
18:29:23
froggey
ok, there are two problems here: 1) the recursive locking error. 2) the unexpected timeout
18:39:02
froggey
close-tcp-connection wants to take the connection lock, but can't because it has already been taken by a function higher up on the call stack
18:39:45
froggey
you can identify the function that originally took the lock by looking for call-with-mutex in the backtrace
18:41:07
froggey
in your case it is %tcp4-receive, but the same bug is also present in tcp-connect as I just found out
18:42:42
froggey
(mezzano.network.tcp:tcp-stream-connect "10.10.10.10" 1) was enough for me to reproduce
18:43:04
froggey
close-tcp-connection doesn't need to do anything fancy with the lock, it does all its work with the lock held
18:43:24
froggey
it could be changed to not take the lock, and to require that it is called with the lock already held
18:43:57
froggey
it is only ever called from tcp-connect and the close method, so that's a viable solution
18:46:07
froggey
this kind of fix won't work everywhere, and it's not something we want to do for public interfaces. we don't want arbitrary code having to take internal locks
19:01:03
froggey
yeah, simpler to just change close-tcp-connection so that it must be called with the connection lock already held
19:04:26
froggey
like I said, just change close-tcp-connection so it is the caller's responsibility to take the lock
19:04:57
froggey
tcp-connect would be correct with this change and the close method would have to be modified to take the lock around the call
19:10:05
froggey
changing close-tcp-connection to put locking responsibility on the caller, on the other hand, is fine and is what I would do
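A rough Common Lisp sketch of that shape of fix. This is a toy stand-in, not the actual tcp.lisp code: the struct, slot names, and the supervisor calls (make-mutex, with-mutex) are assumptions used only to show the split between an internal function that expects the lock and a public entry point that takes it:

    (defstruct connection
      (lock (mezzano.supervisor:make-mutex "connection lock"))
      (state :established))

    ;; Internal: must be called with the connection lock already held.
    ;; It no longer acquires the lock itself, so callers that already
    ;; hold it (tcp-connect, the receive path) cannot recurse on it.
    (defun close-tcp-connection (connection)
      (setf (connection-state connection) :closed))

    ;; Public entry point (what the CLOSE method would do): take the
    ;; lock around the call, as described above.
    (defun close-connection-from-outside (connection)
      (mezzano.supervisor:with-mutex ((connection-lock connection))
        (close-tcp-connection connection)))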
19:20:00
ebrasca
I think I can remove this comment ";; FIXME: This is temporary fix for recursive locking in tcp-listen" from tcp.lisp
20:20:00
froggey
the second error is because you're blocking (via condition-wait-for) in %tcp4-receive
20:20:18
froggey
the network stack uses a serial queue for packet processing, it only processes one packet at a time
20:21:21
froggey
your call to condition-wait-for is waiting for either a timeout or the connection state to change
20:21:56
froggey
but the connection state is never going to change because the network stack can't process packets at that point
20:25:09
froggey
you couldn't dispatch it on the network's serial queue, that would have the same problem. the task would be called, block, and prevent other packets from being processed
20:26:44
froggey
so you only want to call mailbox-send/whatever when the connection state changes away from :syn-received, right?
20:28:54
froggey
then you don't need to use condition-wait-for at all, you can just do the work you want to do right away
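A sketch of that non-blocking approach. Only the idea is from the discussion; the structs, accessors, and the mezzano.sync mailbox calls and their argument order are assumptions to check against the real code:

    (defstruct listener
      (mailbox (mezzano.sync:make-mailbox)))  ; assumed constructor

    (defstruct connection
      (state :syn-received)
      listener)

    ;; Runs on the packet-processing path, exactly where the state change
    ;; happens, so nothing has to block waiting for it.
    (defun transition-connection-state (connection new-state)
      (let ((old-state (connection-state connection)))
        (setf (connection-state connection) new-state)
        ;; Handshake finished: hand the connection to the listener now
        ;; instead of waiting for it elsewhere with condition-wait-for.
        (when (and (eql old-state :syn-received)
                   (eql new-state :established))
          (mezzano.sync:mailbox-send connection
                                     (listener-mailbox (connection-listener connection))))))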
20:43:00
froggey
I'm not sure that'll do what you want. I think it'll be a soft limit at best, not a hard limit on the number of outstanding connections
20:47:47
froggey
consider what happens if a client starts 500 connections, sending only the initial syn packet. none of those connections end up in the mailbox, so they don't count towards the limit
20:50:42
froggey
the current system is a hard limit, though it does the rest of the syn/ack sequence in the wrong place (in tcp-listen instead of in the main body of the tcp stack)
20:51:51
froggey
ok, under your system when does a connection get added to the mailbox? immediately after the syn is received or after the syn/ack sequence is completed?
20:55:53
froggey
"you" as in your proposed system, yes? not the current system? the current system adds them to the mailbox immediately after the syn is received
20:56:32
p_l
ebrasca: it's a trick where you encode all the state information necessary to reconstruct the state machine in data that you are guaranteed to get back in the SYN/ACK answer
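A small self-contained sketch of that trick (essentially a SYN cookie): derive the initial sequence number of the syn/ack from the connection 4-tuple and a secret, store nothing, and validate the peer's final ack by recomputing it. hash-32 below is a toy stand-in, and none of this is Mezzano code:

    (defun hash-32 (&rest values)
      ;; Toy mixing function; a real implementation would use a keyed
      ;; cryptographic hash and fold in a coarse timestamp.
      (ldb (byte 32 0) (sxhash values)))

    (defun make-syn-cookie (src-ip src-port dst-ip dst-port secret)
      ;; Used as the ISN of the syn/ack, so no per-connection state is kept.
      (hash-32 src-ip src-port dst-ip dst-port secret))

    (defun ack-matches-cookie-p (ack-number src-ip src-port dst-ip dst-port secret)
      ;; The final ack acknowledges ISN + 1; recomputing the cookie is
      ;; enough to validate the handshake.
      (= ack-number
         (ldb (byte 32 0)
              (1+ (make-syn-cookie src-ip src-port dst-ip dst-port secret)))))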
20:57:06
froggey
if you wait until after the handshake then they don't count towards the mailbox's capacity while the handshake is in progress, that's why I said it was a soft limit
21:00:40
froggey
remember that you don't need to use the mailbox's capacity feature to implement this. you can maintain your own more accurate backlog count in the listener object
21:06:46
froggey
I only used it because it was a simple & easy way to implement the backlog functionality at the time
21:07:21
froggey
if you can't implement the proper behaviour with it, then it's fine to do something else
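A sketch of keeping that count in the listener instead of relying on the mailbox's capacity, so connections are counted from the moment the syn arrives (the hard-limit behaviour discussed above). The struct, slot names, and supervisor mutex calls are assumptions, not the real tcp.lisp:

    (defstruct tcp-listener
      (backlog 0)          ; counted from syn until accepted or dropped
      (max-backlog 128)
      (lock (mezzano.supervisor:make-mutex "listener backlog")))

    ;; Called when a syn arrives; returns true if the connection may
    ;; proceed.  Half-open connections occupy a slot immediately, so a
    ;; flood of bare syns is refused here rather than slipping past a
    ;; mailbox that only counts completed handshakes.
    (defun try-reserve-backlog-slot (listener)
      (mezzano.supervisor:with-mutex ((tcp-listener-lock listener))
        (when (< (tcp-listener-backlog listener) (tcp-listener-max-backlog listener))
          (incf (tcp-listener-backlog listener))
          t)))

    ;; Called when the connection is handed to the application or dropped.
    (defun release-backlog-slot (listener)
      (mezzano.supervisor:with-mutex ((tcp-listener-lock listener))
        (decf (tcp-listener-backlog listener))))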
21:31:15
froggey
it's not a bad idea or a good idea on its own, but the way you want to use it here seems like a bad idea to me
21:31:43
froggey
because it means the backlog is treated as a soft limit rather than a hard limit, but maybe that's ok
21:37:57
ebrasca
Like creating the connection, sending the syn/ack back, and adding it to the mailbox when the connection is ready.
21:42:55
froggey
yes, that's another problem: https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
21:43:49
froggey
I don't think it's a big problem though, in the worst case a connection is dropped when it could have been accepted