@eyedeekay
+R4SAS
+RN
+RN_
+T3s|4
+Xeha
+acetone
+orignal
+weko
FreeRider
Irc2PGuest12106
Irc2PGuest43409
Onn4l7h
Onn4|7h
T3s|4_
aargh1
anon2
eyedeekay_bnc
not_bob_afk
profetikla
shiver_
u5657_1
x74a6
zzz
zlatinb, would really appreciate it if you could allocate some time to review the pumper MR
zzz
not sure it's solid enough yet for the next release but if we don't start working on it soon it won't make it in
zlatinb
zzz: commented on the MR. There is an architectural flaw I'm afraid - all interest changes must happen on the pumper thread
zlatinb
so one needs to introduce a queue of interest-op changes and wakeup() the selector after enqueueing
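A minimal sketch of the queue-plus-wakeup pattern being described here (all names are illustrative, not from the actual MR): writers enqueue the interest-op change from any thread and wake the selector; the pumper thread drains the queue before each select(), so interest ops are only ever touched on the pumper thread.

    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Hypothetical sketch of the queue-plus-wakeup pattern; not the actual MR code.
    class InterestOpsQueue {
        private final Selector selector;
        private final Queue<Runnable> changes = new ConcurrentLinkedQueue<>();

        InterestOpsQueue(Selector selector) { this.selector = selector; }

        // Any thread: enqueue the change, then break the selector out of select().
        void requestWriteInterest(SelectionKey key) {
            changes.offer(() -> key.interestOps(key.interestOps() | SelectionKey.OP_WRITE));
            selector.wakeup();
        }

        // Pumper thread only, before each select(): apply pending changes.
        void runPendingChanges() {
            Runnable r;
            while ((r = changes.poll()) != null)
                r.run();
        }
    }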
zzz
hmm. I looked at what jetty does and this is similar
zzz
take a look at the locking before you declare it hopeless
zlatinb
this is not hopeless per se because we have a time limit on how long we wait in select(..), but still, as is, this patch will add latency in the case where writes need to be done via interest ops
zlatinb
you can write a small test program with two threads - one blocked in select() w/o a timeout, and the other trying to change interest ops
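A minimal version of the test being suggested might look like this (a hypothetical sketch, not zlatinb's actual program): one thread parks in select() with no timeout, and the main thread then times an interestOps() call.

    import java.net.InetSocketAddress;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;

    // Hypothetical two-thread test: does changing interest ops block
    // while another thread is parked in select()?
    public class SelectorBlockTest {
        public static void main(String[] args) throws Exception {
            Selector selector = Selector.open();
            ServerSocketChannel ssc = ServerSocketChannel.open();
            ssc.bind(new InetSocketAddress(0));
            ssc.configureBlocking(false);
            SelectionKey key = ssc.register(selector, SelectionKey.OP_ACCEPT);

            Thread selectThread = new Thread(() -> {
                try {
                    selector.select();          // blocks indefinitely; nobody connects
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            selectThread.start();
            Thread.sleep(500);                  // let the select thread park first

            long start = System.nanoTime();
            key.interestOps(0);                 // the call under test
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println("interestOps() took " + micros + " us");

            selector.wakeup();                  // release the select thread and exit
            selectThread.join();
            selector.close();
            ssc.close();
        }
    }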
zlatinb
actually let me double-check..
zlatinb
yes, SELECTOR_LOOP_DELAY = 200
zlatinb
so it may not be visible to the naked eye
zlatinb
I used the exact same design in one of my HFT jobs where microseconds mattered
zlatinb
since they don't and the selector is guaranteed to wake up within 200ms it's probably not fatal
zzz
zlatinb, the interest ops lock is on the individual SelectionKey, not on the selector; the write lock is per-connection, not on the selector
zzz
there's no possibility for a 200ms block
zzz
like I said, take your time to review the diff
zlatinb
I did and added other comments too
zlatinb
but the 200ms wait is real
zzz
I'm looking at github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/sun/nio/ch/SelectionKeyImpl.java
zzz
where's the lock? line 101?
zlatinb
yes, which is an abstract method in SelectorImpl implemented in the subclasses
zzz
github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/sun/nio/ch/SelectorImpl.java at the bottom, still abstract
zzz
still no sign of a lock
zlatinb
EPollSelectorImpl.java
zlatinb
github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/linux/classes/sun/nio/ch/EPollSelectorImpl.java
zlatinb
and the windows equivalent: github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/windows/classes/sun/nio/ch/WindowsSelectorImpl.java
zzz
so that holds updateLock which is also held in processUpdateQueue() called from doSelect()
zzz
processUpdateQueue() is called at line 113; the timeout wait is at line 120
zzz
so the lock's not held in the wait
zzz
it would be very bad design if it held a lock across the wait()
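Paraphrasing the shape of the JDK 11 code under discussion (abbreviated, not verbatim; see the EPollSelectorImpl link above): updateLock is held only while draining the pending-update queue, and is released before the blocking wait.

    // Abbreviated paraphrase of JDK 11 EPollSelectorImpl.doSelect(); locals elided.
    protected int doSelect(Consumer<SelectionKey> action, long timeout) throws IOException {
        processUpdateQueue();       // synchronized (updateLock) internally
        processDeregisterQueue();
        int numEntries;
        try {
            begin(blocking);
            // updateLock is NOT held here: interestOps() from another thread
            // just enqueues the key under updateLock (setEventOps) and returns
            numEntries = EPoll.wait(epfd, pollArrayAddress, NUM_EPOLLEVENTS, to);
        } finally {
            end(blocking);
        }
        processDeregisterQueue();
        return processEvents(numEntries, action);
    }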
zlatinb
it was, at least in earlier JDKs. Let me write a quick program
zzz
how early? java 4 or 5? those were known to be bad
zlatinb
around that early yes
zzz
this jdk 11 code we're looking at is pretty clear
zlatinb
it's not clear to me that the interest ops change takes effect before the next iteration of doSelect
zzz
sure but most writes are happening directly, not in the selector thread
zzz
if they do get thrown onto the selector, it will be woken up as usual
zzz
99.9% of writes bypass the selector
zlatinb
that depends, maybe not in a bulk transfer. That will need to be tested in a testnet
zzz
sure, not claiming it's perfect yet, just that it's already been through several live-net testing / bug-fix iterations to get to this point
zlatinb
ok, wrote a small program - changing the interestOps does not block the changing thread, but the change does not take effect until the next iteration of select
zlatinb
waiting for pastebins to load
zzz
that's fine. if the write doesn't complete, it will get thrown onto the selector, which will get woken up, just like before
zlatinb
ok. In the code I wrote for work, I would set a flag that the channel is full, and any further writes to that channel would have to go through the selector. I don't see such state kept in this MR
zzz
there's no state. every call to NTCPCon.write() will try to write directly, and then if it fails, throw it to the selector
zlatinb
hmm that might be a problem because there is no guarantee that data will be written in order
zzz
true, my statement above was a simplification
zzz
NTCPCon.write() -> pumper.processWrite() tries to clear the write buf queue
zlatinb
yes but there is no guarantee that the pumper thread will grab the cpu before the writing thread writes to a writeable socketchannel
zzz
that's why there's now a separate write lock
zzz
all writes happen in pumper.processWrite(), whether via the selector thread or another thread
zlatinb
that's not sufficient, consider this sequence:
zlatinb
1. writer thread writes to the socketchannel, fills it up, registers write interest with the selector, releases all locks, goes to sleep
zlatinb
2. some time passes, socketchannel becomes writeable again, wakes up selector thread
zlatinb
3. before the selector thread gets the cpu, writer thread comes back, acquires all locks and writes more
zlatinb
= data out of order
zzz
no, because everything to be written gets put in the write buf queue (no matter what thread), and then pumper.processWrite() takes it off the queue
zzz
in your sequence above, if there's still data in the write buf queue after 3), then 4) the selector will write the rest of it (or not)
zzz
so when I say a thread 'writes directly', the write does happen on that thread, but it first puts the data at the tail of the write buf queue, then writes everything it can starting from the head of the queue
zzz
that keeps it all in-order
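A minimal sketch of the ordering argument (hypothetical names, not the actual NTCP code): every producer appends at the tail of a per-connection queue, and all channel writes drain from the head under one per-connection lock, so bytes reach the channel in queue order no matter which thread gets the CPU first.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;
    import java.util.ArrayDeque;
    import java.util.Queue;

    // Hypothetical sketch of the tail-enqueue / head-drain pattern; not the MR code.
    class OrderedWriter {
        private final SocketChannel chan;
        private final Queue<ByteBuffer> writeBufs = new ArrayDeque<>();
        private final Object writeLock = new Object();

        OrderedWriter(SocketChannel chan) { this.chan = chan; }

        // Any thread: enqueue at the tail, then try to drain from the head.
        boolean write(ByteBuffer buf) throws IOException {
            synchronized (writeLock) {
                writeBufs.add(buf);
                return drain();
            }
        }

        // Selector thread, when OP_WRITE fires: finish whatever is queued.
        boolean processWrite() throws IOException {
            synchronized (writeLock) {
                return drain();
            }
        }

        // Caller holds writeLock. Returns true if the queue was fully flushed.
        private boolean drain() throws IOException {
            ByteBuffer head;
            while ((head = writeBufs.peek()) != null) {
                chan.write(head);
                if (head.hasRemaining())
                    return false;   // channel full; hand the rest to the selector
                writeBufs.poll();
            }
            return true;
        }
    }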
zlatinb
ok that looks fine then
zlatinb
I'll run it through the testnet and see how it behaves
zlatinb
not tonight though
zzz
great, glad to finally get your eyeballs on it