onyx

FYI: alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx> ; log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
kenny 2017-11-27T23:05:40.000330Z

Has anyone run into this exception?

17-11-27 22:10:39 kenny-ubuntu ERROR [onyx.peer.peer-group-manager:348] - Error caught in PeerGroupManager loop.
                                      java.lang.Thread.run              Thread.java:  745
        java.util.concurrent.ThreadPoolExecutor$Worker.run  ThreadPoolExecutor.java:  617
         java.util.concurrent.ThreadPoolExecutor.runWorker  ThreadPoolExecutor.java: 1142
                                                       ...                               
                         clojure.core.async/thread-call/fn                async.clj:  442
          onyx.peer.peer-group-manager.PeerGroupManager/fn   peer_group_manager.clj:  380
      onyx.peer.peer-group-manager/peer-group-manager-loop   peer_group_manager.clj:  339
                                                       ...                               
                 onyx.peer.peer-group-manager/eval29918/fn   peer_group_manager.clj:  259
onyx.messaging.aeron.messaging-group/media-driver-healthy?      messaging_group.clj:   34
                     io.aeron.CommonContext.isDriverActive       CommonContext.java:  420
                 io.aeron.CommonContext.mapExistingCncFile       CommonContext.java:  374
                         org.agrona.IoUtil.mapExistingFile              IoUtil.java:  265
                           java.io.RandomAccessFile.<init>    RandomAccessFile.java:  243
                             java.io.RandomAccessFile.open    RandomAccessFile.java:  316
                            java.io.RandomAccessFile.open0     RandomAccessFile.java     
java.io.FileNotFoundException: /dev/shm/aeron-kenny/cnc.dat (Too many open files)
I receive this exception after sending a segment that looks like {:message {:a "c"}} onto a core.async channel that flows to a Kafka producer task. The exception does not occur immediately; it takes roughly 30 seconds to appear. During those 30 seconds, ZooKeeper rapidly logs entries like these:
zookeeper_1  | 2017-11-27 23:01:30,035 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /172.20.0.1:57360
zookeeper_1  | 2017-11-27 23:01:30,035 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@928] - Client attempting to establish new session at /172.20.0.1:57360
zookeeper_1  | 2017-11-27 23:01:30,039 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@673] - Established session 0x15fff850f7306b3 with negotiated timeout 30000 for client /172.20.0.1:57360
zookeeper_1  | 2017-11-27 23:01:30,048 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x15fff850f7306b3
zookeeper_1  | 2017-11-27 23:01:30,051 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /172.20.0.1:57360 which had sessionid 0x15fff850f7306b3
ZooKeeper logs a new block like the one above every 20ms.

kenny 2017-11-27T23:08:07.000052Z

Actually, it's probably this 🙂 http://www.onyxplatform.org/docs/user-guide/0.12.x/#_peer_fails_to_start_and_throws

lucasbradstreet 2017-11-27T23:10:22.000323Z

@kenny I think lsof is your friend here, because the cnc.dat file shouldn’t be the overall cause, as only one should be created.
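
If lsof is not handy, a rough equivalent on Linux is to read the symlinks under /proc/<pid>/fd and group them by target, which shows what kind of descriptor is piling up (sockets, cnc.dat, something else). This is only a sketch: the function name open_fd_targets and the proc_root parameter are made up for illustration, and it assumes you already know the peer JVM's PID and have permission to read its /proc entry.

```python
import os
from collections import Counter


def open_fd_targets(pid, proc_root="/proc"):
    """Count open descriptors of a process, grouped by what they point at.

    Hypothetical helper, Linux only: each entry in <proc_root>/<pid>/fd is a
    symlink whose target names the open file, socket, or pipe.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        counts[target] += 1
    return counts
```

Calling open_fd_targets(pid).most_common(10) a few times while the job runs should make the leaking target stand out, since its count keeps climbing between calls.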
