Has anyone run into this exception?
17-11-27 22:10:39 kenny-ubuntu ERROR [onyx.peer.peer-group-manager:348] - Error caught in PeerGroupManager loop.
java.lang.Thread.run Thread.java: 745
java.util.concurrent.ThreadPoolExecutor$Worker.run ThreadPoolExecutor.java: 617
java.util.concurrent.ThreadPoolExecutor.runWorker ThreadPoolExecutor.java: 1142
...
clojure.core.async/thread-call/fn async.clj: 442
onyx.peer.peer-group-manager.PeerGroupManager/fn peer_group_manager.clj: 380
onyx.peer.peer-group-manager/peer-group-manager-loop peer_group_manager.clj: 339
...
onyx.peer.peer-group-manager/eval29918/fn peer_group_manager.clj: 259
onyx.messaging.aeron.messaging-group/media-driver-healthy? messaging_group.clj: 34
io.aeron.CommonContext.isDriverActive CommonContext.java: 420
io.aeron.CommonContext.mapExistingCncFile CommonContext.java: 374
org.agrona.IoUtil.mapExistingFile IoUtil.java: 265
java.io.RandomAccessFile.<init> RandomAccessFile.java: 243
java.io.RandomAccessFile.open RandomAccessFile.java: 316
java.io.RandomAccessFile.open0 RandomAccessFile.java
java.io.FileNotFoundException: /dev/shm/aeron-kenny/cnc.dat (Too many open files)
I'm receiving this exception after sending a segment that looks like this: {:message {:a "c"}} onto a core.async channel that flows to a Kafka producer task. The exception does not occur immediately; it takes several seconds (maybe ~30s). Within those 30s, Zookeeper rapidly logs these:
zookeeper_1 | 2017-11-27 23:01:30,035 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /172.20.0.1:57360
zookeeper_1 | 2017-11-27 23:01:30,035 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@928] - Client attempting to establish new session at /172.20.0.1:57360
zookeeper_1 | 2017-11-27 23:01:30,039 [myid:] - INFO [SyncThread:0:ZooKeeperServer@673] - Established session 0x15fff850f7306b3 with negotiated timeout 30000 for client /172.20.0.1:57360
zookeeper_1 | 2017-11-27 23:01:30,048 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x15fff850f7306b3
zookeeper_1 | 2017-11-27 23:01:30,051 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /172.20.0.1:57360 which had sessionid 0x15fff850f7306b3
Every 20ms a new section of the above is logged from Zookeeper.
Probably this actually 🙂 http://www.onyxplatform.org/docs/user-guide/0.12.x/#_peer_fails_to_start_and_throws
@kenny I think lsof is your friend here, because the cnc.dat file shouldn’t be the overall cause, as only one should be created.
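A rough sketch of that check, assuming a Linux host with /proc (the box in the trace is Ubuntu). The helper names `fd_count` and `fd_summary` are made up for illustration, and the PID would be whatever JVM is running the peer group. It groups a process's open descriptors by target, so a leak shows up as one entry with a very large count.

```shell
# Hypothetical helpers, Linux only: they read /proc/<pid>/fd directly,
# so they work even where lsof is not installed.

# Print how many file descriptors the process currently holds.
fd_count() {
  ls /proc/"$1"/fd 2>/dev/null | wc -l
}

# Print descriptor targets grouped and counted, most frequent first.
fd_summary() {
  for fd in /proc/"$1"/fd/*; do
    # Each entry is a symlink to a file path, socket:[inode], pipe:[inode], ...
    readlink "$fd" 2>/dev/null || true
  done |
    sed -E 's/\[[0-9]+\]$//' |  # drop inode numbers so all sockets group together
    sort | uniq -c | sort -rn
}
```

Calling `fd_summary <pid>` and comparing `fd_count <pid>` against `ulimit -n` should show whether sockets (for example the ZooKeeper connections that keep opening) or regular files are eating the limit. `lsof -p <pid>` gives the same information in more detail.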