Symptom
Unable to start the ZooKeeper process.
Error messages
When you attempt to start the ZooKeeper process, the following error message is returned, indicating that ZooKeeper could not be started:

```
+ apigee-service apigee-zookeeper status
apigee-service: apigee-zookeeper: Not running (DEAD)
apigee-all: Error: status failed on [apigee-zookeeper]
```
Possible causes
The following table lists possible causes of this issue:
| Cause | For |
|---|---|
| Misconfigured ZooKeeper myid | Edge Private Cloud users |
| ZooKeeper port in use | Edge Private Cloud users |
| Incorrect process ID in apigee-zookeeper.pid file | Edge Private Cloud users |
| ZooKeeper Leader Election Failure | Edge Private Cloud users |
Each cause in the table is covered in a section below, along with possible resolutions.
Misconfigured ZooKeeper myid
The following sections provide an overview of the myid file and describe how to diagnose and resolve misconfiguration issues.
Overview of the myid file
On each ZooKeeper node, there are two files:

- The `/opt/apigee/apigee-zookeeper/conf/zoo.cfg` file contains a list of IPs for all the ZooKeeper nodes in the cluster. For example, if the cluster has 3 ZooKeeper nodes, `zoo.cfg` lists them as follows:

  ```
  server.1=11.11.11.11:2888:3888
  server.2=22.22.22.22:2888:3888
  server.3=33.33.33.33:2888:3888
  ```

- The `/opt/apigee/data/apigee-zookeeper/data/myid` file contains a single line of text that corresponds to the server number of that particular ZooKeeper node. The myid of server 1 contains the text `1` and nothing else. The id must be unique within the ensemble and must have a value between 1 and 255. For example, on ZooKeeper server.1, the `/opt/apigee/data/apigee-zookeeper/data/myid` file should contain only the text `1`:

  ```
  $ cat myid
  1
  ```
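The relationship between the two files can be sketched as follows: given a node's IP, the expected myid value is the server number on the matching `zoo.cfg` line. This is a minimal sketch using temp files as stand-ins for the real `/opt/apigee` paths (the IP `22.22.22.22` is the sample value from the example above):

```shell
# Stand-in for /opt/apigee/apigee-zookeeper/conf/zoo.cfg
ZOO_CFG=$(mktemp)
cat > "$ZOO_CFG" <<'EOF'
server.1=11.11.11.11:2888:3888
server.2=22.22.22.22:2888:3888
server.3=33.33.33.33:2888:3888
EOF

# This node's IP as it appears in zoo.cfg (sample value)
NODE_IP=22.22.22.22

# Extract "2" from "server.2=22.22.22.22:2888:3888"
EXPECTED_MYID=$(grep "=${NODE_IP}:" "$ZOO_CFG" | sed -e 's/^server\.//' -e 's/=.*//')
echo "$EXPECTED_MYID"
```

On a real node, the value printed here is what `/opt/apigee/data/apigee-zookeeper/data/myid` should contain.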
Diagnosis
1. Check the ZooKeeper log `/opt/apigee/var/log/apigee-zookeeper/zookeeper.log` for errors. If you see a WARN message similar to "Connection broken for id #, my id = #", as shown below, then the server # in the myid file may be misconfigured or corrupted:

   ```
   [myid:2] - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker@762] - Connection broken for id 2, my id = 2, error =
   java.io.EOFException
       at java.io.DataInputStream.readInt(DataInputStream.java:375)
       at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)
   ```

2. Check the `/opt/apigee/apigee-zookeeper/conf/zoo.cfg` file and note down the server.# for the current ZooKeeper node.
3. Check the `/opt/apigee/data/apigee-zookeeper/data/myid` file and see if the text in this file matches the server.# noted in step #2.
4. If there is a mismatch, then you have identified the cause of ZooKeeper failing to start.
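The comparison of the two values can be sketched as follows. A temp file stands in for the real myid path so the check is safe to run anywhere, and a deliberately wrong value is written to show the mismatch case:

```shell
# Stand-in for /opt/apigee/data/apigee-zookeeper/data/myid
MYID_FILE=$(mktemp)
echo "3" > "$MYID_FILE"    # deliberately wrong for this demo

EXPECTED=2                 # server.# noted from zoo.cfg in step #2
ACTUAL=$(cat "$MYID_FILE")
if [ "$ACTUAL" = "$EXPECTED" ]; then
  RESULT="myid OK"
else
  RESULT="myid mismatch: expected $EXPECTED, found $ACTUAL"
fi
echo "$RESULT"
```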
Resolution
If the myid file is incorrectly configured, edit the myid file and replace its value with the correct text for the server.# parameter configured for this node in zoo.cfg.
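As a minimal sketch of the fix, assuming this node is server.2 in zoo.cfg (a temp file stands in for the real myid path so the snippet can run anywhere):

```shell
# Stand-in for /opt/apigee/data/apigee-zookeeper/data/myid
MYID_FILE=$(mktemp)
echo "2" > "$MYID_FILE"    # the server.# from zoo.cfg for this node
cat "$MYID_FILE"
# On a real node, follow with:
#   /opt/apigee/apigee-service/bin/apigee-service apigee-zookeeper restart
```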
ZooKeeper port in use
Diagnosis
1. Check the ZooKeeper log `/opt/apigee/var/log/apigee-zookeeper/zookeeper.log` for errors.
2. If you notice the exception `java.net.BindException: Address already in use` while binding to port 2181, as shown below, it indicates that ZooKeeper port 2181 is being used by another process and ZooKeeper therefore could not be started:

   ```
   2017-04-26 07:00:10,420 [myid:3] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
   2017-04-26 07:00:10,421 [myid:3] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
   java.net.BindException: Address already in use
       at sun.nio.ch.Net.bind0(Native Method)
       at sun.nio.ch.Net.bind(Net.java:433)
       at sun.nio.ch.Net.bind(Net.java:425)
       at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
       at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
       at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
       at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
       at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:130)
       at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
       at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
   ```

3. Use the following netstat command to confirm that ZooKeeper port 2181 is indeed being used by another process:

   ```
   netstat -an | grep 2181
   ```
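To identify the owning process rather than just confirm the port is busy, the PID field of `netstat -antp` output can be extracted. A sketch using a simplified sample line (on a live node, pipe the real `netstat -antp` output instead):

```shell
# Simplified sample netstat -antp line; the last field is PID/program
LINE='tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN 28016/java'

# Take the last field ("28016/java") and keep the part before the slash
PID=$(echo "$LINE" | awk '{print $NF}' | cut -d/ -f1)
echo "$PID"
```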
Resolution
If the ZooKeeper port 2181 is still in use, follow these steps to address the issue:

1. Use the `netstat` command to find the process that is holding on to port 2181, and kill that process:

   ```
   $ netstat -antp | grep 2181
   tcp   0   0 0.0.0.0:2181   0.0.0.0:*   LISTEN   28016/java <defunct>
   $ kill -9 28016
   ```

2. Clean up the pid and lock files if they exist:

   ```
   /opt/apigee/var/run/apigee-zookeeper/apigee-zookeeper.pid
   /opt/apigee/var/run/apigee-zookeeper/apigee-zookeeper.lock
   ```

3. Restart ZooKeeper:

   ```
   /opt/apigee/apigee-service/bin/apigee-service apigee-zookeeper restart
   ```
Incorrect process ID in apigee-zookeeper.pid file
When you attempt to stop or restart ZooKeeper, it may fail because the `apigee-zookeeper.pid` file contains an old or incorrect pid rather than that of the currently running ZooKeeper process. This can happen if the ZooKeeper process terminated unexpectedly or abruptly and the `apigee-zookeeper.pid` file was not deleted.
Diagnosis
1. Get the process id of the currently running ZooKeeper process by running the `ps` command:

   ```
   ps -ef | grep zookeeper
   ```

2. Check if the `/opt/apigee/var/run/apigee-zookeeper/apigee-zookeeper.pid` file exists. If it exists, note down the process id written in this file.
3. Compare the process ids from steps #1 and #2. If they are different, then the cause of this issue is an incorrect process id in the `apigee-zookeeper.pid` file.
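The comparison above can be sketched as follows. A temp file stands in for the real `apigee-zookeeper.pid` path, and this shell's own PID plays the role of the running ZooKeeper process so the snippet can run anywhere:

```shell
# Stand-in for /opt/apigee/var/run/apigee-zookeeper/apigee-zookeeper.pid
PID_FILE=$(mktemp)
echo "99999" > "$PID_FILE"       # stale pid left behind by a crash

RUNNING_PID=$$                   # in real life: taken from ps -ef | grep zookeeper
RECORDED_PID=$(cat "$PID_FILE")
if [ "$RECORDED_PID" != "$RUNNING_PID" ]; then
  STATUS="stale pid file: recorded $RECORDED_PID, running $RUNNING_PID"
else
  STATUS="pid file OK"
fi
echo "$STATUS"
```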
Resolution
1. Edit the `apigee-zookeeper.pid` file and replace the incorrect process id with the correct process id obtained from the `ps` command (step #1 above).
2. Restart ZooKeeper:

   ```
   /opt/apigee/apigee-service/bin/apigee-service apigee-zookeeper restart
   ```
ZooKeeper Leader Election Failure
Diagnosis
To diagnose:
1. Check the ZooKeeper log `/opt/apigee/var/log/apigee-zookeeper/zookeeper.log` for errors.
2. Check if there were any configuration changes that may cause ZooKeeper leader election to fail.
3. Check `/opt/apigee/apigee-zookeeper/conf/zoo.cfg` and make sure all ZooKeepers in the cluster have the proper number and IP addresses in the server.# parameters. Also note that for leader election to succeed there must be at least 3 voters, and the number of voters should be odd. With too few voters (for example, only 2), the ensemble cannot reach a quorum to elect a leader.
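The voter-count check in step #3 can be sketched as follows (a temp file stands in for the real zoo.cfg; note that this simple count assumes all `server.#` entries are voting participants, not observers):

```shell
# Stand-in for /opt/apigee/apigee-zookeeper/conf/zoo.cfg
ZOO_CFG=$(mktemp)
cat > "$ZOO_CFG" <<'EOF'
server.1=11.11.11.11:2888:3888
server.2=22.22.22.22:2888:3888
server.3=33.33.33.33:2888:3888
EOF

VOTERS=$(grep -c '^server\.' "$ZOO_CFG")
if [ "$VOTERS" -ge 3 ] && [ $((VOTERS % 2)) -eq 1 ]; then
  QUORUM_CHECK="OK: $VOTERS voters"
else
  QUORUM_CHECK="WARN: $VOTERS voters (need an odd number, at least 3)"
fi
echo "$QUORUM_CHECK"
```

If the ZooKeeper four-letter-word commands are enabled on your nodes, you can also ask each node its current role with `echo srvr | nc <host> 2181` and inspect the `Mode:` line (leader, follower, or standalone).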
Resolution
Typically, ZooKeeper election failure is caused by a misconfigured myid. Use the resolution in Misconfigured ZooKeeper myid to address the election failure.
If the problem persists and further diagnosis is needed, contact Apigee Edge Support.