Replication queue is blocked on Adobe/Day CQ5
Impact: Authors are able to create and activate content, but the activated pages are not updated on the CQ5 publish instances (forward replication has issues). This means end users may see stale content.
End users are able to update user-related data, but the changes are not replicated to the author instance, or different publish instances have gone out of sync (reverse replication has issues).
Overview
Adobe CQ manages content on the publish instances by having content created or modified on the author instance and then replicated to all publish instances. Replication does not require a server restart, so content updates are seamless and just in time. For replication to work smoothly, the replication infrastructure (queues and agents) must be functioning properly. Replication agents are configured and managed via the CQ Admin Console.
Issue
When the forward replication queues are blocked, content activated on the author instance is not replicated to the publish instances, so website users cannot see the updated content. Similarly, when the reverse replication queues are blocked, content changes made on a publish instance are not reverse replicated to the author instance. Data on multiple instances can go out of sync, depending on which instances the blocked queues sit between.
Resolution
The resolution is to make sure that the replication agents are working correctly and that the blocked messages in the queues are cleaned up.
• Open the replication agent console (/etc/replication/agents.author.html) to view the list of replication agents.
• For each replication agent, do the following:
1. Make sure that the agent is enabled. The status of every replication agent should display ‘enabled’.
2. Verify the connectivity with the publish instance by clicking the "Test Connection" link. If the test fails, make sure that, at the TCP network level, the server hosting the CQ author instance can connect to the port of the publish instance.
3. Open the replication log via the "View Log" link and check when the last replication attempt was successful. The log should show replication entries like the ones below. "Response: 200 OK" signifies a successful replication.
10.09.2013 14:36:50 - INFO - publish : Replication (ACTIVATE) of /content/dam.../pdfs/Draft.pdf successful.
10.09.2013 14:39:13 - INFO - publish : Creating content for page /content/dam/..../pdfs/Draft.pdf
10.09.2013 14:39:14 - INFO - publish : Sending POST request to https:// prod.publish......services.gs.com:24000/bin/receive?sling:authRequestLogin=1
10.09.2013 14:39:14 - INFO - publish : sent. Response: 200 OK
4. Make a note of the first payload path in the replication queue. Then clear the first element of the replication queue and verify whether replication resumes. Once it resumes, activate the payload noted above again. A payload can be activated by right-clicking the content on the author site and selecting the Activate option.
5. Check with the CRX Content Explorer that there is no /bin/receive node on the publish instance; if there is one, delete it. This cleans up the failed replication and ensures that content received on the publisher is not duplicated.
6. Check with the CRX Content Explorer that there is no /bin/replicate node on the author instance; if there is one, delete it. This cleans up the failed replication and ensures that content waiting for replication on the author is not duplicated.
7. If the logs show no replication attempt for the last few minutes, restart the replication bundle in the Felix console. If there is still no replication attempt in the replication logs, restart the Apache Sling Event Support bundle.
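
The checks in steps 2, 3 and 7 can be partly scripted. The following is a minimal Python sketch, not something shipped with CQ. It assumes that log lines have been copied from the "View Log" page with one entry per line, that timestamps are in day.month.year order, and that "a few minutes" means five minutes. All function names and the threshold are hypothetical and chosen only for illustration.

```python
import re
import socket
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Optional, Tuple

# Assumed entry shape, based on the sample log in step 3:
# "10.09.2013 14:39:14 - INFO - publish : sent. Response: 200 OK"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}) - (?P<level>\w+) - "
    r"(?P<agent>[^:]+?) : (?P<msg>.*)$"
)
TS_FORMAT = "%d.%m.%Y %H:%M:%S"  # assumption: day.month.year


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Step 2: return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def parse_entries(lines: Iterable[str]) -> Iterator[Tuple[datetime, str]]:
    """Yield (timestamp, message) for every line that matches LOG_LINE."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            yield datetime.strptime(match.group("ts"), TS_FORMAT), match.group("msg")


def last_success(lines: Iterable[str]) -> Optional[datetime]:
    """Step 3: timestamp of the latest entry reporting a successful replication."""
    stamps = [
        ts for ts, msg in parse_entries(lines)
        if "200 OK" in msg or msg.rstrip().endswith("successful.")
    ]
    return max(stamps) if stamps else None


def is_stalled(lines: Iterable[str], now: datetime,
               max_idle: timedelta = timedelta(minutes=5)) -> bool:
    """Step 7: True if no log entry at all is newer than now - max_idle."""
    stamps = [ts for ts, _ in parse_entries(lines)]
    return not stamps or now - max(stamps) > max_idle
```

For example, calling can_connect with the publish host name and port from the author server reproduces the TCP-level check of step 2, and is_stalled(lines, datetime.now()) indicates whether the bundle restarts of step 7 are worth trying. The script only reads data; clearing the queue and restarting bundles still happen in the consoles described above.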
Verification
1. The content that was activated in Step 4 above should be verified on the publisher as well as on the dispatcher.
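
One way to support this verification is to fetch the activated page from each publisher and from the dispatcher and compare the responses. The sketch below is a hedged Python illustration only: the function names are hypothetical, the URLs must be supplied by the operator, and it assumes the page is served byte-for-byte identically by every instance, which may not hold if the dispatcher or a publisher rewrites or personalizes the output.

```python
import hashlib
import urllib.request
from typing import Dict, List


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download the response body for one URL (publisher or dispatcher)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def group_by_digest(bodies: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group instance names by the SHA-256 digest of the body they served."""
    groups: Dict[str, List[str]] = {}
    for name, body in bodies.items():
        groups.setdefault(hashlib.sha256(body).hexdigest(), []).append(name)
    return groups


def in_sync(bodies: Dict[str, bytes]) -> bool:
    """True if every instance returned exactly the same bytes."""
    return len(group_by_digest(bodies)) <= 1
```

A caller would build the bodies dictionary with one fetch per instance, keyed by a label such as "publish1" or "dispatcher", and then call in_sync. A mismatch does not prove that replication failed; the dispatcher may still be serving a cached copy, so the result is a hint for where to look rather than a final verdict.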