How disaster recovery by DNS resolution switch works on PCE

Publish Date: Jul 25, 2025
Steps

PREREQUISITES

To implement a disaster recovery (DR) strategy in PCE, you need at least:
  • Two independent PCE installations running the same version and the same number of nodes
  • The same data on both platforms. Follow the backup and restore procedure to replicate PROD data into DR.
  • An external load balancer in front of each of your PCE platforms
  • A name server that resolves the PCE hostname and whose record you can modify

SCENARIO

If your PROD environment fails, you perform a DNS switch so that the PCE hostname resolves to the load balancer of the DR environment:
Example:
  1. The PROD load balancer is hosted at 173.1.227.112 and the DR load balancer at 173.2.227.112
  2. The PCE hostname is mypce.prod.domain.local and resolves to 173.1.227.112
  3. You perform the DNS switch from Prod (173.1.227.112) to DR (173.2.227.112)
  4. Now mypce.prod.domain.local resolves to 173.2.227.112
Other DR scenarios, such as a single load balancer routing traffic to one environment or the other, are not covered in this article, although the same general principles apply.

SWITCHING ENVIRONMENTS

To switch from one environment to the other, either from PROD to DR or back after recovery, perform these actions:
  1. Update the DNS record so that the PCE hostname points to the load balancer of the target platform
  2. Restart all the Mule Runtime servers managed by the platform
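Before restarting the runtimes, you may want to confirm that the new record is actually visible from a runtime host. The following Java sketch resolves the PCE hostname and checks whether any returned address matches the DR load balancer. The class and method names are hypothetical, and the hostname and IP address are the example values from the scenario in this article. Run it on a runtime host so that the lookup goes through the same OS resolver and caches the runtimes use.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsSwitchCheck {

    /** Returns true if any of the resolved IP literals equals the expected one. */
    static boolean containsAddress(String[] resolvedIps, String expectedIp) {
        for (String ip : resolvedIps) {
            if (ip.equals(expectedIp)) {
                return true;
            }
        }
        return false;
    }

    /** Resolves the hostname through this JVM and returns every address as an IP literal. */
    static String[] resolve(String host) throws UnknownHostException {
        InetAddress[] addresses = InetAddress.getAllByName(host);
        String[] ips = new String[addresses.length];
        for (int i = 0; i < addresses.length; i++) {
            ips[i] = addresses[i].getHostAddress();
        }
        return ips;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Hypothetical values taken from the example scenario in this article.
        String pceHost = "mypce.prod.domain.local";
        String drLoadBalancerIp = "173.2.227.112";

        if (containsAddress(resolve(pceHost), drLoadBalancerIp)) {
            System.out.println("Hostname resolves to DR: proceed with the runtime restarts.");
        } else {
            System.out.println("Hostname does not resolve to DR yet: check the record and resolver caches.");
        }
    }
}
```

Because the check runs in a fresh JVM, the JVM cache of the already-running runtimes does not affect it. That separate cache is the reason those runtimes still need the restart.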

FREQUENTLY ASKED QUESTIONS

Why is a runtime restart required?

Because of DNS caching. When the platform goes down, the runtimes attempt to reconnect, but they reconnect to the same IP address as before. Several caches are in play: each machine has a local DNS cache maintained by the OS, intermediate DNS servers have their own caches, and the JVM also keeps a cache of its own. By default, resolved addresses appear to be cached indefinitely, so in this scenario the runtimes keep trying to reconnect to the original IP address.
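The following Java sketch shows which value the JVM layer is working with. It reads the `networkaddress.cache.ttl` security property and translates it into a description. It covers only the JVM cache, not the OS or intermediate DNS server caches. The class and method names are hypothetical. The value meanings follow the Oracle networking properties documentation: a negative value caches forever, 0 disables caching, and a positive value is a number of seconds. Run it with the same JDK the Mule runtime uses, because the configured value comes from that JDK's java.security file.

```java
import java.security.Security;

public class DnsCachePolicyInfo {

    /** Translates a networkaddress.cache.ttl value into a human-readable description. */
    static String describe(String ttl) {
        if (ttl == null || ttl.trim().isEmpty()) {
            // Not set: per the Oracle documentation, the JVM caches forever when a security
            // manager is installed, and for an implementation-specific period otherwise.
            return "not set: JVM default applies";
        }
        int seconds;
        try {
            seconds = Integer.parseInt(ttl.trim());
        } catch (NumberFormatException e) {
            return "invalid value: " + ttl;
        }
        if (seconds < 0) {
            return "cache forever";
        }
        if (seconds == 0) {
            return "do not cache";
        }
        return "cache for " + seconds + " seconds";
    }

    public static void main(String[] args) {
        // Security property, normally set in the JDK's java.security file.
        String ttl = Security.getProperty("networkaddress.cache.ttl");
        System.out.println("networkaddress.cache.ttl: " + describe(ttl));
    }
}
```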

Is there a way to avoid restarting the runtimes?

There is a JVM security property that may help: networkaddress.cache.ttl, documented at <https://docs.oracle.com/javase/8/docs/technotes/guides/net/properties.html>. Setting it to a finite value should let cached entries expire so that the runtimes pick up the new IP address.
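The following standalone Java sketch shows how the property works. It sets `networkaddress.cache.ttl` programmatically, then resolves the same hostname twice, more than one TTL apart. For a Mule runtime, you would more likely set the property in the java.security file of the JDK that the runtime uses than change code. That file is `$JAVA_HOME/jre/lib/security/java.security` on JDK 8 and `$JAVA_HOME/conf/security/java.security` on later versions. The 30-second value, the class name, and the hostname are assumptions for illustration. The OS and intermediate DNS caches still apply on top of the JVM setting.

```java
import java.net.InetAddress;
import java.security.Security;

public class DnsTtlConfig {

    /**
     * Caps how long this JVM caches successful DNS lookups.
     * Call it early, before the application performs any name lookup,
     * since the JVM may read the cache policy only once.
     */
    static void configureDnsCacheTtl(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical 30-second TTL; pick a value that fits your recovery time objective.
        configureDnsCacheTtl(30);

        // Hypothetical hostname from the example scenario; pass your own as an argument.
        String host = args.length > 0 ? args[0] : "mypce.prod.domain.local";

        // Resolve twice, more than one TTL apart. With a finite TTL, the second lookup
        // goes back to the OS resolver, so it can return the DR address after a switch.
        System.out.println(host + " -> " + InetAddress.getByName(host).getHostAddress());
        Thread.sleep(31_000L);
        System.out.println(host + " -> " + InetAddress.getByName(host).getHostAddress());
    }
}
```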

I am testing my DR strategy. Should I take my PROD platform down to do it properly?

It depends. If you follow the switching steps, you will complete the switch to the DR environment, but you won't see the errors on the runtimes that an actual PROD failure would produce.

Runtimes connect to the platform through the Runtime Manager Agent, which maintains a WebSocket connection to the platform. This WebSocket is persistent, so it stays connected as long as both endpoints are up. Consider this scenario: you perform the DNS switch from PROD (173.1.227.112) to DR (173.2.227.112), but both environments are still up. Unless something breaks this WebSocket connection, the runtimes keep communicating with PROD as long as its IP address is reachable from them.
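WebSockets run over an ordinary TCP connection, and a TCP connection stays bound to the IP address that was resolved when it was opened. The following Java sketch uses plain sockets as a stand-in; it is not the Runtime Manager Agent's code. It connects to a local listener by hostname and reports the peer address. Changing the DNS record after the connection is open would not change that address for the open socket. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PinnedConnectionDemo {

    /**
     * Opens a TCP connection by hostname and returns the IP address it is bound to.
     * The hostname is resolved once, before the connection is opened; the open socket
     * keeps talking to that IP no matter what the DNS record says afterwards.
     */
    static InetAddress connectAndReportPeer(String host, int port) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 2_000);
            return socket.getInetAddress();
        }
    }

    public static void main(String[] args) throws IOException {
        // A local listener stands in for the PROD load balancer (assumption for this demo).
        try (ServerSocket listener = new ServerSocket(0)) {
            InetAddress peer = connectAndReportPeer("localhost", listener.getLocalPort());
            System.out.println("Connected to " + peer.getHostAddress());
        }
    }
}
```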

For a realistic DR test, make PROD unavailable, for example by having its load balancer close existing connections and reject all incoming ones.
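Before a DR test, you may want to confirm that PROD actually refuses connections. The following Java sketch attempts a TCP connection to the PROD load balancer with a timeout. The class and method names are hypothetical. The address comes from the example scenario, and port 443 is an assumption. The sketch only checks that new connections fail. It does not verify that the load balancer also closed the runtimes' existing WebSocket connections.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ProdUnavailableCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean acceptsConnections(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, unresolvable, or unreachable: no new connection possible.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical values: the PROD load balancer from the example scenario, port 443.
        String prodLoadBalancer = "173.1.227.112";
        int port = 443;

        if (acceptsConnections(prodLoadBalancer, port, 3_000)) {
            System.out.println("PROD still accepts connections: the test will not reflect a real outage.");
        } else {
            System.out.println("PROD rejects new connections.");
        }
    }
}
```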
Knowledge Article Number

001116302

 