How ServiceNow Syncs Application Cache Between Nodes

ServiceNow’s infrastructure consists of multiple Tomcat application web servers all communicating with a single database, which is primarily how everything is kept in sync. However, despite indexing, database caches and all the other optimising you could do on the database, if ServiceNow had to query the database for every transaction it would grind to a halt. So for some of the most commonly used and largely static tables, ServiceNow holds the data in an application cache on each of the individual nodes. Examples of this are system properties, ACLs and form layouts.

So, for example, when you update a system property, the application knows you’ve just updated it and will invalidate and re-build its system properties cache (which is why, as you may have noticed, updating a system property can sometimes take a while to save).

All good so far and pretty standard stuff. Now, right at the beginning I mentioned that there are multiple application nodes. Each application node is technically independent of the others insofar as they do not communicate with each other directly (they do, of course, share the same database). Going back to the example of updating a system property: the node you’re currently connected to knows you updated the property and can therefore invalidate and re-build its cache. But what if the other nodes haven’t invalidated their cache? They would still hold the old value and assume it’s correct (occasionally caching does go wrong, which is where the ‘/cache.do’ page comes in handy as a manual cache clear out).
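To make the stale-cache problem concrete, here’s a toy model in plain JavaScript (not ServiceNow code — the node names, property key and structure are purely illustrative) of two independent per-node caches sharing one database:

```javascript
// Toy model: one shared "database", two nodes each with a private cache.
const database = { 'glide.ui.session_timeout': '30' };

function makeNode(name) {
  return {
    name,
    cache: {},
    getProperty(key) {
      // Serve from the local cache if present; otherwise read the database.
      if (!(key in this.cache)) this.cache[key] = database[key];
      return this.cache[key];
    },
    invalidate() {
      this.cache = {}; // force a re-read from the database next time
    },
  };
}

const nodeA = makeNode('app-node-1');
const nodeB = makeNode('app-node-2');

// Both nodes warm their caches with the current value.
nodeA.getProperty('glide.ui.session_timeout'); // '30'
nodeB.getProperty('glide.ui.session_timeout'); // '30'

// A user connected to nodeA updates the property; only nodeA invalidates.
database['glide.ui.session_timeout'] = '60';
nodeA.invalidate();

console.log(nodeA.getProperty('glide.ui.session_timeout')); // '60' (fresh)
console.log(nodeB.getProperty('glide.ui.session_timeout')); // '30' (stale!)
```

Until nodeB is told to invalidate, it happily keeps serving the old value — which is exactly the gap the mechanism below fills.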

The question, then, is how the other nodes are informed that they need to invalidate their cache. Well, there’s a system table called sys_cluster_message. When an application node needs to communicate something directly to another application node (or to all of them), it inserts a message into this table containing the script it wants the recipient(s) to run.

If you navigate to the table, you’ll see a number of records already in there. The vast majority are for invalidating caches on the other nodes, but the template is always the same.

The name is always ‘script’. The system ID is the ID of the node that triggered the message (you can find a node’s ID by navigating to the sys_cluster_state table). The recipient in these examples is blank, which means all nodes must execute the script; you can instead put a specific node here if you need the script to run on only one node. Finally, there’s the message field, which holds a simple bit of XML whose payload element contains the script.
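As a rough illustration, the message field might look something like the fragment below. The outer element name is an assumption on my part — the only thing described above is that a payload element wraps the script — so check a real record on your own instance for the exact shape:

```xml
<!-- Illustrative only: outer element name assumed; the payload holds the script. -->
<message>
  <payload>gs.log('Cache invalidation message received');</payload>
</message>
```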

If you want to test this out and have a record written for you automatically, navigate to the sys_cluster_state table (which lists all the application nodes), open an online node and you’ll see a link UI action called ‘Run Script’. Clicking this pops up a dialog to enter your script into. Type whatever you like in there and click Run. Now go back to the sys_cluster_message table and you’ll see your message.
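To see the shape of what gets written, here’s a small runnable sketch — plain JavaScript, not a ServiceNow API, and the element names are assumed — of building a message body by wrapping a script in a payload element:

```javascript
// Hypothetical helper (not a ServiceNow API): wrap a script in the
// payload XML carried by the message field. Element names are assumed.
function buildClusterMessage(script) {
  // Escape characters that would otherwise break the XML.
  const escaped = script
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return `<message><payload>${escaped}</payload></message>`;
}

const body = buildClusterMessage("gs.log('Run Script test');");
console.log(body);
// <message><payload>gs.log('Run Script test');</payload></message>
```

The escaping matters because scripts routinely contain `<` and `&`, which must not appear raw inside an XML element.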

The sys_cluster_message table provides a way for the nodes to communicate with each other directly, but it’s worth noting that the process is asynchronous, so there’s no guarantee of exactly when the code will run on the other application nodes.
