Confluence is the go-to tool for teams looking to collaborate around the clock and around the globe. While there has been a big push to Confluence Cloud during recent months, it’s important to note that Confluence Data Center is just as reasonable an option for organizations who are looking for an instance with high productivity and security. Data Center allows you to meet your organization’s unique needs with flexible infrastructure choices and controls. It also allows you to modernize your IT infrastructure without compromising on security in a self-managed environment. With Confluence Data Center’s built-in enterprise-grade features, you can meet your organization’s most complex demands. In the rest of this article, we will talk about some of the guardrails which can help make your Data Center instance even more productive.
In late June of 2022, Andrzej Kotas, Product Manager for Jira Data Center, announced to the Atlassian community an exciting development in documentation and recommendations for Data Center deployments of Atlassian Tools. The post introduced Jira and Confluence Guardrails. Sound familiar? Well, we mentioned this in a post in October of 2022.
From the post, guardrails are “product-based documentation and strategic recommendations designed to educate and help you avoid reaching a tipping point where your instance might start to experience performance impacts”.
You might be thinking now, “well that’s not very exciting!” Well, it is, and here is why. For years, Jira and Confluence administrators alike have been looking for guidelines and recommendations on how to manage their on-prem instances. There hasn’t been much in the way of documentation or specific recommendations. With the soon-to-be unsupported Server deployments, a lot of the documentation was on how to troubleshoot poor performance. Atlassian outlined the ways that an admin could go about diagnosing the problem, but it was never easy to find information on how to identify if your instance was getting close to having performance problems.
With the introduction of these guardrails, Atlassian aims to communicate the effects of growing data types (e.g., comments or issues) and provide you with controls, insights, and capabilities to keep your data in line.
So now that we know the goals of Guardrails, what are they exactly?
More on Guardrails
In short, guardrails for Confluence are recommendations from Atlassian on various limits that a Data Center instance could reach at scale. Atlassian recommends avoiding surpassing these thresholds in order to keep your instance stable.
In Atlassian’s community post about guardrails, they say guardrails are based on real-world experiences with large enterprise customers. This means that there is real data to back up their recommendations for Jira and Confluence Data Center deployments. It also means that these are not “one size fits all” recommendations. They list each guardrail and provide a threshold or limit for it. This threshold is a data point which you can reference when investigating each guardrail. You might find that your instance is performing well even though it is over some of these thresholds. Or you might find that your instance is performing slowly even though it is operating within them. We think it’s important to at least mention this before we dig into the weeds of the guardrails.
It’s also important to mention that these guardrails are not hard or exact limits, but rather recommendations when performance of your Data Center instance could be at risk.
Guardrails themselves are built on three pillars: identification, understanding, and mitigation.
With "identification", Atlassian aims to make Data Center admins aware of the specific data dimension which could cause performance issues. For "understanding", they look to help the admin know the possible outcomes of going beyond the threshold for that data dimension. Finally, with "mitigation", Atlassian explains how to lower the risk when you are approaching the threshold, or how to bring the value back down.
For the remainder of this post, we will take a look at the new Confluence Data Center guardrails. These guardrails are important to us because we want to ensure our customers are getting the most out of Linchpin Intranet Suite and Confluence Data Center.
Are you getting the most out of Confluence?
Try out Linchpin Intranet Suite and transform your corporate wiki into a productive and rewarding social intranet.
So how can we keep the risk of overloading our index low? One way is to use SSDs for your local home and shared home directories. Another is to stop indexing attachments if you don’t need to search for them. Finally, you could migrate some spaces to Cloud to reduce the size of your Data Center instance. There are a few more ways to mitigate this risk, and you can read about them here.
Spaces can be a pain for any Confluence admin. For this guardrail, Atlassian recommends keeping the total number of Spaces under 10,000. To read how to find the number of Spaces in your site, click this link.
If you operate a Confluence Data Center instance that has above 10,000 Spaces, you could see high memory and CPU consumption whenever Confluence needs to perform permission checks to determine which pages to display to a user. An example of this is in the Confluence dashboard and various macros which show lists of pages.
To mitigate this guardrail, you could enable the faster permissions service. Another option is to delete Spaces that are empty or no longer needed. A third option is to move some Spaces to a small Cloud site for storage. Atlassian offers free subscriptions to Confluence for up to ten users and 2 GB of data.
Space Size (for import)
This is the total number of pages, blogs, attached files, version history, and trash, in a single Space. The guardrail is 5 GB for the entities.xml file in the Space export zip file.
If you try to import a Space that is larger than 5 GB, Atlassian has seen cases where an instance runs out of memory and crashes. This is obviously quite serious and something we should pay attention to. Another, more minor issue is that your entire site’s performance could be degraded during the import.
In order to prevent this from happening, you should consider splitting your Space into smaller chunks for import. You can also use retention rules to reduce the overall size of the Space.
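As a rough illustration of checking an export against this guardrail before importing it, the sketch below reads the uncompressed size of entities.xml from a Space export zip. It assumes that entities.xml sits at the root of the archive and that 5 GB means 5 * 1024**3 bytes; the function names are hypothetical and not part of any Atlassian tooling.

```python
import zipfile

# 5 GB guardrail for entities.xml, taken from the text above.
# Assumption: interpreted here as 5 * 1024**3 bytes.
ENTITIES_XML_LIMIT_BYTES = 5 * 1024**3


def entities_xml_size(export_zip_path, member="entities.xml"):
    """Return the uncompressed size in bytes of entities.xml in an export zip.

    Assumption: the file is stored at the root of the archive under this name.
    Raises KeyError if the archive has no such member.
    """
    with zipfile.ZipFile(export_zip_path) as archive:
        return archive.getinfo(member).file_size


def exceeds_import_guardrail(export_zip_path, limit_bytes=ENTITIES_XML_LIMIT_BYTES):
    """Return True if entities.xml is larger than the given limit."""
    return entities_xml_size(export_zip_path) > limit_bytes
```

Calling `exceeds_import_guardrail("my-space-export.zip")` (the path is a placeholder) before an import would tell you whether the Space is a candidate for splitting or for retention rules. The check only reads the zip's directory entry, so it does not need to extract the file.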
Here, we are looking at the total number of users synced between LDAP and Confluence. If you and your team are using Microsoft Active Directory, the guardrail for this is 100,000 users. If you are using another LDAP directory, the guardrail is 70,000 users. This is obviously a lot of users and would only come into play in the largest instances.
However, if your instance happens to be this size, syncing this many users can cause very high CPU usage when checking for group permissions on pages. Another pitfall is that the sync of users could take a very long time.
To avoid these potential risks, admins managing Microsoft ADs should enable incremental syncing. This feature fetches only the changes from the LDAP directory, which speeds up syncing. Another option is to use Crowd, which benefits from active user synchronization. Finally, admins can use User, Group, and Membership schema configuration filters to restrict the data synchronized with Confluence.
Similar to the above guardrail, here we are focusing on the number of groups synced between LDAP and Confluence. Likewise, this guardrail depends entirely on whether you are using Microsoft AD or not. If you are, the guardrail is 30,000 groups. If not, the guardrail is 20,000 groups.
The biggest problem if you go over this number of groups is overall instance instability. You may also experience performance degradation, long-running directory syncs, unresponsive admin screens, and slow user authentication.
The mitigation steps for this guardrail are the same as the above section. Follow those steps to ensure you don’t have any issues.
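The user and group thresholds above can be collected into a small lookup and compared against your own directory counts. This is only a sketch: the numbers come from the text above, while the function name, the dictionary layout, and the way you obtain the counts are assumptions made for illustration.

```python
# Thresholds quoted in the text above. They are recommendations, not hard limits.
GUARDRAILS = {
    "users": {"microsoft_ad": 100_000, "other": 70_000},
    "groups": {"microsoft_ad": 30_000, "other": 20_000},
}


def check_directory_guardrails(user_count, group_count, microsoft_ad):
    """Return one warning string per guardrail that the given counts exceed."""
    directory_type = "microsoft_ad" if microsoft_ad else "other"
    counts = {"users": user_count, "groups": group_count}
    warnings = []
    for dimension, count in counts.items():
        limit = GUARDRAILS[dimension][directory_type]
        if count > limit:
            warnings.append(f"{dimension}: {count} synced, guardrail is {limit}")
    return warnings
```

For example, 80,000 users on a non-Microsoft directory would produce a warning, while the same count on Microsoft Active Directory would not. An empty list does not guarantee good performance; it only means none of these two thresholds is exceeded.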
Depth of Nested Groups
The final guardrail is how many layers deep your nested groups go. The limit here is four layers. Atlassian also recommends that a group should not contain both groups and users; each group should contain one or the other.
When systems are operating above this guardrail, Atlassian has noticed instance instability, including performance degradation and potential outages when Confluence usage is high. They also noticed that directory syncs can take a long time and user authentication can take longer than expected.
The mitigation options for this guardrail are simple. Avoid having too many layers in your groups. You should also avoid having a mix of users and groups within a group if you do have nested groups. This is something that is handled outside of Confluence and will be up to your AD admin.
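If your AD admin can export group memberships, a short script can flag both problems described above. The sketch below assumes the export has been loaded into two plain dictionaries (group to child groups, and group to direct users); that data shape, the function names, and the convention that a group without child groups counts as one layer are all assumptions, so adjust the counting if your directory defines layers differently.

```python
MAX_NESTING_DEPTH = 4  # guardrail from the text above


def nesting_depth(group, child_groups, _path=()):
    """Return the number of layers at and below `group`.

    A group without child groups counts as 1 (an assumed convention).
    """
    if group in _path:  # cyclic membership: stop instead of recursing forever
        return 0
    children = child_groups.get(group, [])
    deeper = [nesting_depth(c, child_groups, _path + (group,)) for c in children]
    return 1 + max(deeper, default=0)


def mixed_groups(child_groups, direct_users):
    """Return the sorted names of groups that contain both users and groups."""
    return sorted(
        g for g in child_groups if child_groups[g] and direct_users.get(g)
    )


# Hypothetical example data, not taken from a real directory.
child_groups = {
    "all-staff": ["engineering"],
    "engineering": ["backend"],
    "backend": ["api"],
    "api": ["api-oncall"],
    "api-oncall": [],
}
direct_users = {"engineering": ["alice"], "api-oncall": ["bob"]}

too_deep = nesting_depth("all-staff", child_groups) > MAX_NESTING_DEPTH
mixed = mixed_groups(child_groups, direct_users)
```

With this sample data, `all-staff` is five layers deep, so `too_deep` is true, and `engineering` is reported as a mixed group because it holds both a user and a child group.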
In the end, guardrails help customers scale responsibly, create or enhance governance policies, and confirm that their instance is within the thresholds and operating well. These guardrails are useful for large enterprise customers and for medium-sized customers with fast-growing instances.
It’s important to remember that these guardrails are just guidelines; nothing is set in stone. If you find your instance getting close to or exceeding these guardrails, don’t panic. Go through the mitigation options and see what solutions you and your team can come up with.
At Seibert Media, it’s vital for us to keep improving the performance of our apps for Confluence Data Center when building new versions. This is why you should update your Linchpin Intranet Suite app whenever you have a new update available. We are striving to constantly improve the way Linchpin integrates with Confluence Data Center. If you, as an administrator, can keep updating your Linchpin app and follow these guardrails, your instance should be in a great place.
If you have any questions about Confluence guardrails or want to implement a social intranet in your company, please contact us at Seibert Media. We have helped numerous customers with enterprise-level Data Center deployments keep their instances stable throughout the lifecycle of their business.