Step One – What and where is the most sensitive data?
Understanding what data your organisation has is the first thing to do. You can achieve this through a data discovery exercise, looking at every place data is stored, such as databases, network shares and user hard disks. Review the results with your colleagues and compare what you’ve found with your business processes to ensure nothing has been missed.
When you’ve identified all your data it’s time to consider how sensitive it is. Sensitivity can be based on several factors but overall the question is what’s the risk if this data is exposed, loses integrity or is not available? There are going to be nearly as many answers as there are types of data and this introduces the first difficult step into the process: data grouping.
The goal is to be able to classify your data into a small and easily managed number of levels, so it’s necessary to be realistic and group data together. To start, it can be helpful to phrase it in terms of high, medium and low risk. Remember that classification isn’t the only way to control how data is handled, as different access rights can be implemented for the same classification level. As a rule, most organisations will be fine with no more than four levels along the lines of highly sensitive, sensitive, internal only and everything else.
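As a rough illustration of grouping data by risk, here is a minimal Python sketch. The 0 to 3 scoring scale, the `classify` function and the rule that the worst score decides the level are all assumptions made for the example, not a prescribed method; only the four level names come from the text above.

```python
from enum import IntEnum


class Classification(IntEnum):
    """The four example levels, ordered from least to most sensitive."""
    EVERYTHING_ELSE = 0
    INTERNAL_ONLY = 1
    SENSITIVE = 2
    HIGHLY_SENSITIVE = 3


def classify(exposure: int, integrity: int, availability: int) -> Classification:
    """Map three risk scores (0 = negligible, 3 = severe) to a level.

    Assumption: the worst of the three risks decides the level.
    """
    scores = (exposure, integrity, availability)
    if any(score not in range(4) for score in scores):
        raise ValueError("each score must be 0, 1, 2 or 3")
    return Classification(max(scores))
```

For instance, `classify(1, 3, 0)` lands in `Classification.HIGHLY_SENSITIVE` because a severe integrity risk outweighs the lower exposure and availability scores.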
Step Two – Labelling and handling data
The choice of the labels to use often becomes one of the most difficult and sometimes contentious parts of the process. It’s important to remember that the point of labelling is to act as a signpost to the user so that they understand how the data should be handled; it doesn’t have to encapsulate what the data is.
Whatever naming scheme you decide to implement, make sure you’ve clearly explained to everyone who creates or has access to the data what each label means, how it should be used and what specific handling must be employed.
When it comes to identifying what label to apply, there are three main methods you can use:
- Content-based – automated inspection and interpretation of data by identifying sensitive information
- Context-based – looks at variables such as location, creator, application as an indication of whether it’s sensitive information
- User-based – relies on the creator or end-user to manually assess if it’s sensitive information based on their knowledge.
Content-based can be expensive and is not always reliable, unless it’s very clear what constitutes sensitive information, such as a string of characters in a standard format like a national insurance number, bank account number and sort code, or similar. Many companies won’t be that lucky and sensitivity will not be easy to gauge by automated means. Context-based is partially circular reasoning; if the data is in a sensitive application it will, most likely, be sensitive information by default. To my mind the best approach is to combine user-based and context-based; by educating users to think about and recognise data classification requirements, they’ll be far more likely to get the context correct, and others can then rely on the context itself.
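To make the idea of automated inspection concrete, the sketch below looks for two standard-format strings in a piece of text. The patterns are deliberately simplified assumptions (the real national insurance number rules exclude certain prefixes, for example), and `find_sensitive_patterns` is a hypothetical helper rather than part of any product.

```python
import re

# Simplified, illustrative patterns only; real validation rules are stricter.
NI_NUMBER = re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b")
SORT_CODE = re.compile(r"\b\d{2}-\d{2}-\d{2}\b")


def find_sensitive_patterns(text: str) -> dict[str, list[str]]:
    """Return every match for each pattern that appears in the text."""
    patterns = {"ni_number": NI_NUMBER, "sort_code": SORT_CODE}
    found = {}
    for name, pattern in patterns.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found
```

A check like this only works where the format is predictable, which is exactly why it tends to miss free-text information that a person would recognise as sensitive.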
When the data is correctly classified and labelled, handling should be straightforward. For each classification level you’ll need to agree on where the data can go and who can see it. The aim is to make sure you limit the opportunity for the information to be exposed to an unauthorised person or to lose integrity after being changed in error or through a malicious act, while keeping it available when and where needed. Unless you’re certain you’ve got the necessary controls in place, take a top-down approach and start by restricting everything from everyone, only relaxing the control as needed.
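The "restrict everything, then relax" approach could be expressed as a simple allow-list, sketched below. The location names and the `may_store` helper are hypothetical; the point is only that anything not explicitly permitted is refused.

```python
# Hypothetical handling rules: the locations each label is allowed to live in.
ALLOWED_LOCATIONS = {
    "highly sensitive": {"encrypted-share"},
    "sensitive": {"encrypted-share", "internal-share"},
    "internal only": {"encrypted-share", "internal-share", "intranet"},
}


def may_store(label: str, location: str) -> bool:
    """Default deny: unknown labels and unlisted locations are refused."""
    return location in ALLOWED_LOCATIONS.get(label.lower(), set())
```

Relaxing a control then means adding one entry to the table, which keeps every exception visible and reviewable.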
Labelling within each document or piece of information will make it clear to all what classification has been applied. Depending on the application this can be easily achieved. Labels should be visible to everyone who needs to see them, so while meta tags can be great they aren’t always visible and run the risk of being missed. Another point to be aware of is that headers and footers aren’t displayed in Excel unless you switch to the page layout view or print the sheet. If an application is used exclusively for a single classification, the application’s opening page should reflect that.
All instructions for labelling and handling data must be documented within a policy and procedures. The process will fail unless everyone understands and uses it.
Step Three – Who needs to access and use the data?
The first part of this step should be relatively easy as it’s likely you already know the answer or have a good idea. Again, use the business process the data is relevant to and you should be able to determine the internal people involved.
The second part is a little more difficult; now you’ve established the internal access requirements you’ll need to widen the analysis – are there people outside the organisation who will need access? This can be harder to understand as it requires a measure of forethought; suppliers, partners, auditors, external consultants, lawyers and others may need to be included under the labelling and handling rules you put in place. You’ll need to have methods in place to deal with these.
Step Four – Controlling access and use
Internally you’ll need to use the access control methods built into your applications and network to control who can see the data. Ideally there are different storage locations for data of different sensitivities but in the real world this rarely happens. The answer is to look at granular access control while ensuring it doesn’t become unworkable; define access rights by roles to which individuals can be assigned. Also consider whether individuals need to be able to modify data or whether their needs will be met by having read-only access.
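A minimal sketch of role-based access, assuming made-up role names, users and a simple read/write split, might look like this. It is not tied to any particular application or directory service.

```python
# Hypothetical roles: each maps a classification label to permitted actions.
ROLE_PERMISSIONS = {
    "hr_manager": {
        "highly sensitive": {"read", "write"},
        "sensitive": {"read", "write"},
    },
    "hr_assistant": {"sensitive": {"read"}},
}

# Hypothetical assignment of individuals to roles.
USER_ROLES = {"alice": {"hr_manager"}, "bob": {"hr_assistant"}}


def is_allowed(user: str, label: str, action: str) -> bool:
    """Grant access only if one of the user's roles permits the action."""
    for role in USER_ROLES.get(user, set()):
        if action in ROLE_PERMISSIONS.get(role, {}).get(label, set()):
            return True
    return False
```

Here `bob` can read sensitive data but not change it, and moving someone to a new job only means changing their entry in `USER_ROLES`.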
Most mail systems can be configured to identify data based on a label or text contained within a file, and some are able to act on this. Once a policy has been created within the system it’s possible to restrict the emailing of certain data based on its sensitivity.
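The logic behind such a mail policy can be sketched in a few lines. This is not how any specific mail system is configured; the label strings, the `example.org` domain and the `may_send` function are assumptions used purely to show the decision being made.

```python
# Hypothetical labels that must not leave the organisation by email.
BLOCKED_EXTERNALLY = ("HIGHLY SENSITIVE", "SENSITIVE", "INTERNAL ONLY")


def may_send(attachment_text: str, recipient: str,
             internal_domain: str = "example.org") -> bool:
    """Allow internal mail; block external mail carrying a restricted label."""
    if recipient.lower().endswith("@" + internal_domain):
        return True
    # Crude substring check; a real policy would match the label more precisely.
    text = attachment_text.upper()
    return not any(label in text for label in BLOCKED_EXTERNALLY)
```

A rule like this depends entirely on the label being present in the file, which is one more reason to make labelling consistent.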
There are many technical controls you can apply, for example controls affecting mobile devices, removable media, encryption, clear screens and monitoring. However, a technical solution isn’t always applicable to every situation; this is especially true for external data uses. Instead you’ll need to have in place confidentiality and non-disclosure agreements, and if the data is personal information, possibly data processor agreements too.
Data classification doesn’t need to be difficult. All you need to understand is what the data is, where it is, who needs it and how to stop others from accessing it. With that, you’ll be well on the way to a robust data classification and handling system that enhances security and reduces risk within your organisation, to the benefit of yourselves and all those whose data you use.
Blog by our Head of Consulting, Rob Horne