Audit sampling (LONG)

[This piece was prompted by a question on the ISO27k Forum about ISO27k certification auditors checking information security controls, and a response about compliance audit requirements. It's a backgrounder, an essay or a rant if you like. Feel free to skip it, or wait until you have a spare 10 mins, a strong coffee and the urge to read and think on!]

“Sampling” is an important concept in both auditing and science. Sampling (i.e. selecting a sample of a set or population for review) is necessary because under most circumstances it is practically impossible to assess every single member: in fact it is often uncertain how many items belong to the set, where they are, what state they are in, and so on. There is often a great deal of uncertainty.

For example, imagine an auditor needs to check an organization’s information security policies in the course of an internal audit or more formal certification/compliance audit.

Some organizations make that quite easy by having a policy library or manual or database, typically a single place on the intranet where all the official corporate policies exist and are maintained and controlled as a suite. In a large/diverse organization there may be hundreds of policies, thousands if you include procedures and guidelines and work instructions and forms and so forth. Some of them may be tagged or organized under an “information security” heading, so the auditor can simply work down that list … but almost straight away he/she will run into the issue that information security is part of information risk is part of risk, and information security management is part of risk management is part of management, hence there should be lots of cross-references to other kinds of policy. A “privacy policy”, for instance, may well refer to policies on identification and authentication, access control, encryption etc. (within the information security domain), plus other policies in areas such as accountability, compliance, awareness and training, incident management etc. which may or may not fall outside the information security domain depending on how it is defined, plus applicable privacy-related laws and regulations, plus contracts and agreements (e.g. nondisclosure agreements). Hence the auditor could potentially end up attempting to audit the entire corporate policy suite and beyond! In practice, that’s not going to happen.

In many organizations, the job would be harder still because the policies etc. are not maintained as a coherent suite in one place, but are managed by various parts of the business for various purposes in various formats and styles. On top of that, ‘policy lifecycle management’ is an alien concept to some organizations, hence even the basics such as having a defined owner, an ‘issued’ or ‘effective from’ date, a clear status (e.g. draft, exposure draft, issued and current, withdrawn) etc. may not be there. Simply getting hold of copies of current policies is sometimes tricky, making it hard to determine how many policies there are, where they are, who owns them, whether they are current, and whether they have been formally sanctioned or mandated or whatever.

Note: there could be several ‘audit findings’ in these circumstances, particularly the latter, before the auditor has even started reviewing a single policy in detail!

Scope concerns are emerging already: are ‘compliance policies’ part of the ‘information security policies’ that were to be checked? What about ‘business continuity policies’ or ‘health and safety policies’? What about the ‘employee rulebook’, oh and that nice little booklet used by the on-boarding team in the depths of HR in a business unit in Mongolia? What about a key supplier’s information security policies …? Information is a vital part of the entire business, the entire supply chain or network in fact, making information risk and security a very broad issue. An audit can’t realistically cover “everything” unless it is deliberately pitched at a very high level – in which case there would be no intent to delve deeply into each and every policy.

The next issue to consider is the time and resources available for the audit. Audits are inevitably constrained in practice: usually there is an audit plan or schedule or diary covering every audit within the period (often several years), and auditors are in short supply, especially in specialist areas where deep technical knowledge is needed (e.g. tax, information security, risk, health and safety, engineering …).

Another issue is the depth and detail of the audit checks or tests or assessments or reviews or whatever you call them. I could spend hours poring over and painstakingly picking apart a relatively simple website privacy policy in great detail, digging out and checking all the external references (plus looking for any that are missing), exploring all the concerns (and the plus points too: I strive to be balanced and fair!), writing up my findings and perhaps elaborating on a set of recommended improvements. Add on the time needed to initiate and plan the audit, contact the business people responsible, schedule interviews and meetings, complete the internal quality assurance, discuss the draft findings and report, and close the audit, and the whole thing could easily consume a week or three – auditing a single, simple policy in depth. It would need to be an unusually valuable audit to justify the expense, since I could have spent my time on other, more worthwhile audit work instead (an opportunity cost). 

Yet another relevant matter is how the auditors go about sampling: the sampling rationale or technique or method. Again, there are lots of possibilities e.g. random sampling, stratified sampling, sampling by exception, pragmatic sampling, dependent sampling etc. The auditors might pick out a couple of items at each level in the policy pyramid, or all the information security policies released within the past six months, or every one produced by the Information Risk and Security Management function at HQ, or every one with a “C” or a “D” in the title, or all those on a pre-compiled shortlist of ‘dubious quality, worth a look’, or all those that explicitly reference GDPR, or whatever. Rather than all of those, they might pick ‘the top 10%’ by some criterion, or ‘the bottom 10%’ or whatever. They might simply start with whatever policies are most readily available, or whichever ones happen to catch their eye first, and then ‘go from there’, following a trail or a contingent sequence that arises naturally in the course of the initial reviews. The auditor’s nose often leads the way.
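To make those sampling approaches concrete, here is a minimal sketch in Python. The policy register, field names and selection sizes are all invented for illustration; this is not a real audit tool, just a picture of how the methods differ:

```python
import random

random.seed(42)  # fix the seed so the sample is reproducible in the working papers

# Hypothetical policy register: units, ages and quality flags are made up.
policies = [
    {"id": f"POL-{i:03d}",
     "unit": random.choice(["HQ", "EMEA", "APAC"]),
     "months_old": random.randint(1, 60),
     "flagged": random.random() < 0.2}   # the 'dubious quality, worth a look' shortlist
    for i in range(1, 101)
]

# Simple random sampling: every policy has an equal chance of selection.
simple = random.sample(policies, 10)

# Stratified sampling: a couple of policies from each business unit,
# so no stratum is missed entirely.
stratified = []
for unit in ("HQ", "EMEA", "APAC"):
    stratum = [p for p in policies if p["unit"] == unit]
    stratified.extend(random.sample(stratum, min(2, len(stratum))))

# Sampling by exception: everything on the pre-compiled shortlist.
exceptions = [p for p in policies if p["flagged"]]

# Pragmatic sampling: the five most recently issued policies, most readily to hand.
recent = sorted(policies, key=lambda p: p["months_old"])[:5]
```

In practice the ‘register’ would come from the organization’s policy library, and the choice between these methods is precisely the sampling rationale that deserves to be documented.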

In my experience, surprisingly few audits are conducted on a truly scientific basis, using sound statistical techniques for sampling and data analysis. It’s fairly unusual for the sampling rationale even to be formally considered and documented, except perhaps as a line or two of boilerplate text in the audit scoping and planning documentation. Usually, the auditors and/or their managers and audit clients come to an informal arrangement, or simply ‘get on with it and see how it goes’, relying on the auditors’ experience and preference. For sausage-machine audits that are repeated often (e.g. certification audits), the sampling rationale may be established by convention or habit, perhaps modified according to the particular circumstances (e.g. an initial infosec policy audit at a new client might seek first to assess the entire policy suite at a high level, with more in-depth audits in specific areas of concern in later audits; an audit at a small local firm might sample just 1 or 2 key policies, while auditing a global conglomerate might involve sampling 10 or more).

Finally, there’s a sting in the tail. All sampling entails risk. The auditors are trying to determine the characteristics of a population by sampling a part of it and generalizing or extrapolating the results to the whole. If the sample is not truly representative, the conclusions may be invalid and misleading, possibly quite wrong. More likely, they will be related in some fashion to the truth … but just how closely related we don’t normally know. There are statistical techniques to help us determine that, if we have taken the statistical approach, but even they have assumptions and uncertainties, which means risk.

Furthermore, the evidence made available to the auditors varies in terms of its representativeness. Sensible auditors are quite careful to point out that they can only draw conclusions based on the evidence provided. So not only are they practically unable to conduct 100% sampling, but the sample itself might not be truly representative, hence they may miss material facts, hence an audit “pass” does not necessarily mean everything is OK! Most formal audit reports include some boilerplate text to that effect. That is not just a ‘get out of jail free’ card, an excuse or an attempt to gloss over audit limitations: there is a genuine issue underneath to do with the audit process. It’s reminiscent of the issue that we can identify, assess and quantify various kinds of information risk, but we can’t prove the absence of risk. We can say things are probably safe and secure, but we can never be totally certain of that (except in theoretical situations with specific assumptions and constraints). Same thing with audits.
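Those statistical techniques can be illustrated. A small Python sketch, assuming a purely hypothetical sample of 20 policies in which 2 were found materially deficient (the numbers and the 95% confidence level are made up for the example), estimates how far the true non-compliance rate might lie from the observed rate using the Wilson score interval:

```python
import math

def wilson_interval(defects, n, z=1.96):
    """Approximate 95% confidence interval for the true defect rate,
    given `defects` non-compliant items found in a sample of size n
    (Wilson score interval; z=1.96 corresponds to 95% confidence)."""
    p = defects / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, centre - half), min(1.0, centre + half)

# 2 deficient policies out of 20 sampled: observed rate 10%,
# but the plausible range for the whole population is much wider.
lo, hi = wilson_interval(2, 20)
```

With these illustrative numbers the interval runs from roughly 3% to 30%: the sample says far less about the population than the bare “10%” suggests. And even a sample with zero defects only narrows the interval, it never closes it, which is the statistical face of being unable to prove the absence of risk.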