Cyber Essentials Plus Sample-Size Rules, Explained from the Assessor's Side

Net Sec Group is an IASME and NCSC certification body. The most common scoping surprise on a Cyber Essentials Plus engagement is the sample size. The applicant has 50 devices and expects the assessor to look at "a few of them". The assessor returns a sample count of 24, and the cost, time, and access requirements all change.

This article explains how an IASME-accredited assessor calculates the sample, walks worked examples for typical UK fleet shapes, and flags the rules that change the count in ways applicants do not anticipate. If you are scoping a CE Plus engagement and want to predict the sample before the assessor confirms it, this is the calculation.

Why the sample exists

Cyber Essentials Plus verifies the five controls technically, on real devices, with the assessor's tooling, in real time. Verifying every device in scope is impractical for any firm with more than a handful of devices. The IASME Cyber Essentials Plus test specification therefore defines a sampling methodology: a representative sample is tested, and the controls that pass on the sample are accepted as representative of the whole estate.

The sampling methodology is not random. It is structured by build type: each distinct operating system build is sampled separately, with a sample size that scales with the count of devices in that build.

The trade for the applicant is straightforward. A standardised estate with one or two builds produces a small sample. A heterogeneous estate with many builds produces a large sample. The CE Plus sample is, in this sense, a hidden tax on build-fragmentation.

The IASME sample-size table

The current IASME Cyber Essentials Plus test specification publishes the table per build:

| Devices in build | Sample tested |
|---|---|
| 1 | 1 |
| 2 to 5 | 2 |
| 6 to 19 | 3 |
| 20 to 60 | 4 |
| 61 and above | 5 |

This applies per build, not per estate. A firm with 50 Windows 11 Pro 24H2 devices contributes 4 to the sample for that build. A firm with 10 macOS Sonoma devices contributes 3 from that group. The total sample is the sum across all build groups plus servers (see below).
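As a minimal sketch, the per-build lookup can be written as a small function. The name `sample_for_build` is ours for illustration; the band boundaries are the ones in the table above.

```python
def sample_for_build(device_count: int) -> int:
    """Return the number of devices sampled for one build, per the table above."""
    if device_count < 1:
        raise ValueError("a build group must contain at least one device")
    if device_count == 1:
        return 1
    if device_count <= 5:
        return 2
    if device_count <= 19:
        return 3
    if device_count <= 60:
        return 4
    return 5


# The two examples from the text: 50 Windows devices and 10 macOS devices.
print(sample_for_build(50))  # 4
print(sample_for_build(10))  # 3
```

The function is called once per build group, never once for the whole estate.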

For the netsecgroup.io reference with deeper coverage of the table and the IASME methodology, see Cyber Essentials Plus Sample Sizes.

What counts as a "build"

A build is a distinct combination of operating system version and edition. This is the rule that tends to bite applicants who have not standardised.

The build distinction is the lever that determines whether a 100-device estate has a sample of 8 or a sample of 28.

Servers and hypervisors are always sampled in full

The IASME methodology samples end-user devices but tests every server and every hypervisor. There is no "sample" applied to the server fleet. A 12-server in-scope estate is 12 servers in the test count, not 4.

This includes cloud-hosted servers running on IaaS (Azure VMs, AWS EC2 instances, GCP Compute Engine instances) as long as they are in scope. The cloud-managed services (Azure App Service, AWS Lambda, GCP Cloud Run, managed databases) are not in scope as servers; the underlying infrastructure is the cloud provider's responsibility, and the firm's responsibility is the application code and runtime.

For a typical UK SaaS firm with five Windows servers and three Linux servers in scope, the server contribution to the sample is eight, with the end-user device sample added on top.
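The server rule can be sketched as a simple count with no sampling step. The `Asset` record and the kind labels (`on_prem_server`, `iaas_vm`, `hypervisor`, `managed_service`) are hypothetical names chosen for this example, not terms from the IASME specification, and the split of the eight servers between cloud and on-prem is assumed.

```python
from dataclasses import dataclass

# Hypothetical asset kinds for illustration only.
FULLY_TESTED_KINDS = {"on_prem_server", "iaas_vm", "hypervisor"}


@dataclass(frozen=True)
class Asset:
    name: str
    kind: str       # e.g. "iaas_vm", "managed_service"
    in_scope: bool


def server_test_count(assets: list[Asset]) -> int:
    """Count in-scope servers and hypervisors; each one is tested, with no sampling."""
    return sum(1 for a in assets if a.in_scope and a.kind in FULLY_TESTED_KINDS)


# Five Windows servers and three Linux servers, plus one managed database.
assets = (
    [Asset(f"win-srv-{i}", "iaas_vm", True) for i in range(5)]
    + [Asset(f"linux-srv-{i}", "on_prem_server", True) for i in range(3)]
    + [Asset("orders-db", "managed_service", True)]  # managed service: not counted as a server
)
print(server_test_count(assets))  # 8
```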

Worked examples for typical UK fleet shapes

Below are five worked fleet shapes covering the range we see most often. Each table lists the builds in the fleet, the device count per build, the resulting sample per build, and the total assessor sample.

Fleet 1: 10-device microbusiness, single Windows build, no servers

| Build | Count | Sample |
|---|---|---|
| Windows 11 Pro 24H2 | 10 | 3 |
| Total | 10 | 3 |

The firm is small, the build is consistent. The sample is 3. The assessment runs as a single half-day technical session.

Fleet 2: 25 devices, two Windows builds, no servers

| Build | Count | Sample |
|---|---|---|
| Windows 11 Pro 24H2 | 18 | 3 |
| Windows 11 Pro 23H2 (un-updated) | 7 | 3 |
| Total | 25 | 6 |

Same OS edition, two feature updates. The 23H2 devices that were not pushed forward to 24H2 add a build, so 7 devices contribute 3 to the sample alongside the 18 on 24H2. The remediation lever is to push the 23H2 devices forward, which puts all 25 devices in a single build (the 20 to 60 band) and reduces the sample from 6 to 4.

Fleet 3: 50 devices mixed Windows and Mac, two cloud servers

| Build | Count | Sample |
|---|---|---|
| Windows 11 Pro 24H2 | 35 | 4 |
| macOS Sonoma | 12 | 3 |
| Windows Server 2022 (Azure VM) | 2 | 2 (full) |
| Total | 49 | 9 |

Two end-user builds, one server build (servers tested in full irrespective of count). The sample is 9 devices, which is roughly a one-day technical session.

Fleet 4: 100 devices, three Windows builds, two macOS builds, three on-prem servers, two cloud servers

| Build | Count | Sample |
|---|---|---|
| Windows 11 Pro 24H2 | 50 | 4 |
| Windows 11 Pro 23H2 | 18 | 3 |
| Windows 10 Enterprise (legacy) | 6 | 3 |
| macOS Sonoma | 15 | 3 |
| macOS Ventura | 11 | 3 |
| Windows Server 2022 (on-prem) | 3 | 3 (full) |
| Windows Server 2022 (Azure VM) | 2 | 2 (full) |
| Total | 105 | 21 |

Five end-user builds, two server build groups (servers tested in full). Sample is 21. Notice how the legacy Windows 10 devices add a build that contributes 3 to the sample even though only 6 devices are involved. This is the classic fleet shape that produces a sample larger than the applicant expected.

Fleet 5: 250 devices in a tightly standardised fleet

| Build | Count | Sample |
|---|---|---|
| Windows 11 Pro 24H2 (managed by Intune) | 230 | 5 |
| macOS Sonoma (managed by Jamf) | 18 | 3 |
| Windows Server 2022 (Azure VM, in scope) | 6 | 6 (full) |
| Total | 254 | 14 |

A larger estate, lower sample than Fleet 4, because the build standardisation is tighter. This is the lesson: at scale, build standardisation pays back into a smaller sample, which pays back into shorter assessment time and cost.
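To make the payback concrete, here is a hedged what-if sketch using the Fleet 4 numbers. It assumes, purely for illustration, that all 74 Windows devices could be brought onto one build and all 26 Macs onto another; the helper names are ours.

```python
def sample_for_build(n: int) -> int:
    """Per-build sample from the table in this article."""
    if n < 1:
        raise ValueError("a build group must contain at least one device")
    for upper, sample in ((1, 1), (5, 2), (19, 3), (60, 4)):
        if n <= upper:
            return sample
    return 5


def total_sample(end_user_builds: dict[str, int], servers: int) -> int:
    """Sum the per-build samples, then add every in-scope server in full."""
    return sum(sample_for_build(n) for n in end_user_builds.values()) + servers


fleet_4 = {
    "Windows 11 Pro 24H2": 50,
    "Windows 11 Pro 23H2": 18,
    "Windows 10 Enterprise": 6,
    "macOS Sonoma": 15,
    "macOS Ventura": 11,
}
# Hypothetical end state after consolidation: one Windows build, one macOS build.
consolidated = {"Windows (single build)": 74, "macOS (single build)": 26}

print(total_sample(fleet_4, servers=5))       # 21
print(total_sample(consolidated, servers=5))  # 14
```

Under those assumptions the same 105 assets drop from a sample of 21 to 14, and the five servers are the part that standardisation cannot reduce.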

The rules that change the count in ways applicants do not anticipate

Five sub-rules trip applicants up:

  1. Feature updates count as builds. Windows 11 Pro 23H2 and 24H2 are two builds. macOS Sonoma 14.6 and 14.7 are not (point releases of the same major version are the same build), but Sonoma and Ventura are. Tighten the feature-update rollout to reduce builds.

  2. Editions count as builds. Windows 11 Home and Windows 11 Pro are different builds even at the same feature update. A firm with mixed Home and Pro licences has at least two Windows builds.

  3. Android major versions count as builds, OEM skins do not by default. A BYOD scope with users on Samsung, Pixel, and OnePlus phones at Android 14 is one build under the IASME methodology, not three. The assessor can ask for additional samples if the skin-layer configuration affects the controls.

  4. Servers do not get a sample reduction. A 12-server estate produces 12 server tests in the count, regardless of build distribution.

  5. Cloud servers in IaaS scope are servers. They are not "cloud" with a separate methodology. They are tested in full like on-prem servers.

A sixth sub-rule, less common but worth flagging: anti-malware product variants count toward the configuration check. A firm running Microsoft Defender on Windows and CrowdStrike on Mac runs two anti-malware products and the assessor confirms both work; this does not increase the device sample but does add work in the malware-protection block of the assessment day.
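Sub-rules 1 to 3 amount to a grouping key. The sketch below is one way to express that key when preparing an inventory export; the `Device` fields and the `build_key` function are hypothetical, and the assessor's own grouping of borderline cases takes precedence.

```python
from typing import NamedTuple


class Device(NamedTuple):
    os_family: str   # "windows", "macos", "android"
    version: str     # e.g. "24H2", "14.7", "14"
    edition: str     # e.g. "Pro", "Home"; empty where not applicable
    oem: str = ""    # handset maker; deliberately ignored for grouping


def build_key(d: Device) -> tuple[str, str, str]:
    """Reduce a device to the (family, version band, edition) that defines its build."""
    if d.os_family in ("macos", "android"):
        # Major version defines the build; point releases share it.
        version_band = d.version.split(".")[0]
    else:
        # Windows feature updates (23H2, 24H2) are distinct builds.
        version_band = d.version
    return (d.os_family, version_band, d.edition)


# Sonoma 14.6 and 14.7 share a build; 23H2 and 24H2 do not.
print(build_key(Device("macos", "14.6", "")) == build_key(Device("macos", "14.7", "")))
print(build_key(Device("windows", "23H2", "Pro")) == build_key(Device("windows", "24H2", "Pro")))
# Home and Pro differ at the same feature update; OEM skins do not split Android 14.
print(build_key(Device("windows", "24H2", "Home")) == build_key(Device("windows", "24H2", "Pro")))
print(build_key(Device("android", "14", "", "Samsung")) == build_key(Device("android", "14", "", "Pixel")))
```

The four comparisons print True, False, False, True, matching the rules as stated above.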

Pre-engagement sample prediction

If you are scoping a CE Plus engagement and want a number before the assessor returns one, run this calculation against your inventory:

  1. Pull the current device inventory from the firm's authoritative source (Intune, Jamf, Active Directory, or the asset-management spreadsheet)
  2. Group devices by build (OS version and edition combination)
  3. For each build, look up the sample count from the IASME table above
  4. Sum the per-build samples
  5. Add the count of every server and hypervisor in scope (no sampling reduction on servers)
  6. The total is your predicted sample

The number can be off by 1 or 2 either way depending on exactly how the assessor groups borderline builds. We typically predict within 90% accuracy on the first pass.
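The steps above can be sketched in a few lines. This assumes the inventory has already been reduced to one build label per end-user device (step 2 is then a simple count); `predict_sample` and the label strings are illustrative names, not part of any IASME tooling.

```python
from collections import Counter


def sample_for_build(n: int) -> int:
    """Per-build sample from the IASME table reproduced in this article."""
    if n < 1:
        raise ValueError("a build group must contain at least one device")
    for upper, sample in ((1, 1), (5, 2), (19, 3), (60, 4)):
        if n <= upper:
            return sample
    return 5


def predict_sample(device_builds: list[str], servers_in_scope: int) -> int:
    """Predict the assessor sample from per-device build labels and a server count."""
    per_build = Counter(device_builds)                                # step 2: group by build
    end_user = sum(sample_for_build(n) for n in per_build.values())  # steps 3 and 4
    return end_user + servers_in_scope                                # step 5: servers in full


# Fleet 3 from the worked examples: 35 Windows devices, 12 Macs, 2 cloud servers.
inventory = ["Windows 11 Pro 24H2"] * 35 + ["macOS Sonoma"] * 12
print(predict_sample(inventory, servers_in_scope=2))  # 9
```

The result is a prediction only; the assessor's confirmed grouping decides the final count.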

For the formal reference and netsec's published table including the methodology behind borderline cases, see Cyber Essentials Plus Sample Sizes. For the second-sample rule that applies when a sampled device fails, see Cyber Essentials Plus Second Sample Rule.

What happens when a sampled device fails

A failure in the sample triggers two things. First, the failed control is recorded against the build group, not against the device. The assessor's interpretation is that the underlying control is not in place across the build, not that one device is misconfigured. Second, the second-sample rule applies, where the assessor selects a fresh device from the same build to verify the failure pattern.

For the per-pattern walk through second-sample triggers, see The CE Plus second-sample-when-required rules on this site. For the broader question of "what if my engagement fails", see The CE Plus second-attempt rules.

Bring-Your-Own-Device sampling

BYOD changes the sample in two ways. First, every BYOD device in scope is part of the sample population for its build. Second, the access patterns are different: the assessor cannot rely on MDM remote-screen-share on a personal device the firm does not manage, and the assessor coordinates with the user directly.

For the BYOD-specific sampling and access patterns, see BYOD sampling on CE Plus on this site.

What the assessor confirms during scoping

Before the engagement starts, the assessor walks the in-scope inventory and confirms the sample count. This usually surfaces three categories of question:

  1. What is the build distribution exactly? The applicant's view of "we use Windows 11 Pro" usually conceals two or three feature-update bands. The assessor pulls a recent inventory export and counts the bands
  2. Are all servers in scope and counted in full? The applicant's view of "a couple of servers" sometimes excludes a development server, a backup server, or a cloud VM that should be in scope
  3. Is BYOD in or out? If users access in-scope cloud admin consoles from personal phones or laptops, those devices are in scope under BYOD rules unless documented exclusions apply

The output of the scoping conversation is a confirmed sample count, an estimated technical-day duration, and any pre-engagement remediation recommendations to reduce the sample (typically: push deferred feature updates forward, retire end-of-life devices, decide BYOD scope explicitly).

For the per-control checks the assessor will run on each sampled device, see What the CE Plus assessor checks. For the per-control evidence formats the assessor accepts, see The evidence the CE Plus assessor accepts.

Common questions

If we standardise our estate after scoping but before the assessment day, does the sample reduce?

Yes. The assessor scopes against the current state on the day. If a feature-update rollout collapses three Windows builds into one between scoping and the assessment day, the assessor confirms the new state and reduces the sample accordingly.

Can the assessor sample more than the table requires?

Yes, if the assessor has reason to believe the sample is unrepresentative. This is rare and is triggered by patterns like inconsistent patch state across the inventory, evidence of unmanaged devices, or contradictory information from the applicant during scoping. The IASME methodology is a minimum, not a maximum.

Can the assessor sample fewer devices than the table requires?

No. The IASME table is binding from below. The assessor cannot accept a smaller sample even if the firm requests it.

Are smartphones in scope when the firm uses them only for two-factor authentication?

A device that performs an authenticator-app role only and is otherwise out of the firm's day-to-day work surface is typically out of scope under the test specification's scope-of-work rules. The judgement is on whether the device handles in-scope data; an authenticator app does not handle in-scope data. The assessor confirms during scoping.

What if our inventory is incomplete?

We routinely run a discovery pass before the formal scoping conversation, especially for firms whose authoritative inventory has drifted. The discovery pass identifies devices the firm did not know it had in scope, and the sample is calculated against the corrected inventory.

Next steps

When you are ready to scope or want a sample-prediction pass against your own inventory before booking, contact Net Sec Group or book a Cyber Essentials Plus assessment directly.