Fundamental limitations of container isolation
Containerization has become the dominant method for packaging and delivering applications in cloud and microservice architectures. Containers allow you to bundle code, libraries, and dependencies, ensuring high portability and operational efficiency.
However, the massive migration of critical workloads to containers makes the resilience of isolation mechanisms a fundamental concern. Research and surveys of DevOps and security teams consistently show that container security remains one of the top worries: kernel and runtime vulnerabilities, alongside configuration errors, regularly lead to real-world isolation breaches.
The key question is: Is container isolation absolute?
The answer is no, and the reason lies in the architecture of containers themselves. In this article, we'll examine that question and explore the measures available for mitigating the resulting risks.
Containers vs. Virtual Machines: the shared kernel
Virtual Machine (VM):
- Each VM runs its own OS kernel, completely separated from the host.
- Isolation is enforced by a hypervisor (KVM, Xen, ESXi, etc.).
- An attack typically must target the hypervisor, significantly narrowing the attack vectors.
Containers:
- A container is a process isolated using Linux kernel primitives (namespaces, cgroups, etc.).
- A container holds only the user space (libraries, runtime, application) but shares one kernel with the host and other containers.
- Any critical kernel vulnerability automatically becomes a potential vector of attack for all containers on the node.
The architectural choice of a shared kernel is what makes container isolation fundamentally non-absolute: successful exploitation of a kernel vulnerability allows an attacker to bypass the boundaries of any namespace.

The role of namespaces and Cgroups in creating isolation
The foundation of container isolation in Linux rests on Namespaces and Cgroups.
- Namespaces – Isolation of System View: PID, Mount, Network, User, IPC, UTS, and others. Processes "see" only their own space: their filesystem, their PIDs, their hostname, and their network.
- Cgroups – Resource Control and Limiting: CPU, memory, I/O, process count, etc.
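The namespace membership described above is visible on any Linux host under /proc/<pid>/ns, where each entry is a symlink whose target names the namespace type and its inode. The following is a minimal sketch of how two processes could be compared; the inode values in the demo are illustrative, not taken from a real system, and the helper names are hypothetical.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace symlink target into (type, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def read_namespaces(pid: str = "self") -> dict[str, int]:
    """Map each entry in /proc/<pid>/ns to its namespace inode (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    return {
        entry: parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))[1]
        for entry in os.listdir(ns_dir)
    }


def shared_namespaces(a: dict[str, int], b: dict[str, int]) -> list[str]:
    """Namespace types in which both processes have the same inode."""
    return sorted(kind for kind in a.keys() & b.keys() if a[kind] == b[kind])


# Illustrative inode values for a host process and a rootful container process.
host = {"pid": 4026531836, "net": 4026531840, "mnt": 4026531841, "user": 4026531837}
container = {"pid": 4026532201, "net": 4026532204, "mnt": 4026532199, "user": 4026531837}
print(shared_namespaces(host, container))  # ['user']
```

In this sample the container has its own PID, network, and mount namespaces but still shares the user namespace with the host, which is the typical picture for a rootful container. On a real host, read_namespaces("self") could be compared with read_namespaces(<container pid>), given sufficient privileges to read the other process's entries.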
Namespaces create the illusion of a separate system, and cgroups prevent one container from hogging the host's resources. However, namespaces are an isolation mechanism, not a full-fledged security mechanism against deliberate attempts to cross the boundary. They do not prevent exploitation of kernel vulnerabilities, do not resolve issues with excessive root privileges or privileged containers, do not filter system calls, and do not manage access policies.
Therefore, namespaces and cgroups are merely a base layer that must be fortified with:
- Syscall filtering (Seccomp)
- MAC policies (AppArmor/SELinux)
- Capability restrictions
- Ideally, User Namespaces + rootless mode
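Of the layers listed above, capability restrictions are among the easiest to verify from inside a running container, because the kernel exposes the effective capability set as a hex bitmask in the CapEff line of /proc/<pid>/status. The sketch below decodes such a mask; the bit positions follow the Linux capability numbering, and the two sample masks are assumptions (the values commonly seen for a default Docker container and for a privileged container on a recent kernel).

```python
# Capability names indexed by bit number, following linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask: str) -> list[str]:
    """Turn a CapEff-style hex mask into a list of capability names."""
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


def effective_caps(status_text: str) -> list[str]:
    """Extract and decode the CapEff line from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(line.split()[1])
    raise ValueError("no CapEff line found")


# Sample masks (assumed typical values, not read from a live system).
default_caps = effective_caps("Name:\tsh\nCapEff:\t00000000a80425fb\n")
privileged_caps = decode_caps("000001ffffffffff")

print(len(default_caps), "CAP_SYS_ADMIN" in default_caps)        # 14 False
print(len(privileged_caps), "CAP_SYS_ADMIN" in privileged_caps)  # 41 True
```

Inside a container, the same check could be run against the real file with effective_caps(open("/proc/self/status").read()).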
Container Platform Security: critical threats and attack vectors
The attack surface of a container platform is multi-layered. The following main surfaces can be identified:
- Runtime and kernel (runc, containerd, CRI-O, Linux kernel).
- Image supply chain.
- Container and orchestrator configuration.
- Cloud and infrastructure (IAM, API, storage, network).
Overview of attack vectors and critical vulnerabilities: Container Escape

Typical vectors in a container environment include: exploiting a CVE in the Linux kernel or container runtime; using insecure images (outdated packages, embedded exploit kits); configuration errors in Kubernetes/Docker (excessive privileges, host filesystem mounts, use of the host network stack); and compromise of the registry or the CI/CD pipeline.
It is important to understand that a Container Escape is often a combination of factors:
Kernel/Runtime Vulnerability + Poor Configuration (running as root, --privileged, host filesystem mount, missing MAC/Seccomp).
Supply chain vulnerabilities
The container image supply chain is one of the main risks because base images from public registries often contain known CVEs. Dependencies (pip/npm/gem/go modules) are pulled transitively and are not always controlled. Most dangerously, malware can be "dormant" and only manifest at runtime.
Therefore, the following best practices are mandatory in production:
- Image scanning for CVEs, using tools like Trivy, Clair, Grype, etc.
- Using policy-aware registries with enforcement, such as Harbor, Quay, or GHCR.
- Image signing and verification before deployment.
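The scanning practice above only helps if its result can block a deployment. As a rough sketch, a CI step could parse a scanner's JSON report and fail on serious findings. The example assumes the layout of a Trivy JSON report (a top-level "Results" list whose entries carry a "Vulnerabilities" list); the report content, CVE identifiers, and the severity threshold are hypothetical.

```python
# Severities that should stop a deployment (an assumed policy, adjust as needed).
BLOCKING = {"HIGH", "CRITICAL"}


def blocking_findings(report: dict, blocking: set = BLOCKING) -> list[tuple[str, str, str]]:
    """Collect (vulnerability id, package, severity) for findings that block."""
    findings = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" may be missing or null for clean targets.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocking:
                findings.append(
                    (vuln.get("VulnerabilityID", "?"), vuln.get("PkgName", "?"), vuln["Severity"])
                )
    return findings


# Hypothetical report with placeholder CVE identifiers.
sample_report = {
    "Results": [
        {
            "Target": "app (debian)",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "PkgName": "openssl", "Severity": "CRITICAL"},
                {"VulnerabilityID": "CVE-0000-0002", "PkgName": "zlib", "Severity": "LOW"},
            ],
        },
        {"Target": "requirements.txt", "Vulnerabilities": None},
    ]
}

found = blocking_findings(sample_report)
print(found)  # [('CVE-0000-0001', 'openssl', 'CRITICAL')]
exit_code = 1 if found else 0  # a CI job would pass this to sys.exit()
```

A report of this shape can be produced with a command along the lines of trivy image --format json --output report.json <image> and loaded with json.load before being passed to blocking_findings.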
Configuration errors and privileged containers
The most common error, and the one that opens the door widest to attackers, is running processes inside the container as root. If the application is launched with superuser rights, an attacker who compromises it gains root inside the container, and after a successful escape, escalation to host root becomes significantly simpler.
Prohibiting the launch of applications as root and utilizing unprivileged UID/GID is the basic rule of proper hardening.
A container started with the --privileged flag has most of its key isolation mechanisms switched off: it receives all Capabilities, the default Seccomp and AppArmor/SELinux confinement is lifted, and the device cgroup restrictions are removed. This gives the container direct access to host devices and pseudo-filesystems (/proc, /sys, /dev, etc.). From a privileged container, it is easy to mount host disks and modify the host filesystem and configuration.
In essence, a privileged container is equivalent to an intentional privilege escalation. Remember, this is only acceptable in rare, well-isolated scenarios, and certainly not in a multi-tenant environment.
There are also "silent" configuration errors that can make the container environment extremely vulnerable, even without using the --privileged flag. The most dangerous are:
- Mounting hostPath to sensitive host directories, such as /etc or /var/run/docker.sock.
- Granting unnecessary Capabilities, for instance, CAP_SYS_ADMIN, which virtually grants root rights on the host.
- Unjustified use of hostNetwork or hostPID modes.
These seemingly minor settings can allow an attacker to escape the container and gain control over the kernel or the entire host system.
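Because these settings are plain fields in a manifest, they can be caught before deployment. The sketch below walks a Kubernetes Pod manifest (already parsed into a dict) and reports the risky settings discussed in this section. The field names are those of the Pod spec; the lists of sensitive paths and risky capabilities, and the sample manifests, are illustrative assumptions rather than a complete policy.

```python
# Illustrative policy lists, not exhaustive.
SENSITIVE_PATHS = ("/", "/etc", "/proc", "/sys", "/dev", "/var/run/docker.sock", "/run/docker.sock")
RISKY_CAPS = {"ALL", "SYS_ADMIN", "SYS_MODULE", "SYS_PTRACE", "NET_ADMIN"}


def is_sensitive(path: str) -> bool:
    """True if path is a sensitive host path or lies underneath one."""
    path = path.rstrip("/") or "/"
    return any(path == p or (p != "/" and path.startswith(p + "/")) for p in SENSITIVE_PATHS)


def audit_pod(pod: dict) -> list[str]:
    """Return human-readable findings for risky settings in a Pod manifest."""
    findings = []
    spec = pod.get("spec", {})
    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if spec.get(flag):
            findings.append(f"{flag} is enabled")
    for volume in spec.get("volumes", []):
        path = (volume.get("hostPath") or {}).get("path")
        if path and is_sensitive(path):
            findings.append(f"volume {volume.get('name')!r} mounts sensitive hostPath {path}")
    pod_ctx = spec.get("securityContext") or {}
    for container in spec.get("containers", []):
        name = container.get("name", "?")
        ctx = container.get("securityContext") or {}
        if ctx.get("privileged"):
            findings.append(f"container {name!r} is privileged")
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            findings.append(f"container {name!r} may run as root (runAsNonRoot not set)")
        added = set((ctx.get("capabilities") or {}).get("add", []))
        for cap in sorted(added & RISKY_CAPS):
            findings.append(f"container {name!r} adds capability {cap}")
    return findings


# Hypothetical manifests for demonstration.
risky_pod = {
    "spec": {
        "hostPID": True,
        "volumes": [{"name": "sock", "hostPath": {"path": "/var/run/docker.sock"}}],
        "containers": [
            {"name": "app", "securityContext": {"capabilities": {"add": ["SYS_ADMIN"]}}}
        ],
    }
}
hardened_pod = {
    "spec": {
        "securityContext": {"runAsNonRoot": True},
        "containers": [
            {"name": "app", "securityContext": {"capabilities": {"drop": ["ALL"]}}}
        ],
    }
}

for finding in audit_pod(risky_pod):
    print(finding)
print(audit_pod(hardened_pod))  # []
```

In a cluster, this kind of check is normally enforced by an admission policy rather than an ad hoc script; the sketch only shows which fields such a policy looks at.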
Architecture comparison: Docker vs. Podman
The architectural differences between Docker and Podman directly influence the threat model. In Docker, the main risk comes from the high-privilege daemon, which acts as a single point of failure.
Docker and rootful mode
In the classic, or standard, Docker architecture, the Docker CLI communicates with the dockerd daemon via the /var/run/docker.sock socket or TCP. The dockerd daemon runs as the root user and performs core operations:
- Creating/deleting containers
- Building images
- Managing networks/volumes
- Interacting with registries
It follows that direct access to the Docker API is equivalent to almost full root access to the host – allowing an attacker to mount the filesystem, launch privileged containers, etc. In this scenario, the daemon becomes a single point of failure – if it’s compromised, the attacker gains host access.
Therefore, Docker requires strict access control to the socket, enabled MAC and Seccomp, and restricted Capabilities. It is equally important to minimize the number of users and services with access to the Docker API.
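Access to the socket is governed by ordinary file permissions, so a first audit step is to look at its mode and at who belongs to the owning group. The following is a minimal sketch for a rootful Docker host; the function names are hypothetical, and the demo uses made-up values instead of reading a live system.

```python
import os
import stat


def socket_findings(mode: int, group_members: list[str]) -> list[str]:
    """Evaluate a socket's permission bits and the members of its owning group."""
    findings = []
    if mode & (stat.S_IROTH | stat.S_IWOTH):
        findings.append("socket is accessible to all local users")
    for user in group_members:
        findings.append(f"user {user!r} can drive the daemon and is effectively root")
    return findings


def inspect_socket(path: str = "/var/run/docker.sock") -> list[str]:
    """Audit a real socket (Unix only; not called in this demo)."""
    import grp  # Unix-only module, imported lazily

    info = os.stat(path)
    # gr_mem lists supplementary members only; users whose primary group
    # is the socket's group are not included here.
    members = list(grp.getgrgid(info.st_gid).gr_mem)
    return socket_findings(info.st_mode, members)


# Made-up values: a 0660 socket with two group members, then a 0666 socket.
print(socket_findings(0o660, ["deploy", "ci"]))
print(socket_findings(0o666, []))
```

Every name reported for the group deserves the same scrutiny as an entry in sudoers, since access to the Docker API is nearly equivalent to root on the host.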
Daemonless Podman: the native Linux approach
Podman was developed by Red Hat and the community as a safer, "native" Linux alternative. It was designed from the outset with a daemonless architecture: there is no persistent, centralized daemon, and containers run as child processes under the account of the user who ran the command. For service management, Podman relies on deep integration with systemd, so containers can be run directly as systemd units and services.
This eliminates the single point of failure presented by the root daemon and aligns better with the traditional Unix model: "whoever launched the process owns it".
Rootless mode: User Namespaces as the main security barrier
The rootless mode is essentially the logical answer to the fundamental danger: "What if the container does break out?"
The role of user namespaces and damage mitigation
User Namespaces isolate UID/GID and allow for UID remapping. For example, if a process inside a container has UID 0 (root), that same process is mapped to an unprivileged UID from the subuid/subgid range on the host.
Therefore, even if an attacker inside the container has root privileges and succeeds in exploiting a runtime/kernel vulnerability to escape to the host level, they continue to operate as an unprivileged user on the host.
Many exploitation chains, including runc escapes, are thus broken or have a significantly reduced impact: without host Capabilities the attacker cannot overwrite root-owned binaries or configurations, mount filesystems, or manage devices, and MAC policies continue to apply on the host as well.
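The remapping described above is recorded by the kernel in /proc/<pid>/uid_map, where each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below translates a container UID to the host UID. The sample map is an assumption modelled on a typical rootless setup, where the invoking user has UID 1000 and a subuid range starting at 100000.

```python
from typing import Optional


def parse_id_map(text: str) -> list[tuple[int, ...]]:
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    return [tuple(int(field) for field in line.split()) for line in text.splitlines() if line.strip()]


def to_host_id(container_id: int, id_map: list[tuple[int, ...]]) -> Optional[int]:
    """Translate an ID inside the namespace to the host ID, or None if unmapped."""
    for inside, outside, length in id_map:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None


# Assumed sample map: container root is the invoking user (1000),
# container UIDs 1..65536 come from the subuid range starting at 100000.
SAMPLE_UID_MAP = """\
         0       1000          1
         1     100000      65536
"""

uid_map = parse_id_map(SAMPLE_UID_MAP)
print(to_host_id(0, uid_map))      # 1000
print(to_host_id(33, uid_map))     # 100032
print(to_host_id(70000, uid_map))  # None
```

The first result is the point of the whole scheme: root inside the container is an ordinary user outside it. The last result shows the other side of the mapping, where an ID outside the configured range has no host counterpart at all.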
Rootless Podman: Limitations and the Network Stack
Podman was designed from the start with a focus on rootless operations. Containers are created and managed by an unprivileged user, user namespaces are enabled by default, and SELinux/AppArmor and Cgroups are used to the extent possible without root access.
However, there are some tradeoffs in this scheme. When a rootless container operates within the subuid/subgid range, mapping errors are possible if the image contains files with UID/GID values that fall outside this range.
Another indirect drawback is that an unprivileged user cannot freely configure network namespaces. Since version 5.0, released in 2024, Podman has used the user-space network stack pasta by default as a replacement for slirp4netns. It operates entirely in user space, but thanks to the tap interface and zero-copy splice, it delivers near-native network performance. Nevertheless, some slowdown is still possible, and it is to be expected if users continue to use slirp4netns.
Rootless Docker: implementation and requirements
Docker also supports a full-fledged rootless mode, in which the dockerd daemon and containers are launched inside a user namespace without the privileges of the host's root user. Importantly, this differs from the older userns-remap feature, where the daemon itself remained privileged.
From a security standpoint, this means that a container breach does not automatically escalate to the host's root user. If dockerd itself is successfully attacked in rootless mode, the attacker reaches only the resources of the unprivileged user, not the entire system.
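Whether a given daemon is actually running in rootless mode can be checked from the security options it reports. The sketch below parses the JSON list returned by docker info --format '{{json .SecurityOptions}}' and looks for a rootless entry; the sample string is an assumption about what a rootless daemon typically returns, and the helper names are hypothetical.

```python
import json
import subprocess


def parse_security_options(raw_json: str) -> dict[str, dict[str, str]]:
    """Turn entries like 'name=seccomp,profile=builtin' into {name: {key: value}}."""
    options = {}
    for entry in json.loads(raw_json):
        fields = dict(part.split("=", 1) for part in entry.split(",") if "=" in part)
        name = fields.pop("name", None)
        if name:
            options[name] = fields
    return options


def is_rootless(raw_json: str) -> bool:
    return "rootless" in parse_security_options(raw_json)


def docker_security_options() -> str:
    """Query the local daemon (requires the docker CLI; not called in this demo)."""
    return subprocess.run(
        ["docker", "info", "--format", "{{json .SecurityOptions}}"],
        check=True, capture_output=True, text=True,
    ).stdout


# Assumed sample output from a rootless daemon.
sample = '["name=seccomp,profile=builtin","name=rootless","name=cgroupns"]'
print(parse_security_options(sample))
print(is_rootless(sample))  # True
```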

The choice between rootful and rootless modes is primarily a balance between maximizing performance and maximally limiting potential damage in the event of a breach.
Conclusions and hardening recommendations
The analysis confirms that container isolation is not absolute, and the reason is architectural. Because containers share the host system's kernel, a vulnerability at the kernel or runtime level can cut across namespaces and violate the expected security boundaries.
In practice, most successful attacks are linked to a combination of a known vulnerability (kernel/runc) and a weak configuration (root, --privileged, hostPath).
A reliable container security model must be built as a multi-layered system:
Layer 1 – Segmentation (namespaces, cgroups)
Layer 2 – Policies and Filtering (Seccomp, MAC, Capabilities)
Layer 3 – Architectural Privilege Restriction (rootless mode and User Namespaces)
Checklist: host hardening for Docker and Podman containers
Rootless by Default
Consider Podman or Docker in rootless mode as the default model for internal and multi-tenant workloads.
Kernel Hardening and Patching
Promptly update the kernel: well-known vulnerabilities such as Dirty COW and Dirty Pipe have long been fixed in patched kernels. Use distributions with a good security patch history (RHEL, openSUSE, Ubuntu LTS, etc.).
Adopting Specialized Immutable OS Distributions
To enhance container environment security, use immutable OS distributions such as openSUSE MicroOS or Flatcar Linux. Their root filesystem is read-only by default, significantly reducing the attack surface. The key mechanism is atomic updates: a new OS image is created, and the update is either applied completely or fully rolled back, preventing system corruption and speeding up recovery.
Principle of Least Privilege
Never use the --privileged flag in production, except for highly specialized, isolated environments. Avoid running applications inside containers as the root user and use a read-only filesystem, the minimal set of Capabilities, and strict Seccomp profiles.
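As an illustration of these rules, the sketch below assembles a run command that applies them through standard CLI flags: a non-root user, a read-only root filesystem, all Capabilities dropped and only named ones added back, no privilege escalation, and basic resource limits. The image name, UID/GID, and limit values are hypothetical placeholders to be replaced with values suited to the workload.

```python
import shlex
from typing import Optional, Sequence


def hardened_run_args(
    image: str,
    uid: int = 10001,
    gid: int = 10001,
    add_caps: Sequence[str] = (),
    seccomp_profile: Optional[str] = None,
    engine: str = "docker",
) -> list[str]:
    """Build an argv list for a least-privilege container start."""
    args = [
        engine, "run", "--rm",
        "--user", f"{uid}:{gid}",                  # unprivileged UID/GID
        "--read-only",                             # read-only root filesystem
        "--cap-drop", "ALL",                       # start from zero Capabilities
        "--security-opt", "no-new-privileges",     # block setuid-style escalation
        "--pids-limit", "256",                     # placeholder limit
        "--memory", "512m",                        # placeholder limit
    ]
    for cap in add_caps:
        args += ["--cap-add", cap]
    if seccomp_profile:
        args += ["--security-opt", f"seccomp={seccomp_profile}"]
    args.append(image)
    return args


# Hypothetical image that needs to bind to a port below 1024.
argv = hardened_run_args("registry.example.com/app:1.0", add_caps=["NET_BIND_SERVICE"])
print(shlex.join(argv))
```

Returning a list rather than a shell string keeps the arguments safe to pass to subprocess.run without shell quoting issues. Applications that write to disk will additionally need explicit writable mounts, for example a tmpfs for /tmp.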
MAC Policies and Seccomp
Enable and configure SELinux/AppArmor for containers. Use Seccomp profiles, ideally custom ones based on the application's actual system calls, not just the Docker defaults.
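A custom profile of this kind is a JSON document with a default action and a list of exceptions. The sketch below builds an allowlist profile from a set of observed system calls, using the JSON layout accepted by Docker and Podman for --security-opt seccomp=<file>. The syscall list in the demo is a deliberately short, hypothetical example; a real application needs considerably more calls, and a profile that is too tight will prevent the container from starting.

```python
import json
from typing import Iterable, Sequence


def build_seccomp_profile(
    observed_syscalls: Iterable[str],
    architectures: Sequence[str] = ("SCMP_ARCH_X86_64",),
) -> dict:
    """Deny everything by default and allow only the observed system calls."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": list(architectures),
        "syscalls": [
            {"names": sorted(set(observed_syscalls)), "action": "SCMP_ACT_ALLOW"}
        ],
    }


# Hypothetical, incomplete list, e.g. collected by tracing the application.
observed = ["read", "write", "openat", "close", "read", "exit_group"]
profile = build_seccomp_profile(observed)
print(json.dumps(profile, indent=2))
```

The observed list would normally come from tracing the application under a realistic workload; the resulting file should be tested in staging before it is enforced.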
Supply Chain Security
Implement container image scanning. Limit the sources of base images to official or internal registries and enable image signing and verification before deployment.
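Restricting image sources can be checked mechanically on every image reference before it reaches the cluster. The sketch below tests a reference against a registry allowlist and, as an additional assumption that goes beyond the text above, requires that the image is pinned by digest rather than by a mutable tag. Digest pinning is not a substitute for signature verification; it only ensures that the content being deployed is the content that was reviewed. The registry names are hypothetical.

```python
import re
from typing import Sequence

# A reference pinned by digest ends in "@sha256:" plus 64 hex characters.
DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")

# Hypothetical allowlist of trusted registries.
ALLOWED_REGISTRIES = ("registry.internal.example/",)


def check_image_ref(ref: str, allowed: Sequence[str] = ALLOWED_REGISTRIES) -> list[str]:
    """Return the policy problems found in an image reference."""
    problems = []
    if not ref.startswith(tuple(allowed)):
        problems.append("registry not in allowlist")
    if not DIGEST.search(ref):
        problems.append("not pinned by digest")
    return problems


pinned = "registry.internal.example/base/python@sha256:" + "a" * 64
print(check_image_ref(pinned))                             # []
print(check_image_ref("docker.io/library/python:latest"))  # both problems reported
```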
Runtime Monitoring
Employ anomaly detection tools and container IDS/EDR (Falco, Sysdig Secure, or Aqua Security) that can detect container escape attempts and monitor the execution of malicious commands. Track atypical network traffic, unexpected system calls, and access to sensitive files.
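Runtime tools of this kind usually emit a stream of structured events that still has to be triaged. As a rough sketch, the code below filters JSON-formatted alerts by severity, assuming one JSON object per line with "rule" and "priority" fields, as Falco produces when JSON output is enabled. The priority ordering is an assumption based on Falco's syslog-style levels, and the sample events are made up.

```python
import json
from typing import Iterable

# Most severe first (assumed ordering, following syslog-style levels).
PRIORITY_ORDER = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def alerts_at_or_above(lines: Iterable[str], threshold: str = "warning") -> list[dict]:
    """Keep events whose priority is at least as severe as the threshold."""
    limit = PRIORITY_ORDER.index(threshold.lower())
    selected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON log lines
        priority = str(event.get("priority", "")).lower()
        if priority in PRIORITY_ORDER and PRIORITY_ORDER.index(priority) <= limit:
            selected.append(event)
    return selected


# Made-up sample events.
sample_lines = [
    '{"rule": "Terminal shell in container", "priority": "Notice", "output": "..."}',
    '{"rule": "Hypothetical escape rule", "priority": "Critical", "output": "..."}',
    "not json at all",
]

print([event["rule"] for event in alerts_at_or_above(sample_lines)])
# ['Hypothetical escape rule']
```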
The future of isolation: Kata Containers, gVisor, and Micro-VMs
Rootless mode significantly reduces the risk of total host compromise, but it does not solve everything, as dependencies on the host kernel remain and some classes of attacks are not eliminated.
Therefore, hybrid approaches are gaining popularity for critical workloads:
- Kata Containers: Each pod or container is launched inside a lightweight micro-VM with a separate kernel, combining the advantages of VMs and containers.
- gVisor: A sandbox kernel written in Go that acts as an intermediary layer between the application and the true Linux kernel.
These solutions reduce reliance on the shared kernel and move us closer to the VM isolation model while preserving the declarative nature and convenience of container orchestrators.
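In Kubernetes, this choice is expressed declaratively through a RuntimeClass object and the runtimeClassName field of a Pod. The sketch below generates both manifests as JSON, which kubectl accepts alongside YAML. The handler name must match what the node's container runtime is configured with; "runsc" is the name commonly used for gVisor and is an assumption here, as are the object and image names.

```python
import json


def runtime_class(name: str, handler: str) -> dict:
    """RuntimeClass that maps a cluster-level name to a node runtime handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def sandboxed_pod(name: str, image: str, runtime_class_name: str) -> dict:
    """Pod that asks to be run by the given RuntimeClass."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "containers": [{"name": name, "image": image}],
        },
    }


# Hypothetical names; the handler depends on the node's runtime configuration.
gvisor = runtime_class("gvisor", "runsc")
pod = sandboxed_pod("untrusted-job", "registry.example.com/job:1.0", "gvisor")

print(json.dumps(gvisor, indent=2))
print(json.dumps(pod, indent=2))
```

Workloads that do not set runtimeClassName keep using the default runtime, so sandboxing can be introduced selectively for the most exposed or least trusted pods.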
