In high-performance computing and AI, the NVIDIA A100 Tensor Core GPU is a benchmark device. Its raw power helps data scientists and researchers move faster.
However, the challenge is using that power efficiently. In multi-tenant setups, where many users or jobs share the same GPU, running tasks one after the other leads to low utilization and poor ROI.
Multi-Instance GPU (MIG) solves this long-standing problem by making GPU sharing clean and predictable.
This guide explains memory partitioning on the A100, how MIG works, and how to configure it for higher utilization with reliable, secure performance across diverse workloads.
The Problem: Underutilization and Resource Contention
Before MIG, sharing a single A100 in a multi-tenant environment was hard. Two common approaches dominated:
Time-slicing
A scheduler rapidly switches between jobs. It looks like parallelism, but it is not. A greedy process can starve others, which creates unstable latency and weak quality of service.
Over-provisioning
To avoid contention, teams get dedicated A100s. That wastes money. Many tasks, like inference or early model work, do not need a full A100. Expensive GPUs sit idle or run far below capacity.
Neither approach delivers the right mix of efficiency, security and performance isolation. MIG was built to fix this.
The Solution: Multi-Instance GPU (MIG)
MIG is a hardware feature in the A100 that enables spatial partitioning. Instead of sharing one GPU over time, MIG splits the physical device into as many as seven fully isolated instances. Each GPU Instance (GI) has dedicated resources:
- Streaming Multiprocessors (SMs). The A100’s compute cores are divided and assigned to specific GIs.
- High-Bandwidth Memory. Each instance gets a fixed portion of memory (HBM2 on the A100 40 GB, HBM2e on the 80 GB model).
- L2 cache and memory controllers. Every GI owns its cache slice and memory controllers, which prevents contention and data leakage.
This hardware isolation is the key. A workload in one GI cannot touch the resources of another. You get the effect of several smaller, separate GPUs on a single card. That makes MIG a strong fit for multi-tenant environments where security, predictability and efficiency matter.
Understanding the MIG Architecture and Profiles
The A100’s layout enables MIG to partition compute and memory into slices. On an A100 40 GB, the hardware exposes eight memory slices of 5 GB each and seven compute slices. A GPU Instance is formed by combining a set number of compute and memory slices.
NVIDIA provides predefined MIG profiles that bundle these combinations. Profiles use the naming format:
<number_of_compute_slices>g.<memory_per_instance>gb
For the A100 40 GB, common profiles include:
- 1g.5gb. The smallest partition, with one compute slice and 5 GB. You can fit up to seven on one A100 40 GB.
- 2g.10gb. Two compute slices with 10 GB. You can fit three per A100.
- 3g.20gb. Three compute slices with 20 GB. You can fit two per A100.
- 4g.20gb. Four compute slices with 20 GB. One per A100.
- 7g.40gb. The full A100, all seven compute slices and 40 GB in a single instance.
These profiles let you mix and match sizes on one A100 to serve different workloads at the same time.
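As a worked example of a mixed layout on an A100 40 GB: one 3g.20gb, one 2g.10gb, and two 1g.5gb instances together use all seven compute slices (3 + 2 + 1 + 1) and all 40 GB of memory (20 + 10 + 5 + 5).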
Step-by-Step Guide to MIG Setup
You configure MIG with the nvidia-smi CLI. The flow is straightforward.
Step 1: Prerequisites
- An NVIDIA A100 GPU, PCIe or SXM4
- A recent NVIDIA driver with MIG support, 450.80.02 or later
- CUDA Toolkit 11.0 or later
- A Linux OS with super-user access
Step 2: Enable MIG Mode
MIG is off by default. Enable it on the target GPU. The change stays pending until the GPU is reset; if the GPU is busy and cannot be reset, reboot the host.
sudo nvidia-smi -i <GPU_ID> -mig 1
Replace <GPU_ID> with the index of the GPU you plan to partition. If you omit the ID on a multi-GPU system, the command attempts to enable MIG on all GPUs.
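To confirm the change took effect, you can query the current MIG mode (a quick check; these query fields are available in recent drivers):
nvidia-smi -i <GPU_ID> --query-gpu=mig.mode.current --format=csv
The result reads Enabled once the reset has completed; until then, the pending mode can be inspected with mig.mode.pending.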
Step 3: Check Available MIG Profiles
List the profiles supported on your device. This shows IDs, memory, and compute details for each configuration.
nvidia-smi mig -lgip
Use this to plan which profiles you will create.
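It can also help to see where each profile can physically sit on the GPU. A companion command lists the possible placements for each GPU Instance profile:
nvidia-smi mig -lgipp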
Step 4: Create GPU and Compute Instances
Create GPU Instances (and their Compute Instances) by specifying your chosen profiles. You can create a uniform layout or a mixed layout.
Example: Partition one A100 into a 3g.20gb instance and a 4g.20gb instance.
sudo nvidia-smi mig -i 0 -cgi 3g.20gb,4g.20gb -C
The -C flag auto-creates matching Compute Instances for each GI.
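If you want finer control, the same result can be reached in two steps: create the GPU Instance first, then add a Compute Instance to it. A minimal sketch using the 3g.20gb profile (the <GI_ID> is printed when the instance is created):
sudo nvidia-smi mig -i 0 -cgi 3g.20gb
sudo nvidia-smi mig -lcip -gi <GI_ID>
sudo nvidia-smi mig -cci <CI_PROFILE_ID> -gi <GI_ID>
Here -lcip shows the Compute Instance profiles that fit inside the new GPU Instance, and -cci creates the one you choose.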
Step 5: Verify Your Configuration
Confirm that instances were created and are visible as separate devices with unique UUIDs.
nvidia-smi -L
You will see each MIG device listed under its parent GPU with its own UUID beginning with MIG- (the exact format varies by driver version).
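For a more detailed view than nvidia-smi -L, you can list the GPU Instances and Compute Instances directly:
sudo nvidia-smi mig -lgi
sudo nvidia-smi mig -lci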
Step 6: Assign Workloads to MIG Instances
Pin an application to a specific MIG device by exporting its UUID. The app will see only that instance.
export CUDA_VISIBLE_DEVICES=<MIG_UUID>
python your_application.py
This gives each workload strict isolation and clear resource boundaries.
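For example, a minimal sketch of two jobs running side by side, each pinned to its own instance (the UUIDs and script names here are placeholders):
CUDA_VISIBLE_DEVICES=<MIG_UUID_1> python train_job.py &
CUDA_VISIBLE_DEVICES=<MIG_UUID_2> python serve_job.py &
Each process enumerates its instance as a single GPU (device 0), so application code does not need MIG-specific changes.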
The Benefits of Adopting a MIG Strategy
Implementing MIG for multi-tenant workloads offers several significant advantages:
Improved GPU Utilization
By right-sizing GPU resources to meet the needs of each workload, MIG eliminates idle time and maximizes the return on investment for expensive A100 hardware.
Cost Efficiency
With MIG, organizations can maximize the value of each GPU by securely sharing it across multiple users. For teams that don’t need a full A100, the option to rent NVIDIA A100 instances with MIG partitioning offers a cost-effective way to scale workloads without overpaying for unused resources.
Predictable Performance
Hardware isolation ensures that a resource-hungry job cannot impact the performance of other tasks, providing consistent and predictable latency and throughput for all users.
Enhanced Security
The strict partitioning at the hardware level provides a robust security boundary, preventing data breaches and unauthorized access between different workloads or tenants.
Scalability and Flexibility
Administrators can dynamically create or destroy MIG instances to adapt to shifting workload demands, enabling a highly flexible and scalable infrastructure. This is particularly beneficial in a cloud environment where resource needs can fluctuate dramatically.
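As a sketch of that reconfiguration flow (instances must be idle, and Compute Instances must be destroyed before their parent GPU Instances):
sudo nvidia-smi mig -i 0 -dci
sudo nvidia-smi mig -i 0 -dgi
You can then create a new layout with -cgi as in Step 4; no host reboot is required to repartition.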
Unlock Predictable GPU Performance with MIG
Ready to turn one A100 into seven reliable mini-GPUs? With MIG, you can right-size memory, isolate workloads and push utilization without noisy neighbors.
AceCloud can operationalize this playbook end to end. Their team guides MIG profile selection, automates Kubernetes node layouts, and tracks per-instance health with DCGM to keep each tenant performant and secure. They typically begin with a rapid architecture review, then run a pilot on the customer’s existing cluster.
