Procedures for Addressing Disk Subsystem Failures
Introduction
Disk stability is critical for maintaining service continuity and protecting your data. While INTROSERV infrastructure uses enterprise-grade hardware, issues can still occur due to hardware faults, RAID degradation, or filesystem errors. This document explains the types of disk problems that may occur, what you can do to help resolve them, and how our support team handles recovery procedures.
INTROSERV Infrastructure: Protecting Your Data
INTROSERV uses a multi-layered approach to disk reliability. INTROSERV servers are equipped with enterprise-grade HDD, SSD, and NVMe drives and tested RAID controllers. Most servers support hot-swap disks, meaning replacements can be completed without shutting down your server. Upon request, we can also monitor your storage health and performance.
Types of Storage Issues
Several types of disk-related problems may occur in any server environment:
- Physical disk failure - The disk stops responding or reports critical SMART errors. SMART (Self-Monitoring, Analysis and Reporting Technology) detects early signs of disk problems.
- RAID degradation - One disk in a RAID array is offline, and the array runs in degraded mode with reduced redundancy. During this state, your server remains accessible (if the array has redundancy), but performance may be affected and data protection is temporarily reduced.
- RAID failure - Multiple disks are offline or the RAID array cannot be accessed. This requires immediate intervention.
- Controller errors - Problems with the RAID controller or its cache module prevent proper disk communication.
- Filesystem corruption - Data structures on the disk become damaged. The system may automatically switch to read-only mode to prevent further damage.
- Performance degradation - You experience unusual latency spikes during read and write operations, which may indicate emerging disk problems.
- External storage problems - Issues affecting remote storage systems used in certain server configurations.
When you contact support, our engineering team evaluates the incident, determines its severity, and selects the appropriate recovery procedure. We recommend configuring server monitoring to detect disk problems early.
What You Can Check Before Contacting Support
While most disk failures require technical intervention from our engineers, you can gather useful diagnostic information to help us resolve your issue faster. If your server is still accessible, you may perform the following checks:
- Review system logs - On Linux, access your system logs using dmesg (displays kernel messages) or journalctl (displays system journal entries). On Windows, use Event Viewer to check the System log. These logs often contain error messages related to disk problems.
- Run SMART diagnostics - On Linux, use the smartctl command (from the smartmontools package) to check disk health, or the nvme command (from the nvme-cli package) for NVMe drives. On Windows, free tools such as CrystalDiskInfo can display SMART data. This information helps our engineers diagnose the problem faster.
- Check RAID array status - For software RAID, use your operating system's built-in tools (mdadm on Linux, Storage Spaces or Disk Management on Windows). For hardware RAID, use the controller manufacturer's utility (such as MegaCLI, StorCLI, or the controller's web interface). This information is valuable for diagnosing degradation or failure.
- Back up important data - If the system remains partially accessible, consider backing up critical data to a different location.
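For the SMART check described above, a short script can reduce the smartctl output to the few values most useful in a ticket. This is a sketch, not an INTROSERV tool: it assumes smartmontools 7.0 or newer (which added JSON output), an ATA/SATA drive, and the JSON key names shown; the watched attribute list and the function names are illustrative choices.

```python
import json
import subprocess

# Attribute names commonly watched on ATA drives (an assumed, non-exhaustive list).
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def read_smart(device):
    """Run smartctl for one device; needs root and smartmontools 7.0 or newer."""
    # smartctl uses a bitmask exit status, so a non-zero code is not treated as fatal.
    result = subprocess.run(
        ["smartctl", "--json", "-i", "-H", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout)


def summarize_smart(report):
    """Summarize a parsed smartctl JSON report into a small dict."""
    passed = report.get("smart_status", {}).get("passed")
    table = report.get("ata_smart_attributes", {}).get("table", [])
    nonzero = {
        row["name"]: row["raw"]["value"]
        for row in table
        if row.get("name") in WATCHED and row.get("raw", {}).get("value", 0) > 0
    }
    return {
        "serial": report.get("serial_number"),
        "passed": passed,
        "nonzero_watched": nonzero,
    }
```

A typical call would be summarize_smart(read_smart("/dev/sda")), where /dev/sda is a placeholder device name. A failed health status or a non-zero watched counter is worth reporting to support, but a clean summary does not by itself rule out a disk problem.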
The INTROSERV Client Area provides basic information about your server configuration. For detailed logs and hardware diagnostics, access your server's out-of-band management interface (IPMI, or a vendor implementation such as iDRAC, iRMC, or iLO). Use this information to prepare details before opening a support ticket.
When to Contact Support
Contact our support team immediately when any of the following conditions occur:
- The disk reports SMART errors
- RAID enters degraded mode (one or more disks offline)
- Your system freezes, becomes read-only, or becomes unresponsive
- Filesystem repair attempts do not resolve the issue
- The server does not detect one or more disks
- You notice unusual performance degradation that persists
INTROSERV support operates 24/7 and processes hardware-related incidents with high priority. Contacting us promptly significantly shortens resolution time.
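Several of the conditions listed above leave traces in the Linux kernel log before they become obvious. The sketch below scans log lines for a few such traces. The patterns are assumptions based on common kernel messages, whose exact wording varies by kernel version, driver, and filesystem, so a match is a hint rather than a diagnosis.

```python
import re

# Assumed substrings; exact kernel wording varies by kernel version and driver.
PATTERNS = {
    "io_error": re.compile(r"I/O error", re.IGNORECASE),
    "read_only_remount": re.compile(r"remounting filesystem read-only", re.IGNORECASE),
    "filesystem_error": re.compile(r"\b(EXT4-fs|XFS|BTRFS)\b.*\berror\b", re.IGNORECASE),
}


def scan_log(lines):
    """Return {label: [matching lines]} for lines that match any pattern."""
    hits = {}
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(label, []).append(line.rstrip())
    return hits
```

The input can be the output of dmesg or journalctl -k split into lines. Matching lines are useful to paste into the support ticket together with the time they appeared.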
Information to Include in Your Support Ticket
When opening a support ticket about disk issues, include the following information:
- Server ID, name, or IP address
- Clear description of what you observed (system behavior, error messages, timing)
- SMART output if available
- Current RAID array status if you were able to check it
- Approximate time when the issue started
- Any steps you performed before opening the ticket
- Disk serial number and slot number if available. If the disk is not recognized by the system, provide serial numbers of all other visible disks.
The INTROSERV Client Area includes general information about your server configuration. However, detailed diagnostic data such as disk serial numbers, SMART output, and RAID status should be collected manually using the tools described above.
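Collecting this data manually can be scripted so that the output is ready to paste into a ticket. The following is a minimal sketch for a Linux server: the command list is an example only (/dev/sda is a placeholder device, and smartctl and software RAID may not be present on your system), and the function names run_command and build_report are hypothetical.

```python
import subprocess

# Example commands only; device names and available tools differ per server.
COMMANDS = [
    ["lsblk", "-o", "NAME,SERIAL,SIZE,MODEL"],
    ["cat", "/proc/mdstat"],
    ["smartctl", "-a", "/dev/sda"],
]


def run_command(argv):
    """Run one command and return its output, or a note if it cannot be run."""
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=60, check=False
        )
    except (OSError, subprocess.TimeoutExpired) as error:
        return f"(could not run: {error})"
    return (result.stdout + result.stderr).strip()


def build_report(commands, runner=run_command):
    """Return one text block with a header line per command."""
    sections = []
    for argv in commands:
        sections.append("$ " + " ".join(argv) + "\n" + runner(argv))
    return "\n\n".join(sections)
```

Calling print(build_report(COMMANDS)) as root produces a single text block. The runner parameter exists only so the report format can be checked without touching real hardware.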
How Our Engineers Handle Disk Issues
When we receive your support ticket, our engineering team follows a structured workflow to diagnose and resolve your issue:
Step 1: Hardware status review - With your permission, our engineers connect to your server via IPMI or the operating system to check the hardware status and verify the condition of each disk.
Step 2: Component assessment - With your permission to access the OS, IPMI, or RAID utility, we identify the faulty disk and confirm which disks are functioning normally.
Step 3: Recovery determination - Based on the assessment, we determine whether a disk replacement, RAID rebuild, or other recovery procedure is required. We then inform you of our findings and wait for your confirmation before proceeding. This gives you time to back up data or perform any other actions if needed.
Step 4: Maintenance coordination - If work requires downtime, we coordinate a maintenance window with you to minimize disruption.
Step 5: Implementation and reporting - Depending on the issue, our administrators resolve it remotely, or our data center technicians perform physical hardware replacement. After completion, we provide you with a detailed report of the actions taken and the results.
This approach ensures that your issue is handled predictably and transparently, and you always know what is happening with your server.
Disk Replacement Procedures
When a disk requires replacement, our data center technicians perform the work. Most INTROSERV servers support hot-swap disk replacement, which allows the disk to be replaced without shutting down your server. If the replacement requires shutting down the server, we will coordinate a suitable maintenance time with you.
After a replacement, the RAID array must rebuild. The rebuild duration depends on disk size and RAID configuration. During the rebuild process, your server remains operational, but performance may fluctuate. We recommend avoiding heavy workloads during this time unless absolutely necessary.
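To give a rough sense of how disk size drives rebuild duration, the time to rewrite one full disk can be estimated as capacity divided by sustained rebuild rate. The sketch below is a lower-bound estimate under stated assumptions: the rebuild rate is a value you supply (it is not an INTROSERV figure), and real rebuilds are usually slower under production load or with parity RAID levels.

```python
def estimate_rebuild_hours(disk_size_tb, rebuild_rate_mb_s):
    """Lower-bound estimate: hours to write one full disk at a sustained rate."""
    if rebuild_rate_mb_s <= 0:
        raise ValueError("rebuild rate must be positive")
    size_bytes = disk_size_tb * 1e12        # decimal terabytes, as on drive labels
    rate_bytes_s = rebuild_rate_mb_s * 1e6  # decimal megabytes per second
    return size_bytes / rate_bytes_s / 3600
```

For example, a 4 TB disk at an assumed sustained 100 MB/s gives 4e12 / 1e8 = 40,000 seconds, or roughly 11 hours, which is why avoiding heavy workloads during the rebuild helps it finish sooner.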
Recovery Procedures for Severe RAID Failures
If a RAID array cannot be rebuilt, our engineers perform a detailed assessment of all disks to determine the best recovery approach. Depending on your hardware condition and server configuration, we may:
- Attempt partial data recovery - Retrieve accessible data from undamaged disk sections.
- Prepare replacement infrastructure - Set up a new server or storage environment and assist with data migration.
- Restore from backups - If you have INTROSERV backup services enabled, we can restore your data from your backup storage.
Our engineers will discuss the best approach with you based on your specific situation.
Filesystem Repair
If your disks and RAID system are functioning but the filesystem has become corrupted, diagnostic tools such as fsck (filesystem check) may help restore access. We recommend the following approach:
- Use single-user mode - Run filesystem repair in single-user (or rescue) mode, with the affected filesystem unmounted, to minimize the risk of further damage from concurrent system activity.
- Follow our guidance carefully - Incorrect parameters or repeated repair attempts can cause additional data damage. Our support team will provide step-by-step instructions if you choose to perform this yourself, or we can perform it for you.
- Contact support for complex cases - If standard repair tools do not resolve the issue, contact support for assistance.
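Before any repair attempt, it can help to confirm whether the system has already remounted a filesystem read-only, since that is worth stating in the ticket. The sketch below parses text in the /proc/mounts format on Linux. The function name and the /dev/ prefix filter are illustrative assumptions; some read-only mounts (for example, loop-mounted images) are normal, so the result needs interpretation.

```python
def read_only_mounts(mounts_text, device_prefix="/dev/"):
    """Return mount points of block devices currently mounted read-only."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fs_type, options = fields[:4]
        # Compare whole options, so "errors=remount-ro" is not mistaken for "ro".
        if device.startswith(device_prefix) and "ro" in options.split(","):
            found.append(mount_point)
    return found
```

On a server this could be called as read_only_mounts(open("/proc/mounts").read()). The check only reads state and does not modify the filesystem, so it is safe to run before following the repair guidance above.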
Data Protection Through Backups
Backups are the most effective protection against permanent data loss. INTROSERV provides comprehensive backup services that automatically save your data to separate infrastructure, completely independent from your main server hardware. This separation means that disk failures on your primary server do not affect your backup copies.
How to use INTROSERV backups:
- Order and enable backup services through the Client Area
- Configure automatic backup schedules based on your needs
- Restore data directly through the Client Area, or request assistance from our support team
If you do not currently have backups enabled, we strongly recommend enabling them. This provides the best protection for your critical data.
Summary
INTROSERV provides the infrastructure, tools, and expert support needed to keep disk-related risks under control. Our combination of enterprise hardware, rapid disk replacement capabilities, and clear recovery procedures helps minimize downtime and protect your data. When issues occur, our engineering team is available 24/7 to assist you. Paired with INTROSERV backup services, this approach provides comprehensive protection for your critical information.