Linux Recovery: Cannot SSH to Linux VM due to file system errors (fsck, inodes)

When a Linux VM requires fsck to repair file system issues, manual intervention is required. Below are four examples showing how to identify file system issues by looking at the boot diagnostics for a given VM under:
Virtual Machines > VMNAME > All settings > Boot diagnostics
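
If you prefer a shell, the same serial log can be retrieved with the newer az CLI, assuming it is installed and you are logged in; MyResourceGroup and MyLinuxVM are placeholder names:

    # Fetch the serial console output that Boot diagnostics captures
    az vm boot-diagnostics get-boot-log --resource-group MyResourceGroup --name MyLinuxVM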

Example (1)

Checking all file systems.

[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda1

/dev/sda1 contains a file system with errors, check forced.

/dev/sda1: Inodes that were part of a corrupted orphan linked list found.

/dev/sda1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY

Example (2)

EXT4-fs (sda1): INFO: recovery required on readonly filesystem

EXT4-fs (sda1): write access will be enabled during recovery

EXT4-fs warning (device sda1): ext4_clear_journal_err:4531: Filesystem error recorded from previous mount: IO failure

EXT4-fs warning (device sda1): ext4_clear_journal_err:4532: Marking fs in need of filesystem check.

Example (3)

[   14.252404] EXT4-fs (sda1): Couldn't remount RDWR because of unprocessed orphan inode list.  Please unmount/remount instead

An error occurred while mounting /.

Example (4) - This one in particular is the result of a clean fsck. In this specific case there are also additional data disks attached to the VM (/dev/sdc1 and /dev/sde1)

Checking all file systems.

[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda1

/dev/sda1: clean, 65405/1905008 files, 732749/7608064 blocks

[/sbin/fsck.ext4 (1) -- /tmp] fsck.ext4 -a /dev/sdc1

[/sbin/fsck.ext4 (2) -- /backup] fsck.ext4 -a /dev/sde1

/dev/sdc1: clean, 12/1048576 files, 109842/4192957 blocks

/dev/sde1: clean, 51/67043328 files, 4259482/268173037 blocks

To recover the VM to a normal state, you will need to delete the inaccessible VM while keeping its OS disk, and then deploy a new recovery VM using the same Linux distribution and version as the inaccessible VM.

NOTE: We highly recommend making a backup of the VHD from the inaccessible VM before going through the recovery steps. You can make a backup of the VHD by using Microsoft Storage Explorer, available at https://storageexplorer.com
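
Alternatively, the VHD blob can be copied server-side from a shell. The sketch below is illustrative only and assumes the newer az CLI; mystorageaccount, vhds, backups, and osdisk.vhd are placeholder names:

    # Start a server-side copy of the OS disk blob into a backup container
    az storage blob copy start --account-name mystorageaccount \
        --source-container vhds --source-blob osdisk.vhd \
        --destination-container backups --destination-blob osdisk-backup.vhd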

The steps are described below:

A = Original VM (Inaccessible VM)
B = New VM (New Recovery VM)

  1. Stop VM A via the Azure Portal.
  2. For a Resource Manager VM, we recommend saving the current VM information before deleting it (a JSON-dump sketch follows this list):
    • Azure CLI: azure vm show ResourceGroupName LinuxVmName > ORIGINAL_VM.txt
    • Azure PowerShell: Get-AzureRmVM -ResourceGroupName $rgName -Name $vmName
  3. Delete VM A BUT select “keep the attached disks”.
    NOTE: The option to keep the attached disks is only available for classic deployments; for Resource Manager, deleting a VM always keeps its OS disk by default.
  4. Once the lease is cleared, attach the OS disk from VM A as a data disk to VM B via the Azure Portal: Virtual Machines > select “B” > Attach Disk.
  5. On VM “B” the disk will eventually attach, and you can then mount it (a mount sketch follows this list).
  6. Locate the drive name to mount: on VM “B”, look in the relevant log file; note that each Linux distribution is slightly different (a distribution-neutral lsblk sketch also follows this list).
    • grep SCSI /var/log/kern.log  (ubuntu, debian)
    • grep SCSI /var/log/messages  (centos, suse, oracle, redhat)
  7. You will not yet be able to mount the file system, so you first need to identify the partition on which to run the disk check.
    • sudo -i

    • fdisk -l (this will list the attached disks; use it together with df -h). Sample outputs from both commands:
      # fdisk -l
      Disk /dev/sdc: 32.2 GB, 32212254720 bytes
      255 heads, 63 sectors/track, 3916 cylinders
      Units = cylinders of 16065 * 512 = 8225280 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
      Disk identifier: 0x000c23d3
      Device Boot      Start         End      Blocks   Id  System
      /dev/sdc1   *           1        3789    30432256   83  Linux
      /dev/sdc2            3789        3917     1024000   82  Linux swap / Solaris

      # df -h
      Filesystem      Size  Used Avail Use% Mounted on
      /dev/sda1        29G  2.2G   25G   9% /
      tmpfs           776M     0  776M   0% /dev/shm
      /dev/sdb1        69G  180M   66G   1% /mnt/resource

      After looking at the output of the above commands we can see that sda1 and sdb1 are mounted as part of the local OS; sdc1 is not mounted, so in this case we will run fsck against /dev/sdc1.

      NOTE: Prior to running fsck, please capture the following data and send the Microsoft Support Engineer the *.log files (sdc and sdc1 are used as examples):

      fdisk -l /dev/sdc > /var/tmp/fdisk_before.log
      dumpe2fs /dev/sdc1 > /var/tmp/dumpe2fs_before.log
      tune2fs -l /dev/sdc1 > /var/tmp/tune2fs_before.log
      e2fsck -n /dev/sdc1 > /var/tmp/e2fsck_before.log

      Now proceed to run fsck on the desired partition

    • fsck -yM /dev/sdc1
      fsck from util-linux-ng 2.17.2
      e2fsck 1.41.12 (17-May-2010)
      /dev/sdc1: clean, 57029/1905008 files, 672768/7608064 blocks

  8. Detach the disk from VM B via the Azure portal.
  9. Recreate the original VM A from the repaired VHD.
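
As mentioned in step 2, a simple way to keep a full record of the VM configuration is to dump it as JSON before deleting. A minimal sketch using the newer az CLI; MyResourceGroup, MyLinuxVM, and ORIGINAL_VM.json are placeholder names:

    # az returns JSON by default; redirect it to a file for safekeeping
    az vm show --resource-group MyResourceGroup --name MyLinuxVM > ORIGINAL_VM.json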
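
For step 6, if the kernel log has already rotated, lsblk is a distribution-neutral way to spot the newly attached disk; a minimal sketch (device names will differ on your VM):

    # List block devices with size, file system type, and mount point;
    # the attached recovery disk is the partition without a mount point
    lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT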
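
Once fsck in step 7 reports the file system as clean, you can mount the partition to verify the repair before detaching the disk. A minimal sketch, assuming /dev/sdc1 as in the example above and a hypothetical mount point /mnt/recovery:

    # Mount the repaired partition at a temporary mount point and inspect it
    mkdir -p /mnt/recovery
    mount /dev/sdc1 /mnt/recovery
    ls /mnt/recovery
    # Unmount before detaching the disk in the Azure portal (step 8)
    umount /mnt/recovery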

For a Classic VM:

Recreate the original VM A (Create VM from Gallery, Select My Disks); you will see the disk referring to VM A. Select the original Cloud Service name.

For a Resource Manager VM you will need to use either PowerShell or the Azure CLI tools; the articles below have steps to recreate a VM from its original VHD:

Azure PowerShell: How to delete and re-deploy a VM from VHD
Azure CLI: How to delete and re-deploy a VM from VHD
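
As a rough illustration only (the linked articles remain the authoritative steps), a VM can be re-created from an existing unmanaged VHD with the newer az CLI; the resource names and the VHD URI below are placeholders:

    # Create a new VM that boots from the repaired, unmanaged OS disk VHD
    az vm create --resource-group MyResourceGroup --name MyLinuxVM \
        --attach-os-disk https://mystorageaccount.blob.core.windows.net/vhds/osdisk.vhd \
        --os-type linux --use-unmanaged-disk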