Linux Recovery: Cannot SSH to Linux VM due to file system errors (fsck, inodes)

When a Linux VM requires fsck to repair file system issues, manual intervention is required. Below are four examples showing how to identify file system issues by looking at the boot diagnostics for a given VM under:
Virtual Machines > VMNAME > All settings > Boot diagnostics
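
If you prefer a shell, the same serial log can be retrieved with the newer az CLI, assuming it is installed and you are logged in; MyResourceGroup and MyLinuxVM are placeholder names:

    # Fetch the serial console output that Boot diagnostics captures
    az vm boot-diagnostics get-boot-log --resource-group MyResourceGroup --name MyLinuxVM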

Example (1)

Checking all file systems.

[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda1

/dev/sda1 contains a file system with errors, check forced.

/dev/sda1: Inodes that were part of a corrupted orphan linked list found.

/dev/sda1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY

Example (2)

EXT4-fs (sda1): INFO: recovery required on readonly filesystem

EXT4-fs (sda1): write access will be enabled during recovery

EXT4-fs warning (device sda1): ext4_clear_journal_err:4531: Filesystem error recorded from previous mount: IO failure

EXT4-fs warning (device sda1): ext4_clear_journal_err:4532: Marking fs in need of filesystem check.

Example (3)

[   14.252404] EXT4-fs (sda1): Couldn't remount RDWR because of unprocessed orphan inode list.  Please unmount/remount instead

An error occurred while mounting /.

Example (4) - This one in particular is the result of a clean fsck. In this specific case there are also additional data disks attached to the VM (/dev/sdc1 and /dev/sde1)

Checking all file systems.

[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda1

/dev/sda1: clean, 65405/1905008 files, 732749/7608064 blocks

[/sbin/fsck.ext4 (1) -- /tmp] fsck.ext4 -a /dev/sdc1

[/sbin/fsck.ext4 (2) -- /backup] fsck.ext4 -a /dev/sde1

/dev/sdc1: clean, 12/1048576 files, 109842/4192957 blocks

/dev/sde1: clean, 51/67043328 files, 4259482/268173037 blocks

To recover the VM to a normal state, you will need to delete the inaccessible VM while keeping its OS disk, and then deploy a new recovery VM using the same Linux distribution and version as the inaccessible VM.

NOTE: We highly recommend making a backup of the VHD from the inaccessible VM before going through the recovery steps. You can make a backup of the VHD by using Microsoft Storage Explorer, available at https://storageexplorer.com
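
Alternatively, the VHD blob can be copied server-side from a shell. The sketch below is illustrative only and assumes the newer az CLI; mystorageaccount, vhds, backups, and osdisk.vhd are placeholder names:

    # Start a server-side copy of the OS disk blob into a backup container
    az storage blob copy start --account-name mystorageaccount \
        --source-container vhds --source-blob osdisk.vhd \
        --destination-container backups --destination-blob osdisk-backup.vhd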

The steps are described below:

A = Original VM (Inaccessible VM)
B = New VM (New Recovery VM)

  1. Stop VM A via the Azure Portal.
  2. For a Resource Manager VM, we recommend saving the current VM information before deleting it (a JSON-dump sketch follows this list):
    • Azure CLI: azure vm show ResourceGroupName LinuxVmName > ORIGINAL_VM.txt
    • Azure PowerShell: Get-AzureRmVM -ResourceGroupName $rgName -Name $vmName
  3. Delete VM A BUT select “keep the attached disks”.
    NOTE: The option to keep the attached disks is only available for classic deployments; for Resource Manager, deleting a VM always keeps its OS disk by default.
  4. Once the lease is cleared, attach the OS disk from VM A as a data disk to VM B via the Azure Portal: Virtual Machines > select “B” > Attach Disk.
  5. On VM “B” the disk will eventually attach, and you can then mount it (a mount sketch follows this list).
  6. Locate the drive name to mount: on VM “B”, look in the relevant log file; note that each Linux distribution is slightly different (a distribution-neutral lsblk sketch also follows this list).
    • grep SCSI /var/log/kern.log  (ubuntu, debian)
    • grep SCSI /var/log/messages  (centos, suse, oracle, redhat)
  7. You will not yet be able to mount the file system, so you first need to identify the partition on which to run the disk check.
    • sudo -i

    • fdisk -l (this will list the attached disks; use it together with df -h). Sample outputs from both commands:
      # fdisk -l
      Disk /dev/sdc: 32.2 GB, 32212254720 bytes
      255 heads, 63 sectors/track, 3916 cylinders
      Units = cylinders of 16065 * 512 = 8225280 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
      Disk identifier: 0x000c23d3
      Device Boot      Start         End      Blocks   Id  System
      /dev/sdc1   *           1        3789    30432256   83  Linux
      /dev/sdc2            3789        3917     1024000   82  Linux swap / Solaris

      # df -h
      Filesystem      Size  Used Avail Use% Mounted on
      /dev/sda1        29G  2.2G   25G   9% /
      tmpfs           776M     0  776M   0% /dev/shm
      /dev/sdb1        69G  180M   66G   1% /mnt/resource

      After looking at the output of the above commands we can see that sda1 and sdb1 are mounted as part of the local OS; sdc1 is not mounted, so in this case we will run fsck against /dev/sdc1.

      NOTE: Prior to running fsck, please capture the following data and send the Microsoft Support Engineer the *.log files (sdc and sdc1 are used as examples):

      fdisk -l /dev/sdc > /var/tmp/fdisk_before.log
      dumpe2fs /dev/sdc1 > /var/tmp/dumpe2fs_before.log
      tune2fs -l /dev/sdc1 > /var/tmp/tune2fs_before.log
      e2fsck -n /dev/sdc1 > /var/tmp/e2fsck_before.log

      Now proceed to run fsck on the desired partition

    • fsck -yM /dev/sdc1
      fsck from util-linux-ng 2.17.2
      e2fsck 1.41.12 (17-May-2010)
      /dev/sdc1: clean, 57029/1905008 files, 672768/7608064 blocks

  8. Detach the disk from VM B via the Azure portal.
  9. Recreate the original VM A from the repaired VHD.
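
As mentioned in step 2, a simple way to keep a full record of the VM configuration is to dump it as JSON before deleting. A minimal sketch using the newer az CLI; MyResourceGroup, MyLinuxVM, and ORIGINAL_VM.json are placeholder names:

    # az returns JSON by default; redirect it to a file for safekeeping
    az vm show --resource-group MyResourceGroup --name MyLinuxVM > ORIGINAL_VM.json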
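
For step 6, if the kernel log has already rotated, lsblk is a distribution-neutral way to spot the newly attached disk; a minimal sketch (device names will differ on your VM):

    # List block devices with size, file system type, and mount point;
    # the attached recovery disk is the partition without a mount point
    lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT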
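
Once fsck in step 7 reports the file system as clean, you can mount the partition to verify the repair before detaching the disk. A minimal sketch, assuming /dev/sdc1 as in the example above and a hypothetical mount point /mnt/recovery:

    # Mount the repaired partition at a temporary mount point and inspect it
    mkdir -p /mnt/recovery
    mount /dev/sdc1 /mnt/recovery
    ls /mnt/recovery
    # Unmount before detaching the disk in the Azure portal (step 8)
    umount /mnt/recovery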

For a Classic VM:

Recreate the original VM A (Create VM from Gallery, Select My Disks); you will see the disk referring to VM A. Select the original Cloud Service name.

For a Resource Manager VM you will need to use either PowerShell or the Azure CLI tools; the articles below have steps to recreate a VM from its original VHD:

Azure PowerShell: How to delete and re-deploy a VM from VHD
Azure CLI: How to delete and re-deploy a VM from VHD
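
As a rough illustration only (the linked articles remain the authoritative steps), a VM can be re-created from an existing unmanaged VHD with the newer az CLI; the resource names and the VHD URI below are placeholders:

    # Create a new VM that boots from the repaired, unmanaged OS disk VHD
    az vm create --resource-group MyResourceGroup --name MyLinuxVM \
        --attach-os-disk https://mystorageaccount.blob.core.windows.net/vhds/osdisk.vhd \
        --os-type linux --use-unmanaged-disk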