Received an e-mail alert for one of my SQL Servers:
SQL Server Alert System: ‘Error Number 825’ occurred on < my SQL Server >
DESCRIPTION: A read of the file ‘ < my SQL Server Database file > ’ at offset 0x00003514ec6000 succeeded after failing 3 time(s) with error: 1167(The device is not connected.). Additional messages in the SQL Server error log and system event log may provide more detail. This error condition threatens database integrity and must be corrected. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
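The alert text carries three useful values: the file offset, the number of failed read attempts, and the operating system error code. As a rough sketch (not part of the original troubleshooting), the Python below pulls those values out of the message and converts the offset to a page number, assuming the standard 8 KB SQL Server data page size. The function name `parse_read_retry` and the regular expression are my own illustrations, written against the wording of the alert shown above.

```python
import re

PAGE_SIZE = 8192  # assumption: standard 8 KB SQL Server data page

# Matches the variable parts of the Error 825 text quoted in the alert.
READ_RETRY = re.compile(
    r"at offset (0x[0-9a-fA-F]+) succeeded after failing (\d+) time\(s\) "
    r"with error: (\d+)"
)


def parse_read_retry(message: str) -> dict:
    """Extract offset, page number, retry count, and OS error code.

    Hypothetical helper for illustration; raises ValueError when the
    message does not look like a read-retry (825) alert.
    """
    match = READ_RETRY.search(message)
    if match is None:
        raise ValueError("not a read-retry (825) message")
    offset = int(match.group(1), 16)
    return {
        "offset": offset,
        "page": offset // PAGE_SIZE,  # page within the data file
        "retries": int(match.group(2)),
        "os_error": int(match.group(3)),
    }


# Fragment copied from the alert above.
alert = (
    "at offset 0x00003514ec6000 succeeded after failing 3 time(s) "
    "with error: 1167(The device is not connected.)."
)
info = parse_read_retry(alert)
print(info)
```

The OS error code (1167 here) is often the quickest clue: "The device is not connected" points toward the I/O path rather than toward the contents of the database file.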
- Viewed the SQL Server Error Log and found several entries for other databases, all logged within the same minute, that began:
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file …
- Ran a DBCC CHECKDB on the database reported in the e-mail alert (and on all other databases in the instance). The CHECKDB returned without finding errors:
CHECKDB found 0 allocation errors and 0 consistency errors in database …
- Asked the Server Administrator to look at the Operating System, and he reported back that in the Windows Disk Manager, the disks were labeled “(At Risk)”.
It took the Storage Administrator to find the actual problem. The disks were on a SAN (Storage Area Network), and every disk on it reported healthy. That meant my databases’ data was intact, which explains why the CHECKDBs came back clean.
The Storage Administrator did find that, at the same time SQL Server was reporting the errors, the storage switch supporting that SAN had reported it was overloaded and had written errors to its own error log. To resolve the issue, we had to open a case with the vendor for that storage switch.