Monday 3 August 2015

The 497 Day Uptime Bug

Issue

  1. TCP/IP ports that are in the TIME_WAIT state are not closed once the system has been up for 497 days. As a result, TCP/IP ports may be exhausted and new TCP/IP sessions may not be created.
  2. TCP/IP Chimney Offload fails after 248.5 days of uptime. As a result, systems that use offloaded connections will stop responding after 248.5 days.

Resolution

  • Schedule server reboots frequently enough that uptime never reaches 248.5 days (or 497 days where offloading is not used)
  • Apply a hotfix

Background

The reason 497 is a problem number is the use of a 32-bit counter to record uptime. If you record a tick for every 10 msec of uptime, a 32-bit counter will overflow after approximately 497.1 days. This is because a 32-bit counter can hold 2^32 = 4,294,967,296 distinct values, and at one tick every 10 msec the system generates 8,640,000 ticks per day (100*60*60*24). So after 4,294,967,296 / 8,640,000 = 497.102696 days, the counter will overflow.

Some systems have a problem at 248.551348 days (half of that) because they store the value in a signed 32-bit integer, which leaves only 31 bits for the count (2^31 = 2,147,483,648 ticks).
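
The arithmetic above can be checked with a short script. This is only an illustrative sketch: the 10 msec tick comes from the text, while the helper names (days_until_overflow, tick) are made up for this example and do not correspond to any vendor's code.

```python
# Sketch of the overflow arithmetic; all names here are hypothetical.
TICK_MS = 10                                        # tick length from the text
TICKS_PER_DAY = (1000 // TICK_MS) * 60 * 60 * 24    # 100 * 60 * 60 * 24


def days_until_overflow(bits: int) -> float:
    """Days until a counter with `bits` usable value bits runs out of range."""
    return (2 ** bits) / TICKS_PER_DAY


def tick(counter: int, bits: int = 32) -> int:
    """Advance an unsigned counter by one tick, wrapping at 2**bits."""
    return (counter + 1) & ((1 << bits) - 1)


unsigned_days = days_until_overflow(32)   # all 32 bits hold the count
signed_days = days_until_overflow(31)     # sign bit leaves 31 bits

print(f"ticks per day:      {TICKS_PER_DAY:,}")
print(f"unsigned 32-bit:    {unsigned_days:.6f} days")
print(f"signed 32-bit:      {signed_days:.6f} days")
print(f"tick after maximum: {tick(2**32 - 1)}")
```

The last line shows what the overflow means in practice: one tick after the maximum value, the unsigned counter is back at 0, so any code that assumes the tick count only ever grows will see time appear to jump backwards.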

Note that this bug does not affect only Microsoft products. Other vendors and platforms affected include Avaya, Brocade, Cisco, EMC, QLogic and VAX/VMS.

Links