SpreadsheetML Dates


Recently a couple people have questioned the decision to leave the legacy Excel date behavior in the Open XML formats. This was primarily triggered by a post from IBM’s Rob Wier. While those folks have referred to it as an Excel bug, it’s actually something that was purposely designed into Excel since the beginning. Whenever you work on a feature, you eventually get to a point where you have to make difficult decisions and there are positives and negatives on each side. Let’s go though a couple points here around this date issue: (1) how the dates are stored; (2) history of the bug; (3) backwards compatibility concerns; (4) drawbacks.


How dates are stored


Let’s first look at the basic issue of how dates are stored in file formats.


When it comes to the storage of dates, there are a number of different approaches you could take. The key issue to first decide is whether you want to store it in a presentation friendly format, or one designed for easy storage and processing. For example, you could just store it as it’s displayed to the user (ie “10/23/2006” or “October 23, 2006”), or maybe using ISO 8601. This could actually be a fine approach for certain types of formats (like a wordprocessing application, or maybe for metadata), but in the case of a spreadsheet that’s probably not the best way to go. In order to quickly consume and create the files, you should probably choose one simple consistent storage method as you may be dealing with hundreds of thousands of dates in one file and you don’t want to have to deal with any complicated parsing.


In the case of spreadsheetML, where you have the possibility of hundreds of thousands (if not millions) of dates existing in a single workbook, you need to think carefully about how those dates are going to be used. You also need to think about the fact that there may be formulas that take those dates into account. For example, you might have a table like this:





























Project Number


Start Date


End Date


Daily Cost


Total Cost


1


9/13/2006


9/27/2006


$5,000.00


$70,000


2


9/20/2006


10/3/2006


$1,000.00


$13,000


The “total cost” column will have a function that is essentially:


=([end date] – [start date]) * [daily cost]


The simplest way to store these dates (both internally and in the formats) is to just pick a specific date as the base date and then every other date is just an index from there. This is similar to the Julian date system. So, if you decided that 9/13/2006 was the base date, then the data stored in the table above would be:





























Project Number


Start Date


End Date


Daily Cost


Total Cost


1


1


15


$5,000.00


$70,000


2


8


21


$1,000.00


$13,000


This makes the storage and parsing of the dates a lot easier, and it also makes any formulas that work with dates easy to work with. The dates are always just stored as an index, and then it’s just a matter of formatting to determine how the date is presented to the user. So the following dates (9/13/2006; Sept. 13 2006; Wednesday, September 13, 2006) are all the exact same values, they are just displayed to the user differently. This approach also has the advantage of working with other calendar systems since the storage of the date doesn’t account for months, years, or days of the week, but is instead just a counter of how many days from a given date have passed. Then any unit of measurement smaller than a day (hours, minutes, seconds, milliseconds, etc.) are just a fraction. So 1 second would be 1.157e-5


History of the date “bug”


It’s because of this “index” approach that the bug in question presents itself. Excel uses January 1, 1900 as the base date, meaning January 1, 1900 is stored as “1”. The problem is that 1900 was not a leap year because while it passes the first test of being divisible by 4, it doesn’t pass the second test which is that it must a multiple of 100 that is not divisible by 400. 2000 actually was a leap year, but 2100 will not be. Rob Wier’s blog post has some great information on the history of this behavior in our modern calendar system.


Unfortunately this bug occurs because, February 29, 1900 is actually treated as a date when the indexes are calculated. This means that every date after February 29, 1900 is off by one (or you could also say that the base date was supposed to be 12/31/1899 and every day before March 1, 1900 is off by one). This is only the case for the year 1900 though. 2100 and on are treated properly. Why did this happen in the first place? It’s actually covered pretty well in the following Microsoft KB article:


When Lotus 1-2-3 was first released, the program assumed that the year 1900 was a leap year, even though it actually was not a leap year. This made it easier for the program to handle leap years and caused no harm to almost all date calculations in Lotus 1-2-3.

When Microsoft Multiplan and Microsoft Excel were released, they also assumed that 1900 was a leap year. This assumption allowed Microsoft Multiplan and Microsoft Excel to use the same serial date system used by Lotus 1-2-3 and provide greater compatibility with Lotus 1-2-3. Treating 1900 as a leap year also made it easier for users to move worksheets from one program to the other.

Although it is technically possible to correct this behavior so that current versions of Microsoft Excel do not assume that 1900 is a leap year, the disadvantages of doing so outweigh the advantages.

If this behavior were to be corrected, many problems would arise, including the following:



  • Almost all dates in current Microsoft Excel worksheets and other documents would be decreased by one day. Correcting this shift would take considerable time and effort, especially in formulas that use dates.

  • Some functions, such as the WEEKDAY function, would return different values; this might cause formulas in worksheets to work incorrectly.

  • Correcting this behavior would break serial date compatibility between Microsoft Excel and other programs that use dates.

If the behavior remains uncorrected, only one problem occurs:



  • The WEEKDAY function returns incorrect values for dates before March 1, 1900. Because most users do not use dates before March 1, 1900, this problem is rare.

NOTE: Microsoft Excel correctly handles all other leap years, including century years that are not leap years (for example, 2100). Only the year 1900 is incorrectly handled.


Backward Compatibility


While it’s no longer as important to maintain compatibility with Lotus 1-2-3, it is important to maintain compatibility with the billions of existing spreadsheet files out there. Barclay’s Capital was one of the members of the Ecma TC45, and they helped make it crystal clear how important this backwards compatibility was. Imagine how upset our customers would be if we changed the values returned from functions like =WEEKDAY.


I think that one of the reasons some folks think this is just a “stupid decision” made by TC45 is that they aren’t looking at all the places where the behavior of a spreadsheet could be affected. Remember, we don’t just want the date values to be interoperable, but spreadsheet functions have to be as well. We spent a lot of time documenting how every single spreadsheet function would work so any application can come along and properly consume and generate SpreadsheetML files.


If we decided to fix this bug and shift each date value down by one, how many spreadsheet formulas out there would we break? Here’s a really simple example, where the following function that had worked in previous versions would no longer work:


=IF(TODAY()=39013, “Due Today!”, “Not Due Today!”)


I will admit that the above function is probably not the best approach for comparing dates, but it’s something that’s always worked. Believe me, I’ve seen much crazier stuff in spreadsheets. We can’t control how our customers use the product, and if something like this worked before, we can’t break it now. If we changed our behavior so that 39013 no longer represented 10/23/06 and instead represented 10/24/06, that would completely break the function above.


It’s true that we could probably write a solution that parses through all functions out there and tries to determine if the number in the function is supposed to represent a date, but that would be seriously buggy, and to be honest it wouldn’t be worth it.


Drawbacks?


One way I like to think about this whole issue is to look at this the other way. Let’s instead say that the base date system for SpreadsheetML was supposed to start on December 31, 1899. There’s just a bug where dates between then and March 1, 1900 are off by one. So while we have this odd behavior for the first two months of the last century, everything after March 1, 1900 is stored correctly. I’m actually only half kidding here… since the date is just stored as in index, there is no real problem caused by this bug, other than maybe looking a bit foolish. In fact, OpenOffice has a behavior vary similar to this. In OpenOffice and Microsoft Excel, the numerical value for 9/13/2006 is equivalent. The numberical values for dates map all the way back to March 1, 1900; and then from that point back they are off by one (so in OpenOffice, 12/31/1899 maps to “1”). So the key thing to notice is that this behavior is fully documented, and there is no reason it should prevent anyone from properly consuming and generating valid SpreadsheetML files.


As I said in some earlier comments, this did come up in the TC45 meetings fairly early on. We didn’t spend too much time on it though as it was pretty obvious that while it looked a bit silly, there was no harm. And we all recognized that if we did try to change this, it could seriously change the meaning of existing files.


Remember, this format was not an attempt at building the ultimate generic Office application file format from the ground up. If we did that, and didn’t take our customers existing documents into account, then none of our customers would use the format. I think at times people misunderstand the design goals behind the Office Open XML formats, and I’m sorry if I haven’t been clear enough on that. The Office Open XML formats were definitely designed with the existing base of legacy Microsoft Office file formats in mind, and it’s important to remember that when you look at the format. There are definitely designs in the Open XML formats that clearly show their origins. A key focus of TC45 though was to make sure those behaviors were completely implementable on any other platform. This is covered in more detail in the TC 45 whitepaper.


We not only wanted to create an open format that folks could build solutions on top of, but we wanted the format to be something that our customers would actually use… otherwise what’s the point? We didn’t want this to just be another optional format that only some people would use, it’s the new default format and we hope that all of our customers will use it.


-Brian

Comments (31)

  1. Brian –

    I’m sorry, but you really seem to be missing the fact that changing from an internal binary representation to an external XML representation means that you don’t have the problem you think you have.  There is NO NEED to change the internal storage of dates.  No need to fix the bug/feature/legacy kludge.  None.  All you need to do is convert explicit dates in the XML to the correct (in Excel’s worldview) internal representation.  In the case of the formula, the numeric representation of the date is indicative of an internal value, so the formula will keep working in Excel without trouble.  Granted, if another vendor implements Open XML and tries to use the formula, it would be off unless they dealt with the numeric change, but that is a bit like those people who want to squeeze in OS specific functionality and complain when it breaks on a new OS.  The standard doesn’t have to support that bug just because your internal representation does.  You can even explicitly specify that use of internal numbers may vary by implementation are are discouraged.  Then, no existing applications will break, and nobody who has used dates the way they should will ever have a problem even if they switch to a new vendor.   From your point of view, you have complete fidelity without changing the code.

    – Ben

  2. Ben, what’s the point of changing the storage of the dates if it completely breaks the interoperability of formulas. It’s extremely important that formulas can work across multiple applications. One of the key reasons people use spreadsheets is for the ability to put a bunch of data into a grid and then perform calculations on that data. If you take that spreadsheet to another application (like OpenOffice) adn the values of basic date functions are different, then you have a serious problem.

    Do you really feel it would have been worth breaking the portability of formulas just to fix this minor date issue? The date issue doesn’t break interoperability in any way. Your suggestion around formulas would completely break it.

    -Brian

  3. sinleeh says:

    Can you elaborate more on the formula problem by giving an example?

    Coz I cannot see it as a problem Taking the formula "number_of_days = end_day – start_day". If I put end_day and start_day in a cell formula, it is simply a matter of the application translating to-and-from 1990-is-leap-year calendar when converting from diskfile, i.e., XML to application’s internal representation.  This is the job of the application developer.

    If I use "number_of_days = end_day – 12/10/3", i.e. date constant in formula, the formula will still be interpreted correctly.

    Unless of course you have idiots that put "number_of_days = end_days – 12442", where 12442 is the actual date index. In this case, it is rightly that the formula fails and the formula writer must take all the blame for writing idiotic date-based formula. In this case, I will also argued that the person had an assumption that the incorrect date index will not hurt him later, his fault. Thus, the arguement on why make everyone pay for his idiotic behaviour?

  4. Sinleeh,

    I wish it were as simple as you and Ben suggest, but it’s not. We can’t just tell our customers that they are idiots.

    Especially when that are doing something that has been supported since the first version of Excel shipped. I’m sorry but that’s just not an option.

    Again, this format is designed to fully support the existing base of binary documents out there. It’s not a format that’s designed to be the format to end all other formats. It’s fully documented so that anyone can use it, there is nothing that depends on a particular operating system or office application. Remember though that it’s an open standard that was designed to be fully compatible with the existing set of extremely valuable documents. If you build the ultimate general file format and no body uses it, what’s the point? Our customers would never use the formats if they broke existing formulas.

    There is no way we can predict what people are doing in their files and with their formulas. If you take the date 12/10/2004 in both Excel and in OpenOffice and you format that date as a number, you get “38331”. So are you suggesting we should change this so that in the new file formats 12/10/2004 is now equal to 38330 instead?

    The only inconsistency comes into play for 2 months (from 1/1/1900 to 3/1/1900). It sounds like you and Ben are suggesting that that inconsistency (which is super easy to workaround) is bad enough to actually cause real pain to customers either by breaking their existing formulas, or even worse, by making the standard treat dates in formulas differently that dates in cells. I’m sorry, but I just don’t see how that makes things better in any way.

    I understand the desire for everything to be perfect, but show me one piece of software or document format that exists today that’s perfect. Backwards compatibility with legacy solutions is important to us (but it means there are tradeoffs we had to make in the file format design). If you don’t care about that, then feel free to use another format. We aren’t the ones out there trying to mandate our format as the only option.

    BTW…
    In the SpreadsheetML format we first introduced back in Office XP (we started working on that about 8 years ago), we actually used the syntax that Rob Wier is suggesting. This is what a date would look like:
    <Cell ss:StyleID=”s62″><Data ss:Type=”DateTime”>2005-12-13T00:00:00.000</Data></Cell>

    The result of this though was that we found we kept running into problems with existing user functions (as I’ve described above), and the performance hit of reading and writing those dates wasn’t worth it. So as I said before, this is something we thought hard about and made a decision we felt was right for our customers. We then reviewed it with the Ecma TC45 and they came to the same conclusion.

    -Brian

  5. Brian,

    Please read back over our comments.  Neither of us is suggesting what you are suggesting.  We are both saying that you keep 12/10/2004 meaning 38331, if that is what it means now.  The internal representation doesn’t matter and should not be part of the spec, except perhaps to note that that is how you use it internally.  Since in XML you will represent a date as a date, and since even formulas that include an internal number will continue to work, you have no backwards compatibility problems.  Where is the problem?  You keep suggesting that you would change the representation.  At best, I guess you could say that you are doing what you suggested earlier and say 1 is equal to 12/31/1899, but since the internal number should not be used in the XML standard itself at all, even that is barely going to matter.  The reason to use dates as dates, not numbers, is to avoid the internal representation issues.

    Sp, if you leave the internal representation the same and assume 1 is 12/31/1899, then the only problems you introduce are people who used dates as numbers, which is weird and probably not ever officially supported, and even then only for 1900, since all others will work.  That is well worth doing to avoid adding a bug into a standard, no matter how "small" an issue you think that is.  

    – Ben

  6. sinleeh says:

    Dear Brian,

    Ben got it right when he says we both want you to keep 12/10/2004 as 38331 and this is of course your choice to keep it as 38331. It is your internal representation. Your choice. If you judge it too important to change, then don’t do it.

    It is also the standard committee’s  choice to decide to keep this "2 months" bug. However, I think you both missed an opportunity to correct this bug. This decision lets it drag on for a few more years.

    At present, it certainly looks and behave like a small problem. Unfortunately, by not correcting it, it just snowballed into a bigger problem. Everyone who writes/maintains applications to read SpreadsheetML must be aware of and take note of this, even if their application is not interested in supporting old binary formats.

    IMHO, if not canned soon, it will keep snowballing slowly until we end up with a big problem. I cannot speak for Barclays Capital but I do believe that opinion is short-sighted. And that will turn out to be false reassurance for this decision for SpreadsheetML.

    I agree it is not correct to publicly call one’s customer idiots. However, just between us, anyone who does date as index number is deserve it, unless he can justify it.

  7. orcmid says:

    I went through a similar analysis as yours essentially from scratch, not knowing the history and the TC45 background.

    I first thought that Rob Weir’s solution should work.  Then I realized there’s no automated way to tell which formulas in a spreadsheet are manipulating numbers as representations of dates since dates are just represented as numbers in OOX, and they are only *presented* as dates based on cell format, something formulas are indifferent to.   Furthermore, the computation might depend on representation by the 1900 base system with its too-high-by-one after Februay 28, 1900 feature, the alternative 1904 base system that is also part of OOX, or some scheme of the spreadsheet-author’s choosing.  

    Having OOX specify that 1900-base values for dates prior to March 1, 1900 are undefined would remove the glitch from interchange and even has WEEKDAY work correctly for all defined dates.   But the cost to interchange and preservation of existing uses of those early dates is a matter for the TC45’s wisdom to resolve, and I’ll throw in the towel at that point.  It’s a practical matter now, not one of esthetic preference, and I have no skin in that game.

    I am fond of Bob Bemer’s observation that "standards are arbitrary solutions to recurring problems."  The OOX approach definitely satisfies that condition with all ties given to preservation of legacy and current documents.  

    It’s important to appreciate that there is no way to automatically preserve legacy spreadsheets by anything that changes to a different base or an alternative system, like ISO 8601, as the interchanged representation.  That is because it is inherent in the "dates are just numbers" approach that you can never be sure when a formula on numbers depends on that number actually being a particular-base representation of a date.  

    This a practical, real demonstration of why it is not possible to have a universal document format that can represent all other document formats.  When models are different, they are different, and they might not be inter-convertible by inspection.  For example, the date and time-interval system in ODF uses a model that (because of its stronger typing of dates) is probably convertible to SpreadsheetML, but the reverse will be an interesting challenge, let us say.  It’s also why, as my little tests confirm, OpenOffice.org allows the same system codified in OOX.  That is the only way to safely accomplish reliable import-export with Excel, a matter of some importance for that product and any others for which portability with Excel is promised.   Pragmatism trumps ideology and programmer esthetics.

    I am sure there are some who will drag out the argument that adjusting the date base doesn’t matter to 99% of spreadsheets (you pick the percentage you like) and movement to another format and document model (OpenOffice, Dan Bricklin’s web spreadsheet, etc.) should be wonderful.  Well, developers of new systems are welcome to make a bet like that.  I completely understand why Microsoft dare not.

    Thanks for providing the full background on this fascinating subject.

  8. sinleeh says:

    Dear Dennis,

    If a system can choose between 1900-is-leap-year-based dates and 1904-based dates, then shouldn’t the same system be capable of handling any date system, including, 1900-not-leap-year-based dates?

    When faced with two date systems, most programmers will adopt one as the internal date system, and convert the other date system to another. If this is possible, then any date system is not a problem.

    Sometimes date system is tied down to hardware. For example,  the 1904-based system, is tied down to old Mac. As such, date conversions is not rocket science.

    "Pragmatism trumps ideology and programmer esthetics"

    True, but we are not talking ideology nor esthetics here. Having two months behaving strangely is a time-bomb waiting to explode. I treat them as hidden bug.

  9. hAl says:

    As I understand it any number used in a spreadsheet might or might not be a date. By converting documents to another format there is no way to tell is a number represents a number or a date (that is up to the customer who created the spreadsheet) so you cannot convert it to the new format.

    Actually we also have a legacy application using a numbered dateformat. I just checked and we do indeed put those numbers as they are in a spreadsheet.

    They list like:

    144001

    144002

    and so on

    Some weirdo has made us a spreadsheet function years ago to return that into a date so we aren’t excel optimally there but we could have used an easy transversion to an excel date number by simply subtracting 100000 or so.

    If we would have done that it would have been nearly impossible to notice that the original table is really the representing datenumbers. It could be wierd if we created a file in excel 2003 for years by subtracting 100602 days from the date column and when we changed over to excel 2007 we would have to subtract 100601 days especially if we didn’t know beforehand.

  10. Sinleeh,

    Dennis and hAl are correct here that any attempt to move from one system to another is extremely problematic in term of functions, conditional formatting, etc. They’ve done a much better job that I have in describing why this isn’t as simple as some folks assume.

    The 1904 issue was something we have already had to deal with and it’s very problematic. The last thing we want to do is insert a third date system that will be that would cause problems with all existing documents.

    This problem with dates is only limited to two months at the beginning of the year 1900. It doesn’t affect any other years going forward, so from there on out everything works fine. This hasn’t really come up as a big issue in the past 20+ years of Excel’s existence, so I don’t really see it ever snowballing into a bigger problem as you suggest. Especially since the exact behavior is so fully and clearly documented.

    -Brian

  11. orcmid says:

    Sinleeh,

    hAl and Brian have stated this again in enough ways that I don’t need to respond except for three important things:

    1. If a spreadsheet is flagged as using the 1904 system, it *can’t* be changed to the 1900-base system internally, even tough the 1904 base date range fits entirely inside the 1900-base range.  The only correct way to "deal with " the two systems (the phrase is from the TC45 Final Draft), is to actually implement both systems and use the specified one for all date-related functions and date-format cell preservations for the entire workbook.  Now you can see the problem with adding a third system to the dates-are-just-numbers model.  (I’ll speculate that it *might* work with introduction of genuine and time-interval datatypes, and that might be necessary for interchange with ODF and expansion of supported date cases in the future.)

    2. The serial date system is *not* just an internal representation.  It is the interchange representation.  And it is that because there is no difference between date cells as cells and numeric cells, as cells.  They are all numeric cells.  That is part of the model and it has been the model since the first Excel spreadsheet and the interchange of Excel spreadsheets among users, versions, and platforms (Mac and Windows).  Users know that and, as others have testified, users are at complete liberty to rely on that fact as *specified* *behavior*.  OOX basically ratifies that expectation, as it must.  What might happen from here on out is different, now that the legacy-preservation and interchange case has been handled (not counting bugs and ambiguites that will be found and expunged as the OOX specification is put into use in interchange).

    3. Now, if I was designing a spreadsheet system today, and I didn’t care about interchange with Excel, I might do something different, like the ODF approach to dates.  Even then, a calendar-neutral serial date might be very valuable in interchange, so long as the base date is known somehow.  The trade-off depends a lot on whether there is a dominant use case, and I don’t know what experience there is to draw upon.   (Readability is a great abstract requirement, and it would certainly aid in troubleshooting, so long as the date form recorded was one familiar to the troubleshooter, but I wonder if that can trump all of the usages where noone ever looks at the XML.)

  12. Rob says:

    So why does WEEKDAY() need to give the wrong answer for dates in 1900 prior to 1 March?  This looks like an independent error, or at least an error which could have been corrected without changing the date origin.

  13. orcmid says:

    PS: The 1900-base and 1904-base distinction is very real.  All of the date-related functions in the TC45 OOX Final Draft have two definitions: One for when the worksheet is using the default 1900-base scheme, another for the 1904-base scheme.  

    As Brian says, that is definitely problematic.  Some user who has a clever formula for converting to, say, lunar calendars (or, I dunno, sidereal time) by working on the date numbers will be surprised that the formula doesn’t work the same when added to different workbooks, until the 1900/1904 difference is understood.

    Last year I speculated that we were going to learn a great deal about interoperability and interchange of data formats as the Office Open XML specification became available.  The real lessons are coming before us.  

  14. orcmid says:

    Rob,

    I wondered about that too.  I suspect because that is its behavior already.  I guess they could have added a corrected weekday function, with a new name, but the legacy problem is what it is.

    1. I don’t know what one decides what to do about February 29, 1900.  Is it the same weekday as February 28 or as March 1?  I wonder what choices have been made by those who have made such adjustments, or do they know "February 29" is not in their data?  It seems the specification can’t assume that.

    2. I also don’t know how one detects an existing spreadsheet that actually deals with dates in that era, but corrects for the WEEKDAY error in its logic.  If WEEKDAY gets fixed, the user’s logic gets broken.  

    But I’m guessing.  It looks like the thrust was to "codify what there is" so it is absolutely clear that the legacy is covered and now fully specified moving ahead.  Then, after all the smoke clears, we can see where it matters most to raise the level of abstraction in the interchange format.  I expect that will be done very carefully, over an extended time.

  15. Dennis,

    You’re right that the approach for the =WEEKDAY issue would be to introduce a new weekday function (WEEKDAY2 or something?) that provides the proper weekday value for those first 60 days in 1900.

    You can’t change the behavior of the current WEEKDAY function because (as you pointed out) there’s no telling how many people have already worked around the issue by adding a small bit of additional logic. If we all of the sudden changed the values returned by weekday we would break all those solutions.

    If this is something that people feel is a serious enough problem though, please provide comments to Ecma TC45, or even join the TC. There could always be the suggestion that a new "WEEKDAY2" function be added to the next version of the spec.

    -Brian

  16. Pepe says:

    Wait, so this is the big "OOX broken at storing dates!!!" issue that ODF fanboys have been screaming about for the last two weeks?  LOL

    This is an issue that is so minor that nobody cares about it in the real world.  I had been on the side that it should be "fixed" just to remove ammo from the anti-OOX crowd which are using it to bash OOX.  But now I agree that it’s not worth "fixing" this problem.  Just leave it be.

    And the ODF folks are extremely hypocritical harping on this minor issue when ODF doesn’t even have a standard way to store spreadsheet formulas in the first place.

  17. Francis says:

    It seems like this controversy could have been nipped in the bud by:

    1. not changing the numerical representation any dates after March 1, 1900

    2. renumbering all dates in the two months before then (i.e., new base date of 12/31/1899)

    3. scanning all worksheets in compatibility mode for numbers that fall in that range and that use date functions

    4. popping up a dialog telling the user about the problem in the rare occurrence that a worksheet uses both numbers in that range AND date functions

    This change would only break those worksheets that actually do use dates before March 1, 1900–which probably is minimal given Brian’s statement "[t]his hasn’t really come up as a big issue in the past 20+ years of Excel’s existence."

  18. hAl says:

    @Francis

    [quote]4. popping up a dialog telling the user about the problem in the rare occurrence that a worksheet uses both numbers in that range AND date functions[/quote]

    How is that a rare occurrence?

    That included all spreadsheets using any number between 0 and 60 and a date function or a date formatting. That could be billions.

    And even then you aren’t sure you got all.

  19. marc says:

    10/01/1900 was wednesday, ok ?

    ( i use Unix "cal" command with confidence 😉 :

    $ cal 1 1900

       January 1900

    Su Mo Tu We Th Fr Sa

       1  2  3  4  5  6

    7  8  9 10 11 12 13

    14 15 16 17 18 19 20

    21 22 23 24 25 26 27

    28 29 30 31

    excel’s weekday("10/01/1900") says it was tuesday  (!!!)

    open office’s same function give the right answer, like cal

    my conclusion ( IMHO ):

    weekday() is giving *wrong* answers ! you must *face it* and *fix it* and not burn this into a "standard" ( hope the ISO guys will be less permissive than ECMA if you submit to ISO this "legacy feature" )

  20. Francis says:

    Hal: I see your point. Here’s a better solution: when Excel displays a date of before March 1, 1900, then Excel could pop up that dialog.

    That would be easy to trap. And I think it is unlikely that there are many worksheets that use dates in the range 12/31/1899 to 2/29/1900 that do not display a date in that range on-screen.

  21. Thank you for the comments everyone.

    Now, contrary to what some people are reporting, Microsoft is not attempting to change history and add an additional day in the year 1900. We thought about adding more days to the summer of 2006 so that Vista would have a summer release, but some folks caught wind of it and we decided it was best to leave only 31 days in July instead of our original plan of 150 days. As to the year 1900, I (and the rest of the Office team) didn’t really have much motivation to change the number of days.

    Unfortunately, Lotus 1,2,3 had a bug, and rather than break people’s spreadsheet formulas, we decided about 20 years ago that we would keep the same bug. So, for those of you dealing with those first 60 days of 1900, there is a minor correction you’ll need to deal with. ALL OTHER DAYS FROM THEN TO THE END OF TIME WORK! I think that may be the first time I’ve used the all caps 🙂

    We aren’t introducing a bug into a standard in an attempt to ruin civilization as we know it. We’ve merely created an open file format that is compatible with an existing set of documents, and giving it to a standards body so that they can take over the maintenance of the format (and guarantee it’s long term availability).

    If somebody wants to add a new WEEKDAY function that spits out the right values for those first 60 days of the year 1900, that’s great. Join the Ecma TC, and we’ll get it into the next version of the spec!

    Marc, thank you for showing me that the weekday function is wrong for those first 60 days of the century. I was already aware of that, and we have “faced it” for all of Excel’s existence. It’s widely documented. This shouldn’t be a surprise to anyone. I’m sure it wasn’t a surprise to the IBM folks who have chosen this convenient time to really amp up the noise. 🙂 The spec had been open for public comments since last spring. Anyone was free to send comments or even join the Ecma TC if they felt strongly about this.

    I personally think this was the right decision. Any attempt to change the meaning of a function that’s been in use for 20 years would cause a huge amount of user pain. If this is such a big issue, then let’s add new functions (with new names) that give the right values for those first 60 days in 1900. All other dates already work though so we wouldn’t really be helping out too many people.

    If there is anyone out there who has actually been bitten by this issue and needs to see a fix, please let me know. As I said, we can look at adding new functions, or doing anything you need to find a resolution. Please describe how the issue is affecting you, and why there isn’t an easy workaround. Thanks!

    -Brian

  22. Brian –

    Nobody is suggesting you change Excel in any particular way.  Document this is a feature, a bug or compability anomaly or whatever you like.  Just don’t incorporate it into a standard so that all future implementors must do it the same way.  That is great for a specification of an existing implementation, but lousy as a standard.  As the proposed standard reads: "For legacy reasons, an implementation using the 1900 date base system shall treat 1900 as though it was a leap year."  Not even "may treat" or "may want to treat", but "shall treat", so that anybody wanting to implement this correctly will essentially be violating the standard.  It is not a question of how many people have ben bitten by the bug (precious few, I imagine), but why we would want to continue the idiocy started in 1-2-3 twenty years ago for another generation as a requirement "shall" and not even an option "may".  This is the kind of problem that exists all over in the OOXML specs, a preference for what is in MS Office rather than what should be in a standard, without even the flexibility to allow differences or acknowledge particularity.  As I have said before, I don’t think this serves Microsoft or the OOXML standard well for the furture, and feel that you are choosing short term ease at the expense of long term issues, even for yourselves.

    – Ben

  23. Ben,

    Thanks for your feedback, and at this point I think we’re just going to have to agree to disagree. It sounds like we are in agreement on almost everything except for the definition of a "standard."

    The key point of first opening up the formats and then submitting them to a standards body was to allow for interoperability with business processes; solutions; and other applications. The standard describes exactly how everything works so that people can share documents. If the date behavior we’ve been discussing here wasn’t incorporated into the standard as is, then sharing a file from Excel to another application might result in completely different calculations. It doesn’t mean we can’t add new functionality with the correct behavior, but if we try to change the legacy behavior, things will break.

    I think the big hold up here is how we define the term "standard." You’re view seems to be that in order for something to be determined a "standard", it must not allow legacy behaviors to affect it’s design. It should be designed with only the future in mind. Any legacy "bugs" or behaviors should be left out.

    I disagree. I think that a "standard" is something that is owned an maintained by the community, and not by an individual company. That doesn’t mean it can’t be designed with an individual companies history in mind. A standard is something that is completely accessible to everyone, and the "stewardship" is in the hands of a neutral party. Submitting something to ISO for example doesn’t mean that members of ISO have technically reviewed it and agreed with all design decisions. It just means that ISO views it as not contradictory (meaning that it’s existence doesn’t mean an existing ISO standard is no longer correct), and that they will maintain ownership to guarantee is accessibility.

    So, when I say that the Office Open XML formats are a standard, that’s what I mean. I agree with you that it was clearly designed with the legacy binary formats in mind. We took those legacy formats, and created an Open XML format to match. We fully documented it, and gave ownership to a standards body. The spec underwent a number of changes over the past year as it was reviewed by Ecma, but it maintained those legacy compatibility behaviors.

    The Office Open XML formats were designed with the past, present and future in mind. Now that we have that base in place the sky is the limit for version 2.0. Anyone can suggest new things to add to the formats, and given that we have companies like Novell and Apple in the TC, I’m sure there will be a lot of new things added to the next version. TC45 actually went beyond regular Ecma behavior and opened the gates early on for public comments and review of the spec. We posted a number of drafts publicly, and took public comments into account. So even if you don’t join the Ecma TC, you can still provide feedback, and if there’s something you’d like to see added next time around let us know!

    -Brian

  24. Brian –

    I agree that we have to simply disagree on this.  It is not at all that I think legacy should not be taken into account in a standard.  It is that a standard with its whole guiding principle being support of a legacy application is not much of a standard.  There are whole parts of the OOXML specs that could have been written differently to both support legacy documents and still remain open to future use.  Would that have satisfied all ODF supporters?  Probably not.  Would it have been better for Microsoft, ISV’s working to implement these specs and customers?  I would argue that it would.  Unfortunately, the two sides have hardened their positions to a point where mutual respect and interaction is difficult.

    So, I agree to disagree with you, and will leave well enough alone, on this thread at least.  As for providing input, I will look into it, but the guiding principle of direct support of legacy office documents, not even indirect support which would allow full support by you but not every implementor, seems to be off the table.  That is a mistake, in my opinion.

    – Ben

  25. Earlier today I got an e-mail from Max asking if I could help clarify the section of the specification

  26. The answer to the first question may be yes or no, but my guess is the second answer is an unequivocal

  27. Pri dnešnom tradičnom rannom prechádzani bookmarkov mi zrak padol na článok Optimalizace uložení dat

  28. I thought it might be worthwhile to give a bit of an overview of the WordprocessingML model that you

  29. I thought it might be worthwhile to give a bit of an overview of the WordprocessingML model that you

  30. For those of you not only reading my blog, but also the other content provided by the Info Support blog