I wrote this post on Joel on Software in 2004 in response to a newbie asking about EDI. Since then I have referred several people to it, but a few times I've found the post tricky to find. So, instead of constantly relying on the benevolence of Joel acting as a library for me, I'm going to recycle bits and post this here. The advice is targeted at EDI, but it really is good general-purpose advice for building ETL or any kind of document parsing.
Step 1: Get sample data from every trading partner. Refuse to do anything until you have this. Claim work stoppage, announce loudly at meetings you're stalled, send emails to VPs, whatever - GET DATA FROM EVERY PARTNER BEFORE STARTING.
Step 2: Make zero assumptions. Provide an error case for every possibility.
Step 3: Get the ANSI X12 specs for the document. Read them. Compare the sample data from step 1 to them. Be prepared to create program flows for every line in the implementation guides, BUT look for lines that aren't used by your partners. Document any line you can't find being used. Once you have a full list, send it to your manager with the question "I don't see these fields being used - do we need to implement them?" Get the answer in writing.
Step 4: Make sure your code can provide for the lines in step 3 when some partner starts using one of them the day after you deploy.
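To make the comparison in step 3 concrete, here is a minimal sketch of a segment inventory. It is not the code I used back then. The function names, the partner names and the segment list standing in for an implementation guide are all made up for illustration, and it assumes `*` and `~` as delimiters, which you should confirm against your own samples rather than trust.

```python
from collections import Counter


# Assumption: '*' separates elements and '~' ends segments. Partners can use
# other delimiters, so check the sample data before relying on these defaults.
def split_segments(raw, element_sep="*", segment_term="~"):
    """Split raw X12 text into one list of elements per segment."""
    segments = []
    for chunk in raw.split(segment_term):
        chunk = chunk.strip()  # tolerate line breaks after the terminator
        if chunk:
            segments.append(chunk.split(element_sep))
    return segments


def segment_usage(samples):
    """Map each partner to a Counter of the segment IDs found in its samples."""
    usage = {}
    for partner, documents in samples.items():
        counts = Counter()
        for raw in documents:
            counts.update(seg[0] for seg in split_segments(raw))
        usage[partner] = counts
    return usage


def unused_segments(guide_segments, usage):
    """Segments the guide defines that no partner sample contains."""
    seen = set().union(*usage.values())
    return sorted(set(guide_segments) - seen)


def unexpected_segments(guide_segments, usage):
    """Segments some partner sends that the guide does not list."""
    seen = set().union(*usage.values())
    return sorted(seen - set(guide_segments))


# Hypothetical guide excerpt and sample documents, for illustration only.
GUIDE_850 = ["ST", "BEG", "REF", "PER", "N1", "PO1", "CTT", "SE"]
SAMPLES = {
    "partner_a": ["ST*850*0001~BEG*00*SA*PO123**20040101~PO1*1*10*EA*9.95~SE*4*0001~"],
    "partner_b": ["ST*850*0002~BEG*00*SA*PO9**20040102~REF*DP*42~"
                  "DTM*002*20040110~PO1*1*5*EA*1.00~SE*6*0002~"],
}

if __name__ == "__main__":
    usage = segment_usage(SAMPLES)
    print("Ask the manager about:", unused_segments(GUIDE_850, usage))
    print("Not in the guide:", unexpected_segments(GUIDE_850, usage))
```

The first list is what goes in the email to your manager. The second list is the reason for step 2: partners will send things the guide never mentions.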
The hardest part about EDI is that the rules are observed mainly in the breach, and nobody makes partners follow the rules. I built my 810/850 parser as a class hierarchy, which ended up serving me VERY well - it was very modular, and changes like those in step 4 above turned out to be relatively straightforward.
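For a rough idea of what a class hierarchy like that can look like, here is a small sketch. It is a reconstruction for illustration, not my original parser: every class and method name is invented, segments are assumed to arrive as non-empty lists of strings with the segment ID first, and only a handful of elements are read. The point is that an unknown segment is an explicit error (step 2), and supporting a newly used segment means adding one method in one subclass (step 4).

```python
class EdiError(Exception):
    """Base class for every problem the parser reports."""


class UnhandledSegment(EdiError):
    """A segment arrived that this parser has no handler for."""


def element(seg, index):
    """Return element `index` of a segment, failing loudly if it is absent."""
    if index >= len(seg) or seg[index] == "":
        raise EdiError(f"{seg[0]}{index:02d} is missing or empty")
    return seg[index]


class DocumentParser:
    """Dispatches each segment to a method named on_<segment ID>."""

    transaction_set = None  # set by subclasses, e.g. "850"

    def __init__(self):
        self.result = {}

    def parse(self, segments):
        for position, seg in enumerate(segments, start=1):
            handler = getattr(self, "on_" + seg[0], None)
            if handler is None:
                raise UnhandledSegment(
                    f"segment {position}: no handler for {seg[0]!r}")
            handler(seg)
        return self.result

    def on_ST(self, seg):
        if element(seg, 1) != self.transaction_set:
            raise EdiError(f"expected {self.transaction_set}, got {seg[1]}")
        self.result["control_number"] = element(seg, 2)

    def on_SE(self, seg):
        if element(seg, 2) != self.result.get("control_number"):
            raise EdiError("SE02 does not match ST02")


class PurchaseOrderParser(DocumentParser):
    transaction_set = "850"

    def on_BEG(self, seg):
        self.result["po_number"] = element(seg, 3)

    def on_PO1(self, seg):
        self.result.setdefault("lines", []).append({
            "quantity": element(seg, 2),
            "unit": element(seg, 3),
            "unit_price": element(seg, 4),
        })


class InvoiceParser(DocumentParser):
    transaction_set = "810"

    def on_BIG(self, seg):
        self.result["invoice_date"] = element(seg, 1)
        self.result["invoice_number"] = element(seg, 2)

    def on_IT1(self, seg):
        self.result.setdefault("lines", []).append({
            "quantity": element(seg, 2),
            "unit": element(seg, 3),
            "unit_price": element(seg, 4),
        })


class PartnerBPurchaseOrderParser(PurchaseOrderParser):
    """Hypothetical partner that started sending REF after deployment."""

    def on_REF(self, seg):
        self.result.setdefault("references", {})[element(seg, 1)] = element(seg, 2)
```

Feeding a purchase order with a REF segment to `PurchaseOrderParser` raises `UnhandledSegment`, while `PartnerBPurchaseOrderParser` accepts it. The change stays confined to one small subclass.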
Final note: make sure you know what versions the partners are using, too. 🙂
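One way to keep yourself honest about versions is to read the version out of each document and refuse anything you haven't tested against. In X12 the functional group header (GS) carries the version/release code in its eighth element, for example 004010. The sketch below assumes segments already split into lists of strings. The supported set and all names are placeholders, not anything from a real library.

```python
# Hypothetical: list only the versions you have actually tested against.
SUPPORTED_VERSIONS = {"004010"}


class UnsupportedVersion(Exception):
    """The document's version is missing or not one we have tested."""


def group_versions(segments):
    """Return the GS08 value of every functional group header found."""
    versions = []
    for seg in segments:
        if seg and seg[0] == "GS":
            if len(seg) < 9 or not seg[8]:
                raise UnsupportedVersion("GS segment has no GS08 version element")
            versions.append(seg[8])
    return versions


def check_versions(segments, supported=SUPPORTED_VERSIONS):
    """Raise unless every functional group uses a supported version."""
    versions = group_versions(segments)
    if not versions:
        raise UnsupportedVersion("no GS segment found, cannot determine version")
    for version in versions:
        if version not in supported:
            raise UnsupportedVersion(f"version {version!r} is not supported")
    return versions
```

Running this over the sample data from step 1 also gives you a quick per-partner version list before you write any parsing code.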