Here is the scenario…
Every day you get a text file containing a number of financial records in a fixed-length format. Your job is to identify any records that are new or changed since the previous day's file, and to produce an output file in the same format containing only those records. You can make no assumptions about the ordering of records in the file.
Here is a sample of the data:
1310|512|086048610|01/01/1996|WB| |12/31/9999|1290.00 |USD5 |
1310|512|110000011|06/10/2002|WB| |12/31/9999|100.00 |USD5 |
1310|512|110000111|06/10/2002|WB| |12/31/9999|100.00 |USD5 |
The data files can get quite large (up to 3GB).
Question for you: How would you architect the solution for this to achieve optimal performance?
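As a baseline for discussion, here is a minimal sketch of one possible approach, not necessarily the optimal one. It assumes each record is one line and that a changed record differs from every line in the previous day's file. It stores a 16-byte hash of each of yesterday's lines in a set, then streams today's file and writes out every line whose hash is not in the set. The function names and file paths are hypothetical.

```python
import hashlib


def digest(record: bytes) -> bytes:
    """Return a 16-byte BLAKE2b digest of one record."""
    return hashlib.blake2b(record, digest_size=16).digest()


def load_digests(path: str) -> set:
    """Read a file line by line and collect the digest of every record."""
    digests = set()
    with open(path, "rb") as f:
        for line in f:
            digests.add(digest(line.rstrip(b"\r\n")))
    return digests


def write_new_or_changed(previous_path: str, current_path: str, output_path: str) -> int:
    """Write records of current_path that do not appear in previous_path.

    Returns the number of records written. Record order in either
    file does not matter, because lookups go through a hash set.
    """
    seen = load_digests(previous_path)
    written = 0
    with open(current_path, "rb") as cur, open(output_path, "wb") as out:
        for line in cur:
            record = line.rstrip(b"\r\n")
            if digest(record) not in seen:
                out.write(record + b"\n")
                written += 1
    return written
```

Both files are read sequentially exactly once, so the cost is dominated by I/O and hashing. The set still grows with the number of records in the previous file, so for very large inputs an external sort followed by a merge-style comparison, or a database load, may be a better fit.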