Processing PDFs (or anything else!) in BizTalk Orchestrations



So you're probably aware that you can pass virtually any file (*.exe, *.dll, *.xls) through BizTalk Server's messaging components, but did you also know that you can pass just about anything through an Orchestration process as well?


If you want to pass, say, a Microsoft Word document through the BizTalk Messaging Engine, you'd simply set up a receive location that grabs the file, make sure you use the pass-through pipeline, then create a send port that drops the file back out, also using the pass-through pipeline and a subscription to the receive port. This works because everything passes through BizTalk Server as a byte stream, therefore allowing any binary object to move through unmolested by parsers. This scenario works great if you are moving docs between SharePoint libraries, or yanking off inbound email attachments (BizTalk Server 2006 only) and throwing them to a file share.


The case I'm demonstrating here is the scenario when you don't want to just pass the non-XML file through the Message Bus, but also apply some business process via Orchestration. The first step was to draw out the Orchestration process and create the necessary Messages and Variables. The goal of my orchestration is to take in a PDF file, look at the context information, and based on the inbound file name, route it to one of three locations. If it's any sort of training manual I move it to one folder, if it's a company report I drop it to another location, and if it's anything else, it rests in one final spot.


So my process looks like this:



You'll notice a few things. First of all, the message type of the inbound document is System.Xml.XmlDocument. This message type doesn't actually require XML content. Rather, it's treated within BizTalk Server as a grab bag for any file format. Because a message traveling through an orchestration isn't automatically loaded into the DOM but rather remains a stream (unless you have Distinguished Fields, which cause selected parts of the data to be loaded into memory), there's no problem with accepting anything into that Message. Try it out, it's neat stuff. Of course, remember that I haven't shown anything that lets you get at the CONTENT of that message; you only have access to the context properties unless you have helper components that can rip open the message.
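To give a feel for what a helper component that "rips open" the message might do, here is a minimal C# sketch that peeks at the first bytes of a stream and checks for the PDF header marker. The names PdfSniffer and LooksLikePdf are made up for this illustration and are not part of BizTalk Server. The sketch assumes you already have the message body as a System.IO.Stream; from an orchestration that would typically mean passing the message to the helper as an XLANGMessage and calling RetrieveAs(typeof(System.IO.Stream)) on its first part, which is not shown here.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class for illustration; not a BizTalk Server API.
public static class PdfSniffer
{
    // PDF files begin with the ASCII marker "%PDF-".
    private static readonly byte[] Magic = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true when the stream starts with the PDF marker.
    // Only the first few bytes are read; the caller owns the stream.
    public static bool LooksLikePdf(Stream content)
    {
        if (content == null) throw new ArgumentNullException("content");

        byte[] header = new byte[Magic.Length];
        int read = 0;

        // Stream.Read may return fewer bytes than requested, so loop.
        while (read < header.Length)
        {
            int n = content.Read(header, read, header.Length - read);
            if (n == 0) return false; // stream ended before a full header
            read += n;
        }

        for (int i = 0; i < Magic.Length; i++)
        {
            if (header[i] != Magic[i]) return false;
        }
        return true;
    }
}
```

Note that the check consumes the first five bytes, so if the stream is seekable you would want to reset its Position to 0 before handing it to anything else.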


You may assume that I'm using regular expressions to parse out the receive file name. You are correct. In each decision shape, I use the static IsMatch method of the Regex class (System.Text.RegularExpressions.Regex) to look for a key phrase in the file name. See below:
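In case the screenshot doesn't come through, here is a rough C# sketch of the same three-way decision, written as a standalone helper so it can be run outside BizTalk. The class name PdfRouter, the key phrases, and the destination labels are placeholders I picked for illustration, not the values from the original project. In the orchestration itself, each decision branch would hold a single expression along the lines of System.Text.RegularExpressions.Regex.IsMatch(InboundMsg(FILE.ReceivedFileName), "training"), where InboundMsg is a hypothetical message name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper mirroring the three-way decision in the orchestration.
public static class PdfRouter
{
    // Key phrases and destination labels are placeholders, not the original values.
    public static string ChooseDestination(string receivedFileName)
    {
        if (receivedFileName == null) throw new ArgumentNullException("receivedFileName");

        // First branch: any sort of training manual.
        if (Regex.IsMatch(receivedFileName, "training", RegexOptions.IgnoreCase))
            return "OutTraining";

        // Second branch: company reports.
        if (Regex.IsMatch(receivedFileName, "report", RegexOptions.IgnoreCase))
            return "OutReports";

        // Else branch: everything that matched neither phrase.
        return "OutOther";
    }
}
```

For example, PdfRouter.ChooseDestination(@"C:\In\Training_Manual.pdf") picks the training branch, because the match ignores case and runs against the full received path. Branch order matters: a name containing both phrases lands in the first branch that matches.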




After building all this out, we deploy it. When creating the ports, remember to keep the pipelines as pass-through (unless you write a custom pipeline component that is used to add key data to the message context on the way in). You see my active configuration below:




So there you go. While BizTalk Server is keenly optimized to take advantage of XML formats, we've also enabled you to pass everything but the kitchen sink through the messaging and process engines. All the more reason you can use BizTalk Server as a hub for all the message-based traffic hurtling through your enterprise.

Comments (24)

  1. krithiga says:

    Is there any size limit when we pass files like .pdf or .gif through BizTalk?

    Can you send your answer to krithiga_srinivasan@yahoo.com

  2. There’s a physical size limit imposed by the server. But theoretically, you can pass 1GB+ files through BizTalk Server. However, there’s almost no case where that is a good idea. A good rule of thumb (again depending on the server hardware) is to keep messages under 5MB.

  3. I must access the content of the file (ASCII) to give it to a helper component. Is this possible?

  4. Karsten, can you provide a few more details? Do you want to map the format out of BizTalk to a helper component called from an Orchestration?

  5. joren says:

    I would like to know if you can create a pdf-file based on an xml-file for instance with biztalk, or how to do it? Thanks.

  6. You can use an adapter for PDF (see list of all third party adapters at http://www.microsoft.com/biztalk/evaluation/adapter/partner/2004.mspx), or, most commonly, use a custom pipeline component (for instance, see http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=78C0FD43-E4C3-4DCF-AF23-FE72F317891A) to convert the message content into a PDF format.

  7. Salam Y, ELIAS says:

    Hi, I tried to implement the idea without success. Is it possible to send me the project so I can compare it with what I have done?

    Thanks in advance. My email is salam@altern.org

  8. Salam Y, ELIAS says:

    I sorted it out. I was using backslashes instead of forward slashes in the Port_2(Microsoft.XLANGs.BaseTypes.Address). So first, I used

    ———-

    @"file://C:\Tutorial\ProcessingAnyThingInBTS\OutTraining\training.pdf", then I switched to

    @"file://C:/Tutorial/ProcessingAnyThingInBTS/OutTraining/training.pdf"

    It works like a charm

    Again nice idea and well done

  9. Excellent.  Glad you got it. That syntax is so picky!

  10. Noel Austin says:

    Great blog!   I got everything working except I’m trying to use an expression to change the FILE.ReceivedFileName property.  

    When I try to say:

      FILE.ReceivedFileName = "test.txt";

    I get the error – "Cannot implicitly convert type 'System.String' to 'FILE.ReceivedFileName'".  I’m wondering if this is because the message is type XmlDocument and not XLANGMessage.

    Any suggestions?

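  11. You can’t set it that way.  Context properties are written against a message, as in MyMessage(FILE.ReceivedFileName) = "test.txt"; inside a Message Assignment shape (MyMessage being whatever your message is called), and only on a message you are constructing, since a received message is immutable.  That value then only affects the output name if the send port uses the %SourceFileName% macro.  Otherwise, you’d have to use a dynamic port to set the outbound file name.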

  12. edgar otero says:

    Any chance I can get the project file so I can learn from it? I am a beginner BizTalk developer. Thanks

  13. Hey Edgar,

    I accidentally paved over the project when rebuilding a virtual machine, but the steps I outlined here are fairly easy to reproduce.  Try setting it up yourself, and post any questions you have.

  14. Jan says:

    Hello Salam Y, ELIAS,

    I am new to BizTalk Server, and in the org where I am working no one has worked on BizTalk Server before. I would be obliged if you could send me the sample code for the scenario being discussed here. Thanks

  15. Hey there,  as posted one comment above, I actually don’t have the physical bits anymore, but I fairly accurately showed all the parts you need to build it in this post.  If you have any questions while setting it up, feel free to post.

  16. bsmith says:

    Great article.  How do I keep the same filename when processing the file?

  17. bsmith says:

    What I would like to do is use this process as a file copy to deploy to a web site.  I need the file names to stay the same as they were when they came in.  I don’t see how to keep the file names the same.  Am I missing something in the Set Port Address Expression?

  18. Note that there is a context property called FILE.ReceivedFileName.  You can either extract the specific file name from it using a pipeline component or, more easily, use the %SourceFileName% macro on the outbound port.  This macro strips out the path info (which is contained in FILE.ReceivedFileName) and leaves you with the simple file name.

  19. bsmith says:

    Thanks for your help.  The macro works great.

  20. mtntrax3 says:

    I am very new to BTS and need to ‘get’ all XML attachments from any given email for processing. For now I am just trying to get them and drop them to a designated folder. I see various examples but nothing that gives me enough information or direction.

    Can someone help explain the steps? I can get an email and get "an" attachment at a particular index. I need to get all XML attachments – the number of files (attachments) will be unknown.

    Thanks,

    Phil

  21. Just a bit over a year since I started BizTalk blogging, so I thought I’d take 5 minutes and review…

  22. Atul Gupta says:

    good stuff !

  23. Rabi says:

    I have an HTTP receive port which receives the name of the image that needs to be moved from the in folder to the out folder. The in folder has any number of images, but the HTTP request says which one needs to be moved. How can I achieve this? Please help. Also please reply to rabi.sahu@gmail.com

  24. You could have the HTTP message contain the file name, then within an orchestration go grab the file at that name (using a helper component).  Then take that message and send it out of the orchestration to the new location.  If this is a multi-server solution, you’d have to be careful about only using shared addresses and not a "C:…" sort of address.
