DVR-MS: Adventures in Closed Captioning


I finished the code for this project months and months ago, and I had every intention of writing a full MSDN article describing the ins and outs of what I’d accomplished, but time seems to have gotten away from me. Rather than let the code languish any longer, I’ve decided to simply write up a (relatively) short blog post, making the code available for all to explore.


For those of you who have read my previous MSDN articles about Media Center, you know I’m an avid fan. I’m also really interested in exploring and expanding on the developer scenario for working with Media Center as well as with the DVR-MS files in which it saves recorded video. There is so much interesting information there, be it the actual video or audio content, or be it the metadata surrounding that content. However, there is a large amount of information available in DVR-MS files that has largely gone unnoticed and unsung, and with this post I hope to rectify that inequity.


Closed captions are an amazing source of information about recorded television shows. The metadata headers in DVR-MS files provide a great deal of information about the show, most of it based on information captured from the electronic program guide. But closed captions detail everything that goes on in a show: the lines that were spoken, interesting dialog, when music is played, and so on. Being able to harness this data makes it possible to write a wide variety of just plain cool applications that make watching and working with recorded television much more enjoyable.


To exemplify this, I’ve written a managed class library in C# (extending the one I created for my original Fun with DVR-MS article on MSDN) that parses and exposes all of the closed captioning data from NTSC non-HD DVR-MS files in a developer-friendly fashion. I’ve then layered on top of this library a few sample applications to demonstrate the power of such a library and the types of applications you can have fun writing yourself.


How do I use the library?


You can download the complete sample source code and compiled binaries here. It includes Toub.MediaCenter.Dvrms.dll, which provides the classes for parsing out the captions and working with them. The central class in this effort is the abstract class ClosedCaptionsParser, available in the Toub.MediaCenter.Dvrms.ClosedCaptions namespace. From this class derives the concrete NtscClosedCaptionsParser class, which is used to extract the closed captioning data from NTSC files (you’ll notice that the library also includes a PalClosedCaptionsParser, but if you look inside it, you’ll see that it’s simply a shell waiting for you to implement it 😉).


To use an NtscClosedCaptionsParser, one simply instantiates the class, passing to the constructor the path to the DVR-MS file whose captions are to be extracted, and then calls the parser’s GetCaptions method. GetCaptions returns a ClosedCaptionCollection, which contains the parsed ClosedCaption instances. Each ClosedCaption instance exposes four properties, three of which are TimeSpans and one of which is a string. The string is the text of the caption (most likely an individual word or sentence from the captioning), and the TimeSpans represent the times at which the data for the caption began to be received, was displayed to the screen, and was cleared from the screen.


The NtscClosedCaptionsParser actually creates instances of a class derived from ClosedCaption, NtscClosedCaption, that provides two additional properties. The first of these is of the enumeration type NtscClosedCaptionType and describes the type of the caption: roll-up, pop-on, or paint-on. To understand the distinction between these types of captions, I suggest you read the free Code of Federal Regulations document 47CFR15.119. This will provide you with a PDF of the subsection titled “Closed caption decoder requirements for analog television receivers,” which is part of the book covering the Federal Communications Commission (FCC) and the section covering radio frequency devices. The other property exposed by NtscClosedCaption is Channel, which tells you with which channel of captioning this particular caption is associated (you’ll often find multiple languages on different channels interwoven, and this allows you to separate them out from each other).


So, as an example of how this can be used, here is a snippet of code that parses the captions from a DVR-MS file and writes them out to the console in a tab-delimited fashion:

    NtscClosedCaptionsParser parser = new NtscClosedCaptionsParser(filename);
    ClosedCaptionCollection ccs = parser.GetCaptions();
    Console.WriteLine("Start\tDisplay\tClear\tText\tType\tChannel");
    foreach (NtscClosedCaption cc in ccs)
    {
        Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}\t{5}",
            cc.StartTimecode, cc.DisplayTimecode, cc.ClearTimecode,
            cc.Text, cc.CaptionType, cc.Channel);
    }

A Few Words About NTSC Closed Captions


A brief tour of NTSC closed captions is probably appropriate. Note that there are a lot of intricacies involved with parsing and rendering closed captioning data, and I’ve chosen to ignore most of them. Thus, I’ve coded up a simplified parser that just deals with the text included in the stream (most of the specification is dedicated to the commands that are also included in the data, which dictate actions such as moving the cursor around the screen and changing the font color). This works very well for most, if not all, of the recorded programs I’ve tested it with, but there could very well be some recordings that cause this code to blow up. If it destroys your computer, I take no responsibility (though I would like to know about it, as I’d find it pretty amazing).


DVR-MS files are created by the Stream Buffer Engine (SBE) introduced in Windows XP Service Pack 1, and are used by Media Center for storing recorded television. They contain metadata describing the contents of the recording (title, episode title, actors, etc.) as well as a variety of data streams. A typical DVR-MS file is made up of two or three of these streams, depending on what version of Media Center created it, the type of signal being recorded, and the actual data being recorded. Most, if not all, DVR-MS files will have both an audio and a video stream. The third stream is the one that may or may not exist, though it will for almost all NTSC content recorded with a recent version of Media Center. This third stream contains the closed captioning data for the recorded television show. As with the audio and video stream, the closed captioning stream is encrypted/tagged, so it must first pass through a Decrypter/Detagger filter. If you use GraphEdit to look at the Out pin on this filter, you’ll see that its major type is AUXLine21Data and that its subtype is Line21_BytePair (this may vary based on whether the content is NTSC, PAL, HD, etc.). Closed captioning in NTSC television shows is encoded into line 21 of the Vertical Blanking Interval (VBI).


There are a few approaches one could take to extract the closed captioning data from the CC stream. One approach would be to hook up a Dump filter to the CC stream, saving the CC stream’s raw contents to a file. This file could then be opened and parsed through a managed FileStream. This approach, while very simple, has some significant downsides. The streams that make up DVR-MS files are actually split up into a series of samples, where each sample contains data but also some metadata about that data. One important piece of metadata for each sample is the timecode at which that sample is supposed to be rendered when the file is played. However, when you use the Dump filter to dump the data contained within a stream, you’re simply dumping the data contents of each sample, not the metadata. Thus, there’s no way by examining the dumped bytes to determine with certainty what timecode each dumped byte corresponds to in the original file, especially if the CC stream isn’t continuous (for example, if commercials in the show aren’t captioned).


There’s another way to access the closed caption data, and this approach provides full fidelity, as it gives you access to the metadata for each sample. The Windows Media 9 Series Format SDK provides classes and interfaces that allow you to read Windows Media files using synchronous calls. These interfaces can also be used with DVR-MS files, and as such the Format SDK allows you to process DVR-MS files using the IWMSyncReader interface. The WMCreateSyncReader function is used to create one of these synchronous reader objects.


In order to parse out the closed captions, I use an IWMSyncReader to find and walk through the closed captions stream in the DVR-MS file. This can involve reading in and looking through gigabytes of data, which means that this isn’t currently a fast process. In fact, parsing the captions from a half-hour show can take a minute or more on my laptop (though I’m running on battery power right now, on the plane on the way back from PDC 2005, so that might be slowing down the process further).


The closed caption stream is divided into a series of two-byte instructions, some of which are data and some of which are command codes that detail how to process the data. For example, one command might inform the processor to render the current caption to the screen, and another command might ask the processor to clear the currently displayed caption. My parser is a very simple state machine that loops through the data looking at each byte pair, processing and reacting to each as it occurs. There is a whole list of command codes which I’ve special-cased in a switch statement: if any of those are found, the command itself is ignored and instead a space is added to the output text. If the command is 0x942f (end of caption) or 0x942c (erase displayed memory), it is treated as the end of the current caption, and whatever text has been seen up to this point is stored into a new ClosedCaption instance. This instance is then added to the collection of captions, and the text buffer is cleared to prepare for the next caption. With the exception of special characters such as musical notes and registered marks (which occupy two bytes and begin with either 0x11 or 0x19, both of which I’ve ignored and treated as commands), each text byte is between 0x20 and 0x7A inclusive and represents its ASCII equivalent. As such, if a byte isn’t part of a command, I simply cast it to Char and add it to the current text buffer (which will eventually be stored in a ClosedCaption instance when an end of caption or erase displayed memory command is received).
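

To make that loop concrete, here is a minimal, self-contained sketch of the same idea. It is not the code from the download: the class and method names are hypothetical, it assumes the high (parity) bit should be masked off before a byte is examined, and instead of my long switch statement it simply treats any pair whose first byte falls between 0x10 and 0x1F as a command.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Toub.MediaCenter.Dvrms.dll.
public static class Line21Sketch
{
    // Walks raw line 21 byte pairs and returns the text of each caption.
    public static List<string> ExtractCaptionText(byte[] data)
    {
        List<string> captions = new List<string>();
        StringBuilder text = new StringBuilder();

        for (int i = 0; i + 1 < data.Length; i += 2)
        {
            int pair = (data[i] << 8) | data[i + 1];

            // 0x8080 is the filler sent when a frame carries no caption data.
            if (pair == 0x8080) continue;

            // End of caption / erase displayed memory: flush the buffer.
            if (pair == 0x942F || pair == 0x942C)
            {
                string caption = text.ToString().Trim();
                if (caption.Length > 0) captions.Add(caption);
                text.Length = 0;
                continue;
            }

            // Assumption: the top bit of each byte is parity, so mask it off.
            byte first = (byte)(data[i] & 0x7F);
            byte second = (byte)(data[i + 1] & 0x7F);

            // Assumption: any other pair starting with 0x10-0x1F is a command;
            // it is ignored and replaced with a space.
            if (first >= 0x10 && first <= 0x1F)
            {
                text.Append(' ');
                continue;
            }

            AppendIfText(text, first);
            AppendIfText(text, second);
        }

        return captions;
    }

    // Bytes in 0x20..0x7A are treated as their ASCII equivalents.
    private static void AppendIfText(StringBuilder text, byte b)
    {
        if (b >= 0x20 && b <= 0x7A) text.Append((char)b);
    }
}
```

Because an empty buffer is never stored, a command that arrives twice in a row (which is common in broadcast captions) doesn’t produce an empty caption.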


Two bytes of closed captioning data are sent with every frame of the video, so the time code at which to display a caption can be calculated from the frame in which it was sent (frames that contain no useful closed captioning will contain 0x8080 as a filler). There are two ways to compute the time code based on the frame: dropframe and non-dropframe. Non-dropframe is computed simply by dividing the frame number in which the closed caption instruction was sent by the number of frames per second; for NTSC, this is 29.97 frames per second, and for PAL, 25 frames per second. Dropframe, on the other hand, is used for most broadcast signals and attempts to account for the non-integer 29.97 value using a scheme similar to that employed for leap years (i.e. a year is actually about 365.25 days long, so rather than worry about the quarter day each year, it’s simply saved up and turned into a 366th day once every four years). Instead of computing the time code based on 29.97 frames per second, it’s computed based on 30 frames per second. This results in 18,000 frames in ten minutes, as opposed to using 29.97, which results in 17,982 frames in ten minutes, a difference of 18 frames every ten minutes. So, when computing timecodes with dropframe, the first two frame numbers of nine out of every ten minutes are skipped (every minute except each tenth one), thus eliminating the problem of having an extra 18 frames. Thus, if you use non-dropframe to decode a broadcast that used dropframe, by the end of each 10 minute cycle the timecodes could be off by a little more than half a second. For simplicity, I’ve decided this is an acceptable discrepancy and have coded my timecode computation method to use non-dropframe. Feel free to change it if this bothers you.
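

If you do want to experiment with both schemes, the arithmetic can be sketched roughly as follows. This is an illustration rather than the method in the library: the class and method names are hypothetical, the 29.97 figure is the rounded value used above, and the dropframe routine is the commonly used frame-count-to-label conversion, which skips labels ;00 and ;01 at the start of every minute except each tenth one.

```csharp
using System;

// Hypothetical helper, not part of Toub.MediaCenter.Dvrms.dll.
public static class NtscTimecode
{
    private const double FramesPerSecond = 29.97; // rounded NTSC rate used in this post

    // Non-dropframe: elapsed time is simply frame number / frames per second.
    public static TimeSpan NonDropFrame(long frameNumber)
    {
        double seconds = frameNumber / FramesPerSecond;
        return new TimeSpan((long)Math.Round(seconds * TimeSpan.TicksPerSecond));
    }

    // Dropframe: label frames as if there were 30 per second, skipping two
    // labels in nine out of every ten minutes (18 labels per ten minutes).
    public static string DropFrameLabel(long frameNumber)
    {
        const long FramesPerMinute = 1798;      // 60 * 30 - 2 skipped labels
        const long FramesPerTenMinutes = 17982; // 10 * 60 * 30 - 18 skipped labels

        long tenMinuteBlocks = frameNumber / FramesPerTenMinutes;
        long remainder = frameNumber % FramesPerTenMinutes;

        // 18 labels skipped per completed ten-minute block...
        long skipped = 18 * tenMinuteBlocks;
        // ...plus 2 for each minute boundary crossed inside the current block.
        if (remainder >= 2)
        {
            skipped += 2 * ((remainder - 2) / FramesPerMinute);
        }

        long label = frameNumber + skipped;
        long frames = label % 30;
        long seconds = (label / 30) % 60;
        long minutes = (label / (30 * 60)) % 60;
        long hours = label / (30 * 60 * 60);

        return string.Format("{0:00}:{1:00}:{2:00};{3:00}", hours, minutes, seconds, frames);
    }
}
```

For example, frame 1,800 is labeled 00:01:00;02 (the ;00 and ;01 labels of minute one are skipped), while frame 17,982 lands exactly on 00:10:00;00.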


To help with the speed issue I mentioned previously, by default the ClosedCaptionsParser caches a serialized version of the parsed ClosedCaptionCollection in an NTFS Alternate Data Stream associated with the DVR-MS file. After the captions have been parsed successfully once, any future attempt to parse them will by default first try to retrieve the cached captions from this stream. So, while all subsequent parsing operations are very fast, parsing a DVR-MS file for the first time can involve a significant wait.


Navigation


For me, one of the most interesting scenarios this capability presents is enhanced navigation. Most folks today with PVR capabilities are stuck in the television-watching mindset of fast-forward and rewind. But what about search? Search is huge! What if you could jump to a place in the video where a particular line was said? What if you want to show someone that really funny joke in the episode of Friends you recorded last night? Closed captions make that possible.


I’ve dubbed this first sample application I’ve implemented “Search and View,” and it does exactly what its name implies. The application hosts the Windows Media Player ActiveX control in order to play a DVR-MS file. When a file is selected to be played, the captions are parsed from the file and are displayed in a list box, allowing individual captions to be selected. When a caption is double-clicked, the video jumps to the location in the video where that caption was displayed to the screen, allowing you instant access to any spoken dialogue in the video. Moreover, there’s a search box on the form that provides very simple searching capabilities. You can enter a search term and have the list of captions narrowed to only those captions that include the search term. Obviously, there are a plethora of ways in which this application can be expanded upon and improved, but I happen to think it’s pretty darn cool as is. Thanks to Derek Del Conte for working with me to flesh out the idea for this sample.
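

The narrowing step is nothing fancy. A rough sketch of it, using a hypothetical CaptionEntry class as a stand-in for the library’s ClosedCaption type (so that the snippet compiles on its own) and a plain case-insensitive substring match, might look like this:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a parsed caption: display time plus text.
public class CaptionEntry
{
    public readonly TimeSpan DisplayTime;
    public readonly string Text;

    public CaptionEntry(TimeSpan displayTime, string text)
    {
        DisplayTime = displayTime;
        Text = text;
    }
}

// Hypothetical helper, not the code from the Search and View sample.
public static class CaptionSearch
{
    // Returns the captions whose text contains the term, ignoring case.
    // An empty or null term returns every caption (no filtering).
    public static List<CaptionEntry> Filter(IEnumerable<CaptionEntry> captions, string term)
    {
        List<CaptionEntry> matches = new List<CaptionEntry>();
        foreach (CaptionEntry caption in captions)
        {
            if (string.IsNullOrEmpty(term) ||
                caption.Text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                matches.Add(caption);
            }
        }
        return matches;
    }
}
```

The DisplayTime of a selected match, converted to seconds, is the position the player would then be asked to seek to.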


Search


If you’re like me, you record many episodes of a few different shows, and sometimes it can be difficult finding the show you’re really interested in watching. What I really needed was a way to do an intelligent full-text search on all of the videos on my hard disk in order to narrow down my files to only those I’m interested in… oh wait, I already have that: Windows Desktop Search. In order to use Windows Desktop Search to allow me to search for recorded videos based on that funny dialogue I want to play for my fiance, I implemented an IFilter for DVR-MS files. This IFilter is written in managed code using COM interop to expose the necessary functionality to Windows Desktop Search so that it can index all of the closed captions contained in my recorded DVR-MS files. I can now simply type into the Desktop Search text box a phrase from a show I previously recorded, and voila, I’m instantly provided with the DVR-MS file I should view. Wow. 


Note: I mentioned earlier that it can be very slow to do the initial captions parse for a DVR-MS file. The way it’s currently implemented, this can cause problems for Desktop Search, which expects the IFilters it uses to be timely in their responses. If an IFilter takes too long to process a file (some number of minutes), Desktop Search does the right thing and assumes the IFilter has hung, aborting the indexing for that file. Additionally, the way I currently parse the file, I do all of the parsing in one fell swoop, which precludes Desktop Search from throttling the indexing; a robust IFilter would handle this much better, but, well, this is sample code. The IFilter will run very quickly if the captions have already been cached from a previous parse. Also note that the Desktop Search team recently released their own sample for how to implement managed IFilters.


Saving Captions When Converting To Other Formats


In my Fun with DVR-MS article, I demonstrated how it’s possible to use DirectShow to convert from DVR-MS files to other media formats such as WMV and WMA. However, those samples did not preserve the closed captions. And how could they? After all, WMV and WMA files don’t contain streams for captions, right? They do something even better. Both WMV and WMA files allow you to use the WM/Lyrics_Synchronised metadata header to store a collection of strings, each of which is associated with a particular time in the video file at which the string should be displayed. Sound familiar? Synchronized lyrics are visible in two different ways in Windows Media Player. The first, and most obvious, is through the synchronized lyrics editor that’s part of the Advanced Tag Editor in Media Player.

But these lyrics wouldn’t do any good if you couldn’t view them along with the media at the appropriate time. In fact, you can. If you enable captions/subtitles for a video (the option is available from the Play menu in Media Player), Media Player will show you the synchronized lyrics along with the video at the appropriate time. So, I’ve augmented the ConvertToWmv and ConvertToWma applications I provided in the Fun with DVR-MS article to extract the closed captions from the original DVR-MS file and to save them as synchronized lyrics into the metadata headers for the generated Windows Media files.


Summarizing Video Files


This one is admittedly a bit far-fetched, but I still think the idea is neat and wanted to see how it would fare. Microsoft Word has the ability to provide automated summaries for documents. You specify how much the text in the document should be summarized, Word analyzes the textual content, and it provides a new document that is some percentage in size of the original text (25% by default, I believe). Wouldn’t it be neat if we could do the same thing for video files? I decided it’d be fun to use Word’s AutoSummarize feature to implement this for videos. First, I extract the captioning from a DVR-MS file. I then programmatically dump that textual content into a Word document and ask Word to summarize it for me. The summarized text is then mapped back to the original captions (in a slightly haphazard fashion, I admit) as parsed from the file in order to determine which captions should be kept as part of the summary. The RecComp class, as described in the Fun with DVR-MS article, is then used to create a video summary including the segments that contain the summary captions. Useful? Unsure. Cool? Yup, or at least I think so.


Other Ideas


At over 3500 words, this post didn’t end up being as short as I’d planned, but hey, the more the better I guess. There are so many neat things you can do with captions, I’m sure this only scratches the surface. Some additional ideas for things I’d implement if I had the time:


  • An add-in for Media Center that displays a list of captions for the current video and lets you jump to a caption. You can discover the recorded show that’s currently playing using code from my Time Travel with Media Center article.
  • An app that combines Desktop Search and my Search and View app, allowing you to search your whole disk for a particular phrase, show the video, and jump right to the phrase in the video.
  • A really powerful Tablet PC-based remote control for Media Center. An add-in in Media Center could expose through remoting not only the AddInHost, but also all of the captioning information. The Tablet could then expose an interface that allows you to navigate the show on your Media Center based on traditional navigation controls but also based on searching and selecting closed captions.
  • A speech-based navigation engine that builds up a grammar based on the closed captions and lets you navigate the current show purely by speaking a line from the show.
  • A transcript generator. Grabs images from the video and includes them in a Word document alongside the closed captions at the correct point within the document.
  • A Windows service that monitors for when new shows have been recorded and automatically starts the process of extracting the closed captions and saving them. This will then make it very fast for other applications to work with the captions, as they’ll already be parsed and available. Of course, in a sense this is one feature the IFilter provides in concert with Desktop Search.
  • In some circumstances, you can distinguish commercials from actual content based on the type and content of the closed captions… not that I’m suggesting anything.
  • A summary video generator that searches for keywords and uses IStreamBufferRecComp to generate a new summary video containing portions of the video with the specified keyword.
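
For the last idea in that list, the first step would be turning the display times of the matching captions into a list of time ranges to keep. Here is one rough way that could look; the Segment type, the method name, and the idea of padding each hit by a fixed amount are all my own assumptions for illustration, and the hit times are assumed to be sorted in ascending order. Handing the resulting ranges to IStreamBufferRecComp is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical time range to keep in the summary video.
public struct Segment
{
    public TimeSpan Start;
    public TimeSpan End;

    public Segment(TimeSpan start, TimeSpan end)
    {
        Start = start;
        End = end;
    }
}

// Hypothetical helper, not part of the download.
public static class SummarySegments
{
    // Pads each hit on both sides and merges ranges that overlap or touch.
    // Assumes hitTimes is sorted in ascending order.
    public static List<Segment> Build(IList<TimeSpan> hitTimes, TimeSpan padding)
    {
        List<Segment> segments = new List<Segment>();
        foreach (TimeSpan hit in hitTimes)
        {
            // Clamp the start so padding never goes before the beginning of the file.
            TimeSpan start = hit > padding ? hit - padding : TimeSpan.Zero;
            TimeSpan end = hit + padding;

            int lastIndex = segments.Count - 1;
            if (lastIndex >= 0 && start <= segments[lastIndex].End)
            {
                // Overlaps the previous range: extend it instead of adding a new one.
                Segment last = segments[lastIndex];
                if (end > last.End) last.End = end;
                segments[lastIndex] = last;
            }
            else
            {
                segments.Add(new Segment(start, end));
            }
        }
        return segments;
    }
}
```

With hits at 10, 15, and 60 seconds and five seconds of padding, this produces two ranges: 5 to 20 seconds and 55 to 65 seconds.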

 


I’d love to hear what else you come up with and any feedback you might have (again, though, this is all unsupported, so while I’ll try to help where and when I can, I make no guarantees about anything). In the meantime, I hope this is helpful.


Happy programming!


-Steve

ClosedCaptions.zip

Comments (84)

  1. heaths says:

    Wow! Very nice. I especially like the IFilter. You should clean it up a little (make it robust, as you mentioned) and post it on http://addins.msn.com. I think that’s a wonderful idea and certainly makes videos more discoverable since metadata contains little about the content of the video.

  2. toub says:

    Thanks, Heath. Glad you like it! I do plan to rework the IFilter a bit when I have some more spare time; I’m constrained slightly by the interfaces exposed from sbe.dll, but it should be doable. In the meantime, the IFilter does work fairly well, so I hope it’s useful to folks.

  3. Stephen,

    Your XPMCE tools (especially the metadata tag editor and the DVR-MS Editor) are simply awesome. Both have proven very useful in optimizing my Media Center experience. I certainly hope Bill and Steve are keeping you happy. We want you around writing this blog for a long time to come… 🙂

    One question I have on the DVR-MS Editor. After I edit a file to remove commercials, I notice the KB/second rate in the output file is less than the original. Is there anything that is "lost" as a result (closed captioning, audio/visual quality, etc.)? I have also noticed that the Sonic DVD writer seems to have trouble recognizing the output file if I try to burn it to a DVD, complaining of an encoding issue. Is there a workaround for this?

  4. Yakov says:

    hi, see if you can help me.

    i’m looking for away to grab the closed caption data as string data from real time video stream. that mean i can get the text data of the closed caption when i’m waching video from my capture device.

    any suggestion would be appriciate.

    thanks

  5. Stuart says:

    I have been using the DVR-MS Editor, it’s very good, however My question relates to the transcoding to wmv, is there any way to increase the quality of the wmv file? it seems to be set pretty low, 800mb dvr-ms = 18mb wmv, I have read though you r articles but I can’t see any mention of increasing the quality of the wmv.

    thanks

  6. Stuart says:

    oops, found your answer on another post

    Sure, just specify a Windows Media Profile (.prx) as the second arg on the command line and that profile will be used instead of the poor-quality default. You can easily create profiles using the Windows Media Profile Editor that’s included with the Windows Media Encoder, available for free on the Microsoft site.

    excellent 🙂

  7. Alex Sirota says:

    What an awesome idea… Now what would be really interesting is to somehow start capturing the closed caption info and make it available in a peer to peer fashion so you could actually set favorites to record based on content rather than just by name.

    The real power in this is that the metadata currently available for a show is not that complete… Plus the credits, if available in the closed caption, basically summarize the entirety of the show to the most complete level imaginable. After all apart from the video, the titles and audio are all that is left…

    Once that is done automatically and in peer to peer fashion you could do things like:

    – record certain types of commercials, provided they are CC’d

    – record certain types of show based on keywords and other phrases

    The notion of intelligent recording would be revolutionized by making CC info available as part of the digital stream.

  8. One of the folks I talked to at CES 2006 last week asked me about how to use Media Center extensibility…

  9. Very interesting article.  Your listing of closed-captions is exactly what I’ve been looking for.  

    However, I need to know — I have a Toshiba Satellite A75 notebook (3.2 Ghz), but it does not include Windows Media Center.  What do I ultimately need in order to create the files for doing this — to create the dvr-ms files?  Can I install appropriate capture devices on a laptop, and create the appropriate files with some readily available software, or do I need to buy a computer with Windows Media Center installed?  

    Many thanks, and again, this ia a really cool application.

    Jon Rachlin

  10. toub says:

    Jon, glad you liked the article.  DVR-MS files are created by Media Center, and as of right now, I know of no other system that saves recorded content as DVR-MS.  Note that Media Center will be included in some versions of Windows Vista.  As for capturing TV signals, you’ll need a capture card of some sort, and companies like Hauppauge do make USB tuner devices, so you could certainly pick one of those up for your laptop.  All that said, what I described above could probably be accomplished for other file formats as well; I just based this on DVR-MS because I like the format and have a Media Center at home.

  11. RHR says:

    I want TV or DVR-MS can output TV’s cc but I don’t want MCE’s CC becuaes I not read small CC and no background, I would output CC available?

  12. RHR says:

    DVR-MS copy to BurnDVD, WMV, AVI, MEPG won’t work CC,

    Any idea?

  13. Josh says:

    Hi! interesting stuff, there!

    I have a tricky one for ya all: If I have website with an embedded WMP and want to enable the viewer to turn on and off his or hers cc without opening the WMP, how do I do That!?

  14. toub says:

    Josh, unfortunately I don’t think it’s currently possible to control that setting programmatically through the WMP API; there may be a way, and if you find one, I’d be interested in hearing about it.

  15. Amber Lopez says:

    Steve,

    I am not sure I am on the right page, but your article seemed interesting!  I just got a Lenovo 3000 C100.  I alos have Dishnetwork 625 DVR.  

    I know there is a way to get my tv to recognize my laptop, and vice versa.  I also know that there is a way for me to download my recorded shows on my dvr to my laptop for me to view.

    Could you either please help or direct me to the right direction?  I have googled my brains out.

    Mind you I am not as sharp nor smart as therest of thee.

    Thanks!

    Amber

  16. Jon Rachlin says:

    I just bought a new Media Center notebook  (see above post), and have tried your program on Spanish tv shows.  It works very nicely, with one exception.

    The extended characters of the EIA-708B don’t show up in your parser, even though they show up in the captions on tv.  For instance, the word niño shows up as nio.  

    I notice that in the NTSCClosedCaptionParser, you are only looking for the letters of the English alphabet: you have a line of code —

    if (b >= 0x20 && b <= 0x7a)

    Do you have any ideas about how to handle this?  

    I tried changing the code to

    if (b >= 0x20 && b <= 0xff)

    but it didn’t work.  Maybe I’m looking in the wrong place.

    Again, thanks for a great article.  I bet you didn’t think that one of its uses would be to improve one’s Spanish.

    Jon Rachlin

  17. toub says:

    Hi Jon- Glad you enjoyed the code and are finding it useful!  The code was really meant to be an approximation of 608b, and as such if you look at the spec and compare it to the code, you’ll see that I’ve omitted some things.  For example, as you saw, I only pay attention to values <= 7a, since those pretty much map to their ASCII equivalents and thus I can get close to correct results with minimal effort simply by casting the value to a Char.  If you look at the spec though, you’ll see that the ñ is 0x7E, but it’s not in ASCII, which means you won’t get ñ by casting 0x7E to a Char, which is probably why this didn’t work for you when you extended the comparison range.  You could try adding code to explicitly convert certain values into Chars and see if that helps.  In general though, as I said, I intended this to be enough of a prototype/proof-of-concept to get basic applications up and running, but it’s definitely not (nor was it intended to be) a 100% compatible implementation of the specification.  Hope that helps!

  18. Jonathan says:

    Interesting ideas.  I have an issue that seemed simple, but there are no viable solutions to it — is there any way to take these extracted CCs and burn them onto DVDs with the video for use on a settop player.  I have run into many brick walls trying to research this topic, and was wondering if you had any insights as to how to address this.  This would be an immense asset to hard of hearing users that use MCE.

    Thanks

  19. toub says:

    It’s probably possible using a DVD editing application that allows you to add captioning.  I believe commercial apps let you do this, though I’ve never tried, and I’m not sure if any have SDKs or object models against which you can program.  You could certainly extract the closed captions along with time codes, so if you found a program that let you burn them to DVD, you could output them in whatever format was necessary for the DVD app to load.

  20. Mike Lanza says:

    I want to hire someone to implement a simple custom system that would generate a text file with timecodes and captions, then upload these timecode/caption pairs into a database.  Also, we’d want some other stuff (auto-transcoding the video file, uploading the transcoded file, etc.), , but the cc generation is where I’m stuck right now.  See http://www.click.tv to see what we’re driving at.

    Does anyone have any ideas who could do this job quickly and efficiently?

  21. masik says:

    Great article. I came across this from Green Button forum when I was directed to read this artichle for my question I had posted. I did not find answer to my question, which was while buring DVD of MCE recorded programs, CC data is not being written to DVD. In other words, DVDs created by ClickToDVD sw (from Sony on VAIO) of the MCE recorded TV shows, do not have CC function, even thought the CC exists in the recording on MCE. How would I know where to look and debug this problem? Any help form you is appreciated?

    Thanks in advance.

  22. toub says:

    Hi Masik-

    It’s not a problem that requires debugging; the feature you’re looking just wasn’t implemented in the program you’re using (I’m not aware of any programs that implement it).  You can probably get the functionality you require by using a professional DVD creation application, many of which allow for the creation of subtitles, but most of those, even if they accept DVR-MS files as input, probably wouldn’t extract the CC information from the DVR-MS and use it for the DVD subtitle track; you’d probably need to write your own tool to extract the CC data into a format consumable by the DVD application.  A good place to start, then, would be with the sample code I’ve provided here.

    Good luck.

  23. Jesper says:

    Hi, very nice site. I have here a dvd that includes closed captions. These are shown when playing in media center 2005. Is there a way to extract the captions to .srt or .sub format? Maybe a hint on how to do it?  

  24. Mircea says:

    Great work. Every article of yours brings so much to the table.

    I do have a problem, though, that occurs before the CC stream gets into the dvr-ms file, and because of this I can’t use the code in your examples. Hoping that you might have that kind of knowledge, here’s what’s happening. While watching Live TV, the CC stream misses letters or words. This happens on every line; sometimes it’s worse, sometimes not so bad. Only once, recently, for one show, the CC was flawless, which I couldn’t believe, but it was just that one time. In the recorded file the situation is the same, so it’s not a matter of displaying them wrongly. A lot of people told me that Hauppauge (the brand I own) makes bad drivers and this causes the problem. I don’t think so, because using the same hardware configuration (where MCE fails on CC), I tried another PVR-TV application (Chris TV), which displays flawless CC 24/7. This leads me to believe that it’s something particular to the way MCE reads or interprets the CC from the cable signal. Now, believe me, I have searched hundreds of websites, and I’ve lost hope of finding an answer, as I’ve been at this for a long time. Any hint or recommendation of whom or where to ask further would be welcome.

    Regards.

  25. toub says:

    Mircea, I’ve never heard of this problem before, so unfortunately I don’t have a good answer for you.  I did forward your question to folks on the MCE team, and someone there might have a better idea of what’s happening.  Out of curiosity, what version of MCE are you using?  If you upgrade to Vista (currently at RC1), do you have the same problem?

  26. TYC says:

    Hey, I just found these tools to convert DVR-MS closed captions into .scc files, pretty cool.  Good article BTW.

    http://www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_TOOLS.HTML#dvr2scc

  27. Mircea says:

    Stephen, sorry for taking so long.

    I didn’t try it on Vista yet  as it’s a "production" machine and I need to make some preparations before.

    Oh, by the way, the one I’m using is MCE 2005, updated automatically.

    I was thinking of calling MCE support but as I worked myself in Perf I know the many possible scenarios a ticket can go through. Maybe by e-mail … anyway.

    Thanks for the answer though, as I know you would’ve helped if you can. I’ll take this as one of my longest cases 🙁 and I don’t care about the survey as I’m my own customer :).

    Please, keep up the good work.

  28. TYC says:

    I’m curious too about how to convert the captions from ExtractClosedCaptions.exe to an .ssa or .srt subtitle format.  I’ll do the wmv as a last resort, but I’d prefer just to use one of those subtitle formats.

    "

    I have here a dvd that includes closed captions. These are shown when playing in media center 2005. Is there a way to extract the captions to .srt or .sub format? Maybe a hint on how to do it?  

    "

  29. Stephen Toub says:

    ExtractClosedCaptions is just a sample wrapper around my underlying sample library for extracting closed captions from NTSC DVR-MS files.  If you look at the library, it gives you back a .NET collection of the captions, and ExtractClosedCaptions just iterates through them and writes them to the console.  You can write your own app to iterate through them and do whatever you want with them, including writing them to whatever format you desire.

  30. TYC says:

    Thanks for the quick reply Stephen, I’ll try and figure out how to do this!

  31. Jesper says:

    Does somebody know if there is a collection or object holding the closed captions when they are being shown during playback in MCE2005? If that is the case, I would like to use it in a program which I code myself to output the closed captions from a DVD to a file.

  32. Murali says:

    Hi Stephen,

    I am trying to extract CC text from a TV tuner (ATI TV Wonder USB 2.0). I have not succeeded in setting up the filter graph with the ATI-provided filters. What is the minimum set of filters required to extract CC text? I am not interested in the video, as the software I am building has to analyze CC text alone.

    Thanks a ton in advance,

    Murali

  33. toub says:

    I’ve never used an ATI TV Wonder USB, and I unfortunately don’t know anything about the drivers and filters associated with it.  Regardless, best of luck with your project.

  34. Nick says:

    Hi Stephen,

    So I’m using converttowmv to convert from .dvr-ms files to .wmv files in Vista RC1, but the output video looks very strange. It basically has a greenish tinge, and it looks like part of the video is reflected and distorted in the lower part of the frame. Using the standard .prx or another custom one that I made doesn’t seem to make a difference. Any ideas/suggestions for what might be going on? I thought that it might have to do with the frame size, which is why I tried the custom .prx that didn’t resize the frame, but that didn’t do it.

    Thanks,

    Nick

  35. toub says:

    To be honest, I haven’t tried converttowmv on Vista yet, though I plan to shortly.  I’ll let you know if I run into a similar issue, and if I do, if I end up with a solution.

  36. jeff says:

    Wondering if anyone out there could make extractclosedcaptions output to srt format…see (http://forum.videohelp.com/viewtopic.php?t=314307) for an example. I am not a programmer, otherwise I am sure it’s not too difficult….raakjoer_AT_gmail.com

  37. Bill says:

    I was trying to compile the code with Microsoft Visual C# 2005 Express Edition, and everything was fine except that "using Microsoft.Office.Interop.Word;" could not be resolved.

    I downloaded Microsoft.Office.Interop.Word.dll online, and the code is now working.

    I wanted to do some Closed Caption stuff with the dvr-ms files I have.

  38. toub says:

    Bill, do you have Office installed?  If so, during Office installation, did you explicitly request to install the interop DLLs?  You can go back to Add/Remove Programs and change the installation so that it adds them.

    That aside, there’s just one sample project in the solution that relies on that DLL, and it’s not necessary for anything but that sample app.  Just remove the complaining project from the solution and all should be well.

    -Stephen

  39. Clifford Lazar says:

    Can you tell me the steps to get the SDK for the Vista Digital Video Recorder?

    Is there a way I can contact an experienced Vista DVR programmer?

    Cliff Lazar

    cliff@lazardev.com

  40. Aurilieus says:

    Can anyone guide me on getting CC working in Amcap? It would be a great help.

    Thnx

  41. Val says:

    Hi Stephen,

    "Display Time" and "Clear Time" frequently have same values in movies I record on my MS Media Center. Also, sometimes the difference between the two time values is less than .5 sec which is too short for normal audience.

    When I watch these recorded movies with captions enabled in MS Media Center, the captions show up and stay on the screen for a reasonable time, even when "Display Time" and "Clear Time" indicate the same value.

    What’s wrong? Why am I not getting the true "Display Time" and "Clear Time" values in these fields? Any advice?

    Thank you,

    Val

  42. toub says:

    Val, are these SD or HD recordings?  It’s certainly possible that my state machine used in the decoding isn’t quite up to snuff, but I’d need to see an example before I could figure out why it wasn’t working the way you expect.

  43. Val says:

    Stephen — it’s the SD recordings I am dealing with. My system is a Dell XPS 600 with a Dell-supplied Angel MPEG TV tuner and capture card. In my experience, the "Display Time" appears to be correct at least in most cases, but the "Clear Time" is likely incorrect, and in some cases it simply cannot be correct. Below are the opening CC lines for the movie "Jersey Girl" I recently recorded:

    Start      Display    Clear      Text
    00:00.562  00:00.763  00:00.763  (bell rings)
    00:00.764  00:01.464  00:01.964  (woman) Everyone,
    00:01.563  00:01.965  00:03.066  please take your seats.
    00:01.966  00:03.165  00:03.165  You heard the bell.
    00:03.166  00:04.068  00:04.366  You know what it means.
    00:04.167  00:04.367  00:04.367  Last week,
    00:04.368  00:06.971  00:07.569  the assignment was to write
    00:07.070  00:07.570  00:07.570  an essay about your family.
    00:07.571  00:13.675  00:13.978  Who they…
    00:13.776  00:14.076  00:17.981  (class) Are!
    00:14.077  00:17.982  00:20.083  And what they…
    00:18.080  00:20.084  00:22.385  (all) Mean to us!
    00:20.182  00:22.386  00:22.386  Excellent droning.
    00:22.484  00:23.086  00:23.586  So I want everyone
    00:23.185  00:23.587  00:23.587  to take out their essays.
    00:23.685  00:25.589  00:26.188  We’re going to read them aloud
    00:25.688  00:26.189  00:26.189  to the class right up here.
    00:26.190  00:30.093  00:30.594  My mom says that me and my dad
    00:30.192  00:30.692  00:30.692  have very healthy appetites.
    00:30.693  00:38.301  00:38.301  My mom says my dad’s eyes
    00:38.400  00:38.800  00:39.102  are brown because he is
    00:38.802  00:39.103  00:45.207  so full of sh…
    00:39.201  00:45.208  00:46.110  (teacher) Brian!

    As you can see, there are a few pairs of "Display Time" and "Clear Time" that make no sense:

    00:00.763 and 00:00.763

    00:03.165 and 00:03.165

    00:04.367 and 00:04.367

    00:07.570 and 00:07.570

    and so on.

    I am trying to figure out if there is an algorithm for showing closed captions with zero time difference between the Display Time and the Clear Time values. The pop-up captions sometimes appear on two or three lines, but so far I have not been able to get the algorithm for showing such captions. Apparently there should be something else, either some additional data, or the algorithm I don’t understand, that makes the closed captions work well in the movie. MS Media Center shows captions very well and they stay on the screen long enough so that I can read them!

  44. Mark Johnson says:

    This is some nice bit of coding!  Thank you for the great write up and sharing the code.  Keep up the good work.

  45. H.M. says:

    I have a problem similar to Val’s.

    Closed captions don’t show up in Media Center. SearchAndView shows there are closed captions in the dvr-ms file.

    The Display time and the Clear time are the same through the entire file.

    What could be the problem?

    Thank you in advance.

  46. Mehdi says:

    Thank you for this! Works perfectly, but I was just curious to find out if there was a way to make SearchAndView work for, let’s say, a .WMV with its SAMI file?

  47. toub says:

    Mehdi, glad you like it.  One could certainly write an app to provide the same behavior for WMV/SAMI, but I haven’t done so.

  48. Val says:

    Stephen, I will appreciate any guidance you may have for me and others who experience the issue with the Display Time and Clear Time being the same. I want to write an SRT file generator, however I can’t use the Clear Time information as it works now since some captions won’t get displayed. I basically have to ignore the Clear Time and use only the Display Time information. Is there any better way? Since I use VBA for writing my applications and I don’t have MS Visual Studio, I can’t really do anything with your code.

    Many thanks for creating and sharing your libraries, and any further advice you may have!
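The workaround described in this comment (relying on Display Time when Clear Time is unusable) can be sketched roughly as follows. This is only an illustrative heuristic, not how the sample library or Media Center decides when to clear a caption: the `TimedCaption` type, the `CaptionTiming` helper, and the minimum/maximum durations are all hypothetical names and values introduced for this example. When the reported clear time leaves the caption on screen for less than a minimum duration, the sketch ends the caption at the next caption's display time, capped at a maximum duration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a parsed caption; not a type from the sample library.
public sealed class TimedCaption
{
    public TimeSpan Display;   // when the caption appears
    public TimeSpan Clear;     // when the parser says it is cleared
    public string Text;
}

public static class CaptionTiming
{
    // Returns an end time suitable for a subtitle file.
    // If Clear - Display is at least minDuration, the reported clear time is kept.
    // Otherwise the caption ends at Display + maxDuration, or at the next
    // caption's display time if that comes sooner. Note that the result can
    // still be shorter than minDuration when the next caption follows quickly.
    public static TimeSpan EffectiveEnd(IList<TimedCaption> captions, int index,
                                        TimeSpan minDuration, TimeSpan maxDuration)
    {
        TimedCaption current = captions[index];
        if (current.Clear - current.Display >= minDuration)
        {
            return current.Clear;
        }

        TimeSpan end = current.Display + maxDuration;
        if (index + 1 < captions.Count)
        {
            TimeSpan nextDisplay = captions[index + 1].Display;
            if (nextDisplay > current.Display && nextDisplay < end)
            {
                end = nextDisplay;
            }
        }
        return end;
    }
}
```

With the first two rows of the "Jersey Girl" listing above, a half-second minimum, and a four-second maximum, the first caption (displayed and cleared at 00:00.763) would end at 00:01.464, when the second caption is displayed, while the second caption keeps its reported clear time of 00:01.964.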

  49. toub says:

    Hi Val- As I mentioned, it’s certainly possible that I have an error in my logic for parsing this stuff, but I haven’t found it yet.  I appreciate your providing the output results, but it’s still hard for me to diagnose without the input DVR-MS, so that I can debug how I’m actually processing the closed captioning data.  Regardless, I’m planning to release an updated version of the code; it’s possible in doing so I’ll stumble across the reason for why this is happening.

  50. H.M. says:

    Hi Stephen: If you need, I can send you the DVR-MS file that I recorded.

  51. toub says:

    Val provided me with a sample, thanks.

  52. Joe Clark says:

    You’re going to have some interesting errors pop up in character encoding, as I assure you that the character set in NTSC (changed twice, so there are actually three) does not map perfectly to US-ASCII.

  53. toub says:

    Yup.  There are other problems with this approach, too; for example, I’m not paying any attention to position layout on the screen and how that affects ordering of characters/words/sentences/etc.  But in most scenarios, with most of the text and styles used in most shows, it works just fine, and as it’s purely a sample and experiment to show the types of things you can do with this, that’s fine by me.  If someone wants to take the time to create a more robust implementation, great.

  54. Murtuza says:

    Is there any way to capture CC data from a real-time video stream, either DVD or TV, on Line 21?

    In other words, I need the text data of the closed captions while I’m watching video.

    Thanks in advance.

  55. toub says:

    I’m sure it’s *possible*, for example through a DirectShow filter plugged into the filter graph, but I don’t have a sample for doing so.

  56. Filiep Geeraert says:

    I’m really desperate on this.

    What I want to do, is simply record a TV show, that has PAL Teletext subtitles in it, then play it back on another PC (a laptop for instance).

    This laptop does not have Mediacenter installed on it, and it seems that any player I tried does not render the subtitles.

    I tried Windows Mediaplayer, Windows Mediaplayer Classic, Zoomplayer, BSPlayer, Videolan (VLC), even GeexBox (Linux bootable CD).

    All of them playback video and audio fine, but no subtitles.

    Does anyone have a solution to this ?

    I tried building a graph and extracting the teletext, but I’m not really getting much readable data.

    BTW : even the Mediacenter PC itself only renders the subtitles when the DVR-MS file is being played from Mediacenter, not from Windows Media Player…

  57. For people interested in a Teletext subtitles extraction utility, have a look at my homepage.

    I think it will only work with Teletext captured from analogue PAL broadcasts.

  58. Forgot to add the URL.

    You can find my Teletext extraction utility at :

    http://www.extrabuttons.net

  59. saliim says:

    Was looking for the original article "Fun with DVR-MS". The link does not work. Could someone forward me to a valid link ?

  60. I always find it hard to read and adapt other people’s code.

    Is there anyone who could write a program which does the following :

    1) Synchronously read a DVR-MS file of choice

    2) write a dumpfile containing the 2nd stream (the one that contains the Teletext data), with a marker (for instance : <next frame>) at the beginning of each frame.

    The reason I am asking is this :

    My Teletext ripper program works quite well, the biggest problem I have though, is that I start with a dumpfile I create with graphedit.

    When I create that dumpfile timing information is lost, so I then get the timings from the Teletext clock that is displayed in the upper hand corner.

    However, this clock only has a 1 second accuracy, which is not accurate enough, especially for live subtitles.

    So if I could start with the same dump file, but it would contain frame or time markers, I could get (almost) perfect timings !

  61. kelvin wong says:

    hello,

    this is perfect tool for me, because I am deaf. of course, you aren’t racist!!!!!

    that is enough proof :), that is why I like Media Center (as love as my wife)

    kelvin

  62. badbob001 says:

    Due to a driver bug in my capture card that will never be fixed, the closed captioning of recorded videos is two seconds ahead of the audio. How practical is it to update the closed captioning timecode in a dvr-ms to subtract two seconds? I’m not interested in converting dvr-ms to wma, just updating the cc data in the dvr-ms.

    Thanks!

  63. Fnord says:

    I’ve been looking for a way to convert dvr-ms into divx, playable on the Toshiba DVD player/recorder.

    While I found some software to convert into playable divx, I miss the captioning.

    Many divx players support a *.srt file (same name as the movie playing)

    I found this wiki that provides an example and modified the code here to produce the required format.

     http://en.wikipedia.org/wiki/SubRip

    So, in ExtractClosedCaptions.cs

    From the comment line:

    // Save the captions out to a faux-spreadsheet (tab-delimited text file with .xls extension)

    string outputName = filename.Replace(".dvr-ms", ".srt");
    int nCurrentIndex = 1; // SRT entries are numbered starting at 1
    using(StreamWriter writer = new StreamWriter(outputName))
    {
        foreach(NtscClosedCaption cc in ccs)
        {
            writer.WriteLine("{0}", nCurrentIndex++);
            writer.WriteLine("{0} --> {1}", cc.StartTimecode, cc.ClearTimecode);
            writer.WriteLine("{0}", cc.Text);
            writer.WriteLine("");
        }
    }

    Compile and run as before.
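One detail worth checking when writing .srt files this way: the SubRip format described on the wiki page linked above writes each timestamp as hours:minutes:seconds,milliseconds (with a comma before the milliseconds), so the default string form of a timecode may not match. The helper below is a minimal sketch that assumes the timecodes are available as `TimeSpan` values, which is an assumption about the library rather than something stated in this thread; `SrtTime` is a hypothetical name introduced for this example.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for SubRip-style timestamps; not part of the sample library.
public static class SrtTime
{
    // Formats a non-negative TimeSpan as hh:mm:ss,mmm.
    public static string Format(TimeSpan t)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "{0:00}:{1:00}:{2:00},{3:000}",
            (int)t.TotalHours, t.Minutes, t.Seconds, t.Milliseconds);
    }

    // Formats the "start --> end" line of an SRT entry.
    public static string FormatRange(TimeSpan start, TimeSpan end)
    {
        return Format(start) + " --> " + Format(end);
    }
}
```

Under that assumption, the timing line in the loop above could be written with `writer.WriteLine(SrtTime.FormatRange(start, end))`, where `start` and `end` are the caption's two timecodes.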

  64. nurav says:

    I desperately need some H.264 test streams which contains CC data. am done with the parsing nd decoding but am not able to verify content due to the lack of streams.. PLZ help..

    email me if u have any at

    amit0353@gmail.com

  65. tleung says:

    Downloaded the ClosedCaptions.zip file on to a machine with a fresh install of Vista Home Premium.

    I tried ConvertToWmv, and it gave an exception.

    Anyone else seen this ? I was able to run ConvertToWmv on a XP pro machine using the same dvr-ms file. Thanks.

    C:\download\ClosedCaptions\Code\ConvertToWmv\bin\Debug>convertToWmv LifeToday.dvr-ms

    Converting from LifeToday.dvr-ms to LifeToday.dvr-ms.wmv

    0.00%

    Unhandled Exception: Toub.DirectShow.DirectShowException: The operation completed successfully

      at Toub.MediaCenter.Dvrms.Conversion.Converter.RunGraph(IGraphBuilder graphBuilder, IBaseFilter seekableFilter)

      at Toub.MediaCenter.Dvrms.Conversion.AsfConverter.DoWork()

      at Toub.MediaCenter.Dvrms.Conversion.Converter.Convert()

      at Toub.MediaCenter.Tools.ConvertToWmv.Main(String[] args)

  66. Matt says:

    Hope you can help.  I have a TV Guardian that takes a CC signal to filter out curse words.  Is there any way to make Vista Media Center pass a normal CC signal through to the TV?

    Thanks,

    Matt

  67. Howard says:

    I tried to extract CC from an HD recording dvr-ms file and get this error:

    Unhandled Exception: Toub.DirectShow.DirectShowException: The input media format is invalid. ---> System.Runtime.InteropServices.COMException (0xC00D0BB8): Exception from HRESULT: 0xC00D0BB8.

      at Toub.DirectShow.IWMSyncReader.GetOutputFormatCount(Int32 dwOutputNum)

      at Toub.MediaCenter.Dvrms.ClosedCaptions.ClosedCaptionsParser.FindStreamNumber(IWMSyncReader reader, Guid streamTypeId)

      at Toub.MediaCenter.Dvrms.ClosedCaptions.NtscClosedCaptionsParser.Parse(IWMSyncReader reader)

      at Toub.MediaCenter.Dvrms.ClosedCaptions.ClosedCaptionsParser.GetCaptions()

      --- End of inner exception stack trace ---

      at Toub.MediaCenter.Dvrms.ClosedCaptions.ClosedCaptionsParser.GetCaptions()

      at Toub.MediaCenter.Tools.ExtractClosedCaptions.Main(String[] args)

    Is it because it’s ATSC instead of NTSC? Can this tool handle ATSC recording files? Thanks.

  68. You made a few mistakes Stephen.

    What you wrote has almost no practical value because the main use of CC is to Closed Caption TV shows like infomercials.

    Each half hour infomercial I own airs 3,000 to 7,000 times a month on every NBC, ABC, CBS, and FOX TV station and each show must be Closed Captioned.

    Almost every Broadcast TV station in America accepts analog Betacam SP tape–that is the standard for 99% of the TV stations in America.

    Most TV shows and infomercials are edited in either Media 100, AVID, or Final Cut Pro on the Mac. In fact, NOBODY uses a Windows based PC in the production business.

    So your article doesn’t explain how to CC a Betacam SP tape from a non-linear editing system on the Mac.

    BUT, your code can be used to do this! And I think it would benefit your readers to know how to IMPORT CC files you create into Adobe’s AfterEffects on the Mac.

    You can install AfterEffects on a PC to start and import your file into AfterEffects. AfterEffects composites video frame by frame, and the top 1 pixel of each of your frames is the CC byte codes!

    If you look at the top of each frame created in your program, the grayish rectangles are the byte codes. Simply mask out everything but the top 1 pixel and combine with the file format you want for the Mac and VOILÀ!  You have a file that is Closed Captioned on Line 21 that can be imported on a Mac for editing.

    You must have known that you can do this, Stephen? Is it politically incorrect to talk about the Mac in this forum?

    You wrote a great piece of software!

  69. Kyuho says:

    Hello,

    I guess I’ve finally found a solution to capture captions. It took me a long time to find this article.

    I’ve tried to capture TV shows with a TV card on my Vista and XP PCs, but it was not easy.

    I decided to buy ATI TV Wonder Card for only use on XP PC.

    But I want to record CC from analog broadcasting with HDTV card on Vista PC.

    Recording CC, searching for words, and the other things in the article are exactly what I’ve wanted TV card software to have.

    Actually, as a normal user I cannot understand the article at this time, but I will try to figure it out with the help of the article.

    Thank you very much.

  70. MJ Hufford says:

    I would love to take a look at your code and see if it could be modified to create a language filter for DVR-MS files in Vista MCE.

    For example, you could create an XML file of "unapproved" words.  When those words are found in CC, then the audio could be muted momentarily until the word has cleared the screen.

    You could also show CC while the audio is muted and even replace the unapproved word with another less offensive one.

    I know this is being done with hardware devices like "TV Guardian", but would love to see it as a service in Vista MCE…fully configurable by the user.

    Any thoughts from the peanut gallery?  
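The text-matching half of this idea can be sketched in a few lines. The example below is only a rough illustration under stated assumptions: `CaptionFilter` and its `Censor` method are hypothetical names, the word list is passed in as a dictionary rather than loaded from XML, and muting the audio during playback is left out entirely. It replaces each whole word found in the list with its substitute and reports whether anything matched, which a player could use as the signal to mute.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of the sample library.
public static class CaptionFilter
{
    // Replaces whole words found in 'replacements' with their substitutes.
    // 'matched' is set to true when at least one word was replaced.
    // Pass a dictionary built with StringComparer.OrdinalIgnoreCase to
    // ignore case when matching.
    public static string Censor(string text, IDictionary<string, string> replacements, out bool matched)
    {
        bool found = false;
        string result = Regex.Replace(text, @"[A-Za-z']+", delegate(Match m)
        {
            string replacement;
            if (replacements.TryGetValue(m.Value, out replacement))
            {
                found = true;
                return replacement;
            }
            return m.Value; // leave other words untouched
        });
        matched = found;
        return result;
    }
}
```

Because only whole words are looked up, a listed word such as "darn" would not affect a longer word such as "darned"; whether that is the desired behavior is a design choice for the word list.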

  71. Bill says:

    The link to the code doesn’t appear to be working.  Anybody know a working one?

    Thank you.

  72. David says:

    @Howard:

    Line 21 for ATSC has a different GUID. If you change the GUID from 670AEA80-3A82-11D0-B79B-00AA003767A7 to b88b8a89-b049-4c80-adcf-5898985e22c1 on line 16 of the file Code\Toub.MediaCenter.Dvrms\DirectShow\AmMediaType.cs, it starts working for ATSC High Def programs. Or you could add a new static variable if you want to use both old and new GUIDs.
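The "new static variable" option might look something like the sketch below. The two GUID values are the ones quoted in this comment; the class name, field names, and helper method are hypothetical and do not reflect how AmMediaType.cs is actually organized, and the NTSC/ATSC labels simply follow the comment's description.

```csharp
using System;

// Hypothetical holder for both closed-caption stream type GUIDs.
public static class Line21MediaTypes
{
    // GUID the sample code uses for NTSC recordings, per the comment above.
    public static readonly Guid Ntsc = new Guid("670AEA80-3A82-11D0-B79B-00AA003767A7");

    // GUID reported in the comment above to work for ATSC high-definition recordings.
    public static readonly Guid Atsc = new Guid("b88b8a89-b049-4c80-adcf-5898985e22c1");

    // True when the stream type matches either GUID.
    public static bool IsClosedCaptionStream(Guid streamTypeId)
    {
        return streamTypeId == Ntsc || streamTypeId == Atsc;
    }
}
```

A parser could then try each GUID in turn when looking for the caption stream, instead of requiring the source to be edited for each kind of recording.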

  73. troester says:

    Toub,

    Good article. What happened with CC in DVDs in Windows 7 MC?

  74. Bill SerGio says:

    Hi,

    It is a FACT that 99% of all professional Video production for television is done on the Mac—I know because this is my business.

    There is only ONE program for the Mac, called "MacCaption", that sells for $5,000 and can closed caption video on the Mac. It would be nice and profitable if a Closed Captioned video file that was ALL black video with the CC byte codes could be converted to a quicktime file in the animation codec, no compression, with frame size at 720×486 at 29.97fps. A quicktime animation file in black, uncompressed, with 720×486 will combine with any other quicktime format in AfterEffects on the Mac to Close Caption any video in any of the NLEs for the Mac like Media 100, AVID, and Final Cut Pro.

    Does anyone know how, or if it is possible, to create such a quicktime animation file for the Mac from the PC file format created here in this article?

    Bill SerGio, The Infomercial King

    tvmogul1@yahoo.com

  75. Jeff says:

    Can we make it work with .wtv files in windows 7?

  76. Jeff says:

    In Windows 7, I can convert a .wtv file to .dvr-ms by right-clicking the .wtv file; however, the tool cannot read the converted .dvr-ms file either. It has no problem with original .dvr-ms files recorded in Windows Vista.

    Any ideas? Can we fix it? Thanks!

  77. Lli says:

    Hi Toub,

    Is there any progress for Windows 7? Have the same question as Jeff.

    Thanks a lot