Scripting an Internet Explorer window


Today's Little Program takes a random walk through MSDN by starting at the Create­Process page and randomly clicking links. The exercise is not as important as the technique it demonstrates.

function randomwalk(ie, steps) {
 for (var count = 0; count < steps; count++) {

  WScript.StdOut.WriteLine(ie.document.title);

  var links = ie.document.querySelectorAll("#mainSection a");
  do {
   var randomLink = links[Math.floor(Math.random() * links.length)];
  } while (randomLink.protocol != "http:");

  WScript.StdOut.WriteLine("Clicking on " + randomLink.innerText);
  randomLink.click();

  while (ie.busy) WScript.Sleep(100);
 }
}

(I'm assuming the reader can figure out what language this script is written in. If you have to ask, then you probably won't understand this article at all. I am also not concerned with random number bias because Little Program.)

To take a random walk through MSDN, we ask for all the links in the mainSection element. Note that I'm taking an undocumented dependency on the structure of MSDN pages. This structure has changed in the past, so be aware that the script may stop working at any time if the MSDN folks choose to reorganize their pages. I'm not too worried since this is a demonstration, not production code. In real life, you are probably going to script a Web page that your team designed (as part of automated testing), so taking a dependency on the DOM is something the QA team can negotiate with the development team. (If your real-life scenario really is walking through the MSDN content, then you should use the MSDN content API. Here's sample code.)

Anyway, we grab a link at random, but throw away anything that is not an http: link. This keeps us from accidentally navigating into a mailto: link, for example.
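The do/while loop above has a few sharp edges: it rejects https: links along with mailto: links, it never terminates if the section contains no http: links at all, and if the section contains no links whatsoever, randomLink comes back undefined and the protocol check fails with an error. Here is a minimal sketch of a more forgiving selection step, assuming the caller passes any indexable collection with a length, such as the result of querySelectorAll. The name pickRandomLink is made up for illustration.

```javascript
// Sketch (hypothetical helper): choose a random link whose protocol is
// http: or https:. Returns null instead of looping forever when the
// collection has no acceptable links.
function pickRandomLink(links) {
  // First pass: collect the acceptable candidates.
  var candidates = [];
  for (var i = 0; i < links.length; i++) {
    var p = links[i].protocol;
    if (p == "http:" || p == "https:") {
      candidates.push(links[i]);
    }
  }
  // Nothing to click: let the caller decide what a dead end means.
  if (candidates.length == 0) return null;
  // Second step: pick uniformly among the candidates.
  return candidates[Math.floor(Math.random() * candidates.length)];
}
```

Gathering the candidates first costs a pass over the collection, but it bounds the amount of work and makes the choice uniform among the acceptable links. The caller can then stop the walk when it gets null back.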

We then invoke the click() method on the link to simulate the user clicking on it. We could also have just navigated to randomLink.href, but I'm using the click() method because it is more general. Your script may want to tick some checkboxes and then click the Submit button, and those actions can't be performed by navigation.
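To make the checkbox-and-Submit scenario concrete, here is a sketch of a helper you might call instead of randomLink.click(). The function name fillAndSubmit and the element ids agree and submitButton are hypothetical; substitute whatever ids the page under test actually uses.

```javascript
// Sketch (hypothetical ids): tick a checkbox and press a Submit button,
// something plain navigation cannot reproduce.
function fillAndSubmit(doc) {
  var checkbox = doc.getElementById("agree");
  var button = doc.getElementById("submitButton");
  if (!checkbox || !button) return false;   // page layout not as expected
  // Click rather than assign .checked, so any script the page attached
  // to the checkbox runs just as it would for a real user.
  if (!checkbox.checked) checkbox.click();
  button.click();
  return true;
}
```

Under Windows Script Host you would pass ie.document and then wait for ie.busy to clear, exactly as the random walk does after each click. Assigning the checked property directly would change the state silently, without running the page's click handlers.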

We then wait for the Web page to settle down. I'm lazy and am simply using a polling loop. If you want to be clever, you could listen for the onreadystatechange event, but this is just a Little Program, so I'm content to poll.
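If you stay with polling, it is worth putting an upper bound on it so that a page that never finishes loading doesn't hang the script forever. Here is a minimal sketch, assuming the caller supplies both the busy test and the sleep function; the name waitWhile is invented for this example.

```javascript
// Sketch (hypothetical helper): poll while isBusy() is true, giving up
// after roughly timeoutMs. Elapsed time is estimated by summing the sleep
// intervals, so time spent evaluating isBusy itself is not counted.
function waitWhile(isBusy, sleep, intervalMs, timeoutMs) {
  var waited = 0;
  while (isBusy()) {
    if (waited >= timeoutMs) return false; // timed out, still busy
    sleep(intervalMs);
    waited += intervalMs;
  }
  return true; // the condition cleared in time
}
```

Under Windows Script Host you might call it as waitWhile(function () { return ie.busy || ie.readyState != 4; }, function (ms) { WScript.Sleep(ms); }, 100, 30000), where 4 is READYSTATE_COMPLETE. Passing the sleep function in keeps the helper independent of WScript, which also makes it easy to exercise with a stub.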

Once we have settled on the new page, we loop back and do it again.

Now we just need to drive this helper function.

var ie = new ActiveXObject("InternetExplorer.Application");
ie.visible = true;
ie.navigate("http://msdn.microsoft.com/ms682425");

// Wait for it to load
while (ie.busy) WScript.Sleep(100);

randomwalk(ie, 10);

ie.Quit();

We create our own instance of Internet Explorer so we can change its carpet without getting anybody upset, navigate it to the Create­Process page, and wait for the page to load. We then use our random­walk function to click on ten successive links, and then when we're done, we bring in the demolition crew to destroy the browser we created.
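One catch: ie.Quit() runs only if everything before it succeeds. If the random walk throws (say, because the page layout changed and querySelectorAll comes back empty), the script stops and may leave the browser running. Here is a sketch of a wrapper that always sends in the demolition crew, assuming your JScript engine supports try/finally; withBrowser is a hypothetical name.

```javascript
// Sketch (hypothetical helper): create a browser, hand it to use(), and
// call Quit() whether use() returns normally or throws.
function withBrowser(create, use) {
  var ie = create();
  try {
    return use(ie);
  } finally {
    ie.Quit(); // demolition crew, no matter what happened
  }
}
```

Usage under Windows Script Host might look like withBrowser(function () { return new ActiveXObject("InternetExplorer.Application"); }, function (ie) { ie.visible = true; ie.navigate("http://msdn.microsoft.com/ms682425"); while (ie.busy) WScript.Sleep(100); randomwalk(ie, 10); }). Any exception still propagates to the caller after the browser is shut down.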

For extra evil, you could commandeer an existing Internet Explorer window rather than creating your own. (Now you're barging into somebody's house and rearranging the furniture.)

var shellWindows = new ActiveXObject("Shell.Application").Windows();
for (var i = 0; i < shellWindows.Count; i++) {
 var w = shellWindows.Item(i);
 if (w.name == "Windows Internet Explorer") {
  randomwalk(w, 10);
  break;
 }
}
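Matching on the window name grabs the first Internet Explorer window of any kind, and the collection also contains File Explorer windows, which is why some test is needed at all. If you want a particular window (say, the one currently showing MSDN), you can test a different property such as LocationURL. Here is a sketch of a helper that takes the test as a parameter, assuming the collection behaves like the one returned by Shell.Application's Windows() method (a Count property and an Item(i) method). The name findWindow is made up.

```javascript
// Sketch (hypothetical helper): return the first window in a
// ShellWindows-style collection that satisfies test(w), or null.
function findWindow(windows, test) {
  for (var i = 0; i < windows.Count; i++) {
    var w = windows.Item(i);
    // Guard against empty slots before handing the window to the test.
    if (w && test(w)) return w;
  }
  return null;
}
```

For example: var w = findWindow(new ActiveXObject("Shell.Application").Windows(), function (w) { return String(w.LocationURL).indexOf("http://msdn.microsoft.com/") == 0; }); if (w) randomwalk(w, 10);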

Making the appropriate changes to random­walk so as not to be MSDN-specific is left as an exercise.

Comments (12)
  1. Vitor Canova says:

    Wow. This kind of code will trigger a lot of Browser warnings.

    Clearly javascript and [access denied] things. ;)

  2. The real improvement this little program could use would be to start from http://www.google.com/search - I'm sure that the MSDN folks have changed the URL of the CreateProcess page at least three times since the publication of this article.

  3. Adam Rosenfield says:

    My first thought on seeing the "while (randomLink.protocol != "http:")" condition was "but what about HTTPS links?".  Ok, it's a Little Program, fair enough.  And as it turns out, MSDN doesn't really have many HTTPS links (just a couple relating to login), so it's no big deal here.

    What concerns me more, though, is that MSDN supports HTTPS — they went to the trouble of obtaining SSL certificates — but it actively redirects back to HTTP.  Go to any page on MSDN as HTTPS, and it redirects back to the insecure page.  I know that that's beyond Raymond's purview, but it makes no sense.  "Oh, you want a secure connection?  Come back with an insecure connection instead."

  4. Tim says:

    You can also receive events from the IE automation object if you create the object using WScript.CreateObject and provide a prefix for the event-handling functions. For example:

    var done = false;

    // WSH connects this handler because its name starts with the 'ie_'
    // prefix passed to CreateObject below.
    function ie_NavigateComplete2(pDisp, URL) {
        done = true;
        WScript.Echo('event caught');
    }

    var ie = WScript.CreateObject('InternetExplorer.Application', 'ie_');
    ie.visible = true;
    ie.navigate('http://www.google.com');
    while (!done) WScript.Sleep(100);

  5. Joshua says:

    @Adam Rosenfield: While MSDN disagrees with HTTPS everywhere, they use HTTPS on the parts of MSDN that it makes sense for (anything that actually makes use of your login).

  6. Gabe says:

    My guess why they redirect https to http is that they want to avoid having to implement SSL on their whole CDN. If they don't implement SSL on their CDN, all users will see security errors when trying to download things like images when the main URL is https.

  7. Aaron says:

    "To talk a random walk"  ==>  "To *take* a random walk"

  8. cheong00 says:

    I think the CDN already supports HTTPS, though. The MSDN forum is full of JavaScript that uses HTTPS links.

    I do hope they use URLs beginning with "//" (following the recommended way to use Google's CDN), though. That way users who aren't logged in can continue to use full-page HTTP, and logged-in users can use full-page HTTPS.

  9. foo says:

    @Adam Rosenfield. You could use regex to handle the randomLink.protocol != "http:" string comparison problem, but then you would have two problems </obligatory>

  10. Neil says:

    This post doesn't seem to have made it on to the feed for some reason...

  11. Vitor Canova says:

    @Neil I received it by RSS with no problem. Even the comments.

  12. Neil says:

    @Vitor Indeed I can see it today; it must have been fixed in the meantime.
