old (and long) e-mail ramble about agent technology and other trivia

Peter’s
response to my TiVo post
for some reason reminded me of a rambling e-mail conversation
I had several months ago; for your amusement I repeat it below.  Incidentally,
mail threads like this are reason #128 why I like working at Microsoft…


From: Ken

Sent: Thursday, July 17, 2003 12:12 PM

To: Barry; Michael; Wade; Richard; Mark;
John; Keven; Praveen; Bruce

Subject: How many lines of code can
fit on the head of a pin?

Importance: Low

Discussion that arose questioning how many lines of code there are in the world.

We were discussing Feynman and his thoughts on miniaturization, nano-tech, etc….many
years ago he put forward a challenge to reduce a page of text to 1/2500th of
its original size. At the time he made this challenge, all of the books in the world
would be able to be “printed” on a sheet 3 meters on each side (9m^2 area). Note that
this was not encoding, but actually forming all letters, pictures, etc. The original
challenge was met (to make this a bit more “real”, think of the 30 volume Encyclopedia
Britannica…at 1/2500 all of the pages would fit on the head of a pin).

So, to us geeks, the natural question arises: how big an area would it take to print all the
code? What if you didn’t print it, but encoded it (I would guess that you’d want at
least 3 atoms/bit to make retrieving it a bit easier)? Of course you could use single
atoms to represent more than a bit: one carbon atom might represent ‘0’, a gold atom
‘1’, platinum ‘2’, lead ‘3’…just right there we’ve reduced the size of
a two-dimensional printout by a factor of two (each atom now carries two bits).
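
As a rough sketch of the atom-counting above (not something from the original thread): the helper below, with the made-up name `atoms_needed`, counts atoms for a given number of bits, assuming each symbol is one of `atom_types` kinds of atom and is written with `atoms_per_symbol` atoms. By this count, four atom types carry two bits per atom, so the saving over one atom per bit works out to a factor of two, and to a factor of six over three atoms per bit.

```python
import math


def atoms_needed(total_bits: int, atom_types: int = 2, atoms_per_symbol: int = 1) -> int:
    """Atoms required to store total_bits (hypothetical helper, illustration only).

    Each symbol is one of `atom_types` kinds of atom, so it carries
    log2(atom_types) bits, and is written using `atoms_per_symbol` atoms.
    """
    bits_per_symbol = math.log2(atom_types)
    symbols = math.ceil(total_bits / bits_per_symbol)
    return symbols * atoms_per_symbol


bits = 8 * 1_000_000  # one megabyte of text, purely as an example size

print(atoms_needed(bits, atom_types=2, atoms_per_symbol=3))  # 24000000 (3 atoms/bit)
print(atoms_needed(bits, atom_types=2))                      # 8000000  (1 atom/bit)
print(atoms_needed(bits, atom_types=4))                      # 4000000  (C/Au/Pt/Pb)
```

None of this says anything about whether such atoms could actually be placed or read back; it only counts symbols.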

My initial “quick-thought” below.

K.


 A co-worker and I were discussing this today:

·         How many lines of code (LOC) are there in the world?

·         What if you limit it to 'active' code?

·         What's the ratio of total lines to active lines?

·         Are there more LOCs than lines in books?

I'd guess somewhere in the realm (within a couple orders of magnitude) of 10 Billion
LOC (which is based somewhat on how many LOC I think are at Microsoft), with the Total/Active
ratio somewhere around [2-3]:1.

Based on an older, but probably still "right ballpark", figure of 30 million books
in the world (taken from Feynman's The Pleasure of Finding Things Out) and supposing
that an average book has 750 pages with 50 lines each, we'd get 1,125,000,000,000 lines
(a bit over a trillion lines). As this is about two orders of magnitude above my guess,
I’d bet that books have the lead.
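
The back-of-the-envelope arithmetic above is easy to check in a few lines. The constants are just the guesses from the text (30 million books, 750 pages, 50 lines per page, 10 billion LOC); the variable names are made up for illustration.

```python
# Guesses taken from the e-mail; none of these are measured values.
BOOKS = 30_000_000
PAGES_PER_BOOK = 750
LINES_PER_PAGE = 50
LOC_GUESS = 10_000_000_000  # "somewhere in the realm of 10 billion LOC"

book_lines = BOOKS * PAGES_PER_BOOK * LINES_PER_PAGE
ratio = book_lines / LOC_GUESS

print(f"{book_lines:,}")  # 1,125,000,000,000
print(ratio)              # 112.5
```

With these inputs, book lines outnumber the LOC guess by a factor of about 112, which is close to the stated uncertainty of the guess itself.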


PS: Why the strange “To:” line? Figured I’d include the folks that worked on the horrid
LOC project, some of the stranger thinkers I know, a tried and true researcher (who
probably also fits in the previous category), and the man whose initials are found in
BILLIONS of executables. No “deep-thought” answers required, but if you’d like to
chime in, let me know if I can include your thoughts on my web page.


From: Mark

Sent: Thursday, July 17, 2003 2:54 PM

To: Ken; Barry; Michael; Wade; Richard;
John; Keven; Praveen; Bruce

Subject: RE: How many lines of code
can fit on the head of a pin?

Rambles:

I think there are way more than 30M books.  Harvard’s library has 11M+. 
What does LOC (Library of Congress) have in it?

Feynman was merely talking about taking the existing representation (aka print) and
shrinking it.  Clearly, printing code and then shrinking it will have the same
set of issues.

Certainly, we can encode the book in
atoms, but thermal, chemical, and quantum issues make reading the
data interesting.  If reading wasn’t interesting, then I could encode it all
in one bit.  0 means not the entire set of code. 1 means the entire set of code. 
If you valued reading, then the issue of specifying the decoding algorithm should
be brought in.  With the single bit interpretation above, you could end up with
a mighty complex decoding algorithm.

Thermal issues:

Atoms move about unless really close to absolute zero, even in solids.  Trying
to measure single atoms might be fairly difficult.

Chemical issues:

OK, are we talking one-time encoding or an encoding that can be used for archival? 
The former is easy, but things like oxidation, photochemistry, etc. will impact just
how dense we can make things.

Quantum issues:

Yes, atoms are quantum beasts.  Measuring them in a non-destructive fashion might
also be difficult.

Backing out of physics, from the standpoint of encoding, you could simply encode the
ASCII representation (what do you do about those EBCDIC COBOL and RPG programs?). 
But code is also notoriously compressible (à la LZW). LZW is nice because it needs no separate dictionary;
the dictionary is built from the previously input data.  You can do even better with Markov
modeling of the code, but you’d have to include the size of the Markov tables as well
in the size. The Markov models could be for individual characters or for lexical items. 
You could also do something along the lines of encoding the parse tree
of the source file (suitably annotated with comments).  Handling
.h files might be dicey.
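
To get a feel for how compressible code-like text is, here is a minimal sketch. Python's standard library has no LZW, so this swaps in `zlib` (DEFLATE, an LZ77-family scheme plus Huffman coding) as a stand-in; like LZW, it ships no separate dictionary and relies on previously seen input. The sample snippet and the name `compression_ratio` are made up for illustration, and the sample is artificially repetitive, so the ratio it prints will overstate what real source trees would give.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (level 9)."""
    return len(data) / len(zlib.compress(data, 9))


# A deliberately repetitive, made-up fragment of C-like code.
sample = b"for (i = 0; i < n; i++) {\n    total += values[i];\n}\n" * 200

# Round-trip check: decompressing gives back exactly what went in.
assert zlib.decompress(zlib.compress(sample, 9)) == sample

print(f"{len(sample)} bytes, ratio {compression_ratio(sample):.1f}x")
```

A Markov-model or parse-tree coder of the kind mentioned above would need its own tables or grammar counted against the total, which this sketch does not attempt.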

Enough with rambles…


From: Ken

Sent: Thursday, July 17, 2003 3:59 PM

To: Mark; Barry; Michael; Wade; Richard;
John; Keven; Praveen; Bruce

Subject: RE: How many lines of code
can fit on the head of a pin?

Importance: Low

11M is the number that Feynman quoted for Lib o’ Congress…I think he gave the speech
where he quoted 30M in the late 60’s, but the book I’m “reading” (listening to) jumps
around from childhood right up to the late 80’s. During the speech he actually describes
a method for reading the “small print”…but at the time the technology hadn’t gone
far enough to print it yet. With the rise of the Internet/PC/easy publishing, I’d
be surprised if the number was not at least around 60M these days (double the number
in the 60’s)…while I’d be surprised if it was in the 100M or above, I don’t see even
that sort of number as inconceivable.

As we approach the means to really make good on some of Feynman’s “plenty of room
at the bottom” dreams (Cornell made a 1-atom wide transistor a bit over a year ago),
I find myself looking at things like this more and more. Due to my sordid past, the
notion of coming up with a good (and defensible) estimate for the number of LOC
in existence strikes me as a fun exercise. The real meat isn’t there, nor is it, other
than as a “perspective exercise”, in figuring out how big (small) of a piece of something
you would need to hold it.

Granting that there will be a host of new difficulties when we start getting components
that are made at the atomic level, I think that we will live to see the truly tiny
become reality. As we creep into nano-tech, the processes and research will (I hope
and believe) begin to build on each other. While we may not be part of the world of
physics (and chemistry, and bio-chem, and…) that will be building these gadgets straight
out of the most far-fetched science fiction, it will be up to us to help define at
least how some of this stuff will benefit the “common man”. What changes do we make
to the operating system, to Office, etc. when a person can have a couple of terabytes
of storage with them all the time? How soon before it’s not only feasible but practical
to record everything we read (SIS is
close on this front), hear, say, or see…index it all while we sleep, and “auto-fill”
details the next day when we start writing a report?

I used to chuckle when I read old Robert A. Heinlein books and he had a character
load terabytes of information onto a small cube.

Thermal, Quantum, Chemical…and the list goes on, but I do think the world has the
set of minds as well as the preparation of the giants of the last century to beat
these problems (actually, in some ways the thermal might work for us). Anyone want
to adopt me so I can head back to college…thinking I’d enjoy a decade or two back
in school.

K.


From: Bruce

Sent: Thursday, July 17, 2003 6:55 PM

To: Ken; Mark; Barry; Michael; Wade;
Richard; John; Keven; Praveen

Subject: RE: How many lines of code
can fit on the head of a pin?

We are moving to a world where there is more information than a single person can
process.  People will have to become more and more selective in the data they
choose to read.  (or see, or hear, or taste…)  As a computer person, I view
this as a problem to be solved, and my first instinct is to hypothesize a solution
involving a software agent that can selectively choose and display only those facts
and media that fit some criteria we give it.

However, I see this as bringing us to a more divided and insular world.  Democrats
will have agents that spin things the way they want to see them; Republicans likewise. 
I’m sure the sci-fi fans will continue to form their own strange sub-culture. 
One’s entire world view will be shaped by what one chooses to experience, and when
there is a surfeit of information that does appeal, folks will be less and less inclined
to view that which does not.

In some sense, our choice of agent will decide who we are, and who we become. 
(At least to the extent that one believes in nurture over nature.)  Could one
then change oneself by altering the agent programming?  In any kind of serious,
personality-altering way?  What happens if some hacker gets into your agent,
or the men in the black helicopters do?

Beware – here there be dragons


From: Richard

Sent: Friday, July 18, 2003 2:36 PM

To: Bruce; Ken; Mark; Barry; Michael;
Wade; John; Keven; Praveen

Subject: RE: How many lines of code
can fit on the head of a pin?

Everyone reading this thread is the product of a lifetime of agents choosing and biasing
what media we consume.

It starts with our parents and families, and churches, friends, television, the so-called
“mass-media”, then later teachers in school, employers, and government, etc.

These agents also already have their hackers, which can be thought of as agents themselves. 
These take the same form as the agents that affect us, though they may be different
TV shows, different media, different teachers, and different employers.

Censorship, political correctness, and boycotts are all examples
of agents attempting to restrict the exposure of content.

Every time you hear of praise and awards, agents are promoting content.

Other things like reviews and biased reporting can swing either way.

But it is all affecting the perceived worth of consuming the target content and ideas.

It’s all a tight feedback system.  It would seem that the truest “individuals”
would live in the wilderness with no outside influence.  But if we met someone
like that we’d likely not enjoy their company nor would they enjoy ours.  We
enjoy being with people more when we can communicate and have thoughts and topics
to share.  This is one of the feedbacks.

We choose our agents when we choose our books, schools, subscriptions, employer, clubs,
homepage, and program our TiVo.

No doubt this does affect who we become.

I wonder if agent selection will ever become as sophisticated as allowing me to choose
what personal traits I’d like to make stronger.

You turn up your agent’s sensitivity dial and TiVo starts recording “Little House
on the Prairie” and “Touched by an Angel”.

Ramble on…


From: John

Sent: Friday, July 18, 2003 2:36 PM

To: Bruce; Ken; Mark; Barry; Michael;
Wade; Richard; Keven; Praveen

Subject: RE: How many lines of code
can fit on the head of a pin?

(This thread is getting pretty far afield.)

As Richard points out, knowledge is already being filtered by "agents" (many of which
are not user configurable). So at the end of the day we have to examine the trust
relationship between ourselves and those agents.

Some of the things I have noticed:

- The less interested you are in a topic, the more you are willing to trust the agent.
The more you are interested (and experienced) in a topic the more scrutiny you will
give to the agent.

- Once an agent has seriously disappointed you, you never trust the agent again.

- The more you are exposed to agents as a class the less you trust them.

Think about really, really good internet trolls: they contain enough fact, wit, and
divisiveness to cause chaos amongst the most well ordered communities. What about
trolls that build a trust relationship first?

Think about what makes a trust relationship. Generally you have seen some
objective demonstration that convinces you, and you are willing to believe the
demonstrator on related subjects without objective proof. Perhaps the objective demonstration
is abstracted through a certification authority, like a driver's license: you didn't
see me take the test, but I can show you my license. Think about all of the points
of failure in this trust system. Having or not having a driver's license doesn't prove
or disprove ability to drive.

While you could certainly manipulate people through media collection agents,
there are still (and always will be) informal agents with a disproportionate amount
of trust. If your grandmother told you something you would probably believe her, unless
you had objective proof, or special knowledge. Grandma is an informal agent that gets
a huge amount of trust, and it is very difficult to erode that trust (unless she drinks
or something).

The concept of tweaking your agents for "self-improvement" is interesting (assuming
a direct correlation between watching lots of Lifetime TV and being more sympathetic),
although you would have to tweak your informal agents as well. Basically you can
watch all of the Oprah in the world, but until you quit the Hell's Angels I don't think
you are going to start crying at movies.

- John