The Data Scientist’s Computer


Everyone uses a computer for lots of things, from e-mail to chat, from gaming to office work. And yet, there are some specific needs a Data Scientist has for their primary system.

While I don’t recommend a specific brand or model (these things change too quickly to make this notebook entry useful for any length of time), there are four things to think about inside a PC or laptop:

  1. CPU
  2. Disk (or I/O)
  3. Memory
  4. Network

And three things to think about outside a PC or laptop:

  1. Screen
  2. Input (Keyboards, mice, etc.)
  3. Output (Ports)

I’ll give a longer explanation in another tutorial, but if you want the “Bottom Line, Up Front” (BLUF), here are the things you need to know:

If you use R

  • Since R loads data into a dataframe in memory, you need lots of RAM. The more the better, and the faster the better. Spend money on this component.
  • More, and faster, storage is also important if you plan to store the input and output data on your own system.
  • The CPU is less important, but of course the fastest one your budget allows is best.
  • The keyboard is important; a gaming mouse is not. The screen is not as important in R unless you are taking the data to the next step and visualizing it.
  • Fast networking is only important if you are importing or exporting the data over the wire.
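
To put a rough number on the RAM point above, here is a minimal sketch, written in Python so the examples in this entry stay in one language. It relies on one fact: a numeric (double) value takes 8 bytes, so a table of plain numeric columns needs at least rows × columns × 8 bytes. The function names and the 3x headroom factor are my own assumptions for illustration; real usage depends on column types, object overhead, and how many working copies your analysis makes.

```python
def estimate_table_bytes(rows, numeric_cols, bytes_per_value=8):
    """Lower bound on the memory needed for a table of numeric columns.

    Assumes every value is an 8-byte double and ignores all overhead.
    """
    return rows * numeric_cols * bytes_per_value


def fits_in_ram(rows, numeric_cols, ram_gib, headroom=3.0):
    """Rough check: does the table fit, leaving room for working copies?

    `headroom` is an assumed multiplier (not a measured value) that
    reserves space for the temporary copies an analysis tends to create.
    """
    needed = estimate_table_bytes(rows, numeric_cols) * headroom
    return needed <= ram_gib * 1024 ** 3


if __name__ == "__main__":
    rows, cols = 50_000_000, 20
    gib = estimate_table_bytes(rows, cols) / 1024 ** 3
    print(f"{rows:,} rows x {cols} columns needs at least {gib:.1f} GiB")
    print("Fits in 16 GiB with headroom:", fits_in_ram(rows, cols, 16))
```

If the answer comes back "no", that is the signal to spend on memory first, or to plan on sampling the data or moving the work to a bigger system.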

If you use Python

  • The CPU is important, and since you can parallelize the processing, more cores are better. Spend more money on this component.
  • Memory is important, but skew the budget towards CPU.
  • Faster and larger storage is important if you plan to ingest and output the data locally.
  • The keyboard is important, and the mouse is important.
  • The screen is not as important in Python if you are not taking the data to the next step and visualizing it.
  • Fast networking is only important if you are importing or exporting the data over the wire.
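
As a small illustration of the "more cores are better" point above, here is a sketch that spreads a CPU-bound job across every core using the standard library's `concurrent.futures.ProcessPoolExecutor`. The prime-counting workload and the helper names (`count_primes`, `split_range`, `parallel_count_primes`) are hypothetical stand-ins for whatever processing you actually do; this is one way to parallelize, not the only one.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """CPU-bound worker: count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # Trial division: n is prime if no d in 2..sqrt(n) divides it.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(lo, hi, parts):
    """Split [lo, hi) into `parts` contiguous chunks of near-equal size."""
    step, extra = divmod(hi - lo, parts)
    chunks, start = [], lo
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks


def parallel_count_primes(lo, hi, workers=None):
    """Run count_primes on one chunk per worker process and sum the results."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(lo, hi, workers)))


if __name__ == "__main__":
    print("Cores available:", os.cpu_count())
    print("Primes below 200,000:", parallel_count_primes(0, 200_000))
```

Each chunk runs in its own process, so on a job like this the wall-clock time tends to drop as you add cores, which is why I skew the budget towards the CPU here. Jobs that mostly wait on disk or network will not see the same benefit.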

 

If you use Hadoop, Virtual Machines, or Machine Learning

Here, everything is important. Since all of these systems lean on distributed or parallel processing, you want as much of all four internal components as you can afford, as fast and as new as you can get them. Screen size, or multiple screens, becomes important so that you can see all of the panels these systems display. A good keyboard and mouse are essential, since you’ll navigate quickly among lots of interfaces. Ports come into play here as well: you’ll often need to connect to external storage or even run a Virtual Machine from external storage. You’ll also need to think about taking data in from the “Internet of Things”, so you may need more than one network or other interface to stream data in or out.

If you’re developing against a larger or distributed system

In my case, I only focus on two things: Lots of screens (and big ones), and a really nice keyboard and trackball.

My production environment, and in some cases my development environment, is Microsoft Azure (although other cloud platforms exist, as I understand it), so I have tens of thousands of cores at my command at any time. My process is to design and create the systems locally, and then deploy them to a distributed system that grows and shrinks with demand. In some cases (like AzureML), I can develop online, so anything with a good screen and keyboard is all I need.

However, I use a tower system so that I have a dedicated graphics card to push three or more monitors. I use two monitors for development, and the third as my monitor for presenting. That third monitor is relatively small and cheap, so that the windows I present are large and readable for my students. I use an HD camera for recording.

I also have a gaming keyboard and a Logitech “marble” thumb-ball track system.

I don’t watch TV, so I put all my money into fiber Internet access for presenting and teaching.

Speaking of teaching, I triple-boot my system to Windows 7 (thanks a lot, WebEx, for requiring that), Windows 10, and Ubuntu Linux, depending on what I am teaching. I can’t use a VM for the multiple OSes, since I need to present and develop on the OS I’m teaching and I need direct access to the ports for the HD camera and so on. You might have a similar need if you are presenting a great deal or doing visualizations.

For travel, I use a Microsoft Surface Pro 3. I like having a tablet to read on the plane, and I like that it’s a full computer for presentations and work. I can still remote-desktop or SSH to my Azure systems from the Surface.

Those are the general guidelines. Your mileage may vary, and if you want to really go deep into the tech, here are some resources to check out:
