Thursday, January 22, 2009

Computer Science 101 - For the Edification of My Dear Readers

Yesterday, one of things I discussed with JLK was how to the general population "Computer Science" really doesn't mean anything. She said she had wondered what a thesis in computer science would really entail, and she didn't know. The number of times someone has asked me what I do, and I say, "I'm a grad student in computer science," I usually get a really blank, "oh, ok." So, I thought I'd do a series of posts on what computer science is and talk about what some of the really exciting problems in it are.

I'll start with just a basic CS 101.

CAVEAT: There are a lot of simplifications in here, so if you're an actual computer scientist and take issue with something, please don't complain about it - this is not intended for the edification of computer scientists, but rather non-computer scientists.

What is computer science?

To start off, I went to wikipedia. My concept of computer science is so large that I had a hard time coming up with a reasonably concise descriptor. I have to say, I was pretty impressed with the shakedown in their entry.

The first sentence:
Computer science (or computing science) is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems.

I like this definition. It's a good one, and it has a lot to say about the many layers of computer science. Basically, you want computers to do stuff for you. In order for this to happen, you need to express to the computer how to do what you want it to do. Since computers are, well, computers, you can't say to it, "well, you see, when this happens do this, but when that happens, do that, but sometimes you might still want to do the first thing even in the 2nd circumstance...." You also cannot say, "I want X to happen. You figure it out." You need to express to the computer in very very rigorous ways what you want it to do, because, after all, it's kind of like a robot. You need to be clear, cover all the cases, and tell it exactly what you want it to do in every circumstance.

There are many many levels for how this occurs. There's the design of the computer itself. There is the precise and concise way to approach a problem you want the computer to solve. There is the efficient expression of that approach which you must express to the computer in a programming language, which brings into the mix both the art of programming it as well as the effective design of the computer language itself. So there are a LOT of things going on in computer science.

Now, I want to make sure I get through also what computer science is NOT. Again, I'll defer to wikipedia:

The general public sometimes confuses computer science with other vocational areas that deal with computers, such as information technology (IT), or think that it relates to their own experience of computers, which typically involves activities such as gaming, web-browsing, and word-processing. However, the focus of computer science is more on understanding the properties of the programs used to implement software such as games and web-browsers, and using that understanding to create new programs or improve existing ones.[5]

Anyway. So, I'll go through a little example that goes through many of the layers of computer science in the context of one problem. Let's talk about a simple idea of search.

Whole Shebang Example - Search

Algorithms, Computability, Complexity

Let's say you are playing "Go Fish". Your opponent asks you, "do you have a 7 of spades?" You look at your deck, and look for the 7 of spades. What's the easiest and most straightforward way to do this? Start from the beginning, and look at each card in sequence, right? That kind of approach is actually really easy to express to a computer.

However, this is not what we'd call a scalable solution. It's fine for Go Fish, but what if you are trying to find X amongst N items? And N is, say, 1 trillion? Is searching from the beginning to end really the best way? And what if you are playing Mega Go Fish, and you have to search for X this turn, and then Y next turn, and then Z? Is searching through (up to) 1 trillion things EACH TIME the best way to go? No way Jose!!! So, this type of approach (algorithm, in CS-speak), is not scalable and described as O(N). In other words, the length of time to solve the problem in the general case scaled linearly with N.

Now let's say you are a smart Mega Go Fish player, and you keep your deck of cards sorted in order all the time. In this case, you can use the fact that your cards are sorted to use another sort of search - called Binary Search. Basically, you cut your deck in half. If X is greater than the value of the middle card, you focus only on the upper half. If X is smaller than the middle card, you focus on the lower half. Then you repeat this process, until you've found your X (or determined that you don't have it). This is a MUCH more efficient way to search, but depends on your deck being sorted at all times. This has a cost of O(logN), i.e. the time it takes to find your card scales with logN, which is significantly better than N.

So, one aspect of computer science is taking basic problems like this and finding more efficient ways to solve them in the general case. Google, for example, keeps track of basically ALL the webpages on the internet. When you type, "PhizzleDizzle" into your search box, do you think Google is iterating through all those pages one after another looking for a match to PhizzleDizzle? I should think not! So that's one aspect of CS - efficient ways to solve certain general classes of problems.

Programming Languages

Now that people are working on this kind of stuff, let's say I want to actually write a program to do it. So, someone needs to have written a language that *I* the human being can understand without wanting to rip my eyes out, and program this search algorithm concisely. Natural human language won't do. So let's say I pick the language Python (and code the basic search algorithm):

found = False
for i in mega_go_fish_deck:
if i == search_value:
found = True

if found:
print "success!"
print "failed - not in deck"

I'm sure even if you're not a computer scientist this largely made sense to you, right?

Now, there are other languages, like C, where the program would look similar but not the same. I'm no HTML master, and I'm too lazy to get this to actually formate correctly, so I'm not going to write out the code. But suffice it to say that one aspect of language design is anticipating the types of things a programmer will want to do quickly and easily. For example, the print function I have in the Python code is actually decomposable into lots of little parts, but the designers of the language know that it's useful and people would want to do it all the time. So they built print into the language so that the human doing the coding doesn't need to reinvent the wheel every time. As computational needs change, so can languages (though that changes kind of slowly), and so that's why there are tons of different languages out there, many targeted specifically for a certain purpose, with various tradeoffs. For example, there are certain things that are hard to express in the C language that are easy to express in Python. At the same time, Python can be a lot slower than C for reasons I won't go into. Depending on what you're trying to do, one might be more appropriate than another.


Now, you've also go to be able to translate this human-y language down to the computer level. Remember, these languages have been made for humans to work with, but you've still got to boil it down to 1's and 0's for the actual computer itself. There is also a lot of work in this whole "translation" area (called compilation in CS-speak). In fact, the first female winner of the Turing Award, Dr. Frances Allen, did a lot of work in compilers. In the old days, people just had to write in machine language, and man, that's a bitch.


Then there's the design of the computer itself. In the end, this lovely algorithm you've figured out, then written, then perhaps the end it's got to execute on a real-live computer, and there are lots of different ways to make a computer too. So, there's a whole field of people trying to make computers faster, better, more suited to our needs, etc. This is the type of stuff Intel and AMD work on.

Detailed Design of Basic Building Blocks for Making a Computer

This is starting to get into Electrical Engineering, and in some ways Material Science. EE really is quite adjacent to CS.


As you can see, there are a lot of different things that count as computer science. I've sort of taken a whole shebang example and talked it from top to bottom, but you can also move laterally and make the same top-to-bottom approach. At the top, here I talked about search algorithms. There are also domains like Artificial Intelligence and Graphics, where you do similar high-level theoretical approaches from the top, lots of math, and then in the end it also has to boil down to the bottom, actual hardware. One tough thing about computer science is because it scales from the theoretical down to actually physical transistors, that's a lot of levels of work that are striated and largely remain independent for comparative advantage purposes, but in the end the whole system has to work all the way up and down. It's a lot to wrap your head around.

Anyway, I hope this was helpful to my dear readers, and let me know if you have any questions.


EthidiumBromide said...

This was actually a really fascinating post to read. ::Applause::

I wish I was half as good at breaking down what I do to people outside of science... I basically just say I make drugs better and leave it at that.

PhizzleDizzle said...

Oh, why thank you EB :). I'm glad you liked it.

One day, if you decide to do so, I'd also love to hear about how you make drugs better. :)

Silver Fox said...

My bro is a computer programmer who came into that by way of a degree in math. Probably a lot of what he does fits under the IT umbrella, I would guess, although IT did not exist when he started his work back in the 80's. One thing he was involved in was figuring out ways to fix various networked computer y2k problems, and then implementing the fixes.

Anyway, thanks for your discussion of the large field you work in!

Frozone said...

I enjoyed reading this! 'Especially the distinction with I.T.-- for those times when I fail to help someone with their computer problem and they protest, "but you're in computer science!". I sorta feel guilty that I should know more than I do about repairing them.... But I.T. is a different field. Like Dijkstra said, "computer science is no more about computers than astronomy is about telescopes."

PhizzleDizzle said...

Thanks Frozone!! And I know the feeling, sometimes Mr. Phizz accuses me of being "useless" when I can't fix his computer :). I might be somewhat well-versed in computers, but I am by no means an expert. For a number of things, you are better off going to Geek Squad than me :). I really like the Dijkstra quote - I've heard of it but had forgotten about it, and it's so true.

It looks like you are a fellow CS chick - woohoo, found another one! :) I'll start checking out your blog.

PhizzleDizzle said...

Silver Fox, I'd love a description of geology someday - because I don't know much about it. I know it's about rocks. But I'm sure there's a lot more to it than that. If you get a moment....hint hint....:).

Mrs. Comet Hunter said...

Thanks for sharing! I love hearing people break down what they do into terms all of us can understand :)

Like Dijkstra said, "computer science is no more about computers than astronomy is about telescopes."

I like it, especially since I'm in astronomy - even worse is when people ask me about 1) the constellations or 2) Greek mythology associated with those constellations. I know more now because I do a lot of outreach, but we just don't study that stuff ;)

Silver Fox said...

PhizDizz, I'll have to think about this one for a bit, and now that I have some time, I've been thinking about posting something about consulting. And I see your call for posts!

JLK said...

Yay! This parallels so much of what you said in our conversation. You're really, really good at breaking this stuff down into easy-to-understand terms!

I totally want to do your WTF do you do meme, but I really shouldn't do it today (I might do it tonight anyway). If not, you have to remind me when I get back!

Gail said...

I like the go-fish example. I think I might "steal" that for my CS mini-course (taught to grade 8 girls). :D

Nicky said...

I said that Dijkstra quote to somebody outside of CS once (probably when they wanted me to fix their computer - ha!) and they responded with this: "Yeah, but astronomers don't go around calling their field Telescope Science, so you can see people's confusion." Which is an excellent point.

The obvious follow-up question then, is what would be a better name for Computer Science? Information Science? Informatics? Complexity? ....

PhizzleDizzle said...

Nicky, that's such a good point! Perhaps the right term would be computational science? I don't know the right answer to that one.

Gail - have fun with the mini-course!

Cath@VWXYNot? said...

Wait, so you can't help me figure out why my scanner dosn't always detect colours?


Nice summary, I understand it a lot better now!

PhizzleDizzle said...

Cath, HA!!! Right now I can't even get my printer to print. I think I accidentally deleted my printer drivers. Oops :).