CSCI 485: Spring 2011 Collected Midterm Questions


Question 1: Research project

  1. Briefly explain your research project topic.
  2. Describe what aspects of your project are the most technically challenging with respect to the course topics.
  3. For your anticipated presentation, describe what aspects of the project should be of most interest to other students in the class.


Question 2: Lab project

For the current lab project, the server needs to know when clients change their location, task, or activity.

There are two very different viable approaches:

  1. The client sends an update every few seconds, giving the user's current location, task, and activity, and does not require acknowledgements to these updates.
  2. The client only sends an update when the user actually changes their location, task, or activity, but then requires an acknowledgement to be certain the server got it.

First, describe when/why the first approach would be superior, then describe when/why the second would be superior.
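As a point of reference, the two approaches above can be sketched as follows. This is only an illustrative model, not the lab's actual code: the function names, the `deliver` callback (standing in for the network send), and the retry limit are all assumptions made for the sketch.

```python
def periodic_updates(states, deliver):
    """Approach 1: send the current state on every tick, no acknowledgement."""
    sent = 0
    for state in states:
        deliver(state)  # fire and forget; a lost update is covered by the next tick
        sent += 1
    return sent


def on_change_updates(states, deliver, max_retries=5):
    """Approach 2: send only when the state changes, retrying until acknowledged.

    `deliver` is a hypothetical callback that returns True when the
    server's acknowledgement came back and False when it did not.
    """
    sent = 0
    last_acked = None
    for state in states:
        if state == last_acked:
            continue  # nothing changed, so nothing is sent
        for _ in range(max_retries):
            sent += 1
            if deliver(state):
                last_acked = state
                break
    return sent
```

Each element of `states` stands for what the user is doing at one tick, so both functions can be run over the same sequence and their message counts compared.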


Question 3: Blocking vs nonblocking

  1. Are there circumstances under which blocking receives are preferable to nonblocking? Justify your answer.
  2. Are there circumstances under which blocking keyboard input is preferable to nonblocking? Justify your answer.
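For reference, the difference between the two receive styles can be sketched with Python's standard `socket` module. The helper names `try_receive` and `wait_receive` and the 1024-byte buffer size are assumptions made for this sketch.

```python
import socket


def try_receive(sock):
    """Nonblocking receive: return the data, or None if nothing is waiting."""
    sock.setblocking(False)
    try:
        return sock.recv(1024)
    except BlockingIOError:
        return None  # no data yet; the caller is free to do other work


def wait_receive(sock):
    """Blocking receive: the caller sleeps here until data arrives."""
    sock.setblocking(True)
    return sock.recv(1024)
```

A connected pair from `socket.socketpair()` is enough to try both: calling `try_receive` on an idle socket returns immediately, while `wait_receive` on the same idle socket would not return until the peer sends something.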


Question 4: TCP vs UDP

Explain the different circumstances under which a TCP-based communication system would be preferable to a UDP-based one, and why.


Question 5: Scaling an RDBMS

One suggestion for scaling to very large systems was to grossly denormalize the DB: run several separate databases that model the same (or at least overlapping) set of logical data but with completely different internal designs.

Each DB would be specially designed to handle specific kinds of queries.

Discuss the design/implementation difficulties associated with such an approach.


Question 6: Database sharding

Effective use of database sharding is critical to many of the massively scaled systems we discuss this semester. Describe and explain circumstances under which database sharding would be extremely difficult to carry out.
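As a reminder of the mechanism, a minimal sketch of hash-based sharding follows. It assumes string keys and in-memory dictionaries standing in for the separate database servers; `shard_for`, `ShardedStore`, and `find_by_value` are hypothetical names, and real systems may partition by range or by directory lookup instead.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store: each dict stands in for one database shard."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        # A lookup by shard key touches exactly one shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def find_by_value(self, predicate):
        # A query that is not on the shard key has to visit every shard.
        return [k for shard in self.shards
                for k, v in shard.items() if predicate(v)]
```

The contrast between `get` and `find_by_value` is the point of the sketch: the first is routed to a single shard, the second is not.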


Question 7: Skype

Skype's use of client machines to act as 'supernodes' is what enables Skype to survive without massive infrastructure investments, but is also the basis for most of the privacy and resource concerns people have about Skype. Carefully explain what concerns you most about the way Skype operates and why.


Question 8: Amazon

One of the key ideas we talked about with respect to Amazon's data handling approach was the idea of 'always writeable'. Explain this concept in your own words and discuss what makes this such a radical departure from traditional database approaches.


Question 9: YouTube

To be successful, YouTube has had to solve a number of conflicting scalability problems (searching, streaming, thumbnail delivery, etc.). Describe which aspect of YouTube's solution you find most interesting and carefully explain the technical challenge behind it.


Question 10: Search engines part I: web crawlers

A search engine crawler needs plans/policies in four key areas:

  1. selection policy: which URLs to visit
  2. revisit policy: when to check/revisit a previously visited page
  3. politeness policy: to limit the intrusiveness of the crawler
  4. parallelization policy: to allow simultaneous exploration by a collection of 'crawlers'

Briefly summarize the key objectives and problems associated with each of those areas.
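To make one of these areas concrete, the politeness policy can be sketched as a per-host rate limit. The class name `PolitenessGate`, the method `try_acquire`, and the fixed minimum delay are assumptions made for this sketch; real crawlers typically also consult robots.txt and adapt the delay per site.

```python
from urllib.parse import urlsplit


class PolitenessGate:
    """Allow at most one fetch per host every `min_delay_seconds`."""

    def __init__(self, min_delay_seconds):
        self.min_delay = min_delay_seconds
        self.last_fetch = {}  # host name -> time of the last permitted fetch

    def try_acquire(self, url, now):
        """Return True and record the fetch if the host may be visited at `now`."""
        host = urlsplit(url).hostname
        last = self.last_fetch.get(host)
        if last is not None and now - last < self.min_delay:
            return False  # too soon for this host; the crawler should defer the URL
        self.last_fetch[host] = now
        return True
```

The caller passes in the current time (for example from `time.monotonic()`), which keeps the gate easy to exercise with fixed timestamps.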


Question 11: Search engines part II: query handling

In your own words, explain the role of Bigtable, tablets, and the Google File System (GFS) in query handling for the Google search engine.


Question 12: Multiplayer game scaling: Eve's one big world approach

Discuss the validity (or lack thereof) of the following claim:


Question 13: Message handling: Twitter and Facebook

Discuss the validity (or lack thereof) of the following claim: