CSCI 485: exams

There will be two exams for the course, each worth 25% of your final grade.

The first will be held in lecture on October 24th; the second will be held during the university final exam period, 1-4pm on Thursday, Dec. 13th.

Each will focus on the most recent course material, but the later exam will probably also draw on earlier material and discussions.

The format will be four essay-style questions, all equally weighted.

The exams are open book and open notes, but no electronics are permitted.

Here are the collected 2011 midterm questions (from the three 'midterms' that were held that spring).
For the 2012 version of the course, questions 1-6 would be suitable for the October midterm, while 7-13 would be suitable for the December exam.

Other sample questions for the two midterms are listed below.


CSCI 485 Sample Questions: Midterm 1
====================================

Research project
----------------
 (i)   Briefly explain your research project topic, emphasizing
 (ii)  what about it is technically challenging with
       respect to the course content, and
 (iii) why, at presentation time, the topic will be of
       interest to other students in the class.


Lab project i
-------------
  For the current lab project, the server needs to
    know when clients change their location, task,
    or activity.
  There are two very different viable approaches:
    (1) the client sends an update every few seconds,
        giving current location/task/activity,
        and doesn't require acknowledgements to these
    (2) the client only sends an update when they
        actually change their location/task/activity,
        but then requires an acknowledgement to be
        certain the server got it
  Question:
    (i)  when/why would (1) be a superior approach
    (ii) when/why would (2) be a superior approach


Lab project ii
--------------
For the current lab project, suppose the clients
   needed to be able to send messages to one another,
   either directly or routed through the server.
Describe the design issues and alterations this change
   would force in general, and for your solution approach
   in particular.
   

Lab project iii
---------------
For the current lab project, suppose the client was meant
   to be run on a mobile device mounted in a vehicle, with
   the expectation that connectivity would frequently be
   lost for short intervals.
Describe how this would change your design and implementation,
   with appropriate justification/explanation of your answer.


Threading i
-----------
When threads must communicate, the two primary options are
via message passing or via shared data elements.

(i) Describe circumstances under which message passing
    would be the preferable approach and why.
(ii) Describe circumstances under which shared data
     (and semaphores) would be preferable and why.


Threading/states ii
-------------------
  - you have a read thread and a control thread
  - the read thread uses blocking reads, in a loop like
     do {
        char c;
        cin.get(c);
        sharedBuffer.enque(c);
     } while (!quit);
    where the buffer is shared with the control thread
  1. where should the logic to decide/set 'quit = true' go, 
     and why
  2. given your answer to (1.), 
     where should any semaphore(s) go, and why


Blocking vs nonblocking
-----------------------
  (i)  Are there circumstances under which blocking receives
       are preferable to nonblocking?  Justify your answer.
  (ii) Are there circumstances under which blocking keyboard
       input is preferable to nonblocking?  Justify your answer.


TCP vs UDP
----------
  Explain the different circumstances under which a
  TCP (connection-based) communication system would
  be preferable to a UDP (connectionless) one, and why.


Message composition
-------------------
  In lectures we discussed formats that would allow
  highly flexible yet maintainable message structures.

  Suppose you were implementing a simple chat room
  application.  Provide a preliminary design for the 
  message structure you would use and justify your 
  design choices.


Shared data
-----------
  One of the design issues that becomes particularly
  important as system size increases is the mechanism
  for sharing data critical to establishing communication
  and validating who you are communicating with.

  One of the typical approaches involves the potential users
  somehow finding the appropriate site (e.g. through a search
  engine) then following a registration/login process, possibly
  with more limited access for 'guests'.

  Discuss the potential pitfalls/limitations/drawbacks you can
  see with such a scheme.

 
Synchronizing systems
---------------------
  One of the issues when dealing with distributed systems
  and replicated data is that of synchronization: how can
  two servers decide what order different events should
  take place in, especially given variations in system clocks
  and the time lags involved in communication.
 
  This is particularly true in 'eventually consistent' systems,
  where data transactions percolate through to different servers
  at different times and in different orders.
 
  We discussed the use of time counters - internal counters which
  each server uses to identify time 'steps', and which are sent
  as part of each message transmitted.  The receiver always updates
  their internal counter to be the greater of their current counter
  and the timestamp received as part of a message.
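
  As a minimal sketch of the counter rule just described (not the
  course's reference implementation), the receive step might look like
  the following.  The names LogicalClock, tick, stamp_outgoing, and
  on_receive are hypothetical, and treating a send as one time 'step'
  is an assumption.  Lamport's classic logical clocks also increment
  the counter after taking the maximum; that extra step is left out
  here to match the description above.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical illustration of the per-server time counter.
// All names here are assumptions, not taken from the course materials.
struct LogicalClock {
    std::uint64_t counter = 0;

    // Advance one time 'step' for a local event.
    std::uint64_t tick() { return ++counter; }

    // Timestamp to attach to an outgoing message
    // (assumes that sending counts as one step).
    std::uint64_t stamp_outgoing() { return tick(); }

    // Receive rule from the text: keep the greater of the current
    // counter and the timestamp carried by the message.
    std::uint64_t on_receive(std::uint64_t received) {
        counter = std::max(counter, received);
        return counter;
    }
};
```

  A server would call stamp_outgoing() when sending and on_receive()
  with the timestamp found in each incoming message.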

  Discuss the potential pitfalls/weaknesses you see in such a
  synchronization scheme, and suggest potential workarounds.


Data handling
-------------
  Suppose you were designing/implementing an application
  'from scratch'.  Describe the criteria you would use
  to decide if a traditional RDBMS was the appropriate
  data storage mechanism for your application, and justify
  your decision.


Data scaling
------------
  Suppose within the first few years after release, the 
  scale/scope of your data storage needs wildly exceeded 
  your original design expectations.

  Assuming the original design centered around an RDBMS
  system, describe
    (i)  the approaches you would use (with justification)
         to try to cope with the increased data handling needs
    (ii) the criteria you would use to decide when an
         RDBMS was simply no longer a practical (i.e.
         sufficiently scalable) solution to your problem


Scaling an RDBMS
----------------
One suggestion on large scaling was to grossly denormalize the DB,
   actually running several separate databases, meant to model the
   same (or, at least, overlapping) set of logical data but with
   completely different internal designs.
Each DB would be specially designed to handle specific kinds of queries.

Discuss the design/implementation difficulties associated with
   such an approach.


Client-side storage
-------------------
Suppose most of your users always connect from their 'regular' machine,
   be it a desktop, laptop, or phone, at home or at work.
However, a significant minority of your users regularly connect from
   public or shared machines.
Discuss the implications this has for the design of your client-side
   data storage.


Sharding i
----------
First, describe an application that you think would be highly suitable
   for sharding, and why it is highly suitable.
Second, describe a data-driven application whose scale is sufficient
   to consider sharding, but where the nature of the data or queries
   makes it unsuitable for sharding.


Sharding ii
-----------
Describe what you see as the most important criteria to consider
when investigating whether sharding is the right approach for
your high-volume, high-demand data management solution, and 
justify your answer.


Erlang
------
We have spent some time in lectures and labs examining Erlang.
   Briefly discuss the key strengths and weaknesses of Erlang
   for use in highly distributed, highly scalable applications.


Distributed rate limiting
-------------------------
In recent lectures and labs we examined some of the issues associated
   with distributed rate limiting.
One of the solutions discussed was allowing nodes in the system
   to 'trade' capacity for the resource being distributed.

Inter-node communication can become an issue in such a scheme,
   especially if nodes are widely separated geographically
   (increasing both the time required to communicate and the
    likelihood of failures in links between the nodes).

Consider the following tiered variation of the capacity-trading idea:
   Nodes are grouped into clusters which are geographically
       very close to one another; each cluster has one lead server.
       Nodes within a cluster can trade capacity with one another.
   Clusters are grouped into data centres, covering larger
       geographical areas; each data centre has one lead server.
       The clusters (through their lead servers) can trade their
       capacities with one another.  (The lead servers would then
       tell the other nodes within the cluster to scale up or down
       their current capacities as appropriate.)
   Data centres (through their lead servers) can trade their total
       capacities with one another (again with the lead servers telling
       their clusters to scale up/down, with the clusters then telling
       their nodes to scale up/down).
Discuss the potential problems and benefits associated with such a scheme.

CSCI 485 Sample Questions: Midterm 2
====================================

Cassandra vs Amazon S3
----------------------

In the lectures we discussed eBay's use of Cassandra for their
write-heavy operations (likes, owns, wants, etc).  Compare and
contrast the db/storage architecture used there with the one
used by Amazon's Simple Storage Service.

Consistent Hashing
------------------

Describe the concept of consistent hashing and how it is used
by Amazon, Google, and others as a key part of their scalable
data storage approaches.

Skype
-----

If you had to re-do Skype using a data storage system patterned
after one of the following: Amazon's S3, Google's BigTable, eBay's
use of Cassandra, or Facebook's use of HBase, discuss which you
would choose and why.

Philosophy
----------

Compare and contrast the technical and business philosophies
of either (i) eBay and Amazon, or (ii) Facebook and Twitter.

In particular, discuss the impact the philosophies have on
the design and operation of their hardware/software systems.

Vector clocks
-------------

Clearly describe and discuss the concept of vector clocks and 
give an example of their use in synchronizing actions across 
multiple servers.

Optimization
------------

Many of the systems we have examined in recent weeks rely heavily
on partitioning a technical problem into distinct layers or distinct
subproblems and optimizing solutions for each of the layers/subproblems
separately.

Discuss and give examples of this for each of Twitter, Google, and eBay.

Multiplayer gaming
------------------

If you were to re-architect Eve Online's data storage infrastructure,
what are the key approaches you would investigate and why?

Meta-operating systems
----------------------

Many of the systems we have examined in recent weeks include something
like a meta-operating system as a key part of their infrastructure,
for automated monitoring, control, and repair of their networks and
for automated roll-out of code updates.

Compare and contrast the meta-operating systems used by
 two of the following: Google, eBay, Facebook, or Amazon.

Privacy and Security
--------------------

Describe, compare, and contrast the privacy and security issues
associated with Facebook, Skype, and eBay.

Search engines
--------------

Describe and discuss the search engine and supporting
infrastructure for one of the following systems:
  Facebook, Google, Twitter.