The first will be held in lectures on October 24th; the second will be held in the university final exam period, 1-4pm on Thursday, Dec. 13th.
Each will focus on the most recent course material, but the later exam will probably also draw on earlier material and discussions.
The format will be four essay style questions, all equally weighted.
The exams are open book and open notes, but no electronics are permitted.
Here are the collected 2011 midterm questions
(from the three 'midterms' that were held that spring).
For the 2012
version of the course, questions 1-6 would be suitable for the October
midterm, while 7-13 would be suitable for the December exam.
Other sample questions for the two midterms are listed below.

CSCI 485 Sample Questions: Midterm 1
====================================

Research project
----------------
(i) Briefly explain your research project topic, emphasizing
(ii) what about it is technically challenging with respect to the
     course content
(iii) why, at presentation time, the topic will be of interest to
      other students in the class

Lab project i
-------------
For the current lab project, the server needs to know when clients
change their location, task, or activity. There are two very different
viable approaches:
(1) the client sends an update every few seconds, giving current
    location/task/activity, and doesn't require acknowledgements of
    these
(2) the client only sends an update when they actually change their
    location/task/activity, but then requires an acknowledgement to be
    certain the server got it
Question:
(i) when/why would (1) be a superior approach
(ii) when/why would (2) be a superior approach

Lab project ii
--------------
For the current lab project, suppose the clients needed to be able to
send messages to one another, either directly or routed through the
server. Describe the design issues and alterations this change would
force in general, and with your solution approach in particular.

Lab project iii
---------------
For the current lab project, suppose the client was meant to be run on
a mobile device mounted in a vehicle, with the expectation that
connectivity would frequently be lost for short intervals. Describe how
this would change your design and implementation, with appropriate
justification/explanation of your answer.

Threading i
-----------
When threads must communicate, the two primary options are via message
passing or via shared data elements.
(i) Describe circumstances under which message passing would be the
    preferable approach and why.
(ii) Describe circumstances under which shared data (and semaphores)
     would be preferable and why.

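For reference when reading the Threading i question, the two options
can be sketched in C++ roughly as follows. This is an illustration
only, not the lab code: MessageQueue, SharedCounter, sumViaMessages,
sumViaSharedData, and demo are hypothetical names, and a std::mutex
stands in for the semaphore mentioned in the question.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Option 1: message passing. The sender hands over copies of its data
// through a queue; the receiver never touches the sender's variables.
// (Hypothetical class, for illustration only.)
class MessageQueue {
public:
    void send(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(value);
        }
        ready_.notify_one();
    }

    // Blocks until a message is available, then returns it.
    int receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        int value = queue_.front();
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> queue_;
};

// Option 2: shared data. Every thread updates the same variable
// directly, and a lock (standing in for a semaphore) guards each access.
struct SharedCounter {
    std::mutex mutex;
    long total = 0;

    void add(int value) {
        std::lock_guard<std::mutex> lock(mutex);
        total += value;
    }
};

// Sums 1..n by sending each value to a consumer thread as a message.
long sumViaMessages(int n) {
    MessageQueue queue;
    long total = 0;  // written only by the consumer, read after join()
    std::thread consumer([&] {
        for (int i = 0; i < n; ++i) total += queue.receive();
    });
    for (int i = 1; i <= n; ++i) queue.send(i);
    consumer.join();
    return total;
}

// Two threads each add 1..n into one shared counter.
long sumViaSharedData(int n) {
    SharedCounter counter;
    auto worker = [&] {
        for (int i = 1; i <= n; ++i) counter.add(i);
    };
    std::thread a(worker), b(worker);
    a.join();
    b.join();
    return counter.total;
}

// Usage: both options produce the expected sums.
void demo() {
    assert(sumViaMessages(100) == 5050);
    assert(sumViaSharedData(100) == 10100);
}
```

In the first option only the queue itself is shared, so the threads
stay decoupled; in the second, every thread that touches the counter
has to remember to take the lock.
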
Threading/states ii
-------------------
- you have a read thread and a control thread
- the read thread uses blocking reads, in a loop like

     do {
        char c;
        cin.get(c);
        sharedBuffer.enque(c);
     } while (!quit);

  where the buffer is shared with the control thread
1. where should the logic to decide/set 'quit = true' go, and why
2. given your answer to (1.), where should any semaphore(s) go, and why

Blocking vs nonblocking
-----------------------
(i) Are there circumstances under which blocking receives are
    preferable to nonblocking? Justify your answer.
(ii) Are there circumstances under which blocking keyboard input is
     preferable to nonblocking? Justify your answer.

TCP vs UDP
----------
Explain the different circumstances under which a TCP
(connection-based) communication system would be preferable to a UDP
(connectionless) one, and why.

Message composition
-------------------
In lectures we discussed formats that would allow highly flexible yet
maintainable message structures. Suppose you were implementing a simple
chat room application. Provide a preliminary design for the message
structure you would use and justify your design choices.

Shared data
-----------
One of the design issues that becomes particularly important as system
size increases is the mechanism for sharing data critical to
establishing communication and validating who you are communicating
with. One of the typical approaches involves the potential users
somehow finding the appropriate site (e.g. through a search engine)
then following a registration/login process, possibly with more limited
access for 'guests'. Discuss the potential pitfalls/limitations/
drawbacks you can see with such a scheme.

Synchronizing systems
---------------------
One of the issues when dealing with distributed systems and replicated
data is that of synchronization: how can two servers decide what order
different events should take place in, especially given variations in
system clocks and the time lags involved in communication?
This is particularly true in 'eventually consistent' systems, where
data transactions percolate through to different servers at different
times and in different orders. We discussed the use of time counters:
internal counters which each server uses to identify time 'steps', and
which are sent as part of each message transmitted. The receiver always
updates their internal counter to be the greater of their current
counter and the timestamp received as part of a message. Discuss the
potential pitfalls/weaknesses you see in such a synchronization scheme,
and suggest potential workarounds.

Data handling
-------------
Suppose you were designing/implementing an application 'from scratch'.
Describe the criteria you would use to decide if a traditional RDBMS
was the appropriate data storage mechanism for your application, and
justify your decision.

Data scaling
------------
Suppose within the first few years after release, the scale/scope of
your data storage needs wildly exceeded your original design
expectations. Assuming the original design centered around an RDBMS,
describe
(i) the approaches you would use (with justification) to try to cope
    with the increased data handling needs
(ii) the criteria you would use to decide when an RDBMS was simply no
     longer a practical (i.e. sufficiently scalable) solution to your
     problem

Scaling an RDBMS
----------------
One suggestion on large scaling was to grossly denormalize the DB,
actually running several separate databases, meant to model the same
(or, at least, overlapping) set of logical data but with completely
different internal designs. Each DB would be specially designed to
handle specific kinds of queries. Discuss the design/implementation
difficulties associated with such an approach.

Client-side storage
-------------------
Suppose most of your users always connect from their 'regular' machine,
be it a desktop, laptop, or phone, at home or at work.
However, a significant minority of your users regularly connect from
public or shared machines. Discuss the implications this has for the
design of your client-side data storage.

Sharding i
----------
First, describe an application that you think would be highly suitable
for sharding, and why it is highly suitable. Second, describe a
data-driven application whose scale is sufficient to consider sharding,
but where the nature of the data or queries makes it unsuitable for
sharding.

Sharding ii
-----------
Describe what you see as the most important criteria to consider when
investigating whether sharding is the right approach for your
high-volume, high-demand data management solution, and justify your
answer.

Erlang
------
We have spent some time in lectures and labs examining Erlang. Briefly
discuss the key strengths and weaknesses of Erlang for use in highly
distributed, highly scalable applications.

Distributed rate limiting
-------------------------
In recent lectures and labs we examined some of the issues associated
with distributed rate limiting. One of the solutions discussed was
allowing nodes in the system to 'trade' capacity for the resource being
distributed. Inter-node communication can become an issue in such a
scheme, especially if nodes are widely separated geographically
(increasing both the time required to communicate and the likelihood of
failures in links between the nodes). Consider the following tiered
variation of the capacity-trading idea:
- Nodes are grouped into clusters which are geographically very close
  to one another; each cluster has one lead server. Nodes within a
  cluster can trade capacity with one another.
- Clusters are grouped into data centres, covering larger geographical
  areas; each data centre has one lead server. The clusters (through
  their lead servers) can trade their capacities with one another.
  (The lead servers would then tell the other nodes within the cluster
  to scale up or down their current capacities as appropriate.)
- Data centres (through their lead servers) can trade their total
  capacities with one another (again with the lead servers telling
  their clusters to scale up/down, with the clusters then telling their
  nodes to scale up/down).
Discuss the potential problems and benefits associated with such a
scheme.

CSCI 485 Sample Questions: Midterm 2
====================================

Cassandra vs Amazon S3
----------------------
In the lectures we discussed eBay's use of Cassandra for their
write-heavy operations (likes, owns, wants, etc). Compare and contrast
the db/storage architecture used there with the one used by Amazon's
Simple Storage Service (S3).

Consistent Hashing
------------------
Describe the concept of consistent hashing and how it is used by
Amazon, Google, and others as a key part of their scalable data storage
approaches.

Skype
-----
If you had to re-do Skype using a data storage system patterned after
one of the following: Amazon's S3, Google's BigTable, eBay's use of
Cassandra, or Facebook's use of HBase, discuss which you would choose
and why.

Philosophy
----------
Compare and contrast the technical and business philosophies of either
(i) eBay and Amazon, or (ii) Facebook and Twitter. In particular,
discuss the impact the philosophies have on the design and operation of
their hardware/software systems.

Vector clocks
-------------
Clearly describe and discuss the concept of vector clocks and give an
example of their use in synchronizing actions across multiple servers.

Optimization
------------
Many of the systems we have examined in recent weeks rely heavily on
partitioning a technical problem into distinct layers or distinct
subproblems and optimizing solutions for each of the layers/subproblems
separately. Discuss and give examples of this for each of Twitter,
Google, and eBay.

Multiplayer gaming
------------------
If you were to re-architect Eve Online's data storage infrastructure,
what are the key approaches you would investigate and why?

Meta-operating systems
----------------------
Many of the systems we have examined in recent weeks include something
like a meta-operating system as a key part of their infrastructure, for
automated monitoring, control, and repair of their networks and for
automated roll-out of code updates. Compare and contrast the
meta-operating systems used by two of the following: Google, eBay,
Facebook, or Amazon.

Privacy and Security
--------------------
Describe, compare, and contrast the privacy and security issues
associated with Facebook, Skype, and eBay.

Search engines
--------------
Describe and discuss the search engine and supporting infrastructure
for one of the following systems: Facebook, Google, Twitter.