Saturday, December 28, 2013

Streaming Systems history, part 3

Finally, here is the third part of the streaming systems history. I'm sorry it took so long, but I had a lot to do for work and was very busy with real-life stuff.
In this part I'm going to introduce systems from 2011. If you find any mistakes or anything incomplete, feel free to let me know by commenting on the post! Not all the 2011 systems are covered here; I will complete the list in the fourth and final part.

In this part I'm going to quickly talk about Storm (2011) and WebRTC (2011).

Storm is a free and open source distributed computation environment (check out the link!), originally developed at BackType and open-sourced by Twitter after it acquired the company. The building blocks, where information is created and manipulated, are called "spouts" (blocks that produce streams of data) and "bolts" (blocks that consume streamed data, process it, and may emit new streams). With them, developers can build a pipeline. Such pipelines are very similar to MapReduce jobs, the main difference being that they can (theoretically) run forever. Every spout and bolt can be run by one or more workers, where the core of the Storm computation is executed.
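To make the spout/bolt idea concrete, here is a minimal sketch in plain Python. It does not use Storm's API at all: the function names (`number_spout`, `square_bolt`, `even_filter_bolt`) are made up for illustration, and generators stand in for streams. It only shows the shape of the idea, a source that never ends feeding a chain of processing blocks.

```python
from itertools import count, islice


def number_spout():
    """Spout (hypothetical): an unbounded source emitting 0, 1, 2, ..."""
    for n in count():
        yield n


def square_bolt(stream):
    """Bolt (hypothetical): consumes a stream and emits each value squared."""
    for n in stream:
        yield n * n


def even_filter_bolt(stream):
    """Bolt (hypothetical): forwards only the even values it receives."""
    for n in stream:
        if n % 2 == 0:
            yield n


# Wire the blocks into a pipeline: spout -> square -> filter.
pipeline = even_filter_bolt(square_bolt(number_spout()))

# The pipeline could run forever, so we only pull the first five results.
first_five = list(islice(pipeline, 5))
print(first_five)
```

Unlike a MapReduce job, nothing here ever "finishes": the consumer decides how much of the stream to read.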

Given spouts and bolts as basic building blocks, developers can build Directed Acyclic Graph (DAG) pipelines. The programming language is Java, thus the programming representation is imperative. Storm pipelines can be deployed in the cloud: that is, they can run on a local machine and have spouts or bolts somewhere in the cloud. This implementation detail became very important as time passed. The pipeline is dynamic at node level, that is, a Storm spout or bolt can alter its number of workers at runtime. Faults are dealt with by reconfiguring the pipeline at runtime, thanks to some controller threads that constantly run checking for errors. I call this kind of controller "dynamic adaptive", like the ones seen in DryadLINQ and S4.
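As a rough illustration of a DAG pipeline that is "dynamic at node level", the sketch below models a topology in plain Python. Again, this is not Storm's API: the `Topology` class, its methods, and the node names are all hypothetical, and the worker count is just a number rather than real processes. It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import CycleError, TopologicalSorter


class Topology:
    """Toy model (hypothetical) of a pipeline of spouts and bolts."""

    def __init__(self):
        self.upstream = {}  # node name -> set of nodes it reads from
        self.workers = {}   # node name -> current number of workers

    def add_node(self, name, inputs=(), workers=1):
        # A node with no inputs plays the role of a spout.
        self.upstream[name] = set(inputs)
        self.workers[name] = workers

    def execution_order(self):
        # Raises CycleError if the graph is not acyclic.
        return list(TopologicalSorter(self.upstream).static_order())

    def rescale(self, name, workers):
        # Node-level dynamism: only the worker count changes,
        # the shape of the graph stays the same.
        if workers < 1:
            raise ValueError("a node needs at least one worker")
        self.workers[name] = workers


topology = Topology()
topology.add_node("sentences")                      # spout
topology.add_node("split", inputs=["sentences"])    # bolt
topology.add_node("count", inputs=["split"])        # bolt

order = topology.execution_order()
topology.rescale("split", 4)  # e.g. a controller reacting to load

# A cycle is rejected, which is what keeps the pipeline a DAG.
cyclic = Topology()
cyclic.add_node("a", inputs=["b"])
cyclic.add_node("b", inputs=["a"])
try:
    cyclic.execution_order()
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

A "dynamic adaptive" controller would be whatever decides when to call something like `rescale`. Here it is a single manual call.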

Web Real-Time Communication (WebRTC) is an API currently being drafted by the World Wide Web Consortium (W3C). It enables browser-to-browser connections for applications like video, chat or VoIP. Many applications have been built on this API since its initial release, for example peer-to-peer file sharing or video conferencing. The API is not a real streaming system per se, but it brings a whole new kind of connectivity to browser applications, opening the path for future browser streaming applications.

Since WebRTC is only an API, we can build any kind of pipeline with it (arbitrary pipelines). It can only be programmed in JavaScript, thus, like Storm, its programming representation is imperative, while the deployment, for now, is only the web browser. The pipeline flexibility is dynamic at topology level, as nodes (browsers) can connect to and disconnect from the pipeline at runtime. Of course there is no fault tolerance or load balancing, thus once the pipeline is disrupted it can't be reconstructed. Ideally, we would need software on top of it to do so.

This was a short part compared to the previous two. As I mentioned, I've been very busy lately, so this is all I found time to write. The next part will hopefully be the last one (this one was supposed to be, but it turned out to be very long).
See ya!

