There is a new book I found during a great talk by Matthew Graham (Center for Advanced Computing Research, California Institute of Technology, USA), which I heard at a summer school for astrostatistics and data mining on La Palma that I am currently attending:
I strongly recommend taking a look at it!
Comment: the “fourth” paradigm refers to the idea that science has four paradigms (if you don't know what a paradigm is, take a look at http://en.wikipedia.org/wiki/Paradigma):
1. Experiments
2. Theory
3. Numerical simulations
4. Data-driven science
It seems that theory and simulations relate to each other in the same way that experiments and data-driven science do.
Some notes:
- Hadoop: reads its data from HDFS (the Hadoop Distributed File System)
- NoSQL: manages petabytes in non-relational databases
- SciDB: works with arrays instead of tables; query languages: AQL, AFL
- MapReduce: used in Astronomy in the Cloud (http://arxiv.org/abs/1010.1015v1). It lets you parallelize the map step and the reduce step across different nodes (see http://en.wikipedia.org/wiki/MapReduce). It has been used to completely regenerate Google’s index. To stay consistent when tasks fail and are re-run, it relies on atomic commits of task output (http://en.wikipedia.org/wiki/Atomicity_%28database_systems%29)
- Hive: organizes data into tables; queries are converted into MapReduce jobs; allows bucketing in multiple dimensions. You can still use SQL syntax!
- Pregel: processing of large graphs
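
To make the MapReduce note above a bit more concrete, here is a minimal sketch of the idea in plain Python. It is not Hadoop code and not taken from the talk or the book: the function names (map_chunk, shuffle, reduce_key, map_reduce) and the toy input are my own assumptions, and a thread pool merely stands in for the nodes of a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map step: emit a (key, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group emitted values by key so that each key ends up at one reducer."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_key(item):
    """Reduce step: combine all values that share the same key."""
    key, values = item
    return key, sum(values)


def map_reduce(chunks, workers=4):
    """Run map and reduce in parallel; threads are a stand-in for nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is mapped independently of the others.
        mapped = list(pool.map(map_chunk, chunks))
        # The shuffle is the only step that needs to see all map output.
        grouped = shuffle(mapped)
        # Each key is reduced independently of the others.
        return dict(pool.map(reduce_key, grouped.items()))


if __name__ == "__main__":
    # Hypothetical input: three chunks of a tiny object catalogue.
    print(map_reduce(["star galaxy star", "galaxy quasar", "star"]))
```

The point of the split is that map_chunk never looks at another chunk and reduce_key never looks at another key, so both steps can be spread over as many workers as you have.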