Some notes on data management

There is a new book I came across during a great talk by Matthew Graham (Center for Advanced Computing Research, California Institute of Technology, USA), which I heard at a summer school on astrostatistics and data mining on La Palma that I have just attended:

“The Fourth Paradigm”.

I strongly recommend taking a look at it!

Comment: the “fourth” paradigm refers to the idea that science works with several paradigms (if you don't know what a paradigm is, take a look at http://en.wikipedia.org/wiki/Paradigma):

1. Experiments
2. Theory
3. Numerical simulations
4. Data-driven science

It seems that theory and simulations relate to each other in the same way as experiments and data-driven science do: in each pair, the second is the computational counterpart of the first.

Some notes:

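  • Hadoop: a framework for distributed data processing; reads its data from HDFS (the Hadoop Distributed File System)
  • NoSQL: non-relational databases that can manage petabytes of data
  • SciDB: works with arrays instead of tables; query languages: AQL and AFL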
  • MapReduce: used in "Astronomy in the Cloud" (http://arxiv.org/abs/1010.1015v1). It allows the map step and the reduce step to be parallelized across different nodes (see http://en.wikipedia.org/wiki/MapReduce). It has been used to completely regenerate Google’s index. To stay consistent when tasks fail and are re-run, it relies on atomic commits of task output (for the general idea of atomicity, see http://en.wikipedia.org/wiki/Atomicity_%28database_systems%29)
  • Hive: organizes data into tables; queries are converted into MapReduce jobs; allows bucketing in multiple dimensions. You can still use SQL-like syntax!
  • Pregel: a system for processing large graphs
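
To make the MapReduce item a bit more concrete, here is a minimal single-machine sketch in Python. It is not Hadoop or Google's implementation; the function names (map_phase, shuffle, reduce_phase, map_reduce) and the toy input are my own assumptions for illustration. On a real cluster the map calls and the reduce calls would each run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit one (key, 1) pair per word in a single input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key (done by the framework on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values that share one key."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce sequentially on one machine."""
    pairs = [pair for record in records for pair in map_phase(record)]
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    # Hypothetical toy input: each string stands for one record of object labels.
    records = ["galaxy star galaxy", "star quasar", "galaxy"]
    print(map_reduce(records))
```

Each record can be mapped independently and each key can be reduced independently, which is what makes it possible to spread both steps over many nodes.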
