FCW : April 15, 2013
[Graphic: 1. Collect data of all types; 2. Map, reduce and crunch; 3. Draw insights from the results]

datasets. Skinner's group evaluates new technologies and makes them accessible to the thousands of scientists who use the lab's National Energy Research Scientific Computing Center (NERSC). "We're very interested in technologies that deliver data transparency and allow people to do analysis with large sets of data," said Skinner, whose group has been exploring scalable data solutions for a couple of years. "Science is increasingly inundated with data. If we can revolutionize the way we think about what scientists can do with data analysis, it would change the perspective on what is possible."

The fundamentals

Hadoop evolved out of Google researchers' work on the MapReduce framework, which Yahoo programmers brought into the open-source Apache environment. Core Hadoop consists of the Hadoop Distributed File System for storage and the MapReduce framework for processing. Queries migrate to the data rather than pulling the data into the analysis, yielding fast load times but potentially slower queries. In addition, Hadoop queries require higher-level programming skills compared with the user-friendly SQL, so developers have released additional software solutions with colorful names such as Cassandra, HBase, Hive, Pig and ZooKeeper to make it easier to program Hadoop and perform complex analyses.

"Like a database, Hadoop is a mechanism for storing, manipulating and querying data," said Steven Hillion, chief product officer at Alpine Data Labs. "Unlike databases, Hadoop can handle data in a very fluid way. It doesn't insist that you've structured your data. Hadoop is sort of a big dumping ground for whatever data you can throw at it. People who have struggled to deal with big data have found Hadoop to be a cheap and flexible and powerful platform for dealing with these very large volumes of unstructured and fluid data."
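The map-then-reduce pattern at Hadoop's core can be sketched in miniature. The following is a single-process illustration of the programming model only, not Hadoop code; the word-count task and all function names are assumptions chosen for demonstration:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, value) pairs -- here, (word, 1) for each word."""
    for line in records:
        for word in line.split():
            yield word.lower(), 1

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values -- here, summing the counts."""
    return {key: sum(values) for key, values in groups.items()}

# Illustrative input: a few server-log lines of the kind the article
# says Hadoop handles well.
log_lines = [
    "error disk full",
    "warning disk slow",
    "error network down",
]
counts = reduce_phase(shuffle_phase(map_phase(log_lines)))
print(counts["error"])  # -> 2
```

In a real cluster, each phase runs in parallel across many machines, and higher-level tools such as Hive or Pig generate this kind of logic from more SQL-like queries.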
Because Hadoop evolved in the Internet space --- LinkedIn and Facebook were early adopters --- it is well-suited to the kind of data you find in those environments: log files, text files and the like. However, users should be aware of the upsides and downsides of parallel processing, which is Hadoop's salient characteristic.

"While the MapReduce programming model is very powerful because it makes it very easy to express a problem and

How it works: Querying big data in Hadoop

Unlike traditional databases, Hadoop works well with both structured and unstructured data --- server logs, social media streams, images, geodata, raw text and more. The Hadoop environment uses a single "master node" to manage the names and jobs, and then any number of servers to house the data and run queries against it. This distributed approach allows for rapid scaling, and also provides redundancy and fault tolerance. Once the master node aggregates the query results, the information can be fed into dashboards, business intelligence systems and other analytical tools. Hadoop makes the data usable but generally does not deliver the insights directly to the end user.

More on FCW.com: Caveats to consider
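The master/worker flow in the sidebar can be shown with a toy sketch: data is partitioned across worker "nodes," each node answers the query against only its local records, and a master aggregates the partial results. All names and data here are illustrative assumptions; a real Hadoop cluster involves HDFS, a dedicated name/job-tracking node, and far more machinery:

```python
def run_on_node(node_data, predicate):
    """Each data node scans only its local records -- the query moves
    to the data rather than the data moving to the query."""
    return sum(1 for record in node_data if predicate(record))

def master_query(nodes, predicate):
    """The 'master node' fans the query out and aggregates the results."""
    partial_counts = [run_on_node(data, predicate) for data in nodes]
    return sum(partial_counts)

# Server-log records partitioned across three hypothetical nodes.
nodes = [
    ["GET /index 200", "GET /login 500"],
    ["POST /login 200", "GET /img 404"],
    ["GET /index 200", "GET /api 500"],
]
errors = master_query(nodes, lambda rec: rec.endswith("500"))
print(errors)  # -> 2
```

The aggregated count is what would then be handed off to a dashboard or business intelligence tool, matching the sidebar's point that Hadoop produces usable results rather than finished insights.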