FCW : April 15, 2013
What the heck is Hadoop?

Every day, people send 150 billion new email messages. The number of mobile devices already exceeds the world's population and is growing. With every keystroke and click, we are creating new data at a blistering pace.

This brave new world is a potential treasure trove for data scientists and analysts who can comb through massive amounts of data for new insights, research breakthroughs, undetected fraud or other yet-to-be-discovered purposes. But it also presents a problem for traditional relational databases and analytics tools, which were not built to handle the data being created. Another challenge is the mixed sources and formats, which include XML, log files, objects, text, binary and more.

"We have a lot of data in structured databases, traditional relational databases now, but we have data coming in from so many sources that trying to categorize that, classify it and get it entered into a traditional database is beyond the scope of our capabilities," said Jack Collins, director of the Advanced Biomedical Computing Center at the Frederick National Laboratory for Cancer Research. "Computer technology is growing rapidly, but the number of [full-time equivalent positions] that we have to work with this is not growing. We have to find a different way."

Enter Apache Hadoop, an open-source, distributed programming framework that relies on parallel processing to store and analyze tremendous amounts of structured and unstructured data. Although Hadoop is far from the only big-data tool, it is one that has generated remarkable buzz and excitement in recent years. And it offers a possible solution for IT leaders who are realizing that they will soon be buried in more data than they can efficiently manage and use.
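The parallel-processing idea at Hadoop's core is the map/reduce pattern: independent chunks of raw data are transformed in parallel (map), and the partial results are merged (reduce). A minimal Python sketch of that pattern — a toy word count, not Hadoop itself — looks like this:

```python
from collections import Counter
from functools import reduce

# Toy illustration of the map/reduce pattern Hadoop builds on.
# In a real Hadoop job, the map and reduce phases run distributed
# across a cluster; here they run in one process for clarity.

def map_phase(chunk):
    """Map: turn one chunk of raw text into (word, count) pairs."""
    return Counter(chunk.lower().split())

def reduce_phase(a, b):
    """Reduce: merge partial counts from independent chunks."""
    return a + b

def word_count(chunks):
    # Each chunk could be mapped on a different node in parallel,
    # because no chunk depends on any other chunk's result.
    partials = [map_phase(c) for c in chunks]
    return reduce(reduce_phase, partials, Counter())

chunks = ["big data big insights", "data tools for big data"]
totals = word_count(chunks)
print(totals["data"])  # partial counts from both chunks are merged
```

Because each map call is independent, adding machines (or processes) scales the map phase almost linearly — which is exactly what makes the approach attractive for the data volumes described above.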
"In the last 10 years, this is one of the most important developments because it's really transforming the way we work, our business processes and the way we think about data," said Ed Granstedt, a vice president at predictive analytics firm GoldBot Consulting. "This change is coming, and if government leaders don't understand how to use this change, they're going to get left behind or pushed aside."

Why it matters

Hadoop is more than just a faster, cheaper database and analytics tool. In some cases, the Hadoop framework lets users query datasets in previously unimaginable ways.

Take the Frederick laboratory, whose databases contain scientific knowledge about cancer genes, including the expression levels of a gene and what chromosome it is on. New projects seek to mine literature, scientific articles, results of clinical trials and adverse-event databases for related or useful connections. Other researchers are exploring whether big-data analysis of patient blogs, Google searches and Twitter feeds can also provide useful correlations.

"In many cases, we're trying to find associations, so we're doing mining and asking questions that weren't previously imagined," Collins said.

Last summer, his team conducted a study of two Hadoop implementations with both real and simulated data to see whether the framework would improve performance and allow for new types of analysis. The project reduced hours-long computations to minutes. The next phase aims to better integrate data and improve visualization of results.

"Data is the new natural resource," said Josh Sullivan, a vice president at Booz Allen Hamilton and founder of the Hadoop-DC Meetup group. "Hadoop is the first enterprise tool we have that lets us create value from data. Every agency should be looking at Hadoop."

However, implementation is not as simple as converting existing databases into a Hadoop framework. That would be a missed opportunity for strategic data analysis, Sullivan said.
Moreover, many existing databases should be maintained separately and connected to Hadoop databases and analytics. As a general rule, any group with more than 2 TB of data should consider Hadoop. "Anything more than 100 TB, you absolutely want to be looking at Hadoop," Sullivan said.

David Skinner, leader of the Outreach, Software and Programming Group at the Energy Department's Lawrence Berkeley National Laboratory, said he hopes Hadoop will offer a solution to the growing problem of data blindness, which keeps scientists from deeply understanding their own

ExecTech | BY KATHERINE REYNOLDS LEWIS
The open-source tool simplifies big-data management and frees users to explore information in a whole new way