BIG Data and the Public Sector

By Anthony Cecchini, September 9, 2012

TAKE NOTE (Insights into the SAP solution and technology)

Big Data is probably the most talked-about topic in IT right now. Studies reveal that companies making data-driven decisions show higher productivity gains than other factors can explain. That's reason enough for most companies to start running big data analyses. The challenge lies in making the data available for analysis, that is, in storing and managing it.

The amount of data to be stored is growing at a rate of 65 percent annually, and 80 percent of that data will be unstructured, which is fundamentally more difficult to manage. Companies need a way to store big data without overloading their data centers. Since unstructured data usually isn't the mission-critical information that requires high-performance storage, companies can save money by placing it on a lower-cost storage tier. In addition, most unstructured data is untouched after 90 days. Companies could easily archive this data, so they would no longer have to keep replicating (and paying for) information they don't use.
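To make the two numbers in this paragraph concrete, here is a minimal Python sketch. It compounds a storage volume at the 65 percent annual rate and applies a simple age-based tiering rule using the 90-day threshold. The tier names, the `StoredObject` record, and the use of "structured" as a stand-in for "mission-critical" are hypothetical assumptions for illustration, not any vendor's actual storage policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Figures taken from the paragraph above.
ANNUAL_GROWTH = 0.65                # data volume grows 65% per year
ARCHIVE_AFTER = timedelta(days=90)  # most unstructured data is untouched after 90 days


def projected_volume(current_tb: float, years: int) -> float:
    """Compound today's volume (in TB) at the annual growth rate."""
    return current_tb * (1 + ANNUAL_GROWTH) ** years


@dataclass
class StoredObject:
    # Hypothetical record type for illustration.
    name: str
    structured: bool          # assumption: treat structured data as mission-critical
    last_accessed: datetime


def choose_tier(obj: StoredObject, now: datetime) -> str:
    """Hypothetical three-tier policy: 'archive', 'performance', or 'capacity'."""
    if now - obj.last_accessed > ARCHIVE_AFTER:
        return "archive"      # stale data: archive it instead of replicating it
    if obj.structured:
        return "performance"  # mission-critical data stays on fast storage
    return "capacity"         # active unstructured data goes to a cheaper tier


if __name__ == "__main__":
    now = datetime(2012, 9, 9)
    print(round(projected_volume(100, 3), 1))
    for obj in [
        StoredObject("orders_table", True, now - timedelta(days=2)),
        StoredObject("call_recordings", False, now - timedelta(days=10)),
        StoredObject("old_scans", False, now - timedelta(days=200)),
    ]:
        print(obj.name, choose_tier(obj, now))
```

For example, 100 TB today grows to roughly 449 TB after three years at 65 percent. That is why the archive rule matters: anything untouched for more than 90 days drops out of the replicated tiers entirely.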

Big Data and the Federal Government

Big Data presents a grand challenge: reinventing major sectors of the US economy such as healthcare, retail, energy and manufacturing.

Deputy Director Thomas A. Kalil of the White House Office of Science and Technology Policy (OSTP) defined grand challenges as "ambitious yet achievable goals that capture the public's imagination and require innovation and breakthroughs in science and technology."

He further noted that “while a national initiative can support longer term basic research, we also need to think about what the private sector, academia and non-profits can do to develop new products that help solve our greatest economic, societal and scientific challenges.”

It is through creative collaborations that the administration's $200M "Big Data Initiative" can dramatically advance US economic opportunity and help solve some of our nation's most pressing problems.

The Economic Power of Big Data

With new artificial intelligence algorithms and historical data, predictive models are now possible for everything from healthcare diagnostics to transportation planning, personalized teaching, disaster preparedness and even national security.

Some industry experts think that extracting actionable insights from Big Data will surpass both the PC revolution and the Internet in transforming the way we live and work.

In economic terms, the McKinsey Global Institute reported that harnessing Big Data could unleash $300B of value per year in US healthcare alone, with $200B of that coming from an annual reduction in healthcare expenditures.

That equates to roughly $1,000 per person, every year, in more patient-centric healthcare delivered at a lower cost.
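The per-person figure is simple division. The sketch below reproduces it in Python under one stated assumption: a US population of roughly 310 million, a round figure chosen here rather than taken from the McKinsey report.

```python
# Figures from the McKinsey estimate quoted above (US dollars per year).
TOTAL_VALUE_USD = 300e9
EXPENDITURE_REDUCTION_USD = 200e9

# Assumption: approximate US population around 2012 (round figure, not from the report).
US_POPULATION = 310e6


def per_capita(amount_usd: float, population: float) -> float:
    """Spread an annual dollar amount evenly across the population."""
    return amount_usd / population


if __name__ == "__main__":
    print(f"Value per person per year:    ${per_capita(TOTAL_VALUE_USD, US_POPULATION):,.0f}")
    print(f"Savings per person per year:  ${per_capita(EXPENDITURE_REDUCTION_USD, US_POPULATION):,.0f}")
    print(f"Savings share of total value: {EXPENDITURE_REDUCTION_USD / TOTAL_VALUE_USD:.0%}")
```

Under that assumption the total comes to about $968 per person, which rounds to the $1,000 cited above. About two-thirds of it (roughly $645 per person) comes from lower healthcare expenditures.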

Big Data could also help retailers increase operating margins by 60% and reduce manufacturing’s product development and assembly costs by up to 50%.

In fact, Big Data could help companies across industries raise overall productivity and profit by around 5%.

The end result could be a “tech-driven jobs boom” for new IT and business roles such as
a) data scientists to “turn data into knowledge,” and
b) functional roles to “turn knowledge into action” via new products and services.

Harnessing Big Data broadly will require interagency collaborations, public/private partnerships, large-scale investment, and a broader national skills base to fill nearly two million jobs in the growing Big Data workforce.

Key Big Data Initiatives Already Underway

A key interagency collaboration is the BIGDATA program between the National Institutes of Health (NIH) and the National Science Foundation (NSF). This program will fund projects of common interest that yield new core techniques, tools and technologies to advance Big Data science, biomedical research and engineering.

Because scientists spend so much time developing customized tools to manage and analyze data, having access to standard tools and software algorithms will free them to focus more on scientific investigation and discovery.

Several of the National Library of Medicine's BIGDATA priorities include:

computer simulations with published knowledge to investigate new hypotheses;
interactive publications that integrate data/knowledge resources with tools and ways to add data and reanalyze findings.

EarthCube is a BIGDATA project for the geosciences community. The EarthCube community will create an integrated data and knowledge management system to foster collaboration between geoscientists and cyber-scientists. Its aim is to "develop a framework to understand and predict the Earth system from the sun to the center of the Earth."

In other agency-funded programs, DARPA’s XDATA program will develop computing techniques and tools to process and analyze vast amounts of mission-oriented information for defense initiatives. A key DARPA challenge is managing sensor and communications systems data used for battlefield awareness and planning.

The Department of Energy’s new Scalable Data Management, Analysis, and Visualization (SDAV) Institute at the Lawrence Berkeley National Laboratory will build end-to-end solutions for scientific data management, visualization and analytics of large datasets on emerging supercomputing architectures. The SDAV consortium includes six national laboratories, seven universities and an industry partner.

The "bigdata@CSAIL" initiative at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) will develop new computational solutions for how people share, store and manipulate massive quantities of information in medicine, finance, social media and security.

Intel will join CSAIL and five other universities in an Intel Science and Technology Center (ISTC) for Big Data. ISTC will expand CSAIL’s efforts to analyze big datasets for solutions in government, manufacturing and retail in addition to medicine and finance.

To achieve leadership in the new data economy, Massachusetts Governor Deval Patrick launched a statewide initiative to establish Massachusetts as a Big Data research hub. The state will also support MIT in training the next generation of data scientists.

In California, the nation's technology hub, the University of California at Berkeley's AMPLab will develop open source software at the intersection of machine learning, cloud computing, and crowdsourcing. AMPLab aims to integrate Algorithms, Machines, and People to make sense of massive data for cancer genomics, real-time traffic sensing and prediction, urban planning and Internet security.

The examples above are just a snapshot of the Big Data initiatives underway, but they show that the United States has embarked on an auspicious "grand challenge": to foster economic growth, create jobs and transform national priorities along the way, all with Big Data.

Q&A (Your Questions Answered)

Q. Can you explain a little bit of what SAP HANA is?

A. Sure, right after I boil the ocean. This is a fairly broad question, so I am going to try to stay at about the 50,000-foot level.

SAP HANA is a game-changing, real-time platform for analytics and applications. While simplifying the IT stack, it provides powerful features such as significant processing speed, the ability to handle big data, and predictive and text mining capabilities. With it, organizations can:

Accelerate key business processes with rapid analysis and reporting
Invent new business models and processes by leveraging innovative solutions
Reduce total cost of ownership (TCO) with less hardware and maintenance

HANA DB takes advantage of the low cost of main memory (RAM), the data-processing power of multi-core processors, and the fast data access of solid-state drives (compared with traditional hard drives) to deliver better performance for analytical and transactional applications. It offers a multi-engine query processing environment that supports relational data (with both row- and column-oriented physical representations in a hybrid engine) as well as graph and text processing for semi-structured and unstructured data, all within the same system.
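The difference between row- and column-oriented representations is easiest to see on a toy table. The Python sketch below is conceptual only: Python lists do not model real memory layout, the sample data is hypothetical, and nothing here reflects HANA's internal structures or APIs. It shows why an analytical query (a sum over one column) touches less data in a column layout. It also shows dictionary encoding, a common column-store compression technique, applied to a low-cardinality column.

```python
# Hypothetical sample data: one dict per row, the way a row store groups values.
ROWS = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "West", "amount": 80.0},
    {"order_id": 3, "region": "East", "amount": 50.0},
]


def to_columns(rows):
    """Pivot row records into one list per column (column-oriented layout)."""
    columns = {}
    for record in rows:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def regional_total_rows(rows, region):
    # Row layout: each whole record is visited, including fields the query ignores.
    return sum(r["amount"] for r in rows if r["region"] == region)


def regional_total_columns(columns, region):
    # Column layout: only the "region" and "amount" columns are scanned.
    return sum(a for reg, a in zip(columns["region"], columns["amount"]) if reg == region)


def dictionary_encode(values):
    """Replace repeated values with small integer codes into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(regional_total_rows(ROWS, "East"), regional_total_columns(cols, "East"))
    print(dictionary_encode(cols["region"]))
```

Both functions return the same total. The layouts differ in what each one reads. Fetching one complete order (a typical transactional lookup) is a single record in the row layout but one value from every column list in the column layout. That trade-off is one reason a hybrid engine might keep both representations.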

HANA has been described as an acronym for HAsso's New Architecture (a reference to SAP founder Hasso Plattner), and it has also been expanded as High Performance ANalytic Appliance.

Check out this video for a very high-level explanation from SAP.

If you have a technical question you'd like answered, post it to our Facebook page at www.facebook.com/itpsapinc
