
Fast, Easy and Secure Big Data Ingestion

Transform Data Ingestion From Months To Minutes

Learn how you can make data ingestion fast, easy and secure

Download Whitepaper

What Is Data Ingestion?

Big data ingestion is about moving data, especially unstructured data, from where it originates into a system where it can be stored and analyzed, such as Hadoop.

Data ingestion may be continuous or asynchronous, real-time or batched, or both (a lambda architecture), depending upon the characteristics of the source and the destination. In many scenarios, the source and the destination do not share the same data timing, format or protocol, and some type of transformation or conversion is required to make the data usable by the destination system.
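In practice, such a conversion can be as small as reshaping records on the way in. A minimal sketch of an ingestion-time format transform, assuming a hypothetical sensor feed in CSV that the destination expects as JSON (the field names and layout are illustrative, not a Hortonworks API):

```python
import csv
import io
import json

def csv_rows_to_json(csv_text, fieldnames):
    """Ingestion-time transform: reshape CSV rows from a source system
    into the JSON records a destination system expects."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fieldnames)
    return [json.dumps(row) for row in reader]

# Hypothetical sensor feed: one "sensor_id,temp_c" pair per line.
records = csv_rows_to_json("s1,21.5\ns2,22.1\n", ["sensor_id", "temp_c"])
```

Tools like HDF perform this kind of conversion as a configurable step in the flow rather than as custom code, which is the point of the sections that follow.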

As the number of IoT devices grows, both the volume and variance of data sources are expanding rapidly, and these sources now need to be accommodated, often in real time. Yet extracting the data so the destination system can use it is a significant challenge in terms of time and resources. Making data ingestion as efficient as possible frees resources for big data analysis, rather than the mundane work of data preparation and transformation.

HDF Makes Big Data Ingest Easy

Before

Complicated, messy, and takes weeks to months to move the right data into Hadoop

After

Streamlined, Efficient, Easy

Typical Problems of Data Ingestion

Complex, Slow and Expensive

* Purpose-built and over-engineered tools make big data ingest complex, time consuming, and expensive

* Writing customized scripts and stitching multiple products together to acquire and ingest data takes too long and prevents the on-time decision making today's business environment requires

* Command line interfaces for existing tools create dependencies on developers and fetter access to data and decision making

Security and Trust of Data

* The need to share discrete bits of data is incompatible with current transport layer data security capabilities, which limit access at the group or role level

* Adherence to compliance and data security regulations is difficult, complex and costly

* Verification of data access and usage is difficult and time consuming, often involving a manual process of piecing together different systems and reports to establish where data came from, how it is used, who has used it and how often

Problems of Data Ingestion for IoT

* Difficult to balance the limited resources of power, computing and bandwidth against the volume of data signals being generated by data sources

* Unreliable connectivity disrupts communication and causes data loss

* Lack of security on most of the world's deployed sensors puts businesses and safety at risk

Optimizing Data Ingestion with Hortonworks DataFlow

Fast, Easy, Secure

* The fastest way to address many big data ingestion problems today

* Real-time, interactive point-and-click control of dataflows

* Accelerated data collection and movement for increased big data ROI

* Real-time operational visibility, feedback, and control

* Business agility and responsiveness

* Real-time decision making from streaming data sources

* Unprecedented operational effectiveness, achieved by eliminating the dependencies and delays inherent in a coding and custom scripting approach

* Off-the-shelf, flow-based programming for big data infrastructure

* Secure, reliable and prioritized data collection over geographically dispersed, variable bandwidth environments

* End-to-end data provenance that enables a chain of custody for data compliance, data "valuation", dataflow optimization and troubleshooting
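To illustrate what event-level provenance buys you, here is a toy sketch of lineage events with parent links, loosely modeled on the idea behind NiFi's data provenance (the event schema and helper names here are hypothetical, not the actual NiFi API):

```python
import time
import uuid

def provenance_event(event_type, flowfile_id, details, parent=None):
    """Record one lineage event: what happened to a piece of data,
    with a link back to the event that preceded it."""
    return {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,     # e.g. RECEIVE, ROUTE, SEND
        "flowfile_id": flowfile_id,
        "parent_event": parent,
        "timestamp": time.time(),
        "details": details,
    }

def lineage(events, event_id):
    """Walk parent links to reconstruct a chain of custody for one event."""
    by_id = {e["event_id"]: e for e in events}
    chain = []
    while event_id:
        e = by_id[event_id]
        chain.append(e["event_type"])
        event_id = e["parent_event"]
    return list(reversed(chain))
```

Chaining events this way is what makes it possible to answer, after the fact, where a record came from and every step it passed through.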

Single, Flexible, Adaptive Bi-Directional Real-Time System

* Integrated, data-source-agnostic collection from dynamic, disparate and distributed sources

* Adaptive to the fluctuating conditions of remote, distributed data sources over geographically dispersed communication links with variable bandwidth and latency

* Dynamic, real-time data prioritization at the edge to send, drop or locally store data

* Bi-directional movement of data, commands and contextual data

* Equally well suited to the small-scale data sources that make up the Internet of Things and to the large-scale clusters in today's enterprise data centers

* Visual chain of custody for data (provenance) provides real-time, event-level data lineage for verification and trust of data from the Internet of Things
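The send/drop/store decision mentioned above can be pictured as a small edge-side policy. A toy sketch, assuming a simple priority label per record and a bounded local buffer (the labels and buffer policy are illustrative assumptions, not HDF's actual behavior):

```python
import collections

def triage(record, priority, bandwidth_ok, buffer, max_buffer=100):
    """Decide at the edge whether to send, locally store, or drop a record,
    based on its priority and the current state of the uplink."""
    if bandwidth_ok:
        return "send"
    if priority == "high":
        buffer.append(record)      # keep high-priority data for later delivery
        if len(buffer) > max_buffer:
            buffer.popleft()       # age out the oldest buffered record
        return "store"
    return "drop"                  # low-priority data is shed under constraint

buf = collections.deque()
```

The value of doing this at the edge is that scarce bandwidth carries the most important data first, while lower-value signals are shed or deferred.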

 
How real-time dataflows accelerate big data ROI
Secure dataflows from IoT
Real-time, visual data lineage
Secure data access and control
Dynamic prioritization of data in motion

Use-Cases of Data Ingestion with Hortonworks Dataflow

USE CASE 1

On-Ramp Into Hadoop

Accelerate the time typically required to move data into Hadoop from months to minutes through a real-time drag-and-drop interface. Read about a real-world use case and see how to move data into HDFS in 30 seconds.

 

Prescient Video | Blog
30 Second Live Demo View Now

USE CASE 2

Log Collection / Splunk Optimization

Log data can be complex to capture; it is typically collected in limited amounts and is difficult to operationalize at scale. HDF helps efficiently collect, funnel and access expanding volumes of log data, and eases integration with log analytics systems such as Splunk, SumoLogic, Graylog, LogStash, etc. for easy, secure and comprehensive ingest of log files.
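One recurring piece of log collection is batching lines before shipping them to an analytics system, so the forwarder sends a few right-sized requests instead of one per line. A minimal sketch of size-bounded batching (the byte limit and function name are illustrative, not part of any of the products named above):

```python
def batch_log_lines(lines, max_batch_bytes=1024):
    """Group raw log lines into size-bounded batches, the way a
    collection agent might before forwarding them downstream."""
    batches, current, size = [], [], 0
    for line in lines:
        n = len(line.encode("utf-8"))
        if current and size + n > max_batch_bytes:
            batches.append(current)    # flush the full batch
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        batches.append(current)        # flush the final partial batch
    return batches
```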

 

Log Analytics Optimization whitepaper DOWNLOAD NOW

USE CASE 3

IoT Ingestion

Realizing the promise of real-time decision making enabled by real-time IoT data is a challenge due to the distributed and disparate nature of IoT data. HDF simplifies data collection and helps push intelligence to the very edge of highly distributed networks.

 

A. Edge Intelligence for IoT LEARN MORE
B. Retail and IoT LEARN MORE
C. Open Energi IoT LEARN MORE

USE CASE 4

Deliver Data Into Stream Processing Engines

NiFi, Kafka and Storm blog, slides, webinar LEARN MORE
Comcast NiFi into Spark from Keynote at Hadoop Summit VIDEO