Apache Kafka

OVERVIEW

A fast, scalable, fault-tolerant messaging system

Apache™ Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system. Kafka is often used in place of traditional message brokers like JMS and AMQP because of its higher throughput, reliability and replication.

Kafka works in combination with Apache Storm, Apache HBase and Apache Spark for real-time analysis and rendering of streaming data. Kafka can carry geospatial data from a fleet of long-haul trucks or sensor data from heating and cooling equipment in office buildings. Whatever the industry or use case, Kafka brokers massive message streams for low-latency analysis in Enterprise Apache Hadoop.

What Kafka Does

Apache Kafka supports a wide range of use cases as a general-purpose messaging system for scenarios where high throughput, reliable delivery, and horizontal scalability are important. Apache Storm and Apache HBase both work very well in combination with Kafka. Common use cases include:

  • Stream processing
  • Website Activity Tracking
  • Metrics collection and monitoring
  • Log aggregation

Some of the important characteristics that make Kafka such an attractive option for these use cases include the following:

Feature    Description
Scalability
    Distributed system scales easily with no downtime
Durability
    Persists messages on disk and provides intra-cluster replication
Reliability
    Replicates data, supports multiple subscribers, and automatically balances consumers in case of failure
Performance
    High throughput for both publishing and subscribing, with disk structures that provide constant performance even with many terabytes of stored messages

How Kafka Works

Kafka’s system design can be thought of as a distributed commit log, in which incoming data is written sequentially to disk. Four main components are involved in moving data into and out of Kafka:

  • Topics
  • Producers
  • Consumers
  • Brokers
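To make the "distributed commit log" idea concrete, here is a minimal single-file sketch. It assumes a toy on-disk format: a 4-byte length prefix plus the payload, with an in-memory list standing in for an offset index. The class name `FileCommitLog` and the record format are hypothetical and are not Kafka’s actual segment or index layout. The sketch shows only the core property: records are appended to the end of the file and never rewritten, and each append returns a sequential offset.

```python
import os
import struct
import tempfile


class FileCommitLog:
    """Toy commit log: records are only ever appended to the end of one file.

    Hypothetical format (not Kafka's): 4-byte big-endian length, then payload.
    """

    HEADER = struct.Struct(">I")

    def __init__(self, path):
        self.path = path
        self.positions = []  # in-memory index: offset -> byte position in the file
        open(self.path, "wb").close()  # start from an empty file

    def append(self, message: bytes) -> int:
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)  # sequential write at the end of the file
            self.positions.append(f.tell())
            f.write(self.HEADER.pack(len(message)) + message)
        return len(self.positions) - 1  # offset assigned to this record

    def read(self, offset: int) -> bytes:
        with open(self.path, "rb") as f:
            f.seek(self.positions[offset])
            (length,) = self.HEADER.unpack(f.read(self.HEADER.size))
            return f.read(length)


def demo():
    # Write two records to a temporary log and read them back by offset.
    with tempfile.TemporaryDirectory() as d:
        log = FileCommitLog(os.path.join(d, "events.log"))
        offsets = [log.append(b"truck-17 lat=41.88"), log.append(b"truck-17 lat=41.89")]
        return offsets, log.read(0), log.read(1)
```

The length prefix lets payloads contain any bytes, including newlines. A real broker also persists its index and splits the log into segments, which this sketch leaves out.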

Kafka Cluster Diagram

In Kafka, a Topic is a user-defined category to which messages are published. Kafka Producers publish messages to one or more topics, and Consumers subscribe to topics and process the published messages. Finally, a Kafka cluster consists of one or more servers, called Brokers, which manage the persistence and replication of message data (i.e., the commit log).
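The division of roles among topics, producers, consumers and brokers can be sketched as a toy in-memory model. It assumes a single broker and one unpartitioned list per topic. `Broker`, `Producer` and `Consumer` here are hypothetical illustration classes, not the Kafka client API (the real Java clients are `KafkaProducer` and `KafkaConsumer`).

```python
from collections import defaultdict


class Broker:
    """Toy broker: stores one append-only list of messages per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def append(self, topic, message):
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1  # position of the message in the topic


class Producer:
    def __init__(self, broker):
        self.broker = broker

    def publish(self, topics, message):
        # A producer may publish a message to one or more topics.
        return {t: self.broker.append(t, message) for t in topics}


class Consumer:
    def __init__(self, broker, topics):
        self.broker = broker
        self.topics = list(topics)
        self.positions = {t: 0 for t in self.topics}  # next message to read per topic

    def poll(self):
        # Return every message published since the last poll on subscribed topics.
        out = []
        for t in self.topics:
            log = self.broker.topics[t]
            out.extend((t, m) for m in log[self.positions[t]:])
            self.positions[t] = len(log)
        return out
```

For example, a message published to `["sensors", "audit"]` is stored in both topics, but a consumer subscribed only to `"sensors"` sees it once and then sees nothing new until another message arrives.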

Kafka Partition Diagram

One of the keys to Kafka’s high performance is the simplicity of the brokers’ responsibilities. In Kafka, topics consist of one or more Partitions that are ordered, immutable sequences of messages. Since writes to a partition are sequential, this design greatly reduces the number of hard disk seeks (with their resulting latency).
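A partitioned topic can be sketched as several independent append-only lists, with a message key choosing the partition. This assumes key-based partitioning with `zlib.crc32` as a hypothetical hash. Kafka’s own client partitioner uses a different hash function. crc32 is used here because Python’s built-in `hash()` is randomized per process. The sketch shows why ordering holds within a partition: every message for a given key lands in the same partition and is only ever appended.

```python
import zlib


class PartitionedTopic:
    """Toy topic made of ordered, append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Hypothetical partitioner: a deterministic hash of the key, modulo partitions.
        return zlib.crc32(key) % len(self.partitions)

    def append(self, key: bytes, value: bytes):
        p = self.partition_for(key)
        log = self.partitions[p]
        log.append((key, value))  # sequential append; existing entries never change
        return p, len(log) - 1    # (partition, offset within that partition)
```

Messages with different keys may share a partition. Ordering is guaranteed only among messages in the same partition, not across the whole topic.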

Another factor contributing to Kafka’s performance and scalability is the fact that Kafka brokers are not responsible for keeping track of what messages have been consumed – that responsibility falls on the consumer. In traditional messaging systems such as JMS, the broker bore this responsibility, severely limiting the system’s ability to scale as the number of consumers increased.

Kafka Broker Diagram

For Kafka consumers, keeping track of which messages have been consumed (processed) is simply a matter of keeping track of an Offset, which is a sequential ID number that uniquely identifies a message within a partition. Because Kafka retains all messages on disk (for a configurable amount of time), consumers can rewind or skip to any point in a partition simply by supplying an offset value. Finally, this design eliminates the potential for back-pressure when consumers process messages at different rates.
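Consumer-side offset tracking can be sketched as follows. Reading never modifies the partition, so any number of consumers can move through the same data at their own pace, and rewinding is just a matter of resetting a number. `PartitionConsumer` is a hypothetical illustration class. The real Java client exposes a comparable `seek(TopicPartition, long)` method on `KafkaConsumer`. Clamping out-of-range offsets is a simplification chosen for this sketch only.

```python
class PartitionConsumer:
    """Toy consumer of one partition; the consumer, not the broker, owns the offset."""

    def __init__(self, partition_log):
        self.log = partition_log  # shared, retained messages; reading never changes them
        self.offset = 0           # offset of the next message to read

    def poll(self, max_messages=10):
        batch = self.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # Rewind or skip by supplying an offset (clamped to the log's bounds here).
        self.offset = max(0, min(offset, len(self.log)))
```

A fast consumer and a slow one can share the same partition without affecting each other. The fast consumer can also `seek` back to reprocess older messages, as long as they are still retained.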

Recent Kafka Releases

Hortonworks is working to make Kafka easier for enterprises to use. New focus areas include a Kafka Admin Panel for creating and deleting topics and managing user permissions, easier and safer distribution of security tokens, and support for multiple ways of publishing and consuming data via a Kafka REST server/API.

Apache Kafka version, enhancements, and corresponding HDP and HDF versions:

0.10.0.1 (HDP 2.5, HDF 2.0)
  • Message Timestamps
  • Automated Replica Leader Election
  • Rack Awareness
  • New Consumer APIs
  • Improved stability of Producer APIs
0.9.0.1 (HDP 2.4, HDF 1.2)
  • Wire encryption using SSL
  • SASL support
  • User-defined quotas
  • New Producer APIs

Latest Developments

  • Rack awareness for increased resilience and availability: replicas are isolated so that they are guaranteed to span multiple racks or availability zones.
  • Automated replica leader election, which detects an uneven distribution of leaders (some brokers serving more data than others) and adjusts it so that leaders are evenly distributed across the cluster.
  • Message timestamps: every message in Kafka now has a timestamp field that indicates the time at which the message was produced.
  • SASL improvements, including external authentication servers and support for multiple types of SASL authentication on one server
  • Ambari Views for visualization of Kafka operational metrics
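The goals of rack awareness and even leader distribution can be illustrated with a simplified replica-assignment sketch. It assumes racks with equal numbers of brokers and a replication factor no larger than the number of racks. It interleaves brokers from different racks, then assigns each partition a window of consecutive brokers from that list, treating the first replica as the leader. This is loosely modeled on the idea and is not Kafka’s exact assignment or leader-election algorithm. The function names and rack IDs are hypothetical.

```python
from itertools import zip_longest


def rack_alternating(brokers_by_rack):
    """Interleave broker ids so that neighbouring entries come from different racks."""
    racks = [sorted(ids) for _, ids in sorted(brokers_by_rack.items())]
    return [b for group in zip_longest(*racks) for b in group if b is not None]


def assign_replicas(brokers_by_rack, num_partitions, replication_factor):
    """Map each partition to a replica list; the first replica acts as the leader."""
    ordered = rack_alternating(brokers_by_rack)
    n = len(ordered)
    assignment = {}
    for p in range(num_partitions):
        # Starting each window one step further along rotates leadership across brokers.
        assignment[p] = [ordered[(p + i) % n] for i in range(replication_factor)]
    return assignment
```

With three racks of two brokers each and a replication factor of 3, every partition gets one replica per rack, and across six partitions each broker leads exactly one. With unequal rack sizes, this simple windowing no longer guarantees that every partition spans distinct racks.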

Kafka Security

  • Kafka security encompasses multiple needs: encrypting the data flowing through Kafka, preventing rogue agents from publishing data to Kafka, and managing access to specific topics at an individual or group level.
  • As a result, the latest updates in Kafka support wire encryption via SSL, Kerberos-based authentication, and granular authorization options via Apache Ranger or another pluggable authorization system.
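On the client side, SSL wire encryption combined with Kerberos authentication is typically expressed through Kafka client properties such as `security.protocol` and `sasl.mechanism`. The sketch below builds such a property set and renders it in `key=value` form. The truststore path, password and the `kafka` service name are assumptions for illustration. A Kerberos login (JAAS) configuration is also required and is not shown. Authorization through Apache Ranger is configured on the broker side, so it is outside this sketch.

```python
def secure_client_properties(truststore_path, truststore_password,
                             kerberos_service="kafka"):
    """Client settings for SSL-encrypted traffic plus Kerberos (GSSAPI) authentication."""
    return {
        "security.protocol": "SASL_SSL",  # SASL authentication over an SSL channel
        "sasl.mechanism": "GSSAPI",       # GSSAPI is the Kerberos SASL mechanism
        "sasl.kerberos.service.name": kerberos_service,  # assumed broker principal name
        "ssl.truststore.location": truststore_path,      # hypothetical path
        "ssl.truststore.password": truststore_password,
    }


def to_properties(props):
    """Render as sorted key=value lines; no escaping of special characters is applied."""
    return "\n".join(f"{k}={v}" for k, v in sorted(props.items())) + "\n"
```

For example, `to_properties(secure_client_properties("/etc/kafka/client.truststore.jks", "changeit"))` yields five lines that could be saved as a client configuration file. Values containing backslashes or other `.properties` special characters would need escaping first.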
