Kafka
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
Why use a Message Queue like Kafka
1. Asynchronous
The program can continue execution without waiting for I/O to complete, increasing throughput.
2. Decoupling
Decoupling means reducing the dependencies between different parts of the system so that each component can be developed, maintained, and evolved relatively independently. The goal is to avoid tight coupling between components, improving the system's flexibility, maintainability, and scalability.
3. Peak clipping
Peak clipping (peak shaving) means buffering bursts of user requests in the queue and letting downstream systems drain them at their own pace, filtering traffic layer by layer so that the number of requests that ultimately land on the database is as small as possible.
Basic Concepts
1. Client
Clients include producers and consumers.
2. Consumer Group
Each consumer can specify the consumer group it belongs to. A message is delivered to every consumer group that subscribes to its topic, but within a single consumer group each message is consumed only once, by one of the group's consumers. (A minimal consumer sketch follows this section.)
3. Server-side Broker
A Kafka Server is a broker.
4. Topic
A topic is a logical concept: it can be viewed as a category of business messages. Clients produce to or consume from the specific topics they are interested in.
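To make the consumer group and topic concepts concrete, here is a minimal sketch using the Kafka Java client. The broker address, the topic name orders, and the group id billing-group are illustrative assumptions, not values from this article.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "billing-group");           // consumers sharing this id split the partitions
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // topic of interest (example name)
            while (true) {
                // each record is consumed by only one consumer in this group,
                // but other groups subscribed to "orders" receive it as well
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```

Running two copies of this program with the same group.id splits the topic's partitions between them; running them with different group ids gives each its own full copy of the stream.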
Producer Workflow
1. Serialize
Serialization is a very important optimization point in high-concurrency scenarios. An efficient serialization implementation can greatly improve a distributed system's network transmission and disk persistence performance.
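As a sketch of where serialization plugs in, the Java producer is configured with a key serializer and a value serializer. StringSerializer is used below for simplicity; any org.apache.kafka.common.serialization.Serializer implementation can be substituted. The broker address and the topic/key/value are assumed examples.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SerializerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        // the producer serializes the key and the value with these classes before sending
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-1", "{\"amount\": 42}")); // example topic/key/value
        }
    }
}
```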
2. Partition
Based on the specified key, a partitioning algorithm assigns the message to a specific partition.
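The built-in behavior for keyed records is to hash the serialized key onto one of the topic's partitions. The sketch below implements the Partitioner interface with a similar key-hash scheme to show where such an algorithm lives; it is an illustration, not the client's actual default partitioner (which also handles keyless records more cleverly).

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

// Illustrative partitioner: keyed messages are hashed onto a partition,
// unkeyed messages fall back to partition 0.
public class KeyHashPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            return 0; // simplistic fallback for messages without a key
        }
        // murmur2 hash of the serialized key, mapped into [0, numPartitions)
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
```

It would be registered on the producer with props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, KeyHashPartitioner.class.getName()).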
3. Compression
In this step, the producer record is compressed before it’s written to the record accumulator.
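Compression is a per-producer setting. The fragment below assumes a props object like the one in the serializer sketch; lz4 is just one of the supported codecs (gzip, snappy, lz4, zstd).

```java
// fragment of a producer Properties object; "lz4" is one of the supported codecs
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
```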
4. Record accumulator
Messages to be sent by KafkaProducer are first cached in the RecordAccumulator and then sent to the Kafka Broker in batches.
5. Sender
The Sender is an independent thread in KafkaProducer used to send messages; each KafkaProducer object has one corresponding Sender thread, which is responsible for sending the messages in the RecordAccumulator to Kafka.
The Sender does not send everything cached in the RecordAccumulator at once; it takes out only part of the messages at a time, fetching the ProducerBatches whose cached content has reached the BATCH_SIZE_CONFIG size.
If there are relatively few messages and a ProducerBatch does not reach BATCH_SIZE_CONFIG for a long time, the Sender does not wait forever: after at most LINGER_MS_CONFIG, the messages in the ProducerBatch are sent anyway.
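The RecordAccumulator and Sender behavior described above maps onto a few producer settings. The fragment below assumes the same props object as earlier; the values are illustrative, not recommendations.

```java
// fragment of a producer Properties object; values are illustrative
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 32 * 1024 * 1024); // total RecordAccumulator buffer (bytes)
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16 * 1024);           // a ProducerBatch is ready once it reaches this size
props.put(ProducerConfig.LINGER_MS_CONFIG, 10);                   // max time the Sender waits for a batch to fill (ms)
```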
6. ACK
Determines how the producer confirms that a message has been successfully sent to the Broker.
Acks=0
The producer does not care whether the Broker writes the message to the Partition, it just sends the message and then forgets it. Highest throughput, but lowest data security.
Acks=all or -1
The producer waits until all replicas of the Partition on the Broker side (the Leader Partition and its corresponding Follower Partitions) have written the message before getting the return result. This method is the safest for data, but each send takes longer, so throughput is the lowest.
Acks = 1
A relatively neutral strategy: the result is returned to the producer as soon as the Leader Partition has written the message itself.
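The acks level is again a producer setting. A fragment choosing the safest level, with assumed values:

```java
// fragment of a producer Properties object
props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for the Leader Partition and all in-sync followers
props.put(ProducerConfig.RETRIES_CONFIG, 3);  // retry transient send failures (illustrative value)
```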
7. Producer message idempotence
Kafka ensures that no matter how many times the Producer sends the same data to the Broker, the Broker retains only one copy of the message.
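Idempotence is enabled with a single producer flag, after which the Broker discards duplicates caused by retries of the same batch. A minimal fragment (in recent client versions this is already the default):

```java
// fragment of a producer Properties object
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true); // broker de-duplicates retried sends from this producer
```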
8. Producer message transaction
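This heading is not elaborated in the original text. As a hedged sketch of what producer transactions look like in the Java client: the producer is given a transactional.id and wraps its sends in initTransactions/beginTransaction/commitTransaction, so that consumers reading with isolation.level=read_committed see either all or none of the messages. The topic names, transactional id, and broker address below are assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "order-writer-1");   // example transactional id

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                // both sends become visible to read_committed consumers only after commit
                producer.send(new ProducerRecord<>("orders", "order-1", "created"));
                producer.send(new ProducerRecord<>("order-audit", "order-1", "created"));
                producer.commitTransaction();
            } catch (Exception e) {
                // sketch-level error handling; fatal errors such as ProducerFencedException
                // would be handled differently in production
                producer.abortTransaction();
                throw e;
            }
        }
    }
}
```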
Cluster MetaData
The main metadata of the Kafka cluster is stored in ZooKeeper. Two elected roles are central to it: the Controller and the Leader.
Controller
Among the multiple Brokers, one Broker is elected to serve as the Controller. The Controller manages the partition and replica state of the entire cluster.
Leader
Each Partition under a Topic has multiple replicas spread across Brokers, and one of them is elected as the Leader Partition. The Leader Partition is responsible for data interaction with the client.
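To observe the elected Leader of each partition from the client side, the Java AdminClient can describe a topic. A sketch, with an assumed broker address and topic name; allTopicNames() is the accessor in newer client versions (older ones use all()).

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class LeaderInspectionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            Map<String, TopicDescription> topics =
                    admin.describeTopics(List.of("orders")).allTopicNames().get(); // example topic
            for (TopicPartitionInfo p : topics.get("orders").partitions()) {
                // the Leader Partition handles all client reads and writes for this partition
                System.out.printf("partition=%d leader=%s isr=%s%n",
                        p.partition(), p.leader(), p.isr());
            }
        }
    }
}
```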