

Showing posts with the label "programming"

Distributed transactions

Basic theory
The CAP theorem states that any distributed data store can provide only two of the following three guarantees:
- Consistency: every read receives the most recent write or an error.
- Availability: every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
Typical architecture of distributed systems
When a network partition happens, one of the following must be chosen:
- CP: cancel the operation, decreasing availability but ensuring consistency.
- AP: proceed with the operation, providing availability but risking inconsistency.
BASE (basically available, soft state, eventual consistency) is the practical application of the CAP theorem: under the premise that partitions and replicas exist, through certain …
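As a rough illustration of the CP-versus-AP choice described above, here is a minimal Java sketch; ReplicaStore, PartitionAwareReader, and the quorum/local read methods are hypothetical names invented for this example, not part of the original post.

```java
// Minimal sketch of the CP vs. AP decision during a partition.
// ReplicaStore and PartitionAwareReader are hypothetical names for illustration.
import java.util.Optional;

interface ReplicaStore {
    Optional<String> readLatestFromQuorum(String key); // empty if a quorum is unreachable
    String readLocal(String key);                      // possibly stale local copy
}

class PartitionAwareReader {
    private final ReplicaStore store;
    private final boolean preferConsistency; // true = CP behavior, false = AP behavior

    PartitionAwareReader(ReplicaStore store, boolean preferConsistency) {
        this.store = store;
        this.preferConsistency = preferConsistency;
    }

    String read(String key) {
        Optional<String> latest = store.readLatestFromQuorum(key);
        if (latest.isPresent()) {
            return latest.get();            // no partition observed: consistent read
        }
        if (preferConsistency) {
            // CP: refuse to answer rather than risk returning stale data.
            throw new IllegalStateException("partitioned: cannot guarantee consistency for " + key);
        }
        // AP: stay available and return the (possibly stale) local value.
        return store.readLocal(key);
    }
}
```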

Program design template

Designing programs is a common job for every engineer, so a template helps simplify the design of each program.
- Update history: let everyone know the latest version and update information.
- Background: why we need this program and the context in which it is being built.
- Targets:
  - Target functionality: the goal of the program, what functions to achieve, the main modules and submodules, and the relationships between these modules.
  - Target performance: specific benchmarks, such as QPS or latency in milliseconds, to evaluate the program.
  - Target architecture: stability, readability, maintainability, extendability, and so on.
- Overall design:
  - Design principles and thinking: explain how and why the program was designed this way.
  - Overall architecture: an overall architectural diagram.
  - Dependency: the module's dependencies on other modules or programs.
- Detailed design:
  - Program flow design: a program flow diagram.
  - API design: the details of the API and how it interacts with the frontend …

I/O system

I/O hardware
- Port: a connection point between an I/O device and the host (e.g. USB ports).
- Bus: a set of wires and a well-defined protocol that specifies the messages sent over the wires (e.g. the PCI bus).
- Controller: a collection of electronics that can operate a port, a bus, or a device; a controller may have its own processor, memory, etc. (e.g. a SCSI controller).
Basic I/O method (port-mapped I/O)
- Each I/O port (device) is identified by a unique port address.
- Each I/O port consists of four registers (1-4 bytes each):
  - Data-in register: read by the host to get input.
  - Data-out register: written by the host to send output.
  - Status register: read by the host to check I/O status.
  - Control register: written by the host to control the device.
- The program interacts with an I/O port through special I/O instructions (different from memory access); on x86 these are IN and OUT.
I/O methods
Categorized by how a device is addressed: port-mapped I/O uses different address spaces for memory …
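To make the register protocol above concrete, here is a minimal polling (programmed I/O) sketch; real port-mapped I/O needs the IN/OUT instructions and cannot be done from pure Java, so the IoPort interface, the busy bit, and the command value are hypothetical stand-ins.

```java
// Programmed I/O (polling) sketch with the four port registers modeled as a
// hypothetical Java interface; values and bit layout are invented for illustration.
interface IoPort {
    int readStatus();         // status register: read by the host
    int readDataIn();         // data-in register: read by the host to get input
    void writeDataOut(int b); // data-out register: written by the host to send output
    void writeControl(int c); // control register: written by the host to command the device
}

class PollingDriver {
    private static final int BUSY_BIT = 0x01;   // hypothetical busy flag in the status register

    // Busy-wait until the device is idle, issue a read command, then fetch a byte.
    int readByte(IoPort port) {
        while ((port.readStatus() & BUSY_BIT) != 0) {
            Thread.onSpinWait();                // spin until the busy bit clears
        }
        port.writeControl(0x01);                // hypothetical "read" command
        while ((port.readStatus() & BUSY_BIT) != 0) {
            Thread.onSpinWait();                // wait for the device to finish
        }
        return port.readDataIn();
    }
}
```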

Virtual memory

Virtual memory
Separation of user logical memory from physical memory:
- To run an extremely large process: the logical address space can be much larger than the physical address space.
- To increase CPU/resource utilization: a higher degree of multiprogramming.
- To simplify programming tasks: it frees the programmer from memory limitations.
- To run programs faster: less I/O is needed to load or swap processes.
Process & virtual memory
- Demand paging: only bring in the page containing the first instruction.
- Copy-on-write: the parent and child processes initially share the same frames, and a frame is copied only when a page is written; if either process modifies a frame, only then is that frame copied. COW allows efficient process creation (e.g. fork()). Free frames are allocated from a pool of zeroed-out frames (for security reasons, the content of each frame is erased to 0).
- Memory-mapped file: map a file …
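As a small, concrete example of the memory-mapped file idea, the sketch below uses the standard java.nio API; the file name and mapping size are made up for illustration.

```java
// Memory-mapped file example using java.nio; "demo.bin" is a placeholder name.
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedFileDemo {
    public static void main(String[] args) throws Exception {
        Path path = Path.of("demo.bin");   // hypothetical file name
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map the first 4 KiB of the file directly into the process's address space.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buffer.put(0, (byte) 42);      // writes go to memory; the OS pages them to disk
            System.out.println(buffer.get(0));
        }
    }
}
```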

Memory

Modern operating systems use paging to manage memory: physical memory is divided into fixed-sized blocks called frames, and logical memory into blocks of the same size called pages.
Memory-management unit (MMU)
The hardware device that maps virtual to physical addresses. The value in the relocation register is added to every address generated by a user process at the time it is sent to memory.
Logical vs. physical address
- Logical address: generated by the CPU, a.k.a. virtual address.
- Physical address: the address seen by the memory module.
- Compile-time and load-time address binding: logical address = physical address. Execution-time address binding: logical address != physical address.
- The user program deals with logical addresses; it never sees the real physical addresses.
Swap
A process can be swapped out of memory to a backing store and later brought back into memory to continue execution. Backing store: a chunk of the disk, separated from the file system …
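To show how page-based translation works arithmetically, here is a toy Java sketch; the page size, page-table contents, and addresses are invented values, and a real MMU does this in hardware with page faults, protection bits, and a TLB.

```java
// Toy sketch of paged address translation: page number -> frame number + offset.
public class PagingDemo {
    static final int PAGE_SIZE = 4096;                  // 4 KiB pages
    static final int[] pageTable = {7, 3, 0, 9};        // page number -> frame number (made up)

    static long toPhysical(long logicalAddress) {
        int page = (int) (logicalAddress / PAGE_SIZE);  // which page the address falls in
        long offset = logicalAddress % PAGE_SIZE;       // offset within that page
        int frame = pageTable[page];                    // look up the frame in the page table
        return (long) frame * PAGE_SIZE + offset;       // frame base + offset
    }

    public static void main(String[] args) {
        long logical = 2 * PAGE_SIZE + 123;             // an address on page 2
        System.out.println(toPhysical(logical));        // frame 0 -> 0 * 4096 + 123 = 123
    }
}
```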

Deadlocks

A deadlock is a set of blocked processes, each holding some resources and waiting to acquire a resource held by another process in the set.
Necessary conditions
All four conditions must hold for a deadlock to be possible:
- Mutual exclusion: only one process at a time can use a resource.
- Hold and wait: a process holding some resources is waiting for another resource.
- No preemption: a resource can only be released voluntarily by the process holding it.
- Circular wait: there exists a set {P0, P1, …, Pn} of waiting processes such that P0 -> P1 -> P2 -> … -> Pn -> P0.
Handling deadlocks
- Ensure the system will never enter a deadlock state: deadlock prevention (ensure that at least one of the four necessary conditions cannot hold) or deadlock avoidance (dynamically examine the resource-allocation state before each allocation).
- Allow the system to enter a deadlock state and then recover: deadlock detection and deadlock recovery.
- Ignore the problem and pretend that deadlocks never occur in the system; this is used by most operating s…
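The classic two-lock example below shows all four conditions arising in Java (run as-is it will usually hang); making both threads acquire the locks in the same global order breaks circular wait and prevents the deadlock.

```java
// Two threads acquire the same locks in opposite order: hold-and-wait plus
// circular wait under mutual exclusion, with no preemption -> deadlock.
public class DeadlockDemo {
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    public static void main(String[] args) {
        Thread t1 = new Thread(() -> {
            synchronized (lockA) {                 // holds A, then waits for B
                sleep(100);
                synchronized (lockB) { System.out.println("t1 done"); }
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (lockB) {                 // holds B, then waits for A -> circular wait
                sleep(100);
                synchronized (lockA) { System.out.println("t2 done"); }
            }
        });
        t1.start();
        t2.start();
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```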

Synchronization

Background
Concurrent access to shared data may result in data inconsistency. Maintaining data consistency requires a mechanism to ensure the orderly execution of cooperating processes.
Race condition
A race condition is the situation where several processes access and manipulate shared data concurrently, and the final value of the shared data depends on which process finishes last. To prevent a race condition, concurrent processes must be synchronized. On a single-processor machine we could disable interrupts or use non-preemptive CPU scheduling. This is commonly described as the critical-section problem.
Critical-section requirements
- Mutual exclusion: if process P is executing in its critical section, no other process can be executing in its critical section.
- Progress: if no process is executing in its critical section and some processes wish to enter theirs, those processes cannot be postponed indefinitely.
- Bounded waiting: a bound must exist on the number of times other processes are allowed to enter their c…
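Here is a minimal Java race-condition sketch: the plain counter++ is a non-atomic read-modify-write, so the final unsynchronized total is usually short of 200,000, while the AtomicInteger version is exact. The class and field names are made up for the example.

```java
// Two threads increment a shared counter; the unsynchronized field races.
import java.util.concurrent.atomic.AtomicInteger;

public class RaceDemo {
    static int unsafeCounter = 0;
    static final AtomicInteger safeCounter = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeCounter++;               // read-modify-write: not atomic
                safeCounter.incrementAndGet(); // atomic increment
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("unsafe: " + unsafeCounter + ", safe: " + safeCounter.get());
    }
}
```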

CPU scheduling

Basic concept
The idea of multiprogramming: keep several processes in memory; every time one process has to wait, another process takes over the use of the CPU.
CPU-I/O burst cycle: process execution consists of a cycle of CPU execution and I/O wait (i.e. CPU bursts and I/O bursts). Generally there are a large number of short CPU bursts and a small number of long CPU bursts: an I/O-bound program typically has many very short CPU bursts, while a CPU-bound program might have a few long CPU bursts.
CPU scheduler
Selects a process from the ready queue to execute (i.e. allocates the CPU to the selected process). CPU-scheduling decisions may take place when a process:
1. switches from the running to the waiting state;
2. switches from the running to the ready state;
3. switches from the waiting to the ready state;
4. terminates.
Non-preemptive scheduling: scheduling only under 1 and 4 (no choice in terms of scheduling); the process keeps the CPU until it terminates or switches to the waiting state. Preemptive scheduling: sche…
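As a tiny worked example of a scheduling metric, the sketch below computes the average waiting time for first-come, first-served scheduling; the burst lengths (24, 3, 3) are made-up textbook-style numbers and all processes are assumed to arrive at time 0.

```java
// Average waiting time under FCFS for a made-up set of CPU bursts.
public class FcfsDemo {
    public static void main(String[] args) {
        int[] bursts = {24, 3, 3};        // assumed burst lengths, arrival time 0 for all
        int elapsed = 0, totalWaiting = 0;
        for (int burst : bursts) {
            totalWaiting += elapsed;      // each process waits for everything scheduled before it
            elapsed += burst;
        }
        System.out.println("average waiting time = " + (double) totalWaiting / bursts.length); // 17.0
    }
}
```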

Thread and concurrency

Thread
A thread is a lightweight process, the basic unit of CPU utilization. All threads belonging to the same process share:
- the code section;
- the data section;
- OS resources (e.g. open files and signals).
Each thread has its own (in its thread control block):
- thread ID;
- program counter;
- register set;
- stack.
Benefits of multithreading: responsiveness, resource sharing, utilization of multiprocessor architectures, and economy.
User vs. kernel threads
- User threads: thread management is done by a user-level thread library (POSIX Pthreads, Win32 threads, Java threads). The thread library provides support for thread creation, scheduling, and deletion. Generally fast to create and manage. If the kernel is single-threaded, a blocked user thread blocks the entire process, even if other threads are ready to run.
- Kernel threads: supported by the kernel (OS) directly (Windows 2000/NT, Solaris, Linux, Tru64 UNIX). The kernel performs thread creation, scheduling, etc. Generally slower to create and manage. If a thread is blo…
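The short sketch below shows the sharing rules above in Java: the two threads share a heap object while each keeps its own stack-local variables; the class and thread names are invented for the example.

```java
// Threads of one process share heap data, but each has its own stack and PC.
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        List<String> shared = new CopyOnWriteArrayList<>();   // shared heap/data section
        Runnable task = () -> {
            String local = Thread.currentThread().getName();  // local variable: per-thread stack
            shared.add("hello from " + local);
        };
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start(); t2.start();
        t1.join(); t2.join();
        shared.forEach(System.out::println);
    }
}
```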

Process

A process is a program that is being executed in memory.
Process layout
A process includes:
- a code segment;
- a data section (global variables);
- a stack (temporary local variables and function calls);
- a heap (dynamically allocated variables or objects);
- the current activity (program counter, register contents);
- a set of associated resources.
Process status
Process control block
A process control block (PCB) is the kernel data structure that represents a process in an operating system. A context switch means the kernel saves the state of the old process and loads the saved state of the new process. The PCB contains: process state, program counter, CPU registers, CPU-scheduling information, memory-management information, I/O status information, and accounting information.
Process schedulers
- Short-term (CPU scheduler): selects which process should be executed next and allocates the CPU (ready state -> running state).
- Long-term (job scheduler): selects which processes should be loaded into memory and brought into the ready queue (new state -…
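As a small, concrete tie-in, the sketch below starts and waits on a child OS process from Java using the standard ProcessBuilder API; the command ("java", "-version") is only an example that should exist wherever a JDK is installed.

```java
// Create a child process and wait for it to terminate.
public class ProcessDemo {
    public static void main(String[] args) throws Exception {
        Process child = new ProcessBuilder("java", "-version")
                .inheritIO()                       // reuse the parent's stdin/stdout/stderr
                .start();                          // child goes through new -> ready -> running
        int exitCode = child.waitFor();            // parent blocks until the child terminates
        System.out.println("child pid=" + child.pid() + " exited with " + exitCode);
    }
}
```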

OS basic

OS architecture
The operating system architecture consists of three parts: user mode, kernel mode, and hardware. User mode is where applications execute the user's programs; kernel mode controls all the I/O devices and system stability.
Storage device hierarchy
System call
A system call is a kind of software interrupt and falls into six categories: process control, file management, device management, information maintenance, communication, and protection. System calls use three methods to pass parameters: registers, a table in memory, or pushing the parameters onto the stack.
A view of operating system services
Reference: https://www.amazon.com/-/zh_TW/Operating-System-Concepts-Abraham-Silberschatz/dp/1119800366/ref=sr_1_1?keywords=Operating-System-Concepts&qid=1669538704&s=books&sr=1-1

Internet Model

The Internet model comes in a 4-layer and a 7-layer version; the 7-layer (OSI) model is usually used for education and communication.
Physical
Specifies how data is processed into bits and physically transferred over cables: media type, connector type, signal strength, coding mechanism.
Data link
Provides the link for how data, packaged into frames, is communicated through hardware to transport across a medium: medium access, physical addresses, frames (e.g. the Ethernet frame).
Network
The network layer provides the functional and procedural means of transferring packets from one node to another connected in "different networks". It defines the format of the logical address and provides routing and best-path selection (e.g. the IPv4 packet).
ICMP
ICMP messages are typically used for diagnostic or control purposes, or generated in response to errors in IP operations (as specified in RFC 1122). ICMP errors are directed to the source IP address of the originating packet. It is used by network devices, like routers…
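A quick Java-side tie-in to ICMP: InetAddress.isReachable is a best-effort reachability test that may send an ICMP echo request when the JVM has the required privileges and otherwise falls back to a TCP connection attempt; the host name below is only a placeholder.

```java
// Best-effort reachability check (ICMP echo if permitted, otherwise TCP fallback).
import java.net.InetAddress;

public class PingDemo {
    public static void main(String[] args) throws Exception {
        InetAddress host = InetAddress.getByName("example.com"); // placeholder host name
        boolean reachable = host.isReachable(2000);               // 2-second timeout
        System.out.println(host.getHostAddress() + " reachable: " + reachable);
    }
}
```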

Consistency

Consistency is essential for transactions, both between distributed systems and within individual systems. Some related concepts are discussed here.
ACID
- Atomicity: guarantees that each transaction is treated as a single "unit", which either succeeds completely or fails completely.
- Consistency: ensures that a transaction can only bring the database from one consistent state to another, preserving database invariants: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof.
- Isolation: ensures that the concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
- Durability: guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure (e.g. a power outage or crash).
Isolation levels …
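A minimal JDBC sketch of these ideas, assuming some JDBC driver is on the classpath (the H2 in-memory URL and the SQL are placeholders): the isolation level is set explicitly, commit gives durability, and rollback restores atomicity on failure.

```java
// Explicit transaction with a chosen isolation level via plain JDBC.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class IsolationDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder JDBC URL; any database/driver on the classpath would do.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo")) {
            conn.setAutoCommit(false);                                            // explicit transaction boundaries
            conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ); // choose an isolation level
            try (Statement stmt = conn.createStatement()) {
                stmt.executeUpdate("CREATE TABLE account(id INT, balance INT)");  // placeholder SQL
                stmt.executeUpdate("INSERT INTO account VALUES (1, 100)");
                conn.commit();                                                    // durability: permanent after commit
            } catch (Exception e) {
                conn.rollback();                                                  // atomicity: undo everything on failure
                throw e;
            }
        }
    }
}
```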

Class Loading

Class life cycle. Class loader.
Reference: https://www.amazon.com/depth-understanding-Java-Virtual-Machine/dp/7111641248
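Since the excerpt only keeps the headings, here is a small sketch related to the class-loader part: it walks the loader hierarchy for an application class and shows that a core JDK class reports null, i.e. the bootstrap class loader.

```java
// Print the class-loader chain for an application class vs. a core JDK class.
public class ClassLoaderDemo {
    public static void main(String[] args) {
        ClassLoader loader = ClassLoaderDemo.class.getClassLoader();
        while (loader != null) {
            System.out.println(loader);        // application loader, then platform loader
            loader = loader.getParent();
        }
        System.out.println(String.class.getClassLoader()); // null: bootstrap class loader
    }
}
```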

JVM Tools

Java has many useful tools for debugging at runtime or analyzing an application after it has crashed.
Commands
- jps (JVM Process Status Tool): like the Linux command ps, but lists only the processes running on a JVM.
- jinfo (Configuration Info for Java): shows and adjusts JVM arguments.
- jmap (Memory Map for Java): taking a memory dump is the most common technique used at runtime; the dump file can be used to analyze the cause of a failure.
- jstack (Stack Trace for Java): checks the runtime status of threads.
GUI
- VisualVM: a tool that provides a visual interface for viewing detailed information about Java applications while they are running on a Java Virtual Machine (JVM).
- Memory Analyzer (MAT): the Eclipse Memory Analyzer is a fast and feature-rich Java heap analyzer that helps you find memory leaks and reduce memory consumption.
References:
https://www.amazon.com/depth-understanding-Java-Virtual-Machine/dp/7111641248
https://en.wikipedia.org/wiki/…
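Some of the same information can be read in-process through the standard java.lang.management API; the sketch below prints heap usage and a thread dump similar in spirit to what jmap and jstack report.

```java
// Inspect the running JVM via JMX MXBeans (standard JDK API).
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

public class JvmInfoDemo {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println("heap: " + memory.getHeapMemoryUsage());   // heap usage summary

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (var info : threads.dumpAllThreads(false, false)) {        // in-process thread dump
            System.out.println(info.getThreadName() + " -> " + info.getThreadState());
        }
    }
}
```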

Garbage Collectors

The JVM ships with various options for garbage collection to support a variety of deployment options. With this, we get flexibility in choosing which garbage collector to use for our application.
- Serial GC: uses a single thread for garbage collection. It is not that powerful compared to other GCs, but it is useful in uniprocessor or hardware-constrained environments.
- ParNew GC: the parallel version of the Serial collector. Before the emergence of G1, ParNew and CMS worked together, with ParNew responsible for garbage collection of the young generation.
- Parallel Scavenge GC: as a young-generation GC, compared to other GCs it focuses on throughput; another important feature is its auto-adjustment strategy for finding the best balance between throughput and pause time.
- Serial Old GC: the old-generation version of the Serial GC. It has two purposes: one is to work with Parallel Scavenge…
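To see which collectors a given JVM is actually running with (the names vary with the GC selected at startup), here is a short sketch using the standard GarbageCollectorMXBean API.

```java
// List the garbage collectors active in the current JVM.
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcInfoDemo {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```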

GC Basic Algorithm

Theory
Most GCs follow the theory of generational collection, which is based on the following two hypotheses:
- Weak generational hypothesis: most objects are short-lived.
- Strong generational hypothesis: the more collections an object survives, the harder it is for it to die.
Mark-sweep
After the marking phase has completed, all space occupied by unvisited objects is considered free and can be reused to allocate new objects. There may be plenty of free regions, but if no single region is large enough to accommodate an allocation, the allocation still fails.
Mark-copy
To avoid excessive fragmentation, it splits the space into two parts and copies the surviving objects into the empty part. The disadvantage is the need for one more memory region, which must be large enough to accommodate the surviving objects.
Mark-compact
Like mark-copy, but it does not require another space to copy into, at the cost of a more complex operation to move objects.
Reachability analysis
First, the GC defines some specific…
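A toy sketch of the mark phase of reachability analysis: starting from a set of GC roots, walk the object graph and mark everything reachable; anything left unmarked would be reclaimed by the sweep, copy, or compact phase. The Obj class is an invented stand-in for heap objects.

```java
// Mark phase: graph traversal from the GC roots.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class MarkPhaseDemo {
    static class Obj {
        final List<Obj> references = new ArrayList<>(); // outgoing references
        boolean marked = false;
    }

    static void mark(List<Obj> gcRoots) {
        Deque<Obj> worklist = new ArrayDeque<>(gcRoots);
        while (!worklist.isEmpty()) {
            Obj current = worklist.pop();
            if (current.marked) continue;       // already visited
            current.marked = true;              // reachable: keep it
            worklist.addAll(current.references);
        }
    }

    public static void main(String[] args) {
        Obj root = new Obj(), child = new Obj(), garbage = new Obj();
        root.references.add(child);             // child is reachable through the root
        mark(List.of(root));
        System.out.println(child.marked + " " + garbage.marked); // true false
    }
}
```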

Java memory model

The Java runtime memory model has multiple areas doing different jobs; this model lives in user space because it has no direct I/O control. It has these parts:
- Program counter register: used when Java switches between threads, with the same functionality as an OS program counter. A PC register is created every time a new thread is created; it holds a pointer to the current statement being executed in its thread. If the currently executing method is native, the value of the program counter register is undefined.
- Stack: private to each thread. It contains method-specific primitive values and references to objects in the heap that are referenced from those methods. Whenever we call a new method, a new frame is created on top of the stack containing the values specific to that method.
- Native method stacks: a JVM that supports native methods will have native method stacks; they are used for native methods and created per thread.
- Heap: the heap data area is used to store objects of classes and arrays. Heap…
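A quick illustration of the per-thread stack: each method call pushes a new frame, so unbounded recursion eventually exhausts the calling thread's stack. Catching StackOverflowError here is only for the demo; the exact depth depends on the JVM's stack size.

```java
// Each call adds a frame to this thread's private JVM stack until it overflows.
public class StackDepthDemo {
    static int depth = 0;

    static void recurse() {
        depth++;          // one more frame on the current thread's stack
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("stack exhausted after ~" + depth + " frames");
        }
    }
}
```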

Thread Status

The Java thread state is similar to the operating-system process state, but the Java state is managed inside the JVM.
Operating-system process states
- New: the process is being created.
- Ready: the process is waiting to be assigned to a processor.
- Running: instructions are being executed.
- Waiting: the process is waiting for some event to occur (such as an I/O completion or reception of a signal).
- Terminated: the process has finished execution.
Java thread states
- New: represents the first state of a thread, the NEW state.
- Runnable: represents the runnable state; the thread is waiting in the queue to run.
- Blocked: represents the blocked state; the thread is waiting to acquire a lock.
- Waiting: represents the waiting state; a thread goes to this state when it invokes Object.wait() or Thread.join() with no timeout. A thread in the waiting state is waiting for another thread to complete its task.
- Timed wai…
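A small sketch that observes these states via Thread.getState() as a worker thread moves through its life cycle; the sleep durations are arbitrary demo values.

```java
// Watch a thread go NEW -> TIMED_WAITING (inside sleep) -> TERMINATED.
public class ThreadStateDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try { Thread.sleep(500); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        System.out.println(worker.getState());  // NEW: not started yet
        worker.start();
        Thread.sleep(100);
        System.out.println(worker.getState());  // TIMED_WAITING: sleeping with a timeout
        worker.join();
        System.out.println(worker.getState());  // TERMINATED: run() has finished
    }
}
```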

Concurrency

Glossary
- Synchronous: when you start an operation, you must wait until it finishes before moving on to the next step.
- Asynchronous: when you start an operation, it returns immediately and runs in the background, so you can move on to the next step.
- Concurrency: the ability of different parts or units of a program, algorithm, or problem to be executed out of order or in partial order without affecting the final outcome.
- Parallelism: parallel computing is a type of computation in which many calculations or processes are carried out simultaneously.
- Blocking: a blocked process is one that is waiting for some event, such as a resource becoming available or the completion of an I/O operation.
- Non-blocking: an algorithm is called non-blocking if the failure or suspension of any thread cannot cause the failure or suspension of another thread.
Concurrency level
- Blocking: a thread is blocked and cannot execute until other threa…
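A minimal synchronous-versus-asynchronous sketch in Java using CompletableFuture: the synchronous call blocks the caller, while the async call returns immediately and the result is only awaited when needed. The slowTask method and its delay are made up for the example.

```java
// Synchronous call vs. asynchronous call with CompletableFuture.
import java.util.concurrent.CompletableFuture;

public class AsyncDemo {
    static String slowTask() {
        try { Thread.sleep(300); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "result";
    }

    public static void main(String[] args) {
        String sync = slowTask();                                    // synchronous: caller waits here
        System.out.println("sync: " + sync);

        CompletableFuture<String> future = CompletableFuture.supplyAsync(AsyncDemo::slowTask);
        System.out.println("doing other work while the task runs");  // asynchronous: not blocked
        System.out.println("async: " + future.join());               // block only when the result is needed
    }
}
```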

Annotation

Design pattern
Annotations are an implementation of the decorator design pattern. The decorator pattern is a design pattern that allows behavior to be added to an individual object dynamically, without affecting the behavior of other objects from the same class.
Annotations in Java
Annotations, a form of metadata, provide data about a program that is not part of the program itself. Annotations have no direct effect on the operation of the code they annotate. Annotations have a number of uses:
- Information for the compiler: annotations can be used by the compiler to detect errors or suppress warnings.
- Compile-time and deployment-time processing: software tools can process annotation information to generate code, XML files, and so forth.
- Runtime processing: some annotations are available to be examined at runtime.
Annotations in Spring
Spring uses annotations extensively as a core feature, especially Spring AOP, and almost all the functions we use in S…
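To make "runtime processing" concrete, here is a small sketch that declares a custom annotation and reads it back through reflection; the @Audited annotation and its value are invented for this example.

```java
// A custom runtime annotation examined via reflection.
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

@Retention(RetentionPolicy.RUNTIME)   // keep the annotation available at runtime
@Target(ElementType.METHOD)
@interface Audited {
    String value() default "";
}

public class AnnotationDemo {
    @Audited("payments")
    void transfer() { }

    public static void main(String[] args) throws Exception {
        Method method = AnnotationDemo.class.getDeclaredMethod("transfer");
        Audited audited = method.getAnnotation(Audited.class); // runtime processing via reflection
        System.out.println("audited tag: " + audited.value());
    }
}
```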