Chap10 | Mass-Storage Structure#

Overview#

Magnetic disks provide bulk of secondary storage of computer system.
hard disk is most popular.

Disk Structure#

可见 DB 笔记

Positioning time(random-access time) is time to move disk arm to desired sector
- seek time: move disk to the target cylinder
- rotational latency: for the target sector to rotate under the disk head
Performance
- transfer rate: theoretical 6 Gb/sec; effective (real) about 1Gb/sec
- seek time from 3ms to 12ms (9ms common for desktop drives)
- latency based on spindle speed: 1/rpm * 60
  - average latency = ½ latency
time计算考试会考

Example

主要慢在 seek 和 rotation 部分。

Disk Scheduling#

磁盘也需要调度，以减少 access time。

以前 OS 会负责调度，现在由 firmware 负责（disk controller）。

There are many disk scheduling algorithms

FCFS
SSTF
SCAN, C-SCAN
LOOK, C-LOOK

We use a request queue of cylinders “98, 183, 37, 122, 14, 124, 65, 67” ([0, 199]), and initial head position 53 as the example.
注意这里说的是 cyclinder 柱面（包含若干等距离的 track），只有不在同一柱面的才需要 seek，同一柱面不同 track 不需要动磁头，不同 sector 就靠转动。

FCFS#

First-come first-served, simplest scheduling algorithm.

Example

Total head movements of 640 cylinders

Advantage:
- Every request gets a fair chance
- No indefinite postponement
Disadvantages:
- Does not try to optimize seek time
- May not provide the best possible service

SSTF#

shortest seek time first.

类似 SJF，选择离现在 head position 最近的 request。但是 SSTF 不一定最好。可能发生 starvation。

Example

Advantages:
- Average Response Time decreases
- Throughput increases
Disadvantages
- Overhead to calculate seek time in advance
- Starvation may exist
- High variance : favors only some requests

SCAN#

也叫 elevator 电梯算法，先扫到一头，再往另一头扫，如果遇到 request 就读取。

Example

Advantages:
- Average Response Time
- High Throughput
- Low variance of response time
Disadvantages:
- Long waiting time for requests for locations just visited by disk arm
  
  如果刚好错过电梯，就要等电梯触底再上来，等待时间很长。

C-SCAN(Circular-SCAN) is designed to provides a more uniform wait time. 只做单向的扫，到达一端时立刻回到开头，随后从底往上扫，这样最多只用等待一圈。

LOOK#

在 SCAN / C-SCAN 的基础上，只走到一端最后一个任务（look 是否有请求）而不走到 disk 的头。

LOOK is a version of SCAN, C-LOOK is a version of C-SCAN.

Example

Advantage:
- Prevents the extra delay which occurred due to unnecessary traversal to the end of the disk.

Selecting Disk-Scheduling Algorithm

依赖于请求的模式，而请求本身又依赖于文件分配策略。文件系统如果注重空间局部性，能够提供很好的表现提升。

如果 I/O 比较少，FCFS 和 SSTF 即可。如果是大型的服务器或者数据库，一般使用 C-LOOK。如果是 SSD（不用 seek），一般使用 FCFS。

Nonvolatile Memory Devices#

If disk-drive like, then called solid-state disks (SSDs).
固态硬盘

Can be more reliable than HDDs，

与磁盘相比，寿命短，容量小，速度快（Bus 慢，需要直接连到 PCIE 上）。没有 arm 也不需要转，因此不存在 seek time 和 rotational latency.

Read and written in “page” increments (think sector) but can’t overwrite in place.
- Must first be erased, and erases happen in larger “block”
- Assume block size: 64k
- Can only be erased a limited number of times before worn out – ~ 100,000
  
  寿命短，里面是用门电路实现。充放电（擦写）门会被击穿，就无法区分 0/1 了。
- Life span measured in drive writes per day(DWPD)

Each cell has lifespan, so need to write equally to all cells.

Magnetic Tape#

磁带，容量很大，但是很慢。因为需要倒带，一般都做顺序访问而不是随机访问。现在主要用来做备份。

data stored on the tape are relatively permanent.

Disk Management#

使用这些介质（磁盘、固态硬盘、磁带）的时候，需要先格式化。

u盘使用FCFS

Physical formatting: divide disk into sectors for controller to read/write.
即把介质上分好不同的部分。

partition disk into groups of cylinders, each treated as a logical disk.
logical formatting partitions to make a file system on it.

Disk Attachment#

host-attached storage
- hard disk, RAID arrays, CD, DVD, tape...
  
  可以插到 I/O Bus 上。
network-attached storage
storage area network

RAID#

Disks are unreliable, slow, but cheap. Simple idea: let’s use redundancy to improve reliability and speed.

HDDs 越来越小和便宜，因此如果一个系统可以拥有大量磁盘，那么就能改善数据的读写速率（因为可以并行）和可靠性（使用冗余来降低出现错误的期望）。这样的磁盘组织技术称为 磁盘冗余阵列 (Redundant Arrays of Independent Disk, RAID) 技术。

Data Mirroring
- Keep the same data on multiple disks
Data Striping
- Keep data split across multiple disks to allow parallel reads
Error-Code Correcting (ECC) - Parity Bits
- Keep information from which to reconstruct lost bits due to a drive failing

这里 RAID 0 的技术主要是 Striping，RAID 1 的技术主要是 Mirroring，因此我们有时会说 5+1，5+0，表示几个技术结合。

RAID 0#

没有 redundancy，什么也不做，把数据分散在不同的磁盘。

Improves performance, but not reliability.

Example

Fixed strip size
5 files of various sizes
4 disks

RAID 1#

也被称为 mirroring，存在两个磁盘，一个是主磁盘，一个是备份磁盘。主磁盘写入数据后，备份磁盘也写入相同的数据。

Example

5 files of various sizes
4 disks

Reliability is ensured unless you have (extremely unlikely) simultaneous failures, performance can be boosted by reading from the disk with the fastest seek time.

但是浪费了一半的磁盘。

RAID 2#

stripes data at the bit-level; uses Hamming code for error correction (not used).

没有被实际应用，因为粒度太小，现在无法单独读出来一个比特，至少读出一个字节。

RAID 3#

Data is striped across multiple disks, with one dedicated parity disk that stores the parity information for all the data disks.

纠错码就在一个磁盘里。

RAID 4，5，6#

RAID 4: Basically like RAID 3, but interleaving it with strips (blocks)

用块来做 strip，纠错码单独存在一个盘里。这个纠错盘一直会被读写，很有可能先损坏。
RAID 5: Like RAID 4, but parity is spread all over the disks as opposed to having just one parity disk.

parity bit 被分散地存到了不同的磁盘里。相比于 RAID 4，每个盘的读写比较均衡。
RAID 6: extends RAID 5 by adding an additional parity block.

又加了一个 parity bit，也是分散存储。

RAID and File Systems#

RAID 只检测磁盘失效，并不知道对应的是哪个文件失效。

ZFS adds checksums to all FS data and metadata.
这样可以检验磁盘是否写错。

Takeaway#

Takeaway

Disk structure
Disk scheduling
- FCFS, SSTF, SCAN, C-SCAN, LOOK, C-LOOK
RAID 0-6

Last update: 2024年12月26日 19:07:16
Created: 2024年9月17日 14:52:21