Skip to content

Chap10 | Mass-Storage Structure#

Overview#

Magnetic disks provide bulk of secondary storage of computer system.
hard disk is most popular.

Disk Structure#

可见 DB 笔记

  • Positioning time(random-access time) is time to move disk arm to desired sector

    • seek time: move disk to the target cylinder
    • rotational latency: for the target sector to rotate under the disk head
  • Performance

    • transfer rate: theoretical 6 Gb/sec; effective (real) about 1Gb/sec
    • seek time from 3ms to 12ms (9ms common for desktop drives)
    • latency based on spindle speed: 1/rpm * 60
      • average latency = ½ latency

Example

主要慢在 seek 和 rotation 部分。

Disk Scheduling#

磁盘也需要调度,以减少 access time。

以前 OS 会负责调度,现在由 firmware 负责(disk controller)。

There are many disk scheduling algorithms

  • FCFS
  • SSTF
  • SCAN, C-SCAN
  • LOOK, C-LOOK

We use a request queue of cylinders “98, 183, 37, 122, 14, 124, 65, 67” ([0, 199]), and initial head position 53 as the example.
注意这里说的是 cyclinder 柱面(包含若干等距离的 track),只有不在同一柱面的才需要 seek,同一柱面不同 track 不需要动磁头,不同 sector 就靠转动。

FCFS#

First-come first-served, simplest scheduling algorithm.

Example

Total head movements of 640 cylinders

  • Advantage:
    • Every request gets a fair chance
    • No indefinite postponement
  • Disadvantages:
    • Does not try to optimize seek time
    • May not provide the best possible service

SSTF#

shortest seek time first.

类似 SJF,选择离现在 head position 最近的 request。但是 SSTF 不一定最好。可能发生 starvation。

Example

  • Advantages:

    • Average Response Time decreases
    • Throughput increases
  • Disadvantages

    • Overhead to calculate seek time in advance
    • Starvation may exist
    • High variance : favors only some requests

SCAN#

也叫 elevator 电梯算法,先扫到一头,再往另一头扫,如果遇到 request 就读取。

Example

  • Advantages:
    • Average Response Time
    • High Throughput
    • Low variance of response time
  • Disadvantages:
    • Long waiting time for requests for locations just visited by disk arm

      如果刚好错过电梯,就要等电梯触底再上来,等待时间很长。

C-SCAN(Circular-SCAN) is designed to provides a more uniform wait time. 只做单向的扫,到达一端时立刻回到开头,随后从底往上扫,这样最多只用等待一圈。

LOOK#

在 SCAN / C-SCAN 的基础上,只走到一端最后一个任务(look 是否有请求)而不走到 disk 的头。

LOOK is a version of SCAN, C-LOOK is a version of C-SCAN.

Example

  • Advantage:
    • Prevents the extra delay which occurred due to unnecessary traversal to the end of the disk.

Selecting Disk-Scheduling Algorithm

依赖于请求的模式,而请求本身又依赖于文件分配策略。文件系统如果注重空间局部性,能够提供很好的表现提升。

如果 I/O 比较少,FCFS 和 SSTF 即可。如果是大型的服务器或者数据库,一般使用 C-LOOK。如果是 SSD(不用 seek),一般使用 FCFS。

Nonvolatile Memory Devices#

If disk-drive like, then called solid-state disks (SSDs).
固态硬盘

Can be more reliable than HDDs,

与磁盘相比,寿命短,容量小,速度快(Bus 慢,需要直接连到 PCIE 上)。没有 arm 也不需要转,因此不存在 seek time 和 rotational latency.

  • Read and written in “page” increments (think sector) but can’t overwrite in place.

    • Must first be erased, and erases happen in larger “block”
    • Assume block size: 64k
    • Can only be erased a limited number of times before worn out – ~ 100,000

      寿命短,里面是用门电路实现。充放电(擦写)门会被击穿,就无法区分 0/1 了。

    • Life span measured in drive writes per day(DWPD)

Each cell has lifespan, so need to write equally to all cells.

Magnetic Tape#

磁带,容量很大,但是很慢。因为需要倒带,一般都做顺序访问而不是随机访问。现在主要用来做备份。

data stored on the tape are relatively permanent.

Disk Management#

使用这些介质(磁盘、固态硬盘、磁带)的时候,需要先格式化。

Physical formatting: divide disk into sectors for controller to read/write.
即把介质上分好不同的部分。

  • partition disk into groups of cylinders, each treated as a logical disk.
  • logical formatting partitions to make a file system on it.

Disk Attachment#

  • host-attached storage

    • hard disk, RAID arrays, CD, DVD, tape...

      可以插到 I/O Bus 上。

  • network-attached storage

  • storage area network

RAID#

Disks are unreliable, slow, but cheap. Simple idea: let’s use redundancy to improve reliability and speed.

HDDs 越来越小和便宜,因此如果一个系统可以拥有大量磁盘,那么就能改善数据的读写速率(因为可以并行)和可靠性(使用冗余来降低出现错误的期望)。这样的磁盘组织技术称为 磁盘冗余阵列 (Redundant Arrays of Independent Disk, RAID) 技术。

  • Data Mirroring

    • Keep the same data on multiple disks
  • Data Striping

    • Keep data split across multiple disks to allow parallel reads
  • Error-Code Correcting (ECC) - Parity Bits

    • Keep information from which to reconstruct lost bits due to a drive failing

这里 RAID 0 的技术主要是 Striping,RAID 1 的技术主要是 Mirroring,因此我们有时会说 5+1,5+0,表示几个技术结合。

RAID 0#

没有 redundancy,什么也不做,把数据分散在不同的磁盘。

Improves performance, but not reliability.

Example

  • Fixed strip size
  • 5 files of various sizes
  • 4 disks

RAID 1#

也被称为 mirroring,存在两个磁盘,一个是主磁盘,一个是备份磁盘。主磁盘写入数据后,备份磁盘也写入相同的数据。

Example

  • 5 files of various sizes
  • 4 disks

Reliability is ensured unless you have (extremely unlikely) simultaneous failures, performance can be boosted by reading from the disk with the fastest seek time.

但是浪费了一半的磁盘。

RAID 2#

stripes data at the bit-level; uses Hamming code for error correction (not used).

没有被实际应用,因为粒度太小,现在无法单独读出来一个比特,至少读出一个字节。

RAID 3#

Data is striped across multiple disks, with one dedicated parity disk that stores the parity information for all the data disks.

纠错码就在一个磁盘里。

RAID 4,5,6#

  • RAID 4: Basically like RAID 3, but interleaving it with strips (blocks)

    用块来做 strip,纠错码单独存在一个盘里。这个纠错盘一直会被读写,很有可能先损坏。

  • RAID 5: Like RAID 4, but parity is spread all over the disks as opposed to having just one parity disk.

    parity bit 被分散地存到了不同的磁盘里。相比于 RAID 4,每个盘的读写比较均衡。

  • RAID 6: extends RAID 5 by adding an additional parity block.

    又加了一个 parity bit,也是分散存储。

RAID and File Systems#

RAID 只检测磁盘失效,并不知道对应的是哪个文件失效。

ZFS adds checksums to all FS data and metadata.
这样可以检验磁盘是否写错。

Takeaway#

Takeaway

  • Disk structure
  • Disk scheduling
    • FCFS, SSTF, SCAN, C-SCAN, LOOK, C-LOOK
  • RAID 0-6

Last update: 2024年11月9日 17:03:32
Created: 2024年9月17日 14:52:21