Skip to content

Lecture 1 | Introduction#

Abstract
  • Database Systems
  • Database Applications
  • Purpose of Database Systems
  • View of Data
  • Data Models
  • Database Languages
  • Database Design
  • Database Engine
  • Database Users and Administrators
  • History of Database Systems

Database Systems#

DBMS (Database Management System)

Purpose of Database Systems#

Database systems offer solutions to all the below problems.

  • data redundancy (数据冗余) and inconsistency
  • data isolation (数据孤立,数据孤岛)
  • difficulty in accessing data
  • Integrity problems (完整性问题)

    • 完整的约束条件被藏在数据和代码中,而不是显式地声明。 e.g. "account balance \(\geq\) 1"
  • Atomicity problems (原子性问题)

    • Failures may leave database in an inconsistent state with partial updates carried out e.g. 从 A 账户转账到 B, 我们必须保证 A 转出 B 转入这两件事同时进行,不能被打断。
  • Concurrent access anomalies (并发访问异常)

    • Uncontrolled concurrent accesses can lead to inconsistencies
    Example

  • Security problems

    • Authentication (认证), Priviledge (权限), Audit (审计)

Characteristics of Databases#

  • data persistence (数据持久性)
  • convenience in accessing data (数据访问便利性)
  • data integrity(数据完整性)
  • concurrency control for multiple user (多用户并发控制)
  • failure recovery (故障恢复)
  • security control (安全控制)

View of Data#

Three-level abstraction of databases

  • physical level: 物理中如何保存、实现(如磁盘)
  • logical level: 数据库中存储什么数据?数据之间的关系?
  • view level: 不同人看到不同的他们所需要的数据

Advantages:

  • Hide the complexities
  • Enhance the adaptation to changes
    硬件环境变化 (physical level), 可以通过调整逻辑关系和映射来适应新的硬件环境。
    逻辑环境变化 (logic level), 可以通过 view 和 logic 的映射使得 view 尽量少变化。

Schema and Instance#

Similar to types and variables in programming languages.

  • Schema (模式)– the logical structure of the database (physical/logical)
  • Instance (实例) – the actual content of the database at a particular point in time

Data Independence#

  • Physical Data Independence (物理数据独立性) – the ability to modify the physical schema without changing the logical schema
  • Logical Data Independence (逻辑数据独立性) - the ability to modify the logical schema without changing the user view schema

映射修改,但不用修改 schema.

Data Models#

Data models is a collection of tools for describing data, data relationships, data semantics, data constraints.

  • Relational model(关系模型)
    (表格)数据库系统层面
  • Entity-Relationship(实体-联系) data model
    需求分析层面
  • Object-based data models
    • Object-oriented (面向对象数据模型)
    • Object-relational (对象-关系模型模型)
  • Semistructured data model (XML)(半结构化数据模型)
  • Other older models:
    • Network model (网状模型)
    • Hierarchical model(层次模型)

Relational Model

12 tuples, 4 attributes.

Datahase Languages#

Data Definition Language (DDL, 数据定义语言)#

Example

C 语言里的 struct 经过编译后全部变为了代码(有一张符号表)但数据库里不会。

Data dictionary contains metadata (元数据,i.e. data about data)

  • DDL compiler generates a set of table templates stored in a data dictionary (数据字典)
  • Database schema
  • Integrity constraints (完整性约束) Primary key (ID uniquely identifies instructors, 主键) Referential integrity (references constraint in SQL, 参照完整性) e.g. dept_name value in any instructor tuple must appear in department relation
  • Authorization (权限)

Data Manipulation Language (DML, 数据操作语言)#

Two classes of languages

  • Procedural (过程式)– user specifies what data is required and how to get those data e.g. C
  • Declarative (nonprocedural,陈述式,非过程式) – user specifies what data is required without specifying how to get those data e.g. SQL

SQL is the most widely used query language

SQL Query Language#

Database Access from Application Program#

数据库必须由过程式语言编写。

Application programs generally access databases through one of * Language extensions to allow embedded SQL e.g. 通过预处理器,将 select 语句识别出来,翻译成 C 语言的函数调用。 * API (Application program interface) e.g. ODBC/JDBC which allow SQL queries to be sent to a database.

Database Design#

  • Entity Relationship Model (实体-联系模型)
    一对一/一对多/多对一/多对多

图是中立的,可以用关系数据库/面向对象数据库实现

  • Normalization Theory (规范化理论)
    Formalize what designs are bad, and test for them

    Example

    这个表存在冗余, department 有重复,应该拆分为两个表(前四列和后三列)

Database Enginee#

  • The storage manager,
  • The query processor component,
  • The transaction management component.

Storage Manager#

为了数据持久化,放在硬盘里,但数据处理要进入内存,这之间存在数据鸿沟。
按块访问内存 (4k/16k), 缓冲管理。

  • File manager
  • Buffer manager
  • Authorization and integrity manager
  • Transaction manager

数据库里放的是

  • data files 真正的数据
  • data dictionary 放的是 metadata
  • statistical data 用于数据库的查询处理
  • indices

Query Processor#

The query processor components include:

  • DDL interpreter -- interprets DDL statements and records the definitions in the data dictionary.
  • DML compiler -- translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands.
    • The DML compiler performs query optimization (查询优化); that is, it picks the lowest cost evaluation plan from among the various alternatives.
      执行计划会根据统计数据的改变而改变。
  • Query evaluation engine -- executes low-level instructions generated by the DML compiler.
  • Parsing and translation
  • Optimization
  • Evaluation

Transaction Management (事务管理)#

银行转账,A 转账到 B, A 余额减掉 B 余额加上。 要有隔离性,延迟写回

  • Recover Manager
    ensures that the database remains in a consistent (correct) state despite system failures (e.g. power failures and operating system crashes) and transaction failures.
    日志,防止写回数据库时出现断电或者崩溃。
  • Concurrency-control manager
    controls the interaction among the concurrent transactions, to ensure the consistency of the database.

Database Users#

  • Database Administrator (DBA) 数据库管理员,要做定义模式,数据库调谐,故障时恢复(备份)等工作。
    老师:我不看好!

History#

  • 1973 Turing Award: Charles W. Bachman
    father of databases
  • 1981 Turing Award: Edgar F. Codd
  • 1998 Turing Award: Jim Gray
  • 2014 Turing Award: Michael Stonebraker

Last update: 2024年6月15日 15:57:56
Created: 2024年3月19日 11:22:29