Improve the database FilmDB and Course Materials - DSH-Project-II



[TOC]

Publication

Github

Tsawke's Blog

Luogu Blog

Tasks

1. Database Enhancement

2. Lecture Notes Review

3. Course Content Proposal

Preface

The provided filmdb.sql includes directives such as PRAGMA that are not compatible with PostgreSQL/openGauss, its country codes do not strictly follow ISO 3166, and, most significantly, it is out of date (data only through 2019). Overall, this report aims to refresh the database and keep it current.

Acknowledgement

I acknowledge the use of multiple AI tools during the preparation of this report (including ChatGPT 5.2 Thinking, Gemini 3 Pro, Gemini 3 Pro (High, Antigravity), Claude Opus 4.5 (Thinking, Antigravity), GPT-5.1-codex-max (Cursor), etc.) to support drafting, code assistance, and language refinement. All AI-generated content was subsequently reviewed, reorganized, edited, and cleaned manually to ensure accuracy, originality, and clarity. Where applicable, the outputs were further validated through practical experiments to confirm correctness and executability.

To ensure transparency and reproducibility, the core code has been fully disclosed, and execution screenshots are included in the report, so that all presented content remains meaningful, concise, and valuable to the reader. In addition, as the data is used solely for research and educational purposes, the data usage and attribution comply with the relevant requirements and guidelines of sources such as Kaggle and TMDB.

Environment

Tip: Tutorial for Database Setup.pdf describes how to install the databases in this environment.

1. Database Enhancement

1.1 Data Sources

This project requires data that is "as new as possible": ideally updated on the day the program runs, and extensible to run daily in the future. This report provides a combined plan that meets both goals.

1.1.1 Kaggle (Daily-updated TMDB dataset)

Use a daily refreshed Kaggle dataset based on TMDB as the main baseline source.

1.1.2 TMDB Official API

Kaggle is updated daily, but the program must reflect "today" whenever it runs. Therefore, the official TMDB API is used as an incremental "delta" source after the Kaggle import.

1.2 Workflow

1) Download the Kaggle daily snapshot (baseline).
2) Load it into staging tables.
3) Merge into FilmDB core tables (movies, countries, and any new enhancement tables).
4) Read the last sync timestamp from a local metadata table.
5) Use the TMDB Changes API to fetch changed IDs from the last sync date to today.
6) Pull details for each changed movie ID; apply inserts/updates in a controlled transaction.
7) Validate constraints and export SQL.

1.3 Design

1.3.1 Preserve the original tables, add new tables and columns

Keep the existing core tables (movies, countries, people, credits, etc.). Add new columns and new tables for external identifiers and richer metadata.

1.3.2 New high-utility tables

Enhance movies with new columns (nullable):

Add new tables:

1) movie_update_log(sync_id, run_ts, source, dataset_version, start_date, end_date, status, notes) to record each update run.
2) country_code_alias(alias_code, canonical_code) to map ISO codes into FilmDB’s internal codes (e.g., map es to sp to avoid breaking the existing countries table).
3) staging_movies_* tables (temporary or persistent) to allow safe bulk imports and deterministic merges.
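As an illustration of how the country_code_alias mapping could be applied during import, here is a minimal Python sketch (the function name is hypothetical; only the es → sp pair comes from the report):

```python
# Minimal sketch of applying country_code_alias during import.
# Only the es -> sp pair comes from the report; in the real pipeline
# this dict would be populated from the country_code_alias table.
COUNTRY_CODE_ALIAS = {
    "es": "sp",  # FilmDB historically stores Spain as 'sp'
}

def canonical_country(code: str) -> str:
    """Map an incoming ISO 3166 alpha-2 code to FilmDB's internal code;
    codes without an alias pass through unchanged."""
    code = code.strip().lower()
    return COUNTRY_CODE_ALIAS.get(code, code)
```

In the actual merge scripts this lookup is done in SQL via a join against country_code_alias; the Python form only shows the pass-through semantics.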

Principles:

1.4 Compatibility Strategy for PostgreSQL and openGauss

1.4.1 Approach

Produce two SQL outputs for maximum safety:

Even though openGauss can run in a PostgreSQL compatibility mode, differences still exist (notably in upsert syntax).

Generating two scripts avoids unexpected failures in real environments.

1.4.2 Avoid DB-specific UPSERT syntax by using staging + merge logic

Instead of relying on PostgreSQL ON CONFLICT or openGauss ON DUPLICATE KEY UPDATE, implement portable merge logic:

1) Insert Kaggle rows into staging tables without strict constraints.
2) Normalize and deduplicate in SQL queries that use NOT EXISTS joins.
3) Update existing rows with deterministic rules (only update if new data is non-null and different).

This approach works consistently across PostgreSQL and openGauss with minimal dialect differences.
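The steps above can be sketched end to end. SQLite (via Python's sqlite3) stands in for PostgreSQL/openGauss here, and the table shapes are simplified, but the EXISTS/NOT EXISTS pattern is the same portable SQL described above:

```python
import sqlite3

# Portable staging + merge, with SQLite standing in for
# PostgreSQL/openGauss (simplified table shapes; no ON CONFLICT or
# ON DUPLICATE KEY UPDATE needed).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE movies (tmdb_id INTEGER PRIMARY KEY, title TEXT, runtime INTEGER);
CREATE TABLE staging_movies (tmdb_id INTEGER, title TEXT, runtime INTEGER);
INSERT INTO movies VALUES (1, 'Old Title', NULL);
INSERT INTO staging_movies VALUES (1, 'Old Title', 120);  -- fills a NULL
INSERT INTO staging_movies VALUES (2, 'New Movie', 95);   -- brand-new row
""")

# Update existing rows only where staging data is non-null and different.
cur.execute("""
UPDATE movies
SET runtime = (SELECT s.runtime FROM staging_movies s WHERE s.tmdb_id = movies.tmdb_id)
WHERE EXISTS (
    SELECT 1 FROM staging_movies s
    WHERE s.tmdb_id = movies.tmdb_id
      AND s.runtime IS NOT NULL
      AND (movies.runtime IS NULL OR movies.runtime <> s.runtime)
)
""")

# Insert rows that are not already present (NOT EXISTS instead of upsert).
cur.execute("""
INSERT INTO movies (tmdb_id, title, runtime)
SELECT s.tmdb_id, s.title, s.runtime
FROM staging_movies s
WHERE NOT EXISTS (SELECT 1 FROM movies m WHERE m.tmdb_id = s.tmdb_id)
""")
conn.commit()

merged = cur.execute("SELECT tmdb_id, runtime FROM movies ORDER BY tmdb_id").fetchall()
```

The same two statements run unchanged on both target systems; only the surrounding DDL and load commands differ per dialect.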

1.5 Pipeline

This pipeline is designed to be reproducible and runnable on any day. Kaggle provides a daily-updated TMDb baseline snapshot, and the TMDb official API is used to catch up to “today” by applying incremental changes since the last sync date.

1.5.1 One-time setup

  1. Configure Kaggle API:

    • Save Kaggle token to ~/.kaggle/kaggle.json.

    • Typical permissions requirement: chmod 600 ~/.kaggle/kaggle.json.

    • Download the dataset snapshot with Kaggle CLI:

      • kaggle datasets download -d alanvourch/tmdb-movies-daily-updates -p ~/workspace/LargeFiles/ --unzip
  2. Confirm baseline CSV location:

    • The extracted Kaggle CSV is already available at ~/workspace/LargeFiles/TMDB_all_movies.csv.
  3. Configure TMDB API:

    • Obtain a TMDb API credential (API Read Access Token / Bearer token).

    • Export it for the pipeline runner:

      • export TMDB_BEARER_TOKEN="YOUR_TOKEN_HERE".

      • An API Read Access Token is required; apply for one at https://www.themoviedb.org/settings/api.

  4. Initialize DB schema enhancement (run once):

    • Apply schema extensions (new columns on movies, new tables pipeline_state, movie_update_log, country_code_alias, and staging tables).

    • This is executed by running 01_schema_enhancement.sql.

1.5.2 Daily baseline import from Kaggle

  1. Normalize Kaggle CSV to a stable staging format

    • Motivation: Kaggle datasets often contain malformed quoting that breaks standard parsers. The updated script uses Python's robust csv standard library to pre-parse data before vectorizing with Pandas, recovering ~99% of valid records (from <10k to >670k rows).
    • Input: ~/workspace/LargeFiles/TMDB_all_movies.csv
    • Output: ~/workspace/LargeFiles/tmdb_kaggle_normalized.csv
  2. Load Kaggle normalized CSV into staging

    • Truncate staging and load the normalized CSV into staging_movies_kaggle.

    • Use 02_load_kaggle_staging.sql.

    • PostgreSQL: Uses client-side \copy for simplicity.

    • openGauss: Uses robust container-file-copy (docker cp) plus server-side COPY to avoid shell buffer limits and encoding issues unique to gsql.

  3. Merge Kaggle staging into FilmDB core tables

    • Normalize and map country codes (e.g., ISO es mapped to FilmDB sp) via country_code_alias.

    • Insert new movies and update existing ones with deterministic rules:
      • Use tmdb_id as the primary identifier when available.
      • If tmdb_id is NULL, fall back to (title, country, year_released) as the legacy identity key.
    • Handle legacy-key collisions deterministically (e.g., if a new row’s legacy key matches an existing movie with a different tmdb_id, apply a traceable title suffix within length limits).

    • This is executed by 03_merge_kaggle_into_core.sql.

  4. Log baseline completion

    • Insert a success record into movie_update_log with source='kaggle' and the dataset version string if recorded.

    • This is included in 03_merge_kaggle_into_core.sql.
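The pre-parse step in item 1 can be sketched as follows; the helper name and three-column schema are illustrative, not the actual pipeline code. Python's csv module tolerates quoted fields with embedded commas and newlines, and rows with an unexpected column count are skipped instead of aborting the load:

```python
import csv
import io

def normalize_rows(raw_text, expected_cols):
    """Pre-parse CSV text with Python's csv module; quoted fields with
    embedded commas/newlines are handled, and rows with an unexpected
    column count are skipped rather than aborting the import."""
    reader = csv.reader(io.StringIO(raw_text))
    header = next(reader)
    kept = [row for row in reader if len(row) == expected_cols]
    return header, kept

# One well-formed row with an embedded newline, one short (malformed) row.
raw = 'id,title,overview\n1,"A, Movie","line one\nline two"\n2,BadRow\n3,Ok,Plain\n'
header, kept = normalize_rows(raw, expected_cols=3)
```

The production script applies the same filter before handing the surviving rows to Pandas for vectorized cleanup.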

1.5.3 Incremental catch-up using TMDB API

  1. Determine the incremental window

    • Read the last successful sync date from pipeline_state.tmdb_last_sync.

    • Use it as start_date; set end_date to the current date (run day).

    • This can be retrieved by running 04_read_last_sync.sql.

  2. Fetch changed movie IDs and write a TMDb delta CSV

    • Use the TMDb “Changes” endpoint to fetch changed movie IDs between start_date and end_date (chunk into <=14-day windows if needed).

    • Fetch details for each changed ID, normalize fields into a stable delta CSV.

    • Output: ~/workspace/LargeFiles/tmdb_delta_normalized.csv.

  3. Load delta CSV into staging and merge into core

    1. Load tmdb_delta_normalized.csv into staging_movies_tmdb_delta using 05_load_tmdb_delta_staging.sql.

    2. Merge delta staging into core:

      • Insert missing movies by tmdb_id.

      • Update existing movies by tmdb_id using “fill-if-null / improve-if-empty” rules.

      • Update pipeline_state.tmdb_last_sync to the run day for the next one-click incremental update.

      • This is executed by 06_merge_tmdb_delta_into_core.sql.

  4. Log delta completion

    • Insert a success record into movie_update_log with source='tmdb'.

    • This is included in 06_merge_tmdb_delta_into_core.sql.
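Splitting the incremental window into the <=14-day chunks mentioned in step 2 can be sketched like this (the helper name is hypothetical, and treating both endpoints as inclusive is an assumption here):

```python
from datetime import date, timedelta

def change_windows(start, end, max_days=14):
    """Split the inclusive [start, end] date range into consecutive
    windows of at most max_days days each, matching the <=14-day
    chunking described for the TMDB Changes endpoint."""
    windows = []
    cur = start
    while cur <= end:
        stop = min(cur + timedelta(days=max_days - 1), end)
        windows.append((cur, stop))
        cur = stop + timedelta(days=1)
    return windows

# A 35-day range yields three windows of 14, 14, and 7 days.
wins = change_windows(date(2024, 1, 1), date(2024, 2, 4))
```

Each window is then fetched page by page from the Changes endpoint, and the resulting movie IDs are deduplicated before detail retrieval.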

1.5.4 Data validation and integrity checks

After changes:

1) Verify referential integrity:

2) Verify uniqueness invariants:

3) Verify “newest year” signal:

4) Log validation results:

These checks are executed by 07_validate_and_log.sql, orchestrated by the wrapper script validate_all.sh, which runs the checks on both databases.

1.6 Implementation

To make the update pipeline convenient to run and fully reproducible, we prepared a set of shell scripts that automate the complete workflow. A wrapper, run_with_cpu_limit.sh, enforces CPU affinity (pinning to cores 0-1) to prevent resource contention during intensive parallel data loading.

1.6.1 Environment Preparation

Before running any scripts, ensure the following environment is ready:

1.6.2 Database Initialization

These scripts rebuild the filmdb database from scratch, load the original filmdb.sql, and then apply schema extensions via 01_schema_enhancement.sql. This ensures the database contains the required staging tables, metadata tables, and the extra columns needed for the new pipeline (e.g., tmdb_id).

./init_pg.sh
./init_og.sh

Note: This step will drop (delete) the existing filmdb database and recreate it. Do not run it if you need to preserve the current database state.

1.6.3 Daily Update (Kaggle Baseline + TMDB Delta + Validation)

These scripts execute the full daily ETL process:

  1. Kaggle baseline import: if ~/workspace/LargeFiles/TMDB_all_movies.csv exists, the script automatically normalizes it into a stable schema (fixed columns), loads it into staging tables, and then merges it into the FilmDB core tables.
  2. TMDB incremental sync (delta layer): the script reads the “last sync date” from the database metadata table, calls the TMDB API to fetch changes since that date up to “today”, downloads the latest details for the changed movies, loads the delta into staging, and merges it into the core tables.
  3. Validation: after the merges, the script runs integrity checks (e.g., country foreign-key validity and tmdb_id uniqueness) and records the run status in the update log.
export TMDB_BEARER_TOKEN="your_token_here"
./run_with_cpu_limit.sh ./daily_update_pg.sh
export TMDB_BEARER_TOKEN="your_token_here"
./run_with_cpu_limit.sh ./daily_update_og.sh

1.6.4 Export Deliverables (Final SQL Dumps)

After daily updates are complete, export the final deliverable SQL files. This produces two outputs to maximize compatibility and avoid environment-specific failures:

./run_with_cpu_limit.sh ./export_pg.sh

Output file: filmdb_pg.sql

./run_with_cpu_limit.sh ./export_og.sh

Output file: filmdb_og.sql

1.6.5 Troubleshooting

Ensure all scripts are executable:

chmod +x *.sh

Confirm the TMDB token is set:

echo $TMDB_BEARER_TOKEN

If the baseline CSV is missing, re-download the Kaggle dataset and confirm it is extracted to the expected location before running the daily update scripts.

1.7 Results

1.7.1 init_pg.sh

1.7.2 init_og.sh

1.7.3 daily_update_pg.sh

1.7.4 daily_update_og.sh

1.7.5 validate_all.sh

1.7.6 export_pg.sh

1.7.7 export_og.sh

1.7.8 Ultimate Files

2. Lecture Notes Review

2.1 CS213 Lecture-01 CS213-DB-01.pdf

2.1.1 Slides Updates

2.1.1.1 Page 3

Replace with

2.1.1.2 Page 14

Change

2.1.1.3 Pages 15–16

Fix

2.1.1.4 Page 25

Replace with

2.1.1.5 Page 26

Replace with

2.1.1.6 Pages 29–36

Add one slide after Page 36

2.1.1.7 Pages 38–40

Replace with two concepts

2.1.1.8 Page 41

Replace with

2.1.1.9 Pages 61–64

Change

2.1.1.10 Page 74

Replace definitions with

Link: OpenTextBC normalization

2.1.1.11 Pages 76–83

Add one slide after Page 83

2.1.2 openGauss insertion

  1. Insert after Page 11 (DBMS introduction) Reason: The lecture introduces “DBMS” but gives no modern concrete system example; openGauss provides a locally relevant, PostgreSQL-like relational DBMS for later demos and labs. Link: openGauss overview

  2. Insert after Page 16 (history timeline ends at 2010s) Reason: Timeline stops at 2010s; add 2020s trends and introduce openGauss (2020–) to keep the “history” section current. Link: Huawei announcement

  3. Insert after Page 25 (row order) Reason: Students often confuse relational theory with SQL behavior; add a short openGauss query demo to reinforce “ORDER BY is required for guaranteed order”. Demo snippet for the new slide:

    SELECT title, year FROM movies;
    SELECT title, year FROM movies ORDER BY year;

    Link: openGauss ORDER BY

    Note: openGauss may run in different compatibility modes; in PostgreSQL-like mode the behavior matches PostgreSQL, but you should still treat order as unspecified without ORDER BY.

  4. Insert after Page 26 (duplicates) Reason: The slide states duplicates are forbidden, but SQL query results can contain duplicates; add openGauss demo to show DISTINCT and mention constraints (PRIMARY KEY, UNIQUE). Demo snippet for the new slide:

    INSERT INTO directors(firstname, surname) VALUES ('Chris','Nolan');
    INSERT INTO directors(firstname, surname) VALUES ('Chris','Nolan');
    
    SELECT firstname, surname FROM directors;
    SELECT DISTINCT firstname, surname FROM directors;

    Link: openGauss DISTINCT

  5. Insert after Page 36 (keys) Reason: Keys are introduced conceptually; add one openGauss DDL slide mapping keys to real constraints, so students see “concept → SQL”. Demo snippet for the new slide:

    CREATE TABLE directors (
       director_id SERIAL PRIMARY KEY,
       firstname   TEXT NOT NULL,
       surname     TEXT NOT NULL,
       CONSTRAINT uq_director_name UNIQUE(firstname, surname)
    );
  6. Insert after Page 83 (ER modeling) Reason: ER is taught, but students need the bridge to SQL; add an openGauss “ER → DDL” example, especially M:N junction tables. Demo snippet for the new slide:

    CREATE TABLE movie_actor (
       movie_id INT NOT NULL REFERENCES movies(movie_id),
       actor_id INT NOT NULL REFERENCES actors(actor_id),
       PRIMARY KEY (movie_id, actor_id)
    );

2.1.3 References

2.2 CS213 Lecture-02 CS213-DB-02.pdf

2.2.1 Slides Updates

2.2.1.1 Page 9

Fix

2.2.1.2 Page 17

Add after the “SQL standard exists” point

2.2.1.3 Pages 24–25

Replace with

2.2.1.4 Page 28

Replace with

2.2.1.5 Pages 30–31

Fix

Replace the date-time bullets with

2.2.1.6 Page 32

Replace the “bad CREATE TABLE” example with a clean version

CREATE TABLE people (
    peopleid    SERIAL PRIMARY KEY,
    first_name  VARCHAR(30),
    surname     VARCHAR(30) NOT NULL,
    born        INT NOT NULL,
    died        INT,
    CONSTRAINT ck_life CHECK (died IS NULL OR died >= born)
);
2.2.1.7 Page 36

Add

2.2.1.8 Page 46

Add

2.2.1.9 Page 49

Update

2.2.1.10 Pages 50–53

Add after the foreign key definition

2.2.1.11 Page 60

Replace with

2.2.1.12 Page 63

Add

2.2.1.13 Pages 65–66

Replace with

2.2.2 openGauss insertion

  1. Insert after Page 17 (SQL standard vs dialects) Reason: Students need one concrete dialect target; openGauss is a PostgreSQL-like dialect and fits the lecture’s “dialects” message. Link: openGauss overview

  2. Insert after Pages 24–25 (identifiers and naming) Reason: Quoting and case rules vary by product; add openGauss guidance to avoid teaching “quoted identifiers everywhere” habits. Suggested note: “Prefer lowercase snake_case; avoid quoted names unless required.”

  3. Insert after Page 28 (NULL vs empty string) Reason: Oracle’s empty-string-as-NULL is a common confusion point; add a one-line openGauss contrast so students don’t generalize Oracle behavior.

  4. Insert after Pages 30–31 (date/time types) Reason: The lecture mentions DATETIME vs TIMESTAMP; add an openGauss mapping slide so students know what to use in openGauss.

  5. Insert after Pages 50–53 (constraints and foreign keys) Reason: This is where students decide “do I actually enforce rules?”; add openGauss examples to show CHECK/UNIQUE/FK are real and enforceable. Demo snippet for the new slide:

    CREATE TABLE countries (
       country_code CHAR(2) PRIMARY KEY,
       country_name TEXT NOT NULL UNIQUE
    );
    
    CREATE TABLE movies (
       movieid        INT PRIMARY KEY,
       title          TEXT NOT NULL,
       country        CHAR(2) NOT NULL REFERENCES countries(country_code) ON DELETE RESTRICT,
       year_released  INT NOT NULL CHECK (year_released >= 1895),
       UNIQUE (title, country, year_released)
    );
  6. Insert after Pages 65–66 (INSERT, quoting, dates) Reason: This is where students write lots of literals; add openGauss “safe quoting + ISO date” examples to reduce beginner errors. Demo snippet for the new slide:

    INSERT INTO movies(movieid, title, country, year_released)
    VALUES (1, 'L''Avventura', 'it', 1960);
    
    SELECT CURRENT_DATE, CURRENT_TIMESTAMP;
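For item 3, a one-line demonstration of the PostgreSQL-like behavior (SQLite, via Python's sqlite3, stands in for openGauss here; Oracle would instead fold '' into NULL):

```python
import sqlite3

# '' and NULL are distinct in PostgreSQL-like systems (openGauss
# included); SQLite follows the same rule and stands in for them here.
conn = sqlite3.connect(":memory:")
empty_is_null, empty_length = conn.execute(
    "SELECT '' IS NULL, length('')"
).fetchone()
# empty_is_null is 0 (false): the empty string is NOT NULL, and its
# length is 0 rather than NULL.
```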

2.2.3 References

2.3 CS213 Lecture-03 CS213-DB-03.pdf

2.3.1 Slides Updates

2.3.1.1 Page 9

Replace with

2.3.1.2 Page 11

Fix

Replace the derived-table example with a clean version

SELECT *
FROM (
    SELECT *
    FROM movies
    WHERE country = 'us'
) AS us_movies
WHERE year_released BETWEEN 1940 AND 1949;
2.3.1.3 Page 13

Fix

Add

2.3.1.4 Page 14

Add

2.3.1.5 Page 15

Add

2.3.1.6 Page 23

Add

2.3.1.7 Page 26

Fix

Replace with

2.3.1.8 Page 27

Fix

Replace with

2.3.1.9 Pages 29–31

Replace with

2.3.1.10 Pages 33–36

Replace with

2.3.1.11 Page 55

Replace with

2.3.1.12 Page 57

Fix the concatenation examples

2.3.1.13 Page 58

Replace with

SELECT
    title || ' was released in ' || CAST(year_released AS varchar) AS movie_release
FROM movies
WHERE country = 'us';
2.3.1.14 Page 65

Fix

2.3.1.15 Pages 72–73 and 78–79

Replace with

2.3.2 openGauss insertion

  1. Insert after Page 9 (string vs identifier quoting) Reason: Students copy quoting habits early; openGauss follows PostgreSQL-style quoting (single quotes for strings, double quotes for identifiers). Link: gsql Tool Reference

  2. Insert after Pages 26–27 (case-insensitive matching, performance) Reason: The lecture warns that upper(title) in WHERE is slow; openGauss provides ILIKE as the explicit solution. Also, openGauss Dolphin compatibility can change LIKE behavior, so teach ILIKE to be unambiguous. Demo snippet:

    SELECT *
    FROM movies
    WHERE title ILIKE '%a%';

    Link: Pattern Matching Operators
    Link: Dolphin Character Processing Functions and Operators

  3. Insert after Pages 29–31 (ISO dates + parsing) Reason: Date input defaults vary by DBMS; openGauss students should learn ISO literals and explicit parsing. Demo snippet:

    SELECT DATE '2018-03-12';
    SELECT to_date('12 March, 2018', 'DD Month, YYYY');

    Link: Date and Time Functions and Operators

  4. Insert after Pages 33–36 (timestamp “whole-day” filtering) Reason: This is a frequent real-world bug; add one openGauss example showing the half-open interval pattern. Demo snippet:

    SELECT *
    FROM issues
    WHERE issued >= DATE '2021-03-02'
     AND issued <  DATE '2021-03-03';

    Link: Date and Time Functions and Operators

  5. Insert after Page 55 (how to inspect tables in openGauss) Reason: Students need an actionable “inspect schema” workflow for the course DBMS. Demo snippet:

    -- in gsql
    \d movies
    \dt
    
    -- portable SQL alternative
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_name = 'movies'
    ORDER BY ordinal_position;

    Link: gsql Tool Reference
    Link: Meta-Command Reference

  6. Insert after Pages 78–79 (CASE for NULL labeling) Reason: Connect CASE to a simple reporting task and reinforce correct NULL handling in openGauss. Demo snippet:

    SELECT surname,
          CASE
              WHEN died IS NULL THEN 'Alive'
              ELSE 'Deceased'
          END AS status
    FROM people;

    Link: Expressions

2.3.3 References

2.4 CS213 Lecture-04 CS213-DB-04.pdf

2.4.1 Slides Updates

2.4.1.1 Page 3

Replace with

2.4.1.2 Page 7

Fix

Add

2.4.1.3 Pages 14–15

Replace with

2.4.1.4 Page 16

Update

2.4.1.5 Page 19

Add

2.4.1.6 Page 21

Replace with

2.4.1.7 Page 22

Add

2.4.1.8 Pages 23–27

Replace with

2.4.1.9 Pages 33 and 67

Fix

Replace with a complete version

SELECT COUNT(*) AS number_of_acting_directors
FROM (
    SELECT peopleid
    FROM (
        SELECT DISTINCT peopleid, credited_as
        FROM credits
        WHERE credited_as IN ('A', 'D')
    ) AS t
    GROUP BY peopleid
    HAVING COUNT(*) = 2
) AS acting_directors;
2.4.1.10 Page 45

Fix

2.4.1.11 Page 43

Replace with

2.4.1.12 Page 49

Update

2.4.1.13 Page 53

Replace with

2.4.1.14 Pages 68–69

Replace with

2.4.2 openGauss insertion

  1. Insert after Page 6 (DISTINCT basics) Reason: The lecture introduces DISTINCT but lacks a concrete course DBMS demo; add an openGauss runnable example and reinforce “row-level distinctness”. Demo snippet:

    SELECT DISTINCT country
    FROM movies
    WHERE year_released = 2000;
    
    SELECT DISTINCT country, year_released
    FROM movies
    WHERE year_released IN (2000, 2001);

    Link: openGauss DISTINCT

  2. Insert after Page 13 (GROUP BY syntax rule) Reason: Students commonly select non-aggregated columns that are not in GROUP BY; add an openGauss example showing the correct pattern. Demo snippet:

    SELECT country,
          COUNT(*) AS number_of_movies
    FROM movies
    GROUP BY country;

    Link: openGauss GROUP BY Clause

  3. Insert after Page 19 (HAVING vs WHERE) Reason: The lecture introduces HAVING, but students confuse it with WHERE; add a short openGauss “before vs after grouping” demo. Demo snippet:

    SELECT country,
          MIN(year_released) AS oldest_movie
    FROM movies
    GROUP BY country
    HAVING MIN(year_released) < 1940;

    Link: openGauss HAVING Clause

  4. Insert after Page 22 (optimizer visibility) Reason: The slides mention optimizers differ; add one openGauss tool students can use to observe plans and costs. Demo snippet:

    EXPLAIN
    SELECT country, COUNT(*)
    FROM movies
    GROUP BY country;

    Link: openGauss EXPLAIN

  5. Insert after Page 27 (COUNT differences with NULLs) Reason: The lecture explains the concept, but a small openGauss demo makes it stick and prevents common mistakes in analytics queries. Demo snippet:

    SELECT COUNT(*) AS total_rows,
          COUNT(died) AS non_null_died
    FROM people;

    Link: openGauss Aggregate Functions

  6. Insert after Page 54 (JOIN with aliases as the default style) Reason: The lecture shows multiple syntaxes (NATURAL/USING/ON); standardize on the safest teaching pattern for openGauss labs. Demo snippet:

    SELECT DISTINCT p.first_name, p.surname
    FROM people AS p
    JOIN credits AS c
       ON c.peopleid = p.peopleid
    WHERE c.credited_as = 'D';

    Link: openGauss JOIN Clause

  7. Insert after Page 69 (avoid accidental Cartesian products) Reason: The lecture warns about forgetting join predicates; add an openGauss-specific habit: always write explicit joins, and use CROSS JOIN only intentionally. Demo snippet:

    SELECT COUNT(*)
    FROM movies m
    CROSS JOIN countries c;

    Link: openGauss JOIN Clause

2.4.3 References

2.5 CS213 Lecture-05 CS213-DB-05.pdf

2.5.1 Slides Updates

2.5.1.1 Page 10

Replace with

2.5.1.2 Page 16

Replace with

2.5.1.3 Page 18

Replace with

2.5.1.4 Pages 21–27

Add

2.5.1.5 Page 29

Replace with a simpler “correct” solution

    SELECT
        m.year_released,
        m.title,
        p.first_name,
        p.surname
    FROM movies m
    LEFT OUTER JOIN credits c
        ON c.movieid = m.movieid
       AND c.credited_as = 'D'
    LEFT OUTER JOIN people p
        ON p.peopleid = c.peopleid
    WHERE m.country = 'gb';
2.5.1.6 Page 32

Replace with

2.5.1.7 Pages 35–36

Add

2.5.1.8 Pages 38–41

Replace the placeholder subquery with a complete version

    SELECT
        x.movieid,
        SUM(x.view_count) AS view_count
    FROM (
        SELECT
            'last year' AS period,
            movieid,
            SUM(view_count) AS view_count
        FROM last_year_data
        GROUP BY movieid
        UNION ALL
        SELECT
            'this year' AS period,
            movieid,
            SUM(view_count) AS view_count
        FROM current_year_data
        GROUP BY movieid
    ) x
    GROUP BY x.movieid;
2.5.1.9 Page 47

Fix the broken INTERSECT example

    SELECT country_code
    FROM countries
    INTERSECT
    SELECT DISTINCT country
    FROM movies;
2.5.1.10 Pages 44 and 50

Add

2.5.1.11 Page 66

Replace with

2.5.1.12 Pages 76–79

Add

2.5.1.13 Page 81

Replace with

2.5.1.14 Page 85

Add

2.5.2 openGauss insertion

  1. Insert after Page 10 (outer join types in openGauss) Reason: The slide claims only LEFT OUTER JOIN matters; openGauss supports LEFT/RIGHT/FULL, so students should learn the full vocabulary and the “rewrite RIGHT as LEFT” habit. Link: JOIN (CROSS/INNER/LEFT/RIGHT/FULL)

  2. Insert after Page 16 (COALESCE in openGauss) Reason: This is the practical “show 0 instead of NULL” moment; add the official openGauss function reference to match what students will run in labs. Demo snippet:

       SELECT
           c.country_name,
           COALESCE(x.number_of_movies, 0) AS number_of_movies
       FROM countries c
       LEFT OUTER JOIN (
           SELECT
               country AS country_code,
               COUNT(*) AS number_of_movies
           FROM movies
           GROUP BY country
       ) x
           ON x.country_code = c.country_code;

    Link: COALESCE

  3. Insert after Page 27 (outer join filter placement in openGauss) Reason: The “outer join killer” is a frequent beginner bug; show the correct openGauss pattern: move the filter into ON. Demo snippet:

       SELECT
           m.title,
           p.surname
       FROM movies m
       LEFT OUTER JOIN credits c
           ON c.movieid = m.movieid
          AND c.credited_as = 'D'
       LEFT OUTER JOIN people p
           ON p.peopleid = c.peopleid
       WHERE m.country = 'gb';

    Link: SELECT (JOIN nesting rules)

  4. Insert after Page 36 (UNION vs UNION ALL in openGauss) Reason: Students often default to UNION and accidentally lose rows; openGauss supports UNION ALL and documents the required column/type matching. Link: UNION Clause

  5. Insert after Page 51 (INTERSECT/EXCEPT naming and use in openGauss) Reason: The slides mention EXCEPT / MINUS; openGauss uses EXCEPT and supports set-operation type resolution similarly to UNION. Link: UNION, CASE, and Related Constructs

  6. Insert after Page 79 (safe anti-join: NOT EXISTS in openGauss) Reason: NOT IN + NULL is a real production pitfall; openGauss documents EXISTS/NOT EXISTS directly. Demo snippet:

       SELECT p.*
       FROM people p
       WHERE NOT EXISTS (
           SELECT 1
           FROM people p2
           WHERE p2.born < 1950
             AND p2.first_name = p.first_name
       );

    Link: Subquery Expressions (EXISTS/NOT EXISTS)
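Item 4 mentions UNION vs UNION ALL without a runnable demo; here is a minimal sketch of the row-count difference (SQLite via Python's sqlite3 stands in for openGauss; the semantics shown are standard SQL):

```python
import sqlite3

# UNION removes duplicate rows; UNION ALL keeps every row.
conn = sqlite3.connect(":memory:")
union_rows = conn.execute(
    "SELECT 1 AS v UNION SELECT 1 UNION SELECT 2"
).fetchall()
union_all_rows = conn.execute(
    "SELECT 1 AS v UNION ALL SELECT 1 UNION ALL SELECT 2"
).fetchall()
# union_rows has 2 rows; union_all_rows has 3.
```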

2.5.3 References

2.6 CS213 Lecture-06 CS213-DB-06.pdf

2.6.1 Slides Updates

2.6.1.1 Pages 3–9

Add

2.6.1.2 Page 11

Replace with

2.6.1.3 Page 14

Replace with

2.6.1.4 Page 15

Add

Example to add:

SELECT
    to_char(a_date_column, 'MM/DD/YYYY') AS event_date,
    other_cols
FROM some_table
ORDER BY a_date_column;
2.6.1.5 Pages 17–18

Replace with

ORDER BY
    CASE credited_as
        WHEN 'D' THEN 1
        WHEN 'P' THEN 2
        WHEN 'A' THEN 3
        ELSE 99
    END,
    postid;
2.6.1.6 Pages 22–26

Add

2.6.1.7 Pages 33–35

Replace with

ORDER BY concat(coalesce(path, ''), lpad(postid::text, 9, '0'), '.');

Add

2.6.1.8 Page 37

Update

2.6.1.9 Pages 52–53

Fix (incomplete queries): Replace with a complete version

SELECT
    country_name,
    cnt AS number_of_movies,
    round(100.0 * cnt / sum(cnt) OVER (), 0) AS percentage
FROM (
    SELECT
        c.country_name,
        coalesce(m.cnt, 0) AS cnt
    FROM countries c
    LEFT JOIN (
        SELECT country, count(*) AS cnt
        FROM movies
        GROUP BY country
    ) m
        ON m.country = c.country_code
) q
ORDER BY country_name;

Add

2.6.1.10 Pages 60 and 67

Fix (broken SQL layout): Replace with a clean version

SELECT
    title,
    country,
    year_released,
    row_number() OVER (
        PARTITION BY country
        ORDER BY year_released DESC
    ) AS rn
FROM movies;

Fix (incomplete “top 2 per country” query): Replace with

SELECT
    x.country,
    x.title,
    x.year_released
FROM (
    SELECT
        country,
        title,
        year_released,
        row_number() OVER (
            PARTITION BY country
            ORDER BY year_released DESC, movieid DESC
        ) AS rn
    FROM movies
) x
WHERE x.rn <= 2;

2.6.2 openGauss insertion

  1. Insert after Page 11 (NULL ordering) Reason: The slides say NULL ordering is DBMS-dependent; add an openGauss demo showing explicit NULLS FIRST/LAST to avoid surprises across systems. Demo snippet for the new slide:
SELECT
    title,
    year_released
FROM movies
ORDER BY year_released ASC NULLS LAST;

Link: openGauss ORDER BY Clause

  2. Insert after Page 14 (collation) Reason: The slide uses Oracle-flavored examples; add an openGauss-friendly note that PostgreSQL-like systems use COLLATE, and collation availability depends on installed locales. Suggested note: “Use COLLATE when locale-specific ordering matters; test on the target server.”
  3. Insert after Pages 22–26 (top-N and pagination) Reason: Students will implement paging immediately; add an openGauss example for LIMIT/OFFSET and a brief warning about large offsets. Demo snippet for the new slide:
SELECT
    title,
    country,
    year_released
FROM movies
ORDER BY title
LIMIT 10 OFFSET 20;

Link: openGauss ORDER BY Clause

  4. Insert after Pages 33–35 (hierarchies) Reason: The lecture shows CONNECT BY (Oracle-specific) and mentions recursive queries next; add one openGauss slide that uses WITH RECURSIVE as the portable approach. Demo snippet for the new slide:
WITH RECURSIVE thread AS (
    SELECT postid, answered_postid, message, 1 AS depth
    FROM forum_posts
    WHERE answered_postid IS NULL
  UNION ALL
    SELECT p.postid, p.answered_postid, p.message, t.depth + 1
    FROM forum_posts p
    JOIN thread t
        ON p.answered_postid = t.postid
)
SELECT *
FROM thread
ORDER BY depth, postid;
  5. Insert after Pages 52–53 (percentage of total) Reason: The slide’s main query is incomplete; add a runnable openGauss version so students can copy/paste and verify sum(...) over (). Demo snippet for the new slide: use the completed query from Section 2.6.1.9.
  6. Insert after Pages 60–67 (top-N per group) Reason: This is a high-value pattern (“top K per category”); add an openGauss slide with a clean row_number() solution and one sentence on tie behavior (row_number vs dense_rank). Demo snippet for the new slide: use the completed query from Section 2.6.1.10.
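The tie-behavior point in the last item can be backed by a tiny runnable contrast of row_number() vs dense_rank() (SQLite 3.25+ via Python's sqlite3 stands in for openGauss; the window-function semantics shown are standard SQL):

```python
import sqlite3

# row_number() numbers tied rows arbitrarily but uniquely; dense_rank()
# gives tied rows the same rank.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (country TEXT, title TEXT, year_released INT);
INSERT INTO movies VALUES
  ('us', 'A', 2000), ('us', 'B', 2000), ('us', 'C', 1999);
""")
ranked = conn.execute("""
SELECT title,
       row_number() OVER w AS rn,
       dense_rank() OVER w AS dr
FROM movies
WINDOW w AS (PARTITION BY country ORDER BY year_released DESC)
ORDER BY rn
""").fetchall()
# rn is 1,2,3; dr is 1,1,2 (the two year-2000 rows tie at rank 1).
```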

2.6.3 References

2.7 CS213 Lecture-07 CS213-DB-07.pdf

2.7.1 Slides Updates

2.7.1.1 Pages 8–12

Add

2.7.1.2 Page 9

Fix

2.7.1.3 Page 20

Fix

2.7.1.4 Page 21

Fix

2.7.1.5 Page 23

Change

2.7.1.6 Page 26

Replace with

2.7.1.7 Page 28

Fix

Add

2.7.1.8 Page 29

Fix

Add

2.7.1.9 Page 37

Replace with

2.7.1.10 Page 38

Add

2.7.1.11 Pages 54–59

Replace with

2.7.1.12 Page 68

Fix

2.7.1.13 Pages 70–73, 80–82

Fix

2.7.2 openGauss insertion

  1. Insert after Page 12 (full-text search overview) Reason: The lecture builds a manual word-index table; add one slide showing openGauss has built-in full-text search operators/functions so students see the “real” approach. (docs.opengauss.org) Link: openGauss Text Search Functions and Operators Demo snippet for the new slide:

    SELECT movieid, title
    FROM movies
    WHERE to_tsvector('simple', title) @@ to_tsquery('simple', 'space & odyssey & 2001');
  2. Insert after Page 28 (autocommit warning) Reason: Students using openGauss should practice explicit transactions in gsql, aligned with the “Beware of autocommit” message. (docs.opengauss.org) Link: openGauss gsql Tool Demo snippet for the new slide:

    BEGIN;
    DELETE FROM movies WHERE movieid = 25;
    ROLLBACK;
  3. Insert after Page 29 (DDL and commits differ by DBMS) Reason: The slide highlights portability risk; add one openGauss example to show PostgreSQL-like “DDL is transactional” behavior in a concrete way. Demo snippet for the new slide:

    BEGIN;
    CREATE TABLE ddl_demo(id INT);
    ROLLBACK;
  4. Insert after Page 46 (sequence function syntax) Reason: The lecture lists multiple syntaxes; add the openGauss function form and one key rule (currval/lastval require nextval in the session). (docs.opengauss.org) Link: openGauss Sequence Functions
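
Demo snippet for the new slide (a sketch; the sequence name movies_movieid_seq is illustrative, not taken from the lecture):

    SELECT nextval('movies_movieid_seq');
    -- currval/lastval raise an error unless nextval was called first in this session
    SELECT currval('movies_movieid_seq');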

  5. Insert after Page 59 (retrieving generated IDs) Reason: Replace the “last generated value” habit with the clearer, safer openGauss pattern: RETURNING. (docs.opengauss.org) Link: openGauss INSERT Statement Demo snippet for the new slide:

    INSERT INTO movies(title)
    VALUES ('Some Movie Title')
    RETURNING movieid;
  6. Insert after Page 82 (PostgreSQL COPY / psql \copy) Reason: The lecture mentions PostgreSQL COPY and \copy; add the openGauss gsql \copy equivalent so students can import local files without server-side file access. (docs.opengauss.org) Link: openGauss gsql \copy Meta-Command Demo snippet for the new slide:

    \copy us_movie_info FROM 'us_movie_info.csv' WITH (FORMAT csv)

2.7.3 References

2.8 CS213 Lecture-08 CS213-DB-08.pdf

2.8.1 Slides Updates

2.8.1.1 Page 3

Fix

2.8.1.2 Page 4

Fix

2.8.1.3 Page 8

Replace with

2.8.1.4 Pages 10–12

Add

2.8.1.5 Pages 15–16

Fix

UPDATE movies_2 AS m
SET duration = i.duration,
    color = CASE i.color
        WHEN 'C' THEN 'Y'
        WHEN 'B' THEN 'N'
        ELSE NULL
    END
FROM us_movie_info AS i
WHERE m.country = 'us'
  AND i.title = m.title
  AND i.year_released = m.year_released;
2.8.1.6 Page 24

Replace with

2.8.1.7 Pages 25–26

Replace with

2.8.1.8 Pages 27–29

Tighten

2.8.1.9 Page 33

Add

2.8.1.10 Page 35

Replace with

2.8.1.11 Pages 36–39

Fix

Add

2.8.1.12 Page 41

Replace with

2.8.1.13 Pages 44–47

Fix

CREATE OR REPLACE FUNCTION full_name(the_name varchar)
RETURNS varchar
AS $$
BEGIN
    RETURN CASE
        WHEN position(',' IN the_name) > 0 THEN
            trim(' ' FROM substr(the_name, position(',' IN the_name) + 1))
            || ' ' ||
            trim(' ' FROM substr(the_name, 1, position(',' IN the_name) - 1))
        ELSE
            the_name
    END;
END;
$$ LANGUAGE plpgsql;
2.8.1.14 Pages 49–55

Replace with

2.8.1.15 Page 56

Replace with

2.8.2 openGauss insertion

  1. Insert after Page 12 (set-based update patterns) Reason: The slides focus on correlated subqueries; add the openGauss-friendly UPDATE ... FROM pattern as the default approach. Link: UPDATE Statement
  2. Insert after Page 24 (duplicate join matches) Reason: Students need one concrete rule: ensure the join keys are unique, or the update result can be unpredictable. Show “add UNIQUE or pre-aggregate” as the fix. Link: UPDATE Statement
  3. Insert after Page 27 (MERGE as standard UPSERT) Reason: Confirm MERGE support and highlight the “match condition should be key-like” rule. Link: MERGE INTO Statement Guide
  4. Insert after Page 29 (dialect differences for UPSERT) Reason: The lecture shows MySQL/SQLite syntax; add the openGauss angle: ON DUPLICATE KEY UPDATE requires a unique constraint or unique index. Link: INSERT Guide
  5. Insert after Page 35 (TRUNCATE safety) Reason: The rollback claim is risky; add a TRUNCATE slide focused on what it does (fast page release + strong locks) and when to use it. Link: TRUNCATE Guide
  6. Insert after Page 47 (functions and procedures) Reason: The lecture introduces PL/pgSQL and procedures; add one minimal example for each so students can run them immediately. Link: PL/pgSQL Functions Link: CREATE PROCEDURE
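
For the MERGE insertion above, a minimal sketch (it reuses the movies_2/us_movie_info names from Section 2.8.1.5; the column lists are illustrative and should be adjusted to the actual schema):

    MERGE INTO movies_2 m
    USING us_movie_info i
        ON (m.title = i.title AND m.year_released = i.year_released)
    WHEN MATCHED THEN
        UPDATE SET duration = i.duration
    WHEN NOT MATCHED THEN
        INSERT (title, year_released, duration)
        VALUES (i.title, i.year_released, i.duration);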

2.8.3 References

2.9 CS213 Lecture-09 CS213-DB-09.pdf

2.9.1 Slides Updates

2.9.1.1 Page 1

Fix

2.9.1.2 Page 3

Replace with

2.9.1.3 Page 6

Replace with

2.9.1.4 Page 9

Add

2.9.1.5 Pages 14–16

Update

Suggested replacement snippet:

INSERT INTO movies(title, country, year_released)
SELECT p_title, country_code, p_year
FROM countries
WHERE country_name = p_country_name
RETURNING movieid
INTO n_movieid;
2.9.1.6 Page 17

Replace with

2.9.1.7 Pages 19–25

Fix

2.9.1.8 Pages 38–40

Replace with

2.9.1.9 Page 43

Replace with

Suggested corrected table:

CREATE TABLE people_audit (
    auditid         SERIAL PRIMARY KEY,
    peopleid        INT NOT NULL,
    type_of_change  CHAR(1),
    column_name     VARCHAR,
    old_value       VARCHAR,
    new_value       VARCHAR,
    changed_by      VARCHAR,
    time_changed    TIMESTAMP
);
2.9.1.10 Page 44

Fix

SELECT
    peopleid,
    'U' AS type_of_change,
    column_name,
    old_value,
    new_value,
    current_user || '@' || COALESCE(inet_client_addr()::text, 'localhost') AS changed_by,
    current_timestamp AS time_changed
FROM ...;
2.9.1.11 Page 47

Update

CREATE TRIGGER people_trg
AFTER INSERT OR UPDATE OR DELETE ON people
FOR EACH ROW
EXECUTE FUNCTION people_audit_fn();
2.9.1.12 Pages 53–59

Replace with

2.9.1.13 Pages 63–67

Replace with

2.9.2 openGauss insertion

  1. Insert after Page 3 (functions vs procedures) Reason: The slide text implies PostgreSQL has only functions; add a concrete openGauss target model: functions vs procedures and how students should call each. Link: CREATE PROCEDURE Link: CALL

  2. Insert after Page 6 (transaction boundaries) Reason: Students often try to “COMMIT inside a function”; add an openGauss note: use procedures for multi-step business operations and keep the set-based SQL inside them. Suggested note: “Use procedures for multi-step operations; keep functions lightweight and side-effect controlled.”

  3. Insert after Pages 14–16 (safe ID retrieval) Reason: Replace lastval() habit with the safer pattern students can copy: INSERT ... RETURNING in openGauss (PostgreSQL-like). Link: INSERT Demo snippet for the new slide:

    INSERT INTO movies(title, country, year_released)
    VALUES ('Some Title', 'us', 2001)
    RETURNING movieid;
  4. Insert after Page 17 (calling syntax) Reason: The slide suggests calling “procedures” via SELECT; add openGauss examples that distinguish SELECT fn() vs CALL proc(). Demo snippet for the new slide:

    SELECT some_void_function(...);
    CALL some_procedure(...);
  5. Insert after Page 32 (trigger timing and NEW/OLD) Reason: When students implement triggers, they need one runnable openGauss example showing NEW / OLD and timing choice. Link: CREATE TRIGGER

  6. Insert after Pages 43–47 (auditing pattern) Reason: The lecture demonstrates auditing in PostgreSQL; add an openGauss-ready version emphasizing TIMESTAMP, current_user, and one-row-per-changed-column design. Link: PL/pgSQL
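
For the trigger and auditing insertions above, one runnable sketch (it assumes the people table has a surname column and reuses the people_audit table from Section 2.9.1.9):

    CREATE OR REPLACE FUNCTION people_audit_fn() RETURNS trigger AS $$
    BEGIN
        -- OLD/NEW hold the row before/after the change; TG_OP names the operation
        IF TG_OP = 'UPDATE' AND NEW.surname IS DISTINCT FROM OLD.surname THEN
            INSERT INTO people_audit(peopleid, type_of_change, column_name,
                                     old_value, new_value, changed_by, time_changed)
            VALUES (OLD.peopleid, 'U', 'surname', OLD.surname, NEW.surname,
                    current_user, current_timestamp);
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER people_audit_trg
    AFTER UPDATE ON people
    FOR EACH ROW
    EXECUTE PROCEDURE people_audit_fn();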

2.9.3 References

2.10 CS213 Lecture-10 CS213-DB-10.pdf

2.10.1 Slides Updates

2.10.1.1 Page 6

Replace with

2.10.1.2 Pages 11–15

Update

2.10.1.3 Page 12

Clarify

2.10.1.4 Pages 18–20

Add

2.10.1.5 Page 21

Update

2.10.1.6 Pages 32–33

Add

2.10.1.7 Page 34

Improve

2.10.1.8 Pages 44–46

Add

2.10.1.9 Page 51

Add

2.10.1.10 Page 80

Add

2.10.1.11 Pages 96–102

Fix + Clarify

2.10.2 openGauss insertion

  1. Insert after Page 21 (buffering and read-ahead) Reason: Students learn storage hierarchy best when they can map it to real DB configuration knobs in a concrete system. Demo snippet for the new slide:

    SHOW shared_buffers;
    SHOW effective_cache_size;
  2. Insert after Page 33 (slotted pages and page size) Reason: The lecture discusses pages conceptually; a one-line query makes “page size” real and testable in openGauss. Demo snippet for the new slide:

    SHOW block_size;
    SELECT current_setting('block_size');
  3. Insert after Page 46 (PK/UNIQUE create indexes “behind your back”) Reason: This is a perfect place to prove it with a catalog query in openGauss, reinforcing that constraints usually imply indexes. Demo snippet for the new slide:

    SELECT indexname, indexdef
    FROM pg_indexes
    WHERE tablename = 'people';
  4. Insert after Page 80 (EXPLAIN) Reason: Students need a repeatable workflow for performance: write query → EXPLAIN → change index/query → re-check. openGauss is a practical default target. Demo snippet for the new slide:

    EXPLAIN
    SELECT *
    FROM people
    WHERE surname = 'Marvin';
    
    EXPLAIN ANALYZE
    SELECT *
    FROM people
    WHERE surname = 'Marvin';
  5. Insert after Pages 96–102 (LIKE/functions/date parts/implicit casts) Reason: The slides correctly warn about “dead indexes”, but students also need the standard fixes in a real DBMS: expression indexes, range predicates, and explicit casting. Demo snippet for the new slide:

    CREATE INDEX people_upper_surname_idx
    ON people (upper(surname));
    
    EXPLAIN
    SELECT *
    FROM people
    WHERE upper(surname) = 'MARVIN';
    
    EXPLAIN
    SELECT *
    FROM events
    WHERE date_column >= DATE '2025-06-01'
      AND date_column <  DATE '2025-07-01';
    
    EXPLAIN
    SELECT *
    FROM t
    WHERE varchar_code = CAST(12345678 AS VARCHAR);

2.10.3 References

2.11 CS213 Lecture-11 CS213-DB-11.pdf

2.11.1 Slides Updates

2.11.1.1 Page 5

Fix the syntax layout of CREATE VIEW (column list placement). Replace with

    CREATE VIEW viewname (col1, ..., coln)
    AS
    SELECT ... ;
2.11.1.2 Page 10

Reorder the example so it is executable SQL. Replace with

    CREATE VIEW vmovies AS
    SELECT
        m.movieid,
        m.title,
        m.year_released,
        c.country_name
    FROM movies m
    INNER JOIN countries c
        ON c.country_code = m.country;
2.11.1.3 Page 14

Polish wording + make the “SELECT *” warning more explicit.

2.11.1.4 Pages 18–19

Tone cleanup (remove insulting phrasing).

2.11.1.5 Page 23

Grammar fix.

2.11.1.6 Page 25

Add one mitigation sentence after the “function column breaks index usage” point.

2.11.1.7 Page 43

Fix REVOKE syntax (use FROM, not TO). Replace with

    GRANT  <privilege> ON <object> TO   <role_or_user>;
    REVOKE <privilege> ON <object> FROM <role_or_user>;
2.11.1.8 Page 48

Make the “schema usage” remark concrete with one example.
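
For example (schema and role names are placeholders):

    GRANT USAGE ON SCHEMA film TO student_role;
    GRANT SELECT ON ALL TABLES IN SCHEMA film TO student_role;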

2.11.1.9 Page 51

Complete the GRANT statement.

2.11.1.10 Pages 52–55

Fix the “current user” function to be portable in PostgreSQL-like systems.

    CREATE VIEW my_stuff AS
    SELECT *
    FROM stuff
    WHERE username = CURRENT_USER;
2.11.1.11 Page 54

Clarify the “don’t drop & recreate” warning with the exact action.

2.11.1.12 Page 58

Replace “alter view or replace view” with precise commands.
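
Suggested replacement snippet (column lists are illustrative; where the engine supports these commands, prefer them over DROP + CREATE so existing privileges survive):

    -- redefine the view in place
    CREATE OR REPLACE VIEW vmovies AS
    SELECT movieid, title, year_released
    FROM movies;

    -- rename without dropping
    ALTER VIEW vmovies RENAME TO vmovies_v2;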

2.11.1.13 Pages 59–61

Add a 1-slide takeaway (security limitation).

2.11.2 openGauss insertion

2.11.2.1 Insert after Page 5 (View syntax)

Insert content: openGauss CREATE VIEW / CREATE OR REPLACE VIEW syntax and a minimal example. Reason: anchors the lecture’s generic SQL to the course engine. Official link: openGauss CREATE VIEW
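
Demo snippet for the new slide (the view name and filter are illustrative):

    CREATE OR REPLACE VIEW v_recent_movies AS
    SELECT movieid, title, year_released
    FROM movies
    WHERE year_released >= 2000;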

2.11.2.2 Insert after Page 25 (Computed expressions and indexes)

Insert content: expression index idea for computed predicates (when supported by the engine and function properties allow). Reason: provides a concrete “what to do next” for the performance pitfall. Official link: openGauss CREATE INDEX

2.11.2.3 Insert after Pages 35–36 (Views-on-views performance)

Insert content: materialized view as an option for expensive “reporting views” + refresh concept. Reason: offers a standard performance escape hatch when the view is intentionally heavy. Official link: openGauss CREATE MATERIALIZED VIEW
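
Demo snippet for the new slide (names are illustrative; the data is only as fresh as the last refresh):

    CREATE MATERIALIZED VIEW mv_movies_per_country AS
    SELECT country, count(*) AS movie_count
    FROM movies
    GROUP BY country;

    -- recompute on demand
    REFRESH MATERIALIZED VIEW mv_movies_per_country;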

2.11.2.4 Insert after Pages 43–49 (Privileges)

Insert content: openGauss GRANT/REVOKE syntax reference + schema privilege note. Reason: students can immediately map the lecture’s conceptual DCL to exact commands. Official links:

2.11.2.5 Insert after Page 54 (Modifying a view)

Insert content: “don’t drop; use replace” workflow and how to verify a view definition (system catalogs / built-in helpers). Reason: matches the slide warning with a practical checklist. Official link: openGauss CREATE VIEW

2.11.3 References

2.12 CS213 Lecture-12 CS213-DB-12.pdf

2.12.1 Slides Updates

2.12.1.1 Page 4

Fix

Add

2.12.1.2 Page 6

Polish

2.12.1.3 Pages 6–8

Replace with

2.12.1.4 Pages 10–11

Refine

2.12.1.5 Pages 12–13

Replace

Clarify

2.12.1.6 Pages 16–18

Add

2.12.1.7 Pages 19–22

Improve

2.12.1.8 Page 25

Fix

Add

2.12.1.9 Pages 28–29

Fix

2.12.1.10 Page 30

Replace with

2.12.1.11 Page 32

Fix

Add

2.12.1.12 Page 39

Add

2.12.2 openGauss insertion

  1. Insert after Page 8 (updating a join view) Reason: The lecture shows “SQL Server vs most products”; add one concrete openGauss outcome so students learn what happens in the course DBMS. Link: openGauss CREATE VIEW

  2. Insert after Pages 12–13 (user-dependent filtering in a view) Reason: The example uses user; replace with CURRENT_USER and show the openGauss-friendly pattern for per-user filtering. Demo snippet for the new slide:

    CREATE OR REPLACE VIEW vmy_movies AS
    SELECT
       m.movieid,
       m.title,
       m.year_released,
       m.country
    FROM movies m
    WHERE m.country IN (
       SELECT
           c.country_code
       FROM countries c
       INNER JOIN user_scope u
           ON u.continent = c.continent
       WHERE u.username = CURRENT_USER
    );

    Link: openGauss CREATE VIEW

  3. Insert after Page 20 (WITH CHECK OPTION) Reason: This is the exact point where students see “disappearing rows”; add openGauss syntax + a 2-line insert/update demo that fails under CHECK OPTION. Link: openGauss CREATE VIEW

  4. Insert after Pages 25–26 (INSTEAD OF triggers) Reason: The slides introduce INSTEAD OF triggers conceptually; add openGauss “how you actually do it” to connect the idea to practice. Links:

    • openGauss CREATE TRIGGER
    • openGauss CREATE FUNCTION
  5. Insert after Pages 33–37 (data dictionary access) Reason: The lecture names information_schema and pg_... views; add 2–3 catalog queries students can run in openGauss to list tables/views/columns. Suggested demo snippet:

    SELECT
       table_schema,
       table_name,
       table_type
    FROM information_schema.tables
    WHERE table_schema = 'public';
    
    SELECT
       table_name,
       column_name,
       ordinal_position,
       data_type
    FROM information_schema.columns
    WHERE table_name = 'movies'
    ORDER BY ordinal_position;
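
For the WITH CHECK OPTION insertion above, a minimal failing demo (a sketch, assuming the engine accepts CHECK OPTION on this simple view):

    CREATE OR REPLACE VIEW vus_movies AS
    SELECT movieid, title, country, year_released
    FROM movies
    WHERE country = 'us'
    WITH CHECK OPTION;

    -- rejected: the inserted row would not be visible through the view
    INSERT INTO vus_movies(title, country, year_released)
    VALUES ('Some Title', 'gb', 2001);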

2.12.3 References

2.13 CS213 Lecture-13 CS213-DB-13.pdf

2.13.1 Slides Updates

2.13.1.1 Page 3

Replace with

2.13.1.2 Page 11

Add

2.13.1.3 Page 14

Replace with

2.13.1.4 Pages 15–16

Add

2.13.1.5 Pages 22–26

Add a note (terminology clarity)

2.13.1.6 Pages 30–31

Fix

2.13.1.7 Page 32

Replace with

2.13.1.8 Page 36

Replace with

2.13.1.9 Page 38

Fix (wording)

2.13.1.10 Page 43

Replace with

2.13.1.11 Pages 46–53

Add (one slide)

2.13.2 openGauss insertion

  1. Insert after Page 11 (cost estimation and catalog statistics) Reason: The slide introduces “statistics from the catalog”; add one practical openGauss workflow so students can connect “stats → plan choice”. Link: openGauss EXPLAIN Link: openGauss ANALYZE Demo snippet for the new slide:

    ANALYZE movies;
    
    EXPLAIN
    SELECT *
    FROM movies
    WHERE movieid = 125;
    
    EXPLAIN ANALYZE
    SELECT *
    FROM movies
    WHERE movieid = 125;
  2. Insert after Page 18 (Seq Scan vs Index Scan wording) Reason: The slide names PostgreSQL operators; openGauss uses the same style of plan nodes, so one EXPLAIN screenshot/snippet makes the terminology stick. Link: openGauss EXPLAIN Demo snippet for the new slide:

    EXPLAIN
    SELECT *
    FROM movies
    WHERE movieid = 125;
  3. Insert after Pages 22–26 (clustered vs non-clustered indexes) Reason: These slides can mislead students into assuming “clustered index” is universal; add an openGauss note that physical clustering is not the default and is a separate operation. Link: openGauss CLUSTER

  4. Insert after Pages 32–34 (parsing, cache, and reuse) Reason: The lecture talks about caching parsed queries; openGauss examples should show the practical entry point: prepared statements (server-side) plus repeated EXECUTE. Link: openGauss PREPARE Demo snippet for the new slide:

    PREPARE q1 (INT) AS
    SELECT *
    FROM movies
    WHERE movieid = $1;
    
    EXECUTE q1(125);
    EXECUTE q1(126);
    
    DEALLOCATE q1;
  5. Insert after Pages 48–50 (two-phase commit) Reason: The lecture introduces 2PC conceptually; add an openGauss “prepared transaction” snippet so students see what 2PC looks like in SQL. Link: openGauss PREPARE TRANSACTION Demo snippet for the new slide:

    BEGIN;
    
    UPDATE accounts
    SET balance = balance - 100
    WHERE account_id = 1;
    
    PREPARE TRANSACTION 'tx_001';
    
    COMMIT PREPARED 'tx_001';
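
For the clustering insertion above, a one-line sketch (the index name movies_pkey is assumed, not taken from the slides):

    -- physically reorder the table once by the chosen index;
    -- the order is not maintained for later inserts/updates
    CLUSTER movies USING movies_pkey;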

2.13.3 References

3. Course Content Proposal

3.1 Distributed and Cloud-Native Databases Track

3.1.1 Replication, Read/Write Splitting, and Consistency Expectations

3.1.2 Distributed Transactions: 2PC, Failure Modes, and Saga Alternatives

3.1.3 Elasticity and Cost–Performance Co-Optimization in Cloud Deployments

3.1.4 HTAP and Real-Time Analytics: OLTP Meets OLAP

3.2 AI-Era Databases and Retrieval Track

3.2.1 Production Text Search: From LIKE to Full-Text Search

3.2.2 Vector Search and Hybrid Retrieval: Concepts and Course-Level Practice

3.2.3 AI-Assisted Database Engineering Workflow: Verified, Reproducible, and Safe

3.2.4 Observability and Performance Regression: Evidence-Driven Debugging

3.2.5 Mini-Project Option: Search Feature for the Course Database

3.3 Privacy, Compliance, and Data Ethics Track

3.3.1 Data Classification and Minimization

3.3.2 Access Control and Auditability

3.3.3 De-identification, Masking, and Encryption Basics

3.3.4 Data Lifecycle: Retention, Deletion, and “Right to be Forgotten”

3.4 Database Low-Level Operating Principles Track

3.4.1 On-Disk Storage: Pages, Tuples, and Slotted Layout

3.4.2 Buffer Manager and I/O: Why Memory Settings Matter

3.4.3 WAL and Crash Recovery: Durability in Practice

3.4.4 Concurrency Control: MVCC, Locks, and Vacuum/GC

3.4.5 Query Execution Engine: Operators and Plan-to-Performance Mapping

3.4.6 Capstone Mini-Lab: “From Bytes to Behavior”

3.5 Summary

Overall, this proposal emphasizes three complementary aspects of upgrades to the course. First, it adds more modern database topics—replication, cloud-native cost–performance trade-offs, HTAP, and practical retrieval—so students can connect core SQL skills to how real systems are deployed today. Second, it strengthens “how databases work” understanding by introducing low-level operating principles (storage pages, buffer management, WAL/recovery, MVCC/locking, and execution operators), so performance and correctness become explainable rather than trial-and-error. Third, it formalizes responsible AI usage: students may use AI tools to draft ideas, but must verify outputs with official documentation, minimal reproducible tests, and evidence-driven performance evaluation, ensuring the final work is accurate, safe, and maintainable.

Summary

In this report, we upgrade FilmDB and the CS213 course materials along three aligned directions.

We modernize FilmDB into a “run-any-day” update pipeline: a daily Kaggle TMDB snapshot provides the baseline, the official TMDB API supplies incremental deltas, and staging-based merges plus integrity checks keep the database current beyond the original 2019 cutoff. We preserve legacy compatibility while introducing stable identifiers (e.g., tmdb_id), deterministic deduplication, run logging, and two safe deliverables (filmdb_pg.sql and filmdb_og.sql) to reduce dialect risk.

We review the lecture slides for correctness and teachability by fixing errors and incomplete SQL, tightening misleading claims, standardizing safer patterns, and adding openGauss insertions with official references so students have a consistent concrete DBMS target.

We propose modern course modules covering cloud/distributed databases, AI-era retrieval and evidence-driven performance work, privacy/compliance engineering, and low-level database internals—so students learn both modern deployments and the underlying mechanisms, while using AI tools responsibly through verification and reproducible tests.