1.1) (6 points)
Block Nested Loops Join
(A) Reads = |R| + |S| * ceil(|R|/(B-2)) = 5,000 + 15,000 * ceil(5,000/(800-2)) = 110,000
(B) Writes = 0 (no writes are required for nested loops joins)
1.2) (7 points)
Grace Hash Join
(A) Reads in Partition phase = |R| + |S| = 20,000
Reads in Probe phase = |R| + |S| = 20,000
Total reads = 2 * (|R| + |S|) = 40,000
(B) Total writes = |R| + |S| = 20,000
1.3) (7 points)
BNJ = 110,000 * 1 [reads; no writes] = 110,000
GHJ = 40,000 * 1 [reads] + 20,000 * X [writes]
From BNJ = GHJ → 110,000 = 40,000 + 20,000 * 3.5,
we get X = 3.5
So we should choose BNJ if X > 3.5, but GHJ if X < 3.5
Question 2 (20 points)
2.1) (4 points) B - 1 buckets
2.2) (4 points) Recursively apply hash-based projection technique to further split the buckets with
different hash functions until no more partition overflow.
2.3) (4 points) (B - 1)^2 buckets
2.4) (4 points) Accelerate the record matching in memory or reduce CPU cost of record matching.
2.5) (4 points) No. In the probing phase, using the same hash function will assign each data record into
the same bucket. No speeding up for join in the probing phase is obtained by using the same hash
function.