higepon blog

17. clustered のコードを読む9 - PostgreSQL のソースコードを読む

db

前回紹介した cluster コマンド高速化パッチを読む。

HeapScanDesc hscan;
IndexInfo	*ii;
ScanKey		 scanKeys;
int			 i;

Tuplesortstate *tuplesort;
bool 		shouldfree;

HeapScanDesc、IndexInfo, ScanKey, Tuplesortstate が初登場。

HeapScanDesc
- heap を読んでいくときに使用するハンドル
IndexInfo
- index のメタデータ（attribute, 式indexかどうか？、uniqueか？）
ScanKey
- column や index などの大小比較を表す
Tuplesortstate
- Tuplesort 操作の state
Tuplesort とは？
- heap, index に限らずデータをソートするための仕組み
- データの大小に応じて、一時ファイルを使ってソートされたりもする(external sort)
- ソート時に今使っても良いメモリのサイズを kb 指定できる
  - なるほど。
- このあたりのことがすんなりと理解できるのは Database Management Systems のおかげだ。

パッチがやっていることは以下の通り。

IndexInfo を取得
- 式 index なら do_sort = false
- それ以外は do_sort = true
_bt_mkscankey_nodata で index の scan key を生成
tuplesort_begin_rawheap などで tuple をソートする

おまけ

tuplesort のコードには以下のようなコメントが。Knuth 先生偉大すぎる。

 * See Knuth, volume 3, for more than you want to know about the external
 * sorting algorithm.

リンク

PostgreSQL のソースコードを読む