Skip to content

2022 OceanBase 数据库大赛 决赛小记

Posted on:2022.11.10

TOC

Open TOC

info

https://zhuanlan.zhihu.com/p/459185153

https://zhuanlan.zhihu.com/p/513498617

https://zhuanlan.zhihu.com/p/445201899

https://zhuanlan.zhihu.com/p/508331407

https://www.bilibili.com/video/BV1Ng411976V

https://space.bilibili.com/645669485/article

ref

快速体验 OceanBase

使用源码构建 OceanBase 数据库

规划 OceanBase 集群部署

observer

下载安装 OBD

sudo yum install -y yum-utils && sudo yum-config-manager --add-repo https://mirrors.aliyun.com/oceanbase/OceanBase.repo && sudo yum install -y ob-deploy

源码构建 OceanBase 数据库

sudo yum install -y git gdb wget cpio make glibc-devel glibc-headers ccache && sudo yum install -y rpm*
git clone -b dev-vgalaxy https://gitee.com/oceandb-space/oceanbase.git
cd oceanbase && sh build.sh init && sh build.sh debug
cd build_debug && make -j$(nproc) observer
ll src/observer/observer
make install DESTDIR=./
obd mirror create -n oceanbase-ce -V 4.0.0.0 -p ./usr/local -f -t obadvanced
obd repo list

配置文件 ob-advanced-auto.yaml

oceanbase-ce:
tag: obadvanced
servers:
- ip: 127.0.0.1
global:
home_path: /root/obadvanced
devname: lo
mysql_port: 2881
rpc_port: 2882
zone: zone1
memory_limit: 14G
system_memory: 12G
datafile_size: 100G
log_disk_size: 20G
syslog_level: INFO
syslog_io_bandwidth_limit: 10GB
enable_syslog_wf: false
enable_syslog_recycle: true
max_syslog_file_count: 4

部署

obd cluster autodeploy obadvanced -c ob-advanced-auto.yaml -A -f
obd cluster display obadvanced

修改 obadvanced/bin/observer 的软链接,并重启集群,即可反映修改

obclient

sudo yum install -y obclient
obclient -h127.0.0.1 -P2881 -uroot@test -Dtest

开放 2881 端口

增大事务超时时间,单位微秒

set global ob_trx_idle_timeout=3600000000;
set global ob_trx_timeout=3600000000;
set global ob_query_timeout=3600000000;

调整参数

obclient -h127.0.0.1 -P2881 -uroot@sys
alter system set system_memory='1G';
alter resource unit sys_unit_config memory_size='12G';

cli

vim / less

export TERM=xterm
cat xxx | less -S

gdb debug

ps -ef | grep observer
cd oceanbase
gdb attach {id}

或者考虑使用 cgdb

gdb-add-index

gdb-add-index /root/obadvanced/bin/observer

gdb core

gdb --core=core.*
file /root/obadvanced/bin/observer

vscode debug

插件

VS Code Remote-SSH: The vscode server failed to start SSH - Stack Overflow

未正确关闭 ssh 连接

用于 FindPID

添加 launch.json

{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "(gdb) Attach",
"type": "cppdbg",
"request": "attach",
"program": "/root/obadvanced/bin/observer",
"processId": "${input:FindPID}",
"MIMode": "gdb",
"miDebuggerPath": "gdb",
"setupCommands": [
{
"description": "Enable pretty-printing for gdb",
"text": "-enable-pretty-printing",
"ignoreFailures": true
}
],
"sourceFileMap": {
"./build_debug/src/observer/./src/observer/mysql": {
"editorPath": "${workspaceFolder}/src/observer/mysql",
"useForBreakpoints": true
},
"./build_debug/src/observer/./src/observer/omt": {
"editorPath": "${workspaceFolder}/src/observer/omt",
"useForBreakpoints": true
},
"./build_debug/src/sql/parser/./src/sql/parser": {
"editorPath": "${workspaceFolder}/src/sql/parser",
"useForBreakpoints": true
},
"./build_debug/src/sql/./src/sql": {
"editorPath": "${workspaceFolder}/src/sql",
"useForBreakpoints": true
},
"./build_debug/src/storage/./src/storage": {
"editorPath": "${workspaceFolder}/src/storage",
"useForBreakpoints": true
},
"./build_debug/src/share/./src/share/schema": {
"editorPath": "${workspaceFolder}/src/share/schema",
"useForBreakpoints": true
},
"./build_debug/src/share/./src/share/object": {
"editorPath": "${workspaceFolder}/src/share/object",
"useForBreakpoints": true
}
}
}
],
"inputs": [
{
"id": "FindPID",
"type": "command",
"command": "shellCommand.execute",
"args": {
"command": "ps -aux | grep /bin/observer | grep -v \"grep\" | awk '{print $2}'",
"description": "Select your observer PID",
"useFirstResult": true,
}
}
]
}

注意此处 sourceFileMap 的编写

.
├── share
│ ├── CMakeFiles
│ ├── cmake_install.cmake
│ ├── libob_share_static.a
│ └── Makefile
├── sql
│ ├── CMakeFiles
│ ├── cmake_install.cmake
│ ├── libob_sql_static.a
│ ├── Makefile
│ └── parser
└── storage
├── CMakeFiles
├── cmake_install.cmake
├── libob_storage_static.a
└── Makefile

. 代表 xxx_static.a 所在的目录

更准确的,应该参考 gdb 中给出的路径

local setup

https://mirror.nju.edu.cn/centos/7.9.2009/isos/x86_64/CentOS-7-x86_64-DVD-2009.iso

手动分区,打开网络连接,内存 12G,显存 128MB

共享文件夹

sudo mount -t vboxsf shared-dir /home/vgalaxy/Desktop/shared

修改 /etc/security/limits.conf 添加

root soft nofile 655350
root hard nofile 655350
* soft nofile 655350
* hard nofile 655350
* soft stack 20480
* hard stack 20480
* soft nproc 655360
* hard nproc 655360
* soft core unlimited
* hard core unlimited

查看

sh -c "ulimit -a"
  1. To set the aio-max-nr value, add the following line to the /etc/sysctl.conf file:
fs.aio-max-nr = 1048576
  1. To activate the new setting, run the following command:
sysctl -p /etc/sysctl.conf

root vscode

sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc
sudo sh -c 'echo -e "[code]\nname=Visual Studio Code\nbaseurl=https://packages.microsoft.com/yumrepos/vscode\nenabled=1\ngpgcheck=1\ngpgkey=https://packages.microsoft.com/keys/microsoft.asc" > /etc/yum.repos.d/vscode.repo'
yum check-update
sudo yum install code

structure

root

[vgalaxy@localhost oceanbase]$ tree -d -L 1
.
├── build_debug
├── cmake
├── deps
├── docs
├── src
├── test - mysql_test
├── tools
└── unittest - 单测

deps

[vgalaxy@localhost deps]$ tree -d -L 3
.
├── 3rd
│   ├── compile
│   ├── pkg
│   └── usr
│   └── local
├── easy - libeasy 的 rpc 框架,基于 libev
│   ├── src
│   │   ├── include
│   │   ├── io
│   │   ├── memory
│   │   ├── packet
│   │   ├── thread
│   │   └── util
│   ├── test
│   │   ├── include
│   │   ├── io
│   │   ├── memory
│   │   ├── packet
│   │   ├── thread
│   │   └── util
│   └── tools
├── logmessage
└── oblib
├── src
│   ├── common - 依赖于 lib
│   ├── lib - 造轮子
│   └── rpc - ob 的 rpc 框架,依赖于 libeasy
└── unittest
├── common
├── lib
├── pub
└── rpc

deps/oblib/src/common

每一行必须有主键,用户可见的无主键表是通过一个隐藏的自增列做 rowkey 的

deps/oblib/src/lib/oblog

模板和宏的奇技淫巧

src

.
├── archive - 日志归档组件,备份恢复依赖这个组件
├── clog - 事务 redo 日志 / Paxos 的实现
├── election - 分布式选举模块
├── liboblog
│   ├── src
│   └── tests
├── observer - 总装车间,入口是 ob_server.h 和 ob_service.h
│   ├── mysql - MySQL 协议层的命令处理入口
│   ├── omt - omt 中的 mt 表示 multi-tenant,里面实现了 observer 线程模型的抽象 worker,每个租户在其有租户的节点上会创建一个线程池用于处理 SQL 请求
│   ├── table
│   └── virtual_table - sys 租户各个 __all_virtual 虚拟表的实现
├── rootserver - OceanBase 集群总控服务,集群管理和自动容灾,系统自举,分区副本管理和负载均衡,以及 DDL 的执行都在这个组件中
│   ├── backup
│   ├── restore
│   └── virtual_table
├── share
├── sql - 计算引擎
│   ├── code_generator
│   ├── dtl
│   ├── engine
│   ├── executor
│   ├── monitor
│   ├── optimizer
│   ├── parser
│   ├── plan_cache
│   ├── privilege_check
│   ├── resolver
│   ├── rewrite
│   └── session
└── storage - 存储引擎
├── backup
├── blocksstable
├── compaction
├── gts
├── memtable
├── replayengine
└── transaction - 事务管理

sql

兼容 MySQL 协议

connect

建立连接的过程在 obmp_connect,它执行用户认证鉴权

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/observer/mysql/obmp_connect.cpp#L186

如果鉴权成功,会创建一个 ObSQLSession 对象唯一表示一个数据库连接

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/session/ob_sql_session_info.h#L175

所有其他命令处理都会访问这个 session 对象

query

所有的 SQL 语句类型,包括 DML、DDL,以及 multi-statement 都是用 query 命令处理的

  1. src/observer/mysql/obmp_query.cpp:process

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/observer/mysql/obmp_query.cpp#L84

  • TraceId,它是一条 SQL 一次处理过程的一个唯一标识

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/observer/mysql/obmp_query.cpp#L103

  • split_multiple_stmt 把每条语句拆分出来

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/observer/mysql/obmp_query.cpp#L212

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/parser/ob_parser.cpp#L64

  1. src/observer/mysql/obmp_query.cpp:process_single_stmt

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/observer/mysql/obmp_query.cpp#L384

  1. src/observer/mysql/obmp_query.cpp:do_process

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/observer/mysql/obmp_query.cpp#L527

调用 stmt_query,进入 sql 模块

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/observer/mysql/obmp_query.cpp#L624

sql

stmt_query 输入 SQL 语句字符串,输出一个包含物理执行计划和元信息的 ResultSet

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L153

ObString (stmt) -> ParseResult (ParseNode)

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/parser/ob_parser.cpp#L210

ParseNode -> ObStmt

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/resolver/ob_resolver.cpp#L148

这个模块是面向对象设计的,每种语句类型有一个 Resolver 和一个 Stmt

ObResolver 负责 dispatch,Resolver 基类为 ObStmtResolver

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/resolver/ob_stmt_resolver.h#L32

ObDMLResolver 继承 ObStmtResolver

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/resolver/dml/ob_dml_resolver.h#L49

ObSelectResolver 继承 ObDMLResolver

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/resolver/dml/ob_select_resolver.h#L47

Stmt 类似,不再赘述

以 select 的 resolve 过程为例

resolve - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/resolver/dml/ob_select_resolver.cpp#L881
resolve_normal_query - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/resolver/dml/ob_select_resolver.cpp#L747
resolve_from_clause - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/resolver/dml/ob_select_resolver.cpp#L3153
resolve_table - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/resolver/dml/ob_dml_resolver.cpp#L1261
resolve_joined_table - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/resolver/dml/ob_dml_resolver.cpp#L1352

对于非 SELECT 和 DML 之外的语句,如大多数 DDL 语句解析到这里就可以执行了,由 engine/cmd 目录下的 executor 直接执行

DDL 是通过 rootservice (RS) 执行的,所以其 executor 实际是发送 RPC

以 create table 为例

ObCreateTableExecutor::execute - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/engine/cmd/ob_table_executor.cpp#L413
ObRootService::create_table - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/rootserver/ob_root_service.cpp#L4170

事务控制语句则在本机直接调用事务层

以 begin 为例

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/engine/cmd/ob_tcl_executor.cpp#L61

对于 SELECT 和 DML 及带数据操作的 DDL,则需要产生执行计划

优化器 sql/optimizer 的接口类是 ObOptimizer,以上一步生成的 ObDMLStmt 为输入,执行基于代价的优化,生成一个逻辑执行计划 ObLogPlan

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/optimizer/ob_optimizer.cpp#L24

以 select 为例

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/optimizer/ob_select_log_plan.cpp#L1293
https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/optimizer/ob_log_plan.h#L1158
https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/optimizer/ob_log_plan.cpp#L5352

改写 sql/rewrite 是优化器的一部分,执行等价的关系运算改写,产生潜在更好的执行计划候选,这里有一系列改写规则

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/rewrite/ob_transformer_impl.cpp#L49

负责把逻辑执行计划转换为能够高效执行的物理执行计划

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/code_generator/ob_code_generator.cpp#L23

产生的物理执行计划一般会被保存到计划缓存

https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/plan_cache/ob_plan_cache.cpp#L508

把上面的流程串起来

以 COM_QUERY 为例

ObMPQuery::do_process - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/observer/mysql/obmp_query.cpp#L527
ObSql::stmt_query - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L153
ObSql::handle_text_query - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L1134
ObSql::handle_physical_plan - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L3152
ObSql::handle_parser - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L3231
ObSql::generate_physical_plan - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L1509
ObSql::generate_stmt - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L1315
resolve - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L1403
ObSql::transform_stmt - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L1772
transform - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L1836
ObSql::optimize_stmt - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L1894
optimize - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L1914
ObSql::code_generate - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L1985
generate - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L2009
ObSql::pc_add_plan - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L2947
add plan - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L2973

以 COM_STMT_EXECUTE 为例

ObMPStmtExecute::do_process - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/observer/mysql/obmp_stmt_execute.cpp#L596
ObSql::stmt_execute - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L183
ObSql::handle_ps_execute - https://gitee.com/oceanbase/oceanbase/blob/oceanbase_competition/src/sql/ob_sql.cpp#L839
ObParser::parse
ObSql::generate_physical_plan

ex

https://github.com/oceanbase/oceanbase/pull/372

TPC-H

How to generate dataset using TPC-H

修改 makefile.suite

CC = gcc
DATABASE = INFORMIX
MACHINE = LINUX
WORKLOAD = TPCH

构建

make -f makefile.suite

生成 1GB 量级数据

./dbgen -v -f -T L -s 1

去掉末尾的 |

for i in `ls *.tbl`; do sed 's/|$//' $i > ${i/tbl/csv}; echo $i; done;

shuffle

shuf lineitem.tbl -o lineitem.tbl

load data

建表

CREATE TABLE LINEITEM
(
L_ORDERKEY INTEGER NOT NULL,
L_PARTKEY INTEGER NOT NULL,
L_SUPPKEY INTEGER NOT NULL,
L_LINENUMBER INTEGER NOT NULL,
L_QUANTITY DECIMAL(15, 2) NOT NULL,
L_EXTENDEDPRICE DECIMAL(15, 2) NOT NULL,
L_DISCOUNT DECIMAL(15, 2) NOT NULL,
L_TAX DECIMAL(15, 2) NOT NULL,
L_RETURNFLAG CHAR(1) NOT NULL,
L_LINESTATUS CHAR(1) NOT NULL,
L_SHIPDATE DATE NOT NULL,
L_COMMITDATE DATE NOT NULL,
L_RECEIPTDATE DATE NOT NULL,
L_SHIPINSTRUCT CHAR(25) NOT NULL,
L_SHIPMODE CHAR(10) NOT NULL,
L_COMMENT VARCHAR(44) NOT NULL,
primary key (L_ORDERKEY, L_LINENUMBER)
);

设置全局安全路径

SET GLOBAL secure_file_priv = "";

导入数据

LOAD DATA INFILE '/root/tpc/dbgen/lineitem.tbl' INTO TABLE LINEITEM fields terminated by "|";

ObLoadDataSPImpl

[2022-11-09 16:14:02.189547] INFO [SQL.ENG] execute (ob_load_data_impl.cpp:1911) [7758][T1002_TNT_L0_G0][T1002][YB427F000001-0005ED0500B0E46A-0-0] [lt=20] LOAD DATA start report(file_path=/root/lineitem.csv, table_name=`tpc`.`lineitem`, batch_size=100, parallel=4, load_mode=0, transaction_timeout=600000000)
[2022-11-09 16:14:02.597511] INFO [SQL.ENG] execute (ob_load_data_impl.cpp:1967) [7758][T1002_TNT_L0_G0][T1002][YB427F000001-0005ED0500B0E46A-0-0] [lt=18] LOAD DATA finish report(total shuffle task=2, total insert task=61, insert rt sum=1330165, suffle rt sum=64625, total wait secs=0, datafrag info={total_part_cnt:1, total_alloc_cnt:127, total_free_cnt:127})

ObLoadDataDirectDemo

建表的时候不加 primary key,会出现报错不支持 heap table

https://gitee.com/oceandb-space/oceanbase/blob/9ede93b3a8129dea89cf7bf12c64b0ae58e2c984/src/sql/engine/cmd/ob_load_data_direct_demo.cpp#L937

init

租户 ID,读取 schema,有一个对应的 guard

使用 ObArenaAllocator 分配一行记录的内存空间,需要使用租户 ID

ObObj 包含类型信息的值

ObNewRow 一行记录

ObCollationType 参考 https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html

使用 ObArenaAllocator 分配 FILE_BUFFER_SIZE (2M) 的内存

从 FieldOrVarStruct 得到 column 的索引

ObLoadDatumRow 包装 (ObStorageDatum)

rowkey 有两个,可能存在隐藏的 rowkey

overview

利用 ObLoadDataBuffer,从文件中读取一行,并解析为 ObNewRow

ObNewRow: wrap row pointer (an ObObj array) and row size

利用 ObLoadRowCaster,将 ObNewRow 转换为 ObLoadDatumRow,并加入到 ObLoadExternalSort 中

读取并解析完成全部的行后,在外存排序 (ExternalSortRound)

不断从 ObLoadExternalSort 获取 ObLoadDatumRow,并加入到 ObLoadSSTableWriter 中 (ObMacroBlockWriter)

ObDataStoreDesc 中存储了对应的 ObSSTableIndexBuilder,用于建立中间层索引

所有的 ObLoadDatumRow 添加完成后,建立 sstable 即可

cast

从 ObLoadCSVPaser 得到的 ObNewRow,其中 ObObj 的类型均为 ObVarcharType

需要根据 schema,将其转换为对应的类型

然后将 ObObj 转换为 ObStorageDatum

ObDatum 的结构如下

https://gitee.com/oceandb-space/oceanbase/blob/9ede93b3a8129dea89cf7bf12c64b0ae58e2c984/src/share/datum/ob_datum.h#L167

ObStorageDatum 继承了 ObDatum

https://gitee.com/oceandb-space/oceanbase/blob/9ede93b3a8129dea89cf7bf12c64b0ae58e2c984/src/storage/blocksstable/ob_datum_row.h#L288

ObLoadDatumRow 和 blocksstable::ObDatumRow 中包含 ObStorageDatum

sort

ob_parallel_external_sort.h 作为模板类的头文件,似乎打不了断点

init

add item

MemorySortRound add_item https://gitee.com/oceandb-space/oceanbase/blob/9ede93b3a8129dea89cf7bf12c64b0ae58e2c984/src/storage/ob_parallel_external_sort.h#L1478

do sort

此处 curr round 和 next round 均为 ExternalSortRound

https://gitee.com/oceandb-space/oceanbase/blob/9ede93b3a8129dea89cf7bf12c64b0ae58e2c984/src/storage/ob_parallel_external_sort.h#L1768

在默认配置下,实际上 loop for all fragment 这一步,在数据量小于 256 GB 时并不会进行

DAG

do_merge -> do_one_run -> share::dag_yield

https://gitee.com/oceandb-space/oceanbase/blob/b27c6c973099645a7fb13f32fdfb6f12c89a8130/src/storage/ob_parallel_external_sort.h#L1249

get next item

如果全部数据能够存放在 buf mem limit 中,则从 MemorySortRound 获取 next item,否则从 ExternalSortRound 获取 next item

  1. MemorySortRound get next item

此处的 iterator 为 ObMemoryFragmentIterator

  1. ExternalSortRound get next item

此处的 iterator 为 ObFragmentReaderV2

sstable

append row

写宏块 https://gitee.com/oceandb-space/oceanbase/blob/55eda68d9e8c9e3793a2083ba9de58c779171663/src/storage/blocksstable/ob_macro_block_writer.cpp#L386

其中存在 index builder 的操作

create sstable

ObSSTableIndexBuilder 的结果传递给 ObSSTableMergeRes,再传递给 ObTabletCreateSSTableParam

正式 create sstable https://gitee.com/oceandb-space/oceanbase/blob/55eda68d9e8c9e3793a2083ba9de58c779171663/src/sql/engine/cmd/ob_load_data_direct_demo.cpp#L1092

更新存储 https://gitee.com/oceandb-space/oceanbase/blob/55eda68d9e8c9e3793a2083ba9de58c779171663/src/sql/engine/cmd/ob_load_data_direct_demo.cpp#L1100

macro

SMART_VAR

SMART_VAR(ObSSTableMergeRes, merge_res) {
...
}

gcc -E

STATIC_ASSERT(sizeof(ObSSTableMergeRes) < LOCAL_VARIABLE_SIZE_HARD_LIMIT, "large object!!!");
bool merge_res_from_heap = false;
ret = OB_SUCCESS != ret ? ret : precheck<ObSSTableMergeRes>((sizeof(ObSSTableMergeRes) > SMART_VAR_MAX_SINGLE_STACK_USE_SIZE), merge_res_from_heap);
char merge_res_buf[OB_SUCCESS == ret && !merge_res_from_heap ? sizeof(ObSSTableMergeRes) : 0];
SV<ObSSTableMergeRes, (sizeof(ObSSTableMergeRes) > SMART_VAR_MAX_SINGLE_STACK_USE_SIZE)> merge_res_s{ret, merge_res_from_heap, [&](void *ptr) { return new (ptr) ObSSTableMergeRes{}; }};
if (OB_SUCCESS == merge_res_s.ret_ && !merge_res_from_heap) { merge_res_s.v_ = new (merge_res_buf) ObSSTableMergeRes{}; }
auto &merge_res = merge_res_s.get();
auto merge_res_ret = merge_res_s.ret_;
if (OB_SUCC(merge_res_ret)) {
...
}

LIKELY

/* __builtin_expect(A, B) evaluates to A, but notifies the compiler that
the most likely value of A is B. This feature was added at some point
between 2.95 and 3.0. Let's use 3.0 as the lower bound for now. */
#if (GCC_VERSION < 3000)
#define __builtin_expect(a, b) (a)
#endif
// OB_LIKELY and OB_UNLIKELY may be used in C++ and C code, so using !!1 and !!0 instead of true and false
#define OB_LIKELY(x) __builtin_expect(!!(x),!!1)
#define OB_UNLIKELY(x) __builtin_expect(!!(x),!!0)
#define OB_SUCC(statement) (OB_LIKELY(::oceanbase::common::OB_SUCCESS == (ret = (statement))))
#define OB_FAIL(statement) (OB_UNLIKELY(::oceanbase::common::OB_SUCCESS != (ret = (statement))))

LOG K

When key is the name of value, can use K(value) for key and value.
example:
SQL_LOG(INFO, "test for K", K(value));
When key is the name of a member in class, can use K_
example:
value_ is a member of class T.Then in class T, can use
SQL_LOG(INFO, "test for K_", K_(value))

KR(ret) 与 OB_FAIL 配合使用

KP -> pointer

lsm

compaction

https://developer.aliyun.com/article/758369

https://zhuanlan.zhihu.com/p/112574579

可以减小空间放大和读放大

较高层级使用

可以减小写放大

较低层级使用

delete and tombstone

https://developer.aliyun.com/article/765567

ob design

75e0189c6bba41be9b9002badbd348cc.png

a587c805eda7492a887936f68ee88cee.png

a309f2a29b57485baaaf7ab1599c6a6c.png

9cb37dc5cbc841a999de7764b8e8566e.png

9c548fa6106046ef99eb3028dfb07c2f.png

a9bf3fded75a4b478b03d48f557d1ac9.png

storage

存储分层结构

6fbeb50d175b45caa6aa0d759e542423.png

类图

3886742449864f808af3fd178b52fc61.png

内存数据格式

020ea25dcad04b12ad6abc6e0986afac.png

磁盘文件格式

38abedf866334c1a812f8e8bf770eb5a.png

54958deae66043f2808d2767ae465bb9.png

openvpn

sudo openvpn --config /etc/openvpn/config.ovpn

172.16.0.30

能 ping 通,但是 ssh 连接超时

netstat -nr
ifconfig
> tree
.
├── ca.crt
├── client
├── config.ovpn
├── server
├── vsc-0jlq9yfjbz45zm5gwxl0h.crt
└── vsc-0jlq9yfjbz45zm5gwxl0h.key

config.ovpn 内容如下

client
dev tun
proto tcp
remote 39.101.78.156 1194
resolv-retry infinite
nobind
persist-key
persist-tun
ca /etc/openvpn/ca.crt
cert /etc/openvpn/vsc-0jlq9yfjbz45zm5gwxl0h.crt
key /etc/openvpn/vsc-0jlq9yfjbz45zm5gwxl0h.key
cipher AES-128-CBC
;comp-lzo
verb 4

ssh-copy-id

免密登录云服务器

修改 ~/.ssh/config

Host ob
HostName xxx
User root
Host ob-huawei
HostName xxx
User root
Host tencent
HostName xxx
User ubuntu

然后键入 ssh-copy-id ob 配置,之后只需 ssh ob 即可

如果云服务器设置不允许使用密码登录,上述方案会失效

将 ip address 写入配置文件,之后只需 ssh -i <已下载的与实例关联的私钥文件的路径> tencent 即可

script

写了一个 python 脚本自动化下列任务

区分 python 的三个库

https://stackoverflow.com/questions/13606867/what-is-the-difference-between-multiprocessing-and-subprocess

subprocess 主要用于启动 obclient 与 observer 通信,使用 subprocess.run 为同步操作

multiprocessing 主要用于启动 perf,为异步操作,但是无法优雅的终止

尝试在脚本中 kill perf 进程,会导致无法正确生成 perf.data

于是只能在 load data 结束后手动发送 SIGINT 信号,然后手动生成火焰图

boot autodeploy

修改 /etc/rc.d/rc.local

echo "*** START DEPLOYMENT ***" && obd cluster destroy obadvanced ; obd cluster autodeploy obadvanced -c /root/ob-advanced-auto.yaml -A -f && echo "*** FINISH DEPLOYMENT ***"

并添加如下权限

chmod +x /etc/rc.d/rc.local

使 centos 云服务器 boot 后自动部署 ob

似乎无效,还是在 ~/.bashrc 里面加 alias 吧

perf

https://www.brendangregg.com/linuxperf.html

hypothesis

csv 文件格式如下

127747202|5245697|245698|2|1|1642.43|0.09|0.08|N|O|1998-02-14|1998-02-09|1998-02-20|TAKE BACK RETURN|MAIL|hely unusual theod|
144558691|8792293|292294|1|14|19388.04|0.06|0.02|R|F|1994-04-19|1994-02-28|1994-05-08|COLLECT COD|AIR|ial deposits integrate. slyly ev|
228942724|44747|419748|4|5|8458.70|0.08|0.03|A|F|1992-10-08|1992-11-07|1992-11-01|NONE|FOB|. slyly regular deposits about the silen|
14686916|7112590|237605|5|24|38453.76|0.01|0.06|R|F|1994-01-01|1994-03-02|1994-01-08|TAKE BACK RETURN|MAIL| regular pinto beans boost. idly pending|
220547303|6466003|466004|2|44|42621.92|0.09|0.07|R|F|1992-05-07|1992-07-14|1992-05-08|DELIVER IN PERSON|TRUCK|usly blithe i|
48064226|6140776|390801|4|8|14531.76|0.08|0.02|R|F|1994-09-05|1994-11-03|1994-09-29|NONE|MAIL|se carefully ironic d|
72336418|5741124|491147|1|10|11648.40|0.09|0.08|N|O|1996-03-01|1996-03-27|1996-03-22|NONE|TRUCK|lyly even |
27866627|2226889|226890|1|50|90788.50|0.03|0.00|A|F|1992-12-15|1992-11-14|1992-12-19|NONE|AIR|ts cajole about the furi|
97710950|8161257|36306|7|40|52714.00|0.07|0.07|N|O|1996-10-31|1996-11-07|1996-11-16|DELIVER IN PERSON|SHIP|pinto beans are i|
224844870|4701953|451972|2|39|76234.08|0.07|0.06|R|F|1993-02-04|1993-03-17|1993-02-08|DELIVER IN PERSON|AIR|haggle slyly|

注意每一行的结尾有一个 field_term_str

field_term_strline_term_str 均为单字符

不考虑转义字符,不考虑编码

不存在 NULL 字段

type cast

映射关系如下

row idxdatum idxsql typeob type
00INTEGERObInt32Type
31INTEGERObInt32Type
12INTEGERObInt32Type
23INTEGERObInt32Type
44DECIMAL(15, 2)ObNumberType
55DECIMAL(15, 2)ObNumberType
66DECIMAL(15, 2)ObNumberType
77DECIMAL(15, 2)ObNumberType
88CHAR(1)ObCharType
99CHAR(1)ObCharType
1010DATEObDateType
1111DATEObDateType
1212DATEObDateType
1313CHAR(25)ObCharType
1414CHAR(10)ObCharType
1515VARCHAR(44)ObVarcharType
CSV -> ObNewRow (all string) -> ObNewRow (schema) -> ObStorageDatum

ObNewRow (all string) 中 obj 的 type 为 ObVarcharType

ObNewRow (all string) -> ObNewRow (schema) 需要下述四种转换函数

https://gitee.com/oceandb-space/oceanbase/blob/5da1176773ad23b26880bb74a7ee33c811e84331/src/share/object/ob_obj_cast.cpp#L4584

https://gitee.com/oceandb-space/oceanbase/blob/5da1176773ad23b26880bb74a7ee33c811e84331/deps/oblib/src/lib/timezone/ob_time_convert.cpp#L788

https://gitee.com/oceandb-space/oceanbase/blob/c360ba5f573f222d691d005ba499ade685d1157d/src/share/object/ob_obj_cast.cpp#L4480

https://gitee.com/oceandb-space/oceanbase/blob/c360ba5f573f222d691d005ba499ade685d1157d/deps/oblib/src/lib/number/ob_number_v2.h#L1629

https://gitee.com/oceandb-space/oceanbase/blob/c360ba5f573f222d691d005ba499ade685d1157d/deps/oblib/src/lib/number/ob_number_v2.h#L1540

https://gitee.com/oceandb-space/oceanbase/blob/c360ba5f573f222d691d005ba499ade685d1157d/deps/oblib/src/lib/number/ob_number_v2.h#L585

https://gitee.com/oceandb-space/oceanbase/blob/c360ba5f573f222d691d005ba499ade685d1157d/deps/oblib/src/lib/number/ob_number_v2.cpp#L763

thread pool

Thread

A wrapper of Linux thread that supports normal thread operations.

https://gitee.com/oceandb-space/oceanbase/blob/b56869e4074d32a6036eb106af15ac5a6fbf3178/deps/oblib/src/lib/thread/thread.h#L24

__th_start 作为 runnable entry 的 wrapper 函数

https://gitee.com/oceandb-space/oceanbase/blob/b56869e4074d32a6036eb106af15ac5a6fbf3178/deps/oblib/src/lib/thread/thread.cpp#L227

Threads (ThreadPool)

线程池

using ThreadPool = Threads;

管理 Thread

Thread **threads_;

内部启动 Thread

https://gitee.com/oceandb-space/oceanbase/blob/b56869e4074d32a6036eb106af15ac5a6fbf3178/deps/oblib/src/lib/thread/threads.cpp#L179

其 runnable entry 为 run

void Threads::run(int64_t idx)
{
thread_idx_ = static_cast<uint64_t>(idx);
Worker worker;
run1();
}

其中 thread_idx_ 为线程局部变量,runrun1 均为虚函数,可被 override

submit 函数似乎被弃用了

IRunWrapper 用于 pre_runend_run

https://gitee.com/oceandb-space/oceanbase/blob/b56869e4074d32a6036eb106af15ac5a6fbf3178/deps/oblib/src/lib/thread/threads.cpp#L144

对应的单测

https://gitee.com/oceandb-space/oceanbase/blob/b56869e4074d32a6036eb106af15ac5a6fbf3178/deps/oblib/unittest/lib/thread/test_threads.cpp#L100

ObSimpleThreadPool

override run1 函数

内部通过 ObLightyQueue 管理 task,ObLightyQueue 存在一个 capacity

提供虚函数 handlehandle_drop 处理 task

task 与 thread pool 耦合

相当于 thread_num 个线程处理最多容纳 task_num_limit 个的 task 的 queue 中的 task

对应的单测

https://gitee.com/oceandb-space/oceanbase/blob/d9cc67c39e31e947577a6f963f9f047ed9eab0a3/deps/oblib/unittest/lib/thread/test_simple_thread_pool.cpp#L79

ObDynamicThreadPool

类似 ObSimpleThreadPool,但内部线程数量动态可变,不超过一个上限

ObDynamicThreadInfo 结构体封装了线程信息

ObDynamicThreadTask 中提供虚函数 process 处理 task

task 与 thread pool 解耦合

对应的单测

https://gitee.com/oceandb-space/oceanbase/blob/d9cc67c39e31e947577a6f963f9f047ed9eab0a3/deps/oblib/unittest/lib/thread/test_dynamic_thread_pool.cpp#L104

ObReentrantThread

start and stop run task, can be called repeatedly, if created.

UNITY_BUILD

https://cmake.org/cmake/help/latest/prop_tgt/UNITY_BUILD.html

cmake/Utils.cmake
76: set_target_properties(${target} PROPERTIES UNITY_BUILD ON)
77: set_target_properties(${target} PROPERTIES UNITY_BUILD_CODE_AFTER_INCLUDE "${unity_after}")
78: set_target_properties(${target} PROPERTIES UNITY_BUILD_MODE GROUP)

log

https://www.oceanbase.com/docs/community-observer-cn-10000000000901115

日志格式

这里以一条日志举例说明

[2022-07-01 10:49:10.272860] INFO [SQL.EXE] implicit_end_trans (ob_sql_trans_control.cpp:161) [119171][T1004_Occam][T1][Y0-00
00000000000000-0-0] implicit end trans(is_rollback=false, exec_ctx.get_execution_id()=18446744073709551615, callback=NULL)

对应的日志信息如下

[时间] 日志级别 [所属模块] 函数名 (文件:行号) [线程 id][线程名][所属租户][trace_id]

日志级别

从低到高有 6 种,DEBUG、TRACE、INFO、WARN、USER_ERR、ERROR

其中 ERROR 日志比较特殊,会将打日志时所在的堆栈打印出来(需要通过符号表解析)

注意

开启 DEBUG 日志将耗费大量资源,在较新版本中,DEBUG 日志在 release 编译下会自动去掉,即使开启也无法生效

设置日志打印级别

修改日志限流量

printf

仍然输出到日志中

parallel cast

基本思路

内存管理

内存依赖关系

现象观察

data dist

处理 dbgen 生成的 50 GB 量级数据

awk -F| '{print $1}' demo.csv > key-ori

然后使用 python pandas 包

>>> import pandas as pd
>>> df = pd.read_csv("key-ori")
>>> df.describe()
127747202
count 3.000058e+08
mean 1.500031e+08
std 8.659724e+07
min 1.000000e+00
25% 7.501517e+07
50% 1.500066e+08
75% 2.249938e+08
max 3.000000e+08
>>> df.quantile(q=0.1)
127747202 30000768.0

对于精确的数据分布,可以参考 dbgen 源码

summary

赛题描述

尝试