Type
The foundation of the LLVM type system is the Type class
All kinds of types are identified by the following enumeration
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/Type.h#L54
enum TypeID {
// PrimitiveTypes
HalfTyID = 0 , ///< 16-bit floating point type
BFloatTyID , ///< 16-bit floating point type (7-bit significand)
FloatTyID , ///< 32-bit floating point type
DoubleTyID , ///< 64-bit floating point type
X86_FP80TyID , ///< 80-bit floating point type (X87)
FP128TyID , ///< 128-bit floating point type (112-bit significand)
PPC_FP128TyID , ///< 128-bit floating point type (two 64-bits, PowerPC)
VoidTyID , ///< type with no size
LabelTyID , ///< Labels
MetadataTyID , ///< Metadata
X86_MMXTyID , ///< MMX vectors (64 bits, X86 specific)
X86_AMXTyID , ///< AMX vectors (8192 bits, X86 specific)
TokenTyID , ///< Tokens
// Derived types... see DerivedTypes.h file.
IntegerTyID , ///< Arbitrary bit width integers
FunctionTyID , ///< Functions
PointerTyID , ///< Pointers
StructTyID , ///< Structures
ArrayTyID , ///< Arrays
FixedVectorTyID , ///< Fixed width SIMD vector type
ScalableVectorTyID ///< Scalable SIMD vector type
};
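Here
primitive types are types that have no dedicated subclass of Type
derived types are types that do have a subclass
All structurally equivalent types share a single object instance (a singleton) within an LLVMContext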
[Figure: inheritance hierarchy of the Type class]
The LLVMContext class holds a top-level const pointer to LLVMContextImpl, which is the classic PImpl design
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/LLVMContext.h#L69
LLVMContextImpl * const pImpl;
LLVMContextImpl holds the singletons of the primitive types above and of the common integer types; they are initialized in its constructor
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.cpp#L40
LLVMContextImpl :: LLVMContextImpl (LLVMContext & C)
: DiagHandler ( std :: make_unique < DiagnosticHandler >()),
VoidTy (C, Type::VoidTyID), LabelTy (C, Type::LabelTyID),
HalfTy (C, Type::HalfTyID), BFloatTy (C, Type::BFloatTyID),
FloatTy (C, Type::FloatTyID), DoubleTy (C, Type::DoubleTyID),
MetadataTy (C, Type::MetadataTyID), TokenTy (C, Type::TokenTyID),
X86_FP80Ty (C, Type::X86_FP80TyID), FP128Ty (C, Type::FP128TyID),
PPC_FP128Ty (C, Type::PPC_FP128TyID), X86_MMXTy (C, Type::X86_MMXTyID),
X86_AMXTy (C, Type::X86_AMXTyID), Int1Ty (C, 1 ), Int8Ty (C, 8 ),
Int16Ty (C, 16 ), Int32Ty (C, 32 ), Int64Ty (C, 64 ), Int128Ty (C, 128 ) {
if ( OpaquePointersCL . getNumOccurrences ()) {
OpaquePointers = OpaquePointersCL;
}
}
The Type class also provides corresponding static methods for obtaining these singletons
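The uniquing scheme described above can be pictured with a small standalone model. This is only a sketch in plain C++: ToyType, ToyContext and their members are hypothetical names, not LLVM classes, and the real LLVMContextImpl is considerably more involved. It shows primitive singletons owned by a context, and integer types created on first request and cached by bit width.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-in for llvm::Type: a kind tag plus a bit width.
struct ToyType {
  enum Kind { Void, Float, Integer } kind;
  unsigned bits; // only meaningful for Integer
};

// Hypothetical stand-in for the context that owns every type object.
class ToyContext {
  ToyType voidTy{ToyType::Void, 0};    // primitive singletons live in the context
  ToyType floatTy{ToyType::Float, 32};
  std::map<unsigned, std::unique_ptr<ToyType>> intTypes; // cache keyed by bit width

public:
  ToyType *getVoidTy() { return &voidTy; }
  ToyType *getFloatTy() { return &floatTy; }

  // Returns the unique integer type of this width, creating it on first use.
  ToyType *getIntTy(unsigned bits) {
    std::unique_ptr<ToyType> &slot = intTypes[bits];
    if (!slot)
      slot = std::make_unique<ToyType>(ToyType{ToyType::Integer, bits});
    return slot.get();
  }
};
```

Because every request for the same width returns the same pointer, type equality inside one context reduces to pointer comparison; two separate contexts never share type objects.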
Floating Point Types
primitive type
| Type | Description |
| half | 16-bit floating-point value |
| bfloat | 16-bit “brain” floating-point value (7-bit significand). Provides the same number of exponent bits as float, so that it matches its dynamic range, but with greatly reduced precision. Used in Intel’s AVX-512 BF16 extensions and Arm’s ARMv8.6-A extensions, among others. |
| float | 32-bit floating-point value |
| double | 64-bit floating-point value |
| fp128 | 128-bit floating-point value (113-bit significand) |
| x86_fp80 | 80-bit floating-point value (X87) |
| ppc_fp128 | 128-bit floating-point value (two 64-bits) |
The float and double types are the ones most commonly used
Void Type
primitive type
The singleton of the void type can be obtained with the following code
llvm::Type * type = llvm:: Type :: getVoidTy (TheContext);
The void type represents no value and has no size; it only serves as a placeholder, for example as the return type of a function
define dso_local void @foo() #0 {
ret void
}
Label Type
primitive type
Used to label basic blocks. For example, a max function might correspond to the following LLVM IR
define dso_local i32 @max(i32 noundef %0, i32 noundef %1) #0 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
store i32 %0, i32* %3, align 4
store i32 %1, i32* %4, align 4
%5 = load i32, i32* %3, align 4
%6 = load i32, i32* %4, align 4
%7 = icmp sgt i32 %5, %6
br i1 %7, label %8, label %10
8: ; preds = %2
%9 = load i32, i32* %3, align 4
br label %12
10: ; preds = %2
%11 = load i32, i32* %4, align 4
br label %12
12: ; preds = %10, %8
%13 = phi i32 [ %9, %8 ], [ %11, %10 ]
ret i32 %13
}
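Note the implicit %2 here: the unnamed entry block takes the next number after the arguments %0 and %1, which is why the first instruction is %3 and the preds comments refer to %2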
Token Type
primitive type
The token type is used when a value is associated with an instruction but all uses of the value must not attempt to introspect or obscure it. As such, it is not appropriate to have a phi or select of type token.
The identifier ‘none’ is recognized as an empty token constant and must be of token type.
Not covered further here
Metadata Type
primitive type
The metadata type represents embedded metadata. No derived types may be created from metadata except for function arguments.
LLVM IR allows metadata to be attached to instructions and global objects in the program that can convey extra information about the code to the optimizers and code generator. One example application of metadata is source-level debug information. There are two metadata primitives: strings and nodes.
Metadata does not have a type, and is not a value. If referenced from a call instruction, it uses the metadata type.
All metadata are identified in syntax by an exclamation point (‘!’).
For example
!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 1}
!4 = !{i32 7, !"frame-pointer", i32 2}
!5 = !{!"clang version 14.0.6"}
Integer Type
The syntax is iN, where N is the bit width of the integer
The singleton of the i32 type can be obtained with the following code
llvm::Type * type = llvm:: Type :: getInt32Ty (TheContext);
While the i32 type is being constructed, its bit width is stored in the SubclassData field of the Type class
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/Type.h#L86
TypeID ID : 8 ; // The current base type of this type.
unsigned SubclassData : 24 ; // Space for subclasses to store data.
// Note that this should be synchronized with
// MAX_INT_BITS value in IntegerType class.
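Limited by the size of this field, the bit width of an integer type lies in the range $[1, 2^{23}]$
In other words, the widest integer type LLVM can represent has $2^{23} = 8388608$ bits, so its largest unsigned value is $2^{2^{23}} - 1 = 2^{8388608} - 1$
Note that integer types here carry no signedness information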
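How the 24-bit field bounds the width can be tried out in isolation. The following is a sketch under stated assumptions: ToyTypeHeader, kMaxIntBits, isValidIntWidth and roundTripWidth are hypothetical names that only mirror the two bit-fields quoted above, and the upper bound is the $2^{23}$ stated in the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the two bit-fields quoted from llvm::Type.
struct ToyTypeHeader {
  unsigned ID : 8;            // which kind of type this is
  unsigned SubclassData : 24; // integer types keep their bit width here
};

// Upper bound on the integer bit width as stated in the text (2^23).
constexpr std::uint32_t kMaxIntBits = 1u << 23;

// Returns true if bits is a width this toy model accepts.
constexpr bool isValidIntWidth(std::uint32_t bits) {
  return bits >= 1 && bits <= kMaxIntBits;
}

// Stores a width in the 24-bit field and reads it back.
inline std::uint32_t roundTripWidth(std::uint32_t bits) {
  ToyTypeHeader h{};
  h.SubclassData = bits; // any value below 2^24 fits without loss
  return h.SubclassData;
}
```

The field itself could hold values up to $2^{24} - 1$; the tighter bound of $2^{23}$ is a separate limit that has to be kept in sync with it, as the quoted source comment notes.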
LLVMContextImpl caches all integer types in the following data structure
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1524
DenseMap <unsigned , IntegerType * > IntegerTypes;
Pointer Type
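A pointer type is typically used to reference an object at a given memory location
A pointer type may specify the numbered address space in which the pointee resides; the default is 0
AddrSpace is likewise stored in SubclassData
The singleton of the i32* type can be obtained with the following code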
llvm::Type * type = llvm:: Type :: getInt32PtrTy (TheContext, 0 );
The method above wraps PointerType::get
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/Type.cpp#L301
PointerType * Type :: getInt32PtrTy ( LLVMContext & C , unsigned AS ) {
return getInt32Ty (C)-> getPointerTo (AS);
}
where
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/Type.cpp#L776
PointerType * Type :: getPointerTo ( unsigned AddrSpace ) const {
return PointerType:: get (const_cast < Type * > ( this ), AddrSpace);
}
LLVMContextImpl caches all pointer types in the following data structures
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1538
DenseMap < Type * , PointerType *> PointerTypes; // Pointers in AddrSpace = 0
DenseMap < std::pair < Type * , unsigned> , PointerType *> ASPointerTypes;
Note that the pointer type here carries the type of its pointee
The pointee type is stored in the ContainedTys member of the Type class
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/Type.h#L106
/// Keeps track of how many Type*'s there are in the ContainedTys list.
unsigned NumContainedTys = 0 ;
/// A pointer to the array of Types contained by this Type. For example, this
/// includes the arguments of a function type, the elements of a structure,
/// the pointee of a pointer, the element type of an array, etc. This pointer
/// may be 0 for types that don't contain other types (Integer, Double,
/// Float).
Type * const * ContainedTys = nullptr ;
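The community discussion about these explicit pointee types ran as follows
Historically, LLVM IR was something like a type-safe subset of C; typed pointers provided an extra layer of checking and made type checking in frontends easier
As LLVM evolved, it became clear that this pointer design does not serve optimization well
Many operations do not actually care about the pointee type and usually end up taking an arbitrary pointer type such as i8*, so pointer conversions (bitcast) have to be inserted, which adds overhead
Note that LLVM has no void*; see the following code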
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/Type.cpp#L590
bool PointerType :: isValidElementType ( Type * ElemTy ) {
return ! ElemTy -> isVoidTy () && ! ElemTy -> isLabelTy () &&
! ElemTy -> isMetadataTy () && ! ElemTy -> isTokenTy () &&
! ElemTy -> isX86_AMXTy ();
}
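The consensus the community eventually reached is that explicit pointee types cost more than they are worth and should therefore be removed
LLVM thus introduced the opaque pointer type, a pointer type that carries no pointee type information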
For example, a typed-pointer load such as load i64, i64* %p
is written load i64, ptr %p in its opaque form
At the C API level, the function for building this instruction changed from LLVMBuildLoad to LLVMBuildLoad2, which takes the loaded type explicitly
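The difference between the two pointer designs can be modelled with two caches. This is a toy sketch, not LLVM code: ToyType, ToyPtrType and ToyPointerCache are hypothetical names, and keying opaque pointers by address space alone is an assumption made for the illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

struct ToyType {};    // hypothetical stand-in for a pointee type
struct ToyPtrType {}; // hypothetical stand-in for a pointer type

class ToyPointerCache {
  std::map<std::pair<const ToyType *, unsigned>, std::unique_ptr<ToyPtrType>> typed;
  std::map<unsigned, std::unique_ptr<ToyPtrType>> opaque;

public:
  // Typed pointers: one unique object per (pointee, address space).
  ToyPtrType *getTyped(const ToyType *pointee, unsigned addrSpace) {
    std::unique_ptr<ToyPtrType> &slot = typed[std::make_pair(pointee, addrSpace)];
    if (!slot)
      slot = std::make_unique<ToyPtrType>();
    return slot.get();
  }

  // Opaque pointers: one unique object per address space; the pointee is not part of the key.
  ToyPtrType *getOpaque(unsigned addrSpace) {
    std::unique_ptr<ToyPtrType> &slot = opaque[addrSpace];
    if (!slot)
      slot = std::make_unique<ToyPtrType>();
    return slot.get();
  }
};
```

In the typed model, a pointer to one element type and a pointer to another are different objects, so passing one where the other is expected needs a cast. In the opaque model there is nothing to cast between.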
Array Type
An array type has two properties
number of elements
Zero elements are allowed here, which makes it possible to model a flexible array member
underlying data type
Some examples follow
| Syntax | Semantics |
| [40 x i32] | Array of 40 32-bit integer values. |
| [3 x [4 x i32]] | 3x4 array of 32-bit integer values. |
| [2 x [3 x [4 x i16]]] | 2x3x4 array of 16-bit integer values. |
The singleton of the [40 x i32] type can be obtained with the following code
llvm::Type * type = llvm:: ArrayType :: get ( llvm :: Type :: getInt32Ty (TheContext), 40 );
As with pointer types, the underlying data type of an array type is stored in the ContainedTys member of the Type class
LLVMContextImpl caches all array types in the following data structure
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1536
DenseMap < std::pair < Type * , uint64_t> , ArrayType * > ArrayTypes;
Vector Type
A vector type is similar to an array type but is used for SIMD; it is not considered an aggregate type, but it is a first class type
Values of these types are the only ones which can be produced by instructions .
A vector type has three properties
number of elements
Zero elements are not allowed here
underlying primitive data type
Only integer, floating-point or pointer types are allowed
scalable property
If false, the type is a FixedVectorType; otherwise it is a ScalableVectorType
Some examples follow
| Syntax | Semantics |
| <4 x i32> | Vector of 4 32-bit integer values. |
| <vscale x 4 x i32> | Vector with a multiple of 4 32-bit integer values. |
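For a ScalableVectorType, vscale is not known at compile time; it is a positive integer fixed by the hardware that stays constant at run time
The singleton of the <vscale x 4 x i32> type can be obtained with the following code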
llvm::Type * type = llvm:: VectorType :: get ( llvm :: Type :: getInt32Ty (TheContext), 4 , true );
LLVMContextImpl caches all vector types in the following data structure
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1537
DenseMap < std::pair < Type * , ElementCount > , VectorType * > VectorTypes;
Note the ElementCount class here; it is constructed in
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/DerivedTypes.h#L427
static VectorType * get ( Type * ElementType , unsigned NumElements , bool Scalable ) {
return VectorType:: get (ElementType, ElementCount:: get (NumElements, Scalable));
}
This calls the following method of its parent class LinearPolySize
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/DerivedTypes.h#L427
static LeafTy get ( ScalarTy MinVal , bool Scalable ) {
return static_cast < LeafTy > ( LinearPolySize (MinVal, Scalable ? 1 : 0 ));
}
There is a comment here
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/DerivedTypes.h#L427
/// UnivariateLinearPolyBase is a base class for ElementCount and TypeSize.
/// Like LinearPolyBase it tries to represent a linear polynomial
/// where only one dimension can be set at any time, e.g.
/// 0 * scale0 + 0 * scale1 + ... + cJ * scaleJ + ... + 0 * scaleK
/// The dimension that is set is the univariate dimension.
Roughly speaking, when the scalable property is true, the corresponding dimension may be scaled by a factor that differs from one hardware environment to another
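The idea of a known minimum plus an optional hardware-dependent multiplier can be written down in a few lines. This is a sketch only: ToyElementCount and lanes are hypothetical names, and passing vscale as a plain argument is a simplification, since in practice the value only becomes known at run time.

```cpp
#include <cassert>

// Hypothetical toy element count: a known minimum plus a scalable flag.
struct ToyElementCount {
  unsigned minElements;
  bool scalable;
};

// Number of lanes once vscale is known; vscale is ignored for fixed vectors.
constexpr unsigned lanes(ToyElementCount ec, unsigned vscale) {
  return ec.scalable ? ec.minElements * vscale : ec.minElements;
}
```

With this model, <4 x i32> always has 4 lanes, while <vscale x 4 x i32> has 4 lanes when vscale is 1 and 8 lanes when vscale is 2.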
In practice, for a given hardware target, the vector types LLVM generates are usually FixedVectorType
For example, using AVX intrinsics to take the absolute value of a vector holding 8 float values
#include <immintrin.h>
__m256 _mm256_abs_ps ( __m256 vec ) {
__m256 float_zero = _mm256_set1_ps ( 0 );
__m256 mask_lt_zero = _mm256_cmp_ps (vec, float_zero, _CMP_LT_OQ);
__m256 vec_neg = _mm256_sub_ps (float_zero, vec);
return _mm256_blendv_ps (vec, vec_neg, mask_lt_zero);
}
Running clang -S -emit-llvm a.cpp -O3 -march=native generates the following IR
define dso_local noundef <8 x float> @_Z13_mm256_abs_psDv8_f(<8 x float> noundef %0) local_unnamed_addr #0 {
%2 = fcmp olt <8 x float> %0, zeroinitializer
%3 = fsub <8 x float> zeroinitializer, %0
%4 = select <8 x i1> %2, <8 x float> %3, <8 x float> %0
ret <8 x float> %4
}
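Note that %0, %3 and %4 all have type <8 x float>, while the comparison result %2 has type <8 x i1>; since these values are produced directly by instructions, this also shows that vector types are first class types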
Structure Type
There are two kinds of structure type
Literal: anonymous, uniqued within a context, and must have a body
LLVMContextImpl caches all literal struct types in the following data structure
using StructTypeSet = DenseSet < StructType * , AnonStructTypeKeyInfo >;
StructTypeSet AnonStructTypes;
The key used by AnonStructTypeKeyInfo contains the following members
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L94
ArrayRef < Type * > ETypes;
bool isPacked;
The singleton of the { i32, i32, i32 } type can be obtained as follows
llvm::Type * i32 = llvm:: Type :: getInt32Ty (TheContext);
std::array < llvm::Type * , 3 > elems = {i32, i32, i32};
llvm::Type * type = llvm:: StructType :: get (TheContext, elems, false );
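LLVM gives the ArrayRef class a large number of conversion constructors, so an ArrayRef can be built from a pointer, a vector, a std::array, a C array and more
Identified: may be unnamed, is not structurally uniqued, and may lack a body (opaque)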
Prior to the LLVM 3.0 release, identified types were structurally uniqued. Only literal types are uniqued in recent versions of LLVM.
LLVMContextImpl keeps track of all named identified struct types in the following data structure
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1530
StringMap < StructType * > NamedStructTypes;
unsigned NamedStructTypesUniqueID = 0 ;
The type %struct.A = type { i32, i32, i32 } can be constructed as follows
llvm::Type * i32 = llvm:: Type :: getInt32Ty (TheContext);
std::array < llvm::Type * , 3 > elems = {i32, i32, i32};
llvm::Type * type = llvm:: StructType :: create (TheContext, elems, "A" , false );
In fact, a structure type defines the following flags, which are stored in SubclassData
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/DerivedTypes.h#L216
enum {
/// This is the contents of the SubClassData field.
SCDB_HasBody = 1 ,
SCDB_Packed = 2 ,
SCDB_IsLiteral = 4 ,
SCDB_IsSized = 8
};
A few examples follow
struct A ;
struct B {
A * a;
};
might generate the following LLVM IR
%struct.B = type { %struct.A* }
%struct.A = type opaque
Here struct A has no body, so it is an opaque structure type
As this shows, opaque structure types exist to support forward declarations
For %struct.A, the bits corresponding to SCDB_HasBody and SCDB_IsSized are 0
For the implementation of isSized, see https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/Type.cpp#L554
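Reading these flags back is plain bit masking. The sketch below reuses the numeric values of the SCDB_* enum quoted above, but the enumerator and helper names here are hypothetical and stand alone from LLVM.

```cpp
#include <cassert>

// Same numeric values as the SCDB_* flags quoted above; names are illustrative.
enum : unsigned { HasBody = 1, Packed = 2, IsLiteral = 4, IsSized = 8 };

// A struct is opaque when the HasBody bit is clear.
constexpr bool isOpaque(unsigned data) { return (data & HasBody) == 0; }
constexpr bool isPacked(unsigned data) { return (data & Packed) != 0; }
constexpr bool isLiteral(unsigned data) { return (data & IsLiteral) != 0; }
```

Under this model the forward-declared %struct.A above has all bits clear, whereas a packed identified struct with a body would carry HasBody | Packed.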
struct __attribute__ ((packed)) A {
int i;
short s;
char c;
};
might generate the following LLVM IR
%struct.A = type <{ i32, i16, i8 }>
Note the extra < and > here
For %struct.A, the bit corresponding to SCDB_Packed is set to 1
struct A {
struct {
int i;
int j;
int k;
} x;
struct {
int i;
int j;
int k;
} y;
};
might generate the following LLVM IR
%struct.A = type { %struct.anon, %struct.anon.0 }
%struct.anon = type { i32, i32, i32 }
%struct.anon.0 = type { i32, i32, i32 }
Note that the anonymous structs here are still identified struct types; LLVM handles missing and duplicate names automatically
Function Type
A function signature, consisting of a return type and a list of parameter types
As with literal struct types, LLVMContextImpl caches all function types in the following data structure
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1526
using FunctionTypeSet = DenseSet < FunctionType * , FunctionTypeKeyInfo >;
FunctionTypeSet FunctionTypes;
The key used by FunctionTypeKeyInfo contains the following members
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L142
const Type * ReturnType;
ArrayRef < Type * > Params;
bool isVarArg;
The singleton of the i32 (i32) type can be obtained as follows
llvm::Type * i32 = llvm:: Type :: getInt32Ty (TheContext);
std::array < llvm::Type * , 1 > args = {i32};
llvm::Type * type = llvm:: FunctionType :: get (i32, args, false );
Similarly
isVarArg is stored in SubclassData
ReturnType and Params are stored in ContainedTys
No llvm::LLVMContext argument is passed explicitly here; the context used is the one the return type belongs to
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/Type.cpp#L345
Finally, the isVarArg field indicates whether the function takes a variable number of arguments
For example
#include <stdio.h>
int main() { printf("hello world\n"); }
might generate the following LLVM IR
@.str = private unnamed_addr constant [13 x i8] c"hello world\0A\00", align 1
define dso_local i32 @main() #0 {
%1 = call i32 (i8*, ...) @printf(i8* noundef getelementptr inbounds ([13 x i8], [13 x i8]* @.str, i64 0, i64 0))
ret i32 0
}
Note the function signature i32 (i8*, ...) here
Value
The Value class is one of the most important classes in LLVM and is the base class of many core classes
Part of the Value class hierarchy is shown below
flowchart LR
Argument --> Value
BasicBlock --> Value
User --> Value
Constant --> User
Instruction --> User
Operator --> User
Every Value object holds a pointer to its Type, as well as a use list that records the users of the value
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/Value.h#L74
class Value {
Type * VTy;
Use * UseList;
...
The Value class implements the iterator pattern for its uses; the users of a value can be visited through the following interface
llvm::Value * value = ...
for (auto it = value->use_begin(); it != value->use_end(); ++it) {
  // Use::get() would return the used value itself; getUser() returns the user.
  llvm::User *user = it->getUser();
  ...
}
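When transforming LLVM IR, one value may need to be replaced by another. For example, if an instruction always produces the same constant, the instruction can be replaced by that constant, and every user that refers to the old value has to be updated as well
The following interface does this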
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/Value.h#L297
/// Change all uses of this to point to a new Value.
///
/// Go through the uses list for this definition and make each use point to
/// "V" instead of "this". After this completes, 'this's use list is
/// guaranteed to be empty.
void replaceAllUsesWith ( Value * V );
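Its implementation relies on the ValueHandleBase class
A value handle can be thought of as a smart pointer to a value that triggers a specific action when the value is deleted or replaced through replaceAllUsesWith (RAUW)
The ValueHandleBase class has three main subclasses
WeakVH
set to null when the referenced value is deleted; it does not follow RAUW
WeakTrackingVH
follows the referenced value to its replacement on RAUW, and is set to null when the value is deleted
CallbackVH
invokes user-defined callbacks when the referenced value is deleted or replaced through RAUW
A Value object may have a name; the HasName field of the Value class records whether it does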
LLVMContextImpl stores all value names in the following data structure
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1447
DenseMap <const Value * , ValueName * > ValueNames;
where
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/Value.h#L55
using ValueName = StringMapEntry < Value * >;
User & Use
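The User class derives from the Value class, because a user is itself a value that can be used by other users
More specifically
A value can be used by multiple users; this is the def-use chain
An example of this was given above
A user can use multiple values; this is the use-def chain
For example, visiting the operands of an instruction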
llvm::Instruction * ins = ...
for ( auto it = ins -> op_begin (); it != ins -> op_end (); ++ it) {
llvm::Value * value = it -> get ();
...
}
So the core of the Use class is how to link a value and a user efficiently in both directions
The code details are skipped here
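The two-way bookkeeping can be illustrated with a deliberately naive model. ToyValue and ToyUser are hypothetical names, and storing users in a vector is an assumption made for brevity; it is not how the LLVM Use class is laid out. The sketch keeps both directions in sync and shows what replacing all uses of a value amounts to.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct ToyUser;

// A value remembers every user that currently has it as an operand.
struct ToyValue {
  std::vector<ToyUser *> users; // one entry per use, so duplicates are possible
  virtual ~ToyValue() = default;
  void replaceAllUsesWith(ToyValue *replacement);
};

// A user is itself a value and additionally holds an operand list.
struct ToyUser : ToyValue {
  std::vector<ToyValue *> operands;
  void addOperand(ToyValue *v) {
    operands.push_back(v);
    v->users.push_back(this); // keep both directions in sync
  }
};

// Redirects every operand slot that refers to this value to the replacement.
// The replacement is assumed to be a different object than this value.
inline void ToyValue::replaceAllUsesWith(ToyValue *replacement) {
  ToyValue *self = this;
  for (ToyUser *u : users) {
    std::replace(u->operands.begin(), u->operands.end(), self, replacement);
    replacement->users.push_back(u);
  }
  users.clear(); // the old value has no uses left afterwards
}
```

Walking users corresponds to the def-use direction and walking operands to the use-def direction; after the replacement, the old value's user list is empty, matching the guarantee in the replaceAllUsesWith comment quoted earlier.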
Constant
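The Constant class derives from the User class
Constant is the base class of all constants; it represents a value that does not change at run time
For functions and global variables, constness means that their address does not change
All structurally equivalent constants share a single object instance (a singleton) within an LLVMContext
Part of the Constant class hierarchy is shown below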
flowchart LR
BlockAddress --> Constant
ConstantAggregate --> Constant
ConstantArray --> ConstantAggregate
ConstantStruct --> ConstantAggregate
ConstantVector --> ConstantAggregate
ConstantData --> Constant
ConstantFP --> ConstantData
ConstantInt --> ConstantData
ConstantAggregateZero --> ConstantData
ConstantPointerNull --> ConstantData
ConstantDataSequential --> ConstantData
ConstantDataArray --> ConstantDataSequential
ConstantDataVector --> ConstantDataSequential
ConstantExpr --> Constant
GlobalValue --> Constant
GlobalObject --> GlobalValue
Function --> GlobalObject
GlobalVariable --> GlobalObject
ConstantData
ConstantInt
Represents an integer constant of arbitrary bit width
LLVMContextImpl caches all integer constants in the following data structure
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1449
using IntMapTy = DenseMap < APInt , std:: unique_ptr < ConstantInt >, DenseMapAPIntKeyInfo >;
IntMapTy IntConstants;
The singleton of the constant i32 100 can be obtained with the following code
llvm::Value * value = llvm:: ConstantInt :: get (TheContext, llvm:: APInt ( 32 , 100 , false /* isSigned */ ));
The isSigned argument tells the APInt class whether the given value should be treated as signed
An analogous transition that happened earlier in LLVM is integer signedness. Currently there is no distinction between signed and unsigned integer types, but rather each integer operation (e.g. add) contains flags to signal how to treat the integer. Previously LLVM IR distinguished between unsigned and signed integer types and ran into similar issues of no-op casts. The transition from manifesting signedness in types to instructions happened early on in LLVM’s timeline to make LLVM easier to work with.
Note the helper class APInt here; internally it stores the raw data in either a uint64_t or a uint64_t *
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/ADT/APInt.h#L1868
union {
uint64_t VAL; ///< Used to store the <= 64 bits integer value.
uint64_t * pVal; ///< Used to store the >64 bits integer value.
} U;
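The inline-or-heap storage choice shown in the union can be sketched as follows. ToyAPInt is a hypothetical class written for illustration; it does not mask the value to the bit width and supports none of the arithmetic a real arbitrary-precision integer needs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy: one inline word for widths up to 64 bits, a heap array beyond that.
class ToyAPInt {
  union {
    std::uint64_t VAL;   // used when bitWidth <= 64
    std::uint64_t *pVal; // used when bitWidth > 64
  } U;
  unsigned bitWidth;

public:
  // Number of 64-bit words needed to hold the given number of bits.
  static unsigned numWords(unsigned bits) { return (bits + 63) / 64; }

  ToyAPInt(unsigned bits, std::uint64_t value) : bitWidth(bits) {
    if (isSingleWord()) {
      U.VAL = value;
    } else {
      U.pVal = new std::uint64_t[numWords(bits)](); // zero-initialised words
      U.pVal[0] = value;                            // low word holds the value
    }
  }
  ToyAPInt(const ToyAPInt &) = delete;            // copying is left out of the sketch
  ToyAPInt &operator=(const ToyAPInt &) = delete;
  ~ToyAPInt() {
    if (!isSingleWord())
      delete[] U.pVal;
  }

  bool isSingleWord() const { return bitWidth <= 64; }
  std::uint64_t lowWord() const { return isSingleWord() ? U.VAL : U.pVal[0]; }
};
```

The bit width decides which union member is active, so common widths such as i32 and i64 never touch the heap.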
In addition, LLVMContextImpl keeps separate singletons for the boolean (i1) constants
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1510
ConstantInt * TheTrueVal = nullptr ;
ConstantInt * TheFalseVal = nullptr ;
They can be obtained with the following code
llvm::Value * value = llvm:: ConstantInt :: getTrue (TheContext);
ConstantFP
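Represents a floating-point constant of any of the supported floating-point types
As with ConstantInt, LLVMContextImpl caches all floating-point constants in the following data structure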
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1453
using FPMapTy = DenseMap < APFloat , std:: unique_ptr < ConstantFP >, DenseMapAPFloatKeyInfo >;
FPMapTy FPConstants;
The singleton of the constant float 1.1 can be obtained with the following code
llvm::Value * value = llvm:: ConstantFP :: get (TheContext, llvm:: APFloat (static_cast <float> ( 1.1 )));
Floating-point values here follow the IEEE standard; the implementation is encapsulated in APFloat and related classes. For example
float foo () { return 1.1 ; }
generates the following LLVM IR
define dso_local noundef float @_Z3foov() #0 {
ret float 0x3FF19999A0000000
}
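The floating-point constant is printed in hexadecimal: the digits are the bit pattern of the float value widened to a 64-bit double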
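The hexadecimal digits in the IR above can be reproduced by widening the float to a double and looking at the raw bits. This is a small sketch assuming IEEE 754 binary32 and binary64 formats for float and double; doubleBitsOfFloat is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Widens a float to double and returns the raw 64-bit pattern of that double.
inline std::uint64_t doubleBitsOfFloat(float f) {
  double d = static_cast<double>(f); // exact: every float is representable as a double
  std::uint64_t bits;
  static_assert(sizeof(bits) == sizeof(d), "expects a 64-bit double");
  std::memcpy(&bits, &d, sizeof(bits));
  return bits;
}
```

For 1.1f the result is 0x3FF19999A0000000: the low 29 bits of the significand are zero because a float only carries 23 explicit significand bits.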
ConstantAggregateZero
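Represents an aggregate zero constant, typically used for all-zero initialization
For example, a zero-initialized constant array of 42 int elements
generates the following LLVM IR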
@_ZL3arr = internal constant [42 x i32] zeroinitializer, align 16
The zeroinitializer here is a ConstantAggregateZero of type [42 x i32]
llvm::Value * value = llvm::ConstantAggregateZero::get(llvm::ArrayType::get(llvm::Type::getInt32Ty(TheContext), 42));
LLVMContextImpl caches all constant aggregate zeros in the following data structure
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1478
DenseMap < Type * , std::unique_ptr < ConstantAggregateZero >> CAZConstants;
ConstantPointerNull
Represents a null pointer
For example
void * foo () { return nullptr ; }
generates the following LLVM IR
define dso_local noundef i8* @_Z3foov() #0 {
ret i8* null
}
The null here is a ConstantPointerNull of type i8*
llvm::Value * value = llvm:: ConstantPointerNull :: get ( llvm :: Type :: getInt8PtrTy (TheContext));
LLVMContextImpl caches all constant pointer nulls in the following data structure
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1489
DenseMap < PointerType * , std::unique_ptr < ConstantPointerNull >> CPNConstants;
ConstantDataArray
Represents a constant array
The underlying data type is restricted to simple 1/2/4/8-byte integers or float/double
For example
const int arr[] = { 0 , 1 , 2 };
generates the following LLVM IR
@_ZL3arr = internal constant [3 x i32] [i32 0, i32 1, i32 2], align 4
It can be obtained with the following code
std::array <int , 3 > elems = { 0 , 1 , 2 };
llvm::Value * value = llvm:: ConstantDataArray :: get (TheContext, elems);
ConstantDataVector
Represents a constant vector
The underlying data type is restricted to simple 1/2/4/8-byte integers or float/double
For example
#include <immintrin.h>
__m256 foo () { return _mm256_set1_ps ( 1 ); }
Running clang -S -emit-llvm a.cpp -O3 -march=native generates the following IR
define dso_local noundef <8 x float> @_Z3foov() local_unnamed_addr #0 {
ret <8 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
}
It can be obtained with the following code
std::array <float , 8 > elems = { 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 };
llvm::Value * value = llvm:: ConstantDataVector :: get (TheContext, elems);
LLVMContextImpl caches all constant data arrays and constant data vectors in the following data structure
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1497
StringMap < std::unique_ptr < ConstantDataSequential >> CDSConstants;
Note that ConstantDataSequential is the parent class of ConstantDataArray and ConstantDataVector
Also, the key of this mapping is a string; taking the call above as an example
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/Constants.cpp#L3030
Constant * ConstantDataVector :: get ( LLVMContext & Context , ArrayRef < float > Elts ) {
auto * Ty = FixedVectorType:: get ( Type :: getFloatTy (Context), Elts . size ());
const char * Data = reinterpret_cast <const char * > ( Elts . data ());
return getImpl ( StringRef (Data, Elts . size () * 4 ), Ty);
}
The string here is built from the raw bytes of the constant values
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/Constants.cpp#L2891
Constant * ConstantDataSequential :: getImpl ( StringRef Elements , Type * Ty ) {
// If the elements are all zero or there are no elements, return a CAZ, which
// is more dense and canonical.
if ( isAllZeros (Elements))
return ConstantAggregateZero:: get (Ty);
When all elements are zero, the ConstantDataSequential degenerates into a ConstantAggregateZero
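Using the raw element bytes as a map key, and detecting the all-zero case, can be sketched with the standard library alone. rawKey and isAllZeros below are hypothetical helpers that operate on std::string rather than StringRef, and a 4-byte IEEE float is assumed.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Builds a byte-string key from the raw storage of the elements.
inline std::string rawKey(const std::vector<float> &elts) {
  std::string key(elts.size() * sizeof(float), '\0');
  if (!elts.empty())
    std::memcpy(&key[0], elts.data(), key.size());
  return key;
}

// True when every byte of the key is zero, the case that collapses to a zero aggregate.
inline bool isAllZeros(const std::string &key) {
  for (char c : key)
    if (c != 0)
      return false;
  return true;
}
```

Two element sequences with identical bytes produce identical keys, which is what lets a string-keyed map act as the uniquing table. Note that under this byte-wise check a vector of -0.0f is not all zeros, since the sign bit is set.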
ConstantAggregate
ConstantStruct
Represents a struct constant
For example
struct A {
int i;
int j;
};
const A a = { 1 , 1 };
generates the following LLVM IR
%struct.A = type { i32, i32 }
@_ZL1a = internal constant %struct.A { i32 1, i32 1 }, align 4
It can be obtained with the following code
llvm::Type * i32 = llvm:: Type :: getInt32Ty (TheContext);
llvm::StructType * type = llvm:: StructType :: create (TheContext, {i32, i32}, "A" , false );
llvm::Constant * one = llvm:: ConstantInt :: get (TheContext, llvm:: APInt ( 32 , 1 , false /* isSigned */ ));
std::array < llvm::Constant * , 2 > consts = {one, one};
llvm::Value * value = llvm:: ConstantStruct :: get (type, consts);
LLVMContextImpl caches all constant structs in the following data structure
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1483
using StructConstantsTy = ConstantUniqueMap < ConstantStruct >;
StructConstantsTy StructConstants;
ConstantArray
Represents a constant array
Used when the underlying data type is not a simple 1/2/4/8-byte integer or float/double
For example
struct A {
int i;
int j;
};
const A a[] = {{ 1 , 1 },{ 1 , 1 }};
generates the following LLVM IR
%struct.A = type { i32, i32 }
@_ZL1a = internal constant [2 x %struct.A] [%struct.A { i32 1, i32 1 }, %struct.A { i32 1, i32 1 }], align 16
LLVMContextImpl caches all constant arrays in the following data structure
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1480
using ArrayConstantsTy = ConstantUniqueMap < ConstantArray >;
ArrayConstantsTy ArrayConstants;
See
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/ConstantsContext.h#L551
template < class ConstantClass > class ConstantUniqueMap {
public:
using ValType = typename ConstantInfo< ConstantClass >:: ValType ;
using TypeClass = typename ConstantInfo< ConstantClass >:: TypeClass ;
using LookupKey = std:: pair < TypeClass * , ValType >;
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/ConstantsContext.h#L326
template <> struct ConstantInfo < ConstantArray > {
using ValType = ConstantAggrKeyType < ConstantArray >;
using TypeClass = ArrayType ;
};
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/ConstantsContext.h#L339
template < class ConstantClass > struct ConstantAggrKeyType {
ArrayRef < Constant * > Operands;
So the key of the cached mapping has the following form
{ArrayType *, ArrayRef<Constant *>}
GlobalValue
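Represents globally defined objects
To reiterate, the constness of functions and global variables lies in their address not changing, as if a top-level const pointer pointed to these objects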
GlobalVariable
Represents a global variable
For example, a global definition such as int a = 1;
generates the following LLVM IR
@a = dso_local global i32 1, align 4
The meaning of dso_local here is as follows
The compiler may assume that a function or variable marked as dso_local will resolve to a symbol within the same linkage unit. Direct access will be generated even if the definition is not within this compilation unit.
As another example, a file-local definition such as static int a = 1;
generates the following LLVM IR
@_ZL1a = internal global i32 1, align 4
The meaning of internal here is as follows
Similar to private, but the value shows as a local symbol (STB_LOCAL in the case of ELF) in the object file. This corresponds to the notion of the ‘static’ keyword in C.
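Note the name mangling here; for a value with internal linkage, the symbol name in the IR matches the one in the object file
Compare this with the internal constant seen earlier
The object file here is in ELF format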
13: 0000000000004010 4 OBJECT LOCAL DEFAULT 22 _ZL1a
The IR above can perhaps be produced with the following code
auto * value = new llvm:: GlobalVariable ( llvm :: Type :: getInt32Ty (TheContext), false /* isConstant */ , llvm::GlobalValue::LinkageTypes::InternalLinkage);
value -> setInitializer ( llvm :: ConstantInt :: get (TheContext, llvm:: APInt ( 32 , 1 , false /* isSigned */ )));
The complete LLVM IR syntax for a global variable is as follows
@<GlobalVarName> = [Linkage] [PreemptionSpecifier] [Visibility]
[DLLStorageClass] [ThreadLocal]
[(unnamed_addr|local_unnamed_addr)] [AddrSpace]
[ExternallyInitialized]
<global | constant> <Type> [<InitializerConstant>]
[, section "name"] [, partition "name"]
[, comdat [($name)]] [, align <Alignment>]
[, no_sanitize_address] [, no_sanitize_hwaddress]
[, sanitize_address_dyninit] [, sanitize_memtag]
(, !name !N)*
The remaining attributes are not covered here
At the source level, all global variables are stored in the current Module
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/Module.h#L181
GlobalListType GlobalList; ///< The Global Variables in the module
where
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/Module.h#L69
/// The type for the list of global variables.
using GlobalListType = SymbolTableList<GlobalVariable>;
The following code iterates over all global variables of the current module
for ( auto it = TheModule -> global_begin (); it != TheModule -> global_end (); ++ it) {
llvm::GlobalVariable & value = * it;
...
}
This works because the GlobalVariable class also derives from ilist_node<GlobalVariable>
class GlobalVariable : public GlobalObject , public ilist_node < GlobalVariable >
so that from the current node (a GlobalVariable) the other nodes (GlobalVariable objects) on the list can be reached
Function
Represents function definitions and function declarations
For a function definition
int foo ( int ) { return {}; }
the LLVM IR generated under clang -S -emit-llvm a.cpp -O3 is
; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone sspstrong uwtable willreturn
define dso_local noundef i32 @_Z3fooi(i32 noundef %0) local_unnamed_addr #0 {
ret i32 0
}
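dso_local has already been introduced above
noundef is a parameter attribute (an attribute of function parameters and return values) indicating that the parameter or return value is neither undef nor poison
local_unnamed_addr indicates that the address of the function is not significant within the current module and only its contents matter, so identical functions within the module can be merged under certain conditions
There are also some function attributes above, which are not covered here
The complete LLVM IR syntax for a function definition is as follows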
define [linkage] [PreemptionSpecifier] [visibility] [DLLStorageClass]
[cconv] [ret attrs]
<ResultType> @<FunctionName> ([argument list])
[(unnamed_addr|local_unnamed_addr)] [AddrSpace] [fn Attrs]
[section "name"] [partition "name"] [comdat [($name)]] [align N]
[gc] [prefix Constant] [prologue Constant] [personality Constant]
(!name !N)* { ... }
The IR above can perhaps be produced with the following code
llvm::Type * i32 = llvm:: Type :: getInt32Ty (TheContext);
std::array < llvm::Type * , 1 > args = {i32};
llvm::FunctionType * type = llvm:: FunctionType :: get (i32, args, false );
llvm::Value * func = llvm:: Function :: Create (type, llvm::GlobalValue::LinkageTypes::ExternalLinkage, 0 /* AddrSpace */ );
For a function declaration, for example printf
extern int printf ( const char * __restrict __format , ...);
the corresponding LLVM IR is
declare noundef i32 @_Z6printfPKcz(i8* noundef, ...) #1
The complete LLVM IR syntax for a function declaration is as follows
declare [linkage] [visibility] [DLLStorageClass]
[cconv] [ret attrs]
<ResultType> @<FunctionName> ([argument list])
[(unnamed_addr|local_unnamed_addr)] [align N] [gc]
[prefix Constant] [prologue Constant]
At the source level, all functions are likewise stored in the current Module
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/Module.h#L182
FunctionListType FunctionList; ///< The Functions in the module
The Function class has several important members
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/Function.h#L72
using BasicBlockListType = SymbolTableList < BasicBlock >;
...
// Important things that make up a function!
BasicBlockListType BasicBlocks; ///< The basic blocks
mutable Argument * Arguments = nullptr ; ///< The formal arguments
size_t NumArgs;
std::unique_ptr < ValueSymbolTable > SymTab; ///< Symbol table of args/instructions
AttributeList AttributeSets; ///< Parameter attributes
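Of these, the main one of interest here is the Argument class, which represents a formal parameter and records its parent Function and its index in the argument list (ArgNo)
The Function class provides iterator interfaces for traversing its arguments and basic blocks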
BlockAddress
Uniquely identifies the address of a (Function, BasicBlock) pair
Skipped here, since BasicBlock has not been introduced yet
ConstantExpr
Represents a constant expression
Its core is the following method
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/Constants.cpp#L2263
Constant * ConstantExpr :: get ( unsigned Opcode , Constant * C1 , Constant * C2 , unsigned Flags , Type * OnlyIfReducedTy )
which in effect builds a constant expression from an operator and its operands
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/Constants.cpp#L2311
if (Constant * FC = ConstantFoldBinaryInstruction (Opcode, C1, C2))
return FC;
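While a constant expression is being constructed, the code checks whether constant folding is possible
This relies heavily on templates such as isa<> to test whether a value is undef or poison
A brief introduction to Undefined Values and Poison Values follows
In the class hierarchy, PoisonValue derives from UndefValue, which in turn derives from ConstantData
These two kinds of value exist because LLVM IR has the notion of undefined behavior, for example the common signed integer overflow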
bool foo ( int a ) { return a + 1 > a; }
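In the corresponding LLVM IR, note the nsw flag on the add, which stands for No Signed Wrap: when %3 holds INT_MAX, INT_MAX + 1 overflows as a signed integer, and the result %4 is a poison value
In earlier LLVM implementations, %4 would have been an undefined value in this situation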
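Operating on an undefined value yields an undefined value rather than undefined behavior, which enables some optimizations in certain cases. For example, the compiler reasons that only the lowest bit of undef & 1 is undefined, so ((undef & 1) >> 1) is treated as 0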
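The reasoning behind that fold is that the expression is 0 for every possible bit pattern, so it does not matter which pattern undef stands for. The sketch below checks this for a handful of concrete 32-bit values; maskThenShift and alwaysZero are hypothetical helper names, and a concrete integer is only a stand-in for undef.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// The expression from the text, evaluated for one concrete stand-in for undef.
constexpr std::uint32_t maskThenShift(std::uint32_t x) { return (x & 1u) >> 1; }

// Checks a sample of bit patterns, including the extremes.
inline bool alwaysZero() {
  for (std::uint32_t x : {0u, 1u, 2u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu})
    if (maskThenShift(x) != 0)
      return false;
  return true;
}
```

After the mask only bit 0 can be set, and the shift discards exactly that bit, which is why the result is known even though the input is not.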
A ‘poison’ value should be used instead of ‘undef’ whenever possible. Poison values are stronger than undef, and enable more optimizations. Just the existence of ‘undef’ blocks certain optimizations.
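In 2016 the LLVM community proposed deprecating undef and using only poison; so far, however, undef and poison still coexist
Constant folding also shows up when instructions are built with IRBuilder, for example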
llvm::Constant * one = llvm:: ConstantInt :: get (TheContext, llvm:: APInt ( 32 , 1 , false /* isSigned */ ));
llvm::Value * value = Builder . CreateAdd (one, one);
Tracing one possible call path
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/IRBuilder.h#L1242
Value * CreateAdd ( Value * LHS , Value * RHS , const Twine & Name = "" , bool HasNUW = false , bool HasNSW = false ) {
if (Value * V = Folder . FoldNoWrapBinOp (Instruction::Add, LHS, RHS, HasNUW, HasNSW))
return V;
return CreateInsertNUWNSWBinOp (Instruction::Add, LHS, RHS, Name, HasNUW, HasNSW);
}
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/include/llvm/IR/ConstantFolder.h#L68
Value * FoldNoWrapBinOp (Instruction:: BinaryOps Opc , Value * LHS , Value * RHS , bool HasNUW , bool HasNSW ) const override {
auto * LC = dyn_cast < Constant >(LHS);
auto * RC = dyn_cast < Constant >(RHS);
if (LC && RC) {
if ( ConstantExpr :: isDesirableBinOp (Opc)) {
unsigned Flags = 0 ;
if (HasNUW)
Flags |= OverflowingBinaryOperator::NoUnsignedWrap;
if (HasNSW)
Flags |= OverflowingBinaryOperator::NoSignedWrap;
return ConstantExpr:: get (Opc, LC, RC, Flags);
}
return ConstantFoldBinaryInstruction (Opc, LC, RC);
}
return nullptr ;
}
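If the operands satisfy certain conditions, ConstantExpr::get is called to obtain the corresponding constant expression, which is where constant folding may take place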
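The fold-before-emit pattern traced above can be reduced to a few lines. This is a toy sketch: ToyOperand and foldAdd are hypothetical names, only addition on 32-bit values is modelled, and the no-wrap flags are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy operand: either a known 32-bit constant or an opaque run-time value.
struct ToyOperand {
  bool isConstant;
  std::uint32_t value; // only meaningful when isConstant is true
};

// Tries to fold lhs + rhs. Returns true and writes the result when both
// operands are constants; returns false when a real add would have to be emitted.
inline bool foldAdd(const ToyOperand &lhs, const ToyOperand &rhs, std::uint32_t &out) {
  if (lhs.isConstant && rhs.isConstant) {
    out = lhs.value + rhs.value; // wraps modulo 2^32, like an i32 add
    return true;
  }
  return false;
}
```

A builder following this pattern asks the folder first and only creates an instruction when folding fails, which is why adding two constants yields a constant rather than an add instruction.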
LLVMContextImpl caches all constant expressions in the following data structure
https://github.com/llvm/llvm-project/blob/3665da3d0091ab765d54ce643bd82d353c040631/llvm/lib/IR/LLVMContextImpl.h#L1506
ConstantUniqueMap < ConstantExpr > ExprConstants;
TODO
References
https://llvm.org/docs/LangRef.html#type-system
https://llvm.org/docs/LangRef.html#constants
https://llvm.org/docs/LangRef.html#linkage-types
https://llvm.org/docs/LangRef.html#parameter-attributes
https://llvm.org/docs/LangRef.html#function-attributes
https://llvm.org/docs/LangRef.html#global-variables
https://llvm.org/docs/LangRef.html#functions
https://www.llvm.org/docs/ProgrammersManual.html#the-isa-cast-and-dyn-cast-templates
https://llvm.org/doxygen/classllvm_1_1Type.html
https://llvm.org/doxygen/classllvm_1_1Value.html
https://llvm.org/doxygen/classllvm_1_1Constant.html
https://llvm.org/docs/OpaquePointers.html
https://llvm.org/docs/CMake.html#embedding-llvm-in-your-project
https://groups.seas.harvard.edu/courses/cs153/2019fa/schedule.html
https://github.com/llvm/llvm-project
https://github.com/ghaiklor/llvm-kaleidoscope
https://github.com/PacktPublishing/Learn-LLVM-12
https://blog.csdn.net/weixin_42654107/article/details/122860584
https://lowlevelbits.org/type-equality-in-llvm/
https://www.youtube.com/watch?v=_-3Iiads1EM
https://llvm.org/devmtg/2016-11/Slides/Lopes-LongLivePoison.pdf
https://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html