Generate Program Query
Users can leverage the previously introduced SEG API and Call Graph API to query customized program features. Given that these APIs are built on graph structures, data and relations can be extracted from these graphs. The following are some examples demonstrating the capabilities of these APIs.
Running Example
#include <stdlib.h>
#include <string.h>
void init(char* token, char* password){
password = token;
}
void f(char *password) {
char localToken[256];
init(localToken, password);
memset(password, ' ', strlen(password));
memset(localToken, ' ', strlen(localToken));
free(password);
}
int main(){
f("password");
}
We are going to use the running example above to demonstrate several queries.
List callee/caller for each function.
Basic idea:
As the callee/caller relationships captured in
CallGraphNode
, we start by retrieving the correspondingCallGraphNode
for a function using theiterator begin() : end()
andoperator[]
APIs which are provided by theCallGraph
.Once we have the
CallGraphNode
, we list the called functions or caller functions via two iterators:begin() : end()
andcaller_begin() : caller_end()
.
void list_callee_and_cs_foreach_function() {
// Callee
for (auto cg_b = cg->begin(); cg_b != cg->end(); cg_b++) {
auto func = cg_b->first;
if (func && !func->isDeclaration()) {
outs() << "Function: " << cg_b->first->getName() << "\n";
for (auto cg_node_b = cg_b->second->begin(); cg_node_b != cg_b->second->end(); cg_node_b++) {
outs() << "CS in Line " << DIA->getSrcLine(cg_node_b->first) << " calls ";
if (Function *callee = cg_node_b->second->getFunction())
outs() << "function '" << callee->getName() << "'\n";
else
outs() << "external node\n";
}
outs() << '\n';
}
}
// Caller
for (auto cg_b = cg->begin(); cg_b != cg->end(); cg_b++) {
auto func = cg_b->first;
if (func && !func->isDeclaration()) {
outs() << "Function: " << cg_b->first->getName() << "\n";
for (auto cg_node_b = cg_b->second->caller_begin(); cg_node_b != cg_b->second->caller_end(); cg_node_b++) {
if (cg_node_b->first) {
outs() << "is called in Line " << DIA->getSrcLine(cg_node_b->first) << " by";
Function *caller = cg_node_b->second->getFunction();
if (caller) {
outs() << "function '" << caller->getName() << "'\n";
}
}
}
outs() << '\n';
}
}
}
The outer loop (
for (auto cg_b = cg->begin(); cg_b != cg->end(); cg_b++)
) use the call graph iterator to iterate through all the functions in the call graph.The if condition (
if (func && !func->isDeclaration())
) checks whether the function is defined in the analyzed code (i.e., it is not a declaration or an external function).For each defined function, it prints the function name.
The inner loop (
for (auto cg_node_b = cg_b->second->begin(); cg_node_b != cg_b->second->end(); cg_node_b++)
) use the call graph node callee iterator to iterate through all calls made by the function.For each call, it prints the line where the call is made and the function that is being called. It prints “external node” if an external function is called.
Listing callers (calling functions) for each function
The outer loop, as before, iterates through all functions in the call graph.
Similarly, the if condition checks whether the function is defined in the analyzed code.
For each defined function, it prints the function name.
The inner loop (
for (auto cg_node_b = cg_b->second->caller_begin(); cg_node_b != cg_b->second->caller_end(); cg_node_b++)
) use the call graph node caller iterator to iterate through all functions that call the function.For each calling function, it prints the line where the call is made and the function that made the call.
Hence, this piece of code provides a comprehensive list of all functions called by and calling each function in the analyzed program.
Outputs
Function: init
Function: f
CS in Line 10 calls function 'init'
CS in Line 11 calls function 'strlen'
CS in Line 12 calls function 'strlen'
CS in Line 13 calls function 'free'
Function: main
CS in Line 17 calls function 'f'
Function: init
is called in Line 10 function 'f'
Function: f
is called in Line 17 function 'main'
Function: main
Interpretation:
In the function
f
:init
is called at line 10strlen
is called at lines 11 and 12free
is called at line 13
In the
main
function,f
is called at line 17.The function
init
is called by functionf
at line 10.The function
f
is called by themain
function at line 17.
From the analysis, it’s clear that the code successfully lists the calling and called functions for each function in the target program.
List transitive callee/caller for each function.
Basic idea:
The
CallGraphNode
presents directly called functions and caller functions through two iterators:begin() : end()
andcaller_begin() : caller_end()
. To fetch the transitive callee/caller, we have to recursively compute theCallGraphNode
for the functions called or caller functions listed from the priorCallGraphNode
, until no moreCallGraphNode
can be collected.We apply this recursive process to every function in the call graph, enabling the collection of transitive callees/callers.
The following code snippet showcases how we transitively collect callees utilizing a helper function forward_traverse_call_graph_from()
. This recursive function uses the callee begin() : end()
iterator of each reached CallGraphNode cg_node
to gather the callee and store them in a set called visited
.
void forward_traverse_call_graph_from(Function* A, std::set<Function*> &visited) {
CBCallGraphNode *cg_node = cg->operator[](A);
if (!cg_node) {
return;
}
if (visited.count(A)){
return;
}
visited.insert(A);
for (auto cg_node_b = cg_node->begin(); cg_node_b != cg_node->end(); cg_node_b++) {
if (Function *callee = cg_node_b->second->getFunction()) {
forward_traverse_call_graph_from(callee, visited);
}
}
}
void list_all_callee_foreach_function() {
for (auto cg_b = cg->begin(); cg_b != cg->end(); cg_b++) {
auto func = cg_b->first;
if (func && !func->isDeclaration()) {
std::set<Function*> *visited = new std::set<Function*>();
forward_traverse_call_graph_from(const_cast<Function *>(func), *visited);
outs() << "For function: " << func->getName() << " its all callees are: \n";
for (auto callee : *visited) {
outs() << "\t" << callee->getName() << "\n";
}
}
}
}
The C++ code is designed to list all transitive callees (functions being called directly or indirectly) for each function in the program being analyzed. Here’s a breakdown of the process:
forward_traverse_call_graph_from(Function* A, std::set<Function*> &visited)
: This function performs a forward traversal of the call graph starting from function A. It uses recursion to visit each callee (called function) and its subsequent callees. The visited callee functions are stored in thevisited
set to avoid revisiting and infinite recursion.list_all_callee_foreach_function()
: This function iterates over all functions present in the call graph.For each function, it creates a new set
visited
to keep track of the visited callee functions.It then initiates a forward traversal of the call graph starting from the current function using the
forward_traverse_call_graph_from()
function.Finally, it prints the name of the current function and the names of all its transitive callees (both directly and indirectly called functions).
With the help of these functions, you can efficiently analyze a given codebase, understanding the relationship and dependencies between different functions.
Outputs:
For function: init its all callees are:
init
For function: f its all callees are:
init
f
strlen
free
For function: main its all callees are:
init
f
strlen
free
main
The output presents the set of all transitive callees (directly or indirectly called functions) for each function in the running example.
Here’s what the output means for each function:
Function
init
: Its only callee is itself (init
), which means it does not call any other function.Function
f
: Its callees areinit
,strlen
,free
, and itself (f
). So, functionf
callsinit
,strlen
, andfree
directly.Function
main
: Its callees areinit
,f
,strlen
,free
, and itself (main
). This means that functionmain
callsf
directly, andf
then callsinit
,strlen
, andfree
. Somain
callsinit
,strlen
, andfree
indirectly, viaf
.
Practice
Users could have a try to modifiy the function to collect the transitive callers.
Determine whether function A is called by function B
Basic idea:
Function A might be called by Function B through several intermediate functions.
Therefore, to ascertain this relationship, we need to examine the transitive relations.
By applying the forward_traverse_call_graph_from()
helper function, we can collect all callees of Function B and subsequently check if Function A is included.
void is_function_A_called_by_function_B(Function* A, Function* B) {
std::set<Function*> *visited = new std::set<Function*>();
forward_traverse_call_graph_from(const_cast<Function *>(B), *visited);
if (visited->count(A)) {
outs() << "Yes\n";
} else {
outs() << "No\n";
}
}
The C++ code determines whether function A is called by function B, either directly or indirectly, using the prior forward_traverse_call_graph_from()
function. Here’s how it works:
It first creates a new set,
visited
, to store the functions visited during the traversal of the call graph.It then initiates a forward traversal of the call graph starting from function B using the
forward_traverse_call_graph_from()
function. This function populates thevisited
set with all of the callees of function B (including both direct and indirect callees).After the traversal, it checks whether function A is in the
visited
set using thecount()
function.If function A is in the
visited
set, it means function A is called by function B, either directly or indirectly. The program outputs a “Yes” as a result.If function A is not in the
visited
set, it indicates function A is not called by function B. The program outputs a “No” in this case.
List all pointer type variable
Basic idea:
Since variable information is contained within the SEG, we first need to acquire the SEG for a function through the
getSymbolicExprGraph(&f)
API offered by theSymbolicExprGraphBuilder
class.To enumerate pointer type variables, we must check the type of all variables in the function. We begin by iterating over the operand node via the operand iterator
value_node_begin() : value_node_end()
, and then determine the type information for each operand through access to theSEGValue
.
void list_all_pointer_type_variable(Module &M) {
for (Function &f : M) {
if (!f.isDeclaration()) {
outs() << "Function: " << f.getName() << "\n";
auto seg = segBuilder->getSymbolicExprGraph(&f);
for (auto item = seg->value_node_begin(); item != seg->value_node_end(); item++) {
if (item->second->getSEGValue()->getValue()->getType()->isPointerTy()) {
outs() << item->second->getSEGValue()->getValueName() << "\n";
}
}
outs() << "\n";
}
}
}
Outputs:
Function: init
token
password
Function: f
password
localToken
Function: main
.str
This C++ code list all the pointer type variables in each function in a codebase using Symbolic Expression Graph (SEG) APIs. Here’s a breakdown of the process:
It iterates over each function
f
in moduleM
. It skips the functions that are just declarations and not defined in the code.For each function defined in the code, it fetches the Symbolic Expression Graph (SEG) associated with the function using
segBuilder->getSymbolicExprGraph(&f)
. The SEG represents all expressions in the function wherein the nodes are the values and the edges are the data dependencies.It then loops through all nodes (item->second) in the SEG representative of the computed values within the function (symbol or expression).
For each computed value (node) in the SEG, it checks if the corresponding LLVM Value is of a pointer type through
getType()->isPointerTy()
.If the value is a pointer type, it prints the corresponding name of the variable.
In the given example, the code outputs all pointer variables in each function of the module:
- In function
init
, it identified two pointer variables:token
andpassword
. - In function
f
, it caught two pointer variables:password
andlocalToken
. - In function
main
, it found the pointer string literal.str
.
Practice
We can even fulfill more complex requirements, such as determining if a function, which resides in a pointer-type variable, can be called by another function present in a different pointer-type variable.
Users can achieve this by combining the list_all_pointer_type_variable()
function from this example with the is_function_A_called_by_function_B()
function from the previous example.
Firstly, adjust the list_all_pointer_type_variable()
function to select two pointer-type variables. Next, use the getParentFunction()
API, offered by SEG, to determine their corresponding residing functions. Finally, utilize the is_function_A_called_by_function_B()
function to check if there is a call relationship between these two functions.
List value flow
Basic idea:
The variable’s data-dependence/value-flow can be accessed using several iterators provided by the SEG API. In the following code snippet, we use the vflow_begin() : vflow_end()
iterator to access a variable’s value-flow. Like the call relation, these value-flow relations accessed by iterators are directed. Transitive relations can be collected by recursively accessing the value flow iterator of a variable.
We have encapsulated several high-level APIs that enable users to query the relation between any two vertices of interest. For more information, please refer to the Vulnerability Detection section.
void list_data_dependence(Module &M) {
for (Function &f : M) {
if (!f.isDeclaration()) {
outs() << "Function: " << f.getName() << "\n";
auto seg = segBuilder->getSymbolicExprGraph(&f);
for (auto item = seg->value_node_begin(); item != seg->value_node_end(); item++) {
if (item->second->getSEGValue()->getValue()->getType()->isPointerTy()) {
outs() << item->second->getSEGValue()->getValueName() << "\n";
for (auto parent = item->second->vflow_begin() ; parent != item->second->vflow_end(); parent++) {
outs() << **parent << "\n";
}
}
}
outs() << "\n";
}
}
}
The C++ code lists all data dependencies related to each pointer type variable within every function in a codebase, leveraging Symbolic Expression Graph (SEG) APIs. Here’s the step-by-step of what this code does:
The code iterates over each function
f
in the moduleM
, excluding the functions that are just declarations and not defined in the code.For each defined function, it retrieves the Symbolic Expression Graph (SEG) for the function using
segBuilder->getSymbolicExprGraph(&f)
. The SEG represents all expressions in the function, with nodes representing values and edges symbolizing data dependencies.It then loops through all nodes (item->second) in SEG, representative of the calculated values within the function (symbol or expression).
For each calculated value (node) in the SEG, it verifies if the corresponding LLVM Value is of a pointer type using
getType()->isPointerTy()
.If the value is of pointer type, it prints the corresponding name of the variable, and afterward, it iterates over its parents (data dependents) via the vflow iterator (
vflow_begin() -> vflow_end()
).For each directly dependent value or parent, it outputs details about this dependency.
Complete Implementation
Provided below is the complete implementation of the above examples.
After setting up the environment, you can execute this on ClearBlue. For more details on the setup, please refer to Preparing Coding Environment.
//
// Created by yongchao on 2/6/24.
//
#include "Analysis/Bitcode/DebugInfoAnalysis.h"
#include "Analysis/CallGraph/CBCallGraph.h"
#include "Language/C-langs/C++/Mangling.h"
#include "IR/SEG/SEGArgumentNode.h"
#include "IR/SEG/SEGCallSite.h"
#include "IR/SEG/SEGReturnNode.h"
#include "IR/SEG/SEGReturnSite.h"
#include "IR/SEG/SymbolicExprGraphBuilder.h"
using namespace llvm;
class querySEG : public ModulePass {
private:
SymbolicExprGraphBuilder *segBuilder = nullptr;
CBCallGraph *cg = nullptr;
DebugInfoAnalysis *DIA = nullptr;
public:
static char ID;
querySEG();
~querySEG() override;
void getAnalysisUsage(AnalysisUsage &AU) const override;
// void initializeAnalysis(Pass *P);
bool runOnModule(Module &M) override;
void list_callee_and_cs_foreach_function();
void list_all_callee_foreach_function();
void forward_traverse_call_graph_from(Function *A,
std::set<Function *> &visited);
void is_function_A_called_by_function_B(Function *A, Function *B);
void list_all_pointer_type_variable(Module &M);
void list_data_dependence(Module &M);
};
char querySEG::ID = 0;
void querySEG::getAnalysisUsage(llvm::AnalysisUsage &AU) const {
AU.setPreservesAll();
AU.addRequired<SymbolicExprGraphBuilder>();
AU.addRequired<CBCallGraphWrapperPass>();
AU.addRequired<DebugInfoAnalysis>();
}
void function_boundary_example(Module &M,
SymbolicExprGraphBuilder *segBuilder) {
for (Function &f : M) {
if (!f.isDeclaration()) {
outs() << "Function: " << f.getName() << "\n";
auto seg = segBuilder->getSymbolicExprGraph(&f);
int numCommonArg = seg->getNumCommonArgument();
outs() << "\tNumCommonArg: " << numCommonArg << "\n";
for (int i = 0; i < numCommonArg; i++) {
auto commonArg = seg->getCommonArgument(i);
outs() << "\t\t" << commonArg->getSEGValue()->getValue()->getName()
<< "\n";
if (commonArg == seg->findNode(commonArg->getSEGValue())) {
outs() << "\tFind the common arg by SEGValue\n";
}
}
int numPseudoArg = seg->getNumPseudoArgument();
outs() << "\tNumPseudoArg: " << numPseudoArg << "\n";
for (int i = 0; i < numPseudoArg; i++) {
auto pseudoArg = seg->getPseudoArgument(i);
outs() << "\t\t" << pseudoArg->getSEGValue()->getValue()->getName()
<< "\n";
}
if (seg->hasCommonReturnNode()) {
if (seg->getCommonReturn()->getSEGValue() != nullptr) {
outs() << "\tCommonRet: "
<< seg->getCommonReturn()->getSEGValue()->getValue()->getName()
<< "\n";
}
}
int numPseudoRet = seg->getNumPseudoReturn();
outs() << "\tNumPseudoRet: " << numPseudoRet << "\n";
for (int i = 0; i < numPseudoRet; i++) {
auto pseudoRet = seg->getPseudoReturn(i);
outs() << "\t\t" << pseudoRet->getSEGValue()->getValue()->getName()
<< "\n";
}
}
}
}
void iterator_example(Module &M, SymbolicExprGraphBuilder *segBuilder) {
for (Function &f : M) {
if (!f.isDeclaration()) {
outs() << "Function: " << f.getName() << "\n";
auto seg = segBuilder->getSymbolicExprGraph(&f);
for (auto cs_b = seg->seg_callsite_begin();
cs_b != seg->seg_callsite_end(); cs_b++) {
SEGCallSite *cs = *cs_b;
outs() << "\tCallee: " << cs->getCalledFunction()->getName() << "\n";
}
}
}
}
void querySEG::list_callee_and_cs_foreach_function() {
// for (auto cg_b = cg->begin(); cg_b != cg->end(); cg_b++) {
// auto func = cg_b->first;
// if (func && !func->isDeclaration()) {
// outs() << cg_b->first->getName() << "\n";
// cg_b->second->print(outs(), *DIA);
// }
// }
// Callee
for (auto cg_b = cg->begin(); cg_b != cg->end(); cg_b++) {
auto func = cg_b->first;
if (func && !func->isDeclaration()) {
outs() << "Function: " << cg_b->first->getName() << "\n";
for (auto cg_node_b = cg_b->second->begin();
cg_node_b != cg_b->second->end(); cg_node_b++) {
outs() << "CS in Line " << DIA->getSrcLine(cg_node_b->first)
<< " calls ";
if (Function *callee = cg_node_b->second->getFunction())
outs() << "function '" << callee->getName() << "'\n";
else
outs() << "external node\n";
}
outs() << '\n';
}
}
// Caller
for (auto cg_b = cg->begin(); cg_b != cg->end(); cg_b++) {
auto func = cg_b->first;
if (func && !func->isDeclaration()) {
outs() << "Function: " << cg_b->first->getName() << "\n";
for (auto cg_node_b = cg_b->second->caller_begin();
cg_node_b != cg_b->second->caller_end(); cg_node_b++) {
if (cg_node_b->first) {
outs() << "is called in Line " << DIA->getSrcLine(cg_node_b->first)
<< " by";
Function *caller = cg_node_b->second->getFunction();
if (caller) {
outs() << "function '" << caller->getName() << "'\n";
}
}
}
outs() << '\n';
}
}
}
void querySEG::forward_traverse_call_graph_from(Function *A,
std::set<Function *> &visited) {
CBCallGraphNode *cg_node = cg->operator[](A);
if (!cg_node) {
return;
}
if (visited.count(A)) {
return;
}
visited.insert(A);
for (auto cg_node_b = cg_node->begin(); cg_node_b != cg_node->end();
cg_node_b++) {
if (Function *callee = cg_node_b->second->getFunction()) {
forward_traverse_call_graph_from(callee, visited);
}
}
}
void querySEG::list_all_callee_foreach_function() {
for (auto cg_b = cg->begin(); cg_b != cg->end(); cg_b++) {
auto func = cg_b->first;
if (func && !func->isDeclaration()) {
std::set<Function *> *visited = new std::set<Function *>();
forward_traverse_call_graph_from(const_cast<Function *>(func), *visited);
outs() << "For function: " << func->getName()
<< " its all callees are: \n";
for (auto callee : *visited) {
outs() << "\t" << callee->getName() << "\n";
}
}
}
}
void querySEG::is_function_A_called_by_function_B(Function *A, Function *B) {
std::set<Function *> *visited = new std::set<Function *>();
forward_traverse_call_graph_from(const_cast<Function *>(B), *visited);
if (visited->count(A)) {
outs() << "Yes\n";
} else {
outs() << "No\n";
}
}
void querySEG::list_all_pointer_type_variable(Module &M) {
for (Function &f : M) {
if (!f.isDeclaration()) {
outs() << "Function: " << f.getName() << "\n";
auto seg = segBuilder->getSymbolicExprGraph(&f);
for (auto item = seg->value_node_begin(); item != seg->value_node_end();
item++) {
if (item->second->getSEGValue()->getValue()->getType()->isPointerTy()) {
outs() << item->second->getSEGValue()->getValueName() << "\n";
}
}
outs() << "\n";
}
}
}
void querySEG::list_data_dependence(llvm::Module &M) {
for (Function &f : M) {
if (!f.isDeclaration()) {
outs() << "Function: " << f.getName() << "\n";
auto seg = segBuilder->getSymbolicExprGraph(&f);
for (auto item = seg->value_node_begin(); item != seg->value_node_end();
item++) {
if (item->second->getSEGValue()->getValue()->getType()->isPointerTy()) {
outs() << item->second->getSEGValue()->getValueName() << "\n";
for (auto parent = item->second->vflow_begin();
parent != item->second->vflow_end(); parent++) {
outs() << **parent << "\n";
}
}
}
outs() << "\n";
}
}
}
bool querySEG::runOnModule(llvm::Module &M) {
outs() << M.getName() << "\n";
segBuilder = &getAnalysis<SymbolicExprGraphBuilder>();
cg = &getAnalysis<CBCallGraphWrapperPass>().getCallGraph();
DIA = &getAnalysis<DebugInfoAnalysis>();
function_boundary_example(M, segBuilder);
iterator_example(M, segBuilder);
list_callee_and_cs_foreach_function();
list_all_callee_foreach_function();
list_all_pointer_type_variable(M);
list_data_dependence(M);
return false;
}
querySEG::querySEG() : ModulePass(ID) {}
querySEG::~querySEG() {}
static RegisterPass<querySEG> Y("query", "SEG query", false, true);
Feedback
Was this page helpful?
Glad to hear it! Please tell us how we can improve.
Sorry to hear that. Please tell us how we can improve.