Embedded Systems Reverse Engineering
Week 1
Introduction and Overview of Embedded Reverse Engineering: Ethics, Scoping, and Basic Concepts
Non-Credit Practice Exercise 3: Find Cross-References in Ghidra
Objective
Learn how to use Ghidra's cross-reference feature to trace how data flows through code, understanding where specific data is read, written, or referenced.
Prerequisites
- Ghidra installed with the 0x0001_hello-world project open
- Completed Exercise 2 (Find Strings); you should know where the "hello, world" string is located
- CodeBrowser window open with the binary loaded
Task Description
In this exercise, you'll:
- Navigate to a specific data reference in the main function
- Find where a particular data item (DAT_...) is used
- Trace back to see which functions access this data
- Understand how data flows from memory to the CPU and then to functions
Background: What are Cross-References?
A cross-reference is a link between different parts of the code:
- Code → Data: An instruction reads or writes data
- Code → Code: A function calls another function
- Data → Data: One data item references another
In this exercise, we're tracking code → data references to understand where and how the program uses the "hello, world" string.
Step-by-Step Instructions
Step 1: Navigate to the main Function
- In Ghidra's CodeBrowser, use Navigation → Go To... (or press G)
- Type main and press Enter
- Ghidra will navigate to the main function
- You should see the disassembly in the Listing view (center panel)
Step 2: Locate the ldr Instruction
In the main function's disassembly, look for an ldr (load register) instruction. It should look something like:
ldr r0, [DAT_10000244]
or similar. This instruction:
- ldr = load register (read data from memory)
- r0 = put the data into register r0
- [DAT_10000244] = read the 32-bit value stored at location DAT_10000244
Step 3: Understand the Notation
In Ghidra's Listing notation:
- DAT_10000244 = a data item (not code) at address 0x10000244
- [...] = a memory access: the instruction reads the value stored at that location
- The actual value is the address of the "hello, world" string in Flash memory
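To make the bracket notation concrete, here is a minimal Python sketch of what the load does. It is only a model, not how Ghidra or the CPU is implemented. The addresses are the example values used in this exercise (yours may differ), and the flash buffer and load_word helper are hypothetical names invented for illustration.

```python
import struct

# Example values from this exercise; the addresses in your binary may differ.
FLASH_BASE = 0x10000000
DAT_ADDR = 0x10000244      # where the data item DAT_10000244 lives
STRING_ADDR = 0x100019CC   # assumed address of the "hello, world" string

# Toy flash image, just large enough to hold the data item.
flash = bytearray(0x300)
struct.pack_into("<I", flash, DAT_ADDR - FLASH_BASE, STRING_ADDR)


def load_word(address):
    """Model of ldr: return the 32-bit little-endian word stored at address."""
    return struct.unpack_from("<I", flash, address - FLASH_BASE)[0]


r0 = load_word(DAT_ADDR)   # ldr r0, [DAT_10000244]

print(hex(DAT_ADDR))       # the location that was read
print(hex(r0))             # the value found there: the string's address
```

The point of the sketch is the difference between the two printed numbers: the first is where the instruction looked, the second is what it found there.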
Step 4: Right-Click on the Data Reference
- In the Listing view, find the ldr instruction that loads the string address
- Right-click on the DAT_... part (the data reference)
- A context menu should appear
Step 5: Select "References" Option
In the context menu:
- Look for an option that says References
- Click on it to see a submenu
- Select Show References to (this shows "where is this data used?")
Step 6: Review the References Window
A new window should appear showing all the locations where DAT_10000244 (or whatever the address is) is referenced:
Expected output might look like:
DAT_10000244 (1 xref):
main:10000236 (read)
This means:
- The data at DAT_10000244 is used in 1 place
- That place is in the main function at instruction 10000236
- It's a read operation (the code is reading this data)
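Conceptually, a "references to" list is a reverse index: Ghidra records which instruction refers to which address, and the window groups those records by target. The sketch below imitates that idea in plain Python. It is not Ghidra's API; the forward_refs records and the build_xrefs_to helper are hypothetical, and the second record is a made-up entry added only so the index has more than one target.

```python
from collections import defaultdict

# Hypothetical forward references: (from_address, function, to_address, kind).
forward_refs = [
    (0x10000236, "main", 0x10000244, "read"),
    (0x10000238, "main", 0x10001000, "call"),  # made-up target, for illustration only
]


def build_xrefs_to(refs):
    """Group references by the address they point at."""
    index = defaultdict(list)
    for from_addr, func, to_addr, kind in refs:
        index[to_addr].append((func, from_addr, kind))
    return index


xrefs_to = build_xrefs_to(forward_refs)

# Print every reference to DAT_10000244 in the same shape as the window above.
for func, from_addr, kind in xrefs_to[0x10000244]:
    print(f"{func}:{from_addr:08x} ({kind})")
```

With the sample records, the loop prints one line, matching the single read from main in the expected output.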
Step 7: Answer These Questions
Question 1: Data Address
- What is the address of the data reference you found? (e.g., DAT_10000244)
-
Question 2: Referenced By
- How many places reference this data?
-
- Which function(s) use it?
-
Question 3: Reference Type
- Is it a read or write operation?
-
- Why? (What's the program doing with this data?)
-
Question 4: The Chain
- The ldr instruction loads an address into r0
- What happens next? (Hint: Look at the next instruction after the ldr)
-
- Is there a function call? If so, which one?
-
Question 5: Understanding the Flow
- DAT_10000244 contains the address of the "hello, world" string
- The ldr loads that address into r0
- Then a function (probably printf or puts) is called with r0 as the argument
- Can you trace this complete flow?
Deeper Analysis (Optional Challenge)
Challenge 1: Find the Actual String Address
- Navigate to the DAT_10000244 location
- Look at the value stored there
- Can you decode the hex bytes and find the actual address of "hello, world"?
- Hint: The RP2350 uses little-endian encoding, so the bytes are "backwards"
Example:
If you see bytes: CC 19 00 10
Read backwards: 10 00 19 CC = 0x100019CC
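If you want to check your manual decoding, a few lines of Python do the same byte reversal. The bytes below are the example bytes from this challenge, not necessarily the ones in your binary.

```python
# The four bytes as they appear in memory, lowest address first.
raw = bytes.fromhex("CC 19 00 10")

# Little-endian: the first byte is the least significant one.
address = int.from_bytes(raw, byteorder="little")

print(f"0x{address:08X}")
```

Replace the hex string with the bytes you see at your own DAT_... location to decode the real pointer.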
Challenge 2: Understand the Indirection
- In C, if we want to load an address, we do: const char *ptr = some_string;
- Then to use it: printf(ptr);
- In assembly, this becomes:
  - Load the pointer: ldr r0, [DAT_...]
  - Call the function: bl printf
- Can you see this pattern in the assembly?
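One detail behind this pattern: Ghidra shows the friendly label DAT_..., but a Thumb ldr of this kind typically encodes only a small offset from the program counter. The sketch below works through that arithmetic. It assumes the instruction is a PC-relative literal load, and it uses the example addresses from this exercise; literal_address is a hypothetical helper name.

```python
def literal_address(instruction_address, offset):
    """Address read by a Thumb PC-relative ldr with the given byte offset."""
    pc = instruction_address + 4   # in Thumb state the PC reads 4 bytes ahead
    aligned_pc = pc & ~0x3         # the base is the PC rounded down to a multiple of 4
    return aligned_pc + offset


INSTRUCTION_ADDR = 0x10000236      # example address of the ldr in main
LITERAL_ADDR = 0x10000244          # example address of the data item

# Work backwards: which offset would the instruction need to encode?
offset = LITERAL_ADDR - ((INSTRUCTION_ADDR + 4) & ~0x3)

print(hex(offset))
print(hex(literal_address(INSTRUCTION_ADDR, offset)))
```

With these example addresses the offset comes out as 0xc, so the pointer sits only a few bytes past the code that uses it. That is why the data item appears right after the function in the Listing.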
Challenge 3: Follow Multiple References
- Try this with different data items in the binary
- Find a data reference that has multiple cross-references
- What data is used in more than one place?
Questions for Reflection
- Why does the code need to load an address from memory?
  - Why can't it just use the address directly?
  - Hint: Thumb instructions are only 16 or 32 bits wide, so a full 32-bit address does not fit inside a single load instruction
- What's the relationship between DAT_10000244 and the "hello, world" string?
  - They're at different addresses. Why?
  - Both live in Flash: which one holds the characters, and which one holds a pointer to them?
- If we wanted to change what gets printed, where would we modify the code?
  - Could we just change the string at address 0x100019CC?
  - Or would we need to change DAT_10000244?
  - Or both?
- How does this relate to memory layout?
  - Code section (Flash memory starting at 0x10000000)
  - Data section (constants/strings)
  - Is everything at different addresses for a reason?
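As one way to explore the question about changing what gets printed, the sketch below overwrites a string in place inside a toy firmware image. It assumes a raw image that maps directly to Flash at 0x10000000, so that file offset = address - 0x10000000, and it uses the example string address from this exercise. patch_string is a hypothetical helper, not part of any tool.

```python
FLASH_BASE = 0x10000000
STRING_ADDR = 0x100019CC   # assumed address of the "hello, world" string


def patch_string(image, address, new_text, old_length):
    """Overwrite a string in place, padding with NUL bytes up to the old length."""
    data = new_text.encode("ascii")
    if len(data) > old_length:
        raise ValueError("replacement is longer than the original string")
    offset = address - FLASH_BASE
    image[offset:offset + old_length] = data.ljust(old_length, b"\x00")
    return image


# Toy image containing only the original NUL-terminated string.
original = b"hello, world"
offset = STRING_ADDR - FLASH_BASE
image = bytearray(0x1A00)
image[offset:offset + len(original) + 1] = original + b"\x00"

patch_string(image, STRING_ADDR, "goodbye", len(original))
print(bytes(image[offset:offset + len(original) + 1]))
```

Because the replacement is no longer than the original, the string keeps its address and the pointer at DAT_10000244 would not need to change. A longer string would have to live somewhere else, and then the pointer would need updating too.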
Tips and Hints
- If you right-click and don't see "References", try right-clicking directly on the instruction address instead
- You can also use Search → For Cross References from the menu for a more advanced search
- In the Decompile view (right side), cross-references may be shown in a different format or with different colors
- Multi-level references: You can right-click on a data item and then follow the chain to another data item
Real-World Applications
Understanding cross-references is crucial for:
- Vulnerability hunting: Finding where user input flows through the code
- Firmware patching: Changing constants, strings, or data values
- Malware analysis: Tracking command-and-control server addresses or encryption keys
- Reverse engineering: Understanding program logic by following data dependencies
Summary
By completing this exercise, you've learned:
- How to find and interpret cross-references in Ghidra
- How to trace data from its definition to where it's used
- How the ldr (load) instruction works to pass data to functions
- The relationship between high-level C code and assembly-level data flow
- How addresses are loaded indirectly through pointer values stored next to the code
Expected Final Understanding
You should now understand this flow:
String "hello, world" is stored at address 0x100019CC in Flash
↓
A pointer to this address is stored at DAT_10000244 in Flash
↓
The main() function loads this pointer: ldr r0, [DAT_10000244]
↓
main() calls printf with r0 (the string address) as the argument
↓
printf() reads the bytes at that address and prints them
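The whole chain can be imitated in a short Python model. This is a sketch under stated assumptions, not a description of the real firmware: the flash buffer is a toy image, the addresses are the example values from this exercise, and the ldr and read_c_string helpers are hypothetical stand-ins for the instruction and for what printf does with its argument.

```python
import struct

FLASH_BASE = 0x10000000
STRING_ADDR = 0x100019CC   # assumed address of the string
DAT_ADDR = 0x10000244      # assumed address of the pointer (DAT_10000244)

# Toy flash image: the string at one address, a pointer to it at another.
flash = bytearray(0x1A00)
message = b"hello, world\x00"
start = STRING_ADDR - FLASH_BASE
flash[start:start + len(message)] = message
struct.pack_into("<I", flash, DAT_ADDR - FLASH_BASE, STRING_ADDR)


def ldr(address):
    """Read the 32-bit little-endian word stored at address."""
    return struct.unpack_from("<I", flash, address - FLASH_BASE)[0]


def read_c_string(address):
    """Read bytes from address up to the terminating NUL, as printf would."""
    begin = address - FLASH_BASE
    end = flash.index(b"\x00", begin)
    return flash[begin:end].decode("ascii")


r0 = ldr(DAT_ADDR)         # main loads the pointer into r0
text = read_c_string(r0)   # the called function follows the pointer
print(text)
```

Each line at the bottom corresponds to one arrow in the flow above: load the pointer, pass it along, then read the characters it points to.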