Embedded-Hacking/WEEK01/WEEK01-03.md
2026-03-19 15:01:07 -04:00

# Embedded Systems Reverse Engineering
[Repository](https://github.com/mytechnotalent/Embedded-Hacking)
## Week 1
Introduction and Overview of Embedded Reverse Engineering: Ethics, Scoping, and Basic Concepts
### Non-Credit Practice Exercise 3: Find Cross-References in Ghidra
#### Objective
Learn how to use Ghidra's cross-reference feature to trace how data flows through code, understanding where specific data is read, written, or referenced.
#### Prerequisites
- Ghidra installed with `0x0001_hello-world` project open
- Completed Exercise 2 (Find Strings) - you should know where the "hello, world" string is located
- CodeBrowser window open with the binary loaded
#### Task Description
In this exercise, you'll:
1. Navigate to a specific data reference in the `main` function
2. Find where a particular data item (`DAT_...`) is used
3. Trace back to see which functions access this data
4. Understand how data flows from memory to the CPU and then to functions
#### Background: What are Cross-References?
A **cross-reference** is a link between different parts of the code:
- **Code → Data**: An instruction reads or writes data
- **Code → Code**: A function calls another function
- **Data → Data**: One data item references another
In this exercise, we're tracking **code → data** references to understand where and how the program uses the "hello, world" string.
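To make the three reference kinds concrete, here is a minimal sketch of a cross-reference table as plain Python data. It is not Ghidra's internal representation. The instruction address `10000238`, the `puts` target, and the `hello_world_string` label are hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical cross-reference records: (from_location, to_target, kind).
# Only the first entry mirrors the sample output used later in this exercise.
XREFS = [
    ("main:10000236", "DAT_10000244", "read"),           # code -> data
    ("main:10000238", "puts", "call"),                   # code -> code
    ("DAT_10000244", "hello_world_string", "pointer"),   # data -> data
]

def references_to(target, xrefs=XREFS):
    """Return every (from_location, kind) pair that points at `target`."""
    index = defaultdict(list)
    for source, dest, kind in xrefs:
        index[dest].append((source, kind))
    return index[target]

print(references_to("DAT_10000244"))
```

Asking "who references `DAT_10000244`?" is the same question the References window answers in the steps that follow.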
#### Step-by-Step Instructions
##### Step 1: Navigate to the main Function
1. In Ghidra's CodeBrowser, use **Navigation → Go To...** (or press **G**)
2. Type `main` and press Enter
3. Ghidra will navigate to the `main` function
4. You should see the disassembly in the Listing view (center panel)
##### Step 2: Locate the `ldr` Instruction
In the main function's disassembly, look for an `ldr` (load register) instruction. It should look something like:
```
ldr r0, [DAT_10000244]
```
or similar. This instruction:
- **`ldr`** = load register (read data from memory)
- **`r0`** = put the data into register `r0`
- **`[DAT_10000244]`** = read the 32-bit word stored at location `DAT_10000244`
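Ghidra shows the label `DAT_10000244`, but the underlying Thumb encoding is a PC-relative load of the form `ldr r0, [pc, #imm]`. The sketch below shows how such an instruction resolves to an address. It assumes the `ldr` sits at `0x10000236` (the address used in the sample output later in this exercise) and that the offset is 12 bytes; your binary may differ.

```python
def thumb_literal_address(instr_addr, imm):
    """Address read by a Thumb `ldr rX, [pc, #imm]` located at `instr_addr`."""
    pc = instr_addr + 4          # in Thumb state, PC reads as instruction address + 4
    return (pc & ~0x3) + imm     # PC is rounded down to a word boundary, then offset

# Assumed values: ldr at 0x10000236 with a 12-byte offset.
print(hex(thumb_literal_address(0x10000236, 12)))  # 0x10000244
```

Ghidra does this arithmetic for you and replaces the raw offset with the `DAT_...` label.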
##### Step 3: Understand the Notation
In Ghidra's Listing (disassembly) notation:
- **`DAT_10000244`** = a data item (not code) at address `0x10000244`
- **`[...]`** = dereference: access the memory at the address inside the brackets
- The value stored at `DAT_10000244` is the address of the "hello, world" string in Flash memory
##### Step 4: Right-Click on the Data Reference
1. In the Listing view, find the `ldr` instruction that loads the string address
2. **Right-click** on the `DAT_...` part (the data reference)
3. A context menu should appear
##### Step 5: Select "References" Option
In the context menu:
1. Look for an option that says **References**
2. Click on it to see a submenu
3. Select **Show References to** (this shows "where is this data used?")
##### Step 6: Review the References Window
A new window should appear showing all the locations where `DAT_10000244` (or whatever the address is) is referenced:
**Expected output might look like:**
```
DAT_10000244 (1 xref):
main:10000236 (read)
```
This means:
- The data at `DAT_10000244` is used in 1 place
- That place is in the `main` function at instruction `10000236`
- It's a **read** operation (the code is reading this data)
##### Step 7: Answer These Questions
###### Question 1: Data Address
- What is the address of the data reference you found? (e.g., `DAT_10000244`)
- __________
###### Question 2: Referenced By
- How many places reference this data?
- __________
- Which function(s) use it?
- __________
###### Question 3: Reference Type
- Is it a read or write operation?
- __________
- Why? (What's the program doing with this data?)
- __________
###### Question 4: The Chain
- The `ldr` instruction loads an address into `r0`
- What happens next? (Hint: Look at the next instruction after the `ldr`)
- __________
- Is there a function call? If so, which one?
- __________
###### Question 5: Understanding the Flow
- **`DAT_10000244`** contains the address of the "hello, world" string
- The `ldr` loads that address into `r0`
- Then a function (probably `printf` or `puts`) is called with `r0` as the argument
- Can you trace this complete flow?
#### Deeper Analysis (Optional Challenge)
##### Challenge 1: Find the Actual String Address
1. Navigate to the `DAT_10000244` location
2. Look at the value stored there
3. Can you decode the hex bytes and find the actual address of "hello, world"?
4. Hint: The RP2350 uses little-endian encoding, so the least significant byte is stored first and the bytes appear "backwards"
**Example:**
If you see bytes: `CC 19 00 10`
Read backwards: `10 00 19 CC` = `0x100019CC`
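The same decoding can be checked in a couple of lines of Python. This is a small sketch using the example bytes above; substitute the bytes you actually see in your Listing view.

```python
# Example bytes as they appear in memory order (lowest address first).
raw = bytes.fromhex("CC 19 00 10")

# Interpret the four bytes as one little-endian 32-bit value.
address = int.from_bytes(raw, byteorder="little")
print(hex(address))  # 0x100019cc
```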
##### Challenge 2: Understand the Indirection
1. In C, to hold the address of a string we write: `const char *ptr = "hello, world";`
2. Then to use it: `printf(ptr);`
3. In assembly, this becomes:
   - Load the pointer: `ldr r0, [DAT_...]`
   - Call the function: `bl printf`
4. Can you see this pattern in the assembly?
##### Challenge 3: Follow Multiple References
1. Try this with different data items in the binary
2. Find a data reference that has **multiple** cross-references
3. What data is used in more than one place?
#### Questions for Reflection
1. **Why does the code need to load an address from memory?**
   - Why can't it just use the address directly?
   - Hint: Thumb instructions are 16 or 32 bits wide, so a full 32-bit address cannot fit inside a single `ldr`; the compiler stores it in a nearby literal pool instead
2. **What's the relationship between `DAT_10000244` and the "hello, world" string?**
   - They're at different addresses - why?
   - Both live in Flash: which one holds the characters, and which one holds a pointer to them?
3. **If we wanted to change what gets printed, where would we modify the code?**
   - Could we just change the string at address `0x100019CC`?
   - Or would we need to change `DAT_10000244`?
   - Or both?
4. **How does this relate to memory layout?**
   - Code section (Flash memory starting at `0x10000000`)
   - Data section (constants/strings)
   - Is everything at different addresses for a reason?
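For question 3, one way to experiment is on a copy of the bytes rather than on the device. The following is a minimal sketch of an in-place string patch over a toy image. The `patch_string` helper, the base address `0x100019C0`, and the replacement text are hypothetical and are not part of the course tooling.

```python
def patch_string(image, base, addr, new_text):
    """Overwrite the NUL-terminated string at `addr` without changing the image size."""
    start = addr - base
    end = image.index(b"\x00", start)        # offset of the terminating NUL
    old_len = end - start
    data = new_text.encode("ascii")
    if len(data) > old_len:
        raise ValueError("replacement is longer than the original string")
    patched = bytearray(image)
    patched[start:end] = data.ljust(old_len, b"\x00")  # pad shorter text with NULs
    return bytes(patched)

# Toy image: 12 bytes of padding, then the string at offset 12 (address 0x100019CC).
BASE = 0x100019C0
image = bytes(12) + b"hello, world\x00"
patched = patch_string(image, BASE, 0x100019CC, "hacky, world")
print(patched[12:24])
```

Because the replacement is no longer than the original, the pointer stored at `DAT_10000244` can stay as it is. A longer string would have to be placed elsewhere, and the pointer updated to match.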
#### Tips and Hints
- If you right-click and don't see "References", try right-clicking directly on the instruction address instead
- You can also use **Search → For Cross References** from the menu for a more advanced search
- In the Decompile view (right side), cross-references may be shown in a different format or with different colors
- Multi-level references: You can right-click on a data item and then follow the chain to another data item
#### Real-World Applications
Understanding cross-references is crucial for:
- **Vulnerability hunting**: Finding where user input flows through the code
- **Firmware patching**: Changing constants, strings, or data values
- **Malware analysis**: Tracking command-and-control server addresses or encryption keys
- **Reverse engineering**: Understanding program logic by following data dependencies
#### Summary
By completing this exercise, you've learned:
1. How to find and interpret cross-references in Ghidra
2. How to trace data from its definition to where it's used
3. How the `ldr` (load) instruction works to pass data to functions
4. The relationship between high-level C code and assembly-level data flow
5. How addresses are loaded indirectly from a literal pool stored next to the code
#### Expected Final Understanding
You should now understand this flow:
```
String "hello, world" is stored at address 0x100019CC in Flash
↓
A pointer to this address is stored at DAT_10000244 in Flash
↓
The main() function loads this pointer: ldr r0, [DAT_10000244]
↓
main() calls printf with r0 (the string address) as the argument
↓
printf() reads the bytes at that address and prints them
```
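The flow above can be simulated over a toy Flash image to check your understanding. This is only a sketch: the 8 KiB image size and the `store`, `load_word`, and `read_c_string` helpers are assumptions made for illustration, and the addresses are the example values used throughout this exercise.

```python
BASE = 0x10000000
flash = bytearray(0x2000)  # toy 8 KiB Flash image, all zeros

def store(addr, data):
    """Copy `data` into the image at absolute address `addr`."""
    off = addr - BASE
    flash[off:off + len(data)] = data

def load_word(addr):
    """Model `ldr`: read one little-endian 32-bit word."""
    off = addr - BASE
    return int.from_bytes(flash[off:off + 4], "little")

def read_c_string(addr):
    """Model what printf does with its argument: read bytes up to the NUL."""
    off = addr - BASE
    end = flash.index(b"\x00", off)
    return flash[off:end].decode("ascii")

store(0x100019CC, b"hello, world\x00")                 # the string itself
store(0x10000244, (0x100019CC).to_bytes(4, "little"))  # the pointer (DAT_10000244)

r0 = load_word(0x10000244)    # ldr r0, [DAT_10000244]
print(hex(r0), read_c_string(r0))
```

Changing either `store` call and rerunning shows which of the two locations controls the printed text.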