# Embedded Systems Reverse Engineering

[Repository](https://github.com/mytechnotalent/Embedded-Hacking)

## Week 1

Introduction and Overview of Embedded Reverse Engineering: Ethics, Scoping, and Basic Concepts

### Non-Credit Practice Exercise 3: Find Cross-References in Ghidra

#### Objective

Learn how to use Ghidra's cross-reference feature to trace how data flows through code, and to see where specific data is read, written, or referenced.

#### Prerequisites

- Ghidra installed with the `0x0001_hello-world` project open
- Completed Exercise 2 (Find Strings) - you should know where the "hello, world" string is located
- CodeBrowser window open with the binary loaded

#### Task Description

In this exercise, you'll:

1. Navigate to a specific data reference in the `main` function
2. Find where a particular data item (`DAT_...`) is used
3. Trace back to see which functions access this data
4. Understand how data flows from memory to the CPU and then to functions

#### Background: What are Cross-References?

A **cross-reference** is a link between different parts of the code:

- **Code → Data**: An instruction reads or writes data
- **Code → Code**: A function calls another function
- **Data → Data**: One data item references another

In this exercise, we're tracking **code → data** references to understand where and how the program uses the "hello, world" string.

#### Step-by-Step Instructions

##### Step 1: Navigate to the main Function

1. In Ghidra's CodeBrowser, use **Navigation → Go To...** (or press **G**)
2. Type `main` and press Enter
3. Ghidra will navigate to the `main` function
4. You should see the disassembly in the Listing view (center panel)

##### Step 2: Locate the `ldr` Instruction

In the `main` function's disassembly, look for an `ldr` (load register) instruction. It should look something like this (the exact address may differ):

```
ldr r0, [DAT_10000244]
```

This instruction:

- **`ldr`** = load register (read data from memory)
- **`r0`** = put the data into register `r0`
- **`[DAT_10000244]`** = read the 32-bit value stored at location `DAT_10000244`

##### Step 3: Understand the Notation

In Ghidra's Listing notation:

- **`DAT_10000244`** = a data item (not code) at address `0x10000244`
- **`[...]`** = a memory access; the instruction reads the value stored at that location
- The value stored there is the address of the "hello, world" string in Flash memory

##### Step 4: Right-Click on the Data Reference

1. In the Listing view, find the `ldr` instruction that loads the string address
2. **Right-click** on the `DAT_...` part (the data reference)
3. A context menu should appear

##### Step 5: Select "References" Option

In the context menu:

1. Look for an option that says **References**
2. Click on it to see a submenu
3. Select **Show References to** (this answers "where is this data used?")

##### Step 6: Review the References Window

A new window should appear listing every location that references `DAT_10000244` (or whatever the address is in your binary).

**Expected output might look like:**

```
DAT_10000244 (1 xref):
  main:10000236 (read)
```

This means:

- The data at `DAT_10000244` is used in 1 place
- That place is in the `main` function, at the instruction at address `10000236`
- It's a **read** operation (the code is reading this data)

##### Step 7: Answer These Questions

###### Question 1: Data Address

- What is the address of the data reference you found? (e.g., `DAT_10000244`)
  - __________

###### Question 2: Referenced By

- How many places reference this data?
  - __________
- Which function(s) use it?
  - __________

###### Question 3: Reference Type

- Is it a read or write operation?
  - __________
- Why? (What's the program doing with this data?)
  - __________

###### Question 4: The Chain

- The `ldr` instruction loads an address into `r0`
- What happens next? (Hint: Look at the next instruction after the `ldr`)
  - __________
- Is there a function call? If so, which one?
  - __________

###### Question 5: Understanding the Flow

- **`DAT_10000244`** contains the address of the "hello, world" string
- The `ldr` loads that address into `r0`
- Then a function (probably `printf` or `puts`) is called with `r0` as the argument
- Can you trace this complete flow?

#### Deeper Analysis (Optional Challenge)

##### Challenge 1: Find the Actual String Address

1. Navigate to the `DAT_10000244` location
2. Look at the value stored there
3. Can you decode the hex bytes and find the actual address of "hello, world"?
4. Hint: The RP2350 uses little-endian encoding, so the least significant byte is stored first and the bytes appear "backwards"

**Example:**

- If you see bytes: `CC 19 00 10`
- Read backwards: `10 00 19 CC` = `0x100019CC`

##### Challenge 2: Understand the Indirection

1. In C, to keep a string's address in a pointer, we write: `const char *ptr = some_string;`
2. Then to use it: `printf(ptr);`
3. In assembly, this becomes:
   - Load the pointer: `ldr r0, [DAT_...]`
   - Call the function: `bl printf`
4. Can you see this pattern in the assembly?

##### Challenge 3: Follow Multiple References

1. Try this with different data items in the binary
2. Find a data reference that has **multiple** cross-references
3. What data is used in more than one place?

#### Questions for Reflection

1. **Why does the code need to load an address from memory?**
   - Why can't it just use the address directly?
   - Hint: Instruction width and literal pools. A 16-bit Thumb instruction cannot hold a full 32-bit address, so the compiler stores the address as data near the function and loads it from there.

2. **What's the relationship between `DAT_10000244` and the "hello, world" string?**
   - They're at different addresses - why?
   - Both are in Flash. Which one holds the characters, and which one holds a pointer to them?

3. **If we wanted to change what gets printed, where would we modify the code?**
   - Could we just change the string at address `0x100019CC`?
   - Or would we need to change `DAT_10000244`?
   - Or both?

4. **How does this relate to memory layout?**
   - Code section (Flash memory starting at `0x10000000`)
   - Data section (constants/strings)
   - Is everything at different addresses for a reason?

#### Tips and Hints

- If you right-click and don't see "References", try right-clicking directly on the instruction address instead
- You can also use **Search → For Direct References** from the menu for a broader search
- In the Decompile view (right side), cross-references may be shown in a different format or with different colors
- Multi-level references: You can right-click on a data item and then follow the chain to another data item

#### Real-World Applications

Understanding cross-references is crucial for:

- **Vulnerability hunting**: Finding where user input flows through the code
- **Firmware patching**: Changing constants, strings, or data values
- **Malware analysis**: Tracking command-and-control server addresses or encryption keys
- **Reverse engineering**: Understanding program logic by following data dependencies

#### Summary

By completing this exercise, you've learned:

1. How to find and interpret cross-references in Ghidra
2. How to trace data from its definition to where it's used
3. How the `ldr` (load) instruction works to pass data to functions
4. The relationship between high-level C code and assembly-level data flow
5. How addresses are loaded indirectly from a nearby data word (a literal pool entry)

#### Expected Final Understanding

You should now understand this flow:

```
String "hello, world" is stored at address 0x100019CC in Flash
        ↓
A pointer to this address is stored at DAT_10000244 in Flash
        ↓
The main() function loads this pointer: ldr r0, [DAT_10000244]
        ↓
main() calls printf with r0 (the string address) as the argument
        ↓
printf() reads the bytes at that address and prints them
```