2026-03-19 15:01:07 -04:00

Embedded Systems Reverse Engineering

Repository

Week 1

Introduction and Overview of Embedded Reverse Engineering: Ethics, Scoping, and Basic Concepts

Non-Credit Practice Exercise 3: Find Cross-References in Ghidra

Objective

Learn how to use Ghidra's cross-reference feature to trace how data flows through code, understanding where specific data is read, written, or referenced.

Prerequisites

  • Ghidra installed with 0x0001_hello-world project open
  • Completed Exercise 2 (Find Strings) - you should know where the "hello, world" string is located
  • CodeBrowser window open with the binary loaded

Task Description

In this exercise, you'll:

  1. Navigate to a specific data reference in the main function
  2. Find where a particular data item (DAT_...) is used
  3. Trace back to see which functions access this data
  4. Understand how data flows from memory to the CPU and then to functions

Background: What are Cross-References?

A cross-reference is a link between different parts of the code:

  • Code → Data: An instruction reads or writes data
  • Code → Code: A function calls another function
  • Data → Data: One data item references another

In this exercise, we're tracking code → data references to understand where and how the program uses the "hello, world" string.

Step-by-Step Instructions

Step 1: Navigate to the main Function
  1. In Ghidra's CodeBrowser, use Navigation → Go To... (or press G)
  2. Type main and press Enter
  3. Ghidra will navigate to the main function
  4. You should see the disassembly in the Listing view (center panel)
Step 2: Locate the ldr Instruction

In the main function's disassembly, look for an ldr (load register) instruction. It should look something like:

ldr r0, [DAT_10000244]

or similar. This instruction:

  • ldr = load register (read data from memory)
  • r0 = put the data into register r0
  • [DAT_10000244] = read the 32-bit value stored at location DAT_10000244 (that value is itself an address)
Step 3: Understand the Notation

In Ghidra's Listing (disassembly) notation:

  • DAT_10000244 = a data item (not code) at address 0x10000244
  • [...] = a memory access: the instruction reads the contents of that location rather than using the address itself
  • The value stored there is the address of the "hello, world" string in Flash memory
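
Ghidra shows the operand as [DAT_10000244], but the instruction itself encodes only a small offset from the program counter. The sketch below shows how such an offset could resolve to that label. It assumes the Thumb form ldr r0, [pc, #imm] and an offset of 12; the listing above states neither, and thumb_ldr_literal_target is a name made up for this illustration.

```python
def thumb_ldr_literal_target(insn_addr: int, imm: int) -> int:
    """Address read by a Thumb `ldr Rt, [pc, #imm]` located at insn_addr."""
    pc = insn_addr + 4   # in Thumb state, PC reads as the instruction address + 4
    base = pc & ~0x3     # the base is PC rounded down to a multiple of 4
    return base + imm


# 0x10000236 is the instruction address from this exercise.
# The offset 12 is an assumption chosen for illustration.
target = thumb_ldr_literal_target(0x10000236, 12)
print(hex(target))
```

Because of the rounding step, an instruction two bytes earlier (at 0x10000234) with the same offset would read the same location.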
Step 4: Right-Click on the Data Reference
  1. In the Listing view, find the ldr instruction that loads the string address
  2. Right-click on the DAT_... part (the data reference)
  3. A context menu should appear
Step 5: Select "References" Option

In the context menu:

  1. Look for an option that says References
  2. Click on it to see a submenu
  3. Select Show References to (this shows "where is this data used?")
Step 6: Review the References Window

A new window should appear showing all the locations where DAT_10000244 (or whatever the address is) is referenced:

Expected output might look like:

DAT_10000244 (1 xref):
   main:10000236 (read)

This means:

  • The data at DAT_10000244 is used in 1 place
  • That place is in the main function, at the instruction located at address 0x10000236
  • It's a read operation (the code is reading this data)
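
One way to picture what the References window holds is a table of (from, to, type) records indexed by the target address. The following is only a toy model of that idea, not Ghidra's implementation or API; the Xref type and index_by_target function are hypothetical names, and the single record is the one from the expected output above.

```python
from collections import defaultdict
from typing import NamedTuple


class Xref(NamedTuple):
    from_addr: int   # address of the instruction that makes the reference
    function: str    # function containing that instruction
    to_addr: int     # address being referenced
    kind: str        # "read", "write", or "call"


def index_by_target(xrefs):
    """Group references by the address they point to."""
    table = defaultdict(list)
    for x in xrefs:
        table[x.to_addr].append(x)
    return table


xrefs = [Xref(0x10000236, "main", 0x10000244, "read")]
table = index_by_target(xrefs)

# "Show References to" for DAT_10000244 is then a lookup by target address.
for x in table[0x10000244]:
    print(f"{x.function}:{x.from_addr:08x} ({x.kind})")
```

A data item with several cross-references would simply have more than one record in its list.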
Step 7: Answer These Questions
Question 1: Data Address
  • What is the address of the data reference you found? (e.g., DAT_10000244)

Question 2: Referenced By
  • How many places reference this data?

  • Which function(s) use it?

Question 3: Reference Type
  • Is it a read or write operation?

  • Why? (What's the program doing with this data?)

Question 4: The Chain
  • The ldr instruction loads an address into r0
  • What happens next? (Hint: Look at the next instruction after the ldr)

  • Is there a function call? If so, which one?

Question 5: Understanding the Flow
  • DAT_10000244 contains the address of the "hello, world" string
  • The ldr loads that address into r0
  • Then a function (probably printf or puts) is called with r0 as the argument
  • Can you trace this complete flow?

Deeper Analysis (Optional Challenge)

Challenge 1: Find the Actual String Address
  1. Navigate to the DAT_10000244 location
  2. Look at the value stored there
  3. Can you decode the hex bytes and find the actual address of "hello, world"?
  4. Hint: The RP2350 is little-endian, so the least significant byte is stored first and the bytes appear "backwards"

Example: if you see the bytes CC 19 00 10, reading them in reverse order gives 10 00 19 CC = 0x100019CC.
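
If you would rather check the byte swap than do it by hand, a few lines of Python are enough. This is a minimal sketch using the example bytes above; it runs in any Python 3 interpreter and does not need Ghidra.

```python
import struct

raw = bytes.fromhex("CC 19 00 10")   # the four bytes as they appear in memory

# Two equivalent ways to read them as one little-endian 32-bit value.
addr = int.from_bytes(raw, byteorder="little")
(addr_via_struct,) = struct.unpack("<I", raw)

print(hex(addr))
print(addr == addr_via_struct)
```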

Challenge 2: Understand the Indirection
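  1. In C, if we want to hold the address of a string, we do: char *ptr = some_string; (an array name already evaluates to the address of its first element, so no & is needed)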
  2. Then to use it: printf(ptr);
  3. In assembly, this becomes:
    • Load the pointer: ldr r0, [DAT_...]
    • Call the function: bl printf
  4. Can you see this pattern in the assembly?
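
The two-step pattern can be simulated on a toy Flash image: first read the pointer, then read the string it points to. This is a sketch under stated assumptions: the image is built by hand rather than taken from the real firmware, the string bytes are assumed to be "hello, world" followed by a NUL, and read_u32 and read_cstring are helper names invented here.

```python
FLASH_BASE = 0x10000000    # start of Flash on the RP2350
LITERAL_ADDR = 0x10000244  # DAT_10000244, holds the pointer
STRING_ADDR = 0x100019CC   # example string address from Challenge 1


def read_u32(image, addr):
    """Read a little-endian 32-bit word, like the ldr instruction does."""
    off = addr - FLASH_BASE
    return int.from_bytes(image[off:off + 4], "little")


def read_cstring(image, addr):
    """Read bytes up to the terminating NUL, like printf does."""
    off = addr - FLASH_BASE
    end = image.index(b"\x00", off)
    return image[off:end].decode("ascii")


# Hand-built toy image: everything is zero except the string and the pointer.
image = bytearray(0x2000)
text = b"hello, world\x00"  # assumed contents
s_off = STRING_ADDR - FLASH_BASE
l_off = LITERAL_ADDR - FLASH_BASE
image[s_off:s_off + len(text)] = text
image[l_off:l_off + 4] = STRING_ADDR.to_bytes(4, "little")

r0 = read_u32(image, LITERAL_ADDR)      # ldr r0, [DAT_10000244]
print(hex(r0), read_cstring(image, r0))  # bl printf, with r0 as the argument
```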
Challenge 3: Follow Multiple References
  1. Try this with different data items in the binary
  2. Find a data reference that has multiple cross-references
  3. What data is used in more than one place?

Questions for Reflection

  1. Why does the code need to load an address from memory?

    • Why can't it just use the address directly?
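    • Hint: Compare the width of a Thumb instruction with the width of a full 32-bit address; a constant that does not fit in the instruction is stored next to the code (in a literal pool) and loaded from there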
  2. What's the relationship between DAT_10000244 and the "hello, world" string?

    • They're at different addresses - why?
    • Both are in Flash: which one holds the characters, and which one holds a pointer to them?
  3. If we wanted to change what gets printed, where would we modify the code?

    • Could we just change the string at address 0x100019CC?
    • Or would we need to change DAT_10000244?
    • Or both?
  4. How does this relate to memory layout?

    • Code section (Flash memory starting at 0x10000000)
    • Data section (constants/strings)
    • Is everything at different addresses for a reason?
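
To experiment with question 3 without touching real firmware, both options can be tried on a toy image. The sketch below assumes a raw image whose first byte corresponds to address 0x10000000; the helper names, the second string "goodbye", and its address 0x10001A00 are all hypothetical.

```python
FLASH_BASE = 0x10000000  # start of Flash, as given in the memory layout question


def patch_string_in_place(image, addr, old, new):
    """Overwrite the string at addr; the replacement may not be longer."""
    if len(new) > len(old):
        raise ValueError("replacement would overwrite the bytes after the string")
    off = addr - FLASH_BASE
    if image[off:off + len(old)] != old:
        raise ValueError("unexpected bytes at this address")
    image[off:off + len(old)] = new.ljust(len(old), b"\x00")  # pad with NULs


def repoint_literal(image, literal_addr, new_target):
    """Store a different 32-bit address in the literal, little-endian."""
    off = literal_addr - FLASH_BASE
    image[off:off + 4] = new_target.to_bytes(4, "little")


# Toy image, not the real firmware: only the locations of interest are filled in.
image = bytearray(0x2000)
image[0x19CC:0x19CC + 13] = b"hello, world\x00"
image[0x0244:0x0248] = (0x100019CC).to_bytes(4, "little")
image[0x1A00:0x1A08] = b"goodbye\x00"  # hypothetical second string

patch_string_in_place(image, 0x100019CC, b"hello, world", b"hi, world")
repoint_literal(image, 0x10000244, 0x10001A00)

print(bytes(image[0x19CC:0x19CC + 12]))
print(hex(int.from_bytes(image[0x0244:0x0248], "little")))
```

Editing the string in place is limited by the original length, while repointing the literal requires the new text to already exist somewhere in the image.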

Tips and Hints

  • If you right-click and don't see "References", try right-clicking directly on the instruction address instead
  • You can also use Search → For Cross References from the menu for a more advanced search
  • In the Decompile view (right side), cross-references may be shown in a different format or with different colors
  • Multi-level references: You can right-click on a data item and then follow the chain to another data item

Real-World Applications

Understanding cross-references is crucial for:

  • Vulnerability hunting: Finding where user input flows through the code
  • Firmware patching: Changing constants, strings, or data values
  • Malware analysis: Tracking command-and-control server addresses or encryption keys
  • Reverse engineering: Understanding program logic by following data dependencies

Summary

By completing this exercise, you've learned:

  1. How to find and interpret cross-references in Ghidra
  2. How to trace data from its definition to where it's used
  3. How the ldr (load) instruction works to pass data to functions
  4. The relationship between high-level C code and assembly-level data flow
  5. How addresses are loaded indirectly, through a pointer stored in Flash next to the code (a literal pool)

Expected Final Understanding

You should now understand this flow:

String "hello, world" is stored at address 0x100019CC in Flash
    ↓
A pointer to this address is stored at DAT_10000244 in Flash
    ↓
The main() function loads this pointer: ldr r0, [DAT_10000244]
    ↓
main() calls printf with r0 (the string address) as the argument
    ↓
printf() reads the bytes at that address and prints them