Compare commits


26 Commits

Author SHA1 Message Date
github-actions[bot] 70aa5869c0 Update schema with new tags: airshows, dog_friendly, cat_friendly, notes 2026-02-12 21:02:36 +00:00
github-actions[bot] 83c3406699 Add community submission from @ggman12 (closes #14) 2026-02-12 21:02:35 +00:00
ggman12 fecf9ff0ea format properly 2026-02-12 16:01:14 -05:00
ggman12 7e0a396fc7 only modify key parts of schemas/community_submission.v1.schema.json schema. Lowest diffs 2026-02-12 15:55:44 -05:00
ggman12 b0503bb3b2 fix: should update schema now 2026-02-12 15:46:11 -05:00
ggman12 0b89138daf modify existing json schema instead of creating a new file every time 2026-02-12 15:40:01 -05:00
ggman12 4b756cdaef fix syntax error 2026-02-12 15:32:37 -05:00
ggman12 9acffe1e56 handle multiple PRs with schema changes 2026-02-12 15:31:53 -05:00
ggman12 1694fe0b46 allow fileupload in submission 2026-02-12 15:26:45 -05:00
ggman12 c6d9e59d01 update template 2026-02-12 13:29:45 -05:00
ggman12 dd6cd7b6fd update schema with optional start_date and end_date scope 2026-02-12 13:28:43 -05:00
ggman12 f543b671f8 updating schema 2026-02-12 13:22:56 -05:00
ggman12 efb4cbb953 update example 2026-02-12 13:22:43 -05:00
ggman12 5578133a99 update schema to be uppercase only 2026-02-12 12:36:50 -05:00
ggman12 eace7d5a63 update folder 2026-02-12 12:34:27 -05:00
ggman12 82f47b662c make blank username work 2026-02-12 12:32:41 -05:00
ggman12 787796c3ab update approve_submission 2026-02-12 12:26:54 -05:00
ggman12 61aae586ee fix approve 2026-02-12 12:18:28 -05:00
ggman12 5abfa6b226 update submission validation 2026-02-12 12:15:04 -05:00
ggman12 a743b74ae5 Merge branch 'develop' 2026-02-12 12:10:24 -05:00
ggman12 53a020ab73 add jsonschema to requirements.txt 2026-02-12 12:09:03 -05:00
ggman12 2de41c9883 update historical. To check tar and fail fast if any maps fail 2026-02-12 12:01:13 -05:00
ggman12 bccc634158 remove existing release 2026-02-12 11:50:45 -05:00
ggman12 43b07942b0 add needed permissions 2026-02-12 11:42:49 -05:00
ggman12 2c9e994a12 add debug for FAA 2026-02-12 11:06:38 -05:00
ggman12 99b680476a delete parquet chunck after load to not use so much space for big historical run 2026-02-12 10:52:42 -05:00
16 changed files with 798 additions and 63 deletions
@@ -13,29 +13,42 @@ body:
 **Rules (enforced on review/automation):**
 - Each object must include **at least one** of:
   - `registration_number`
-  - `transponder_code_hex` (6 hex chars)
+  - `transponder_code_hex` (6 uppercase hex chars, e.g., `ABC123`)
   - `planequery_airframe_id`
 - Your contributor name (entered below) will be applied to all objects.
 - `contributor_uuid` is derived from your GitHub account automatically.
 - `creation_timestamp` is created by the system (you may omit it).
+**Optional date scoping:**
+- `start_date` - When the tags become valid (ISO 8601: `YYYY-MM-DD`)
+- `end_date` - When the tags stop being valid (ISO 8601: `YYYY-MM-DD`)
 **Example: single object**
 ```json
 {
-  "transponder_code_hex": "a1b2c3"
+  "registration_number": "N12345",
+  "tags": {"owner": "John Doe"},
+  "start_date": "2025-01-01"
 }
 ```
 **Example: multiple objects (array)**
 ```json
 [
   {
-    "registration_number": "N123AB"
-  },
-  {
-    "planequery_airframe_id": "cessna|172s|12345",
-    "transponder_code_hex": "0f1234"
-  }
+    "registration_number": "N12345",
+    "tags": {"internet": "starlink"},
+    "start_date": "2025-05-01"
+  },
+  {
+    "registration_number": "N12345",
+    "tags": {"owner": "John Doe"},
+    "start_date": "2025-01-01",
+    "end_date": "2025-07-20"
+  },
+  {
+    "transponder_code_hex": "ABC123",
+    "tags": {"internet": "viasat", "owner": "John Doe"}
+  }
 ]
 ```
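The rules stated in this template hunk (at least one identifier, a 6-character uppercase hex transponder code, `YYYY-MM-DD` dates) can be sketched as a small stdlib-only check. This is an illustration only, not the repository's validator, which relies on `jsonschema`; `check_submission` and its error strings are hypothetical names introduced here.

```python
import re
from datetime import date

IDENTIFIER_KEYS = ("registration_number", "transponder_code_hex", "planequery_airframe_id")
HEX_RE = re.compile(r"[0-9A-F]{6}")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_submission(obj: dict) -> list[str]:
    """Return rule violations for one submission object (hypothetical helper)."""
    errors = []
    # Rule: at least one of the three identifiers must be present
    if not any(key in obj for key in IDENTIFIER_KEYS):
        errors.append("missing identifier")
    # Rule: transponder code is exactly 6 uppercase hex characters
    code = obj.get("transponder_code_hex")
    if code is not None and not (isinstance(code, str) and HEX_RE.fullmatch(code)):
        errors.append("transponder_code_hex must be 6 uppercase hex chars")
    # Rule: optional dates use the YYYY-MM-DD shape and name a real calendar day
    for key in ("start_date", "end_date"):
        if key in obj:
            value = obj[key]
            ok = isinstance(value, str) and DATE_RE.fullmatch(value) is not None
            if ok:
                try:
                    date.fromisoformat(value)
                except ValueError:
                    ok = False
            if not ok:
                errors.append(f"{key} must be YYYY-MM-DD")
    return errors
```

Under these assumptions, the old example value `"a1b2c3"` would be rejected while `"ABC123"` passes, which matches the template's switch to uppercase examples.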
@@ -52,9 +65,11 @@ body:
     id: submission_json
     attributes:
       label: Submission JSON
-      description: Paste either one JSON object or an array of JSON objects. Must be valid JSON. Do not include contributor_name or contributor_uuid in your JSON.
+      description: |
+        Paste JSON directly, OR drag-and-drop a .json file here.
+        Must be valid JSON. Do not include contributor_name or contributor_uuid.
       placeholder: |
-        Paste JSON here...
+        Paste JSON here, or drag-and-drop a .json file...
     validations:
       required: true
@@ -38,9 +38,10 @@ jobs:
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           GITHUB_REPOSITORY: ${{ github.repository }}
+          ISSUE_BODY: ${{ github.event.issue.body }}
         run: |
           python -m src.contributions.approve_submission \
             --issue-number ${{ github.event.issue.number }} \
-            --issue-body "${{ github.event.issue.body }}" \
+            --issue-body "$ISSUE_BODY" \
             --author "${{ steps.author.outputs.username }}" \
             --author-id ${{ steps.author.outputs.user_id }}
+29 -8
@@ -81,8 +81,22 @@ jobs:
       - name: Create tar of extracted data
         run: |
           cd data/output
-          tar -cf extracted_data.tar *-planes-readsb-prod-0.tar_0 icao_manifest_*.txt 2>/dev/null || echo "Some files may not exist"
-          ls -lah extracted_data.tar || echo "No tar created"
+          echo "=== Disk space before tar ==="
+          df -h .
+          echo "=== Files to tar ==="
+          ls -lah *-planes-readsb-prod-0.tar_0 icao_manifest_*.txt 2>/dev/null || echo "No files found"
+          # Create tar with explicit error checking
+          if ls *-planes-readsb-prod-0.tar_0 1>/dev/null 2>&1; then
+            tar -cvf extracted_data.tar *-planes-readsb-prod-0.tar_0 icao_manifest_*.txt
+            echo "=== Tar file created ==="
+            ls -lah extracted_data.tar
+            # Verify tar integrity
+            tar -tf extracted_data.tar > /dev/null && echo "Tar integrity check passed" || { echo "Tar integrity check FAILED"; exit 1; }
+          else
+            echo "ERROR: No extracted directories found, cannot create tar"
+            exit 1
+          fi
       - name: Upload extracted data
         uses: actions/upload-artifact@v4
@@ -97,7 +111,7 @@ jobs:
     needs: [generate-matrix, adsb-extract]
     runs-on: ubuntu-24.04-arm
     strategy:
-      fail-fast: false
+      fail-fast: true
       matrix:
         chunk: ${{ fromJson(needs.generate-matrix.outputs.chunks) }}
         icao_chunk: [0, 1, 2, 3]
@@ -134,7 +148,12 @@ jobs:
         run: |
           cd data/output
           if [ -f extracted_data.tar ]; then
-            tar -xf extracted_data.tar
+            echo "=== Tar file info ==="
+            ls -lah extracted_data.tar
+            echo "=== Verifying tar integrity ==="
+            tar -tf extracted_data.tar > /dev/null || { echo "ERROR: Tar file is corrupted"; exit 1; }
+            echo "=== Extracting ==="
+            tar -xvf extracted_data.tar
             rm extracted_data.tar
             echo "has_data=true" >> "$GITHUB_OUTPUT"
             echo "=== Contents of data/output ==="
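The create-then-verify pattern used in these two workflow hunks (`tar -cvf` followed by `tar -tf` before trusting the archive) can also be sketched in Python with the standard `tarfile` module. This is a hedged illustration, not code from the repository; `create_and_verify_tar` is a hypothetical name, and the member-count check is an extra assumption that goes slightly beyond what the shell version does.

```python
import tarfile
from pathlib import Path


def create_and_verify_tar(tar_path: Path, members: list[Path]) -> int:
    """Write an uncompressed tar, then re-read its member list as an integrity check."""
    # Fail fast when there is nothing to archive, like the workflow's `exit 1` branch
    if not members:
        raise FileNotFoundError("no files to archive")

    with tarfile.open(tar_path, "w") as tar:
        for member in members:
            tar.add(member, arcname=member.name)

    # Listing re-reads the member headers; tarfile raises tarfile.ReadError
    # when it cannot parse the archive
    with tarfile.open(tar_path, "r") as tar:
        names = tar.getnames()

    if len(names) < len(members):
        raise RuntimeError("tar is missing members")
    return len(names)
```

As in the workflow, the point of the design is to surface a bad archive in the producing job rather than in a downstream job that has already downloaded it.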
@@ -188,17 +207,19 @@ jobs:
       - name: Debug downloaded files
         run: |
+          echo "=== Disk space before processing ==="
+          df -h
           echo "=== Listing data/output/adsb_chunks/ ==="
-          find data/output/adsb_chunks/ -type f 2>/dev/null | head -50 || echo "No files found"
-          echo "=== Looking for parquet files ==="
-          find . -name "*.parquet" 2>/dev/null | head -20 || echo "No parquet files found"
+          find data/output/adsb_chunks/ -type f 2>/dev/null | wc -l
+          echo "=== Total parquet size ==="
+          du -sh data/output/adsb_chunks/ || echo "No chunks dir"
       - name: Combine chunks to CSV
         env:
           START_DATE: ${{ needs.generate-matrix.outputs.global_start }}
           END_DATE: ${{ needs.generate-matrix.outputs.global_end }}
         run: |
-          python -m src.adsb.combine_chunks_to_csv --chunks-dir data/output/adsb_chunks --start-date "$START_DATE" --end-date "$END_DATE" --skip-base
+          python -m src.adsb.combine_chunks_to_csv --chunks-dir data/output/adsb_chunks --start-date "$START_DATE" --end-date "$END_DATE" --skip-base --stream
           ls -lah data/planequery_aircraft/
       - name: Upload final artifact
@@ -277,6 +277,15 @@ jobs:
           name: community-release
           path: artifacts/community
+      - name: Debug artifact structure
+        run: |
+          echo "=== FAA artifacts ==="
+          find artifacts/faa -type f 2>/dev/null || echo "No files found in artifacts/faa"
+          echo "=== ADS-B artifacts ==="
+          find artifacts/adsb -type f 2>/dev/null || echo "No files found in artifacts/adsb"
+          echo "=== Community artifacts ==="
+          find artifacts/community -type f 2>/dev/null || echo "No files found in artifacts/community"
       - name: Prepare release metadata
         id: meta
         run: |
@@ -312,6 +321,13 @@ jobs:
           echo "zip_basename=$ZIP_BASENAME" >> "$GITHUB_OUTPUT"
           echo "name=planequery-aircraft snapshot ($DATE)${BRANCH_SUFFIX}" >> "$GITHUB_OUTPUT"
+      - name: Delete existing release if exists
+        run: |
+          gh release delete "${{ steps.meta.outputs.tag }}" --yes 2>/dev/null || true
+          git push --delete origin "refs/tags/${{ steps.meta.outputs.tag }}" 2>/dev/null || true
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
       - name: Create GitHub Release and upload assets
         uses: softprops/action-gh-release@v2
         with:
@@ -0,0 +1,77 @@
+name: Update Community PRs After Merge
+on:
+  push:
+    branches: [main]
+    paths:
+      - 'community/**'
+      - 'schemas/community_submission.v1.schema.json'
+permissions:
+  contents: write
+  pull-requests: write
+jobs:
+  update-open-prs:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          token: ${{ secrets.GITHUB_TOKEN }}
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install dependencies
+        run: pip install jsonschema
+      - name: Find and update open community PRs
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          # Get list of open community PRs
+          prs=$(gh pr list --label community --state open --json number,headRefName --jq '.[] | "\(.number) \(.headRefName)"')
+          if [ -z "$prs" ]; then
+            echo "No open community PRs found"
+            exit 0
+          fi
+          echo "$prs" | while read pr_number branch_name; do
+            echo "Processing PR #$pr_number (branch: $branch_name)"
+            # Checkout PR branch
+            git fetch origin "$branch_name"
+            git checkout "$branch_name"
+            # Merge main into PR branch
+            git config user.name "github-actions[bot]"
+            git config user.email "github-actions[bot]@users.noreply.github.com"
+            if git merge origin/main -m "Merge main to update schema"; then
+              # Regenerate schema for this PR's submission (adds any new tags)
+              python -m src.contributions.regenerate_pr_schema || true
+              # If there are changes, commit and push
+              if [ -n "$(git status --porcelain schemas/)" ]; then
+                git add schemas/
+                git commit -m "Update schema with new tags"
+                git push origin "$branch_name"
+                echo " Updated PR #$pr_number with schema changes"
+              else
+                git push origin "$branch_name"
+                echo " Merged main into PR #$pr_number"
+              fi
+            else
+              echo " Merge conflict in PR #$pr_number, adding comment"
+              gh pr comment "$pr_number" --body $'⚠️ **Merge Conflict**\n\nAnother community submission was merged and this PR has conflicts.\n\nA maintainer may need to:\n1. Close this PR\n2. Remove the `approved` label from the original issue\n3. Re-add the `approved` label to regenerate the PR'
+              git merge --abort
+            fi
+            git checkout main
+          done
@@ -4,6 +4,9 @@ on:
   issues:
     types: [opened, edited]
+permissions:
+  issues: write
 jobs:
   validate:
     if: contains(github.event.issue.labels.*.name, 'submission')
@@ -20,11 +23,24 @@ jobs:
       - name: Install dependencies
         run: pip install jsonschema
+      - name: Debug issue body
+        run: |
+          echo "=== Issue Body ==="
+          cat << 'ISSUE_BODY_EOF'
+          ${{ github.event.issue.body }}
+          ISSUE_BODY_EOF
+      - name: Save issue body to file
+        run: |
+          cat << 'ISSUE_BODY_EOF' > /tmp/issue_body.txt
+          ${{ github.event.issue.body }}
+          ISSUE_BODY_EOF
       - name: Validate submission
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           GITHUB_REPOSITORY: ${{ github.repository }}
         run: |
           python -m src.contributions.validate_submission \
-            --issue-body "${{ github.event.issue.body }}" \
+            --issue-body-file /tmp/issue_body.txt \
             --issue-number ${{ github.event.issue.number }}
@@ -0,0 +1,14 @@
+[
+  {
+    "contributor_name": "hellohello",
+    "contributor_uuid": "2981c3ee-8712-5f96-84bf-732eda515a3f",
+    "creation_timestamp": "2026-02-12T21:02:32.325360+00:00",
+    "registration_number": "N12345",
+    "tags": {
+      "airshows": true,
+      "cat_friendly": false,
+      "dog_friendly": true,
+      "notes": "is a pet carrier"
+    }
+  }
+]
+1
@@ -3,3 +3,4 @@ pandas==3.0.0
 pyarrow==23.0.0
 orjson==3.11.7
 polars==1.38.1
+jsonschema==4.26.0
+61 -15
@@ -3,7 +3,6 @@
   "title": "PlaneQuery Aircraft Community Submission (v1)",
   "type": "object",
   "additionalProperties": false,
   "properties": {
     "registration_number": {
       "type": "string",
@@ -11,13 +10,12 @@
     },
     "transponder_code_hex": {
       "type": "string",
-      "pattern": "^[0-9A-Fa-f]{6}$"
+      "pattern": "^[0-9A-F]{6}$"
     },
     "planequery_airframe_id": {
       "type": "string",
       "minLength": 1
     },
     "contributor_uuid": {
       "type": "string",
       "format": "uuid"
@@ -28,14 +26,24 @@
       "maxLength": 150,
       "description": "Display name (may be blank)"
     },
     "creation_timestamp": {
       "type": "string",
       "format": "date-time",
       "description": "Set by the system when the submission is persisted/approved.",
       "readOnly": true
     },
+    "start_date": {
+      "type": "string",
+      "format": "date",
+      "pattern": "^\\d{4}-\\d{2}-\\d{2}$",
+      "description": "Optional start date for when this submission's tags are valid (ISO 8601, e.g., 2025-05-01)."
+    },
+    "end_date": {
+      "type": "string",
+      "format": "date",
+      "pattern": "^\\d{4}-\\d{2}-\\d{2}$",
+      "description": "Optional end date for when this submission's tags are valid (ISO 8601, e.g., 2025-07-03)."
+    },
     "tags": {
       "type": "object",
       "description": "Additional community-defined tags as key/value pairs (values may be scalar, array, or object).",
@@ -43,36 +51,74 @@
         "type": "string",
         "pattern": "^[a-z][a-z0-9_]{0,63}$"
       },
-      "additionalProperties": { "$ref": "#/$defs/tagValue" }
+      "additionalProperties": {
+        "$ref": "#/$defs/tagValue"
+      },
+      "properties": {
+        "airshows": {
+          "type": "boolean"
+        },
+        "cat_friendly": {
+          "type": "boolean"
+        },
+        "dog_friendly": {
+          "type": "boolean"
+        },
+        "notes": {
+          "type": "string"
+        }
+      }
     }
   },
   "allOf": [
     {
       "anyOf": [
-        { "required": ["registration_number"] },
-        { "required": ["transponder_code_hex"] },
-        { "required": ["planequery_airframe_id"] }
+        {
+          "required": [
+            "registration_number"
+          ]
+        },
+        {
+          "required": [
+            "transponder_code_hex"
+          ]
+        },
+        {
+          "required": [
+            "planequery_airframe_id"
+          ]
+        }
       ]
     }
   ],
   "$defs": {
     "tagScalar": {
-      "type": ["string", "number", "integer", "boolean", "null"]
+      "type": [
+        "string",
+        "number",
+        "integer",
+        "boolean",
+        "null"
+      ]
     },
     "tagValue": {
       "anyOf": [
-        { "$ref": "#/$defs/tagScalar" },
+        {
+          "$ref": "#/$defs/tagScalar"
+        },
         {
           "type": "array",
           "maxItems": 50,
-          "items": { "$ref": "#/$defs/tagScalar" }
+          "items": {
+            "$ref": "#/$defs/tagScalar"
+          }
         },
         {
           "type": "object",
           "maxProperties": 50,
-          "additionalProperties": { "$ref": "#/$defs/tagScalar" }
+          "additionalProperties": {
+            "$ref": "#/$defs/tagScalar"
+          }
         }
       ]
     }
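To see what the three `pattern` values in this schema accept, they can be exercised directly with Python's `re` module. This is a sketch: the regexes are copied from the schema above, while `matches` is a hypothetical helper that mimics the unanchored search semantics JSON Schema specifies for `pattern`.

```python
import re

TRANSPONDER_RE = re.compile(r"^[0-9A-F]{6}$")        # transponder_code_hex (new, uppercase only)
TAG_KEY_RE = re.compile(r"^[a-z][a-z0-9_]{0,63}$")   # tags.propertyNames
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")         # start_date / end_date


def matches(pattern: re.Pattern, value: str) -> bool:
    """JSON Schema `pattern` is a search, so the anchors live in the regex itself."""
    return pattern.search(value) is not None


print(matches(TRANSPONDER_RE, "ABC123"))  # uppercase hex is accepted
print(matches(TRANSPONDER_RE, "a1b2c3"))  # lowercase was valid before this change
print(matches(TAG_KEY_RE, "dog_friendly"))
print(matches(DATE_RE, "2025-99-99"))     # shape only: an impossible date still matches
```

The last call illustrates that the date `pattern` constrains only the shape of the string. Whether `"format": "date"` additionally rejects impossible dates depends on the validator being run with format checking enabled, which this diff does not show.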
+18 -3
@@ -36,8 +36,13 @@ def get_target_day() -> datetime:
     return datetime.utcnow() - timedelta(days=1)


-def process_single_chunk(chunk_path: str) -> pl.DataFrame:
-    """Load and compress a single chunk parquet file."""
+def process_single_chunk(chunk_path: str, delete_after_load: bool = False) -> pl.DataFrame:
+    """Load and compress a single chunk parquet file.
+
+    Args:
+        chunk_path: Path to parquet file
+        delete_after_load: If True, delete the parquet file after loading to free disk space
+    """
     print(f"Processing {os.path.basename(chunk_path)}... | {get_resource_usage()}")

     # Load chunk - only columns we need
@@ -45,6 +50,14 @@ def process_single_chunk(chunk_path: str) -> pl.DataFrame:
     df = pl.read_parquet(chunk_path, columns=needed_columns)
     print(f" Loaded {len(df)} rows")

+    # Delete file immediately after loading to free disk space
+    if delete_after_load:
+        try:
+            os.remove(chunk_path)
+            print(f" Deleted {chunk_path} to free disk space")
+        except Exception as e:
+            print(f" Warning: Failed to delete {chunk_path}: {e}")
+
     # Compress to aircraft records (one per ICAO) using shared function
     compressed = compress_multi_icao_df(df, verbose=True)
     print(f" Compressed to {len(compressed)} aircraft records")
@@ -156,6 +169,7 @@ def main():
     parser.add_argument("--chunks-dir", type=str, default=DEFAULT_CHUNK_DIR, help="Directory containing chunk parquet files")
     parser.add_argument("--skip-base", action="store_true", help="Skip downloading and merging base release")
     parser.add_argument("--keep-chunks", action="store_true", help="Keep chunk files after merging")
+    parser.add_argument("--stream", action="store_true", help="Delete parquet files immediately after loading to save disk space")
     args = parser.parse_args()

     # Determine output ID and filename based on mode
@@ -190,9 +204,10 @@ def main():
     print(f"Found {len(chunk_files)} chunk files")

     # Process each chunk separately to save memory
+    # With --stream, delete parquet files immediately after loading to save disk space
     compressed_chunks = []
     for chunk_path in chunk_files:
-        compressed = process_single_chunk(chunk_path)
+        compressed = process_single_chunk(chunk_path, delete_after_load=args.stream)
         compressed_chunks.append(compressed)
         gc.collect()
+73 -13
@@ -21,12 +21,14 @@ import urllib.request
 import urllib.error
 from datetime import datetime, timezone

-from .schema import extract_json_from_issue_body, extract_contributor_name_from_issue_body, parse_and_validate
+from .schema import extract_json_from_issue_body, extract_contributor_name_from_issue_body, parse_and_validate, load_schema, SCHEMAS_DIR
 from .contributor import (
     generate_contributor_uuid,
     generate_submission_filename,
     compute_content_hash,
 )
+from .update_schema import generate_updated_schema, check_for_new_tags, get_existing_tag_definitions
+from .read_community_data import build_tag_type_registry


 def github_api_request(
@@ -54,7 +56,11 @@ def github_api_request(
     try:
         with urllib.request.urlopen(req) as response:
-            return json.loads(response.read())
+            response_body = response.read()
+            # DELETE requests return empty body (204 No Content)
+            if not response_body:
+                return {}
+            return json.loads(response_body)
     except urllib.error.HTTPError as e:
         error_body = e.read().decode() if e.fp else ""
         print(f"GitHub API error: {e.code} {e.reason}: {error_body}", file=sys.stderr)
@@ -94,14 +100,30 @@ def create_branch(branch_name: str, sha: str) -> None:
         raise


+def get_file_sha(path: str, branch: str) -> str | None:
+    """Get the SHA of an existing file, or None if it doesn't exist."""
+    try:
+        response = github_api_request("GET", f"/contents/{path}?ref={branch}")
+        return response.get("sha")
+    except Exception:
+        return None
+
+
 def create_or_update_file(path: str, content: str, message: str, branch: str) -> None:
     """Create or update a file in the repository."""
     content_b64 = base64.b64encode(content.encode()).decode()
-    github_api_request("PUT", f"/contents/{path}", {
+    payload = {
         "message": message,
         "content": content_b64,
         "branch": branch,
-    })
+    }
+
+    # If file exists, we need to include its SHA to update it
+    sha = get_file_sha(path, branch)
+    if sha:
+        payload["sha"] = sha
+
+    github_api_request("PUT", f"/contents/{path}", payload)


 def create_pull_request(title: str, head: str, base: str, body: str) -> dict:
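The change to `create_or_update_file` hinges on one rule of GitHub's contents endpoint: creating a file needs no `sha`, but updating an existing file requires the current blob SHA. The payload construction can be isolated into a pure function for illustration; `build_contents_payload` is a hypothetical name, and the real code above builds the payload inline and looks the SHA up through `get_file_sha`.

```python
import base64
from typing import Optional


def build_contents_payload(content: str, message: str, branch: str, sha: Optional[str] = None) -> dict:
    """Build the JSON body for PUT /contents/{path} (illustrative sketch, no network calls)."""
    payload = {
        "message": message,
        # The contents endpoint expects the file body base64-encoded
        "content": base64.b64encode(content.encode()).decode(),
        "branch": branch,
    }
    # Include the existing blob SHA only when updating a file that already exists
    if sha:
        payload["sha"] = sha
    return payload


create_payload = build_contents_payload("{}\n", "Add file", "community-submission-14")
update_payload = build_contents_payload("{}\n", "Update file", "community-submission-14", sha="abc123")
```

The branch name and SHA in the usage lines are made-up values. Separating the payload from the request in this way would also make the create-versus-update branch testable without touching the API.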
@@ -144,21 +166,19 @@ def process_submission(
         return False

     data, errors = parse_and_validate(json_str)
-    if errors:
-        error_list = "\n".join(f"- {e}" for e in errors)
+    if errors or data is None:
+        error_list = "\n".join(f"- {e}" for e in errors) if errors else "Unknown error"
         add_issue_comment(issue_number, f"❌ **Validation Failed**\n\n{error_list}")
         return False

     # Normalize to list
-    submissions = data if isinstance(data, list) else [data]
+    submissions: list[dict] = data if isinstance(data, list) else [data]

     # Generate contributor UUID from GitHub ID
     contributor_uuid = generate_contributor_uuid(author_id)

-    # Extract contributor name from issue form (or default to GitHub username)
+    # Extract contributor name from issue form (None means user opted out of attribution)
     contributor_name = extract_contributor_name_from_issue_body(issue_body)
-    if not contributor_name:
-        contributor_name = f"@{author_username}"

     # Add metadata to each submission
     now = datetime.now(timezone.utc)
@@ -167,14 +187,15 @@ def process_submission(
     for submission in submissions:
         submission["contributor_uuid"] = contributor_uuid
-        submission["contributor_name"] = contributor_name
+        if contributor_name:
+            submission["contributor_name"] = contributor_name
         submission["creation_timestamp"] = timestamp_str

     # Generate unique filename
     content_json = json.dumps(submissions, indent=2, sort_keys=True)
     content_hash = compute_content_hash(content_json)
     filename = generate_submission_filename(author_username, date_str, content_hash)
-    file_path = f"community/{filename}"
+    file_path = f"community/{date_str}/{filename}"

     # Create branch
     branch_name = f"community-submission-{issue_number}"
@@ -185,14 +206,53 @@ def process_submission(
     commit_message = f"Add community submission from @{author_username} (closes #{issue_number})"
     create_or_update_file(file_path, content_json, commit_message, branch_name)

+    # Update schema with any new tags (modifies v1 in place)
+    schema_updated = False
+    new_tags = []
+    try:
+        # Build tag registry from new submissions
+        tag_registry = build_tag_type_registry(submissions)
+
+        # Get current schema and merge existing tags
+        current_schema = load_schema()
+        existing_tags = get_existing_tag_definitions(current_schema)
+
+        # Merge existing tags into registry
+        for tag_name, tag_def in existing_tags.items():
+            if tag_name not in tag_registry:
+                tag_type = tag_def.get("type", "string")
+                tag_registry[tag_name] = tag_type
+
+        # Check for new tags
+        new_tags = check_for_new_tags(tag_registry, current_schema)
+
+        if new_tags:
+            # Generate updated schema
+            updated_schema = generate_updated_schema(current_schema, tag_registry)
+            schema_json = json.dumps(updated_schema, indent=2) + "\n"
+
+            create_or_update_file(
+                "schemas/community_submission.v1.schema.json",
+                schema_json,
+                f"Update schema with new tags: {', '.join(new_tags)}",
+                branch_name
+            )
+            schema_updated = True
+    except Exception as e:
+        print(f"Warning: Could not update schema: {e}", file=sys.stderr)
+
     # Create PR
+    schema_note = ""
+    if schema_updated:
+        schema_note = f"\n**Schema Updated:** Added new tags: `{', '.join(new_tags)}`\n"
+
     pr_body = f"""## Community Submission

 Adds {len(submissions)} submission(s) from @{author_username}.

 **File:** `{file_path}`
 **Contributor UUID:** `{contributor_uuid}`
+{schema_note}
 Closes #{issue_number}

 ---
+48 -1
@@ -30,7 +30,8 @@ def read_all_submissions(community_dir: Path | None = None) -> list[dict]:
     all_submissions = []

-    for json_file in sorted(community_dir.glob("*.json")):
+    # Search both root directory and date subdirectories (e.g., 2026-02-12/)
+    for json_file in sorted(community_dir.glob("**/*.json")):
         try:
             with open(json_file) as f:
                 data = json.load(f)
@@ -50,6 +51,52 @@ def read_all_submissions(community_dir: Path | None = None) -> list[dict]:
     return all_submissions


+def get_python_type_name(value) -> str:
+    """Get a normalized type name for a value."""
+    if value is None:
+        return "null"
+    if isinstance(value, bool):
+        return "boolean"
+    if isinstance(value, int):
+        return "integer"
+    if isinstance(value, float):
+        return "number"
+    if isinstance(value, str):
+        return "string"
+    if isinstance(value, list):
+        return "array"
+    if isinstance(value, dict):
+        return "object"
+    return type(value).__name__
+
+
+def build_tag_type_registry(submissions: list[dict]) -> dict[str, str]:
+    """
+    Build a registry of tag names to their expected types from existing submissions.
+
+    Args:
+        submissions: List of existing submission dictionaries
+
+    Returns:
+        Dict mapping tag name to expected type (e.g., {"internet": "string", "year_built": "integer"})
+    """
+    tag_types = {}
+
+    for submission in submissions:
+        tags = submission.get("tags", {})
+        if not isinstance(tags, dict):
+            continue
+
+        for key, value in tags.items():
+            inferred_type = get_python_type_name(value)
+            if key not in tag_types:
+                tag_types[key] = inferred_type
+            # If there's a conflict, keep the first type (it's already in use)
+
+    return tag_types
+
+
 def group_by_identifier(submissions: list[dict]) -> dict[str, list[dict]]:
     """
     Group submissions by their identifier (registration, transponder, or airframe ID).
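Two details of the type inference added in this file are easy to miss: the `bool` check has to come before the `int` check because `bool` is a subclass of `int` in Python, and the registry keeps the first type it sees for a tag. The sketch below restates the logic in condensed form so it runs on its own; `json_type_name` and `first_seen_types` are hypothetical stand-ins for `get_python_type_name` and `build_tag_type_registry`, not the repository's code.

```python
def json_type_name(value) -> str:
    """Map a decoded JSON value to a JSON Schema type name."""
    if value is None:
        return "null"
    # bool must be tested before int: isinstance(True, int) is True
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def first_seen_types(submissions: list[dict]) -> dict[str, str]:
    """Record the type of each tag the first time it appears; later conflicts are ignored."""
    registry: dict[str, str] = {}
    for submission in submissions:
        tags = submission.get("tags", {})
        if not isinstance(tags, dict):
            continue
        for key, value in tags.items():
            registry.setdefault(key, json_type_name(value))
    return registry


registry = first_seen_types([
    {"tags": {"airshows": True, "notes": "is a pet carrier"}},
    {"tags": {"airshows": "yes"}},  # conflicting type, ignored
])
```

With the checks in the opposite order, `airshows: true` would be registered as `"integer"` and the generated schema would reject the very submission that introduced the tag.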
+66
@@ -0,0 +1,66 @@
+#!/usr/bin/env python3
+"""
+Regenerate schema for a PR branch after main has been merged in.
+This script looks at the submission files in this branch and updates
+the schema if new tags were introduced.
+
+Usage: python -m src.contributions.regenerate_pr_schema
+"""
+
+import json
+import sys
+from pathlib import Path
+
+# Add parent to path for imports when running as script
+sys.path.insert(0, str(Path(__file__).parent.parent.parent))
+
+from src.contributions.read_community_data import read_all_submissions, build_tag_type_registry
+from src.contributions.update_schema import (
+    get_existing_tag_definitions,
+    check_for_new_tags,
+    generate_updated_schema,
+)
+from src.contributions.schema import load_schema, SCHEMAS_DIR
+
+
+def main():
+    """Main entry point."""
+    # Load current schema
+    current_schema = load_schema()
+
+    # Get existing tag definitions from schema
+    existing_tags = get_existing_tag_definitions(current_schema)
+
+    # Read all submissions (including ones from this PR branch)
+    submissions = read_all_submissions()
+
+    if not submissions:
+        print("No submissions found")
+        return
+
+    # Build tag registry from all submissions
+    tag_registry = build_tag_type_registry(submissions)
+
+    # Check for new tags not in the current schema
+    new_tags = check_for_new_tags(tag_registry, current_schema)
+
+    if new_tags:
+        print(f"Found new tags: {new_tags}")
+        print("Updating schema...")
+
+        # Generate updated schema
+        updated_schema = generate_updated_schema(current_schema, tag_registry)
+
+        # Write updated schema (in place)
+        schema_path = SCHEMAS_DIR / "community_submission.v1.schema.json"
+        with open(schema_path, 'w') as f:
+            json.dump(updated_schema, f, indent=2)
+            f.write("\n")
+
+        print(f"Updated {schema_path}")
+    else:
+        print("No new tags found, schema is up to date")
+
+
+if __name__ == "__main__":
+    main()
+116 -8
@@ -10,12 +10,59 @@ except ImportError:
Draft202012Validator = None Draft202012Validator = None
SCHEMA_PATH = Path(__file__).parent.parent.parent / "schemas" / "community_submission.v1.schema.json" SCHEMAS_DIR = Path(__file__).parent.parent.parent / "schemas"
# For backwards compatibility
SCHEMA_PATH = SCHEMAS_DIR / "community_submission.v1.schema.json"
def load_schema() -> dict: def get_latest_schema_version() -> int:
"""Load the community submission schema.""" """
with open(SCHEMA_PATH) as f: Find the latest schema version number.
Returns:
Latest version number (e.g., 1, 2, 3)
"""
import re
pattern = re.compile(r"community_submission\.v(\d+)\.schema\.json$")
max_version = 0
for path in SCHEMAS_DIR.glob("community_submission.v*.schema.json"):
match = pattern.search(path.name)
if match:
version = int(match.group(1))
max_version = max(max_version, version)
return max_version
def get_schema_path(version: int | None = None) -> Path:
"""
Get path to a specific schema version, or latest if version is None.
Args:
version: Schema version number, or None for latest
Returns:
Path to schema file
"""
if version is None:
version = get_latest_schema_version()
return SCHEMAS_DIR / f"community_submission.v{version}.schema.json"
def load_schema(version: int | None = None) -> dict:
"""
Load the community submission schema.
Args:
version: Schema version to load. If None, loads the latest version.
Returns:
Schema dict
"""
schema_path = get_schema_path(version)
with open(schema_path) as f:
return json.load(f)
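A minimal sketch of the version lookup above, run against a temporary directory instead of the repository's `schemas/` folder. The helper names `latest_version` and `demo`, and the file names created here, are illustrative assumptions; only the filename pattern is taken from the code above.

```python
import re
import tempfile
from pathlib import Path

# Same filename pattern as get_latest_schema_version() above.
PATTERN = re.compile(r"community_submission\.v(\d+)\.schema\.json$")


def latest_version(schemas_dir: Path) -> int:
    """Return the highest vN found in schemas_dir, or 0 if nothing matches (hypothetical helper)."""
    versions = [
        int(m.group(1))
        for p in schemas_dir.glob("community_submission.v*.schema.json")
        if (m := PATTERN.search(p.name))
    ]
    return max(versions, default=0)


def demo() -> tuple[int, int]:
    """Report the latest version for an empty directory and after adding v1, v2 and v10."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        empty = latest_version(root)
        for n in (1, 2, 10):
            (root / f"community_submission.v{n}.schema.json").write_text("{}\n")
        return empty, latest_version(root)


if __name__ == "__main__":
    print(demo())
```

Versions are compared as integers, so `v10` sorts above `v2`. An empty directory yields 0, in which case `get_schema_path()` would point at a `v0` file that does not exist.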
@@ -50,11 +97,36 @@ def validate_submission(data: dict | list, schema: dict | None = None) -> list[s
return errors
def download_github_attachment(url: str) -> str | None:
"""
Download content from a GitHub attachment URL.
Args:
url: GitHub attachment URL (e.g., https://github.com/user-attachments/files/...)
Returns:
File content as string, or None if download failed
"""
import urllib.request
import urllib.error
try:
req = urllib.request.Request(url, headers={"User-Agent": "PlaneQuery-Bot"})
with urllib.request.urlopen(req, timeout=30) as response:
return response.read().decode("utf-8")
except (OSError, UnicodeDecodeError) as e:  # OSError covers URLError, HTTPError and socket read timeouts
print(f"Failed to download attachment from {url}: {e}")
return None
def extract_json_from_issue_body(body: str) -> str | None:
"""
Extract JSON from GitHub issue body.
-Looks for JSON in the 'Submission JSON' section wrapped in code blocks.
Looks for JSON in the 'Submission JSON' section, either:
- A GitHub file attachment URL (drag-and-drop .json file)
- Wrapped in code blocks (```json ... ``` or ``` ... ```)
- Or raw JSON after the header
Args:
body: The issue body text
@@ -62,13 +134,49 @@ def extract_json_from_issue_body(body: str) -> str | None:
Returns:
Extracted JSON string or None if not found
"""
-# Match JSON in "### Submission JSON" section
-pattern = r"### Submission JSON\s*\n\s*```(?:json)?\s*\n([\s\S]*?)\n\s*```"
-match = re.search(pattern, body)
# Try: GitHub attachment URL in the Submission JSON section
# Format: [filename.json](https://github.com/user-attachments/files/...)
# Or just the raw URL
pattern_attachment = r"### Submission JSON\s*\n[\s\S]*?(https://github\.com/(?:user-attachments/files|.*?/files)/[^\s\)\]]+\.json)"
match = re.search(pattern_attachment, body)
if match:
url = match.group(1)
content = download_github_attachment(url)
if content:
return content.strip()
# Also check for GitHub user-attachments URL anywhere in submission section
pattern_attachment_alt = r"\[.*?\.json\]\((https://github\.com/[^\)]+)\)"
match = re.search(pattern_attachment_alt, body)
if match:
url = match.group(1)
if ".json" in url or "user-attachments" in url:
content = download_github_attachment(url)
if content:
return content.strip()
# Try: JSON in code blocks after "### Submission JSON"
pattern_codeblock = r"### Submission JSON\s*\n\s*```(?:json)?\s*\n([\s\S]*?)\n\s*```"
match = re.search(pattern_codeblock, body)
if match:
return match.group(1).strip()
# Try: Raw JSON after "### Submission JSON" until next section or end
pattern_raw = r"### Submission JSON\s*\n\s*([\[{][\s\S]*?[\]}])(?=\n###|\n\n###|$)"
match = re.search(pattern_raw, body)
if match:
return match.group(1).strip()
# Try: Any JSON object/array in the body (fallback)
# A non-greedy regex stops at the first closing bracket and truncates nested
# JSON, so scan for the first '{' or '[' that starts a complete JSON value.
decoder = json.JSONDecoder()
for match in re.finditer(r"[\[{]", body):
    try:
        _, end = decoder.raw_decode(body, match.start())
    except json.JSONDecodeError:
        continue  # Not valid JSON from this position; try the next bracket
    return body[match.start():end].strip()
return None
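A small sketch of the code-block extraction step, applied to a made-up issue body. The regex mirrors `pattern_codeblock` above; the sample body, the tag value, and the helper name `extract_codeblock_json` are assumptions for illustration. The fence marker is built from single backticks only so the example can sit inside a fenced block itself.

```python
import json
import re

FENCE = "`" * 3  # three backticks, i.e. a Markdown code fence

# Same shape as pattern_codeblock above: header, optional "json" info string, body, closing fence.
CODEBLOCK = re.compile(
    r"### Submission JSON\s*\n\s*" + FENCE + r"(?:json)?\s*\n([\s\S]*?)\n\s*" + FENCE
)


def extract_codeblock_json(body: str):
    """Return the parsed JSON from the 'Submission JSON' code block, or None (hypothetical helper)."""
    match = CODEBLOCK.search(body)
    if not match:
        return None
    return json.loads(match.group(1).strip())


# Hypothetical issue body in the layout an issue form would produce.
sample_body = (
    "### Submission JSON\n\n"
    + FENCE + "json\n"
    + '{"registration_number": "N12345", "tags": {"airshows": true}}\n'
    + FENCE + "\n\n### Notes\nnone\n"
)

extracted = extract_codeblock_json(sample_body)

if __name__ == "__main__":
    print(extracted)
```

Because the capture group is non-greedy and anchored on the closing fence rather than on a bracket, nested objects inside the block are captured whole.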
+154
@@ -0,0 +1,154 @@
#!/usr/bin/env python3
"""
Update the schema with tag type definitions from existing submissions.
This script reads all community submissions and rewrites the latest schema file
in place so that it includes explicit type definitions for all known tags.
When new tags are introduced, the existing schema file is updated; no new version file is created.
Usage:
python -m src.contributions.update_schema
python -m src.contributions.update_schema --check # Check if update needed
"""
import argparse
import json
import sys
from pathlib import Path
from .read_community_data import read_all_submissions, build_tag_type_registry
from .schema import SCHEMAS_DIR, get_latest_schema_version, get_schema_path, load_schema
def get_existing_tag_definitions(schema: dict) -> dict[str, dict]:
"""Extract existing tag property definitions from schema."""
tags_props = schema.get("properties", {}).get("tags", {}).get("properties", {})
return tags_props
def type_name_to_json_schema(type_name: str) -> dict:
"""Convert a type name to a JSON Schema type definition."""
type_map = {
"string": {"type": "string"},
"integer": {"type": "integer"},
"number": {"type": "number"},
"boolean": {"type": "boolean"},
"null": {"type": "null"},
"array": {"type": "array", "items": {"$ref": "#/$defs/tagScalar"}},
"object": {"type": "object", "additionalProperties": {"$ref": "#/$defs/tagScalar"}},
}
return type_map.get(type_name, {"$ref": "#/$defs/tagValue"})
def generate_updated_schema(base_schema: dict, tag_registry: dict[str, str]) -> dict:
"""
Generate an updated schema with explicit tag definitions.
Args:
base_schema: The current schema to update
tag_registry: Dict mapping tag name to type name
Returns:
Updated schema dict
"""
schema = json.loads(json.dumps(base_schema)) # Deep copy
# Build tag properties with explicit types
tag_properties = {}
for tag_name, type_name in sorted(tag_registry.items()):
tag_properties[tag_name] = type_name_to_json_schema(type_name)
# Only add/update the properties key within tags, preserve everything else
if "properties" in schema and "tags" in schema["properties"]:
schema["properties"]["tags"]["properties"] = tag_properties
return schema
def check_for_new_tags(tag_registry: dict[str, str], current_schema: dict) -> list[str]:
"""
Check which tags in the registry are not yet defined in the schema.
Returns:
List of new tag names
"""
existing_tags = get_existing_tag_definitions(current_schema)
return [tag for tag in tag_registry if tag not in existing_tags]
def update_schema_file(
tag_registry: dict[str, str],
check_only: bool = False
) -> tuple[bool, list[str]]:
"""
Update the latest schema file in place with new tag definitions.
Args:
tag_registry: Dict mapping tag name to type name
check_only: If True, only check if update is needed without writing
Returns:
Tuple of (was_updated, list_of_new_tags)
"""
current_schema = load_schema()
# Find new tags
new_tags = check_for_new_tags(tag_registry, current_schema)
if not new_tags:
return False, []
if check_only:
return True, new_tags
# Generate and write updated schema (in place)
updated_schema = generate_updated_schema(current_schema, tag_registry)
schema_path = get_schema_path()
with open(schema_path, "w") as f:
json.dump(updated_schema, f, indent=2)
f.write("\n")
return True, new_tags
def update_schema_from_submissions(check_only: bool = False) -> tuple[bool, list[str]]:
"""
Read all submissions and update the schema if needed.
Args:
check_only: If True, only check if update is needed without writing
Returns:
Tuple of (was_updated, list_of_new_tags)
"""
submissions = read_all_submissions()
tag_registry = build_tag_type_registry(submissions)
return update_schema_file(tag_registry, check_only)
def main():
parser = argparse.ArgumentParser(description="Update schema with tag definitions")
parser.add_argument("--check", action="store_true", help="Check if update needed without writing")
args = parser.parse_args()
was_updated, new_tags = update_schema_from_submissions(check_only=args.check)
if args.check:
if was_updated:
print(f"Schema update needed. New tags: {', '.join(new_tags)}")
sys.exit(1)
else:
print("Schema is up to date")
sys.exit(0)
else:
if was_updated:
print(f"Updated {get_schema_path()}")
print(f"Added tags: {', '.join(new_tags)}")
else:
print("No update needed")
if __name__ == "__main__":
main()
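To make the effect of `generate_updated_schema` concrete, here is a self-contained sketch that applies the same type mapping to a toy schema. The base schema, the tag types in `registry`, and the helper name `with_tag_properties` are assumptions for illustration; the real schema file and registry contents are not shown in this diff.

```python
import copy
import json


def type_name_to_json_schema(type_name: str) -> dict:
    """Map a registry type name to a JSON Schema fragment (same table as above)."""
    type_map = {
        "string": {"type": "string"},
        "integer": {"type": "integer"},
        "number": {"type": "number"},
        "boolean": {"type": "boolean"},
        "null": {"type": "null"},
        "array": {"type": "array", "items": {"$ref": "#/$defs/tagScalar"}},
        "object": {"type": "object", "additionalProperties": {"$ref": "#/$defs/tagScalar"}},
    }
    return type_map.get(type_name, {"$ref": "#/$defs/tagValue"})


def with_tag_properties(base_schema: dict, tag_registry: dict[str, str]) -> dict:
    """Return a copy of base_schema whose tags.properties is rebuilt from the registry."""
    schema = copy.deepcopy(base_schema)
    props = {name: type_name_to_json_schema(t) for name, t in sorted(tag_registry.items())}
    if "tags" in schema.get("properties", {}):
        schema["properties"]["tags"]["properties"] = props
    return schema


# Toy inputs (assumed): a minimal schema and a registry with guessed tag types.
base = {"properties": {"registration_number": {"type": "string"}, "tags": {"type": "object"}}}
registry = {"notes": "string", "dog_friendly": "boolean", "airshows": "array"}

updated = with_tag_properties(base, registry)

if __name__ == "__main__":
    print(json.dumps(updated["properties"]["tags"], indent=2))
```

As in the function above, the `properties` mapping under `tags` is replaced wholesale, so a tag that is defined in the schema but absent from the registry would be dropped on the next update.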
+78
@@ -7,6 +7,7 @@ submissions when issues are opened or edited.
Usage:
python -m src.contributions.validate_submission --issue-body "..."
python -m src.contributions.validate_submission --issue-body-file /path/to/body.txt
python -m src.contributions.validate_submission --file submission.json
echo '{"registration_number": "N12345"}' | python -m src.contributions.validate_submission --stdin
@@ -23,6 +24,7 @@ import urllib.request
import urllib.error
from .schema import extract_json_from_issue_body, parse_and_validate, load_schema
from .read_community_data import read_all_submissions, build_tag_type_registry, get_python_type_name
def github_api_request(method: str, endpoint: str, data: dict | None = None) -> dict:
@@ -65,6 +67,40 @@ def remove_issue_label(issue_number: int, label: str) -> None:
pass # Label might not exist
def validate_tag_consistency(data: dict | list, tag_registry: dict[str, str]) -> list[str]:
"""
Check that tag types in new submissions match existing tag types.
Args:
data: Single submission dict or list of submissions
tag_registry: Dict mapping tag name to expected type
Returns:
List of error messages. Empty list means validation passed.
"""
errors = []
submissions = data if isinstance(data, list) else [data]
for i, submission in enumerate(submissions):
prefix = f"[{i}] " if len(submissions) > 1 else ""
tags = submission.get("tags", {})
if not isinstance(tags, dict):
continue
for key, value in tags.items():
actual_type = get_python_type_name(value)
if key in tag_registry:
expected_type = tag_registry[key]
if actual_type != expected_type:
errors.append(
f"{prefix}tags.{key}: expected type '{expected_type}', got '{actual_type}'"
)
return errors
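The check above can be exercised in isolation with a stand-in for `get_python_type_name`, which is imported from `read_community_data` and not shown in this diff. The stand-in `json_type_name`, the registry contents, and the sample submission are all assumptions; the type names follow the keys used by `type_name_to_json_schema`.

```python
def json_type_name(value) -> str:
    """Hypothetical stand-in for get_python_type_name(); the real helper may differ."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "unknown"


def tag_type_errors(submission: dict, registry: dict[str, str]) -> list[str]:
    """Report tags whose value type differs from the registry; unknown tags pass."""
    errors: list[str] = []
    tags = submission.get("tags", {})
    if not isinstance(tags, dict):
        return errors
    for key, value in tags.items():
        expected = registry.get(key)
        actual = json_type_name(value)
        if expected is not None and actual != expected:
            errors.append(f"tags.{key}: expected type '{expected}', got '{actual}'")
    return errors


# Assumed registry and submission, for illustration only.
registry = {"dog_friendly": "boolean", "notes": "string"}
submission = {
    "registration_number": "N12345",
    "tags": {"dog_friendly": "yes", "notes": "hangar 3", "new_tag": 1},
}

errors = tag_type_errors(submission, registry)

if __name__ == "__main__":
    print(errors)
```

Tags that are not yet in the registry, such as `new_tag` here, produce no error; only a type change on an existing tag is rejected.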
def validate_and_report(json_str: str, issue_number: int | None = None) -> bool:
"""
Validate JSON and optionally report to GitHub issue.
@@ -90,6 +126,33 @@ def validate_and_report(json_str: str, issue_number: int | None = None) -> bool:
return False
# Check tag type consistency against existing submissions
if data is not None:
try:
existing_submissions = read_all_submissions()
tag_registry = build_tag_type_registry(existing_submissions)
tag_errors = validate_tag_consistency(data, tag_registry)
if tag_errors:
error_list = "\n".join(f"- {e}" for e in tag_errors)
message = (
f"❌ **Tag Type Mismatch**\n\n"
f"Your submission uses tags with types that don't match existing submissions:\n\n"
f"{error_list}\n\n"
f"Please use the same type as existing tags, or use a different tag name."
)
print(message, file=sys.stderr)
if issue_number:
add_issue_comment(issue_number, message)
remove_issue_label(issue_number, "validated")
return False
except Exception as e:
# Don't fail validation if we can't read existing submissions
print(f"Warning: Could not check tag consistency: {e}", file=sys.stderr)
count = len(data) if isinstance(data, list) else 1
message = f"✅ **Validation Passed**\n\n{count} submission(s) validated successfully against the schema.\n\nA maintainer can approve this submission by adding the `approved` label."
@@ -106,6 +169,7 @@ def main():
parser = argparse.ArgumentParser(description="Validate community submission JSON")
source_group = parser.add_mutually_exclusive_group(required=True)
source_group.add_argument("--issue-body", help="Issue body text containing JSON")
source_group.add_argument("--issue-body-file", help="File containing issue body text")
source_group.add_argument("--file", help="JSON file to validate")
source_group.add_argument("--stdin", action="store_true", help="Read JSON from stdin")
@@ -125,6 +189,20 @@ def main():
"Please ensure your JSON is in the 'Submission JSON' field wrapped in code blocks." "Please ensure your JSON is in the 'Submission JSON' field wrapped in code blocks."
)
sys.exit(1)
elif args.issue_body_file:
with open(args.issue_body_file) as f:
issue_body = f.read()
json_str = extract_json_from_issue_body(issue_body)
if not json_str:
print("❌ Could not extract JSON from issue body", file=sys.stderr)
print(f"Issue body:\n{issue_body}", file=sys.stderr)
if args.issue_number:
add_issue_comment(
args.issue_number,
"❌ **Validation Failed**\n\nCould not extract JSON from submission. "
"Please ensure your JSON is in the 'Submission JSON' field."
)
sys.exit(1)
elif args.file:
with open(args.file) as f:
json_str = f.read()