Compare commits

...

26 Commits

Author SHA1 Message Date
github-actions[bot] 70aa5869c0 Update schema with new tags: airshows, dog_friendly, cat_friendly, notes 2026-02-12 21:02:36 +00:00
github-actions[bot] 83c3406699 Add community submission from @ggman12 (closes #14) 2026-02-12 21:02:35 +00:00
ggman12 fecf9ff0ea format properly 2026-02-12 16:01:14 -05:00
ggman12 7e0a396fc7 only modify key parts of schemas/community_submission.v1.schema.json schema. Lowest diffs 2026-02-12 15:55:44 -05:00
ggman12 b0503bb3b2 fix: should update schema now 2026-02-12 15:46:11 -05:00
ggman12 0b89138daf modify existing json schema instead of creating a new file every time 2026-02-12 15:40:01 -05:00
ggman12 4b756cdaef fix syntax error 2026-02-12 15:32:37 -05:00
ggman12 9acffe1e56 handle multiple PRs with schema changes 2026-02-12 15:31:53 -05:00
ggman12 1694fe0b46 allow fileupload in submission 2026-02-12 15:26:45 -05:00
ggman12 c6d9e59d01 update template 2026-02-12 13:29:45 -05:00
ggman12 dd6cd7b6fd update schema with optional start_date and end_date scope 2026-02-12 13:28:43 -05:00
ggman12 f543b671f8 updating schema 2026-02-12 13:22:56 -05:00
ggman12 efb4cbb953 update example 2026-02-12 13:22:43 -05:00
ggman12 5578133a99 update schema to be uppercase only 2026-02-12 12:36:50 -05:00
ggman12 eace7d5a63 update folder 2026-02-12 12:34:27 -05:00
ggman12 82f47b662c make blank username work 2026-02-12 12:32:41 -05:00
ggman12 787796c3ab update approve_submission 2026-02-12 12:26:54 -05:00
ggman12 61aae586ee fix approve 2026-02-12 12:18:28 -05:00
ggman12 5abfa6b226 update submission validation 2026-02-12 12:15:04 -05:00
ggman12 a743b74ae5 Merge branch 'develop' 2026-02-12 12:10:24 -05:00
ggman12 53a020ab73 add jsonschema to requirements.txt 2026-02-12 12:09:03 -05:00
ggman12 2de41c9883 update historical. To check tar and fail fast if any maps fail 2026-02-12 12:01:13 -05:00
ggman12 bccc634158 remove existing release 2026-02-12 11:50:45 -05:00
ggman12 43b07942b0 add needed permissions 2026-02-12 11:42:49 -05:00
ggman12 2c9e994a12 add debug for FAA 2026-02-12 11:06:38 -05:00
ggman12 99b680476a delete parquet chunck after load to not use so much space for big historical run 2026-02-12 10:52:42 -05:00
16 changed files with 798 additions and 63 deletions
@@ -13,29 +13,42 @@ body:
**Rules (enforced on review/automation):**
- Each object must include **at least one** of:
- `registration_number`
- `transponder_code_hex` (6 hex chars)
- `transponder_code_hex` (6 uppercase hex chars, e.g., `ABC123`)
- `planequery_airframe_id`
- Your contributor name (entered below) will be applied to all objects.
- `contributor_uuid` is derived from your GitHub account automatically.
- `creation_timestamp` is created by the system (you may omit it).
**Optional date scoping:**
- `start_date` - When the tags become valid (ISO 8601: `YYYY-MM-DD`)
- `end_date` - When the tags stop being valid (ISO 8601: `YYYY-MM-DD`)
**Example: single object**
```json
{
"transponder_code_hex": "a1b2c3"
"registration_number": "N12345",
"tags": {"owner": "John Doe"},
"start_date": "2025-01-01"
}
```
**Example: multiple objects (array)**
```json
[
{
"registration_number": "N123AB"
},
{
"planequery_airframe_id": "cessna|172s|12345",
"transponder_code_hex": "0f1234"
}
{
"registration_number": "N12345",
"tags": {"internet": "starlink"},
"start_date": "2025-05-01"
},
{
"registration_number": "N12345",
"tags": {"owner": "John Doe"},
"start_date": "2025-01-01",
"end_date": "2025-07-20"
},
{
"transponder_code_hex": "ABC123",
"tags": {"internet": "viasat", "owner": "John Doe"}
}
]
```
@@ -52,9 +65,11 @@ body:
id: submission_json
attributes:
label: Submission JSON
description: Paste either one JSON object or an array of JSON objects. Must be valid JSON. Do not include contributor_name or contributor_uuid in your JSON.
description: |
Paste JSON directly, OR drag-and-drop a .json file here.
Must be valid JSON. Do not include contributor_name or contributor_uuid.
placeholder: |
Paste JSON here...
Paste JSON here, or drag-and-drop a .json file...
validations:
required: true
@@ -38,9 +38,10 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_REPOSITORY: ${{ github.repository }}
ISSUE_BODY: ${{ github.event.issue.body }}
run: |
python -m src.contributions.approve_submission \
--issue-number ${{ github.event.issue.number }} \
--issue-body "${{ github.event.issue.body }}" \
--issue-body "$ISSUE_BODY" \
--author "${{ steps.author.outputs.username }}" \
--author-id ${{ steps.author.outputs.user_id }}
+29 -8
View File
@@ -81,8 +81,22 @@ jobs:
- name: Create tar of extracted data
run: |
cd data/output
tar -cf extracted_data.tar *-planes-readsb-prod-0.tar_0 icao_manifest_*.txt 2>/dev/null || echo "Some files may not exist"
ls -lah extracted_data.tar || echo "No tar created"
echo "=== Disk space before tar ==="
df -h .
echo "=== Files to tar ==="
ls -lah *-planes-readsb-prod-0.tar_0 icao_manifest_*.txt 2>/dev/null || echo "No files found"
# Create tar with explicit error checking
if ls *-planes-readsb-prod-0.tar_0 1>/dev/null 2>&1; then
tar -cvf extracted_data.tar *-planes-readsb-prod-0.tar_0 icao_manifest_*.txt
echo "=== Tar file created ==="
ls -lah extracted_data.tar
# Verify tar integrity
tar -tf extracted_data.tar > /dev/null && echo "Tar integrity check passed" || { echo "Tar integrity check FAILED"; exit 1; }
else
echo "ERROR: No extracted directories found, cannot create tar"
exit 1
fi
- name: Upload extracted data
uses: actions/upload-artifact@v4
@@ -97,7 +111,7 @@ jobs:
needs: [generate-matrix, adsb-extract]
runs-on: ubuntu-24.04-arm
strategy:
fail-fast: false
fail-fast: true
matrix:
chunk: ${{ fromJson(needs.generate-matrix.outputs.chunks) }}
icao_chunk: [0, 1, 2, 3]
@@ -134,7 +148,12 @@ jobs:
run: |
cd data/output
if [ -f extracted_data.tar ]; then
tar -xf extracted_data.tar
echo "=== Tar file info ==="
ls -lah extracted_data.tar
echo "=== Verifying tar integrity ==="
tar -tf extracted_data.tar > /dev/null || { echo "ERROR: Tar file is corrupted"; exit 1; }
echo "=== Extracting ==="
tar -xvf extracted_data.tar
rm extracted_data.tar
echo "has_data=true" >> "$GITHUB_OUTPUT"
echo "=== Contents of data/output ==="
@@ -188,17 +207,19 @@ jobs:
- name: Debug downloaded files
run: |
echo "=== Disk space before processing ==="
df -h
echo "=== Listing data/output/adsb_chunks/ ==="
find data/output/adsb_chunks/ -type f 2>/dev/null | head -50 || echo "No files found"
echo "=== Looking for parquet files ==="
find . -name "*.parquet" 2>/dev/null | head -20 || echo "No parquet files found"
find data/output/adsb_chunks/ -type f 2>/dev/null | wc -l
echo "=== Total parquet size ==="
du -sh data/output/adsb_chunks/ || echo "No chunks dir"
- name: Combine chunks to CSV
env:
START_DATE: ${{ needs.generate-matrix.outputs.global_start }}
END_DATE: ${{ needs.generate-matrix.outputs.global_end }}
run: |
python -m src.adsb.combine_chunks_to_csv --chunks-dir data/output/adsb_chunks --start-date "$START_DATE" --end-date "$END_DATE" --skip-base
python -m src.adsb.combine_chunks_to_csv --chunks-dir data/output/adsb_chunks --start-date "$START_DATE" --end-date "$END_DATE" --skip-base --stream
ls -lah data/planequery_aircraft/
- name: Upload final artifact
@@ -277,6 +277,15 @@ jobs:
name: community-release
path: artifacts/community
- name: Debug artifact structure
run: |
echo "=== FAA artifacts ==="
find artifacts/faa -type f 2>/dev/null || echo "No files found in artifacts/faa"
echo "=== ADS-B artifacts ==="
find artifacts/adsb -type f 2>/dev/null || echo "No files found in artifacts/adsb"
echo "=== Community artifacts ==="
find artifacts/community -type f 2>/dev/null || echo "No files found in artifacts/community"
- name: Prepare release metadata
id: meta
run: |
@@ -312,6 +321,13 @@ jobs:
echo "zip_basename=$ZIP_BASENAME" >> "$GITHUB_OUTPUT"
echo "name=planequery-aircraft snapshot ($DATE)${BRANCH_SUFFIX}" >> "$GITHUB_OUTPUT"
- name: Delete existing release if exists
run: |
gh release delete "${{ steps.meta.outputs.tag }}" --yes 2>/dev/null || true
git push --delete origin "refs/tags/${{ steps.meta.outputs.tag }}" 2>/dev/null || true
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Create GitHub Release and upload assets
uses: softprops/action-gh-release@v2
with:
@@ -0,0 +1,77 @@
name: Update Community PRs After Merge
on:
push:
branches: [main]
paths:
- 'community/**'
- 'schemas/community_submission.v1.schema.json'
permissions:
contents: write
pull-requests: write
jobs:
update-open-prs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
token: ${{ secrets.GITHUB_TOKEN }}
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Install dependencies
run: pip install jsonschema
- name: Find and update open community PRs
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
# Get list of open community PRs
prs=$(gh pr list --label community --state open --json number,headRefName --jq '.[] | "\(.number) \(.headRefName)"')
if [ -z "$prs" ]; then
echo "No open community PRs found"
exit 0
fi
echo "$prs" | while read pr_number branch_name; do
echo "Processing PR #$pr_number (branch: $branch_name)"
# Checkout PR branch
git fetch origin "$branch_name"
git checkout "$branch_name"
# Merge main into PR branch
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
if git merge origin/main -m "Merge main to update schema"; then
# Regenerate schema for this PR's submission (adds any new tags)
python -m src.contributions.regenerate_pr_schema || true
# If there are changes, commit and push
if [ -n "$(git status --porcelain schemas/)" ]; then
git add schemas/
git commit -m "Update schema with new tags"
git push origin "$branch_name"
echo " Updated PR #$pr_number with schema changes"
else
git push origin "$branch_name"
echo " Merged main into PR #$pr_number"
fi
else
echo " Merge conflict in PR #$pr_number, adding comment"
gh pr comment "$pr_number" --body $'⚠️ **Merge Conflict**\n\nAnother community submission was merged and this PR has conflicts.\n\nA maintainer may need to:\n1. Close this PR\n2. Remove the `approved` label from the original issue\n3. Re-add the `approved` label to regenerate the PR'
git merge --abort
fi
fi
git checkout main
done
@@ -4,6 +4,9 @@ on:
issues:
types: [opened, edited]
permissions:
issues: write
jobs:
validate:
if: contains(github.event.issue.labels.*.name, 'submission')
@@ -20,11 +23,24 @@ jobs:
- name: Install dependencies
run: pip install jsonschema
- name: Debug issue body
run: |
echo "=== Issue Body ==="
cat << 'ISSUE_BODY_EOF'
${{ github.event.issue.body }}
ISSUE_BODY_EOF
- name: Save issue body to file
run: |
cat << 'ISSUE_BODY_EOF' > /tmp/issue_body.txt
${{ github.event.issue.body }}
ISSUE_BODY_EOF
- name: Validate submission
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_REPOSITORY: ${{ github.repository }}
run: |
python -m src.contributions.validate_submission \
--issue-body "${{ github.event.issue.body }}" \
--issue-body-file /tmp/issue_body.txt \
--issue-number ${{ github.event.issue.number }}
@@ -0,0 +1,14 @@
[
{
"contributor_name": "hellohello",
"contributor_uuid": "2981c3ee-8712-5f96-84bf-732eda515a3f",
"creation_timestamp": "2026-02-12T21:02:32.325360+00:00",
"registration_number": "N12345",
"tags": {
"airshows": true,
"cat_friendly": false,
"dog_friendly": true,
"notes": "is a pet carrier"
}
}
]
+1
View File
@@ -3,3 +3,4 @@ pandas==3.0.0
pyarrow==23.0.0
orjson==3.11.7
polars==1.38.1
jsonschema==4.26.0
+62 -16
View File
@@ -3,7 +3,6 @@
"title": "PlaneQuery Aircraft Community Submission (v1)",
"type": "object",
"additionalProperties": false,
"properties": {
"registration_number": {
"type": "string",
@@ -11,13 +10,12 @@
},
"transponder_code_hex": {
"type": "string",
"pattern": "^[0-9A-Fa-f]{6}$"
"pattern": "^[0-9A-F]{6}$"
},
"planequery_airframe_id": {
"type": "string",
"minLength": 1
},
"contributor_uuid": {
"type": "string",
"format": "uuid"
@@ -28,14 +26,24 @@
"maxLength": 150,
"description": "Display name (may be blank)"
},
"creation_timestamp": {
"type": "string",
"format": "date-time",
"description": "Set by the system when the submission is persisted/approved.",
"readOnly": true
},
"start_date": {
"type": "string",
"format": "date",
"pattern": "^\\d{4}-\\d{2}-\\d{2}$",
"description": "Optional start date for when this submission's tags are valid (ISO 8601, e.g., 2025-05-01)."
},
"end_date": {
"type": "string",
"format": "date",
"pattern": "^\\d{4}-\\d{2}-\\d{2}$",
"description": "Optional end date for when this submission's tags are valid (ISO 8601, e.g., 2025-07-03)."
},
"tags": {
"type": "object",
"description": "Additional community-defined tags as key/value pairs (values may be scalar, array, or object).",
@@ -43,38 +51,76 @@
"type": "string",
"pattern": "^[a-z][a-z0-9_]{0,63}$"
},
"additionalProperties": { "$ref": "#/$defs/tagValue" }
"additionalProperties": {
"$ref": "#/$defs/tagValue"
},
"properties": {
"airshows": {
"type": "boolean"
},
"cat_friendly": {
"type": "boolean"
},
"dog_friendly": {
"type": "boolean"
},
"notes": {
"type": "string"
}
}
}
},
"allOf": [
{
"anyOf": [
{ "required": ["registration_number"] },
{ "required": ["transponder_code_hex"] },
{ "required": ["planequery_airframe_id"] }
{
"required": [
"registration_number"
]
},
{
"required": [
"transponder_code_hex"
]
},
{
"required": [
"planequery_airframe_id"
]
}
]
}
],
"$defs": {
"tagScalar": {
"type": ["string", "number", "integer", "boolean", "null"]
"type": [
"string",
"number",
"integer",
"boolean",
"null"
]
},
"tagValue": {
"anyOf": [
{ "$ref": "#/$defs/tagScalar" },
{
"$ref": "#/$defs/tagScalar"
},
{
"type": "array",
"maxItems": 50,
"items": { "$ref": "#/$defs/tagScalar" }
"items": {
"$ref": "#/$defs/tagScalar"
}
},
{
"type": "object",
"maxProperties": 50,
"additionalProperties": { "$ref": "#/$defs/tagScalar" }
"additionalProperties": {
"$ref": "#/$defs/tagScalar"
}
}
]
}
}
}
}
+18 -3
View File
@@ -36,8 +36,13 @@ def get_target_day() -> datetime:
return datetime.utcnow() - timedelta(days=1)
def process_single_chunk(chunk_path: str) -> pl.DataFrame:
"""Load and compress a single chunk parquet file."""
def process_single_chunk(chunk_path: str, delete_after_load: bool = False) -> pl.DataFrame:
"""Load and compress a single chunk parquet file.
Args:
chunk_path: Path to parquet file
delete_after_load: If True, delete the parquet file after loading to free disk space
"""
print(f"Processing {os.path.basename(chunk_path)}... | {get_resource_usage()}")
# Load chunk - only columns we need
@@ -45,6 +50,14 @@ def process_single_chunk(chunk_path: str) -> pl.DataFrame:
df = pl.read_parquet(chunk_path, columns=needed_columns)
print(f" Loaded {len(df)} rows")
# Delete file immediately after loading to free disk space
if delete_after_load:
try:
os.remove(chunk_path)
print(f" Deleted {chunk_path} to free disk space")
except Exception as e:
print(f" Warning: Failed to delete {chunk_path}: {e}")
# Compress to aircraft records (one per ICAO) using shared function
compressed = compress_multi_icao_df(df, verbose=True)
print(f" Compressed to {len(compressed)} aircraft records")
@@ -156,6 +169,7 @@ def main():
parser.add_argument("--chunks-dir", type=str, default=DEFAULT_CHUNK_DIR, help="Directory containing chunk parquet files")
parser.add_argument("--skip-base", action="store_true", help="Skip downloading and merging base release")
parser.add_argument("--keep-chunks", action="store_true", help="Keep chunk files after merging")
parser.add_argument("--stream", action="store_true", help="Delete parquet files immediately after loading to save disk space")
args = parser.parse_args()
# Determine output ID and filename based on mode
@@ -190,9 +204,10 @@ def main():
print(f"Found {len(chunk_files)} chunk files")
# Process each chunk separately to save memory
# With --stream, delete parquet files immediately after loading to save disk space
compressed_chunks = []
for chunk_path in chunk_files:
compressed = process_single_chunk(chunk_path)
compressed = process_single_chunk(chunk_path, delete_after_load=args.stream)
compressed_chunks.append(compressed)
gc.collect()
+73 -13
View File
@@ -21,12 +21,14 @@ import urllib.request
import urllib.error
from datetime import datetime, timezone
from .schema import extract_json_from_issue_body, extract_contributor_name_from_issue_body, parse_and_validate
from .schema import extract_json_from_issue_body, extract_contributor_name_from_issue_body, parse_and_validate, load_schema, SCHEMAS_DIR
from .contributor import (
generate_contributor_uuid,
generate_submission_filename,
compute_content_hash,
)
from .update_schema import generate_updated_schema, check_for_new_tags, get_existing_tag_definitions
from .read_community_data import build_tag_type_registry
def github_api_request(
@@ -54,7 +56,11 @@ def github_api_request(
try:
with urllib.request.urlopen(req) as response:
return json.loads(response.read())
response_body = response.read()
# DELETE requests return empty body (204 No Content)
if not response_body:
return {}
return json.loads(response_body)
except urllib.error.HTTPError as e:
error_body = e.read().decode() if e.fp else ""
print(f"GitHub API error: {e.code} {e.reason}: {error_body}", file=sys.stderr)
@@ -94,14 +100,30 @@ def create_branch(branch_name: str, sha: str) -> None:
raise
def get_file_sha(path: str, branch: str) -> str | None:
"""Get the SHA of an existing file, or None if it doesn't exist."""
try:
response = github_api_request("GET", f"/contents/{path}?ref={branch}")
return response.get("sha")
except Exception:
return None
def create_or_update_file(path: str, content: str, message: str, branch: str) -> None:
"""Create or update a file in the repository."""
content_b64 = base64.b64encode(content.encode()).decode()
github_api_request("PUT", f"/contents/{path}", {
payload = {
"message": message,
"content": content_b64,
"branch": branch,
})
}
# If file exists, we need to include its SHA to update it
sha = get_file_sha(path, branch)
if sha:
payload["sha"] = sha
github_api_request("PUT", f"/contents/{path}", payload)
def create_pull_request(title: str, head: str, base: str, body: str) -> dict:
@@ -144,21 +166,19 @@ def process_submission(
return False
data, errors = parse_and_validate(json_str)
if errors:
error_list = "\n".join(f"- {e}" for e in errors)
if errors or data is None:
error_list = "\n".join(f"- {e}" for e in errors) if errors else "Unknown error"
add_issue_comment(issue_number, f"❌ **Validation Failed**\n\n{error_list}")
return False
# Normalize to list
submissions = data if isinstance(data, list) else [data]
submissions: list[dict] = data if isinstance(data, list) else [data]
# Generate contributor UUID from GitHub ID
contributor_uuid = generate_contributor_uuid(author_id)
# Extract contributor name from issue form (or default to GitHub username)
# Extract contributor name from issue form (None means user opted out of attribution)
contributor_name = extract_contributor_name_from_issue_body(issue_body)
if not contributor_name:
contributor_name = f"@{author_username}"
# Add metadata to each submission
now = datetime.now(timezone.utc)
@@ -167,14 +187,15 @@ def process_submission(
for submission in submissions:
submission["contributor_uuid"] = contributor_uuid
submission["contributor_name"] = contributor_name
if contributor_name:
submission["contributor_name"] = contributor_name
submission["creation_timestamp"] = timestamp_str
# Generate unique filename
content_json = json.dumps(submissions, indent=2, sort_keys=True)
content_hash = compute_content_hash(content_json)
filename = generate_submission_filename(author_username, date_str, content_hash)
file_path = f"community/{filename}"
file_path = f"community/{date_str}/{filename}"
# Create branch
branch_name = f"community-submission-{issue_number}"
@@ -185,14 +206,53 @@ def process_submission(
commit_message = f"Add community submission from @{author_username} (closes #{issue_number})"
create_or_update_file(file_path, content_json, commit_message, branch_name)
# Update schema with any new tags (modifies v1 in place)
schema_updated = False
new_tags = []
try:
# Build tag registry from new submissions
tag_registry = build_tag_type_registry(submissions)
# Get current schema and merge existing tags
current_schema = load_schema()
existing_tags = get_existing_tag_definitions(current_schema)
# Merge existing tags into registry
for tag_name, tag_def in existing_tags.items():
if tag_name not in tag_registry:
tag_type = tag_def.get("type", "string")
tag_registry[tag_name] = tag_type
# Check for new tags
new_tags = check_for_new_tags(tag_registry, current_schema)
if new_tags:
# Generate updated schema
updated_schema = generate_updated_schema(current_schema, tag_registry)
schema_json = json.dumps(updated_schema, indent=2) + "\n"
create_or_update_file(
"schemas/community_submission.v1.schema.json",
schema_json,
f"Update schema with new tags: {', '.join(new_tags)}",
branch_name
)
schema_updated = True
except Exception as e:
print(f"Warning: Could not update schema: {e}", file=sys.stderr)
# Create PR
schema_note = ""
if schema_updated:
schema_note = f"\n**Schema Updated:** Added new tags: `{', '.join(new_tags)}`\n"
pr_body = f"""## Community Submission
Adds {len(submissions)} submission(s) from @{author_username}.
**File:** `{file_path}`
**Contributor UUID:** `{contributor_uuid}`
{schema_note}
Closes #{issue_number}
---
+48 -1
View File
@@ -30,7 +30,8 @@ def read_all_submissions(community_dir: Path | None = None) -> list[dict]:
all_submissions = []
for json_file in sorted(community_dir.glob("*.json")):
# Search both root directory and date subdirectories (e.g., 2026-02-12/)
for json_file in sorted(community_dir.glob("**/*.json")):
try:
with open(json_file) as f:
data = json.load(f)
@@ -50,6 +51,52 @@ def read_all_submissions(community_dir: Path | None = None) -> list[dict]:
return all_submissions
def get_python_type_name(value) -> str:
"""Get a normalized type name for a value."""
if value is None:
return "null"
if isinstance(value, bool):
return "boolean"
if isinstance(value, int):
return "integer"
if isinstance(value, float):
return "number"
if isinstance(value, str):
return "string"
if isinstance(value, list):
return "array"
if isinstance(value, dict):
return "object"
return type(value).__name__
def build_tag_type_registry(submissions: list[dict]) -> dict[str, str]:
"""
Build a registry of tag names to their expected types from existing submissions.
Args:
submissions: List of existing submission dictionaries
Returns:
Dict mapping tag name to expected type (e.g., {"internet": "string", "year_built": "integer"})
"""
tag_types = {}
for submission in submissions:
tags = submission.get("tags", {})
if not isinstance(tags, dict):
continue
for key, value in tags.items():
inferred_type = get_python_type_name(value)
if key not in tag_types:
tag_types[key] = inferred_type
# If there's a conflict, keep the first type (it's already in use)
return tag_types
def group_by_identifier(submissions: list[dict]) -> dict[str, list[dict]]:
"""
Group submissions by their identifier (registration, transponder, or airframe ID).
+66
View File
@@ -0,0 +1,66 @@
#!/usr/bin/env python3
"""
Regenerate schema for a PR branch after main has been merged in.
This script looks at the submission files in this branch and updates
the schema if new tags were introduced.
Usage: python -m src.contributions.regenerate_pr_schema
"""
import json
import sys
from pathlib import Path
# Add parent to path for imports when running as script
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
from src.contributions.read_community_data import read_all_submissions, build_tag_type_registry
from src.contributions.update_schema import (
get_existing_tag_definitions,
check_for_new_tags,
generate_updated_schema,
)
from src.contributions.schema import load_schema, SCHEMAS_DIR
def main():
"""Main entry point."""
# Load current schema
current_schema = load_schema()
# Get existing tag definitions from schema
existing_tags = get_existing_tag_definitions(current_schema)
# Read all submissions (including ones from this PR branch)
submissions = read_all_submissions()
if not submissions:
print("No submissions found")
return
# Build tag registry from all submissions
tag_registry = build_tag_type_registry(submissions)
# Check for new tags not in the current schema
new_tags = check_for_new_tags(tag_registry, current_schema)
if new_tags:
print(f"Found new tags: {new_tags}")
print("Updating schema...")
# Generate updated schema
updated_schema = generate_updated_schema(current_schema, tag_registry)
# Write updated schema (in place)
schema_path = SCHEMAS_DIR / "community_submission.v1.schema.json"
with open(schema_path, 'w') as f:
json.dump(updated_schema, f, indent=2)
f.write("\n")
print(f"Updated {schema_path}")
else:
print("No new tags found, schema is up to date")
if __name__ == "__main__":
main()
+116 -8
View File
@@ -10,12 +10,59 @@ except ImportError:
Draft202012Validator = None
SCHEMA_PATH = Path(__file__).parent.parent.parent / "schemas" / "community_submission.v1.schema.json"
SCHEMAS_DIR = Path(__file__).parent.parent.parent / "schemas"
# For backwards compatibility
SCHEMA_PATH = SCHEMAS_DIR / "community_submission.v1.schema.json"
def load_schema() -> dict:
"""Load the community submission schema."""
with open(SCHEMA_PATH) as f:
def get_latest_schema_version() -> int:
"""
Find the latest schema version number.
Returns:
Latest version number (e.g., 1, 2, 3)
"""
import re
pattern = re.compile(r"community_submission\.v(\d+)\.schema\.json$")
max_version = 0
for path in SCHEMAS_DIR.glob("community_submission.v*.schema.json"):
match = pattern.search(path.name)
if match:
version = int(match.group(1))
max_version = max(max_version, version)
return max_version
def get_schema_path(version: int | None = None) -> Path:
"""
Get path to a specific schema version, or latest if version is None.
Args:
version: Schema version number, or None for latest
Returns:
Path to schema file
"""
if version is None:
version = get_latest_schema_version()
return SCHEMAS_DIR / f"community_submission.v{version}.schema.json"
def load_schema(version: int | None = None) -> dict:
"""
Load the community submission schema.
Args:
version: Schema version to load. If None, loads the latest version.
Returns:
Schema dict
"""
schema_path = get_schema_path(version)
with open(schema_path) as f:
return json.load(f)
@@ -50,11 +97,36 @@ def validate_submission(data: dict | list, schema: dict | None = None) -> list[s
return errors
def download_github_attachment(url: str) -> str | None:
"""
Download content from a GitHub attachment URL.
Args:
url: GitHub attachment URL (e.g., https://github.com/user-attachments/files/...)
Returns:
File content as string, or None if download failed
"""
import urllib.request
import urllib.error
try:
req = urllib.request.Request(url, headers={"User-Agent": "PlaneQuery-Bot"})
with urllib.request.urlopen(req, timeout=30) as response:
return response.read().decode("utf-8")
except (urllib.error.URLError, urllib.error.HTTPError, UnicodeDecodeError) as e:
print(f"Failed to download attachment from {url}: {e}")
return None
def extract_json_from_issue_body(body: str) -> str | None:
"""
Extract JSON from GitHub issue body.
Looks for JSON in the 'Submission JSON' section wrapped in code blocks.
Looks for JSON in the 'Submission JSON' section, either:
- A GitHub file attachment URL (drag-and-drop .json file)
- Wrapped in code blocks (```json ... ``` or ``` ... ```)
- Or raw JSON after the header
Args:
body: The issue body text
@@ -62,13 +134,49 @@ def extract_json_from_issue_body(body: str) -> str | None:
Returns:
Extracted JSON string or None if not found
"""
# Match JSON in "### Submission JSON" section
pattern = r"### Submission JSON\s*\n\s*```(?:json)?\s*\n([\s\S]*?)\n\s*```"
match = re.search(pattern, body)
# Try: GitHub attachment URL in the Submission JSON section
# Format: [filename.json](https://github.com/user-attachments/files/...)
# Or just the raw URL
pattern_attachment = r"### Submission JSON\s*\n[\s\S]*?(https://github\.com/(?:user-attachments/files|.*?/files)/[^\s\)\]]+\.json)"
match = re.search(pattern_attachment, body)
if match:
url = match.group(1)
content = download_github_attachment(url)
if content:
return content.strip()
# Also check for GitHub user-attachments URL anywhere in submission section
pattern_attachment_alt = r"\[.*?\.json\]\((https://github\.com/[^\)]+)\)"
match = re.search(pattern_attachment_alt, body)
if match:
url = match.group(1)
if ".json" in url or "user-attachments" in url:
content = download_github_attachment(url)
if content:
return content.strip()
# Try: JSON in code blocks after "### Submission JSON"
pattern_codeblock = r"### Submission JSON\s*\n\s*```(?:json)?\s*\n([\s\S]*?)\n\s*```"
match = re.search(pattern_codeblock, body)
if match:
return match.group(1).strip()
# Try: Raw JSON after "### Submission JSON" until next section or end
pattern_raw = r"### Submission JSON\s*\n\s*([\[{][\s\S]*?[\]}])(?=\n###|\n\n###|$)"
match = re.search(pattern_raw, body)
if match:
return match.group(1).strip()
# Try: Any JSON object/array in the body (fallback)
pattern_any = r"([\[{][\s\S]*?[\]}])"
for match in re.finditer(pattern_any, body):
candidate = match.group(1).strip()
# Validate it looks like JSON
if candidate.startswith('{') and candidate.endswith('}'):
return candidate
if candidate.startswith('[') and candidate.endswith(']'):
return candidate
return None
+154
View File
@@ -0,0 +1,154 @@
#!/usr/bin/env python3
"""
Update the schema with tag type definitions from existing submissions.
This script reads all community submissions and generates a new schema version
that includes explicit type definitions for all known tags.
When new tags are introduced, a new schema version is created (e.g., v1 -> v2 -> v3).
Usage:
python -m src.contributions.update_schema
python -m src.contributions.update_schema --check # Check if update needed
"""
import argparse
import json
import sys
from pathlib import Path
from .read_community_data import read_all_submissions, build_tag_type_registry
from .schema import SCHEMAS_DIR, get_latest_schema_version, get_schema_path, load_schema
def get_existing_tag_definitions(schema: dict) -> dict[str, dict]:
"""Extract existing tag property definitions from schema."""
tags_props = schema.get("properties", {}).get("tags", {}).get("properties", {})
return tags_props
def type_name_to_json_schema(type_name: str) -> dict:
"""Convert a type name to a JSON Schema type definition."""
type_map = {
"string": {"type": "string"},
"integer": {"type": "integer"},
"number": {"type": "number"},
"boolean": {"type": "boolean"},
"null": {"type": "null"},
"array": {"type": "array", "items": {"$ref": "#/$defs/tagScalar"}},
"object": {"type": "object", "additionalProperties": {"$ref": "#/$defs/tagScalar"}},
}
return type_map.get(type_name, {"$ref": "#/$defs/tagValue"})
def generate_updated_schema(base_schema: dict, tag_registry: dict[str, str]) -> dict:
"""
Generate an updated schema with explicit tag definitions.
Args:
base_schema: The current schema to update
tag_registry: Dict mapping tag name to type name
Returns:
Updated schema dict
"""
schema = json.loads(json.dumps(base_schema)) # Deep copy
# Build tag properties with explicit types
tag_properties = {}
for tag_name, type_name in sorted(tag_registry.items()):
tag_properties[tag_name] = type_name_to_json_schema(type_name)
# Only add/update the properties key within tags, preserve everything else
if "properties" in schema and "tags" in schema["properties"]:
schema["properties"]["tags"]["properties"] = tag_properties
return schema
def check_for_new_tags(tag_registry: dict[str, str], current_schema: dict) -> list[str]:
"""
Check which tags in the registry are not yet defined in the schema.
Returns:
List of new tag names
"""
existing_tags = get_existing_tag_definitions(current_schema)
return [tag for tag in tag_registry if tag not in existing_tags]
def update_schema_file(
tag_registry: dict[str, str],
check_only: bool = False
) -> tuple[bool, list[str]]:
"""
Update the v1 schema file with new tag definitions.
Args:
tag_registry: Dict mapping tag name to type name
check_only: If True, only check if update is needed without writing
Returns:
Tuple of (was_updated, list_of_new_tags)
"""
current_schema = load_schema()
# Find new tags
new_tags = check_for_new_tags(tag_registry, current_schema)
if not new_tags:
return False, []
if check_only:
return True, new_tags
# Generate and write updated schema (in place)
updated_schema = generate_updated_schema(current_schema, tag_registry)
schema_path = get_schema_path()
with open(schema_path, "w") as f:
json.dump(updated_schema, f, indent=2)
f.write("\n")
return True, new_tags
def update_schema_from_submissions(check_only: bool = False) -> tuple[bool, list[str]]:
"""
Read all submissions and update the schema if needed.
Args:
check_only: If True, only check if update is needed without writing
Returns:
Tuple of (was_updated, list_of_new_tags)
"""
submissions = read_all_submissions()
tag_registry = build_tag_type_registry(submissions)
return update_schema_file(tag_registry, check_only)
def main():
parser = argparse.ArgumentParser(description="Update schema with tag definitions")
parser.add_argument("--check", action="store_true", help="Check if update needed without writing")
args = parser.parse_args()
was_updated, new_tags = update_schema_from_submissions(check_only=args.check)
if args.check:
if was_updated:
print(f"Schema update needed. New tags: {', '.join(new_tags)}")
sys.exit(1)
else:
print("Schema is up to date")
sys.exit(0)
else:
if was_updated:
print(f"Updated {get_schema_path()}")
print(f"Added tags: {', '.join(new_tags)}")
else:
print("No update needed")
if __name__ == "__main__":
main()
+78
View File
@@ -7,6 +7,7 @@ submissions when issues are opened or edited.
Usage:
python -m src.contributions.validate_submission --issue-body "..."
python -m src.contributions.validate_submission --issue-body-file /path/to/body.txt
python -m src.contributions.validate_submission --file submission.json
echo '{"registration_number": "N12345"}' | python -m src.contributions.validate_submission --stdin
@@ -23,6 +24,7 @@ import urllib.request
import urllib.error
from .schema import extract_json_from_issue_body, parse_and_validate, load_schema
from .read_community_data import read_all_submissions, build_tag_type_registry, get_python_type_name
def github_api_request(method: str, endpoint: str, data: dict | None = None) -> dict:
@@ -65,6 +67,40 @@ def remove_issue_label(issue_number: int, label: str) -> None:
pass # Label might not exist
def validate_tag_consistency(data: dict | list, tag_registry: dict[str, str]) -> list[str]:
"""
Check that tag types in new submissions match existing tag types.
Args:
data: Single submission dict or list of submissions
tag_registry: Dict mapping tag name to expected type
Returns:
List of error messages. Empty list means validation passed.
"""
errors = []
submissions = data if isinstance(data, list) else [data]
for i, submission in enumerate(submissions):
prefix = f"[{i}] " if len(submissions) > 1 else ""
tags = submission.get("tags", {})
if not isinstance(tags, dict):
continue
for key, value in tags.items():
actual_type = get_python_type_name(value)
if key in tag_registry:
expected_type = tag_registry[key]
if actual_type != expected_type:
errors.append(
f"{prefix}tags.{key}: expected type '{expected_type}', got '{actual_type}'"
)
return errors
def validate_and_report(json_str: str, issue_number: int | None = None) -> bool:
"""
Validate JSON and optionally report to GitHub issue.
@@ -90,6 +126,33 @@ def validate_and_report(json_str: str, issue_number: int | None = None) -> bool:
return False
# Check tag type consistency against existing submissions
if data is not None:
try:
existing_submissions = read_all_submissions()
tag_registry = build_tag_type_registry(existing_submissions)
tag_errors = validate_tag_consistency(data, tag_registry)
if tag_errors:
error_list = "\n".join(f"- {e}" for e in tag_errors)
message = (
f"❌ **Tag Type Mismatch**\n\n"
f"Your submission uses tags with types that don't match existing submissions:\n\n"
f"{error_list}\n\n"
f"Please use the same type as existing tags, or use a different tag name."
)
print(message, file=sys.stderr)
if issue_number:
add_issue_comment(issue_number, message)
remove_issue_label(issue_number, "validated")
return False
except Exception as e:
# Don't fail validation if we can't read existing submissions
print(f"Warning: Could not check tag consistency: {e}", file=sys.stderr)
count = len(data) if isinstance(data, list) else 1
message = f"✅ **Validation Passed**\n\n{count} submission(s) validated successfully against the schema.\n\nA maintainer can approve this submission by adding the `approved` label."
@@ -106,6 +169,7 @@ def main():
parser = argparse.ArgumentParser(description="Validate community submission JSON")
source_group = parser.add_mutually_exclusive_group(required=True)
source_group.add_argument("--issue-body", help="Issue body text containing JSON")
source_group.add_argument("--issue-body-file", help="File containing issue body text")
source_group.add_argument("--file", help="JSON file to validate")
source_group.add_argument("--stdin", action="store_true", help="Read JSON from stdin")
@@ -125,6 +189,20 @@ def main():
"Please ensure your JSON is in the 'Submission JSON' field wrapped in code blocks."
)
sys.exit(1)
elif args.issue_body_file:
with open(args.issue_body_file) as f:
issue_body = f.read()
json_str = extract_json_from_issue_body(issue_body)
if not json_str:
print("❌ Could not extract JSON from issue body", file=sys.stderr)
print(f"Issue body:\n{issue_body}", file=sys.stderr)
if args.issue_number:
add_issue_comment(
args.issue_number,
"❌ **Validation Failed**\n\nCould not extract JSON from submission. "
"Please ensure your JSON is in the 'Submission JSON' field."
)
sys.exit(1)
elif args.file:
with open(args.file) as f:
json_str = f.read()