feat: Add code refactoring skill based on Martin Fowler's methodology

Add comprehensive Agent Skill for systematic code refactoring with:

- Main SKILL.md with 6-phase workflow (research, test coverage, smell
  identification, planning, implementation, review)
- Code smells reference catalog (22+ smells with examples)
- Refactoring techniques catalog (25+ techniques with mechanics)
- Refactoring plan template for tracking progress
- Python scripts for complexity analysis and smell detection
- User consultation points at each decision phase
- Safety guidelines emphasizing tests and small steps
This commit is contained in:
Luong NGUYEN
2026-01-15 14:28:09 +01:00
parent 5c54d753a1
commit 245272f6d4
6 changed files with 3658 additions and 0 deletions
+426
View File
@@ -0,0 +1,426 @@
---
name: code-refactor
description: Systematic code refactoring based on Martin Fowler's methodology. Use when users ask to refactor code, improve code structure, reduce technical debt, clean up legacy code, eliminate code smells, or improve code maintainability. This skill guides through a phased approach with research, planning, and safe incremental implementation.
---
# Code Refactoring Skill
A systematic approach to refactoring code based on Martin Fowler's *Refactoring: Improving the Design of Existing Code* (2nd Edition). This skill emphasizes safe, incremental changes backed by tests.
> "Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure." — Martin Fowler
## Core Principles
1. **Behavior Preservation**: External behavior must remain unchanged
2. **Small Steps**: Make tiny, testable changes
3. **Test-Driven**: Tests are the safety net
4. **Continuous**: Refactoring is ongoing, not a one-time event
5. **Collaborative**: User approval required at each phase
## Workflow Overview
```
Phase 1: Research & Analysis
Phase 2: Test Coverage Assessment
Phase 3: Code Smell Identification
Phase 4: Refactoring Plan Creation
Phase 5: Incremental Implementation
Phase 6: Review & Iteration
```
---
## Phase 1: Research & Analysis
### Objectives
- Understand the codebase structure and purpose
- Identify the scope of refactoring
- Gather context about business requirements
### Questions to Ask User
Before starting, clarify:
1. **Scope**: Which files/modules/functions need refactoring?
2. **Goals**: What problems are you trying to solve? (readability, performance, maintainability)
3. **Constraints**: Are there any areas that should NOT be changed?
4. **Timeline pressure**: Is this blocking other work?
5. **Test status**: Do tests exist? Are they passing?
### Actions
- [ ] Read and understand the target code
- [ ] Identify dependencies and integrations
- [ ] Document current architecture
- [ ] Note any existing technical debt markers (TODOs, FIXMEs)
### Output
Present findings to user:
- Code structure summary
- Identified problem areas
- Initial recommendations
- **Request approval to proceed**
---
## Phase 2: Test Coverage Assessment
### Why Tests Matter
> "Refactoring without tests is like driving without a seatbelt." — Martin Fowler
Tests are the **key enabler** of safe refactoring. Without them, you risk introducing bugs.
### Assessment Steps
1. **Check for existing tests**
```bash
# Look for test files
find . -name "*test*" -o -name "*spec*" | head -20
```
2. **Run existing tests**
```bash
# JavaScript/TypeScript
npm test
# Python
pytest -v
# Java
mvn test
```
3. **Check coverage (if available)**
```bash
# JavaScript
npm run test:coverage
# Python
pytest --cov=.
```
### Decision Point: Ask User
**If tests exist and pass:**
- Proceed to Phase 3
**If tests are missing or incomplete:**
Present options:
1. Write tests first (recommended)
2. Add tests incrementally during refactoring
3. Proceed without tests (risky - requires user acknowledgment)
**If tests are failing:**
- STOP. Fix failing tests before refactoring
- Ask user: Should we fix tests first?
### Test Writing Guidelines (if needed)
For each function being refactored, ensure tests cover:
- Happy path (normal operation)
- Edge cases (empty inputs, null, boundaries)
- Error scenarios (invalid inputs, exceptions)
Use the "red-green-refactor" cycle:
1. Write failing test (red)
2. Make it pass (green)
3. Refactor
---
## Phase 3: Code Smell Identification
### What Are Code Smells?
Symptoms of deeper problems in code. They're not bugs, but indicators that the code could be improved.
### Common Code Smells to Check
See [references/code-smells.md](references/code-smells.md) for the complete catalog.
#### Quick Reference
| Smell | Signs | Impact |
|-------|-------|--------|
| **Long Method** | Methods > 30-50 lines | Hard to understand, test, maintain |
| **Duplicated Code** | Same logic in multiple places | Bug fixes needed in multiple places |
| **Large Class** | Class with too many responsibilities | Violates Single Responsibility |
| **Feature Envy** | Method uses another class's data more | Poor encapsulation |
| **Primitive Obsession** | Overuse of primitives instead of objects | Missing domain concepts |
| **Long Parameter List** | Methods with 4+ parameters | Hard to call correctly |
| **Data Clumps** | Same data items appearing together | Missing abstraction |
| **Switch Statements** | Complex switch/if-else chains | Hard to extend |
| **Speculative Generality** | Code "just in case" | Unnecessary complexity |
| **Dead Code** | Unused code | Confusion, maintenance burden |
### Analysis Steps
1. **Automated Analysis** (if scripts available)
```bash
python scripts/detect-smells.py <file>
```
2. **Manual Review**
- Walk through code systematically
- Note each smell with location and severity
- Categorize by impact (Critical/High/Medium/Low)
3. **Prioritization**
Focus on smells that:
- Block current development
- Cause bugs or confusion
- Affect most-changed code paths
### Output: Smell Report
Present to user:
- List of identified smells with locations
- Severity assessment for each
- Recommended priority order
- **Request approval on priorities**
---
## Phase 4: Refactoring Plan Creation
### Selecting Refactorings
For each smell, select an appropriate refactoring from the catalog.
See [references/refactoring-catalog.md](references/refactoring-catalog.md) for the complete list.
#### Smell-to-Refactoring Mapping
| Code Smell | Recommended Refactoring(s) |
|------------|---------------------------|
| Long Method | Extract Method, Replace Temp with Query |
| Duplicated Code | Extract Method, Pull Up Method, Form Template Method |
| Large Class | Extract Class, Extract Subclass |
| Feature Envy | Move Method, Move Field |
| Primitive Obsession | Replace Primitive with Object, Replace Type Code with Class |
| Long Parameter List | Introduce Parameter Object, Preserve Whole Object |
| Data Clumps | Extract Class, Introduce Parameter Object |
| Switch Statements | Replace Conditional with Polymorphism |
| Speculative Generality | Collapse Hierarchy, Inline Class, Remove Dead Code |
| Dead Code | Remove Dead Code |
### Plan Structure
Use the template at [templates/refactoring-plan.md](templates/refactoring-plan.md).
For each refactoring:
1. **Target**: What code will change
2. **Smell**: What problem it addresses
3. **Refactoring**: Which technique to apply
4. **Steps**: Detailed micro-steps
5. **Risks**: What could go wrong
6. **Rollback**: How to undo if needed
### Phased Approach
**CRITICAL**: Introduce refactoring gradually in phases.
**Phase A: Quick Wins** (Low risk, high value)
- Rename variables for clarity
- Extract obvious duplicate code
- Remove dead code
**Phase B: Structural Improvements** (Medium risk)
- Extract methods from long functions
- Introduce parameter objects
- Move methods to appropriate classes
**Phase C: Architectural Changes** (Higher risk)
- Replace conditionals with polymorphism
- Extract classes
- Introduce design patterns
### Decision Point: Present Plan to User
Before implementation:
- Show complete refactoring plan
- Explain each phase and its risks
- Get explicit approval for each phase
- **Ask**: "Should I proceed with Phase A?"
---
## Phase 5: Incremental Implementation
### The Golden Rule
> "Change → Test → Green? → Commit → Next step"
### Implementation Rhythm
For each refactoring step:
1. **Pre-check**
- Tests are passing (green)
- Code compiles
2. **Make ONE small change**
- Follow the mechanics from the catalog
- Keep changes minimal
3. **Verify**
- Run tests immediately
- Check for compilation errors
4. **If tests pass (green)**
- Commit with descriptive message
- Move to next step
5. **If tests fail (red)**
- STOP immediately
- Undo the change
- Analyze what went wrong
- Ask user if unclear
### Commit Strategy
Each commit should be:
- **Atomic**: One logical change
- **Reversible**: Easy to revert
- **Descriptive**: Clear commit message
Example commit messages:
```
refactor: Extract calculateTotal() from processOrder()
refactor: Rename 'x' to 'customerCount' for clarity
refactor: Remove unused validateOldFormat() method
```
### Progress Reporting
After each sub-phase, report to user:
- Changes made
- Tests still passing?
- Any issues encountered
- **Ask**: "Continue with next batch?"
---
## Phase 6: Review & Iteration
### Post-Refactoring Checklist
- [ ] All tests passing
- [ ] No new warnings/errors
- [ ] Code compiles successfully
- [ ] Behavior unchanged (manual verification)
- [ ] Documentation updated if needed
- [ ] Commit history is clean
### Metrics Comparison
Run complexity analysis before and after:
```bash
python scripts/analyze-complexity.py <file>
```
Present improvements:
- Lines of code change
- Cyclomatic complexity change
- Maintainability index change
### User Review
Present final results:
- Summary of all changes
- Before/after code comparison
- Metrics improvements
- Remaining technical debt
- **Ask**: "Are you satisfied with these changes?"
### Next Steps
Discuss with user:
- Additional smells to address?
- Schedule follow-up refactoring?
- Apply similar changes elsewhere?
---
## Important Guidelines
### When to STOP and Ask
Always pause and consult user when:
- Unsure about business logic
- Change might affect external APIs
- Test coverage is inadequate
- Significant architectural decision needed
- Risk level increases
- You encounter unexpected complexity
### Safety Rules
1. **Never refactor without tests** (unless user explicitly acknowledges risk)
2. **Never make big changes** - break into tiny steps
3. **Never skip the test run** after each change
4. **Never continue if tests fail** - fix or rollback first
5. **Never assume** - when in doubt, ask
### What NOT to Do
- Don't combine refactoring with feature additions
- Don't refactor during production emergencies
- Don't refactor code you don't understand
- Don't over-engineer - keep it simple
- Don't refactor everything at once
---
## Quick Start Example
### Scenario: Long Method with Duplication
**Before:**
```javascript
function processOrder(order) {
// 150 lines of code with:
// - Duplicated validation logic
// - Inline calculations
// - Mixed responsibilities
}
```
**Refactoring Steps:**
1. **Ensure tests exist** for processOrder()
2. **Extract** validation into validateOrder()
3. **Test** - should pass
4. **Extract** calculation into calculateOrderTotal()
5. **Test** - should pass
6. **Extract** notification into notifyCustomer()
7. **Test** - should pass
8. **Review** - processOrder() now orchestrates 3 clear functions
**After:**
```javascript
function processOrder(order) {
validateOrder(order);
const total = calculateOrderTotal(order);
notifyCustomer(order, total);
return { order, total };
}
```
---
## References
- [Code Smells Catalog](references/code-smells.md) - Complete list of code smells
- [Refactoring Catalog](references/refactoring-catalog.md) - Refactoring techniques
- [Refactoring Plan Template](templates/refactoring-plan.md) - Planning template
## Scripts
- `scripts/analyze-complexity.py` - Analyze code complexity metrics
- `scripts/detect-smells.py` - Automated smell detection
## Version History
- v1.0.0 (2025-01-15): Initial release with Fowler methodology, phased approach, user consultation points
@@ -0,0 +1,669 @@
# Code Smells Catalog
A comprehensive reference of code smells based on Martin Fowler's *Refactoring* (2nd Edition). Code smells are symptoms of deeper problems—they indicate that something might be wrong with your code's design.
> "A code smell is a surface indication that usually corresponds to a deeper problem in the system." — Martin Fowler
---
## Bloaters
Code smells representing something that has grown too large to be handled effectively.
### Long Method
**Signs:**
- Method exceeds 30-50 lines
- Need to scroll to see the whole method
- Multiple levels of nesting
- Comments explaining what sections do
**Why it's bad:**
- Hard to understand
- Difficult to test in isolation
- Changes have unintended consequences
- Duplicate logic hides inside
**Refactorings:**
- Extract Method
- Replace Temp with Query
- Introduce Parameter Object
- Replace Method with Method Object
- Decompose Conditional
**Example (Before):**
```javascript
function processOrder(order) {
// Validate order (20 lines)
if (!order.items) throw new Error('No items');
if (order.items.length === 0) throw new Error('Empty order');
// ... more validation
// Calculate totals (30 lines)
let subtotal = 0;
for (const item of order.items) {
subtotal += item.price * item.quantity;
}
// ... tax, shipping, discounts
// Send notifications (20 lines)
// ... email logic
}
```
**Example (After):**
```javascript
function processOrder(order) {
validateOrder(order);
const totals = calculateOrderTotals(order);
sendOrderNotifications(order, totals);
return { order, totals };
}
```
---
### Large Class
**Signs:**
- Class has many instance variables (>7-10)
- Class has many methods (>15-20)
- Class name is vague (Manager, Handler, Processor)
- Methods don't use all instance variables
**Why it's bad:**
- Violates Single Responsibility Principle
- Hard to test
- Changes ripple through unrelated features
- Difficult to reuse parts
**Refactorings:**
- Extract Class
- Extract Subclass
- Extract Interface
**Detection:**
```
Lines of code > 300
Number of methods > 15
Number of fields > 10
```
---
### Primitive Obsession
**Signs:**
- Using primitives for domain concepts (string for email, int for money)
- Arrays of primitives instead of objects
- String constants for type codes
- Magic numbers/strings
**Why it's bad:**
- No validation at type level
- Logic scattered across codebase
- Easy to pass wrong values
- Missing domain concepts
**Refactorings:**
- Replace Primitive with Object
- Replace Type Code with Class
- Replace Type Code with Subclasses
- Replace Type Code with State/Strategy
**Example (Before):**
```javascript
const user = {
email: 'john@example.com', // Just a string
phone: '1234567890', // Just a string
status: 'active', // Magic string
balance: 10050 // Cents as integer
};
```
**Example (After):**
```javascript
const user = {
email: new Email('john@example.com'),
phone: new PhoneNumber('1234567890'),
status: UserStatus.ACTIVE,
balance: Money.cents(10050)
};
```
---
### Long Parameter List
**Signs:**
- Methods with 4+ parameters
- Parameters that always appear together
- Boolean flags changing method behavior
- Null/undefined passed frequently
**Why it's bad:**
- Hard to call correctly
- Parameter order confusion
- Indicates method doing too much
- Hard to add new parameters
**Refactorings:**
- Introduce Parameter Object
- Preserve Whole Object
- Replace Parameter with Method Call
- Remove Flag Argument
**Example (Before):**
```javascript
function createUser(firstName, lastName, email, phone,
street, city, state, zip,
isAdmin, isActive, createdBy) {
// ...
}
```
**Example (After):**
```javascript
function createUser(personalInfo, address, options) {
// personalInfo: { firstName, lastName, email, phone }
// address: { street, city, state, zip }
// options: { isAdmin, isActive, createdBy }
}
```
---
### Data Clumps
**Signs:**
- Same 3+ fields appear together repeatedly
- Parameters that always travel together
- Classes with field subsets belonging together
**Why it's bad:**
- Duplicate handling logic
- Missing abstraction
- Harder to extend
- Indicates hidden class
**Refactorings:**
- Extract Class
- Introduce Parameter Object
- Preserve Whole Object
**Example:**
```javascript
// Data clump: (x, y, z) coordinates
function movePoint(x, y, z, dx, dy, dz) { }
function scalePoint(x, y, z, factor) { }
function distanceBetween(x1, y1, z1, x2, y2, z2) { }
// Extract Point3D class
class Point3D {
constructor(x, y, z) { }
move(delta) { }
scale(factor) { }
distanceTo(other) { }
}
```
---
## Object-Orientation Abusers
Smells indicating incomplete or incorrect use of OOP principles.
### Switch Statements
**Signs:**
- Long switch/case or if/else chains
- Same switch in multiple places
- Switch on type codes
- Adding new cases requires changes everywhere
**Why it's bad:**
- Violates Open/Closed Principle
- Changes ripple to all switch locations
- Hard to extend
- Often indicates missing polymorphism
**Refactorings:**
- Replace Conditional with Polymorphism
- Replace Type Code with Subclasses
- Replace Type Code with State/Strategy
**Example (Before):**
```javascript
function calculatePay(employee) {
switch (employee.type) {
case 'hourly':
return employee.hours * employee.rate;
case 'salaried':
return employee.salary / 12;
case 'commissioned':
return employee.sales * employee.commission;
}
}
```
**Example (After):**
```javascript
class HourlyEmployee {
calculatePay() {
return this.hours * this.rate;
}
}
class SalariedEmployee {
calculatePay() {
return this.salary / 12;
}
}
```
---
### Temporary Field
**Signs:**
- Instance variables only used in some methods
- Fields set conditionally
- Complex initialization for certain cases
**Why it's bad:**
- Confusing—field exists but might be null
- Hard to understand object state
- Indicates conditional logic hiding
**Refactorings:**
- Extract Class
- Introduce Null Object
- Replace Temp Field with Local
---
### Refused Bequest
**Signs:**
- Subclass doesn't use inherited methods/data
- Subclass overrides to do nothing
- Inheritance used for code reuse, not IS-A relationship
**Why it's bad:**
- Wrong abstraction
- Violates Liskov Substitution Principle
- Misleading hierarchy
**Refactorings:**
- Push Down Method/Field
- Replace Subclass with Delegate
- Replace Inheritance with Delegation
---
### Alternative Classes with Different Interfaces
**Signs:**
- Two classes that do similar things
- Different method names for same concept
- Could be used interchangeably
**Why it's bad:**
- Duplicate implementations
- No common interface
- Hard to switch between
**Refactorings:**
- Rename Method
- Move Method
- Extract Superclass
- Extract Interface
---
## Change Preventers
Smells that make changes difficult—changing one thing requires changing many others.
### Divergent Change
**Signs:**
- One class changed for multiple different reasons
- Changes in different areas trigger same class edits
- Class is a "God class"
**Why it's bad:**
- Violates Single Responsibility
- High change frequency
- Merge conflicts
**Refactorings:**
- Extract Class
- Extract Superclass
- Extract Subclass
**Example:**
A `User` class changes for:
- Authentication changes
- Profile changes
- Billing changes
- Notification changes
→ Extract: `AuthService`, `ProfileService`, `BillingService`, `NotificationService`
---
### Shotgun Surgery
**Signs:**
- One change requires edits in many classes
- Small feature needs touching 10+ files
- Changes are scattered, hard to find all
**Why it's bad:**
- Easy to miss a spot
- High coupling
- Changes are error-prone
**Refactorings:**
- Move Method
- Move Field
- Inline Class
**Detection:**
Look for: adding one field requires changes in >5 files.
---
### Parallel Inheritance Hierarchies
**Signs:**
- Creating subclass in one hierarchy requires subclass in another
- Class prefixes match (e.g., `DatabaseOrder`, `DatabaseProduct`)
**Why it's bad:**
- Double the maintenance
- Coupling between hierarchies
- Easy to forget one side
**Refactorings:**
- Move Method
- Move Field
- Eliminate one hierarchy
---
## Dispensables
Something unnecessary that should be removed.
### Comments (Excessive)
**Signs:**
- Comments explaining what code does
- Commented-out code
- TODO/FIXME that linger forever
- Apologies in comments
**Why it's bad:**
- Comments lie (get out of sync)
- Code should be self-documenting
- Dead code causes confusion
**Refactorings:**
- Extract Method (name explains what)
- Rename (clarity without comments)
- Remove commented code
- Introduce Assertion
**Good vs Bad Comments:**
```javascript
// BAD: Explaining what
// Loop through users and check if active
for (const user of users) {
if (user.status === 'active') { }
}
// GOOD: Explaining why
// Active users only - inactive are handled by cleanup job
const activeUsers = users.filter(u => u.isActive);
```
---
### Duplicate Code
**Signs:**
- Same code in multiple places
- Similar code with small variations
- Copy-paste patterns
**Why it's bad:**
- Bug fixes needed in multiple places
- Inconsistency risk
- Bloated codebase
**Refactorings:**
- Extract Method
- Extract Class
- Pull Up Method (in hierarchies)
- Form Template Method
**Detection Rule:**
Any code duplicated 3+ times should be extracted.
---
### Lazy Class
**Signs:**
- Class doesn't do enough to justify existence
- Wrapper with no added value
- Result of over-engineering
**Why it's bad:**
- Maintenance overhead
- Unnecessary indirection
- Complexity without benefit
**Refactorings:**
- Inline Class
- Collapse Hierarchy
---
### Dead Code
**Signs:**
- Unreachable code
- Unused variables/methods/classes
- Commented-out code
- Code behind impossible conditions
**Why it's bad:**
- Confusion
- Maintenance burden
- Slows down understanding
**Refactorings:**
- Remove Dead Code
- Safe Delete
**Detection:**
```bash
# Look for unused exports
# Look for unreferenced functions
# IDE "unused" warnings
```
---
### Speculative Generality
**Signs:**
- Abstract classes with one subclass
- Unused parameters "for future use"
- Methods that only delegate
- "Framework" for one use case
**Why it's bad:**
- Complexity without benefit
- YAGNI (You Ain't Gonna Need It)
- Harder to understand
**Refactorings:**
- Collapse Hierarchy
- Inline Class
- Remove Parameter
- Rename Method
---
## Couplers
Smells that represent excessive coupling between classes.
### Feature Envy
**Signs:**
- Method uses more data from another class than its own
- Many getter calls to another object
- Data and behavior are separated
**Why it's bad:**
- Wrong location for behavior
- Poor encapsulation
- Hard to maintain
**Refactorings:**
- Move Method
- Move Field
- Extract Method (then move)
**Example (Before):**
```javascript
class Order {
getDiscountedPrice(customer) {
// Uses customer data heavily
if (customer.loyaltyYears > 5) {
return this.price * customer.discountRate;
}
return this.price;
}
}
```
**Example (After):**
```javascript
class Customer {
getDiscountedPriceFor(price) {
if (this.loyaltyYears > 5) {
return price * this.discountRate;
}
return price;
}
}
```
---
### Inappropriate Intimacy
**Signs:**
- Classes access each other's private parts
- Bidirectional references
- Subclasses know too much about parents
**Why it's bad:**
- High coupling
- Changes cascade
- Hard to modify one without other
**Refactorings:**
- Move Method
- Move Field
- Change Bidirectional to Unidirectional
- Extract Class
- Hide Delegate
---
### Message Chains
**Signs:**
- Long chains of method calls: `a.getB().getC().getD().getValue()`
- Client depends on navigation structure
- "Train wreck" code
**Why it's bad:**
- Fragile—any change breaks chain
- Violates Law of Demeter
- Coupling to structure
**Refactorings:**
- Hide Delegate
- Extract Method
- Move Method
**Example:**
```javascript
// Bad: Message chain
const managerName = employee.getDepartment().getManager().getName();
// Better: Hide delegation
const managerName = employee.getManagerName();
```
---
### Middle Man
**Signs:**
- Class that only delegates to another
- Half the methods are delegations
- No added value
**Why it's bad:**
- Unnecessary indirection
- Maintenance overhead
- Confusing architecture
**Refactorings:**
- Remove Middle Man
- Inline Method
---
## Smell Severity Guide
| Severity | Description | Action |
|----------|-------------|--------|
| **Critical** | Blocks development, causes bugs | Fix immediately |
| **High** | Significant maintenance burden | Fix in current sprint |
| **Medium** | Noticeable but manageable | Plan for near future |
| **Low** | Minor inconvenience | Fix opportunistically |
---
## Quick Detection Checklist
Use this checklist when scanning code:
- [ ] Any method > 30 lines?
- [ ] Any class > 300 lines?
- [ ] Any method with > 4 parameters?
- [ ] Any duplicated code blocks?
- [ ] Any switch/case on type codes?
- [ ] Any unused code?
- [ ] Any methods using another class's data heavily?
- [ ] Any long chains of method calls?
- [ ] Any comments explaining "what" not "why"?
- [ ] Any primitives that should be objects?
---
## Further Reading
- Fowler, M. (2018). *Refactoring: Improving the Design of Existing Code* (2nd ed.)
- Kerievsky, J. (2004). *Refactoring to Patterns*
- Feathers, M. (2004). *Working Effectively with Legacy Code*
File diff suppressed because it is too large Load Diff
+545
View File
@@ -0,0 +1,545 @@
#!/usr/bin/env python3
"""
Code Complexity Analyzer
Analyzes code complexity metrics for Python, JavaScript, and TypeScript files.
Helps measure the impact of refactoring by comparing before/after metrics.
Usage:
python analyze-complexity.py <file>
python analyze-complexity.py <before_file> <after_file> # Compare mode
python analyze-complexity.py --dir <directory> # Analyze directory
Metrics:
- Cyclomatic Complexity: Decision points in code
- Cognitive Complexity: How hard is it to understand
- Maintainability Index: Overall maintainability score (0-100)
- Lines of Code: Total lines
- Function Count: Number of functions/methods
- Average Function Length: Lines per function
"""
import argparse
import os
import re
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional
@dataclass
class FunctionMetrics:
"""Metrics for a single function."""
name: str
start_line: int
end_line: int
lines: int
cyclomatic_complexity: int
cognitive_complexity: int
parameter_count: int
@dataclass
class FileMetrics:
"""Metrics for a file."""
filename: str
lines_of_code: int
blank_lines: int
comment_lines: int
function_count: int
class_count: int
cyclomatic_complexity: int
cognitive_complexity: int
maintainability_index: float
avg_function_length: float
max_function_length: int
functions: List[FunctionMetrics]
class ComplexityAnalyzer:
"""Analyze code complexity for multiple languages."""
# Patterns for different languages
PATTERNS = {
'python': {
'function': r'^\s*def\s+(\w+)\s*\(',
'class': r'^\s*class\s+(\w+)',
'decision': [r'\bif\b', r'\belif\b', r'\bfor\b', r'\bwhile\b',
r'\bexcept\b', r'\band\b(?!$)', r'\bor\b(?!$)',
r'\bcase\b', r'\btry\b'],
'comment': r'^\s*#',
'multiline_comment_start': r'^\s*["\'][\"\'][\"\']',
'multiline_comment_end': r'["\'][\"\'][\"\']',
},
'javascript': {
'function': r'(?:function\s+(\w+)|(\w+)\s*[=:]\s*(?:async\s+)?(?:function|\([^)]*\)\s*=>))',
'class': r'class\s+(\w+)',
'decision': [r'\bif\b', r'\belse\s+if\b', r'\bfor\b', r'\bwhile\b',
r'\bcatch\b', r'\b\?\b', r'\b&&\b', r'\b\|\|\b',
r'\bcase\b', r'\btry\b'],
'comment': r'^\s*//',
'multiline_comment_start': r'/\*',
'multiline_comment_end': r'\*/',
},
'typescript': {
'function': r'(?:function\s+(\w+)|(\w+)\s*[=:]\s*(?:async\s+)?(?:function|\([^)]*\)\s*=>))',
'class': r'class\s+(\w+)',
'decision': [r'\bif\b', r'\belse\s+if\b', r'\bfor\b', r'\bwhile\b',
r'\bcatch\b', r'\b\?\b', r'\b&&\b', r'\b\|\|\b',
r'\bcase\b', r'\btry\b'],
'comment': r'^\s*//',
'multiline_comment_start': r'/\*',
'multiline_comment_end': r'\*/',
}
}
def __init__(self, filepath: str):
self.filepath = filepath
self.filename = os.path.basename(filepath)
self.language = self._detect_language()
self.patterns = self.PATTERNS.get(self.language, self.PATTERNS['python'])
with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
self.code = f.read()
self.lines = self.code.split('\n')
def _detect_language(self) -> str:
"""Detect programming language from file extension."""
ext = os.path.splitext(self.filepath)[1].lower()
ext_map = {
'.py': 'python',
'.js': 'javascript',
'.jsx': 'javascript',
'.ts': 'typescript',
'.tsx': 'typescript',
}
return ext_map.get(ext, 'python')
def calculate_cyclomatic_complexity(self, code: Optional[str] = None) -> int:
"""
Calculate cyclomatic complexity using McCabe's method.
CC = E - N + 2P where E=edges, N=nodes, P=connected components
Simplified: Count decision points + 1
"""
if code is None:
code = self.code
complexity = 1 # Base complexity
for pattern in self.patterns['decision']:
matches = re.findall(pattern, code)
complexity += len(matches)
return complexity
def calculate_cognitive_complexity(self, code: Optional[str] = None) -> int:
"""
Calculate cognitive complexity.
Measures how hard it is to understand the code.
Accounts for nesting depth and control flow breaks.
"""
if code is None:
code = self.code
lines = code.split('\n')
cognitive = 0
nesting_depth = 0
in_function = False
for line in lines:
stripped = line.strip()
# Track function boundaries
if re.search(self.patterns['function'], line):
in_function = True
nesting_depth = 0
# Increment for control flow structures
if re.search(r'\b(if|for|while|switch)\b', stripped):
nesting_depth += 1
cognitive += nesting_depth # Nested structures cost more
elif re.search(r'\b(elif|else if|else|catch|finally)\b', stripped):
cognitive += nesting_depth # Same level as parent
# Track nesting through braces/indentation
if self.language in ['javascript', 'typescript']:
nesting_depth += stripped.count('{') - stripped.count('}')
nesting_depth = max(0, nesting_depth)
# Bonus for breaks in linear flow
if re.search(r'\b(break|continue|return|throw)\b', stripped):
if nesting_depth > 1:
cognitive += 1
# Bonus for recursion
# (simplified: just look for function calling itself)
return cognitive
def calculate_maintainability_index(self) -> float:
"""
Calculate Maintainability Index (0-100).
Based on Halstead Volume, Cyclomatic Complexity, and Lines of Code.
MI = max(0, (171 - 5.2*ln(V) - 0.23*CC - 16.2*ln(LOC)) * 100/171)
Interpretation:
- 85-100: Highly maintainable
- 65-84: Moderately maintainable
- 50-64: Difficult to maintain
- 0-49: Very difficult to maintain
"""
import math
loc = len([l for l in self.lines if l.strip()])
cc = self.calculate_cyclomatic_complexity()
# Simplified Halstead Volume approximation
# Count unique operators and operands
operators = len(re.findall(r'[+\-*/%=<>!&|^~]', self.code))
operands = len(re.findall(r'\b\w+\b', self.code))
volume = (operators + operands) * math.log2(max(1, operators + operands))
# Calculate MI
mi = 171 - 5.2 * math.log(max(1, volume)) - 0.23 * cc - 16.2 * math.log(max(1, loc))
mi = max(0, min(100, mi * 100 / 171))
return round(mi, 2)
def count_lines(self) -> Dict[str, int]:
"""Count different types of lines."""
total = len(self.lines)
blank = 0
comment = 0
in_multiline_comment = False
for line in self.lines:
stripped = line.strip()
# Check for multiline comments
if re.search(self.patterns['multiline_comment_start'], stripped):
in_multiline_comment = True
if re.search(self.patterns['multiline_comment_end'], stripped):
in_multiline_comment = False
comment += 1
continue
if in_multiline_comment:
comment += 1
elif not stripped:
blank += 1
elif re.match(self.patterns['comment'], stripped):
comment += 1
return {
'total': total,
'blank': blank,
'comment': comment,
'code': total - blank - comment
}
def find_functions(self) -> List[FunctionMetrics]:
"""Find all functions and calculate their individual metrics."""
functions = []
current_function = None
function_start = 0
brace_depth = 0
for i, line in enumerate(self.lines):
# Check for function definition
match = re.search(self.patterns['function'], line)
if match:
# Save previous function if exists
if current_function:
func_code = '\n'.join(self.lines[function_start:i])
functions.append(self._create_function_metrics(
current_function, function_start, i - 1, func_code
))
current_function = match.group(1) or match.group(2) if match.lastindex and match.lastindex > 1 else match.group(1)
function_start = i
brace_depth = 0
# Track braces for JS/TS
if self.language in ['javascript', 'typescript']:
brace_depth += line.count('{') - line.count('}')
# Don't forget the last function
if current_function:
func_code = '\n'.join(self.lines[function_start:])
functions.append(self._create_function_metrics(
current_function, function_start, len(self.lines) - 1, func_code
))
return functions
def _create_function_metrics(self, name: str, start: int, end: int, code: str) -> FunctionMetrics:
"""Create metrics for a single function."""
lines = end - start + 1
# Count parameters (simplified)
param_match = re.search(r'\(([^)]*)\)', code.split('\n')[0])
param_count = 0
if param_match and param_match.group(1).strip():
param_count = len([p for p in param_match.group(1).split(',') if p.strip()])
return FunctionMetrics(
name=name,
start_line=start + 1,
end_line=end + 1,
lines=lines,
cyclomatic_complexity=self.calculate_cyclomatic_complexity(code),
cognitive_complexity=self.calculate_cognitive_complexity(code),
parameter_count=param_count
)
def analyze(self) -> FileMetrics:
"""Perform complete analysis of the file."""
line_counts = self.count_lines()
functions = self.find_functions()
# Count classes
class_count = len(re.findall(self.patterns['class'], self.code))
# Calculate averages
func_lengths = [f.lines for f in functions] if functions else [0]
avg_func_length = sum(func_lengths) / len(func_lengths)
max_func_length = max(func_lengths)
return FileMetrics(
filename=self.filename,
lines_of_code=line_counts['code'],
blank_lines=line_counts['blank'],
comment_lines=line_counts['comment'],
function_count=len(functions),
class_count=class_count,
cyclomatic_complexity=self.calculate_cyclomatic_complexity(),
cognitive_complexity=self.calculate_cognitive_complexity(),
maintainability_index=self.calculate_maintainability_index(),
avg_function_length=round(avg_func_length, 1),
max_function_length=max_func_length,
functions=functions
)
def print_metrics(metrics: FileMetrics, verbose: bool = False) -> None:
"""Print metrics in a readable format."""
print("=" * 60)
print(f"CODE COMPLEXITY ANALYSIS: {metrics.filename}")
print("=" * 60)
print("\n📊 OVERVIEW")
print("-" * 40)
print(f" Lines of Code: {metrics.lines_of_code}")
print(f" Blank Lines: {metrics.blank_lines}")
print(f" Comment Lines: {metrics.comment_lines}")
print(f" Functions/Methods: {metrics.function_count}")
print(f" Classes: {metrics.class_count}")
print("\n📈 COMPLEXITY METRICS")
print("-" * 40)
print(f" Cyclomatic Complexity: {metrics.cyclomatic_complexity}")
print(f" Cognitive Complexity: {metrics.cognitive_complexity}")
print(f" Maintainability Index: {metrics.maintainability_index}")
# Interpret maintainability
mi = metrics.maintainability_index
if mi >= 85:
mi_label = "Highly maintainable ✅"
elif mi >= 65:
mi_label = "Moderately maintainable 🔶"
elif mi >= 50:
mi_label = "Difficult to maintain ⚠️"
else:
mi_label = "Very difficult to maintain ❌"
print(f"{mi_label}")
print("\n📐 FUNCTION METRICS")
print("-" * 40)
print(f" Avg Function Length: {metrics.avg_function_length} lines")
print(f" Max Function Length: {metrics.max_function_length} lines")
if verbose and metrics.functions:
print("\n📋 FUNCTION DETAILS")
print("-" * 40)
for f in sorted(metrics.functions, key=lambda x: x.cyclomatic_complexity, reverse=True):
flag = " ⚠️" if f.cyclomatic_complexity > 10 or f.lines > 50 else ""
print(f" {f.name}() [lines {f.start_line}-{f.end_line}]{flag}")
print(f" - Lines: {f.lines}, CC: {f.cyclomatic_complexity}, "
f"Cognitive: {f.cognitive_complexity}, Params: {f.parameter_count}")
print("\n" + "=" * 60)
def print_comparison(before: FileMetrics, after: FileMetrics) -> None:
"""Print comparison between two analyses."""
print("=" * 70)
print("CODE COMPLEXITY COMPARISON")
print("=" * 70)
print(f"\n{'Metric':<30} {'Before':<15} {'After':<15} {'Change':<10}")
print("-" * 70)
def fmt_change(before_val, after_val, lower_is_better=True):
diff = after_val - before_val
if lower_is_better:
symbol = "" if diff < 0 else ("⚠️" if diff > 0 else "")
else:
symbol = "" if diff > 0 else ("⚠️" if diff < 0 else "")
return f"{diff:+.1f} {symbol}" if isinstance(diff, float) else f"{diff:+d} {symbol}"
metrics = [
("Lines of Code", before.lines_of_code, after.lines_of_code, True),
("Function Count", before.function_count, after.function_count, False),
("Class Count", before.class_count, after.class_count, False),
("Cyclomatic Complexity", before.cyclomatic_complexity, after.cyclomatic_complexity, True),
("Cognitive Complexity", before.cognitive_complexity, after.cognitive_complexity, True),
("Maintainability Index", before.maintainability_index, after.maintainability_index, False),
("Avg Function Length", before.avg_function_length, after.avg_function_length, True),
("Max Function Length", before.max_function_length, after.max_function_length, True),
]
for name, b_val, a_val, lower_better in metrics:
change = fmt_change(b_val, a_val, lower_better)
print(f"{name:<30} {b_val:<15} {a_val:<15} {change:<10}")
print("\n" + "=" * 70)
# Overall assessment
print("\n🎯 ASSESSMENT")
print("-" * 40)
improvements = 0
regressions = 0
if after.maintainability_index > before.maintainability_index:
print(" ✅ Maintainability improved")
improvements += 1
elif after.maintainability_index < before.maintainability_index:
print(" ⚠️ Maintainability decreased")
regressions += 1
if after.cyclomatic_complexity < before.cyclomatic_complexity:
print(" ✅ Complexity reduced")
improvements += 1
elif after.cyclomatic_complexity > before.cyclomatic_complexity:
print(" ⚠️ Complexity increased")
regressions += 1
if after.avg_function_length < before.avg_function_length:
print(" ✅ Functions are smaller on average")
improvements += 1
elif after.avg_function_length > before.avg_function_length:
print(" ⚠️ Functions grew larger on average")
regressions += 1
print(f"\n Summary: {improvements} improvements, {regressions} regressions")
print("=" * 70)
def analyze_directory(directory: str, verbose: bool = False) -> None:
"""Analyze all supported files in a directory."""
supported_extensions = ['.py', '.js', '.jsx', '.ts', '.tsx']
files = []
for root, _, filenames in os.walk(directory):
for filename in filenames:
if any(filename.endswith(ext) for ext in supported_extensions):
files.append(os.path.join(root, filename))
if not files:
print(f"No supported files found in {directory}")
return
print(f"Analyzing {len(files)} files in {directory}...\n")
total_loc = 0
total_cc = 0
total_functions = 0
all_metrics = []
for filepath in sorted(files):
try:
analyzer = ComplexityAnalyzer(filepath)
metrics = analyzer.analyze()
all_metrics.append(metrics)
total_loc += metrics.lines_of_code
total_cc += metrics.cyclomatic_complexity
total_functions += metrics.function_count
if verbose:
print_metrics(metrics, verbose=True)
else:
flag = " ⚠️" if metrics.maintainability_index < 65 else ""
print(f" {metrics.filename}: LOC={metrics.lines_of_code}, "
f"CC={metrics.cyclomatic_complexity}, MI={metrics.maintainability_index}{flag}")
except Exception as e:
print(f" Error analyzing {filepath}: {e}")
print("\n" + "=" * 60)
print("SUMMARY")
print("=" * 60)
print(f" Files analyzed: {len(all_metrics)}")
print(f" Total lines of code: {total_loc}")
print(f" Total complexity: {total_cc}")
print(f" Total functions: {total_functions}")
if all_metrics:
avg_mi = sum(m.maintainability_index for m in all_metrics) / len(all_metrics)
print(f" Avg maintainability: {avg_mi:.1f}")
def main():
parser = argparse.ArgumentParser(
description='Analyze code complexity metrics',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s myfile.py Analyze single file
%(prog)s before.py after.py Compare two versions
%(prog)s --dir src/ Analyze directory
%(prog)s -v myfile.py Verbose output with function details
"""
)
parser.add_argument('files', nargs='*', help='File(s) to analyze')
parser.add_argument('--dir', '-d', help='Directory to analyze')
parser.add_argument('--verbose', '-v', action='store_true', help='Show detailed function metrics')
parser.add_argument('--json', '-j', action='store_true', help='Output as JSON')
args = parser.parse_args()
if args.dir:
analyze_directory(args.dir, args.verbose)
elif len(args.files) == 1:
analyzer = ComplexityAnalyzer(args.files[0])
metrics = analyzer.analyze()
if args.json:
import json
print(json.dumps({
'filename': metrics.filename,
'lines_of_code': metrics.lines_of_code,
'cyclomatic_complexity': metrics.cyclomatic_complexity,
'cognitive_complexity': metrics.cognitive_complexity,
'maintainability_index': metrics.maintainability_index,
'function_count': metrics.function_count,
'avg_function_length': metrics.avg_function_length,
}, indent=2))
else:
print_metrics(metrics, args.verbose)
elif len(args.files) == 2:
before_analyzer = ComplexityAnalyzer(args.files[0])
after_analyzer = ComplexityAnalyzer(args.files[1])
before_metrics = before_analyzer.analyze()
after_metrics = after_analyzer.analyze()
print_comparison(before_metrics, after_metrics)
else:
parser.print_help()
sys.exit(1)
if __name__ == '__main__':
main()
+711
View File
@@ -0,0 +1,711 @@
#!/usr/bin/env python3
"""
Code Smell Detector
Detects common code smells in Python, JavaScript, and TypeScript files.
Based on Martin Fowler's catalog of code smells.
Usage:
python detect-smells.py <file>
python detect-smells.py --dir <directory>
python detect-smells.py -v <file> # Verbose with code snippets
Detects:
- Long Method (>30 lines)
- Long Parameter List (>4 params)
- Duplicate Code (similar code blocks)
- Large Class (>300 lines, >10 methods)
- Dead Code (unused variables/functions)
- Complex Conditionals (deep nesting, long chains)
- Magic Numbers/Strings
- Feature Envy (methods using other class data heavily)
- Comments explaining what (not why)
"""
import argparse
import os
import re
import sys
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple
from collections import defaultdict
class SmellSeverity(Enum):
"""Severity levels for code smells."""
LOW = "Low"
MEDIUM = "Medium"
HIGH = "High"
CRITICAL = "Critical"
class SmellType(Enum):
"""Types of code smells."""
LONG_METHOD = "Long Method"
LONG_PARAMETER_LIST = "Long Parameter List"
DUPLICATE_CODE = "Duplicate Code"
LARGE_CLASS = "Large Class"
DEAD_CODE = "Dead Code"
COMPLEX_CONDITIONAL = "Complex Conditional"
MAGIC_NUMBER = "Magic Number/String"
FEATURE_ENVY = "Feature Envy"
EXCESSIVE_COMMENTS = "Excessive Comments"
DEEPLY_NESTED = "Deeply Nested Code"
PRIMITIVE_OBSESSION = "Primitive Obsession"
DATA_CLUMPS = "Data Clumps"
SWITCH_STATEMENT = "Switch Statement"
MESSAGE_CHAIN = "Message Chain"
@dataclass
class CodeSmell:
"""Represents a detected code smell."""
smell_type: SmellType
severity: SmellSeverity
location: str
line_start: int
line_end: int
description: str
suggestion: str
code_snippet: str = ""
@dataclass
class SmellReport:
"""Report of all smells found in a file."""
filename: str
smells: List[CodeSmell] = field(default_factory=list)
@property
def critical_count(self) -> int:
return sum(1 for s in self.smells if s.severity == SmellSeverity.CRITICAL)
@property
def high_count(self) -> int:
return sum(1 for s in self.smells if s.severity == SmellSeverity.HIGH)
@property
def medium_count(self) -> int:
return sum(1 for s in self.smells if s.severity == SmellSeverity.MEDIUM)
@property
def low_count(self) -> int:
return sum(1 for s in self.smells if s.severity == SmellSeverity.LOW)
class SmellDetector:
"""Detect code smells in source files."""
# Thresholds (configurable)
THRESHOLDS = {
'long_method_lines': 30,
'very_long_method_lines': 50,
'max_parameters': 4,
'large_class_lines': 300,
'large_class_methods': 10,
'max_nesting_depth': 4,
'long_chain_length': 3,
'duplicate_min_lines': 5,
}
def __init__(self, filepath: str):
self.filepath = filepath
self.filename = os.path.basename(filepath)
self.language = self._detect_language()
with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
self.code = f.read()
self.lines = self.code.split('\n')
self.smells: List[CodeSmell] = []
def _detect_language(self) -> str:
"""Detect programming language from file extension."""
ext = os.path.splitext(self.filepath)[1].lower()
ext_map = {
'.py': 'python',
'.js': 'javascript',
'.jsx': 'javascript',
'.ts': 'typescript',
'.tsx': 'typescript',
}
return ext_map.get(ext, 'python')
def detect_all(self) -> SmellReport:
"""Run all smell detectors."""
self._detect_long_methods()
self._detect_long_parameter_lists()
self._detect_large_class()
self._detect_complex_conditionals()
self._detect_magic_numbers()
self._detect_excessive_comments()
self._detect_deeply_nested()
self._detect_switch_statements()
self._detect_message_chains()
self._detect_duplicate_code()
self._detect_dead_code()
return SmellReport(filename=self.filename, smells=self.smells)
def _get_snippet(self, start: int, end: int, context: int = 2) -> str:
"""Get code snippet with context."""
actual_start = max(0, start - context)
actual_end = min(len(self.lines), end + context)
snippet_lines = []
for i in range(actual_start, actual_end):
prefix = "" if start <= i < end else " "
snippet_lines.append(f"{i+1:4d} {prefix}{self.lines[i]}")
return '\n'.join(snippet_lines)
def _detect_long_methods(self) -> None:
"""Detect methods that are too long."""
if self.language == 'python':
pattern = r'^\s*def\s+(\w+)\s*\([^)]*\):'
else:
pattern = r'(?:function\s+(\w+)|(\w+)\s*[=:]\s*(?:async\s+)?(?:function|\([^)]*\)\s*=>))'
current_method = None
method_start = 0
brace_depth = 0
indent_level = 0
for i, line in enumerate(self.lines):
match = re.search(pattern, line)
if match:
# Check previous method if exists
if current_method:
method_lines = i - method_start
self._check_method_length(current_method, method_start, i - 1, method_lines)
current_method = match.group(1) or (match.group(2) if match.lastindex and match.lastindex > 1 else None)
method_start = i
indent_level = len(line) - len(line.lstrip())
# Track end of Python functions by indentation
if self.language == 'python' and current_method:
if line.strip() and not line.strip().startswith('#'):
current_indent = len(line) - len(line.lstrip())
if current_indent <= indent_level and i > method_start:
method_lines = i - method_start
self._check_method_length(current_method, method_start, i - 1, method_lines)
current_method = None
# Check last method
if current_method:
method_lines = len(self.lines) - method_start
self._check_method_length(current_method, method_start, len(self.lines) - 1, method_lines)
def _check_method_length(self, name: str, start: int, end: int, lines: int) -> None:
"""Check if method is too long and add smell if so."""
if lines > self.THRESHOLDS['very_long_method_lines']:
severity = SmellSeverity.HIGH
desc = f"Method '{name}' is {lines} lines (threshold: {self.THRESHOLDS['long_method_lines']})"
elif lines > self.THRESHOLDS['long_method_lines']:
severity = SmellSeverity.MEDIUM
desc = f"Method '{name}' is {lines} lines (threshold: {self.THRESHOLDS['long_method_lines']})"
else:
return
self.smells.append(CodeSmell(
smell_type=SmellType.LONG_METHOD,
severity=severity,
location=f"{self.filename}:{start+1}-{end+1}",
line_start=start + 1,
line_end=end + 1,
description=desc,
suggestion="Apply Extract Method to break down into smaller functions",
code_snippet=self._get_snippet(start, min(start + 5, end), 0)
))
def _detect_long_parameter_lists(self) -> None:
"""Detect functions with too many parameters."""
if self.language == 'python':
pattern = r'def\s+(\w+)\s*\(([^)]*)\)'
else:
pattern = r'(?:function\s+(\w+)\s*\(([^)]*)\)|(\w+)\s*[=:]\s*(?:async\s+)?(?:function\s*)?\(([^)]*)\))'
for i, line in enumerate(self.lines):
match = re.search(pattern, line)
if match:
# Safely extract groups
groups = match.groups()
func_name = groups[0] or (groups[2] if len(groups) > 2 else None)
params_str = groups[1] if len(groups) > 1 else ""
if not params_str and len(groups) > 3:
params_str = groups[3] or ""
# Count parameters
if params_str.strip():
params = [p.strip() for p in params_str.split(',') if p.strip()]
# Filter out 'self', 'cls' for Python
if self.language == 'python':
params = [p for p in params if p not in ('self', 'cls')]
param_count = len(params)
if param_count > self.THRESHOLDS['max_parameters']:
severity = SmellSeverity.HIGH if param_count > 6 else SmellSeverity.MEDIUM
self.smells.append(CodeSmell(
smell_type=SmellType.LONG_PARAMETER_LIST,
severity=severity,
location=f"{self.filename}:{i+1}",
line_start=i + 1,
line_end=i + 1,
description=f"Function '{func_name}' has {param_count} parameters (max: {self.THRESHOLDS['max_parameters']})",
suggestion="Consider Introduce Parameter Object or Preserve Whole Object",
code_snippet=self._get_snippet(i, i + 1, 1)
))
def _detect_large_class(self) -> None:
"""Detect classes that are too large."""
if self.language == 'python':
class_pattern = r'^\s*class\s+(\w+)'
method_pattern = r'^\s+def\s+\w+'
else:
class_pattern = r'class\s+(\w+)'
method_pattern = r'(?:^\s+\w+\s*\([^)]*\)\s*\{|^\s+(?:async\s+)?\w+\s*=)'
current_class = None
class_start = 0
method_count = 0
class_indent = 0
for i, line in enumerate(self.lines):
class_match = re.search(class_pattern, line)
if class_match:
# Check previous class
if current_class:
self._check_class_size(current_class, class_start, i - 1, method_count)
current_class = class_match.group(1)
class_start = i
method_count = 0
class_indent = len(line) - len(line.lstrip())
# Count methods in current class
if current_class and re.search(method_pattern, line):
method_count += 1
# Check last class
if current_class:
self._check_class_size(current_class, class_start, len(self.lines) - 1, method_count)
def _check_class_size(self, name: str, start: int, end: int, methods: int) -> None:
"""Check if class is too large."""
lines = end - start + 1
issues = []
severity = SmellSeverity.MEDIUM
if lines > self.THRESHOLDS['large_class_lines']:
issues.append(f"{lines} lines (max: {self.THRESHOLDS['large_class_lines']})")
severity = SmellSeverity.HIGH
if methods > self.THRESHOLDS['large_class_methods']:
issues.append(f"{methods} methods (max: {self.THRESHOLDS['large_class_methods']})")
if severity != SmellSeverity.HIGH:
severity = SmellSeverity.MEDIUM
if issues:
self.smells.append(CodeSmell(
smell_type=SmellType.LARGE_CLASS,
severity=severity,
location=f"{self.filename}:{start+1}-{end+1}",
line_start=start + 1,
line_end=end + 1,
description=f"Class '{name}' is too large: {', '.join(issues)}",
suggestion="Apply Extract Class to split responsibilities",
code_snippet=self._get_snippet(start, start + 3, 0)
))
def _detect_complex_conditionals(self) -> None:
"""Detect complex conditional expressions."""
for i, line in enumerate(self.lines):
# Count logical operators in line
and_or_count = len(re.findall(r'\b(and|or|&&|\|\|)\b', line))
if and_or_count >= 3:
self.smells.append(CodeSmell(
smell_type=SmellType.COMPLEX_CONDITIONAL,
severity=SmellSeverity.MEDIUM,
location=f"{self.filename}:{i+1}",
line_start=i + 1,
line_end=i + 1,
description=f"Complex conditional with {and_or_count} logical operators",
suggestion="Apply Decompose Conditional or Consolidate Conditional Expression",
code_snippet=self._get_snippet(i, i + 1, 1)
))
def _detect_magic_numbers(self) -> None:
"""Detect magic numbers and strings."""
# Skip common acceptable values
acceptable = {'0', '1', '-1', '2', '100', 'true', 'false', 'null', 'None', '""', "''"}
for i, line in enumerate(self.lines):
# Skip comments and imports
stripped = line.strip()
if stripped.startswith('#') or stripped.startswith('//') or \
stripped.startswith('import') or stripped.startswith('from'):
continue
# Find numeric literals (excluding in variable names)
numbers = re.findall(r'(?<![a-zA-Z_])\b(\d+\.?\d*)\b(?![a-zA-Z_])', line)
for num in numbers:
if num not in acceptable and float(num) > 2:
# Check if it's likely a magic number (in calculation or comparison)
if re.search(rf'[<>=+\-*/]\s*{re.escape(num)}|{re.escape(num)}\s*[<>=+\-*/]', line):
self.smells.append(CodeSmell(
smell_type=SmellType.MAGIC_NUMBER,
severity=SmellSeverity.LOW,
location=f"{self.filename}:{i+1}",
line_start=i + 1,
line_end=i + 1,
description=f"Magic number '{num}' - consider using a named constant",
suggestion="Replace magic number with named constant",
code_snippet=self._get_snippet(i, i + 1, 0)
))
break # One magic number per line is enough
def _detect_excessive_comments(self) -> None:
"""Detect comments that explain 'what' instead of 'why'."""
what_patterns = [
r'#\s*(set|get|return|loop|iterate|check|if|increment|decrement)',
r'//\s*(set|get|return|loop|iterate|check|if|increment|decrement)',
]
for i, line in enumerate(self.lines):
for pattern in what_patterns:
if re.search(pattern, line, re.IGNORECASE):
self.smells.append(CodeSmell(
smell_type=SmellType.EXCESSIVE_COMMENTS,
severity=SmellSeverity.LOW,
location=f"{self.filename}:{i+1}",
line_start=i + 1,
line_end=i + 1,
description="Comment explains 'what' not 'why' - consider renaming or removing",
suggestion="Use Extract Method with descriptive name instead of comment",
code_snippet=self._get_snippet(i, i + 1, 0)
))
break
def _detect_deeply_nested(self) -> None:
"""Detect deeply nested code blocks."""
max_depth = 0
current_depth = 0
depth_start = 0
for i, line in enumerate(self.lines):
if self.language == 'python':
# Count by indentation
if line.strip():
indent = len(line) - len(line.lstrip())
depth = indent // 4 # Assume 4-space indent
if depth > current_depth:
if depth > max_depth:
max_depth = depth
if depth >= self.THRESHOLDS['max_nesting_depth']:
depth_start = i
current_depth = depth
else:
# Count braces
current_depth += line.count('{') - line.count('}')
if current_depth > max_depth:
max_depth = current_depth
if current_depth >= self.THRESHOLDS['max_nesting_depth']:
depth_start = i
if max_depth >= self.THRESHOLDS['max_nesting_depth']:
self.smells.append(CodeSmell(
smell_type=SmellType.DEEPLY_NESTED,
severity=SmellSeverity.HIGH if max_depth > 5 else SmellSeverity.MEDIUM,
location=f"{self.filename}:{depth_start+1}",
line_start=depth_start + 1,
line_end=depth_start + 1,
description=f"Code nested {max_depth} levels deep (max: {self.THRESHOLDS['max_nesting_depth']})",
suggestion="Apply Replace Nested Conditional with Guard Clauses or Extract Method",
code_snippet=self._get_snippet(depth_start, depth_start + 5, 0)
))
def _detect_switch_statements(self) -> None:
"""Detect switch statements that might need polymorphism."""
if self.language == 'python':
# Python 3.10+ match statements or if/elif chains
pattern = r'^\s*(if|elif).*==.*:'
consecutive_conditions = 0
chain_start = 0
for i, line in enumerate(self.lines):
if re.search(pattern, line):
if consecutive_conditions == 0:
chain_start = i
consecutive_conditions += 1
else:
if consecutive_conditions >= 4:
self._add_switch_smell(chain_start, i - 1, consecutive_conditions)
consecutive_conditions = 0
else:
# JavaScript/TypeScript switch
pattern = r'\bswitch\s*\('
for i, line in enumerate(self.lines):
if re.search(pattern, line):
# Count cases
case_count = 0
for j in range(i, min(i + 50, len(self.lines))):
case_count += len(re.findall(r'\bcase\b', self.lines[j]))
if case_count >= 4:
self._add_switch_smell(i, i + 1, case_count)
def _add_switch_smell(self, start: int, end: int, cases: int) -> None:
"""Add a switch statement smell."""
self.smells.append(CodeSmell(
smell_type=SmellType.SWITCH_STATEMENT,
severity=SmellSeverity.MEDIUM,
location=f"{self.filename}:{start+1}",
line_start=start + 1,
line_end=end + 1,
description=f"Switch/case statement with {cases} cases - consider polymorphism",
suggestion="Apply Replace Conditional with Polymorphism",
code_snippet=self._get_snippet(start, start + 5, 0)
))
def _detect_message_chains(self) -> None:
"""Detect long method chains (train wrecks)."""
chain_pattern = r'(\w+(?:\.\w+\([^)]*\)){3,})'
for i, line in enumerate(self.lines):
matches = re.findall(chain_pattern, line)
for match in matches:
chain_length = match.count('.')
if chain_length >= self.THRESHOLDS['long_chain_length']:
self.smells.append(CodeSmell(
smell_type=SmellType.MESSAGE_CHAIN,
severity=SmellSeverity.MEDIUM,
location=f"{self.filename}:{i+1}",
line_start=i + 1,
line_end=i + 1,
description=f"Message chain with {chain_length} calls - violates Law of Demeter",
suggestion="Apply Hide Delegate to reduce coupling",
code_snippet=self._get_snippet(i, i + 1, 0)
))
def _detect_duplicate_code(self) -> None:
"""Detect potential duplicate code blocks (simplified)."""
# Create line hashes for comparison
line_hashes: Dict[str, List[int]] = defaultdict(list)
for i, line in enumerate(self.lines):
normalized = re.sub(r'\s+', ' ', line.strip())
if len(normalized) > 20: # Only significant lines
line_hashes[normalized].append(i)
# Find duplicates
for normalized, positions in line_hashes.items():
if len(positions) >= 3: # At least 3 occurrences
self.smells.append(CodeSmell(
smell_type=SmellType.DUPLICATE_CODE,
severity=SmellSeverity.MEDIUM,
location=f"{self.filename}:{positions[0]+1}",
line_start=positions[0] + 1,
line_end=positions[0] + 1,
description=f"Potential duplicate code found {len(positions)} times",
suggestion="Apply Extract Method to eliminate duplication",
code_snippet=self._get_snippet(positions[0], positions[0] + 1, 0)
))
def _detect_dead_code(self) -> None:
"""Detect potentially dead code (simplified)."""
# Look for common dead code patterns
patterns = [
(r'^\s*#.*TODO.*delete', "TODO to delete"),
(r'^\s*#.*FIXME.*remove', "FIXME to remove"),
(r'^\s*//.*TODO.*delete', "TODO to delete"),
(r'^\s*//.*FIXME.*remove', "FIXME to remove"),
(r'^\s*if\s+False:', "Code behind 'if False'"),
(r'^\s*if\s*\(\s*false\s*\)', "Code behind 'if (false)'"),
]
for i, line in enumerate(self.lines):
for pattern, desc in patterns:
if re.search(pattern, line, re.IGNORECASE):
self.smells.append(CodeSmell(
smell_type=SmellType.DEAD_CODE,
severity=SmellSeverity.LOW,
location=f"{self.filename}:{i+1}",
line_start=i + 1,
line_end=i + 1,
description=f"Potential dead code: {desc}",
suggestion="Remove dead code",
code_snippet=self._get_snippet(i, i + 1, 0)
))
def print_report(report: SmellReport, verbose: bool = False) -> None:
"""Print smell report in readable format."""
print("=" * 70)
print(f"CODE SMELL DETECTION REPORT: {report.filename}")
print("=" * 70)
print(f"\n📊 SUMMARY")
print("-" * 40)
print(f" Total smells found: {len(report.smells)}")
print(f" Critical: {report.critical_count}")
print(f" High: {report.high_count}")
print(f" Medium: {report.medium_count}")
print(f" Low: {report.low_count}")
if not report.smells:
print("\n✅ No code smells detected!")
print("=" * 70)
return
# Group by type
by_type: Dict[SmellType, List[CodeSmell]] = defaultdict(list)
for smell in report.smells:
by_type[smell.smell_type].append(smell)
print(f"\n📋 FINDINGS BY TYPE")
print("-" * 40)
for smell_type, smells in sorted(by_type.items(), key=lambda x: -len(x[1])):
print(f"\n### {smell_type.value} ({len(smells)} found)")
for smell in sorted(smells, key=lambda x: x.severity.value):
severity_icon = {
SmellSeverity.CRITICAL: "🔴",
SmellSeverity.HIGH: "🟠",
SmellSeverity.MEDIUM: "🟡",
SmellSeverity.LOW: "🟢",
}[smell.severity]
print(f"\n {severity_icon} [{smell.severity.value}] {smell.location}")
print(f" {smell.description}")
print(f" 💡 {smell.suggestion}")
if verbose and smell.code_snippet:
print(f"\n Code:")
for snippet_line in smell.code_snippet.split('\n'):
print(f" {snippet_line}")
print("\n" + "=" * 70)
print("💡 RECOMMENDED ACTIONS")
print("-" * 40)
if report.critical_count > 0:
print(" 1. Address CRITICAL issues immediately")
if report.high_count > 0:
print(" 2. Plan to fix HIGH severity issues this sprint")
if report.medium_count > 0:
print(" 3. Schedule MEDIUM issues for upcoming work")
if report.low_count > 0:
print(" 4. Fix LOW issues opportunistically")
print("\n" + "=" * 70)
def analyze_directory(directory: str, verbose: bool = False) -> None:
"""Analyze all supported files in a directory."""
supported_extensions = ['.py', '.js', '.jsx', '.ts', '.tsx']
files = []
for root, _, filenames in os.walk(directory):
for filename in filenames:
if any(filename.endswith(ext) for ext in supported_extensions):
files.append(os.path.join(root, filename))
if not files:
print(f"No supported files found in {directory}")
return
print(f"Scanning {len(files)} files in {directory}...\n")
total_smells = 0
total_critical = 0
total_high = 0
files_with_smells = 0
for filepath in sorted(files):
try:
detector = SmellDetector(filepath)
report = detector.detect_all()
if report.smells:
files_with_smells += 1
total_smells += len(report.smells)
total_critical += report.critical_count
total_high += report.high_count
flag = " 🔴" if report.critical_count else (" 🟠" if report.high_count else " 🟡")
print(f" {report.filename}: {len(report.smells)} smells{flag}")
if verbose:
for smell in report.smells:
print(f" - [{smell.severity.value}] {smell.smell_type.value}: line {smell.line_start}")
else:
print(f" {report.filename}: ✅ Clean")
except Exception as e:
print(f" Error analyzing {filepath}: {e}")
print("\n" + "=" * 60)
print("SUMMARY")
print("=" * 60)
print(f" Files analyzed: {len(files)}")
print(f" Files with smells: {files_with_smells}")
print(f" Total smells found: {total_smells}")
print(f" Critical issues: {total_critical}")
print(f" High severity issues: {total_high}")
def main():
parser = argparse.ArgumentParser(
description='Detect code smells in source files',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s myfile.py Analyze single file
%(prog)s --dir src/ Analyze directory
%(prog)s -v myfile.py Verbose with code snippets
"""
)
parser.add_argument('file', nargs='?', help='File to analyze')
parser.add_argument('--dir', '-d', help='Directory to analyze')
parser.add_argument('--verbose', '-v', action='store_true', help='Show code snippets')
parser.add_argument('--json', '-j', action='store_true', help='Output as JSON')
args = parser.parse_args()
if args.dir:
analyze_directory(args.dir, args.verbose)
elif args.file:
detector = SmellDetector(args.file)
report = detector.detect_all()
if args.json:
import json
smells_data = [{
'type': s.smell_type.value,
'severity': s.severity.value,
'location': s.location,
'line_start': s.line_start,
'line_end': s.line_end,
'description': s.description,
'suggestion': s.suggestion,
} for s in report.smells]
print(json.dumps({
'filename': report.filename,
'total_smells': len(report.smells),
'critical': report.critical_count,
'high': report.high_count,
'medium': report.medium_count,
'low': report.low_count,
'smells': smells_data
}, indent=2))
else:
print_report(report, args.verbose)
else:
parser.print_help()
sys.exit(1)
if __name__ == '__main__':
main()
@@ -0,0 +1,284 @@
# Refactoring Plan Template
Use this template to document and track your refactoring effort.
---
## Project Information
| Field | Value |
|-------|-------|
| **Project/Module** | [Project name] |
| **Target Files** | [List of files to refactor] |
| **Date Created** | [Date] |
| **Author** | [Name] |
| **Status** | Draft / In Review / Approved / In Progress / Completed |
---
## Executive Summary
### Goals
- [ ] [Primary goal: e.g., Improve readability of payment processing]
- [ ] [Secondary goal: e.g., Reduce code duplication]
- [ ] [Tertiary goal: e.g., Improve testability]
### Constraints
- [ ] [Constraint 1: e.g., Cannot change public API]
- [ ] [Constraint 2: e.g., Must maintain backward compatibility]
- [ ] [Constraint 3: e.g., No changes to database schema]
### Risk Level
- [ ] Low - Minor changes, well-tested code
- [ ] Medium - Moderate changes, some risk
- [ ] High - Significant changes, careful attention needed
---
## Pre-Refactoring Checklist
### Test Coverage Assessment
| Metric | Current | Target | Status |
|--------|---------|--------|--------|
| Unit Test Coverage | __% | ≥80% | |
| Integration Tests | Yes/No | Yes | |
| All Tests Passing | Yes/No | Yes | |
### Required Before Starting
- [ ] All tests passing
- [ ] Code reviewed and understood
- [ ] Backup/version control in place
- [ ] User approval obtained
---
## Identified Code Smells
### Summary
| # | Smell | Location | Severity | Priority |
|---|-------|----------|----------|----------|
| 1 | [e.g., Long Method] | [file:line] | High | P1 |
| 2 | [e.g., Duplicate Code] | [file:line] | Medium | P2 |
| 3 | [e.g., Feature Envy] | [file:line] | Low | P3 |
### Detailed Analysis
#### Smell #1: [Name]
**Location**: `path/to/file.js:45-120`
**Description**: [Detailed description of the problem]
**Impact**:
- [Impact 1]
- [Impact 2]
**Proposed Solution**: [Brief overview of how to fix]
---
## Refactoring Phases
### Phase A: Quick Wins (Low Risk)
**Objective**: Simple improvements with immediate value
**Estimated Changes**: [X files, Y methods]
**User Approval Required**: Yes / No
| # | Task | File | Refactoring | Status |
|---|------|------|-------------|--------|
| A1 | Rename variable `x` to `userCount` | utils.js:15 | Rename Variable | [ ] |
| A2 | Remove unused `oldHandler()` | api.js:89 | Remove Dead Code | [ ] |
| A3 | Extract duplicate validation | form.js:23,67 | Extract Method | [ ] |
**Rollback Plan**: Revert commits A1-A3
---
### Phase B: Structural Improvements (Medium Risk)
**Objective**: Improve code organization and clarity
**Estimated Changes**: [X files, Y methods]
**User Approval Required**: Yes
**Dependencies**: Phase A must be complete
| # | Task | File | Refactoring | Status |
|---|------|------|-------------|--------|
| B1 | Extract `calculatePrice()` from long method | order.js:45 | Extract Method | [ ] |
| B2 | Introduce `OrderDetails` parameter object | order.js:12 | Introduce Parameter Object | [ ] |
| B3 | Move `formatAddress()` to Address class | customer.js:78 | Move Method | [ ] |
**Rollback Plan**: Revert to post-Phase-A commit
---
### Phase C: Architectural Changes (Higher Risk)
**Objective**: Address deeper structural issues
**Estimated Changes**: [X files, Y methods]
**User Approval Required**: Yes
**Dependencies**: Phases A and B must be complete
| # | Task | File | Refactoring | Status |
|---|------|------|-------------|--------|
| C1 | Replace price switch with polymorphism | pricing.js:30 | Replace Conditional with Polymorphism | [ ] |
| C2 | Extract `NotificationService` class | user.js:100 | Extract Class | [ ] |
**Rollback Plan**: Revert to post-Phase-B commit
---
## Detailed Refactoring Steps
### Task [ID]: [Task Name]
**Smell Addressed**: [Smell name]
**Refactoring Technique**: [Technique name]
**Risk Level**: Low / Medium / High
#### Context
**Before** (Current State):
```javascript
// Paste current code here
```
**After** (Expected State):
```javascript
// Paste expected code here
```
#### Step-by-Step Mechanics
1. [ ] **Step 1**: [Description]
- Test: Run tests after this step
- Expected: All tests pass
2. [ ] **Step 2**: [Description]
- Test: Run tests after this step
- Expected: All tests pass
3. [ ] **Step 3**: [Description]
- Test: Run tests after this step
- Expected: All tests pass
#### Verification
- [ ] All tests passing
- [ ] Behavior unchanged
- [ ] Code compiles
- [ ] No new warnings
#### Commit Message
```
refactor: [Describe the refactoring]
```
---
## Progress Tracking
### Phase Status
| Phase | Status | Started | Completed | Tests Passing |
|-------|--------|---------|-----------|---------------|
| A | Not Started / In Progress / Done | | | |
| B | Not Started / In Progress / Done | | | |
| C | Not Started / In Progress / Done | | | |
### Issues Encountered
| # | Issue | Resolution | Status |
|---|-------|------------|--------|
| 1 | [Description] | [How resolved] | Open / Resolved |
---
## Metrics Comparison
### Before Refactoring
| Metric | File 1 | File 2 | Total |
|--------|--------|--------|-------|
| Lines of Code | | | |
| Cyclomatic Complexity | | | |
| Maintainability Index | | | |
| Number of Methods | | | |
| Avg Method Length | | | |
### After Refactoring
| Metric | File 1 | File 2 | Total | Change |
|--------|--------|--------|-------|--------|
| Lines of Code | | | | |
| Cyclomatic Complexity | | | | |
| Maintainability Index | | | | |
| Number of Methods | | | | |
| Avg Method Length | | | | |
---
## Post-Refactoring Checklist
- [ ] All tests passing
- [ ] No new warnings or errors
- [ ] Code compiles successfully
- [ ] Manual verification completed
- [ ] Documentation updated (if needed)
- [ ] Code reviewed
- [ ] Metrics improved
- [ ] User sign-off obtained
---
## Lessons Learned
### What Went Well
- [Item 1]
- [Item 2]
### What Could Be Improved
- [Item 1]
- [Item 2]
### Recommendations for Future
- [Item 1]
- [Item 2]
---
## Approvals
| Role | Name | Date | Signature |
|------|------|------|-----------|
| Plan Author | | | |
| Technical Lead | | | |
| Product Owner | | | |
---
## Appendix
### A. Related Documentation
- [Link to relevant docs]
### B. Reference Materials
- [Link to code smells catalog]
- [Link to refactoring catalog]
### C. Tools Used
- [Testing framework]
- [Linting tools]
- [Complexity analysis tools]