---
description: Guidelines for ml_models
globs: **/*
---

# Model Implementation Standards

## Architecture
- Implement model classes with consistent interfaces
- Separate model definition from training logic (see the sketch after this list)
- Include preprocessing in model pipeline
- Ensure reproducibility through fixed seeds
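
The architecture points above can be combined in a small scikit-learn sketch; the names and estimator choices here are illustrative, not a required implementation:

```python
# Illustrative sketch only: preprocessing inside the model pipeline,
# a fixed seed, and training logic kept outside the model definition.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor

RANDOM_STATE = 42  # fixed seed for reproducibility


def build_model() -> Pipeline:
    """Model definition only; no training code here."""
    return Pipeline([
        ("scaler", StandardScaler()),  # preprocessing lives in the pipeline
        ("regressor", GradientBoostingRegressor(random_state=RANDOM_STATE)),
    ])


def train(model: Pipeline, X, y) -> Pipeline:
    """Training logic kept separate from the model definition."""
    return model.fit(X, y)
```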

## Model Types
- Ad scoring: Gradient boosting or neural networks preferred
- Account health: Time series models with anomaly detection
- Feature importance: Always include SHAP values or a comparable attribution method (see the sketch below)
- Ensemble methods encouraged for critical predictions
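
For the feature-importance point, a minimal sketch using the `shap` package; the toy data and feature names are placeholders, not real ad data:

```python
# Illustrative sketch: SHAP attributions for a tree-based ad scoring model.
import pandas as pd
import shap
from xgboost import XGBRegressor

# Placeholder feature matrix and target; real features come from the project.
X = pd.DataFrame({"historical_ctr": [0.01, 0.03, 0.05, 0.02],
                  "text_length": [90, 120, 60, 150]})
y = [40.0, 65.0, 80.0, 50.0]

model = XGBRegressor(random_state=42).fit(X, y)

explainer = shap.TreeExplainer(model)   # suited to tree ensembles such as XGBoost
shap_values = explainer.shap_values(X)  # per-row, per-feature attributions
print(pd.DataFrame(shap_values, columns=X.columns))
```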

## Documentation
- Document all hyperparameters with rationale
- Include training and validation metrics
- List feature dependencies explicitly
- Document model limitations clearly (one possible metadata layout is sketched below)
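
One way to keep this documentation next to the model is a plain metadata record; the structure and all values below are illustrative placeholders, not project requirements:

```python
# Suggested structure only; field names and values are placeholders.
MODEL_CARD = {
    "hyperparameters": {
        "n_estimators": {"value": 300, "rationale": "example rationale recorded by the author"},
        "learning_rate": {"value": 0.05, "rationale": "example rationale recorded by the author"},
    },
    "metrics": {"train_rmse": None, "val_rmse": None},  # filled in at training time
    "feature_dependencies": ["ad_text_embeddings", "historical_ctr"],  # placeholder names
    "limitations": "Describe known failure modes and out-of-scope inputs here.",
}
```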

## Example
```python
import pandas as pd
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor

# Good example - well-structured model class
class AdScoreModel:
    """Model for predicting ad effectiveness scores.

    This model combines NLP features from ad content with historical
    performance metrics to predict effectiveness on a 0-100 scale.

    Attributes:
        feature_columns: List of features used by the model
        model_type: Type of model (gradient_boosting, neural_network)
        version: Model version identifier
    """

    def __init__(self, model_type="gradient_boosting", random_state=42):
        """Initialize the model with specified type.

        Args:
            model_type: Type of model to use
            random_state: Seed for reproducibility
        """
        self.model_type = model_type
        self.random_state = random_state
        self.model = self._initialize_model()
        self.feature_columns = []
        self.version = "1.0.0"

    def _initialize_model(self):
        """Create the underlying model based on model_type."""
        if self.model_type == "gradient_boosting":
            return XGBRegressor(random_state=self.random_state)
        elif self.model_type == "neural_network":
            return MLPRegressor(random_state=self.random_state)
        else:
            raise ValueError(f"Unsupported model type: {self.model_type}")

    def fit(self, X, y):
        """Train the model on provided data.

        Args:
            X: Feature matrix
            y: Target values

        Returns:
            Self for method chaining
        """
        self.feature_columns = list(X.columns)
        self.model.fit(X, y)
        return self

    def predict(self, X):
        """Generate predictions for input data.

        Args:
            X: Feature matrix

        Returns:
            Array of predicted ad scores
        """
        return self.model.predict(X)

    def get_feature_importance(self):
        """Return feature importance metrics.

        Returns:
            DataFrame with feature importance values, or None if the
            underlying model does not expose feature_importances_
        """
        if hasattr(self.model, "feature_importances_"):
            return pd.DataFrame({
                'feature': self.feature_columns,
                'importance': self.model.feature_importances_
            }).sort_values('importance', ascending=False)
        else:
            return None
```
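
A brief usage sketch for the class above; the column names and values are placeholders, assuming `xgboost`, `scikit-learn`, and `pandas` are installed:

```python
# Usage sketch for AdScoreModel; data below is placeholder demo data.
import pandas as pd

X_train = pd.DataFrame({"historical_ctr": [0.02, 0.05], "text_length": [120, 80]})
y_train = pd.Series([55.0, 72.0])

model = AdScoreModel(model_type="gradient_boosting").fit(X_train, y_train)
scores = model.predict(X_train)              # array of predicted ad scores
print(model.get_feature_importance())        # per-feature importance table
```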