---
description: This rule file outlines best practices for using the Python requests library, covering performance, security, code organization, and testing.
globs: **/*.py
---
# Requests Library Best Practices
This document provides comprehensive guidelines for using the Python `requests` library effectively. It covers various aspects, including code organization, common patterns, performance considerations, security best practices, testing approaches, common pitfalls, and tooling.
## Library Information:
- Name: requests
- Tags: web, python, http-client
## 1. Code Organization and Structure
### 1.1 Directory Structure Best Practices:
* **Simple Scripts:** For simple, single-file scripts, the structure is less critical. Keep related resources (e.g., configuration files, data files) in the same directory.
* **Larger Projects:** For larger projects:

      my_project/
      ├── src/
      │   ├── __init__.py
      │   ├── api_client.py   # Contains requests-related functions
      │   ├── models.py       # Data models for API responses
      │   ├── utils.py        # Utility functions
      │   └── config.py       # Configuration settings
      ├── tests/
      │   ├── __init__.py
      │   ├── test_api_client.py
      │   └── conftest.py     # pytest configuration
      ├── README.md
      ├── requirements.txt
      └── .gitignore

* `src/`: Contains the main application code.
  * `api_client.py`: Houses the `requests` calls, session management, and error handling.
  * `models.py`: Defines data classes/namedtuples to represent the structure of API responses, aiding type hinting and validation.
* `tests/`: Contains unit and integration tests.
* `requirements.txt`: Lists project dependencies.
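
To illustrate the `models.py` idea above, a response model could be sketched as a frozen dataclass. The `User` fields and the `from_api` helper are hypothetical; a real API would dictate the actual shape.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class User:
    """Typed view of one (hypothetical) user object returned by an API."""

    id: int
    name: str
    email: str

    @classmethod
    def from_api(cls, payload: Mapping[str, Any]) -> "User":
        # Fail loudly on missing keys instead of passing raw dicts around.
        try:
            return cls(
                id=int(payload["id"]),
                name=str(payload["name"]),
                email=str(payload["email"]),
            )
        except KeyError as exc:
            raise ValueError(f"missing field in API response: {exc}") from exc
```

Unknown keys in the payload are ignored here, which keeps the model tolerant of fields the API adds later.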
### 1.2 File Naming Conventions:
* Use descriptive names for files related to `requests`, such as `api_client.py`, `http_utils.py`, or `<service_name>_client.py`.
* Follow PEP 8 guidelines: lowercase with underscores (e.g., `get_data.py`).
### 1.3 Module Organization Best Practices:
* **Grouping by Functionality:** Organize code into modules based on functionality. For example, a module for authentication, another for data retrieval, and another for error handling.
* **Avoiding Circular Dependencies:** Design modules to minimize dependencies between them and prevent circular imports.
### 1.4 Component Architecture Recommendations:
* **Layered Architecture:** Use a layered architecture to separate concerns:
  * **Presentation Layer:** (If applicable) Handles user input and displays data.
  * **Business Logic Layer:** Orchestrates the application logic and uses the data access layer.
  * **Data Access Layer:** Contains the `requests` calls and interacts with external APIs. This layer should abstract away the details of the HTTP client.
* **Dependency Injection:** Use dependency injection to provide the `requests` session or client to components that need it, allowing for easier testing and configuration.
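
A minimal sketch of a data access layer with an injected session might look like the following. The `ApiClient` name, its `get_json` method, and the default timeout are assumptions made for illustration.

```python
from typing import Any, Optional

import requests


class ApiClient:
    """Data access layer: the only place that talks HTTP."""

    def __init__(
        self,
        base_url: str,
        session: Optional[requests.Session] = None,
        timeout: float = 10.0,
    ) -> None:
        self.base_url = base_url.rstrip("/")
        # The session is injected so tests can pass a stub instead.
        self.session = session if session is not None else requests.Session()
        self.timeout = timeout

    def get_json(self, path: str, **params: Any) -> Any:
        """GET base_url/path and return the decoded JSON body."""
        url = f"{self.base_url}/{path.lstrip('/')}"
        response = self.session.get(url, params=params, timeout=self.timeout)
        response.raise_for_status()
        return response.json()
```

Callers in the business logic layer depend only on `ApiClient`, so a test can construct it with a mock session and never touch the network.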
### 1.5 Code Splitting Strategies:
* **By API Endpoint:** If your application interacts with multiple API endpoints, consider creating separate modules or classes for each endpoint to improve maintainability.
* **Functional Decomposition:** Split complex tasks into smaller, manageable functions. This promotes reusability and testability.
## 2. Common Patterns and Anti-patterns
### 2.1 Design Patterns Specific to `requests`:
* **Singleton (for Session):** Use a singleton pattern to manage a single `requests.Session` instance, ensuring connection pooling is used effectively across the application. `Session` objects are not guaranteed to be thread-safe, so in multithreaded code consider one session per thread rather than one shared global.
* **Adapter Pattern:** Create an adapter class around the `requests` library to abstract away the underlying HTTP client. This makes it easier to switch to a different library in the future or add custom logic.
* **Retry Pattern:** Implement retry logic with exponential backoff to handle transient network errors and API rate limits.
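
One common way to get the retry pattern is to mount an `HTTPAdapter` configured with urllib3's `Retry`. The sketch below is one possible configuration; the helper name `build_retrying_session` and the chosen status codes are assumptions.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retrying_session(total: int = 3, backoff_factor: float = 0.5) -> requests.Session:
    """Return a session that retries transient failures with backoff."""
    retry = Retry(
        total=total,                    # overall cap on retry attempts
        backoff_factor=backoff_factor,  # sleep grows exponentially between tries
        status_forcelist=(429, 500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

By default `Retry` only retries idempotent methods on the listed status codes, so a `POST` is not replayed unless you opt in.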
### 2.2 Recommended Approaches for Common Tasks:
* **Fetching Data:** Use `requests.get()` for retrieving data. Handle potential errors using `response.raise_for_status()` and appropriate exception handling.
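* **Sending Data:** Use `requests.post()`, `requests.put()`, or `requests.patch()` for sending data. For JSON bodies, prefer the `json=` parameter, which serializes the payload and sets `Content-Type: application/json` for you; set the header yourself only for other formats.
* **Authentication:** Pass credentials through the `auth` parameter: a `(user, password)` tuple for HTTP Basic Auth, or an `AuthBase` subclass for custom schemes. Bearer tokens are usually sent in an `Authorization` header, and OAuth flows typically rely on a helper library such as `requests-oauthlib`.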
* **File Uploads:** Use the `files` parameter to upload files. Provide a dictionary where the keys are the field names and the values are file-like objects or tuples containing the filename and file-like object.
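
The tasks above can be explored without any network traffic by preparing requests instead of sending them. In this sketch the URL, the token, and the `BearerAuth` class are placeholders chosen for illustration.

```python
import io

import requests
from requests.auth import AuthBase


class BearerAuth(AuthBase):
    """Attach a bearer token; the token value is supplied by the caller."""

    def __init__(self, token: str) -> None:
        self.token = token

    def __call__(self, request: requests.PreparedRequest) -> requests.PreparedRequest:
        request.headers["Authorization"] = f"Bearer {self.token}"
        return request


# Preparing (not sending) a request shows what would go over the wire.
prepared = requests.Request(
    "POST",
    "https://api.example.test/items",  # placeholder URL
    json={"name": "widget"},           # serialized, Content-Type set for us
    auth=BearerAuth("example-token"),  # placeholder token
).prepare()

# File upload: field name -> (filename, file-like object, content type).
upload = requests.Request(
    "POST",
    "https://api.example.test/reports",
    files={"report": ("report.csv", io.BytesIO(b"a,b\n1,2\n"), "text/csv")},
).prepare()

print(prepared.headers["Content-Type"])   # application/json
print(prepared.headers["Authorization"])  # Bearer example-token
print(upload.headers["Content-Type"])     # multipart/form-data plus a boundary
```

The same `json=`, `auth=`, and `files=` arguments can be passed directly to `session.post()` once the shape of the request looks right.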
### 2.3 Anti-patterns and Code Smells to Avoid:
* **Ignoring Errors:** Not handling exceptions raised by `requests` can lead to unexpected behavior and application crashes.
* **Hardcoding URLs:** Hardcoding URLs makes the code less flexible and harder to maintain. Store URLs in configuration files or environment variables.
* **Not Using Sessions:** Failing to use `requests.Session()` for multiple requests to the same host can result in performance degradation due to repeated connection setups.
* **Disabling SSL Verification Unnecessarily:** Disabling SSL verification (`verify=False`) should only be done when absolutely necessary and with caution, as it can expose the application to man-in-the-middle attacks. Investigate the root cause of SSL verification failures instead of simply disabling it.
* **Excessive Retries without Backoff:** Retrying requests excessively without an exponential backoff strategy can overwhelm the server and worsen the situation.
* **Storing Sensitive Information in Code:** Avoid storing API keys, passwords, and other sensitive information directly in the code. Use environment variables or a secure configuration management system.
* **Not Setting Timeouts:** `requests` applies no timeout by default, so a call without `timeout=` can hang indefinitely if a server is unresponsive. Pass an explicit timeout on every request.
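
Two of the anti-patterns above (hardcoded URLs and missing timeouts) can be addressed together. The following is a sketch under assumptions: the `API_BASE_URL` variable name, the default values, and the `TimeoutSession` class are all hypothetical.

```python
import os

import requests

# Hypothetical variable name; configuration comes from the environment.
API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.example.test")
DEFAULT_TIMEOUT = (3.05, 27)  # (connect, read) timeouts in seconds


class TimeoutSession(requests.Session):
    """Session that applies a default timeout when the caller gives none."""

    def __init__(self, timeout=DEFAULT_TIMEOUT) -> None:
        super().__init__()
        self.timeout = timeout

    def request(self, method, url, **kwargs):
        # Session.get/post/... all funnel through request(), so one
        # override covers every verb.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().request(method, url, **kwargs)
```

A per-call `timeout=` still wins, so individual slow endpoints can be given more room without changing the default.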
### 2.4 State Management Best Practices:
* **Stateless API Clients:** Design API clients to be stateless whenever possible. Avoid storing request-specific data within the client object. Pass all necessary data as arguments to the request methods.
* **Session Management:** Use `requests.Session()` to maintain state (e.g., cookies, authentication) across multiple requests.
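
As a small illustration of session-level state, defaults set on a `Session` are merged into each request it prepares. The header values and the `api_version` parameter below are made up for the example.

```python
import requests

session = requests.Session()
# Defaults set here are merged into every request made through the session.
session.headers.update({"Accept": "application/json", "X-Client": "my-app/1.0"})
session.params = {"api_version": "2"}  # hypothetical query parameter

# prepare_request() applies the session defaults without sending anything.
prepared = session.prepare_request(
    requests.Request("GET", "https://api.example.test/users", params={"page": 1})
)

print(prepared.url)
print(prepared.headers["Accept"])
```

Request-specific data (here `page`) is still passed per call, which keeps the client itself free of per-request state.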
### 2.5 Error Handling Patterns:
* **Catching Exceptions:** Catch `requests.exceptions.RequestException` and its subclasses (e.g., `requests.exceptions.HTTPError`, `requests.exceptions.ConnectionError`, `requests.exceptions.Timeout`) to handle different types of errors.
* **Using `raise_for_status()`:** Call `response.raise_for_status()` to raise an exception for HTTP error codes (4xx and 5xx).
* **Logging Errors:** Log detailed error messages, including the URL, status code, and response content, to aid in debugging.
* **Returning Meaningful Error Responses:** When creating your own APIs that use the `requests` library internally, return informative error responses to the client.
* **Retry Logic:** Implement retry logic for transient errors (e.g., network timeouts, temporary server errors) with exponential backoff.
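
The error handling points above could be combined roughly as follows. Returning `None` on failure is a design choice made for this sketch, and `fetch_json` is a hypothetical helper; many codebases would re-raise a domain-specific exception instead.

```python
import logging
from typing import Any, Optional

import requests

logger = logging.getLogger(__name__)


def fetch_json(session: requests.Session, url: str, timeout: float = 10.0) -> Optional[Any]:
    """Return the decoded JSON body, or None if the request failed."""
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.Timeout:
        logger.warning("Timed out after %ss: %s", timeout, url)
        return None
    except requests.exceptions.HTTPError as exc:
        # exc.response carries the status code and body for diagnostics.
        logger.error(
            "HTTP %s for %s: %.200s", exc.response.status_code, url, exc.response.text
        )
        return None
    except requests.exceptions.RequestException as exc:
        # Catch-all for connection errors, invalid URLs, and similar.
        logger.error("Request to %s failed: %s", url, exc)
        return None

    try:
        return response.json()
    except ValueError:
        # JSON decoding errors are ValueError subclasses.
        logger.error("Response from %s was not valid JSON", url)
        return None
```

The specific exceptions are listed before `RequestException` because Python uses the first matching `except` clause.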
## 3. Performance Considerations
### 3.1 Optimization Techniques:
* **Using Sessions:** Utilize `requests.Session()` to reuse connections and reduce overhead.
* **Connection Pooling:** The `requests.Session()` object automatically handles connection pooling, which improves performance by reusing existing connections.
* **Streaming Responses:** For large responses, use `stream=True` to process data incrementally and avoid loading the entire response into memory at once. Remember to close the response after processing.
* **Caching Responses:** Consider caching responses to reduce redundant API calls. Use libraries like `requests_cache` to store responses temporarily.
* **Using HTTP/2:** `requests` speaks HTTP/1.1 only. If you need HTTP/2, consider the `httpx` library, which offers synchronous and asynchronous clients with optional HTTP/2 support.
* **Compression:** `requests` advertises gzip and deflate in the `Accept-Encoding` header by default and decodes compressed bodies transparently, so the main task is to confirm that the server has compression enabled.
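
As a rough sketch of tuning connection reuse, a session can be given an adapter with explicit pool sizes. The numbers are arbitrary examples, not recommendations.

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()

# pool_connections: how many per-host pools to cache;
# pool_maxsize: how many connections each pool may keep open.
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=20)
session.mount("https://", adapter)

# The session's default headers already advertise compressed encodings.
print(session.headers["Accept-Encoding"])
```

Larger pools mainly help when many threads share the session and hit the same hosts concurrently.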
### 3.2 Memory Management Considerations:
* **Streaming Large Responses:** Use `stream=True` to avoid loading the entire response into memory.
* **Closing Responses:** Close the response object after processing the data to release resources. Use a `try...finally` block or a context manager to ensure the response is always closed.
* **Iterating over Chunks:** When streaming, iterate over the response content in chunks using `response.iter_content()` or `response.iter_lines()` to process data in smaller pieces.
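
The streaming advice above might be applied to a file download like this. `download_file`, its chunk size, and its timeout values are illustrative assumptions.

```python
import requests


def download_file(
    session: requests.Session,
    url: str,
    dest_path: str,
    chunk_size: int = 64 * 1024,
) -> int:
    """Stream url to dest_path and return the number of bytes written."""
    written = 0
    # The with-block closes the response even if writing fails midway.
    with session.get(url, stream=True, timeout=(3.05, 30)) as response:
        response.raise_for_status()
        with open(dest_path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
                written += len(chunk)
    return written
```

Only one chunk is held in memory at a time, so the peak memory use is bounded by `chunk_size` rather than by the size of the file.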
### 3.3 Bundle Size Optimization:
* **Minimize Dependencies:** Only include the dependencies that are strictly necessary.
### 3.4 Lazy Loading Strategies:
* **Lazy Initialization of Clients:** Initialize the `requests` session or client only when it is first needed, rather than at application startup. This can improve startup time if the API client is not immediately required.
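
One lightweight way to initialize a shared session lazily is to cache a factory function. This is a sketch; `get_session` and the `User-Agent` value are hypothetical.

```python
import functools

import requests


@functools.lru_cache(maxsize=None)
def get_session() -> requests.Session:
    """Create the shared session on first use and reuse it afterwards."""
    session = requests.Session()
    session.headers.update({"User-Agent": "my-app/1.0"})  # hypothetical value
    return session
```

Nothing is created at import time. Note that under concurrent first calls the factory may run more than once before the result is cached, so add a lock if exactly-once creation matters.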
## 4. Security Best Practices
### 4.1 Common Vulnerabilities and How to Prevent Them:
* **Man-in-the-Middle Attacks:** Always use HTTPS to encrypt communication and prevent eavesdropping. Leave certificate verification enabled (the default, `verify=True`) and disable it only as a last resort.
* **Cross-Site Scripting (XSS):** If displaying data from API responses in a web application, sanitize the data to prevent XSS attacks.
* **Server-Side Request Forgery (SSRF):** Never build request URLs from user input without validation; otherwise attackers can steer requests to internal resources. Validate the scheme and host against an allowlist.
* **Exposure of Sensitive Information:** Store API keys, passwords, and other sensitive information securely using environment variables or a secrets management system.
* **Denial of Service (DoS):** Set timeouts so that a slow or unresponsive upstream cannot tie up your workers, and apply rate limiting so that neither attackers nor your own client can overwhelm a server with requests.
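
For the SSRF point above, a simple allowlist check could be sketched as follows. The host names and the `validate_outbound_url` helper are assumptions for illustration; a production check may also need to consider redirects and DNS resolution.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.example.test", "cdn.example.test"}  # hypothetical allowlist


def validate_outbound_url(url: str) -> str:
    """Return url unchanged if it is HTTPS and targets an allowed host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"refusing non-HTTPS URL: {url!r}")
    # hostname excludes userinfo and port, so "allowed@evil" tricks fail.
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"host not allowed: {parts.hostname!r}")
    return url
```

Because `requests` follows redirects by default, a validated URL can still bounce elsewhere; passing `allow_redirects=False` for user-influenced URLs is one way to keep the check meaningful.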
### 4.2 Input Validation Best Practices:
* **Validating Input Data:** Validate all input data before sending it to the API. This can prevent injection attacks and other security vulnerabilities.
* **Sanitizing Input Data:** Sanitize input data to remove potentially malicious characters or code.
* **Using Prepared Statements:** When constructing database queries with data from API responses, use prepared statements to prevent SQL injection attacks.
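
To illustrate the prepared-statement advice, the sketch below stores API data with parameter placeholders, using the standard library's `sqlite3`. The `users` table and the `store_users` helper are hypothetical.

```python
import sqlite3
from typing import Any, Iterable, Mapping


def store_users(conn: sqlite3.Connection, users: Iterable[Mapping[str, Any]]) -> None:
    """Insert user records taken from an API response."""
    conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
    # "?" placeholders keep API-supplied values out of the SQL text.
    conn.executemany(
        "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
        [(u["id"], u["name"]) for u in users],
    )
    conn.commit()
```

A value such as `x'); DROP TABLE users;--` is stored as plain text rather than executed, because it never becomes part of the statement.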
### 4.3 Authentication and Authorization Patterns:
* **Using Secure Authentication Schemes:** Use strong authentication schemes such as OAuth 2.0 or JWT (JSON Web Tokens) to protect API endpoints.
* **Storing Credentials Securely:** Store API keys, passwords, and other credentials securely using environment variables, a secrets management system, or a hardware security module (HSM).
* **Implementing Authorization:** Implement authorization to control which users or applications have access to specific API endpoints.
* **Using HTTPS:** Always use HTTPS to encrypt communication and protect credentials during transmission.
### 4.4 Data Protection Strategies:
* **Encrypting Sensitive Data:** Encrypt sensitive data at rest and in transit.
* **Masking Sensitive Data:** Mask sensitive data in logs and error messages.
* **Complying with Data Privacy Regulations:** Ensure compliance with relevant data privacy regulations such as GDPR and CCPA.
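
Masking can be as simple as redacting known-sensitive headers before they are logged. The header list and the `redact_headers` helper below are assumptions, to be extended for your own APIs.

```python
from typing import Dict, Mapping

# Lowercase names of headers whose values should never reach a log.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-api-key"}


def redact_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Copy headers, replacing sensitive values before they reach a log."""
    return {
        name: "***" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```

For example, `redact_headers(response.request.headers)` can be logged in place of the raw headers.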
### 4.5 Secure API Communication:
* **HTTPS:** Always use HTTPS for all API communication.
* **TLS Versions:** Ensure that the server and client support the latest TLS versions (TLS 1.2 or 1.3) and disable older, insecure versions (SSLv3, TLS 1.0, TLS 1.1).
* **Cipher Suites:** Configure the server to use strong cipher suites that provide forward secrecy and authenticated encryption.
* **Certificate Pinning:** Consider using certificate pinning to prevent man-in-the-middle attacks by verifying the server's SSL certificate against a known good certificate.
## 5. Testing Approaches
### 5.1 Unit Testing Strategies:
* **Testing Individual Functions:** Write unit tests for individual functions that make `requests` calls.
* **Mocking `requests`:** Use mocking libraries like `unittest.mock` or `pytest-mock` to mock the `requests` library and isolate the code being tested.
* **Testing Error Handling:** Write unit tests to verify that the code handles different types of errors correctly.
* **Parametrizing Tests:** Use parametrization to test the same function with different inputs and expected outputs.
* **Test Data:** Create a set of test data (e.g., JSON files) to simulate API responses.
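
A minimal unit test along these lines, written with the standard library's `unittest` and `unittest.mock` (the same idea carries over to pytest), might look like this. `get_user_name` and the URL are hypothetical code under test.

```python
import unittest
from unittest.mock import Mock, patch

import requests


def get_user_name(user_id: int) -> str:
    """Code under test: fetch one user and return its name."""
    response = requests.get(f"https://api.example.test/users/{user_id}", timeout=5)
    response.raise_for_status()
    return response.json()["name"]


class GetUserNameTests(unittest.TestCase):
    @patch("requests.get")
    def test_returns_name(self, mock_get):
        mock_get.return_value = Mock(**{"json.return_value": {"name": "Ada"}})
        self.assertEqual(get_user_name(1), "Ada")
        mock_get.assert_called_once_with("https://api.example.test/users/1", timeout=5)

    @patch("requests.get")
    def test_propagates_http_errors(self, mock_get):
        # Simulate a 4xx/5xx response without any network access.
        mock_get.return_value.raise_for_status.side_effect = requests.exceptions.HTTPError("404")
        with self.assertRaises(requests.exceptions.HTTPError):
            get_user_name(1)
```

Both the happy path and the error path are covered, and the assertion on `timeout=5` guards against someone later dropping the timeout.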
### 5.2 Integration Testing Approaches:
* **Testing API Client Integrations:** Write integration tests to verify that the API client integrates correctly with other components of the application.
* **Using a Test API Server:** Set up a test API server (e.g., using Flask or FastAPI) to simulate the real API and control the responses.
* **Verifying Data Integrity:** Verify that the data returned by the API is processed correctly and stored in the database or other data store.
### 5.3 End-to-End Testing Recommendations:
* **Testing Complete Workflows:** Write end-to-end tests to verify that complete workflows involving the API client work correctly.
* **Using Automation Tools:** Use automation tools like Selenium or Playwright to automate end-to-end tests.
### 5.4 Test Organization Best Practices:
* **Separating Tests:** Separate unit tests, integration tests, and end-to-end tests into different directories or files.
* **Using a Test Runner:** Use a test runner like pytest to discover and run tests automatically.
* **Following Test Naming Conventions:** Follow consistent test naming conventions to make it easier to understand the purpose of each test.
### 5.5 Mocking and Stubbing Techniques:
* **Mocking the `requests` Library:** Use mocking to replace the `requests` library with a mock object that returns predefined responses.
* **Using Mock Objects:** Create mock objects to simulate the behavior of external APIs.
* **Stubbing Functions:** Use stubbing to replace functions with simplified versions that return predefined values.
* **Using Context Managers for Mocking:** Use context managers to temporarily replace objects with mock objects during tests.
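
Besides patching functions, a stub can be placed at the transport level by mounting a custom adapter on a session, so the rest of `requests` runs unchanged. The `StubAdapter` class below is a sketch written for this document, not part of the library.

```python
import io
import json

import requests
from requests.adapters import BaseAdapter


class StubAdapter(BaseAdapter):
    """Transport stub: answers every request with a canned JSON body."""

    def __init__(self, payload, status_code: int = 200) -> None:
        super().__init__()
        self.payload = payload
        self.status_code = status_code
        self.requests_seen = []  # lets tests inspect what was sent

    def send(self, request, **kwargs):
        self.requests_seen.append(request)
        response = requests.Response()
        response.status_code = self.status_code
        response.url = request.url
        response.request = request
        response.headers["Content-Type"] = "application/json"
        # Response reads its body from a file-like "raw" object.
        response.raw = io.BytesIO(json.dumps(self.payload).encode("utf-8"))
        return response

    def close(self) -> None:
        pass


session = requests.Session()
stub = StubAdapter({"name": "Ada"})
# Only URLs under this prefix are routed to the stub.
session.mount("https://api.example.test/", stub)
```

With this in place, `session.get("https://api.example.test/users/1").json()` is served by the stub, and `stub.requests_seen` records the prepared request for assertions.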
## 6. Common Pitfalls and Gotchas
### 6.1 Frequent Mistakes Developers Make:
* **Not Handling Exceptions:** Ignoring exceptions raised by `requests` can lead to unexpected behavior and application crashes.
* **Not Using Sessions:** Failing to use `requests.Session()` for multiple requests to the same host can result in performance degradation.
* **Not Setting Timeouts:** Not setting timeouts on requests can lead to your application hanging indefinitely if a server is unresponsive.
* **Disabling SSL Verification Unnecessarily:** Disabling SSL verification (`verify=False`) should only be done when absolutely necessary and with caution.
* **Not Handling Rate Limits:** Not handling API rate limits can lead to the application being blocked by the API provider.
* **Incorrectly Handling Character Encoding:** Mishandling non-ASCII characters in request or response bodies leads to garbled text. Check `response.encoding` before reading `response.text`, and encode request bodies explicitly (e.g., as UTF-8).
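
Rate limits are often signalled with HTTP 429 and a `Retry-After` header. A simplified handler could look like the following; `get_with_rate_limit` is a hypothetical helper, and it assumes `Retry-After` is given in whole seconds.

```python
import time
from typing import Callable

import requests


def get_with_rate_limit(
    session: requests.Session,
    url: str,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> requests.Response:
    """GET url, waiting and retrying when the server answers 429."""
    for attempt in range(1, max_attempts + 1):
        response = session.get(url, timeout=10)
        if response.status_code != 429 or attempt == max_attempts:
            return response
        retry_after = response.headers.get("Retry-After", "")
        # Retry-After may also be an HTTP date; this sketch handles only
        # the integer-seconds form and otherwise backs off exponentially.
        delay = float(retry_after) if retry_after.isdigit() else 2.0 ** attempt
        sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists so tests can inject a fake and run instantly. If every attempt is rate limited, the last 429 response is returned for the caller to handle.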
### 6.2 Edge Cases to Be Aware Of:
* **Handling Redirects:** Be aware of how `requests` handles redirects by default (following them). If you need to control redirect behavior, use the `allow_redirects` parameter.
* **Dealing with Large Payloads:** When sending or receiving large payloads, use streaming to avoid memory issues.
* **Handling Keep-Alive Connections:** A `Session` keeps connections alive and reuses them by default. If a server or proxy closes an idle connection, the next request can fail with a `ConnectionError`, so be prepared to retry idempotent calls.
* **Proxy Configuration:** Properly configure proxies if your application needs to access the internet through a proxy server.
### 6.3 Version-Specific Issues:
* **Changes in API:** Be aware of changes in the `requests` API between different versions. Consult the release notes for each version to identify any breaking changes.
* **Dependency Conflicts:** Be aware of potential dependency conflicts between `requests` and other libraries in your project. Use a virtual environment to isolate dependencies.
### 6.4 Compatibility Concerns:
* **Python Versions:** Ensure that the version of `requests` you are using is compatible with the version of Python you are using.
* **Operating Systems:** Be aware of potential compatibility issues between `requests` and different operating systems (e.g., Windows, Linux, macOS).
### 6.5 Debugging Strategies:
* **Logging Requests and Responses:** Log detailed information about requests and responses, including URLs, headers, and bodies, to aid in debugging.
* **Using Debugging Tools:** Use debugging tools such as `pdb` or `ipdb` to step through the code and inspect variables.
* **Capturing Network Traffic:** Use network traffic analysis tools like Wireshark or tcpdump to capture and analyze network traffic.
* **Using `httpbin.org` for Testing:** Use the `httpbin.org` service to test different types of requests and responses.
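
One way to log requests and responses is a response hook registered on the session. In this sketch the logger name, `log_response`, and `make_debug_session` are assumptions.

```python
import logging

import requests

logger = logging.getLogger("http_debug")  # hypothetical logger name


def log_response(response, *args, **kwargs):
    """Response hook: log method, URL, status, and elapsed time."""
    request = response.request
    logger.debug(
        "%s %s -> %s in %.3fs",
        request.method,
        request.url,
        response.status_code,
        response.elapsed.total_seconds(),
    )


def make_debug_session() -> requests.Session:
    session = requests.Session()
    # Hooks registered here run for every response the session receives.
    session.hooks["response"].append(log_response)
    return session
```

The hook returns `None`, which leaves the response unchanged. Remember to mask credentials before logging headers or bodies in the same way.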
## 7. Tooling and Environment
### 7.1 Recommended Development Tools:
* **Virtual Environments:** Use virtual environments (e.g., `venv`, `virtualenv`) to isolate project dependencies.
* **pip:** Use `pip` to install and manage dependencies.
* **IDEs:** Use an IDE such as Visual Studio Code, PyCharm, or Sublime Text to write and debug code.
* **Debugging Tools:** Use debugging tools such as `pdb` or `ipdb` to step through the code and inspect variables.
### 7.2 Build Configuration Best Practices:
* **Using `requirements.txt`:** Use a `requirements.txt` file to specify project dependencies.
* **Using `pyproject.toml` or `setup.py`:** Define the project metadata and package the code for distribution in `pyproject.toml` (the current packaging standard) or, for older projects, in `setup.py`.
* **Using a Build System:** Consider using a build system such as Make or Poetry to automate the build process.
### 7.3 Linting and Formatting Recommendations:
* **Using a Linter:** Use a linter such as `flake8` or `pylint` to identify potential code quality issues.
* **Using a Formatter:** Use a formatter such as `black` or `autopep8` to automatically format the code according to PEP 8 guidelines.
* **Configuring the IDE:** Configure the IDE to use the linter and formatter automatically.
### 7.4 Deployment Best Practices:
* **Using a Production Environment:** Deploy the application to a production environment that is separate from the development environment.
* **Using a Process Manager:** Use a process manager such as Supervisor or systemd to manage the application process.
* **Using a Load Balancer:** Use a load balancer to distribute traffic across multiple instances of the application.
* **Monitoring the Application:** Monitor the application for errors and performance issues.
### 7.5 CI/CD Integration Strategies:
* **Using a CI/CD System:** Use a CI/CD system such as Jenkins, GitLab CI, GitHub Actions, or CircleCI to automate the build, test, and deployment processes.
* **Running Tests Automatically:** Configure the CI/CD system to run tests automatically on every commit.
* **Deploying Automatically:** Configure the CI/CD system to deploy the application automatically to the production environment after the tests have passed.