Networking and APIs
HTTP requests are the backbone of modern web APIs. This chapter covers synchronous requests with requests, asynchronous requests with aiohttp, and best practices for API communication.
1 The requests Library
requests is the de facto standard for HTTP requests in Python – simple, elegant, and powerful.
1.1 Installation
pip install requests
1.2 Basic GET Requests
import requests
# A simple GET request
response = requests.get('https://api.github.com')
# Response attributes
print(response.status_code)  # 200
print(response.text)         # response body as a string
print(response.content)      # response body as bytes
print(response.json())       # parse the JSON body automatically
print(response.headers)      # response headers
print(response.url)          # final URL (after redirects)
print(response.encoding)     # encoding (e.g. 'utf-8')
1.3 Checking the Status Code
response = requests.get('https://api.github.com/user')
# Check manually
if response.status_code == 200:
    print('Success!')
elif response.status_code == 404:
    print('Not Found')
# Raise an exception on errors automatically (recommended)
response.raise_for_status()  # raises HTTPError for 4xx/5xx
# With try-except
try:
    response = requests.get('https://api.example.com/data')
    response.raise_for_status()
    data = response.json()
except requests.exceptions.HTTPError as e:
    print(f'HTTP Error: {e}')
except requests.exceptions.RequestException as e:
    print(f'Request failed: {e}')
2 HTTP Methods
2.1 GET – Retrieving Data
# With query parameters
params = {'q': 'python', 'sort': 'stars'}
response = requests.get('https://api.github.com/search/repositories',
                        params=params)
# URL: https://api.github.com/search/repositories?q=python&sort=stars
print(response.json()['total_count'])
2.2 POST – Sending Data
# Send JSON data
data = {'title': 'foo', 'body': 'bar', 'userId': 1}
response = requests.post('https://jsonplaceholder.typicode.com/posts',
                         json=data)
print(response.status_code)  # 201 Created
print(response.json())
# Send form data (application/x-www-form-urlencoded)
form_data = {'username': 'john', 'password': 'secret'}
response = requests.post('https://example.com/login', data=form_data)
2.3 PUT – Updating Data
# Replace the entire object
data = {'title': 'Updated Title', 'body': 'Updated Body', 'userId': 1}
response = requests.put('https://jsonplaceholder.typicode.com/posts/1',
                        json=data)
2.4 PATCH – Partial Updates
# Update only selected fields
data = {'title': 'New Title'}
response = requests.patch('https://jsonplaceholder.typicode.com/posts/1',
                          json=data)
2.5 DELETE – Deleting a Resource
response = requests.delete('https://jsonplaceholder.typicode.com/posts/1')
print(response.status_code)  # 200 or 204 No Content
3 Headers and Authentication
3.1 Custom Headers
headers = {
    'User-Agent': 'MyApp/1.0',
    'Accept': 'application/json',
    'Content-Type': 'application/json'
}
response = requests.get('https://api.example.com/data', headers=headers)
3.2 Basic Authentication
from requests.auth import HTTPBasicAuth
# Variant 1: via the auth parameter
response = requests.get('https://api.example.com/user',
                        auth=('username', 'password'))
# Variant 2: explicitly with HTTPBasicAuth
auth = HTTPBasicAuth('username', 'password')
response = requests.get('https://api.example.com/user', auth=auth)
3.3 Bearer Token (API Keys)
# Typical for modern APIs (OAuth, JWT)
token = 'your_api_token_here'
headers = {'Authorization': f'Bearer {token}'}
response = requests.get('https://api.github.com/user', headers=headers)
3.4 API Key as a Query Parameter
# Some APIs pass the key as a query parameter
params = {'api_key': 'your_api_key'}
response = requests.get('https://api.example.com/data', params=params)
4 Sessions – Persistent Connections
Sessions provide reusable configuration and connection pooling.
4.1 Session Basics
# Without a session (a new connection per request)
response1 = requests.get('https://api.example.com/data')
response2 = requests.get('https://api.example.com/data')
# With a session (connections are reused)
with requests.Session() as session:
    # Headers apply to every request made through the session
    session.headers.update({'Authorization': 'Bearer token123'})
    response1 = session.get('https://api.example.com/data')
    response2 = session.get('https://api.example.com/users')
    # Both requests use the same header
4.2 Sessions and Cookies
session = requests.Session()
# Log in (sets cookies)
login_data = {'username': 'user', 'password': 'pass'}
session.post('https://example.com/login', data=login_data)
# Subsequent requests automatically send the cookies
response = session.get('https://example.com/dashboard')
# Set cookies manually
session.cookies.set('session_id', 'abc123', domain='example.com')
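To verify what the server actually set, the session's cookie jar can be read back directly. A small sketch (the cookie names follow the example above):
# Inspect the jar as a plain dict
print(session.cookies.get_dict())         # e.g. {'session_id': 'abc123'}
# Or look up a single cookie by name
print(session.cookies.get('session_id'))  # 'abc123'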
4.3 Session Configuration
session = requests.Session()
# Default settings for all requests
session.headers.update({'User-Agent': 'MyApp/1.0'})
session.params = {'api_key': 'key123'}  # query parameters
session.verify = True  # verify SSL certificates (the default)
# Caution: requests silently ignores a `session.timeout` attribute;
# the timeout must be passed per request (see the sketch below)
# All requests in this session use these settings
response = session.get('https://api.example.com/data', timeout=10)
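Because requests ignores a timeout attribute on the session, a common workaround is a small Session subclass that injects a default timeout into every request. A minimal sketch (the name TimeoutSession is ours, not part of requests):
class TimeoutSession(requests.Session):
    """Session that applies a default timeout to every request."""
    def __init__(self, timeout=10):
        super().__init__()
        self._timeout = timeout
    def request(self, method, url, **kwargs):
        # Inject the default unless the caller passes an explicit timeout
        kwargs.setdefault('timeout', self._timeout)
        return super().request(method, url, **kwargs)
# Usage
session = TimeoutSession(timeout=5)
response = session.get('https://api.example.com/data')  # implicitly timeout=5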
5 Timeouts and Retries
5.1 Timeouts
# Timeout in seconds
try:
    response = requests.get('https://api.example.com/slow', timeout=5)
except requests.exceptions.Timeout:
    print('Request timed out')
# Separate timeouts for connect and read
response = requests.get('https://api.example.com/data',
                        timeout=(3.05, 10))  # (connect, read)
# No timeout (not recommended!)
response = requests.get('https://api.example.com/data', timeout=None)
5.2 Automatic Retries
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
# Configure the retry strategy (allowed_methods requires urllib3 >= 1.26)
retry_strategy = Retry(
    total=3,                                     # maximum number of retries
    backoff_factor=1,                            # exponentially growing wait times
    status_forcelist=[429, 500, 502, 503, 504],  # retry on these status codes
    allowed_methods=["GET", "POST"]              # only these methods (note: POST is not idempotent)
)
adapter = HTTPAdapter(max_retries=retry_strategy)
session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)
# Requests now retry automatically
response = session.get('https://api.example.com/data')
6 File Uploads and Downloads
6.1 Uploading Files
# Simple upload
with open('document.pdf', 'rb') as f:
    files = {'file': f}
    response = requests.post('https://api.example.com/upload', files=files)
# With an explicit filename and MIME type
with open('image.jpg', 'rb') as f:
    files = {
        'file': ('custom_name.jpg', f, 'image/jpeg')
    }
    response = requests.post('https://api.example.com/upload', files=files)
# Multiple files (ExitStack makes sure every handle is closed)
from contextlib import ExitStack
with ExitStack() as stack:
    files = {
        'file1': stack.enter_context(open('doc1.pdf', 'rb')),
        'file2': stack.enter_context(open('doc2.pdf', 'rb')),
    }
    response = requests.post('https://api.example.com/upload', files=files)
6.2 Downloading Files
# Small files (the whole response fits in RAM)
response = requests.get('https://example.com/image.jpg')
with open('downloaded_image.jpg', 'wb') as f:
    f.write(response.content)
# Large files (streaming)
response = requests.get('https://example.com/largefile.zip', stream=True)
with open('largefile.zip', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
# With a progress bar (requires tqdm)
from tqdm import tqdm
response = requests.get('https://example.com/largefile.zip', stream=True)
total_size = int(response.headers.get('content-length', 0))
with open('largefile.zip', 'wb') as f, tqdm(
    total=total_size, unit='B', unit_scale=True
) as pbar:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
        pbar.update(len(chunk))
7 JSON Handling
7.1 Receiving JSON
response = requests.get('https://api.github.com/users/torvalds')
# Parse JSON automatically
data = response.json()
print(data['name'])      # Linus Torvalds
print(data['location'])  # Portland
# Error handling
try:
    data = response.json()
except requests.exceptions.JSONDecodeError:
    print('Response is not valid JSON')
7.2 Sending JSON
# The json parameter automatically sets Content-Type: application/json
payload = {
    'name': 'John Doe',
    'email': 'john@example.com',
    'age': 30
}
response = requests.post('https://api.example.com/users', json=payload)
# Equivalent to:
import json
headers = {'Content-Type': 'application/json'}
response = requests.post('https://api.example.com/users',
                         data=json.dumps(payload),
                         headers=headers)
8 Error Handling
8.1 Exception Hierarchy
from requests.exceptions import (
    RequestException,   # base exception
    ConnectionError,    # connection errors
    Timeout,            # timeouts
    HTTPError,          # 4xx/5xx status codes
    TooManyRedirects,   # too many redirects
    URLRequired,        # missing URL
)
def safe_request(url):
    try:
        response = requests.get(url, timeout=5)
        response.raise_for_status()
        return response.json()
    except Timeout:
        print('Request timed out')
    except ConnectionError:
        print('Failed to connect')
    except HTTPError as e:
        print(f'HTTP Error: {e.response.status_code}')
    except RequestException as e:
        print(f'Request failed: {e}')
    return None
8.2 Handling Status Codes
response = requests.get('https://api.example.com/data')
# Check by category
if response.ok:  # True for all status codes below 400
    print('Success')
elif 400 <= response.status_code < 500:
    print('Client error')
elif 500 <= response.status_code < 600:
    print('Server error')
# Specific codes
status_handlers = {
    200: lambda: print('OK'),
    201: lambda: print('Created'),
    400: lambda: print('Bad Request'),
    401: lambda: print('Unauthorized'),
    404: lambda: print('Not Found'),
    500: lambda: print('Server Error')
}
handler = status_handlers.get(response.status_code)
if handler:
    handler()
9 Asynchronous Requests with aiohttp
For many parallel requests, aiohttp is significantly faster than requests.
9.1 Installation
pip install aiohttp
9.2 A Simple GET Request
import aiohttp
import asyncio
async def fetch_data(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            print(f'Status: {response.status}')
            data = await response.json()
            return data
# Run it
asyncio.run(fetch_data('https://api.github.com'))
9.3 Multiple Parallel Requests
import aiohttp
import asyncio
async def fetch_url(session, url):
    async with session.get(url) as response:
        return await response.json()
async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        return results
# Run it
urls = [
    'https://api.github.com/users/torvalds',
    'https://api.github.com/users/gvanrossum',
    'https://api.github.com/users/kennethreitz',
]
results = asyncio.run(fetch_all(urls))
for result in results:
    print(result['name'])
9.4 POST with aiohttp
async def post_data(url, payload):
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=payload) as response:
            return await response.json()
payload = {'title': 'Test', 'body': 'Content'}
result = asyncio.run(post_data('https://jsonplaceholder.typicode.com/posts',
                               payload))
9.5 Headers and Authentication
async def fetch_with_auth(url, token):
    headers = {'Authorization': f'Bearer {token}'}
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url) as response:
            return await response.json()
# Timeouts (inside an async function)
async def fetch_with_timeout(url):
    timeout = aiohttp.ClientTimeout(total=10)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(url) as response:
            return await response.json()
9.6 Reusing the Session
async def fetch_multiple(urls):
    # Create the session only once (more efficient)
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in urls:
            task = asyncio.create_task(fetch_url(session, url))
            tasks.append(task)
        results = await asyncio.gather(*tasks, return_exceptions=True)
        return results
async def fetch_url(session, url):
    try:
        async with session.get(url) as response:
            return await response.json()
    except Exception as e:
        return {'error': str(e)}
9.7 Rate Limiting with a Semaphore
import asyncio
import aiohttp
async def fetch_with_limit(session, url, semaphore):
    async with semaphore:  # caps the number of concurrent requests
        async with session.get(url) as response:
            return await response.json()
async def fetch_all_limited(urls, max_concurrent=5):
    semaphore = asyncio.Semaphore(max_concurrent)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_with_limit(session, url, semaphore) for url in urls]
        results = await asyncio.gather(*tasks)
        return results
# At most 5 concurrent requests
urls = [f'https://api.example.com/item/{i}' for i in range(100)]
results = asyncio.run(fetch_all_limited(urls, max_concurrent=5))
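An alternative to the semaphore is to cap concurrency at the connection-pool level via aiohttp's TCPConnector. A brief sketch reusing fetch_url from section 9.3 (the limit value is illustrative); note that the connector limits open connections rather than scheduled coroutines:
async def fetch_all_pooled(urls, limit=5):
    # The connector restricts the number of simultaneous connections
    connector = aiohttp.TCPConnector(limit=limit)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [fetch_url(session, url) for url in urls]
        return await asyncio.gather(*tasks)
results = asyncio.run(fetch_all_pooled(urls, limit=5))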
10 Best Practices
10.1 Always Set Timeouts
# ❌ Bad: no timeout (can hang forever)
response = requests.get('https://api.example.com/data')
# ✅ Good: define a timeout
response = requests.get('https://api.example.com/data', timeout=10)
# ✅ Better: separate connect and read timeouts
response = requests.get('https://api.example.com/data',
                        timeout=(3, 10))  # (connect, read)
10.2 Use Sessions for Multiple Requests
# ❌ Bad: a new connection for every request
for i in range(100):
    response = requests.get(f'https://api.example.com/item/{i}')
# ✅ Good: a session reuses the connection
with requests.Session() as session:
    for i in range(100):
        response = session.get(f'https://api.example.com/item/{i}')
10.3 Don't Forget Error Handling
# ✅ Always call raise_for_status()
import logging
logger = logging.getLogger(__name__)
try:
    response = requests.get('https://api.example.com/data', timeout=10)
    response.raise_for_status()  # raises an exception for 4xx/5xx
    data = response.json()
except requests.exceptions.RequestException as e:
    logger.error(f'API request failed: {e}')
10.4 Set a User-Agent
# Many APIs block requests without a User-Agent
headers = {'User-Agent': 'MyApp/1.0 (contact@example.com)'}
response = requests.get('https://api.example.com/data', headers=headers)
10.5 Keep Secrets Out of the Code
# ❌ Bad: API key hard-coded
API_KEY = 'sk_live_abc123xyz'
# ✅ Good: from environment variables
import os
API_KEY = os.getenv('API_KEY')
# ✅ Or from a .env file (with python-dotenv)
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.getenv('API_KEY')
10.6 Respect Rate Limits
import time
from datetime import datetime, timedelta
class RateLimiter:
    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.period = period  # in seconds
        self.calls = []
    def wait_if_needed(self):
        now = datetime.now()
        # Drop calls that fell out of the window
        self.calls = [call for call in self.calls
                      if now - call < timedelta(seconds=self.period)]
        if len(self.calls) >= self.max_calls:
            sleep_time = (self.calls[0] + timedelta(seconds=self.period) - now).total_seconds()
            time.sleep(sleep_time)
            self.calls = []
        self.calls.append(now)
# Usage: at most 10 requests per minute
limiter = RateLimiter(max_calls=10, period=60)
for i in range(100):
    limiter.wait_if_needed()
    response = requests.get(f'https://api.example.com/item/{i}')
11 Practical Examples
11.1 GitHub API Wrapper
class GitHubAPI:
    BASE_URL = 'https://api.github.com'
    def __init__(self, token=None):
        self.session = requests.Session()
        if token:
            self.session.headers.update({
                'Authorization': f'Bearer {token}',
                'Accept': 'application/vnd.github.v3+json'
            })
        self.session.headers.update({'User-Agent': 'MyGitHubApp/1.0'})
    def get_user(self, username):
        response = self.session.get(f'{self.BASE_URL}/users/{username}')
        response.raise_for_status()
        return response.json()
    def get_repos(self, username):
        response = self.session.get(f'{self.BASE_URL}/users/{username}/repos')
        response.raise_for_status()
        return response.json()
    def close(self):
        self.session.close()
# Usage
api = GitHubAPI(token='ghp_xxxxx')
user = api.get_user('torvalds')
repos = api.get_repos('torvalds')
api.close()
11.2 REST API with Pagination
def fetch_all_pages(base_url, params=None):
    """Fetches every page of a paginated API."""
    all_data = []
    page = 1
    with requests.Session() as session:
        while True:
            params_with_page = {**(params or {}), 'page': page}
            response = session.get(base_url, params=params_with_page)
            response.raise_for_status()
            data = response.json()
            if not data:  # no more data
                break
            all_data.extend(data)
            page += 1
    return all_data
# Usage
all_users = fetch_all_pages('https://api.example.com/users')
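Many APIs (GitHub among them) signal the next page through the Link response header rather than an empty page; requests exposes the parsed header as response.links. A sketch of that variant:
def fetch_all_link_pages(url):
    """Follows the 'next' relation of the Link header until it runs out."""
    all_data = []
    with requests.Session() as session:
        while url:
            response = session.get(url, timeout=10)
            response.raise_for_status()
            all_data.extend(response.json())
            # response.links is requests' parsed view of the Link header
            url = response.links.get('next', {}).get('url')
    return all_data
# Usage
repos = fetch_all_link_pages('https://api.github.com/users/torvalds/repos')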
11.3 Async Batch Processing
import aiohttp
import asyncio
from typing import List, Dict, Any
async def process_batch(items: List[int],
                        batch_size: int = 10) -> List[Dict[Any, Any]]:
    """Processes items asynchronously in batches."""
    results = []
    async with aiohttp.ClientSession() as session:
        for i in range(0, len(items), batch_size):
            batch = items[i:i + batch_size]
            tasks = [fetch_item(session, item) for item in batch]
            batch_results = await asyncio.gather(*tasks, return_exceptions=True)
            results.extend(batch_results)
    return results
async def fetch_item(session, item_id):
    url = f'https://api.example.com/items/{item_id}'
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as response:
            return await response.json()
    except Exception as e:
        return {'error': str(e), 'item_id': item_id}
# Usage
item_ids = list(range(1, 101))
results = asyncio.run(process_batch(item_ids, batch_size=10))
11.4 Retry with Exponential Backoff
import time
import random
def retry_with_backoff(func, max_retries=3, base_delay=1):
    """Calls func, retrying with exponential backoff on failure."""
    for attempt in range(max_retries):
        try:
            return func()
        except requests.exceptions.RequestException:
            if attempt == max_retries - 1:  # last attempt
                raise
            # Exponential backoff with jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f'Retry {attempt + 1}/{max_retries} after {delay:.2f}s')
            time.sleep(delay)
# Usage
def make_request():
    response = requests.get('https://api.example.com/data', timeout=5)
    response.raise_for_status()
    return response.json()
data = retry_with_backoff(make_request, max_retries=3)
12 Comparison: requests vs. aiohttp
| Criterion | requests | aiohttp |
|---|---|---|
| Sync/async | Synchronous (blocking) | Asynchronous (non-blocking) |
| Performance (single request) | ✅ Sufficient | ⚠️ Slight overhead |
| Performance (parallel) | ❌ Slow (sequential) | ✅ Very fast |
| Simplicity | ✅ Very simple | ⚠️ Requires async knowledge |
| Use cases | Ordinary scripts, CLI tools | Web scraping, many API calls |
| HTTP/2 support | ❌ | ❌ (use httpx if HTTP/2 is needed) |
| Ecosystem | ✅ Huge | ✅ Growing |
Rule of thumb:
- requests: ordinary scripts, few (<10) requests, simplicity first
- aiohttp: many parallel requests (>50), web scraping, performance-critical work
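To make the rule of thumb tangible, here is a rough, self-contained timing sketch; the workload (ten GETs against the GitHub API root) is purely illustrative and the numbers will vary:
import asyncio
import time
import aiohttp
import requests
URLS = ['https://api.github.com'] * 10  # illustrative workload
def fetch_sync():
    with requests.Session() as session:
        return [session.get(url, timeout=10).status_code for url in URLS]
async def fetch_async():
    async def one(session, url):
        async with session.get(url) as response:
            return response.status
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(one(session, url) for url in URLS))
start = time.perf_counter()
fetch_sync()
print(f'requests (sequential): {time.perf_counter() - start:.2f}s')
start = time.perf_counter()
asyncio.run(fetch_async())
print(f'aiohttp (parallel):    {time.perf_counter() - start:.2f}s')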
13 Summary
| Topic | Purpose |
|---|---|
| requests.get() | Fetch data from an API |
| requests.post() | Send data to an API |
| response.json() | Parse the JSON body automatically |
| response.raise_for_status() | Raise an exception on error status codes |
| Session() | Reuse connections |
| timeout | Cap how long a request may take |
| headers | Custom headers, authentication |
| aiohttp | Asynchronous requests for high concurrency |
Core principles:
- Always set timeouts
- Use sessions for multiple requests
- Don't forget error handling (raise_for_status())
- Respect rate limits
- Load API keys from environment variables
- Use aiohttp for many parallel requests