How to Bypass Elasticsearch’s 10,000-Result Limit with the Scroll API
Why the 10,000-Result Limit Exists
What Is the Scroll API?
How to Use the Scroll API: Step by Step
Step 1: Start the Scroll
GET /my_index/_search?scroll=1m
{
  "size": 1000,
  "query": {
    "match_all": {}
  }
}
Step 2: Fetch More Results
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "c2NhbjsxMDAwO...YOUR_SCROLL_ID_HERE..."
}
Step 3: Clean Up
DELETE /_search/scroll/c2NhbjsxMDAwO...YOUR_SCROLL_ID_HERE...
A Real-World Example
GET /logs/_search?scroll=2m
{
  "size": 500,
  "query": {
    "match": {
      "error_message": "timeout"
    }
  }
}
- Batch Size: Stick to a `size` like 500–1000. Too large, and you’ll strain memory; too small, and you’ll make too many requests.
- Timeout Tuning: Set the scroll duration (e.g., `1m`, `5m`) based on how fast your script processes each batch. Each scroll request renews the timeout, so it only has to cover one batch. Too short, and the context expires mid-run.
- Automation: Use a script to handle the loop. Python’s `elasticsearch` client, for instance, exposes `search`, `scroll`, and `clear_scroll` methods for this (its `helpers.scan` function wraps the same loop):
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Open a scroll context that is kept alive for 2 minutes between requests
scroll = es.search(index="logs", scroll="2m", size=500, body={"query": {"match": {"error_message": "timeout"}}})
scroll_id = scroll["_scroll_id"]

# An empty batch means every matching document has been returned
while scroll["hits"]["hits"]:
    print(scroll["hits"]["hits"])  # Process this batch
    scroll = es.scroll(scroll_id=scroll_id, scroll="2m")
    scroll_id = scroll["_scroll_id"]  # The ID may change, so keep the latest

es.clear_scroll(scroll_id=scroll_id)  # Cleanup
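The loop above skips cleanup if processing raises an error partway through. A minimal sketch of one way to guard against that is to wrap the loop in a generator with a `finally` block. The name `scroll_all` and its parameters are hypothetical, and `es` is assumed to be any object that exposes `search`, `scroll`, and `clear_scroll` with the keyword arguments used above:

```python
from typing import Any, Dict, Iterator


def scroll_all(es: Any, index: str, query: Dict[str, Any],
               size: int = 500, keep_alive: str = "2m") -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, one document at a time.

    Hypothetical helper: `es` is assumed to behave like the client used
    in the loop above (search / scroll / clear_scroll).
    """
    page = es.search(index=index, scroll=keep_alive, size=size,
                     body={"query": query})
    scroll_id = page.get("_scroll_id")
    try:
        # An empty batch means the scroll is exhausted.
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            page = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            # The ID can change between pages, so always keep the latest one.
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Runs on normal exhaustion, on errors, and when the generator is
        # closed early, so the scroll context is not left open.
        if scroll_id is not None:
            es.clear_scroll(scroll_id=scroll_id)
```

With the client from the example above, this would be used as `for hit in scroll_all(es, "logs", {"match": {"error_message": "timeout"}}): ...`. The `size` and `keep_alive` arguments map to the batch size and scroll duration discussed in the tips.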
Why Scroll Beats the Alternatives
Conclusion
Elastic APM: When to Use @CaptureSpan vs. @CaptureTransaction?
If you’re working with Elastic APM in a Java application, you might wonder when to use `@CaptureSpan` versus `@CaptureTransaction`. Both are powerful tools for observability, but they serve different purposes.
🔹 `@CaptureTransaction`:
Use this at the entry point of a request, typically a controller method, a service method, or a background job. It defines the start of a transaction and allows you to trace how a request propagates through your system. Note that it only creates a transaction when none is already active on the current thread.
🔹 `@CaptureSpan`:
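Use this to track sub-operations within a transaction, such as database queries, HTTP calls, or specific business logic. The span is created as a child of the currently active transaction or span, so if nothing is active, no span is recorded. It helps break down execution time and pinpoint performance bottlenecks inside a transaction.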
📌 Best Practices:
✅ Apply `@CaptureTransaction` at the highest-level method handling a request.
✅ Use `@CaptureSpan` for key internal operations you want to monitor.
✅ Avoid excessive spans: instrument only critical code paths to reduce overhead.
By balancing these annotations effectively, you can get detailed insights into your app’s performance while keeping APM overhead minimal.