80+ ChatGPT Prompts for Data Analysis: From Raw Data to Insights (2026)

Data analysis often involves repetitive tasks: writing SQL queries, cleaning messy datasets, creating visualizations, and translating numbers into actionable insights. ChatGPT can help with each of these as a data analysis co-pilot that drafts code, explains statistical concepts, and helps you communicate findings effectively.

This guide provides ready-to-use prompts organized by the data analysis workflow: from initial exploration to final presentation.

How to Use ChatGPT for Data Analysis

ChatGPT works best for data analysis when you:

  1. Describe your data structure: Column names, data types, relationships
  2. State the business question: What you’re trying to answer
  3. Specify the tool: Python/pandas, SQL, Excel, R, or plain explanation
  4. Include sample data when possible: Paste 5-10 rows so ChatGPT understands the format
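
If your data is already loaded in pandas, the schema and sample rows for steps 1 and 4 can be generated rather than typed by hand. This is a minimal sketch: the helper name describe_for_prompt and the orders sample are hypothetical, not part of any library.

```python
import pandas as pd


def describe_for_prompt(df: pd.DataFrame, n_rows: int = 5) -> str:
    """Return a plain-text schema plus sample rows to paste into a prompt."""
    lines = [f"Rows: {len(df)}", "Columns:"]
    for name, dtype in df.dtypes.items():
        lines.append(f"- {name}: {dtype}")
    lines.append(f"Sample (first {n_rows} rows, CSV):")
    lines.append(df.head(n_rows).to_csv(index=False).strip())
    return "\n".join(lines)


# Hypothetical sample data standing in for your real table
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "region": ["East", "West", "East"],
    "amount": [120.5, 80.0, 42.25],
})

prompt_context = describe_for_prompt(orders, n_rows=2)
print(prompt_context)
```

Paste the printed text under your business question so the column names and formats in the answer match your data.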

Pro tip: Use ChatGPT’s Code Interpreter (Advanced Data Analysis) to upload actual CSV files for hands-on analysis.

Data Exploration Prompts


I have a dataset with these columns: [list columns with data types]. The dataset has [number] rows and represents [description]. Write Python/pandas code to:
1. Display basic statistics (mean, median, std, min, max) for all numeric columns
2. Show the distribution of categorical columns
3. Identify missing values and their percentages
4. Detect potential outliers using IQR method
5. Show correlations between numeric variables
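
For reference, a response to the prompt above might look roughly like the following sketch. The column names and values are hypothetical sample data, and the 1.5 × IQR fence is the conventional choice rather than the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "price": [10.0, 12.0, 11.0, 13.0, 12.0, 95.0, np.nan, 11.5],
    "units": [1, 2, 2, 3, 2, 1, 2, 3],
    "channel": ["web", "store", "web", "web", None, "store", "web", "web"],
})

numeric = df.select_dtypes(include="number")

# 1. Basic statistics (describe() reports the median as the 50% row)
summary = numeric.describe()

# 2. Distribution of a categorical column, counting missing values too
channel_counts = df["channel"].value_counts(dropna=False)

# 3. Missing values as a percentage of rows, per column
missing_pct = df.isna().mean() * 100

# 4. Outliers: values beyond 1.5 * IQR from the quartiles
q1 = numeric.quantile(0.25)
q3 = numeric.quantile(0.75)
iqr = q3 - q1
outlier_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
outlier_counts = outlier_mask.sum()

# 5. Pairwise Pearson correlations between numeric columns
correlations = numeric.corr()

print(summary, channel_counts, missing_pct, outlier_counts, correlations, sep="\n\n")
```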

Write SQL queries to explore a new database table called [table_name] with columns [list columns]:
1. Total row count
2. Distinct values for each categorical column
3. Min, max, avg for each numeric column
4. NULL counts per column
5. Top 10 most frequent values for [column]
6. Date range if date columns exist

I'm looking at [dataset description] for the first time. What are the 10 most important questions I should ask this data to understand the business? Consider patterns, anomalies, trends, and actionable insights.

Data Cleaning Prompts


Write Python/pandas code to clean this dataset. Known issues:
- Column [name] has inconsistent date formats (MM/DD/YYYY, YYYY-MM-DD, and text like "Jan 5, 2026")
- Column [name] has mixed case and trailing whitespace
- Column [name] has "N/A", "null", "-", and empty strings that should all be NaN
- Column [name] has negative values that are data entry errors
- Duplicate rows exist based on [key columns]
Handle each issue and print a summary of changes made.
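
As a rough illustration of the kind of code this prompt should produce, here is a sketch on hypothetical data. It makes two assumptions you should state in your own prompt: ambiguous dates such as 01/05/2026 are read month-first, and negative amounts are replaced with NaN rather than corrected.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "signup_date": ["01/05/2026", "2026-01-06", "2026-01-06", "Jan 7, 2026", "2026-01-08"],
    "city": ["  boston", "CHICAGO ", "CHICAGO ", "Denver", "N/A"],
    "amount": [50.0, -20.0, -20.0, 75.0, 10.0],
})

clean = raw.copy()

# Mixed date formats: parse each value on its own (month-first for 01/05/2026)
clean["signup_date"] = clean["signup_date"].map(
    lambda value: pd.to_datetime(value, errors="coerce")
)

# Whitespace, null-like tokens, and inconsistent case
null_tokens = ["N/A", "null", "-", ""]
clean["city"] = clean["city"].str.strip().replace(null_tokens, np.nan).str.title()

# Negative amounts are treated as entry errors and blanked (assumption)
negative_mask = clean["amount"] < 0
negatives_found = int(negative_mask.sum())
clean.loc[negative_mask, "amount"] = np.nan

# Duplicates based on the key column (customer_id is a hypothetical key)
rows_before = len(clean)
clean = clean.drop_duplicates(subset=["customer_id"]).reset_index(drop=True)

summary = {
    "negatives_set_to_nan": negatives_found,
    "duplicates_dropped": rows_before - len(clean),
    "missing_city": int(clean["city"].isna().sum()),
}
print(summary)
```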

Write a Python function that standardizes address data. Input columns: [street, city, state, zip]. The function should:
- Capitalize street names properly
- Expand abbreviations (St → Street, Ave → Avenue, Dr → Drive)
- Validate state abbreviations against US states
- Clean zip codes (handle 9-digit, missing leading zeros)
- Flag rows with suspicious data for manual review

Write SQL to clean a customer table where:
- Email addresses need validation (must contain @ and valid domain)
- Phone numbers are in mixed formats (need standardizing to (XXX) XXX-XXXX)
- Names have random capitalization
- Create a cleaned view with only valid records and a separate error log table

SQL Query Prompts


Write a SQL query for [database type: PostgreSQL/MySQL/SQLite] that:
- [Describe what you need]
- Tables involved: [list tables with key columns]
- Relationships: [describe joins]
- Filters: [conditions]
- Grouping: [how to aggregate]
- Sorting: [order by]
Include comments explaining each section. Optimize for performance on large tables.

Convert this business question into a SQL query:
"[Business question, e.g., 'What's our monthly revenue by product category for the last 12 months, and how does it compare to the same month last year?']"
Available tables:
- orders (id, customer_id, order_date, total_amount)
- order_items (id, order_id, product_id, quantity, unit_price)
- products (id, name, category, price)

Write a SQL query to identify customer churn risk. Using these tables:
- customers (id, signup_date, plan_type)
- orders (id, customer_id, order_date, amount)
- logins (customer_id, login_date)
Define "at risk" as: no orders in 60 days AND login frequency decreased by 50%+ compared to their average. Return customer details and risk indicators.

Python/Pandas Analysis Prompts


Write Python code using pandas and matplotlib to analyze [dataset description]. Create:
1. Time series plot of [metric] over [time period]
2. Bar chart comparing [metric] across [categories]
3. Scatter plot showing relationship between [variable 1] and [variable 2]
4. Heatmap of correlations between all numeric variables
5. Box plots showing distribution of [metric] by [group]
Style all plots with a clean, professional look. Save each as PNG.

Write a Python function that performs cohort analysis on user data:
- Input: DataFrame with columns [user_id, signup_date, activity_date, revenue]
- Output: Cohort retention table and visualization
- Cohorts defined by signup month
- Show retention rates for months 0-12
- Include both a data table and a heatmap visualization
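
A minimal sketch of the retention-table half of this prompt is shown below. The function name cohort_retention and the sample events are hypothetical, and cohort size is taken as the number of users with at least one activity row, which is an assumption about how your activity table is built.

```python
import pandas as pd


def cohort_retention(events: pd.DataFrame) -> pd.DataFrame:
    """Share of each signup-month cohort active N months after signup."""
    df = events.copy()
    df["cohort"] = df["signup_date"].dt.to_period("M")
    activity_month = df["activity_date"].dt.to_period("M")

    # Whole months elapsed between signup month and activity month
    df["month_index"] = (
        (activity_month.dt.year - df["cohort"].dt.year) * 12
        + (activity_month.dt.month - df["cohort"].dt.month)
    )

    # Distinct active users per cohort and month offset
    active = (
        df.groupby(["cohort", "month_index"])["user_id"]
        .nunique()
        .unstack(fill_value=0)
    )
    cohort_size = df.groupby("cohort")["user_id"].nunique()
    return active.div(cohort_size, axis=0)


# Hypothetical activity log
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "signup_date": pd.to_datetime(
        ["2026-01-10", "2026-01-10", "2026-01-20", "2026-02-05", "2026-02-05"]
    ),
    "activity_date": pd.to_datetime(
        ["2026-01-12", "2026-02-03", "2026-01-21", "2026-02-06", "2026-03-01"]
    ),
    "revenue": [10.0, 5.0, 8.0, 12.0, 3.0],
})

retention = cohort_retention(events)
print(retention)
```

The resulting table has one row per cohort and one column per month offset, which is the shape a heatmap function expects.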

Write Python code for an RFM (Recency, Frequency, Monetary) analysis:
- Input: Transaction data with [customer_id, transaction_date, amount]
- Calculate R, F, M scores (1-5 scale)
- Assign customer segments (Champions, Loyal, At Risk, Lost, etc.)
- Create a summary table with segment counts and revenue
- Visualize segments with a treemap or bar chart
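
The scoring and segmentation steps can be sketched as follows. The transactions are hypothetical, and the segment rules are illustrative assumptions, since RFM segment definitions vary between teams. Ranking before binning is one way to keep pd.qcut from failing when many customers share the same value.

```python
import pandas as pd

# Hypothetical transactions
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5],
    "transaction_date": pd.to_datetime([
        "2026-03-28", "2026-03-20", "2026-03-01",
        "2026-03-10", "2026-02-10",
        "2026-01-05",
        "2026-03-25", "2026-03-15", "2026-03-05", "2026-02-20",
        "2025-12-01",
    ]),
    "amount": [100, 150, 200, 50, 60, 20, 300, 250, 100, 50, 500],
})

# Recency is measured from the day after the last transaction in the data
snapshot = tx["transaction_date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("customer_id").agg(
    recency=("transaction_date", lambda d: (snapshot - d.max()).days),
    frequency=("transaction_date", "count"),
    monetary=("amount", "sum"),
)


def score(series: pd.Series, ascending: bool = True) -> pd.Series:
    """Quintile score from 1 to 5; ties are broken by row order."""
    ranks = series.rank(method="first", ascending=ascending)
    return pd.qcut(ranks, 5, labels=[1, 2, 3, 4, 5]).astype(int)


rfm["R"] = score(rfm["recency"], ascending=False)  # recent buyers score high
rfm["F"] = score(rfm["frequency"])
rfm["M"] = score(rfm["monetary"])


def segment(row: pd.Series) -> str:
    # Illustrative rules only; adjust thresholds to your business
    if row["R"] >= 4 and row["F"] >= 4:
        return "Champions"
    if row["R"] <= 2 and row["F"] >= 3:
        return "At Risk"
    if row["R"] <= 2:
        return "Lost"
    return "Other"


rfm["segment"] = rfm.apply(segment, axis=1)
summary = rfm.groupby("segment").agg(
    customers=("monetary", "size"), revenue=("monetary", "sum")
)
print(summary)
```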

Excel and Google Sheets Prompts


Write Excel formulas for a sales dashboard. My data is in Sheet1 with columns: Date (A), Salesperson (B), Product (C), Region (D), Amount (E), Units (F).

Create formulas for:
1. Total revenue this month vs. last month (with % change)
2. Top 5 salespeople by revenue (using LARGE/INDEX/MATCH)
3. Revenue by region using SUMIFS
4. Running total by date
5. Moving 7-day average of daily sales
6. Year-over-year comparison by month

Create a Google Sheets formula-based dashboard for tracking [metrics]. Include:
- QUERY function to filter and aggregate data
- SPARKLINE charts in cells for visual trends
- Conditional formatting rules for KPI thresholds
- Data validation dropdowns for filters
- IMPORTRANGE to pull from other sheets if needed
Provide formulas and setup instructions.

Statistical Analysis Prompts


I have two groups of data and want to know if the difference is statistically significant:
- Group A: [describe data, e.g., "conversion rates before the change"]
- Group B: [describe data, e.g., "conversion rates after the change"]
- Sample sizes: A=[n], B=[n]
Write Python code to:
1. Determine the appropriate statistical test (and explain why)
2. Run the test
3. Calculate effect size (Cohen's d or odds ratio)
4. Interpret the results in plain English
5. State the practical significance (not just statistical)
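
For the conversion-rate example, one common choice is a two-proportion z-test. The sketch below uses only the standard library, with hypothetical counts, and reports Cohen's h (the proportion analogue of Cohen's d) as the effect size. It assumes independent samples that are large enough for the normal approximation.

```python
import math


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> dict:
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b

    # Pooled rate under the null hypothesis of no difference
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se

    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Cohen's h: effect size for a difference between proportions
    h = 2 * math.asin(math.sqrt(p_b)) - 2 * math.asin(math.sqrt(p_a))
    return {"rate_a": p_a, "rate_b": p_b, "z": z, "p_value": p_value, "cohens_h": h}


# Hypothetical counts: 200 of 2,000 converted before, 260 of 2,000 after
result = two_proportion_ztest(200, 2000, 260, 2000)
print(result)
```

A small p-value with a small h is the case item 5 is about: the difference is unlikely to be noise, but it may still be too small to matter commercially.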

Explain and implement the following in Python with sample data:
- [Statistical concept, e.g., "A/B test analysis with Bayesian approach"]
Include:
1. When to use this method (and when not to)
2. Assumptions to check
3. Implementation code with comments
4. How to interpret the output
5. Common pitfalls
Write for someone who understands basic statistics but not this specific technique.

Data Visualization Prompts


I need to present [data description] to [audience: executives / technical team / clients]. Recommend the best chart type for each insight and explain why:
1. [Insight 1]
2. [Insight 2]
3. [Insight 3]
Then write Python code (matplotlib/seaborn or plotly) to create each visualization with professional styling.

Write Python/Plotly code to create an interactive dashboard showing:
- [Chart 1 description]
- [Chart 2 description]
- [Chart 3 description]
Include dropdown filters for [filter fields]. Use a clean color palette suitable for business presentations. The dashboard should work in a Jupyter notebook.

Reporting and Communication Prompts


I've completed an analysis of [topic]. Here are my key findings:
[List 5-7 findings with numbers]
Write an executive summary (one page) that:
- Opens with the business impact (bottom line first)
- Presents 3-4 key findings with context
- Includes specific recommendations tied to each finding
- Closes with next steps and timeline
- Uses non-technical language appropriate for C-level executives

Transform these raw numbers into a compelling data story:
[Paste your data or findings]
Structure as:
1. The hook: What's the most surprising or impactful finding?
2. Context: What's the situation and why does it matter?
3. Evidence: Walk through the data supporting the narrative
4. Implications: What does this mean for the business?
5. Call to action: What should we do about it?

Automation Prompts


Write a Python script that automates this weekly reporting task:
1. Connect to [database/API/CSV source]
2. Run these queries: [describe]
3. Clean and transform the results
4. Generate charts for key metrics
5. Create a formatted report (PDF or HTML)
6. Email the report to [recipients]
Include error handling, logging, and a configuration file for easy maintenance.
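
A stripped-down skeleton of such a script might look like the sketch below. It covers loading, aggregating, and writing an HTML report with logging and basic error handling. The CONFIG keys, the build_report function, and the inline CSV are hypothetical placeholders, and database access, charts, and email are left out.

```python
import io
import logging
import tempfile
from pathlib import Path

import pandas as pd

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("weekly_report")

# Hypothetical settings; in practice these would live in a config file
CONFIG = {"metric": "amount", "group_by": "region", "output_name": "weekly_report.html"}

# Inline CSV standing in for the real database, API, or file source
SAMPLE_CSV = """region,amount
East,120
West,80
East,40
"""


def build_report(source, output_dir: Path, config: dict) -> Path:
    """Load data, total the metric per group, and write an HTML report."""
    try:
        data = pd.read_csv(source)
    except Exception:
        log.exception("Could not read source data")
        raise

    missing = {config["metric"], config["group_by"]} - set(data.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")

    summary = data.groupby(config["group_by"])[config["metric"]].sum().reset_index()
    html = "<h1>Weekly report</h1>\n" + summary.to_html(index=False)

    path = output_dir / config["output_name"]
    path.write_text(html, encoding="utf-8")
    log.info("Wrote %s (%d groups)", path, len(summary))
    return path


report_path = build_report(io.StringIO(SAMPLE_CSV), Path(tempfile.mkdtemp()), CONFIG)
```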

FAQ

Can I paste sensitive data into ChatGPT?

Be cautious with confidential data. For sensitive business data, use anonymized or sample data. Data opt-out settings and Team/Enterprise plans provide better privacy, but never paste PII (personally identifiable information), financial records, or trade secrets without checking your organization’s policy.
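
One way to prepare a sample for pasting is to replace identifying columns with salted hashes, as in this sketch. The pseudonymize helper and the customers data are hypothetical. Note that hashing is pseudonymization, not full anonymization, so it may not be enough on its own to satisfy your organization’s policy.

```python
import hashlib

import pandas as pd


def pseudonymize(df: pd.DataFrame, pii_columns: list, salt: str) -> pd.DataFrame:
    """Replace values in the given columns with short salted SHA-256 tokens."""
    out = df.copy()
    for col in pii_columns:
        out[col] = out[col].astype(str).map(
            lambda value: hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]
        )
    return out


# Hypothetical customer data
customers = pd.DataFrame({
    "email": ["ana@example.com", "raj@example.com", "ana@example.com"],
    "amount": [120.0, 80.0, 45.0],
})

# Keep the salt private; the same salt keeps tokens consistent across rows
safe = pseudonymize(customers, ["email"], salt="replace-with-a-secret")
print(safe)
```

Repeated values map to the same token, so grouping and joining on the hashed column still work in the code ChatGPT writes for you.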

Is ChatGPT good at writing SQL?

Yes, ChatGPT writes accurate SQL for most common scenarios. It handles joins, subqueries, window functions, CTEs, and aggregations well. Always test generated queries on a development database before running in production. Complex database-specific optimizations may need manual review.
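
A lightweight way to test a generated query before it touches real data is an in-memory SQLite fixture with a handful of rows whose correct answer you can work out by hand. The sketch below reuses the orders, order_items, and products tables from the SQL prompts in this guide. The rows are hypothetical, and generated_sql stands in for whatever ChatGPT returns. SQLite syntax differs from PostgreSQL and MySQL in places (strftime here, for example), so treat this as a logic check only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category TEXT, price REAL);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, total_amount REAL);
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER,
                          product_id INTEGER, quantity INTEGER, unit_price REAL);

INSERT INTO products VALUES (1, 'Widget', 'Hardware', 10.0), (2, 'Manual', 'Books', 5.0);
INSERT INTO orders VALUES (1, 100, '2026-01-15', 25.0), (2, 101, '2026-02-03', 10.0);
INSERT INTO order_items VALUES (1, 1, 1, 2, 10.0), (2, 1, 2, 1, 5.0), (3, 2, 1, 1, 10.0);
""")

# Stand-in for the query under test: monthly revenue by product category
generated_sql = """
SELECT strftime('%Y-%m', o.order_date) AS month,
       p.category,
       SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items AS oi
JOIN orders AS o ON o.id = oi.order_id
JOIN products AS p ON p.id = oi.product_id
GROUP BY month, p.category
ORDER BY month, p.category
"""

rows = conn.execute(generated_sql).fetchall()

# Expected totals worked out by hand from the fixture rows
expected = [
    ("2026-01", "Books", 5.0),
    ("2026-01", "Hardware", 20.0),
    ("2026-02", "Hardware", 10.0),
]
assert rows == expected, rows
print("query matches hand-computed result")
```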

Can ChatGPT replace a data analyst?

No. ChatGPT is excellent at code generation, explanation, and first-pass analysis, but it cannot understand your business context, verify data accuracy, or make judgment calls about what analysis matters. It’s a productivity tool that makes analysts faster, not a replacement for analytical thinking.

How accurate are ChatGPT’s statistical recommendations?

Its recommendations are generally accurate for common techniques (t-tests, regression, correlation). For advanced methods, always verify the approach with a statistics reference, because ChatGPT occasionally suggests inappropriate tests or misses assumption checks. Cross-reference with documentation for important analyses.

Should I use ChatGPT or Code Interpreter?

Use Code Interpreter (Advanced Data Analysis) when you have actual data files to analyze, since it can execute code and show results directly. Use regular ChatGPT when you need code templates or explanations, or when you are working with data you can’t upload.

Conclusion

ChatGPT can speed up every stage of the data analysis workflow. From writing exploratory queries to creating publication-ready visualizations and executive summaries, these prompts help you focus on insights rather than implementation details.

Start with the prompts matching your most time-consuming tasks. For most analysts, that’s SQL query writing and data cleaning. As you build confidence, use ChatGPT for more complex analysis and automated reporting pipelines.
