What is the difference between regression and correlation analysis?
Correlation measures the strength and direction of the linear relationship between two variables. Regression quantifies how much the dependent variable changes per unit change in an independent variable while holding the other variables in the model constant. For example, the correlation between X and Y might be 0.65, but a regression can show that 40% of that apparent relationship disappears once you control for a third variable Z. This is why regression is more actionable: it estimates each variable's contribution separately from the others in the model, although it only adjusts for the variables you actually include.
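As a rough illustration of the point above, the sketch below compares a correlation, a regression on X alone, and a regression on X and Z together. The data are made up so that Y is exactly 1 + 0.5·X + 2·Z, and the variable names are hypothetical; it uses only the Python standard library rather than any particular analytics tool.

```python
from math import sqrt

# Made-up illustrative data: y is built exactly as 1 + 0.5*x + 2*z,
# and x is correlated with z.
z = [0, 1, 2, 3, 4, 5]
x = [1, 0, 3, 2, 5, 4]
y = [1 + 0.5 * xi + 2 * zi for xi, zi in zip(x, z)]


def centered_sum(a, b):
    """Sum of products of deviations from the mean (n times the covariance)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


s_xx, s_zz, s_yy = centered_sum(x, x), centered_sum(z, z), centered_sum(y, y)
s_xz, s_xy, s_zy = centered_sum(x, z), centered_sum(x, y), centered_sum(z, y)

# Pearson correlation: strength and direction only.
r_xy = s_xy / sqrt(s_xx * s_yy)

# Simple regression of y on x alone (z is ignored).
slope_x_alone = s_xy / s_xx

# Multiple regression of y on x and z: solve the 2x2 normal equations.
det = s_xx * s_zz - s_xz ** 2
slope_x_controlled = (s_zz * s_xy - s_xz * s_zy) / det
slope_z_controlled = (s_xx * s_zy - s_xz * s_xy) / det

print(f"correlation(x, y)        = {r_xy:.2f}")
print(f"slope of x, z ignored    = {slope_x_alone:.2f}")
print(f"slope of x, z held fixed = {slope_x_controlled:.2f}")
```

In this toy data the slope of X drops sharply once Z enters the model, because much of what X appeared to explain was really Z.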
What R-squared value is "good enough" for product analytics?
In product analytics, R-squared values of 0.15-0.40 are typical and useful. An R-squared of 0.30 means your model explains 30% of the variation in the outcome. The remaining 70% comes from factors not in your model (user intent, market conditions, personal preferences) and from random noise. Unlike the physical sciences, where R-squared above 0.90 is often expected, human behavior is inherently noisy. A model with R-squared = 0.25 that identifies 3 actionable levers is more valuable than a model with R-squared = 0.50 that relies on variables you cannot control.
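For reference, R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch of that calculation, using made-up values and a hypothetical helper name `r_squared`:

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that the predictions account for."""
    mean_actual = sum(actual) / len(actual)
    # Variation the model leaves unexplained.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation around the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up outcome values and model predictions.
actual = [1, 2, 3, 4]
predicted = [1.5, 1.5, 3.5, 3.5]
print(round(r_squared(actual, predicted), 2))  # 0.8
```

Here the residual sum of squares is 1 and the total is 5, so the predictions account for 80% of the variation.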
When should I use logistic regression vs. linear regression?
Use linear regression when your dependent variable is numeric and roughly continuous (revenue, time, count of actions). Use logistic regression when your dependent variable is binary (converted/not, retained/churned, activated/not). The key difference is that linear regression predicts a continuous value, while logistic regression predicts a probability between 0 and 1. If you use linear regression on a binary outcome, you can get predicted values above 1 or below 0, which make no sense as probabilities.
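The out-of-range problem is easy to reproduce. The sketch below fits both models to a small made-up binary outcome using only the standard library; the logistic fit uses plain gradient ascent purely for illustration, whereas statistical packages use more robust solvers.

```python
from math import exp

# Made-up data: sessions in the first week (x) and whether the user converted (y).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 1, 1, 1]
n = len(x)

# Linear regression by ordinary least squares (closed form for one predictor).
mean_x, mean_y = sum(x) / n, sum(y) / n
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x


def linear_predict(value):
    return intercept + slope * value


def sigmoid(t):
    return 1 / (1 + exp(-t))


# Logistic regression fitted by gradient ascent on the mean log-likelihood.
b0, b1 = 0.0, 0.0
learning_rate = 0.05
for _ in range(5000):
    errors = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
    b0 += learning_rate * sum(errors) / n
    b1 += learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n


def logistic_predict(value):
    return sigmoid(b0 + b1 * value)


# Compare predictions inside and outside the observed range of x.
for value in (0, 4, 10):
    print(value, round(linear_predict(value), 2), round(logistic_predict(value), 2))
```

On this data the linear model predicts a value below 0 at 0 sessions and above 1 at 10 sessions, while the logistic predictions stay strictly between 0 and 1.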
How do I handle categorical variables like signup source?
Categorical variables must be encoded as dummy variables (also called indicator variables). If your categorical variable has K categories, create K-1 dummy variables. The category left out becomes the "reference category," and all coefficients are interpreted relative to it. For example, with signup_source (Organic, Paid, Referral), if Organic is the reference, the coefficient for Paid tells you how Paid users differ from Organic users. Your tool likely handles this automatically: R, the formula interface in Python's statsmodels, and most BI tools create dummies when you specify a variable as categorical. In pandas you typically encode explicitly, for example with `get_dummies`.
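To make the K-1 rule concrete, here is a minimal hand-rolled sketch of the encoding. The function name `dummy_encode` and the column naming scheme are assumptions for illustration; in practice your library's built-in encoder does the same job.

```python
def dummy_encode(values, reference, prefix):
    """Turn one categorical column into K-1 indicator columns.

    The reference category gets no column of its own: its rows are all zeros.
    """
    levels = sorted(set(values) - {reference})
    return [
        {f"{prefix}_{level}": int(value == level) for level in levels}
        for value in values
    ]


# Hypothetical signup_source column with three categories.
signup_source = ["Organic", "Paid", "Referral", "Paid"]
for row in dummy_encode(signup_source, reference="Organic", prefix="signup_source"):
    print(row)
```

Three categories yield two columns, `signup_source_Paid` and `signup_source_Referral`; an Organic user is the row where both are 0.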
Can I run regression analysis in a spreadsheet?
Yes, for simple cases. Google Sheets has `=LINEST()` for linear regression. Excel has the Analysis ToolPak add-in, which includes a full Regression tool. As a rule of thumb, both are practical for 1-5 independent variables and up to about 10,000 rows. For logistic regression, you need Python, R, or a BI tool. For anything with more than 5 variables or more than 10,000 rows, switch to Python (statsmodels or scikit-learn) or R for better diagnostics and handling of categorical variables.