🔍 Correlation coefficients are powerful tools for understanding relationships between variables, but with so many options available (Pearson, Spearman, Kendall, and more), how do you know which one is right for your data? Choosing the correct correlation coefficient can make or break your analysis, so let's dive into how to pick the best one for your specific needs.

💡 Understand the Types of Correlation Coefficients
Before deciding, it's important to understand the most common correlation measures:

Pearson Correlation Coefficient
🔹 Measures linear relationships between two continuous variables.
🔹 Assumes:
   🔹 Variables are normally distributed.
   🔹 The relationship is linear.
   🔹 No significant outliers.
🔹 Best for: Analyzing straightforward linear relationships, like height vs. weight or temperature vs. ice cream sales.

Spearman Rank Correlation
🔹 Measures monotonic relationships (linear or not) based on ranks rather than raw values.
🔹 Does not assume normality or linearity.
🔹 Best for: Non-linear but consistently increasing or decreasing trends, such as survey rankings or skewed data.

Kendall's Tau
🔹 Measures the strength of association based on concordant and discordant pairs.
🔹 Less sensitive to outliers than Pearson and Spearman.
🔹 Best for: Small datasets, ordinal data, or when robustness to outliers is critical.

Other Options
🔹 Point-Biserial Correlation: For one continuous variable and one binary variable (e.g., gender vs. income).
🔹 Phi Coefficient: For two binary variables (e.g., pass/fail vs. attended/didn't attend training).

❓ Ask the Right Questions About Your Data
Choosing the best correlation coefficient starts with understanding your data and the goals of your analysis. Ask yourself:

Are the Variables Continuous or Categorical?
🔹 If both variables are continuous, Pearson or Spearman may work.
🔹 If one or both variables are ordinal or categorical, consider Spearman or Kendall.

Is the Relationship Linear or Non-Linear?
🔹 Use Pearson if the relationship is linear.
🔹 Use Spearman if the relationship is monotonic but not necessarily linear.

Is the Data Normally Distributed?
🔹 Pearson assumes normality. If your data isn't normally distributed, Spearman or Kendall might be better choices.

Are There Outliers?
🔹 Pearson is sensitive to outliers; Spearman and Kendall are more robust.
🔹 If outliers are present, consider using Spearman or Kendall.

What's the Sample Size?
🔹 Kendall's Tau performs well with small datasets, while Pearson and Spearman need larger samples for stable estimates.

📈 Visualize Your Data
A scatter plot is your best friend when choosing a correlation coefficient. It helps you:
🔹 Identify whether the relationship is linear, monotonic, or something else.
🔹 Spot outliers that might influence the correlation.
🔹 Determine if transformations (e.g., log scaling) are needed before calculating the coefficient.
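As a quick sanity check on the guidance above, all three coefficients are available in SciPy. A minimal sketch on synthetic data with a monotonic but non-linear relationship (variable names are my own, not from any particular source):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# A monotonic but non-linear relationship: y grows with x, but not at a constant rate.
x = rng.uniform(0, 10, 200)
y = np.exp(0.4 * x) + rng.normal(0, 0.1, 200)

pearson_r, _ = stats.pearsonr(x, y)      # assumes a linear relationship
spearman_rho, _ = stats.spearmanr(x, y)  # rank-based, captures any monotonic trend
kendall_tau, _ = stats.kendalltau(x, y)  # based on concordant/discordant pairs

print(f"Pearson:  {pearson_r:.3f}")
print(f"Spearman: {spearman_rho:.3f}")
print(f"Kendall:  {kendall_tau:.3f}")
```

On data like this, Spearman and Kendall sit near 1 because the trend is strictly increasing, while Pearson is noticeably lower because the curve is not a straight line, illustrating why the linearity question above matters before picking a coefficient.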
Correlation Analysis in Engineering
Summary
Correlation analysis in engineering helps professionals understand how different variables relate to each other, making it easier to predict outcomes, design reliable systems, and interpret real-world data. By quantifying the strength and direction of relationships, engineers can select the most suitable methods for analyzing their data, ranging from soil properties to risk modeling.
- Select smartly: Choose correlation methods based on your data's characteristics, such as linearity, distribution, and presence of outliers, to avoid misleading conclusions.
- Validate assumptions: Always check your results against site-specific or context-based information, since correlations can change depending on conditions and empirical factors.
- Apply to modeling: Integrate correlation analysis when simulating risk, estimating material properties, or designing systems to make predictions more reliable and grounded in real data relationships.
-
🥊 𝗧𝗵𝗲 𝗖𝗼𝗿𝗿𝗲𝗹𝗮𝘁𝗶𝗼𝗻 𝗕𝗮𝘁𝘁𝗹𝗲 𝟮: 𝗣𝗲𝗮𝗿𝘀𝗼𝗻, 𝗦𝗽𝗲𝗮𝗿𝗺𝗮𝗻, 𝗞𝗲𝗻𝗱𝗮𝗹𝗹, 𝗕𝗶𝗰𝗼𝗿, 𝗗𝗶𝘀𝘁𝗮𝗻𝗰𝗲

"Is Pearson lying to you? Do we need any other type of correlation because of outliers?"

🧪 𝗟𝗲𝘁'𝘀 𝗹𝗲𝗮𝗿𝗻 𝗳𝗿𝗼𝗺 𝗮 𝘀𝗶𝗺𝘂𝗹𝗮𝘁𝗶𝗼𝗻: I plotted:
📍 X-axis: Experience (Years)
📍 Y-axis: Income ($)
Then I gradually added an 𝗼𝘂𝘁𝗹𝗶𝗲𝗿, increasing its distance and eventually flipping its direction. Here's what I found 👇

🤯 𝗥𝗲𝘀𝘂𝗹𝘁𝘀? Let's just say: not all correlation metrics are built equal...

𝟭. 𝗣𝗲𝗮𝗿𝘀𝗼𝗻
🚨 Skyrocketed when the outlier followed the trend
📉 Crashed hard when the outlier flipped direction
➡️ Most sensitive to outliers

𝟮. 𝗦𝗽𝗲𝗮𝗿𝗺𝗮𝗻
💪 Stayed steady as long as the rank order wasn't destroyed
⚠️ Dropped only when the outlier became extreme
➡️ More robust, but not invincible

𝟯. 𝗞𝗲𝗻𝗱𝗮𝗹𝗹 𝗧𝗮𝘂
🧱 Even more resistant than Spearman
✅ Only dropped under very aggressive distortion
➡️ Solid for ordinal or ranked data

𝟰. 𝗕𝗶𝘄𝗲𝗶𝗴𝗵𝘁 𝗠𝗶𝗱𝗰𝗼𝗿𝗿𝗲𝗹𝗮𝘁𝗶𝗼𝗻
🛡️ Practically ignored the outlier
🧮 Downweighted its impact
➡️ Most robust of all

𝟱. 𝗗𝗶𝘀𝘁𝗮𝗻𝗰𝗲 𝗖𝗼𝗿𝗿𝗲𝗹𝗮𝘁𝗶𝗼𝗻
🔄 Captured complex, non-linear effects
📉 Dropped as dependency faded, not just due to loss of linearity
➡️ Smart but moderately outlier-sensitive

𝗪𝗵𝗮𝘁 𝗶𝘁 𝘁𝗲𝗹𝗹𝘀 𝘂𝘀:
• One extreme point can artificially inflate or destroy your correlation.
• Just because Pearson is easy and standard doesn't mean it's always right.

💡 𝗠𝘆 𝘁𝗮𝗸𝗲𝗮𝘄𝗮𝘆:
• Use Pearson for clean, linear data, or regret it later
• Use Spearman or Kendall for monotonic trends or ranked data
• Use Biweight or Distance when facing outliers, nonlinearities, or real-world noise

🤔 𝗢𝗽𝗲𝗻 𝗤𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀 𝗳𝗼𝗿 𝗬𝗼𝘂:
• Which correlation metric do you trust when your data is noisy or complex?
• Have you tried robust correlation methods in production models?
• In which direction does an outlier hurt correlation the most? 📉 Left? Bottom? Diagonal?

📌 𝗧𝗟;𝗗𝗥: Don't just trust the numbers, question them. Your model's insights are only as good as the assumptions behind your stats.
🔁 Repost if your models have ever been fooled by a “strong” Pearson 💬 Comment your experiences with robust correlation 🔔 Follow me for more real-world simulations & data science breakdowns #DataScience #MachineLearning #FeatureEngineering #Correlation #Outliers #Statistics #EDA #DataAnalysis #Quant #MLInsights
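A tiny version of this experiment is easy to reproduce. The sketch below (my own variable names, not the original author's script) builds a clean experience-vs-income relationship, then appends a single direction-flipping outlier and compares Pearson against Spearman:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Clean, roughly linear data: experience (years) vs income ($)
experience = rng.uniform(1, 20, 50)
income = 30_000 + 3_000 * experience + rng.normal(0, 5_000, 50)

def report(x, y, label):
    """Print Pearson and Spearman coefficients for a dataset."""
    r, _ = stats.pearsonr(x, y)
    rho, _ = stats.spearmanr(x, y)
    print(f"{label}: Pearson={r:.3f}, Spearman={rho:.3f}")

report(experience, income, "clean")

# One outlier that flips direction: extreme experience, very low income
x_out = np.append(experience, 60)
y_out = np.append(income, 5_000)
report(x_out, y_out, "with outlier")
```

A single high-leverage point is enough to drag Pearson far below its clean-data value, while Spearman only pays a bounded rank penalty for the one discordant observation, which is exactly the sensitivity gap the post describes.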
-
📊 𝐄𝐬𝐭𝐢𝐦𝐚𝐭𝐢𝐧𝐠 𝐒𝐨𝐢𝐥 𝐌𝐨𝐝𝐮𝐥𝐮𝐬 𝐨𝐟 𝐄𝐥𝐚𝐬𝐭𝐢𝐜𝐢𝐭𝐲 (𝐄) 𝐟𝐫𝐨𝐦 𝐒𝐏𝐓 𝐍𝐮𝐦𝐛𝐞𝐫 (𝐍)

In geotechnical engineering, one of the most common challenges is estimating the modulus of elasticity (E) of soil when laboratory test data is not available. Since the Standard Penetration Test (SPT) is performed at almost every site, engineers often rely on empirical correlations between E and N-values to estimate stiffness for design and numerical modeling.

📘 The figure below summarizes several well-known correlations developed by researchers such as Webb (1969), Ferrent (1963), Begemann (1974), and Kulhawy & Mayne (1990). These correlations provide quick estimation methods for sand, silt, and gravelly soils, linking field penetration resistance to deformation modulus.

💡 Key Insight: Although these relationships are widely used, remember that they are empirical and depend on soil type, density, fines content, and stress history. Always validate with site-specific data or adjust using experience-based judgment.

🧠 Understanding and applying these correlations correctly can significantly improve the accuracy of settlement and deformation analyses, especially when using software like PLAXIS 2D/3D or other FEM tools.

#GeotechnicalEngineering #SoilMechanics #SPT #PLAXIS #FiniteElementAnalysis #CivilEngineering #FoundationDesign #Geotech #SoilElasticModulus #NumericalModeling #EngineeringEducation
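To make the idea concrete, here is a sketch of one commonly cited form of this family of correlations, E = α · pₐ · N60, in the style attributed to Kulhawy & Mayne (1990). The α values below are indicative assumptions for illustration only; verify them against the original reference and local practice before any design use:

```python
# Illustrative sketch of a Kulhawy & Mayne (1990)-style correlation:
#   E = alpha * pa * N60
# where pa is atmospheric pressure (~100 kPa) and alpha depends on soil type.
# Coefficients are indicative assumptions -- check the original reference.

PA_KPA = 100.0  # atmospheric pressure, kPa

ALPHA = {
    "sand_with_fines": 5,
    "clean_sand_nc": 10,   # normally consolidated clean sand
    "clean_sand_oc": 15,   # overconsolidated clean sand
}

def elastic_modulus_from_spt(n60: float, soil: str) -> float:
    """Estimate soil elastic modulus E (kPa) from the SPT N60 blow count."""
    return ALPHA[soil] * PA_KPA * n60

# Example: N60 = 20 in a clean, normally consolidated sand
print(elastic_modulus_from_spt(20, "clean_sand_nc"))  # 20000.0 kPa = 20 MPa
```

Wrapping the correlation in a small function like this makes it easy to swap in site-calibrated coefficients instead of the textbook defaults, which is exactly the validation step the post recommends.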
-
Risk, Monte Carlo Simulation, and Correlated Random Variables.

When performing Monte Carlo simulations, it may be of interest to simulate correlated random draws. Correlation structure is common in credit risk and quant finance in general: many assets exhibit correlation with other quantities, and that correlation must be accounted for. Some common areas where this appears frequently:

1. Portfolio loss distribution, e.g. the Vasicek loan loss model that underlies the Basel regulatory capital for the credit risk advanced IRB approach. The Vasicek model starts with risk Y and a correlation parameter between the borrower asset, modelled as X, and the systematic factor, modelled as U; this is called the Asymptotic Single Risk Factor (ASRF) model.
2. IFRS 9 expected credit loss (ECL) estimation using a Monte Carlo approach to generate scenarios.
3. Economic capital: model correlation between various asset classes and model the "true" loss distribution including diversification.
4. Traded credit risk: when simulating Potential Future Exposure (PFE), the simulation takes correlation between assets into account.
5. Market risk: similarly, when simulating risk measures, especially aggregated across assets, the correlation is plugged in as well.
6. Pricing of exotic products with a dependence structure requiring simulation.
7. It surely appears in other areas as well.

For two variables, suppose X and Y are correlated with correlation rho, and suppose X is a standard normal variable. Then Y can be constructed as Y = rho*X + sqrt(1 - rho^2)*U, where U is an independent standard normal variable. This is the result of the Cholesky decomposition of the 2x2 correlation matrix. For three or more variables, the linear algebra formulation is better.

We can sense-check the formula by simple simulation in Python: generate two series of random normals, apply the formula with a defined rho, then calculate the empirical correlation and see if it matches the defined value.

Happy reading..
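The sense check described above can be sketched in a few lines of NumPy (a minimal version under my own choice of rho and sample size, not the original author's script):

```python
import numpy as np

rng = np.random.default_rng(123)
rho = 0.7
n = 1_000_000

# Two independent standard normal series
x = rng.standard_normal(n)
u = rng.standard_normal(n)

# Two-variable Cholesky construction:
# Y = rho*X + sqrt(1 - rho^2)*U is standard normal with corr(X, Y) = rho
y = rho * x + np.sqrt(1 - rho**2) * u

empirical_rho = np.corrcoef(x, y)[0, 1]
print(f"target rho = {rho}, empirical rho = {empirical_rho:.4f}")

# For three or more variables, use the Cholesky factor of the full
# correlation matrix instead of the closed-form two-variable formula:
corr = np.array([[1.0, 0.5, 0.3],
                 [0.5, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
L = np.linalg.cholesky(corr)
z = rng.standard_normal((3, 100_000))       # independent standard normals
correlated = L @ z                          # rows now carry the target correlations
```

With a million draws the empirical correlation lands within a fraction of a percent of the target, confirming the formula; the matrix version is the same idea, with `L @ z` replacing the scalar combination.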