.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "packages/statistics/auto_examples/plot_wage_data.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_packages_statistics_auto_examples_plot_wage_data.py:


Visualizing factors influencing wages
=====================================

This example uses seaborn to quickly plot various factors relating to
wages, experience, and education.

Seaborn (https://seaborn.pydata.org) is a library that combines
visualization and statistical fits to show trends in data.

Note that seaborn's themes change the matplotlib style to an "excel-like"
look. Seaborn versions before 0.8 applied this theme as soon as seaborn was
imported; newer versions only apply it when a theme is set explicitly, for
example with ``seaborn.set_theme()``. These style changes affect all other
matplotlib figures in the session. To restore the defaults after running
this example, call ``plt.rcdefaults()``.

.. GENERATED FROM PYTHON SOURCE LINES 16-22

.. code-block:: default

    # Standard library imports
    import os

    # Plotting
    import matplotlib.pyplot as plt

.. GENERATED FROM PYTHON SOURCE LINES 23-24

Load the data

.. GENERATED FROM PYTHON SOURCE LINES 24-59

.. code-block:: default

    import pandas
    import requests

    if not os.path.exists('wages.txt'):
        # Download the file if it is not present
        r = requests.get('http://lib.stat.cmu.edu/datasets/CPS_85_Wages')
        # Fail loudly instead of saving an HTTP error page as the data file
        r.raise_for_status()
        with open('wages.txt', 'wb') as f:
            f.write(r.content)

    # Give names to the columns
    names = [
        'EDUCATION: Number of years of education',
        'SOUTH: 1=Person lives in South, 0=Person lives elsewhere',
        'SEX: 1=Female, 0=Male',
        'EXPERIENCE: Number of years of work experience',
        'UNION: 1=Union member, 0=Not union member',
        'WAGE: Wage (dollars per hour)',
        'AGE: years',
        'RACE: 1=Other, 2=Hispanic, 3=White',
        'OCCUPATION: 1=Management, 2=Sales, 3=Clerical, 4=Service, 5=Professional, 6=Other',
        'SECTOR: 0=Other, 1=Manufacturing, 2=Construction',
        'MARR: 0=Unmarried, 1=Married',
    ]

    # Keep only the variable name before the colon, e.g. 'WAGE'
    short_names = [n.split(':')[0] for n in names]

    # Skip the descriptive text before (27 lines) and after (6 lines) the
    # data table; sep=None lets the python engine detect the delimiter
    data = pandas.read_csv('wages.txt', skiprows=27, skipfooter=6, sep=None,
                           header=None, engine='python')
    data.columns = short_names

    # Log-transform the wages, because wages typically increase by
    # multiplicative factors
    import numpy as np
    data['WAGE'] = np.log10(data['WAGE'])

.. GENERATED FROM PYTHON SOURCE LINES 60-61

Plot scatter matrices highlighting different aspects

.. GENERATED FROM PYTHON SOURCE LINES 61-79

.. code-block:: default

    import seaborn

    # Pairwise scatter plots with a linear-regression fit in each panel
    seaborn.pairplot(data, vars=['WAGE', 'AGE', 'EDUCATION'], kind='reg')

    # The same plots split by a categorical variable: hue= colors the points
    # and draws a separate fit for each group
    seaborn.pairplot(data, vars=['WAGE', 'AGE', 'EDUCATION'],
                     kind='reg', hue='SEX')
    plt.suptitle('Effect of gender: 1=Female, 0=Male')

    seaborn.pairplot(data, vars=['WAGE', 'AGE', 'EDUCATION'],
                     kind='reg', hue='RACE')
    plt.suptitle('Effect of race: 1=Other, 2=Hispanic, 3=White')

    seaborn.pairplot(data, vars=['WAGE', 'AGE', 'EDUCATION'],
                     kind='reg', hue='UNION')
    plt.suptitle('Effect of union: 1=Union member, 0=Not union member')

.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /packages/statistics/auto_examples/images/sphx_glr_plot_wage_data_001.png
         :alt: plot wage data
         :srcset: /packages/statistics/auto_examples/images/sphx_glr_plot_wage_data_001.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /packages/statistics/auto_examples/images/sphx_glr_plot_wage_data_002.png
         :alt: Effect of gender: 1=Female, 0=Male
         :srcset: /packages/statistics/auto_examples/images/sphx_glr_plot_wage_data_002.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /packages/statistics/auto_examples/images/sphx_glr_plot_wage_data_003.png
         :alt: Effect of race: 1=Other, 2=Hispanic, 3=White
         :srcset: /packages/statistics/auto_examples/images/sphx_glr_plot_wage_data_003.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /packages/statistics/auto_examples/images/sphx_glr_plot_wage_data_004.png
         :alt: Effect of union: 1=Union member, 0=Not union member
         :srcset: /packages/statistics/auto_examples/images/sphx_glr_plot_wage_data_004.png
         :class: sphx-glr-multi-img


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Text(0.5, 0.98, 'Effect of union: 1=Union member, 0=Not union member')

.. GENERATED FROM PYTHON SOURCE LINES 80-81

Plot a simple regression

.. GENERATED FROM PYTHON SOURCE LINES 81-85

.. code-block:: default

    # Linear fit of log10(wage) against years of education
    seaborn.lmplot(y='WAGE', x='EDUCATION', data=data)

    plt.show()

.. image-sg:: /packages/statistics/auto_examples/images/sphx_glr_plot_wage_data_005.png
   :alt: plot wage data
   :srcset: /packages/statistics/auto_examples/images/sphx_glr_plot_wage_data_005.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 10.278 seconds)
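This example log-transforms ``WAGE`` on the grounds that wages typically increase by multiplicative factors. A minimal, standard-library-only sketch of why that helps, assuming a made-up model (``hypothetical_wage``, ``BASE_WAGE`` and ``RAISE_PER_YEAR`` are hypothetical names and values, not estimates from the CPS_85_Wages data): under a constant percentage raise per year of education, the raw wage curve bends upward, while its base-10 logarithm rises by the same amount, log10(1.08), every year.

.. code-block:: python

    import math

    # Hypothetical multiplicative wage model, for illustration only: each
    # extra year of education multiplies the hourly wage by RAISE_PER_YEAR.
    BASE_WAGE = 5.0        # dollars per hour at zero years of education (assumed)
    RAISE_PER_YEAR = 1.08  # 8% raise per extra year of education (assumed)


    def hypothetical_wage(education_years):
        """Hourly wage under the assumed multiplicative model."""
        return BASE_WAGE * RAISE_PER_YEAR ** education_years


    years = list(range(8, 17))
    wages = [hypothetical_wage(y) for y in years]
    log_wages = [math.log10(w) for w in wages]

    # On the raw scale the year-to-year increments grow with education...
    raw_steps = [b - a for a, b in zip(wages, wages[1:])]
    # ...but on the log10 scale every increment equals log10(1.08), so the
    # relationship becomes a straight line that a linear fit can capture.
    log_steps = [b - a for a, b in zip(log_wages, log_wages[1:])]

    print("raw steps:", [round(s, 3) for s in raw_steps])
    print("log10 steps:", [round(s, 5) for s in log_steps])

This is also why the ``kind='reg'`` panels above fit straight lines to the log-transformed ``WAGE`` rather than to raw dollars per hour.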
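Because ``WAGE`` holds log10 of the hourly wage, the slope of the line drawn by ``seaborn.lmplot`` reads as a multiplicative effect: one extra year of education multiplies the wage by ``10 ** slope``. The sketch below illustrates this on synthetic data; the true slope of 0.03, the intercept, the noise level and the sample size are assumptions, not values from CPS_85_Wages. It uses ``numpy.polyfit`` with ``deg=1`` for the ordinary least-squares line, the same kind of linear fit that lmplot draws.

.. code-block:: python

    import numpy as np

    # Synthetic stand-in for (EDUCATION, log10 WAGE) pairs; all parameters
    # below are assumptions chosen for illustration.
    rng = np.random.default_rng(0)
    n_people = 500
    education = rng.integers(6, 19, size=n_people).astype(float)
    true_slope = 0.03  # log10(dollars per hour) per year of education (assumed)
    log_wage = 0.4 + true_slope * education + rng.normal(0.0, 0.1, size=n_people)

    # Ordinary least-squares straight line; polyfit returns the highest-degree
    # coefficient first, i.e. (slope, intercept) for deg=1.
    slope, intercept = np.polyfit(education, log_wage, deg=1)

    # Undo the log10: one more year of education multiplies the wage by this.
    factor_per_year = 10 ** slope
    print(f"slope = {slope:.4f} log10($/h) per year of education")
    print(f"about {100 * (factor_per_year - 1):.1f}% higher wage per extra year")

The shaded band that lmplot adds around its line is a bootstrap confidence interval for the regression, which ``polyfit`` alone does not provide.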
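This example suggests calling ``plt.rcdefaults()`` to undo style changes after the fact. An alternative worth considering is to scope style changes with ``matplotlib.rc_context``, which restores the previous rcParams when the ``with`` block exits. In the sketch below, the ``"#EAEAF2"`` facecolor is an arbitrary stand-in for a seaborn-like theme, and the Agg backend is chosen only so the code runs without a display.

.. code-block:: python

    import matplotlib
    matplotlib.use("Agg")  # headless backend so the sketch runs without a display
    import matplotlib.pyplot as plt

    before = plt.rcParams["axes.facecolor"]

    # Style changes made inside rc_context are undone when the block exits.
    # "#EAEAF2" is an arbitrary stand-in for a theme color (assumed).
    with matplotlib.rc_context({"axes.facecolor": "#EAEAF2"}):
        fig, ax = plt.subplots()
        inside = plt.rcParams["axes.facecolor"]
        ax.plot([0, 1], [0, 1])
        plt.close(fig)

    after = plt.rcParams["axes.facecolor"]

    # plt.rcdefaults(), by contrast, resets rcParams to matplotlib's built-in
    # defaults, discarding settings from any matplotlibrc file as well.
    plt.rcdefaults()
    reset = plt.rcParams["axes.facecolor"]

The context-manager form leaves any user customizations from a matplotlibrc file in place, which ``plt.rcdefaults()`` does not.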