Generate the dataframe for replicability:
df = pd.DataFrame(np.random.randn(50, 1000), columns=list('ABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDEDABCDABCDED'))
Check for normalcy of distribution of each variable (note: this takes a long time to run)
# Set the column names
columns= df.columns
# Loop over all columns
fig, axs = plt.subplots(len(df.columns), figsize=(5, 25))
for n, col in enumerate(df.columns):
df[col].hist(ax=axs[n])
Result generates illegible histograms and takes a very long time to run.
The length of time is okay, but I am curious if anyone has suggestions for generating legible histograms (do not have to be fancy), which can be quickly reviewed for the entire dataframe to ensure the normality of the distributions.
from How to generate legible plots in pandas when looping over columns?
0 komentar:
Posting Komentar