NumPy
Import the numpy library under the alias np (★☆☆)
(Hint: import … as …)
import numpy as np
Create a zero vector of length 10 (★☆☆)
(Hint: np.zeros)
Z = np.zeros(10)
print(Z)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Create a zero vector of length 10 whose fifth value is 1 (★☆☆)
(Hint: array[4])
Z = np.zeros(10)
Z[4] = 1
print(Z)
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
Create a vector with values ranging from 10 to 49 (★☆☆)
(Hint: np.arange)
Z = np.arange(10,50)
print(Z)
[10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]
Reverse a vector (the first element becomes the last) (★☆☆)
(Hint: array[::-1])
Z = np.arange(50)
Z = Z[::-1]
print(Z)
[49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26
25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2
1 0]
Create a 3x3 matrix with values from 0 to 8 (★☆☆)
(Hint: reshape)
Z = np.arange(9).reshape(3,3)
print(Z)
[[0 1 2]
[3 4 5]
[6 7 8]]
Find the indices of the nonzero elements of the array [1,2,0,0,4,0] (★☆☆)
(Hint: np.where)
nz = np.array([1,2,0,0,4,0])
nz = np.where(nz!=0)
print(nz)
(array([0, 1, 4]),)
Create a 3x3x3 array of random values (★☆☆)
(Hint: np.random.random)
Z = np.random.random((3,3,3))
print(Z)
[[[0.49540183 0.03833072 0.17015454]
[0.53560863 0.00536714 0.76869732]
[0.57771647 0.00343808 0.07679618]]
[[0.51326329 0.34007645 0.31003736]
[0.05885512 0.61487165 0.86874288]
[0.37408803 0.24506961 0.50094522]]
[[0.56903475 0.12505482 0.5400201 ]
[0.46160486 0.00820837 0.56462576]
[0.10545321 0.17982915 0.89136815]]]
Create a 10x10 array of random values and find its minimum and maximum (★☆☆)
(Hint: min, max)
Z = np.random.random((10,10))
Zmin, Zmax = Z.min(), Z.max()
print(Zmin, Zmax)
0.007735040835088136 0.9787372284134425
Create a random vector of length 30 and find its mean (★☆☆)
(Hint: mean)
Z = np.random.random(30)
m = Z.mean()
print(m)
0.4802736181542377
Create a 2D array with 1 on the border and 0 inside (★☆☆)
(Hint: array[1:-1, 1:-1])
Z = np.ones((10,10))
Z[1:-1,1:-1] = 0
print(Z)
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
What is the result of each of the following expressions? (★☆☆)
(Hint: NaN = not a number)
0 * np.nan
np.nan == np.nan
np.nan - np.nan
0.3 == 3 * 0.1
print(0 * np.nan)
nan
print(np.nan == np.nan)
False
print(np.nan - np.nan)
nan
print(0.3 == 3 * 0.1)
False
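The two False results above follow from IEEE 754 floating-point behavior: NaN compares unequal to everything, including itself, and 3 * 0.1 is not exactly 0.3 in binary. A minimal sketch of how such values are usually tested instead, using np.isnan and np.isclose (the variable names are illustrative, not from the original exercise):

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

# NaN never compares equal to anything, so np.isnan is used to locate it
nan_mask = np.isnan(values)

# Floats are compared with a tolerance instead of ==
exact_equal = (0.3 == 3 * 0.1)
close_enough = np.isclose(0.3, 3 * 0.1)

print(nan_mask)
print(exact_equal, close_enough)
```

np.isclose uses default relative and absolute tolerances; these may need tightening or loosening depending on the scale of the data.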
Create an 8x8 matrix and fill it with a checkerboard pattern (★☆☆)
(Hint: array[::2])
Z = np.zeros((8,8),dtype=int)
Z[1::2,::2] = 1
Z[::2,1::2] = 1
print(Z)
[[0 1 0 1 0 1 0 1]
[1 0 1 0 1 0 1 0]
[0 1 0 1 0 1 0 1]
[1 0 1 0 1 0 1 0]
[0 1 0 1 0 1 0 1]
[1 0 1 0 1 0 1 0]
[0 1 0 1 0 1 0 1]
[1 0 1 0 1 0 1 0]]
Normalize a 5x5 random matrix (★☆☆)
(Hint: min-max and z-score)
Z = np.random.random((5,5))
Zmax, Zmin = Z.max(), Z.min()
Z = (Z - Zmin)/(Zmax - Zmin)
print(Z)
[[0.28005346 0.76908806 0.94344695 0.64199284 0.12711646]
[0.51537725 0.12372809 0.45086104 0.29787342 0.84346246]
[0.71279596 0.60373318 0.04030923 0. 0.80699155]
[1. 0.67174818 0.12411185 0.34138983 0.40129933]
[0.57061635 0.40793513 0.1658807 0.62630389 0.62997557]]
Z = np.random.random((5,5))
Zmean, Zstd = Z.mean(), Z.std()
Z = (Z - Zmean)/Zstd
print(Z)
[[-0.3881769 0.43273592 -1.1007492 0.11627415 1.20322183]
[-1.88538572 0.3273551 1.05532266 0.78136778 0.14407397]
[ 0.85746185 -1.55078272 -0.50247391 1.02981358 -0.99543472]
[-0.42427793 0.02952572 0.41339576 -0.11907466 0.28742804]
[ 1.1604349 -1.77222348 1.31790618 -1.68842404 1.27068583]]
Given a 1D array, negate all elements that are greater than 3 and at most 8 (★☆☆)
(Hint: >, <=)
Z = np.arange(11)
Z[(3 < Z) & (Z <= 8)] = Z[(3 < Z) & (Z <= 8)] * (-1)
print(Z)
[ 0 1 2 3 -4 -5 -6 -7 -8 9 10]
What is the output of the following script? (★☆☆)
(Hint: np.sum)
print(sum(range(5),-1))
from numpy import *
print(sum(range(5),-1))
print(sum(range(5),-1)) # built-in sum: -1 is the start value, so 0+1+2+3+4-1
9
from numpy import *
print(sum(range(5),-1)) # numpy.sum: -1 is the axis argument
10
Consider an integer vector Z; which of the following expressions are legal? (★☆☆)
Z**Z
2 << Z >> 2
Z <- Z
Z/1/1
Z<Z>Z
Z = np.arange(5)
Z ** Z # legal
array([ 1, 1, 4, 27, 256])
Z = np.arange(5)
2 << Z >> 2 # legal
array([0, 1, 2, 4, 8])
Z = np.arange(5)
Z <- Z # legal
array([False, False, False, False, False])
Z = np.arange(5)
Z/1/1 # legal
array([0., 1., 2., 3., 4.])
Z = np.arange(5)
Z<Z>Z # illegal: a chained comparison needs the truth value of an array
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
What is the result of each of the following expressions? (★☆☆)
np.array(0) / np.array(0)
np.array(0) // np.array(0)
np.array([np.nan]).astype(int)
print(np.array(0) / np.array(0))
nan
RuntimeWarning: invalid value encountered in true_divide
print(np.array(0) // np.array(0))
0
RuntimeWarning: divide by zero encountered in floor_divide
print(np.array([np.nan]).astype(int)) # casting NaN to int is not well defined; the value shown may differ by platform
[-9223372036854775808]
Extract the integer part of a random array in three different ways (★★☆)
(Hint: %, np.floor, np.ceil)
Z = np.random.uniform(0,10,10)
print (Z - Z%1)
[5. 3. 8. 7. 3. 8. 7. 3. 3. 5.]
print (np.floor(Z))
[5. 3. 8. 7. 3. 8. 7. 3. 3. 5.]
print (np.ceil(Z)-1) # valid here because the values are positive and not whole numbers
[5. 3. 8. 7. 3. 8. 7. 3. 3. 5.]
Create a 5x5 matrix in which every row holds the values 0 to 4 (★★☆)
(Hint: np.arange)
Z = np.zeros((5,5))
Z += np.arange(5)
print (Z)
[[0. 1. 2. 3. 4.]
[0. 1. 2. 3. 4.]
[0. 1. 2. 3. 4.]
[0. 1. 2. 3. 4.]
[0. 1. 2. 3. 4.]]
Create a random vector of length 10 and sort it (★★☆)
(Hint: sort)
Z = np.random.random(10)
Z.sort()
print (Z)
[0.05948668 0.06772389 0.18053073 0.3690779 0.43207858 0.59212272
0.61474614 0.64012558 0.67395373 0.72028118]
Create a random vector of length 10 and replace its maximum value with 1 (★★☆)
(Hint: argmax)
Z = np.random.random(10)
Z[Z.argmax()] = 1
print (Z)
[0.2322195 0.72417001 0.54942971 0.83360414 1. 0.05253964
0.91218709 0.64805915 0.73789832 0.09189523]
Subtract the mean of each row from a matrix (★★☆)
(Hint: mean(axis=)
X = np.random.rand(5, 10)
Y = X - X.mean(axis=1).reshape(5, 1)
print (Y)
[[ 0.02477607 -0.44978471 0.35835226 -0.11303505 0.41311033 -0.04494334
0.37069344 -0.33173567 -0.2418161 0.01438276]
[-0.21299384 0.32298826 0.00319469 -0.24739807 -0.36345109 -0.28780506
0.37067288 -0.08722747 0.57116507 -0.06914538]
[ 0.32097414 -0.13742142 0.00140244 -0.45716547 0.47243945 -0.19557543
0.29747343 -0.37215733 -0.02343571 0.0934659 ]
[ 0.05670052 0.22122356 0.02275285 -0.11814782 -0.12763654 -0.22702062
-0.2752823 0.21867024 -0.20739751 0.43613762]
[ 0.50697231 0.19782196 0.01629516 -0.24005828 -0.27824507 -0.08727165
-0.26113724 -0.29081586 0.44296733 -0.00652866]]
How do you sort an array by its n-th column? (★★☆)
(Hint: argsort)
Z = np.random.randint(0,10,(3,3))
print (Z)
print (Z[Z[:,1].argsort()])
[[4 3 3]
[1 0 1]
[7 8 4]]
[[1 0 1]
[4 3 3]
[7 8 4]]
Consider the vector [1,2,3,4,5]; how do you build a new vector with 3 consecutive zeros between each pair of values? (★★★)
(Hint: array[::4])
Z = np.array([1,2,3,4,5])
nz = 3
Z0 = np.zeros(len(Z) + (len(Z)-1)*(nz))
Z0[::nz+1] = Z
print (Z0)
[1. 0. 0. 0. 2. 0. 0. 0. 3. 0. 0. 0. 4. 0. 0. 0. 5.]
How do you swap any two rows of an array? (★★★)
(Hint: array[[]] = array[[]])
A = np.arange(25).reshape(5,5)
A[[0,1]] = A[[1,0]]
print (A)
[[ 5 6 7 8 9]
[ 0 1 2 3 4]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
How do you compute the averages of an array over a sliding window? (★★★)
(Hint: np.cumsum)
def moving_average(a, n=3):
    # Running totals; subtracting the total from n positions earlier leaves each window's sum
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n
Z = np.arange(20)
print(moving_average(Z, n=3))
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18.]
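The same sliding-window mean can also be written as a convolution with a uniform kernel. The sketch below is an alternative to the cumsum version, not the original author's method; the function name moving_average_convolve is hypothetical.

```python
import numpy as np

def moving_average_convolve(a, n=3):
    # Each output element is the dot product of a length-n window with weights 1/n
    return np.convolve(a, np.ones(n) / n, mode="valid")

Z = np.arange(20)
result = moving_average_convolve(Z, n=3)
print(result)
```

mode="valid" keeps only the positions where the window fits entirely inside the array, so the result has len(a) - n + 1 elements, matching the cumsum version.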
How do you get the n largest values of an array? (★★★)
(Hint: np.argsort)
Z = np.arange(10000)
n = 5
print (Z[np.argsort(Z)[-n:]])
[9995 9996 9997 9998 9999]
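Sorting the whole array is more work than needed when only a few of the largest values are wanted. A minimal sketch using np.argpartition, which only guarantees that the n largest values end up in the last n positions (the shuffled test array is an assumption made for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.permutation(10000)  # the values 0..9999 in random order
n = 5

# argpartition moves the n largest values into the last n slots, in no particular order
idx = np.argpartition(Z, -n)[-n:]
largest = np.sort(Z[idx])  # sort just these n values for display
print(largest)
```

For small arrays the difference is negligible; the partition-based approach mainly pays off when the array is large and n is small.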
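Consider a large vector Z; compute its cube in two different ways (★★★)
(Hint: np.power, *)
x = np.random.rand() # a single value here for brevity; both expressions also work elementwise on a vector
np.power(x,3)
0.6061196882414749
# Method 2
x*x*x
0.6061196882414749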
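Extra 1. Consider two arrays A and B of shapes (8,3) and (2,2). How do you find the rows of A that contain the elements of each row of B, regardless of the order of the elements in B?
Extra 2. Generate an arbitrary (4,4) 2D array, find its three largest elements and their coordinates, and output the result in the form {(row index, column index): value, ...}
Extra 3. Randomly generate two 2D arrays. Which elements do the two arrays have in common? Print them if there are any; otherwise print "no common elements".
Extra 4. A line passes through the two points P0(x1,y1) and P1(x2,y2). Compute the distance from the point p(x3,y3) to this line.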
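The appendix questions above are left unanswered in the text. The following is one possible sketch for questions 2 to 4, not a reference solution; the function names top_k_with_coords, common_elements and point_line_distance are hypothetical, and the random inputs are chosen only for illustration.

```python
import numpy as np

def top_k_with_coords(a, k=3):
    # Indices of the k largest values in the flattened array, largest first
    flat = np.argsort(a, axis=None)[-k:][::-1]
    rows, cols = np.unravel_index(flat, a.shape)
    return {(int(r), int(c)): a[r, c].item() for r, c in zip(rows, cols)}

def common_elements(a, b):
    # intersect1d flattens both inputs and returns the sorted unique shared values
    common = np.intersect1d(a, b)
    return common if common.size else "no common elements"

def point_line_distance(p0, p1, p):
    # |cross product of (p1 - p0) and (p - p0)| divided by the length of (p1 - p0)
    p0, p1, p = (np.asarray(v, dtype=float) for v in (p0, p1, p))
    d = p1 - p0
    cross = d[0] * (p[1] - p0[1]) - d[1] * (p[0] - p0[0])
    return abs(cross) / np.hypot(d[0], d[1])

rng = np.random.default_rng()
print(top_k_with_coords(rng.random((4, 4))))
print(common_elements(rng.integers(0, 10, (3, 3)), rng.integers(0, 10, (2, 4))))
print(point_line_distance((0, 0), (1, 0), (3, 4)))
```

point_line_distance assumes P0 and P1 are distinct points; if they coincide, the line is undefined and the division yields nan.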
Pandas
Convert a list into a Pandas DataFrame
import pandas as pd
my_list=[('join',25,'male'),('lisa',30,'female'),('david',18,'male')] # ages as integers so the age column is numeric
df=pd.DataFrame(my_list,columns=['Name','age','gender'])
print(df)
    Name  age  gender
0   join   25    male
1   lisa   30  female
2  david   18    male
Read data from a CSV file into a Pandas DataFrame
df=pd.read_csv('path/to/file.csv') # placeholder path
print(df)
Get the number of rows and columns of a Pandas DataFrame
import pandas as pd
df=pd.DataFrame({'A':[1,2,3],"B":[4,5,6],"C":[7,8,9]})
print(df.shape)
(3, 3)
Get the column names of a Pandas DataFrame
import pandas as pd
data={"name":['alex','box','chery'],'age':[18,20,12]}
df=pd.DataFrame(data)
print(df.columns)
Index(['name', 'age'], dtype='object')
Get the index of a Pandas DataFrame
import pandas as pd
data={"name":['alex','box','chery'],'age':[18,20,12]}
df=pd.DataFrame(data)
print(df.index)
RangeIndex(start=0, stop=3, step=1)
Read data from a CSV file and show the first few rows
import pandas as pd
df=pd.read_csv("path/to/file.csv") # placeholder path
df.head(3)
Get the data types of a Pandas DataFrame
import pandas as pd
data={"name":['alex','bob','chery'],'age':[10,12,13]}
df=pd.DataFrame(data)
print(df.dtypes)
name object
age int64
dtype: object
Get summary statistics of a Pandas DataFrame
import pandas as pd
df=pd.DataFrame({'A':[1,2,3,4,5],'B':[2.1,4.2,6.3,8.4,10.5],'C':['a','b','a','b','a']})
df.describe()
              A          B
count  5.000000   5.000000
mean   3.000000   6.300000
std    1.581139   3.320392
min    1.000000   2.100000
25%    2.000000   4.200000
50%    3.000000   6.300000
75%    4.000000   8.400000
max    5.000000  10.500000
How do you select rows of a Pandas DataFrame?
import pandas as pd
df=pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Paris', 'London']})
df
      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London
first_row=df.loc[0]
first_row
Name Alice
Age 25
City New York
Name: 0, dtype: object
rows_0_2=df.loc[[0,2],:] # the rows labeled 0 and 2
rows_0_2
      Name  Age      City
0    Alice   25  New York
2  Charlie   35    London
sub=df.loc[[0,2],['Name','Age']]
sub
      Name  Age
0    Alice   25
2  Charlie   35
How do you select columns of a Pandas DataFrame?
import pandas as pd
df=pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Paris', 'London']})
df
      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London
df['Name']
0 Alice
1 Bob
2 Charlie
Name: Name, dtype: object
df[['Name','Age']]
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
df.iloc[:,0]
0 Alice
1 Bob
2 Charlie
Name: Name, dtype: object
df.iloc[:,1:3]
   Age      City
0   25  New York
1   30     Paris
2   35    London
How do you select rows and columns of a Pandas DataFrame?
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Paris', 'London']})
df
      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London
sub=df.loc[[0,2],['Name','Age']]
sub
      Name  Age
0    Alice   25
2  Charlie   35
sub1=df.iloc[[0,2],[0,1]]
sub1
      Name  Age
0    Alice   25
2  Charlie   35
How do you filter the rows of a Pandas DataFrame?
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Paris', 'London']})
df
      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London
bool_index=df["Age"]>25
bool_index
0 False
1 True
2 True
Name: Age, dtype: bool
filt=df[bool_index]
print(filt)
Name Age City
1 Bob 30 Paris
2 Charlie 35 London
How do you filter the rows and columns of a Pandas DataFrame?
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Paris', 'London']})
df
      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London
# Select the rows with Age over 25 and the 'Name' and 'Age' columns
sub=df.loc[df['Age']>25,['Name','Age']]
print(sub)
Name Age
1 Bob 30
2 Charlie 35
sub=df.loc[df['Name']=='Bob',['Age','City']]
sub
sub1=df.iloc[[0,2],[0,1]]
sub1
      Name  Age
0    Alice   25
2  Charlie   35
sub1=df.iloc[1,1:]
sub1
Age 30
City Paris
Name: 1, dtype: object
How do you sort a Pandas DataFrame by the values of a column?
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 10, 35],
'City': ['New York', 'Paris', 'London']})
df
      Name  Age      City
0    Alice   25  New York
1      Bob   10     Paris
2  Charlie   35    London
df_sort=df.sort_values('Age')
df_sort
      Name  Age      City
1      Bob   10     Paris
0    Alice   25  New York
2  Charlie   35    London
df_sorted=df.sort_values('Name',ascending=False) # ascending takes a bool, not the string 'False'
df_sorted
      Name  Age      City
2  Charlie   35    London
1      Bob   10     Paris
0    Alice   25  New York
df_sorted=df.sort_values(['Age','Name'])
print(df_sorted)
Name Age City
1 Bob 10 Paris
0 Alice 25 New York
2 Charlie 35 London
How do you aggregate a Pandas DataFrame?
import pandas as pd
# Create a DataFrame of sales data
df = pd.DataFrame({'Product': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
'SalesDate': ['2022-01-01', '2022-01-01', '2022-01-01',
'2022-01-02', '2022-01-02', '2022-01-02',
'2022-01-03', '2022-01-03', '2022-01-03'],
'SalesAmount': [100, 200, 150, 50, 75, 125, 300, 250, 200]})
df
  Product   SalesDate  SalesAmount
0       A  2022-01-01          100
1       B  2022-01-01          200
2       C  2022-01-01          150
3       A  2022-01-02           50
4       B  2022-01-02           75
5       C  2022-01-02          125
6       A  2022-01-03          300
7       B  2022-01-03          250
8       C  2022-01-03          200
agg_df=df.groupby('Product')['SalesAmount'].agg(['sum','mean','max'])
agg_df
         sum        mean  max
Product
A        450  150.000000  300
B        525  175.000000  250
C        475  158.333333  200
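How do you combine Pandas DataFrames?
import pandas as pd
# Keep East Asian column names aligned in printed output (the columns below are ID, Chinese, Math, English and PE)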
pd.set_option('display.unicode.east_asian_width', True)
df1 = pd.DataFrame({'编号':['mr001','mr002','mr003'],
'语文':[110,105,109],
'数学':[105,88,120],
'英语':[99,115,130]})
print(df1)
编号 语文 数学 英语
0 mr001 110 105 99
1 mr002 105 88 115
2 mr003 109 120 130
df2 = pd.DataFrame({'编号':['mr002','mr001','mr003','mr004'],
'体育':[34.5,39.7,38,45]})
print(df2)
编号 体育
0 mr002 34.5
1 mr001 39.7
2 mr003 38.0
3 mr004 45.0
cont_df=pd.concat([df1,df2],axis=0)
cont_df
    编号   语文   数学   英语  体育
0  mr001  110.0  105.0   99.0   NaN
1  mr002  105.0   88.0  115.0   NaN
2  mr003  109.0  120.0  130.0   NaN
0  mr002    NaN    NaN    NaN  34.5
1  mr001    NaN    NaN    NaN  39.7
2  mr003    NaN    NaN    NaN  38.0
3  mr004    NaN    NaN    NaN  45.0
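pd.concat with axis=0 stacks the two frames and leaves NaN wherever a column is missing. When the goal is to line up rows that share the same ID, a keyed join is usually what is wanted. A minimal sketch with pd.merge on the same two frames; joining on the '编号' (ID) column is an assumption about the intent, not something the original states:

```python
import pandas as pd

df1 = pd.DataFrame({'编号': ['mr001', 'mr002', 'mr003'],
                    '语文': [110, 105, 109],
                    '数学': [105, 88, 120],
                    '英语': [99, 115, 130]})
df2 = pd.DataFrame({'编号': ['mr002', 'mr001', 'mr003', 'mr004'],
                    '体育': [34.5, 39.7, 38, 45]})

# Inner join keeps only the IDs present in both frames
inner = pd.merge(df1, df2, on='编号', how='inner')
# Outer join keeps every ID and fills the gaps with NaN
outer = pd.merge(df1, df2, on='编号', how='outer')

print(inner)
print(outer)
```

The inner join drops mr004 because it has no scores in df1, while the outer join keeps it with NaN in the three score columns.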
How do you delete a column from a Pandas DataFrame?
import pandas as pd
data = {
'name': ['Jack', 'Sarah', 'Mike', 'David'],
'age': [24, 30, 21, 29],
'height': [175, 165, 180, 170]
}
df=pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
df.drop('height',axis=1,inplace=True)
print(df)
name age
0 Jack 24
1 Sarah 30
2 Mike 21
3 David 29
How do you add a row to a Pandas DataFrame?
import pandas as pd
data= {
'name': ['Jack', 'Sarah', 'Mike', 'David'],
'age': [24, 30, 21, 29],
'height': [175, 165, 180, 170]
}
df=pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
new_row={'name':'jeames','age':28,'height':181}
df.loc[len(df)]=new_row
print(df)
name age height
0 Jack 24 175
1 Sarah 30 165
2 Mike 21 180
3 David 29 170
4 jeames 28 181
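Assigning to df.loc[len(df)] works here because the frame has the default 0..n-1 index; if a row labeled len(df) already exists, the assignment overwrites it instead of appending. A sketch of an alternative that does not depend on the index labels, using pd.concat (the name new_rows is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Jack', 'Sarah', 'Mike', 'David'],
                   'age': [24, 30, 21, 29],
                   'height': [175, 165, 180, 170]})
new_rows = pd.DataFrame([{'name': 'jeames', 'age': 28, 'height': 181}])

# ignore_index=True renumbers the combined rows 0..n-1
df = pd.concat([df, new_rows], ignore_index=True)
print(df)
```

The same call accepts several new rows at once, which tends to be cheaper than adding them one at a time.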
How do you delete a row from a Pandas DataFrame?
data = {
'name': ['Jack', 'Sarah', 'Mike', 'David'],
'age': [24, 30, 21, 29],
'height': [175, 165, 180, 170]
}
df = pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
df.drop(1,inplace=True)
print(df)
name age height
0 Jack 24 175
2 Mike 21 180
3 David 29 170
How do you select a range of rows in a Pandas DataFrame by position?
import pandas as pd
# Create sample data
data = {
'name': ['Jack', 'Sarah', 'Mike', 'David'],
'age': [24, 30, 21, 29],
'height': [175, 165, 180, 170]
}
df = pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
new_df=df[1:4]
new_df
    name  age  height
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
How do you select a range of rows in a Pandas DataFrame by label?
data={
'name': ['Jack', 'Sarah', 'Mike', 'David'],
'age': [24, 30, 21, 29],
'height': [175, 165, 180, 170]
}
data
{'name': ['Jack', 'Sarah', 'Mike', 'David'],
'age': [24, 30, 21, 29],
'height': [175, 165, 180, 170]}
df=pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
df.set_index('name',inplace=True)
new_df=df.loc['Sarah':'David']
print(new_df)
age height
name
Sarah 30 165
Mike 21 180
David 29 170
How do you select rows of a Pandas DataFrame by a condition?
data = {
'name': ['Jack', 'Sarah', 'Mike', 'David'],
'age': [24, 30, 21, 29],
'height': [175, 165, 180, 170]
}
df = pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
new_df=df[df['height']>170]
print(new_df)
name age height
0 Jack 24 175
2 Mike 21 180
How do you sort a Pandas DataFrame by a column?
# Create sample data
data = {
'name': ['Jack', 'Sarah', 'Mike', 'David'],
'age': [24, 30, 21, 29],
'height': [175, 165, 180, 170]
}
df = pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
new_df=df.sort_values(by='age',ascending=True)
print(new_df)
name age height
2 Mike 21 180
0 Jack 24 175
3 David 29 170
1 Sarah 30 165
How do you compute the sum of a column in a Pandas DataFrame?
# Create sample data
data = {
'name': ['Jack', 'Sarah', 'Mike', 'David'],
'age': [24, 30, 21, 29],
'height': [175, 165, 180, 170]
}
df = pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
total_age=df['age'].sum()
total_age
104
How do you compute the mean of a column in a Pandas DataFrame?
# Create sample data
data = {
'name': ['Jack', 'Sarah', 'Mike', 'David'],
'age': [24, 30, 21, 29],
'height': [175, 165, 180, 170]
}
df = pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
avg_age=df["age"].mean()
avg_age
26.0
How do you compute the median of a column in a Pandas DataFrame?
# Create sample data
data = {
'name': ['Jack', 'Sarah', 'Mike', 'David', 'Zoe'],
'age': [24, 30, 21, 29, 28],
'height': [175, 165, 180, 170, 172]
}
df = pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
4    Zoe   28     172
median_height=df["height"].median()
median_height
172.0
How do you compute the standard deviation of a column in a Pandas DataFrame?
# Create sample data
data = {
'name': ['Jack', 'Sarah', 'Mike', 'David', 'Zoe'],
'age': [24, 30, 21, 29, 28],
'height': [175, 165, 180, 170, 172]
}
df = pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
4    Zoe   28     172
std_age=df["age"].std()
std_age
3.7815340802378072
How do you compute the variance of a column in a Pandas DataFrame?
import pandas as pd
# Create sample data
data = {
'name': ['Jack', 'Sarah', 'Mike', 'David', 'Zoe'],
'age': [24, 30, 21, 29, 28],
'height': [175, 165, 180, 170, 172]
}
df = pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
4    Zoe   28     172
var_age=df['age'].var()
print(var_age)
14.299999999999999
How do you find the maximum and minimum values in a Pandas DataFrame?
import pandas as pd
# Create sample data
data = {
'name': ['Jack', 'Sarah', 'Mike', 'David', 'Zoe'],
'age': [24, 30, 21, 29, 28],
'height': [175, 165, 180, 170, 172]
}
df = pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
4    Zoe   28     172
max_age=df['age'].max()
max_age
30
min_age=df['age'].min()
min_age
21
How do you find the maximum and minimum of a particular row in a Pandas DataFrame?
# Create sample data
data = {
'name': ['Jack', 'Sarah', 'Mike', 'David', 'Zoe'],
'age': [24, 30, 21, 29, 28],
'height': [175, 165, 180, 170, 172]
}
df = pd.DataFrame(data)
df
    name  age  height
0   Jack   24     175
1  Sarah   30     165
2   Mike   21     180
3  David   29     170
4    Zoe   28     172
row_values=df.loc[2,['age','height']] # the numeric values of the row labeled 2
max_value=row_values.max()
min_value=row_values.min()
max_value
180
min_value
21
How do you replace specific values in a Pandas DataFrame?
data = {'name': ['Jack', 'Sarah', 'Mike', 'David']}
df = pd.DataFrame(data)
df
    name
0   Jack
1  Sarah
2   Mike
3  David
df["name"]=df["name"].replace(to_replace=r"ck",value="bb",regex=True)
print(df)
name
0 Jabb
1 Sarah
2 Mike
3 David
How do you replace specific values with missing values in a Pandas DataFrame?
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'c', 'd', 'e'], 'C': [0, 1, 2, 3, 4]})
df
   A  B  C
0  1  a  0
1  2  b  1
2  3  c  2
3  4  d  3
4  5  e  4
import numpy as np
df=df.replace(3,np.nan) # np.nan is the supported spelling; the np.NaN alias was removed in NumPy 2.0
df
     A  B    C
0  1.0  a  0.0
1  2.0  b  1.0
2  NaN  c  2.0
3  4.0  d  NaN
4  5.0  e  4.0
How do you fill missing values in a Pandas DataFrame?
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, np.nan, 4], 'B': [5, np.nan, 7, 8]})
df
     A    B
0  1.0  5.0
1  2.0  NaN
2  NaN  7.0
3  4.0  8.0
df.fillna(value=0,inplace=True)
print(df)
A B
0 1.0 5.0
1 2.0 0.0
2 0.0 7.0
3 4.0 8.0
df = pd.DataFrame({'A': [1, 2, np.nan, 4], 'B': [5, np.nan, 7, 8]})
df
     A    B
0  1.0  5.0
1  2.0  NaN
2  NaN  7.0
3  4.0  8.0
df.ffill(inplace=True) # forward fill; fillna(method='ffill') is deprecated in recent pandas versions
print(df)
A B
0 1.0 5.0
1 2.0 5.0
2 2.0 7.0
3 4.0 8.0
df = pd.DataFrame({'A': [1, 2, np.nan, 4], 'B': [5, np.nan, 7, 8]})
df
     A    B
0  1.0  5.0
1  2.0  NaN
2  NaN  7.0
3  4.0  8.0
df.bfill(inplace=True) # backward fill; fillna(method='bfill') is deprecated in recent pandas versions
print(df)
A B
0 1.0 5.0
1 2.0 7.0
2 4.0 7.0
3 4.0 8.0
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, np.nan, 4], 'B': [5, np.nan, 7, 8]})
df
     A    B
0  1.0  5.0
1  2.0  NaN
2  NaN  7.0
3  4.0  8.0
df.fillna(value={'A':-1,'B':-2},inplace=True)
df
     A    B
0  1.0  5.0
1  2.0 -2.0
2 -1.0  7.0
3  4.0  8.0
How do you drop missing values from a Pandas DataFrame?
import pandas as pd
df=pd.DataFrame({'A': [1, 2, None, 4],
'B': [5, None, 7, 8],
'C': [9, 10, 11, None]})
df
     A    B     C
0  1.0  5.0   9.0
1  2.0  NaN  10.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN
df_dropna=df.dropna()
print(df_dropna)
A B C
0 1.0 5.0 9.0
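With its default arguments, dropna removes every row that has at least one missing value, which here leaves a single row. The sketch below shows three commonly used options on the same data (subset, thresh and axis); the result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4],
                   'B': [5, None, 7, 8],
                   'C': [9, 10, 11, None]})

# Drop a row only if column 'A' is missing
by_subset = df.dropna(subset=['A'])
# Keep rows that have at least 2 non-missing values
by_thresh = df.dropna(thresh=2)
# Drop columns (not rows) that contain any missing value
by_column = df.dropna(axis=1)

print(by_subset)
print(by_thresh)
print(by_column)
```

In this data every column contains one missing value, so the column-wise call returns a frame with four rows and no columns.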
How do you use aggregate functions in Pandas?
import pandas as pd
data = {'Name':['Tom', 'Tom', 'Mary', 'Mary', 'Jack', 'Jack'],
'Subject':['Math', 'English', 'Math', 'English', 'Math', 'English'],
'Score':[80, 70, 85, 75, 90, 95]}
data
{'Name': ['Tom', 'Tom', 'Mary', 'Mary', 'Jack', 'Jack'],
'Subject': ['Math', 'English', 'Math', 'English', 'Math', 'English'],
'Score': [80, 70, 85, 75, 90, 95]}
df=pd.DataFrame(data)
df
   Name  Subject  Score
0   Tom     Math     80
1   Tom  English     70
2  Mary     Math     85
3  Mary  English     75
4  Jack     Math     90
5  Jack  English     95
grouped=df.groupby(['Name','Subject']).mean()
grouped
              Score
Name Subject
Jack English   95.0
     Math      90.0
Mary English   75.0
     Math      85.0
Tom  English   70.0
     Math      80.0
How do you group and aggregate in Pandas?
data = {'Name':['Tom', 'Tom', 'Mary', 'Mary', 'Jack', 'Jack'],
'Subject':['Math', 'English', 'Math', 'English', 'Math', 'English'],
'Score':[80, 70, 85, 75, 90, 95]}
df = pd.DataFrame(data)
df
   Name  Subject  Score
0   Tom     Math     80
1   Tom  English     70
2  Mary     Math     85
3  Mary  English     75
4  Jack     Math     90
5  Jack  English     95
grouped=df.groupby(['Name'])['Score'].agg(['mean','max','min','count'])
print(grouped)
mean max min count
Name
Jack 92.5 95 90 2
Mary 80.0 85 75 2
Tom 75.0 80 70 2
How do you convert data types in Pandas?
data = {'Name':['Tom', 'Tom', 'Mary', 'Mary', 'Jack', 'Jack'],
'Subject':['Math', 'English', 'Math', 'English', 'Math', 'English'],
'Score':['80', '70', '85', '75', '90', '95']}
df = pd.DataFrame(data)
df
   Name  Subject Score
0   Tom     Math    80
1   Tom  English    70
2  Mary     Math    85
3  Mary  English    75
4  Jack     Math    90
5  Jack  English    95
print(df.dtypes)
Name object
Subject object
Score object
dtype: object
df['Score']=df['Score'].astype(int)
print(df.dtypes)
Name object
Subject object
Score int64
dtype: object
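astype(int) raises an error as soon as one string cannot be parsed as an integer. When a column may contain such entries, pd.to_numeric with errors='coerce' is a common alternative. A minimal sketch; the 'absent' entry is made up to show the behavior:

```python
import pandas as pd

scores = pd.Series(['80', '70', 'absent', '75'])

# errors='coerce' turns unparseable entries into NaN instead of raising
numeric = pd.to_numeric(scores, errors='coerce')
print(numeric)
```

Because NaN is a floating-point value, the converted column becomes float64 rather than int64.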
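How do you apply one-hot encoding in Pandas?
import pandas as pd
# Create a DataFrame
data = {'Name':['Tom', 'Mary', 'Jack', 'Tom', 'Mary'],
        'Gender':['M', 'F', 'M', 'M', 'F']}
df = pd.DataFrame(data)
# One-hot encode the Gender column (dtype=int keeps the 0/1 output; newer pandas versions return booleans by default)
gender_encoding = pd.get_dummies(df['Gender'], prefix='Gender', dtype=int)
# Append the encoded columns to the original DataFrame
df = pd.concat([df, gender_encoding], axis=1)
# Print the encoded result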
print(df)
Name Gender Gender_F Gender_M
0 Tom M 0 1
1 Mary F 1 0
2 Jack M 0 1
3 Tom M 0 1
4 Mary F 1 0
How do you summarize data with groupby in Pandas?
import pandas as pd
# Create a DataFrame
data = {'Year': [2018, 2018, 2019, 2019, 2020, 2020],
        'Month': [1, 2, 1, 2, 1, 2],
        'Sales': [100, 200, 300, 400, 500, 600]}
df = pd.DataFrame(data)
# Create a grouped object with groupby
grouped = df.groupby(['Year', 'Month'])
# Aggregate the grouped object
result = grouped.sum()
print(result)
Sales
Year Month
2018 1 100
2 200
2019 1 300
2 400
2020 1 500
2 600
How do you set the index with set_index in Pandas?
import pandas as pd
df = pd.DataFrame(
{'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
'C': [1, 2, 3, 4, 5, 6, 7, 8],
'D': [8, 7, 6, 5, 4, 3, 2, 1]}
)
df
     A      B  C  D
0  foo    one  1  8
1  bar    one  2  7
2  foo    two  3  6
3  bar  three  4  5
4  foo    two  5  4
5  bar    two  6  3
6  foo    one  7  2
7  foo  three  8  1
indexed=df.set_index(['A','B'])
print(indexed)
C D
A B
foo one 1 8
bar one 2 7
foo two 3 6
bar three 4 5
foo two 5 4
bar two 6 3
foo one 7 2
three 8 1
How do you reset the index with reset_index in Pandas?
import pandas as pd
df = pd.DataFrame(
{'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
'C': [1, 2, 3, 4, 5, 6, 7, 8],
'D': [8, 7, 6, 5, 4, 3, 2, 1]}
)
df
     A      B  C  D
0  foo    one  1  8
1  bar    one  2  7
2  foo    two  3  6
3  bar  three  4  5
4  foo    two  5  4
5  bar    two  6  3
6  foo    one  7  2
7  foo  three  8  1
indexed=df.set_index(['A','B'])
reset_df=indexed.reset_index()
indexed
           C  D
A   B
foo one    1  8
bar one    2  7
foo two    3  6
bar three  4  5
foo two    5  4
bar two    6  3
foo one    7  2
    three  8  1
reset_df
     A      B  C  D
0  foo    one  1  8
1  bar    one  2  7
2  foo    two  3  6
3  bar  three  4  5
4  foo    two  5  4
5  bar    two  6  3
6  foo    one  7  2
7  foo  three  8  1
How do you aggregate groups with agg in Pandas?
import pandas as pd
# Create sample data
df = pd.DataFrame({
'column': ['A', 'A', 'B', 'B'],
'other_column': [1, 2, 3, 4]
})
df
  column  other_column
0      A             1
1      A             2
2      B             3
3      B             4
grouped=df.groupby('column')
result = grouped.agg({'other_column': 'sum'})
result
        other_column
column
A                  3
B                  7
How do you clean data with dropna in Pandas?
import pandas as pd
import numpy as np
data = {'Name': ['Alice', np.nan, 'Charlie', 'Diana', 'Emily'],
'Age': [25, 30, 35, 40, 45],
'Email': ['alice@gmail.com', np.nan, 'charlie@hotmail.com', 'diana@gmail.com', 'emily@hotmail.com']}
df = pd.DataFrame(data)
df
      Name  Age                Email
0    Alice   25      alice@gmail.com
1      NaN   30                  NaN
2  Charlie   35  charlie@hotmail.com
3    Diana   40      diana@gmail.com
4    Emily   45    emily@hotmail.com
df.dropna(axis='index',how='any',inplace=False)
      Name  Age                Email
0    Alice   25      alice@gmail.com
2  Charlie   35  charlie@hotmail.com
3    Diana   40      diana@gmail.com
4    Emily   45    emily@hotmail.com
How do you write data to an Excel file with DataFrame.to_excel in Pandas?
import pandas as pd
data = {'Name': ['Tom', 'Jerry', 'Mickey', 'Donald'],
'Age': [28, 23, 31, 25],
'Gender': ['M', 'M', 'M', 'M']}
df=pd.DataFrame(data)
df
     Name  Age Gender
0     Tom   28      M
1   Jerry   23      M
2  Mickey   31      M
3  Donald   25      M
df.to_excel("path/to/file.xlsx",index=False) # placeholder path
How do you aggregate the positive and the negative values of a column separately in pandas?
import pandas as pd
df = pd.DataFrame({'values': [1, -2, 3, -4, 5]})
df['positive']=df['values']>0
df['negative']=df['values']<0
result=df.groupby(['positive','negative']).agg({'values':'sum'})
print(result)
values
positive negative
False True -6
True False 9
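The two helper columns above always carry opposite values for nonzero data, so a single grouping key is enough. A sketch that groups on a label array instead of adding columns to the frame; the labels 'positive' and 'negative' and the name sign are illustrative choices:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'values': [1, -2, 3, -4, 5]})

# One label per row; a zero, if present, would fall into 'negative' here
sign = np.where(df['values'] > 0, 'positive', 'negative')
sums = df.groupby(sign)['values'].sum()
print(sums)
```

Grouping on an external array of the same length leaves df itself unchanged.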
How do you concatenate strings in pandas?
import pandas as pd
df=pd.DataFrame({'A': ['hello', 'world'], 'B': ['pandas', 'numpy']})
df
       A       B
0  hello  pandas
1  world   numpy
df['C']=df['A']+df['B']
print(df)
A B C
0 hello pandas hellopandas
1 world numpy worldnumpy
How do you match strings in a DataFrame by substring or pattern?
import pandas as pd
data = pd.DataFrame({'name': ['Alice', 'Bob', 'Cathy', 'Daniel', 'Emily'], 'score': [85, 73, 90, 82, 79]})
data
     name  score
0   Alice     85
1     Bob     73
2   Cathy     90
3  Daniel     82
4   Emily     79
matched_rows=data['name'].str.contains('a')
result=data[matched_rows]
result
     name  score
2   Cathy     90
3  Daniel     82
matched_rows = data['name'].str.contains('a|e')
result = data[matched_rows]
result
     name  score
0   Alice     85
2   Cathy     90
3  Daniel     82
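str.contains is case-sensitive and treats its pattern as a regular expression by default, which is why 'Alice' does not match 'a' and why 'a|e' means "a or e". A short sketch of the case and regex parameters on the same data; the result variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'name': ['Alice', 'Bob', 'Cathy', 'Daniel', 'Emily'],
                     'score': [85, 73, 90, 82, 79]})

# case=False ignores letter case, so 'Alice' now matches 'a'
ignore_case = data[data['name'].str.contains('a', case=False)]
# regex=False treats the pattern as plain text; '|' is then a literal character
literal = data[data['name'].str.contains('a|e', regex=False)]

print(ignore_case)
print(literal)
```

With regex=False no name contains the literal text 'a|e', so the second result is empty.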
How do you find duplicate rows in a DataFrame?
import pandas as pd
data=pd.read_csv('path/to/file.csv') # placeholder path
data
data.duplicated()
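Since the CSV file above is not shown, here is a small self-contained sketch of what duplicated returns, together with the keep parameter and drop_duplicates. The data is made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({'name': ['Tom', 'Jerry', 'Tom', 'Tom'],
                     'score': [80, 90, 80, 75]})

# True marks a row that repeats an earlier row in every column
dup_mask = data.duplicated()
# keep=False marks every member of a duplicated group, including the first
all_dups = data[data.duplicated(keep=False)]
# drop_duplicates keeps the first occurrence of each distinct row
deduped = data.drop_duplicates()

print(dup_mask)
print(all_dups)
print(deduped)
```

Both methods also take a subset argument to compare only selected columns.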
How do you find missing values in a DataFrame?
import pandas as pd
data=pd.read_csv('path/to/file.csv',encoding='gbk') # placeholder path
data
data.isnull().sum()
How do you group and aggregate with groupby and agg in pandas?
import pandas as pd
df=pd.DataFrame({
'Name': ['Tom', 'Jerry', 'Tom', 'Jerry', 'Tom', 'Jerry'],
'Gender': ['M', 'M', 'M', 'M', 'M', 'M'],
'Score': [80, 90, 75, 85, 70, 95]
})
df
    Name Gender  Score
0    Tom      M     80
1  Jerry      M     90
2    Tom      M     75
3  Jerry      M     85
4    Tom      M     70
5  Jerry      M     95
result=df.groupby('Name').agg({'Score':['mean','max']})
result
      Score
       mean max
Name
Jerry  90.0  95
Tom    75.0  80
How do you compute cumulative sums with cumsum on a DataFrame?
import pandas as pd
data = {'A': [1, 2, 3, 4, 5],
'B': [10, 20, 30, 40, 50],
'C': [100, 200, 300, 400, 500]}
df = pd.DataFrame(data)
df
   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300
3  4  40  400
4  5  50  500
cumulative_sum=df.cumsum(axis=0)
cumulative_sum
    A    B     C
0   1   10   100
1   3   30   300
2   6   60   600
3  10  100  1000
4  15  150  1500
How do you filter a DataFrame by the values of a column?
import pandas as pd
data = {"name": ["Alice", "Bob", "Charlie", "David", "Emily"],
"score": [80, 90, 85, 95, 92],
"gender": ["F", "M", "M", "M", "F"]}
df = pd.DataFrame(data)
df
      name  score gender
0    Alice     80      F
1      Bob     90      M
2  Charlie     85      M
3    David     95      M
4    Emily     92      F
df_filtered=df.loc[df['gender']=='M']
print(df_filtered)
name score gender
1 Bob 90 M
2 Charlie 85 M
3 David 95 M
How do you compute a new column from the values in a Pandas DataFrame?
import pandas as pd
data = {"name": ["Alice", "Bob", "Charlie", "David", "Emily"],
"score": [80, 90, 85, 95, 92]}
df = pd.DataFrame(data)
df
      name  score
0    Alice     80
1      Bob     90
2  Charlie     85
3    David     95
4    Emily     92
df['weighted_score']=0.4*df['score']+0.6*100
print(df)
name score weighted_score
0 Alice 80 92.0
1 Bob 90 96.0
2 Charlie 85 94.0
3 David 95 98.0
4 Emily 92 96.8
Matplotlib
Import the matplotlib plotting interface under the alias plt
import matplotlib.pyplot as plt
Draw a bar chart
x = [1,2,3,4,5,6,7,8]
y = [3,1,4,5,8,9,7,2]
label=['A','B','C','D','E','F','G','H']
plt.bar(x,y,tick_label = label)
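The same chart can be drawn through matplotlib's object-oriented interface, which keeps an explicit handle on the figure and axes. This is a sketch under a few assumptions: the Agg backend is selected so the script runs without a display, and the axis labels and title are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 1, 4, 5, 8, 9, 7, 2]
label = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']

fig, ax = plt.subplots()
# ax.bar returns a container holding one rectangle per bar
bars = ax.bar(x, y, tick_label=label)
ax.set_xlabel('category')   # illustrative label
ax.set_ylabel('value')      # illustrative label
ax.set_title('Bar chart')   # illustrative title
```

In a notebook the figure is shown automatically; in a script, fig.savefig or plt.show would be called afterwards.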

Draw a horizontal bar chart
x = [1,2,3,4,5,6,7,8]
y = [3,1,4,5,8,9,7,2]
label=['A','B','C','D','E','F','G','H']
plt.barh(x,y,tick_label = label)

Plot sin(x) for x between 0 and 10
import numpy as np
x = np.arange(0,10,0.1)
y = np.sin(x)
plt.plot(x, y, label='sin(x)')
plt.ylim(-1.5,1.5)
plt.xlabel('variable x')
plt.ylabel('value y')
plt.title('Trigonometric function')
plt.grid()
plt.axhline(y=0.8,c='r')
plt.axvspan(xmin=4, xmax=6, facecolor='r', alpha=0.3) # vertical band covering x from 4 to 6
plt.axhspan(ymin=-0.2, ymax=0.2, facecolor='y', alpha=0.3) # horizontal band covering y from -0.2 to 0.2
plt.text(3.2, 0, 'sin(x)', color='r')
plt.annotate('maximum',xy=(np.pi/2, 1),xytext=(np.pi/2+1, 1),
color='r',
arrowprops=dict(arrowstyle='->', color='r'))
plt.legend()
<matplotlib.legend.Legend at 0x7f8228a9e2e0>

Plot $\sin(x)$, $\sin(x+\pi/2)$ and $\sin(x+\pi)$, and show a legend for only the first two
y1 = np.sin(x)
y2 = np.sin(x + np.pi * 0.5)
y3 = np.sin(x + np.pi)
plt.plot(x, y1, label = 'first')
plt.plot(x, y2, label = 'second')
plt.plot(x, y3)
plt.legend()
<matplotlib.legend.Legend at 0x7f81f8173970>
