2025-06-04 Final Project: Seller Analysis

seyeon1130 2025. 6. 4. 18:15

Our goal is to first group sellers in a simple way, then run customer clustering to see what kinds of customers each seller group has.

Classifying sellers by the prices of the products they sell

# Compute price statistics per seller (one row per seller/product pair)
seller_products = df.drop_duplicates(['seller_id', 'product_id'])[['seller_id', 'product_id', 'price']]
seller_price_stats = seller_products.groupby('seller_id')['price'].agg(['median', 'std','count']).reset_index()

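To split sellers into those who sell only cheap items, those who sell only expensive items, and those who sell across a wide price range, we compute the median and standard deviation of each seller's product prices.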
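To make the aggregation concrete, here is a minimal sketch on toy data. The frame `toy` and its values are made up for illustration; only the column names match the ones used above. It shows that repeat orders of the same product are counted once, and that a seller with a single product gets a NaN standard deviation.

```python
import pandas as pd

# Toy order-level data (hypothetical values); one row per order line.
toy = pd.DataFrame({
    "seller_id":  ["a", "a", "a", "b", "b", "c"],
    "product_id": ["p1", "p1", "p2", "p3", "p4", "p5"],
    "price":      [10.0, 10.0, 30.0, 100.0, 300.0, 50.0],
})

# Keep one row per (seller, product) so repeat orders do not skew the stats.
# If a product appears with several prices, only the first one is kept.
toy_products = toy.drop_duplicates(["seller_id", "product_id"])

toy_stats = (
    toy_products.groupby("seller_id")["price"]
    .agg(["median", "std", "count"])
    .reset_index()
)
print(toy_stats)
```

Seller `c` has only one product, so its sample standard deviation is undefined (NaN). That matters later, because a NaN never passes a `>=` comparison.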

import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))
plt.scatter(
    seller_price_stats['median'],
    seller_price_stats['std'],
    alpha=0.7
)

# The x-axis is the median price, not the mean
plt.xlabel('Median Price')
plt.ylabel('Price Std')
plt.title('Seller Price Median vs Price Std')
plt.show()

Almost all of the points are packed tightly near 0.

Grouping by product price

q3 = seller_price_stats['std'].quantile(0.75)
q3_m = seller_price_stats['median'].quantile(0.75)
print(f"Price std, top 25% cutoff: {q3:.2f}")
print(f"Price median, top 25% cutoff: {q3_m:.2f}")

Check the 75th percentile (the cutoff for the top 25%) of the median and of the standard deviation.

Price std, top 25% cutoff: 109.82
Price median, top 25% cutoff: 159.99
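As a quick check on what `quantile(0.75)` returns, here is a small sketch with made-up numbers (`std_values` is hypothetical, not the project data). It also shows that pandas skips NaN when computing the quantile, and that a NaN never counts as being at or above the cutoff.

```python
import pandas as pd

# Hypothetical per-seller std values; NaN stands for a seller with one product.
std_values = pd.Series([5.0, 20.0, 60.0, 110.0, 400.0, float("nan")])

# 75th percentile of the five non-missing values (linear interpolation).
cutoff = std_values.quantile(0.75)

# Comparisons against NaN are False, so the NaN row is not counted.
n_at_or_above = int((std_values >= cutoff).sum())

print(cutoff, n_at_or_above)
```

Because rows with a missing std are left out of the quantile but still counted as sellers, the share of all sellers at or above the cutoff can end up below 25%.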

Using those cutoffs, we split the sellers into three groups.

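diverse_seller : price standard deviation in the top 25% (cutoff rounded to 110)
high_price_seller : median price in the top 25% (cutoff rounded to 160), among sellers that are not diverse_seller
low_price_seller : everything else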

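def seller_group(row):
    # Checked in order: a seller with a wide price spread is 'diverse' even if its median is high.
    # A NaN std (e.g. a seller with a single product) fails the first test and falls through.
    if row['std'] >= 110:  # rounded from the 75th-percentile std, 109.82
        return 'diverse_seller'
    elif row['median'] >= 160:  # rounded from the 75th-percentile median, 159.99
        return 'high_price_seller'
    else:
        return 'low_price_seller'

seller_price_stats['seller_type'] = seller_price_stats.apply(seller_group, axis=1)
seller_price_stats['seller_type'].value_counts(normalize=True)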

About 66% of sellers mainly sell cheap products, about 19% sell across a wide price range, and about 14% mainly sell expensive products.
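The same rule can also be written without a row-wise `apply`. The following is only a sketch of an alternative, not what the post actually ran: `classify_sellers` and the `demo` frame are hypothetical names, and the default cutoffs are the rounded values from above. `np.select` takes the first condition that matches, so the order (diverse first, then high price) is preserved.

```python
import numpy as np
import pandas as pd

def classify_sellers(stats, std_cut=110, median_cut=160):
    """Label each seller row; the first matching condition wins."""
    conditions = [
        stats["std"] >= std_cut,        # wide price spread
        stats["median"] >= median_cut,  # high typical price
    ]
    labels = ["diverse_seller", "high_price_seller"]
    return pd.Series(
        np.select(conditions, labels, default="low_price_seller"),
        index=stats.index,
        name="seller_type",
    )

# Made-up statistics for four sellers; NaN std means a single product.
demo = pd.DataFrame({
    "seller_id": ["a", "b", "c", "d"],
    "median": [20.0, 200.0, 500.0, 50.0],
    "std": [14.1, 141.4, np.nan, np.nan],
})
demo["seller_type"] = classify_sellers(demo)
print(demo)
```

Seller `b` has both a high median and a high std and is labeled `diverse_seller`, while seller `c` has a NaN std and is labeled by its median alone.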

# Assign a color to each seller type (example)
color_map = {
    'high_price_seller': 'red',
    'diverse_seller': 'blue',
    'low_price_seller': 'green'
}

plt.figure(figsize=(10,6))

# Draw each seller type separately
for seller_type, color in color_map.items():
    subset = seller_price_stats[seller_price_stats['seller_type'] == seller_type]
    plt.scatter(subset['median'], subset['std'], label=seller_type, color=color, alpha=0.7)

plt.xlabel('Median Price')
plt.ylabel('Price Std')
plt.title('Seller Types Scatter Plot')
plt.legend()
plt.show()

Because the points crowd near 0, the groups are hard to make out on linear axes.

plt.figure(figsize=(10,6))

# Draw each seller type separately
for seller_type, color in color_map.items():
    subset = seller_price_stats[seller_price_stats['seller_type'] == seller_type]
    plt.scatter(subset['median'], subset['std'], label=seller_type, color=color, alpha=0.7)

# Log scales on both axes spread out the points crowded near 0
plt.xscale('log')
plt.yscale('log')
plt.xlabel('Median Price')
plt.ylabel('Price Std')
plt.title('Seller Types Scatter Plot (log scale)')
plt.legend()
plt.show()

On log-scaled axes, the groups are clearly visible.
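One thing to keep in mind with log axes: a point needs strictly positive, non-missing x and y values to be placed on them. Sellers whose std is 0 (all products at the same price) or NaN therefore do not show up in the log plot. A small sketch for counting such rows, where `log_plottable_mask` and the `demo` frame are hypothetical:

```python
import numpy as np
import pandas as pd

def log_plottable_mask(stats, x_col="median", y_col="std"):
    """True for rows whose x and y are both strictly positive and not NaN."""
    return (stats[x_col] > 0) & (stats[y_col] > 0)

# Made-up statistics: the third seller has a NaN std, the fourth a std of 0.
demo = pd.DataFrame({
    "median": [20.0, 200.0, 50.0, 80.0],
    "std": [14.1, 141.4, np.nan, 0.0],
})

mask = log_plottable_mask(demo)
hidden = int((~mask).sum())
print(f"{hidden} of {len(demo)} sellers cannot be shown on log axes")
```

Running the same mask on `seller_price_stats` would tell how many sellers are missing from the log-scale figure.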

Classification complete!
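For the customer clustering step mentioned at the start, the seller type has to be attached back to the order-level data. The following is a rough sketch under assumptions: the `customer_id` column name and both toy frames are hypothetical, and the real `df` may be shaped differently.

```python
import pandas as pd

# Toy order-level data; 'customer_id' is an assumed column name.
orders = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3", "c1"],
    "seller_id": ["a", "b", "a", "c"],
})

# One row per seller with its assigned type (as in seller_price_stats).
seller_types = pd.DataFrame({
    "seller_id": ["a", "b", "c"],
    "seller_type": ["low_price_seller", "diverse_seller", "high_price_seller"],
})

# Left join keeps every order; validate guards against duplicated sellers.
orders_typed = orders.merge(
    seller_types, on="seller_id", how="left", validate="many_to_one"
)

# Number of distinct customers who bought from each seller type.
customers_per_type = orders_typed.groupby("seller_type")["customer_id"].nunique()
print(customers_per_type)
```

The same customer can appear under more than one seller type, as `c1` does here, which is worth remembering when reading the clustering results per group.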
