[Crawling] Collecting Blanket Sales Reviews

deseodeseo 2023. 8. 25. 12:41

Used to send requests to the page

import requests as req

Used to parse the HTML data

from bs4 import BeautifulSoup as bs

If the response is 403, add a headers dict (head)!

< On the target site: F12 > Network > F5 > find the request whose type is document, then copy and paste the User-Agent at the very bottom >

head = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'}
url ='https://review4.cre.ma/bodyluv.kr/products/reviews?product_code=437&iframe_id=crema-product-reviews-2&widget_style=&app=0&parent_url=https%3A%2F%2Fbodyluv.kr%2Fproduct%2F%25EB%25B0%2594%25EB%2594%2594%25EB%259F%25BD-%25EB%2594%25A5%25EC%258A%25AC%25EB%25A6%25BD-%25EC%25BF%25A8-%25EC%259D%25B4%25EB%25B6%2588-v2-%25EC%2595%2588%25ED%258B%25B0%25EB%25B2%2584%25EA%25B7%25B8%2F437%2Fcategory%2F1%2Fdisplay%2F2%2F%23prdReview&nonmember_token=&secure_device_token=V248c157f8274c48ccb2b17f5da3686da5e1ecea643d2a478c8b234b4001828e3d053557ee4235b684a62ac66209cd5d64&iframe=1'
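res = req.get(url, headers=head)  # send the browser-like User-Agent with the request
res  # in a notebook this displays the Response object, e.g. <Response [200]>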
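
To make the 403 note above more concrete, here is a minimal sketch that prepares a request without sending it, so you can see which User-Agent would go out by default and which one goes out once head is passed. The example.com URL and the shortened User-Agent string are placeholders assumed for illustration, not values from this post.

```python
import requests as req

# Placeholder URL and shortened User-Agent, assumed for illustration only.
url = "https://example.com/reviews"
head = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# A Session carries the library's default headers.
session = req.Session()
default_ua = session.headers["User-Agent"]

# Build the request without sending it; headers passed here override the defaults.
prepared = session.prepare_request(req.Request("GET", url, headers=head))
sent_ua = prepared.headers["User-Agent"]

print("default :", default_ua)
print("override:", sent_ua)
```

Some servers reject the library's default User-Agent, which is one common reason for a 403; whether that is the cause on a given site still has to be checked case by case.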

Get the HTML out of the response

res.text  # returns the body as a string
soup = bs(res.text, "lxml")
soup

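Several reviews come back > ok
Only one review comes back > ok
Nothing comes back? > nope
-> This happens a lot when collecting reviews
= It mostly happens when a single page is made up of several pages => a lot of resources are consumed on every refresh.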

iframe
A tag that embeds a new page inside an existing page

- Values inside an iframe cannot be retrieved from the outer page.

- You have to request the URL in the iframe's src attribute to get the reviews.
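
The two points above can be sketched on a tiny, made-up outer page. The HTML, the example.com URL, and the short selector are assumptions for illustration; only the idea (empty result on the outer page, real URL in src) comes from this post. The built-in "html.parser" is used so the sketch runs without lxml.

```python
from bs4 import BeautifulSoup as bs

# Hypothetical outer product page: the reviews live in an iframe, not in this HTML.
outer_html = """
<html><body>
  <h1>Product</h1>
  <iframe id="crema-product-reviews-2" src="https://example.com/reviews?page=1"></iframe>
</body></html>
"""

soup = bs(outer_html, "html.parser")

# Selecting review elements on the outer page finds nothing.
reviews = soup.select("div.review_list_v2__message")
print(len(reviews))

# The iframe's src attribute holds the URL that has to be requested instead.
iframe = soup.select_one("iframe")
iframe_url = iframe["src"]
print(iframe_url)
```

On a real site the src URL would then be passed to req.get, as done with the review4.cre.ma URL in this post.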

review = soup.select('div.review_list_v2__review_lcontent > div > div.review_list_v2__content_section > div > div.review_list_v2__content.review_content__collapsed > div > div > div.review_list_v2__message.js-collapsed-review-content.js-translate-text')
for i in review:
    print(i.text)
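
The selector above was copied from the browser's developer tools, which is why it is so long. As a rough sketch, a shorter class-based selector can match the same elements as long as that class is only used for review text. The HTML below is a simplified stand-in modeled on the class names in the selector, not the real page, so treat the shorter selector as an assumption to verify against the live markup.

```python
from bs4 import BeautifulSoup as bs

# Hypothetical, heavily simplified markup modeled on the class names used above.
html = """
<div class="review_list_v2__content review_content__collapsed">
  <div class="review_list_v2__message js-collapsed-review-content js-translate-text">
    Cool to the touch.
  </div>
</div>
<div class="review_list_v2__content review_content__collapsed">
  <div class="review_list_v2__message js-collapsed-review-content js-translate-text">
    Washes well.
  </div>
</div>
"""

soup = bs(html, "html.parser")

# Full path with child combinators versus a single class on the target element.
long_sel = ("div.review_list_v2__content.review_content__collapsed > "
            "div.review_list_v2__message.js-collapsed-review-content.js-translate-text")
short_sel = "div.review_list_v2__message"

long_hits = soup.select(long_sel)
short_hits = soup.select(short_sel)
texts = [tag.text.strip() for tag in short_hits]

print(len(long_hits), len(short_hits))
print(texts)
```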

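Each review's text begins with a newline followed by spaces

Escape codes
\n: newline, \t: tab, \' \\ \" ...

strip: removes whitespace from the very beginning and very end of a string
- Running the code below removes the leading whitespace.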

review[0].text.strip()
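
A small standalone sketch of what strip does, using a made-up string in place of real review text. Note that only the leading and trailing whitespace goes away; the newline in the middle stays.

```python
# Made-up review text with a leading newline + spaces and a trailing tab/newline.
raw = "\n        Soft and cool.\nWould buy again.\t \n"

cleaned = raw.strip()

# repr() shows the escape codes explicitly instead of rendering them.
print(repr(raw))
print(repr(cleaned))
```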

< Print cnt so you can see where each new review begins >

cnt = 1
for i in review:
    print(cnt)
    print(i.text.strip())
    cnt += 1
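
As an alternative to keeping cnt by hand, the built-in enumerate can produce the counter. This is only a sketch on a made-up list of strings standing in for the review texts; the loop above works just as well.

```python
# Made-up strings standing in for the text of each review element.
reviews = ["  First review\n", "\nSecond review  "]

lines = []
# start=1 makes the counter begin at 1, like cnt in the loop above.
for cnt, text in enumerate(reviews, start=1):
    lines.append(f"{cnt}: {text.strip()}")

print("\n".join(lines))
```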

< Collect reviews from pages 1 to 9 >

for page in range(1, 10):
    # Same review URL as before, with the page number filled in via the f-string
    url = f'https://review4.cre.ma/bodyluv.kr/products/reviews?app=0&iframe=1&iframe_id=crema-product-reviews-2&page={page}&parent_url=https%3A%2F%2Fbodyluv.kr%2Fproduct%2F%25EB%25B0%2594%25EB%2594%2594%25EB%259F%25BD-%25EB%2594%25A5%25EC%258A%25AC%25EB%25A6%25BD-%25EC%25BF%25A8-%25EC%259D%25B4%25EB%25B6%2588-v2-%25EC%2595%2588%25ED%258B%25B0%25EB%25B2%2584%25EA%25B7%25B8%2F437%2Fcategory%2F1%2Fdisplay%2F2%2F%23footer&product_code=437&secure_device_token=V2326a279a050a13359d72e1267ed9a2f0bc591dbf374baadc98cde1d55845c8480efca2d91d44f1a8935f8976e217e203&widget_env=100&widget_style='
    # Reuse the User-Agent header defined earlier to avoid a 403 response
    res = req.get(url, headers=head)
    soup = bs(res.text, "lxml")
    review = soup.select('div.review_list_v2__review_lcontent > div > div.review_list_v2__content_section > div > div.review_list_v2__content.review_content__collapsed > div > div > div.review_list_v2__message.js-collapsed-review-content.js-translate-text')
    cnt = 1
    print("Page", page)
    for i in review:
        print(cnt)
        print(i.text.strip())
        cnt += 1
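
The page URLs in the loop above differ only in the page parameter, so they can also be built from a dict of query parameters. The sketch below uses the base address and product_code from this post, but build_review_url is a hypothetical helper, and the reduced parameter set (no parent_url, no secure_device_token) is an assumption: whether the server accepts it has not been tested here. Nothing is requested over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "https://review4.cre.ma/bodyluv.kr/products/reviews"

def build_review_url(page, product_code=437):
    """Return a review URL for one page (hypothetical helper, reduced parameter set)."""
    params = {"app": 0, "iframe": 1, "product_code": product_code, "page": page}
    return f"{BASE}?{urlencode(params)}"

# One URL per page, pages 1 through 9, matching range(1, 10) above.
urls = [build_review_url(page) for page in range(1, 10)]

# Parse a URL back to confirm the page number ended up in the query string.
third_page = parse_qs(urlsplit(urls[2]).query)["page"]

print(urls[0])
print(len(urls), third_page)
```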
