
ํŒŒ์ด์ฌ FastAPI๋ฅผ ์ด์šฉํ•œ API ํ”„๋กœ์ ํŠธ ๋งŒ๋“ค๊ธฐ (1) - ์Šคํฌ๋ž˜ํ•‘ API ๋งŒ๋“ค๊ธฐ

by discphy 2024. 11. 26.
๋ฐ˜์‘ํ˜•

์ด์ „ ๊ธ€์—์„œ ์ฑ„์šฉ ์ •๋ณด์˜ ๋ชจ๋ธ(ORM)์„ ์„ค๊ณ„ํ•˜๊ณ ,
๊ฐ„๋‹จํ•˜๊ฒŒ ๋“ฑ๋ก & ์กฐํšŒํ•˜๋Š” API ๊นŒ์ง€ ๋งŒ๋“ค์–ด ๋ณด์•˜๋‹ค.

Previous post: Building an API Project with Python FastAPI (0) - Introduction and Example

๋‹ค์Œ์œผ๋กœ, ์ฑ„์šฉ ์ •๋ณด๋ฅผ ์ž๋™์œผ๋กœ ์Šคํฌ๋žฉํ•˜๊ณ  DB์— ์ €์žฅํ•˜๋Š” ์˜ˆ์ œ๋ฅผ ์•Œ์•„ ๋ณด๋„๋ก ํ•˜์ž.

์ฑ„์šฉ ์ •๋ณด์˜ HTML ๊ตฌ์กฐ - ์›ํ‹ฐ๋“œ

์›ํ‹ฐ๋“œ์—์„œ ์›ํ•˜๋Š” ๊ฒ€์ƒ‰์–ด(๊ธฐ์ˆ ์Šคํƒ?)๋กœ ์กฐํšŒํ•œ ๊ฒฐ๊ณผ ํŽ˜์ด์ง€์•ˆ์— ์žˆ๋Š” ํฌ์ง€์…˜ ํƒญ์˜ ์ฑ„์šฉ ์ •๋ณด๋ฅผ ์Šคํฌ๋ž˜ํ•‘ ํ•  ๊ฒƒ์ด๋‹ค.
URL ํŒจํ„ด์€ https://www.wanted.co.kr/search?query={keyword}&tab=position ์ด๋ ‡๊ฒŒ ๊ตฌ์„ฑ์ด ๋˜์–ด์žˆ๋Š”๋ฐ
์›ํ•˜๋Š” ํ‚ค์›Œ๋“œ๋งŒ URL์˜ ์ฟผ๋ฆฌ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ์ „๋‹ฌํ•˜๋ฉด ๋  ๊ฒƒ ๊ฐ™๋‹ค.
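Keywords can contain spaces, `+`, `&`, or Korean text, and those characters have to be percent-encoded before they go into a query string. A minimal sketch of building the URL with the standard library's `urllib.parse.urlencode`; the helper name `wanted_search_url` is hypothetical, and the assumption that Wanted decodes `+` as a space follows the usual query-string convention rather than anything confirmed for this site.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.wanted.co.kr/search"


def wanted_search_url(keyword: str) -> str:
    """Build the Position-tab search URL for a keyword (hypothetical helper)."""
    # urlencode percent-encodes the values: spaces become "+", "+" becomes "%2B",
    # and non-ASCII text such as Korean keywords is encoded as UTF-8 bytes.
    return f"{BASE_URL}?{urlencode({'query': keyword, 'tab': 'position'})}"


print(wanted_search_url("flutter"))
print(wanted_search_url("react native"))
```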

๋‹ค์Œ ์ด๋ฏธ์ง€๋Š” โ€œflutterโ€ ํ‚ค์›Œ๋“œ๋กœ ๊ฒ€์ƒ‰ํ•œ ํ™”๋ฉด์ด๋‹ค. - https://www.wanted.co.kr/search?query=flutter&tab=position

์ด 47๊ฐœ์˜ ์ฑ„์šฉ ์ •๋ณด๊ฐ€ ์กด์žฌํ•˜๋Š”๋ฐ ์œ„์˜ ํ™”๋ฉด์—์„œ๋Š” ํŽ˜์ด์ง•์ด ๋”ฐ๋กœ ์กด์žฌํ•˜์ง€ ์•Š์œผ๋ฉฐ, ์Šคํฌ๋กค์„ ๋‚ด๋ฆด ๋•Œ ์ถ”๊ฐ€์ ์ธ ์ฑ„์šฉ ์ •๋ณด๊ฐ€ ๋กœ๋“œ๋œ๋‹ค.
์ฑ„์šฉ ์ •๋ณด๊ฐ€ ์ „์ฒด ๋กœ๋“œ๋˜๋ฉด ์Šคํฌ๋กค์„ ๋‚ด๋ ค๋„ ์•„๋ฌด๋Ÿฐ ์ด๋ฒคํŠธ๊ฐ€ ์—†์–ด์ง„๋‹ค.
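Because the list grows on scroll rather than by page number, one way to tell that everything has loaded is to keep scrolling until the page height stops changing. This is an alternative to scrolling a fixed number of times, not the approach used in this post. `scroll_until_stable` is a hypothetical helper that takes the browser actions as callables, so the stopping logic can be checked without a browser.

```python
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_end: Callable[[], None],
    wait: Callable[[], None],
    max_rounds: int = 20,
) -> int:
    """Scroll until the page height stops growing; return how many scrolls were done."""
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_to_end()
        wait()  # give the infinite-scroll loader time to append new job cards
        new_height = get_height()
        if new_height == last_height:
            return rounds  # nothing new was appended: assume all postings are loaded
        last_height = new_height
    return max_rounds  # safety cap in case the page never settles
```

With Playwright's sync API it could be wired up as `scroll_until_stable(lambda: page.evaluate("document.body.scrollHeight"), lambda: page.keyboard.press("End"), lambda: page.wait_for_timeout(3000))`. If the wait is shorter than the site's load time, the height may not have changed yet and the loop stops early, so the wait length is a tuning assumption.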

Next, let's look at the HTML element structure we need to pull out each posting.
The browser's developer tools make the structure much easier to see.

๊ฐ๊ฐ์˜ ์ฑ„์šฉ์ •๋ณด๋Š” class๋ช…์ด JobCard_container__FqChn์ธ divํƒœ๊ทธ๋กœ ์ด๋ฃจ์–ด์ ธ์žˆ๋‹ค.

Inside that div,
the position is a strong tag with the class name JobCard_title__ddkwM, and
the company name is a span tag with the class name JobCard_companyName__vZMqJ. (Simpler than I expected..)

์Šคํฌ๋ž˜ํ•‘ ์ฝ”๋“œ ์ž‘์„ฑ

Now that we understand the HTML structure, let's write the Python scraping code that actually extracts the data we want.

1. Install the packages


# Package for parsing HTML source
pip install beautifulsoup4

# Browser automation package
pip install playwright

# Download the browser binary Playwright drives (the scraper below launches Chromium)
playwright install chromium

In an earlier post on Python web scraping I used Selenium for browser automation, but this example uses Playwright instead.
From what I found, Playwright drives Chromium, Firefox, and WebKit through a single API, which makes cross-browser testing easy, and it has a good reputation for speed and stability, so I decided to try it this time.

2. Define a class for the scraped data


# employ_schema.py (modified)

... (omitted) ...

# (Added) Class holding the data extracted by scraping
class EmployScrap:
    def __init__(self, keyword, company_name, position):
        self.keyword = keyword
        self.company_name = company_name
        self.position = position
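The hand-written class above only stores three attributes. The standard library's `@dataclass` decorator generates the same `__init__`, with the same positional order so calls like `EmployScrap(keyword, company_name, position)` keep working, plus `__repr__` and `__eq__`, which make scraped results easier to print and compare. A minimal sketch of that equivalent, offered as an option rather than what the project uses:

```python
from dataclasses import dataclass


@dataclass
class EmployScrap:
    """One scraped posting; same three fields, in the same order, as the class above."""
    keyword: str
    company_name: str
    position: str


job = EmployScrap("node", "Example Corp", "Backend Engineer")
print(job)  # the generated __repr__ lists every field
```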

3. Write the new job-posting scraper


# employ_scrap.py (new file)

import time
from urllib.parse import quote

from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

from domain.employ.employ_schema import EmployScrap

# Fetch job postings for a given keyword and return them as a list of EmployScrap values
def get_employ_by_wanted(keyword):
    # Start Playwright; the with-block stops it even if something below fails
    with sync_playwright() as p:
        # Launch Chromium with a visible window so the scraping can be watched
        browser = p.chromium.launch(headless=False)
        try:
            # Open a new page
            page = browser.new_page()

            # Go to the Position tab of the search results for the keyword
            # (quote() percent-encodes spaces, "&", and non-ASCII keywords)
            page.goto(f"https://www.wanted.co.kr/search?query={quote(keyword, safe='')}&tab=position")

            # Scroll down 5 times so infinite scroll loads more postings
            for _ in range(5):
                time.sleep(3)  # wait 3 seconds for new postings to load
                page.keyboard.press("End")  # press End to jump to the bottom of the page

            # Get the page's rendered HTML
            content = page.content()
        finally:
            # Close the browser
            browser.close()

    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(content, "html.parser")

    # Find every element that holds one posting
    jobs = soup.find_all("div", class_="JobCard_container__FqChn")

    # List of postings to return
    jobs_db = []
    for job in jobs:
        # Extract the position and company name from the posting
        position = job.find("strong", class_="JobCard_title__ddkwM")
        company_name = job.find("span", class_="JobCard_companyName__vZMqJ")
        if position is None or company_name is None:
            continue  # skip cards missing either element instead of raising AttributeError

        # Convert into the form saved to the DB and add it to jobs_db
        jobs_db.append(EmployScrap(keyword,
                                   company_name.get_text(strip=True),
                                   position.get_text(strip=True)))

    # Return the postings to be saved to the DB
    return jobs_db

4. Add query functions for the scraped data


# employ_crud.py (modified)

from sqlalchemy import and_  # added
from domain.employ.employ_schema import EmployCreate, EmployScrap  # added

... (omitted) ...

# Delete job postings by keyword and platform
def employ_delete(db: Session, keyword: str, platform: str):
    db.query(Employ) \
        .filter(and_(Employ.keyword == keyword, Employ.platform == platform)) \
        .delete(synchronize_session=False)
    db.commit()

# Save one scraped job posting
def employ_scrap_renew(db: Session, create_request: EmployScrap, platform: str):
    db.add(Employ(platform=platform,
                  keyword=create_request.keyword,
                  company_name=create_request.company_name,
                  position=create_request.position,
                  create_date=datetime.now()))
    db.commit()
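As written, `employ_delete` commits the deletion before any new rows are added, and `employ_scrap_renew` commits once per posting, so a failure partway through a refresh can leave a keyword with only some of its postings, or none. The sketch below does the same delete-then-insert refresh as a single transaction. To stay self-contained it uses the standard library's `sqlite3` instead of SQLAlchemy; `replace_employs` is a hypothetical name, and the `employ` table layout is an assumption modeled on the fields used above.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class EmployScrap:
    keyword: str
    company_name: str
    position: str


def replace_employs(conn: sqlite3.Connection, keyword: str, platform: str,
                    scraped: List[EmployScrap]) -> None:
    """Replace a keyword's postings for one platform: all of it succeeds or none of it does."""
    now = datetime.now().isoformat()
    # `with conn:` commits when the block finishes and rolls back if it raises,
    # so the DELETE is undone whenever an INSERT fails.
    with conn:
        conn.execute("DELETE FROM employ WHERE keyword = ? AND platform = ?",
                     (keyword, platform))
        conn.executemany(
            "INSERT INTO employ (platform, keyword, company_name, position, create_date)"
            " VALUES (?, ?, ?, ?, ?)",
            [(platform, e.keyword, e.company_name, e.position, now) for e in scraped],
        )
```

With SQLAlchemy the same idea would be to drop the per-function `db.commit()` calls and commit once, after the delete and a `db.add_all(...)` of the new rows.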

5. Add the job-posting refresh router


# employ_router.py (modified)

from domain.employ import employ_schema, employ_crud, employ_scrap  # employ_scrap added

... (omitted) ...

# API that refreshes job postings from Wanted
# A plain `def` route: FastAPI runs it in a threadpool, so the blocking scraper does not stall the event loop
@router.put("/scrap/wanted/{keyword}")
def employ_scrap_wanted_create(keyword: str, db: Session = Depends(get_db)):
    platform = "WANTED"

    # Job postings scraped from Wanted
    employs = employ_scrap.get_employ_by_wanted(keyword)

    if employs:
        # Before refreshing, delete the existing postings for this keyword and platform
        employ_crud.employ_delete(db, keyword, platform)

        for employ in employs:
            # Save each scraped posting
            employ_crud.employ_scrap_renew(db, employ, platform)

The code is done, so start the server by entering the following command in the terminal.

uvicorn main:app --reload

Open http://127.0.0.1:8000/docs and you will see that the job-posting refresh API has been added.

๋‹ค์Œ ์ด๋ฏธ์ง€ ์ฒ˜๋Ÿผ keyword์— โ€œnodeโ€๋ผ๊ณ  ์ž…๋ ฅ ํ›„ Execute ๋ฒ„ํŠผ์„ ๋ˆ„๋ฅด๋ฉด API๊ฐ€ ํ˜ธ์ถœ๋˜๋ฉฐ,

์Šคํฌ๋ž˜ํ•‘ ๊ณผ์ •์„ ์‹œ๊ฐ์ ์œผ๋กœ ํ™•์ธ ํ•  ์ˆ˜ ์žˆ๋‹ค. (๋ธŒ๋ผ์šฐ์ € ์ฐฝ์ด ๋œจ๋ฉด์„œ ํŽ˜์ด์ง€ ๋กœ๋“œ ์ดํ›„, ์Šคํฌ๋กค ๋‹ค์šด ์ด๋ฒคํŠธ๊ฐ€ ๋ฐœ์ƒ๋œ๋‹ค.)

์Šคํฌ๋ž˜ํ•‘์ด ์™„๋ฃŒ๋˜๋ฉด ์ด์ „์— ๋งŒ๋“  ์กฐํšŒ API ๋ฅผ ํ†ตํ•ด keyword๊ฐ€ node์ธ ๋ฐ์ดํ„ฐ๋ฅผ ํ™•์ธ ํ•  ์ˆ˜ ์žˆ๋‹ค.

๋‹ค์Œ์€, ์‹ค์ œ SQLite ์— ์ €์žฅํ•œ DB ์ฟผ๋ฆฌ ์กฐํšŒ ๊ฒฐ๊ณผ์ด๋‹ค. API์— ์กฐํšŒ๋œ ๋ฐ์ดํ„ฐ๋„ ๊ฐ™์ด ํ™•์ธ ๊ฐ€๋Šฅํ•˜๋‹ค.

Wrapping up

And so, in just two short posts (!?), I studied FastAPI and built an actual toy project.
While implementing it, I found that making good use of the right packages keeps the source more concise and readable.

If I need a simple API backend for a side project or toy project, I think adopting FastAPI would not be a bad choice at all.
As always, I'll wrap up with a fun Python meme and the GitHub link.

Meme source: https://www.inflearn.com/pages/inflearnsnack-6-20220802

https://github.com/discphy/fastapi-scrap


๋ฐ˜์‘ํ˜•