ํŒŒ์ด์ฌ ์›น ์Šคํฌ๋ž˜ํ•‘ - ๋ฉœ๋ก  ์Œ์› ์ถ”์ถœ

by discphy 2024. 3. 29.
๋ฐ˜์‘ํ˜•

Before getting started: I first came across the word "scraping" while studying Python recently.
I had heard of crawling, but what is scraping..?

I have been assigned crawling work on quite a few projects, but looking back, most of what I did seems to have been scraping.
And since Python is said to be a good fit for scraping, I ended up writing(?) this post.

ํ”„๋กœ์ ํŠธ ์„ ์ • ๊ณผ์ •


์œ ํŠœ๋ธŒ ํ”„๋ฆฌ๋ฏธ์—„์„ ์œ ๋ฃŒ ๊ตฌ๋…์„ ํ•˜๊ฒŒ๋˜๋ฉด ์œ ํŠœ๋ธŒ ๋ฎค์ง์„ ๋ฌด๋ฃŒ๋กœ ์ด์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๋ฏธ ๋ฉœ๋ก  ์ŠคํŠธ๋ฆฌ๋ฐ ์„œ๋น„์Šค๋ฅผ ์œ ๋ฃŒ๋กœ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ๋˜ ๋‚˜์—๊ฒŒ ๊ณ ๋ฏผ์ด ์ฐพ์•„์™”๋‹คโ€ฆ
์•ฝ 2016๋…„๋ถ€ํ„ฐ 7~8๋…„๊ฐ„ 3,000๊ณก์˜ ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ๊ฐ€ ๋ฉœ๋ก ์— ์ €์žฅ๋˜์–ด์žˆ๋‹ค.

์œ„์˜ ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ๋ฅผ ์˜ค์ฐจ ์—†์ด ์œ ํŠœ๋ธŒ ๋ฎค์ง์œผ๋กœ ์˜ฎ๊ธฐ๊ฒŒ ๋œ๋‹ค๋ฉดโ€ฆ
๋งŽ์€ ์ด์ (์ ˆ์•ฝ? ๋™๊ธฐํ™”? ๋“ฑ)์ด ์žˆ์„ ๊ฒƒ ๊ฐ™์•„ ๋ฉœ๋ก ์˜ ์Œ์›๋“ค์„ ์œ ํŠœ๋ธŒ ๋ฎค์ง์œผ๋กœ ์ด๊ด€ํ•˜๋Š” ํ”„๋กœ์ ํŠธ๋ฅผ ์ž‘์„ฑํ•˜๊ธฐ๋กœ ๊ฒฐ์‹ฌํ•˜์˜€๋‹ค.

ํ”„๋กœ์ ํŠธ ์š”๊ตฌ ์‚ฌํ•ญ


  • ๋‚˜์˜ ๋ฉœ๋ก  ๊ณ„์ •์— ํฌํ•จ๋˜์–ด์žˆ๋Š” ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ ์ถ”์ถœ ๊ธฐ๋Šฅ
  • ์ถ”์ถœํ•œ ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ์˜ ์Œ์› ์ถ”์ถœ ๊ธฐ๋Šฅ
  • ์ถ”์ถœํ•œ ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ์™€ ์Œ์›์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ์—‘์…€ ํŒŒ์ผ ์ž‘์„ฑ ๋ฐ ์ €์žฅ ๊ธฐ๋Šฅ
  • ์ž‘์„ฑํ•œ ์—‘์…€์˜ ์Œ์›์„ ์œ ํŠœ๋ธŒ ์žฌ์ƒ๋ชฉ๋ก์— ์ €์žฅ

์œ ํŠœ๋ธŒ์˜ ๊ฒฝ์šฐ, ํ•œ๊ณ„๊ฐ€ ์žˆ์–ด ๊ตฌํ˜„ํ•˜์ง€ ๋ชปํ•˜์˜€๋‹คโ€ฆ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋ฐ‘์—์„œ ์„ค๋ช…ํ•˜๊ฒ ๋‹ค.. ๋‹ค์Œ ํŽธ์— ๊ณ„์†(!?)

๋ฉœ๋ก ์˜ ์›น ๊ตฌ์กฐ ํƒ์ƒ‰


  1. ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ ์กฐํšŒ ํŽ˜์ด์ง€ URL : https://www.melon.com/mymusic/playlist/mymusicplaylist_list.htm?memberKey=56389814

    • https://www.melon.com/mymusic/playlist/mymusicplaylist_list.htm ์˜ ์ฟผ๋ฆฌ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ memberKey๋ฅผ ๋ฐ›๋Š”๋‹ค.
    • ๋ฉœ๋ก  ์œ ์ €์˜ ํ”„๋กœํ•„์„ ๊ณต์œ  ๋ฐ›์•„ memberKey์˜ ๊ฐ’์„ ์–ป๋Š”๋‹ค.
  2. ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ์˜ ํฌํ•จ๋œ ์Œ์› ์กฐํšŒ ํŽ˜์ด์ง€ URL : https://www.melon.com/mymusic/playlist/mymusicplaylistview_inform.htm?plylstSeq=533264243

    • https://www.melon.com/mymusic/playlist/mymusicplaylistview_inform.htm ์˜ ์ฟผ๋ฆฌ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ plystSeq๋ฅผ ๋ฐ›๋Š”๋‹ค.
    • ์œ„์˜ ํŽ˜์ด์ง€์—์„œ ๊ฐ ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ๋ฅผ ํด๋ฆญํ•˜๋ฉด plystSeq์˜ ๊ฐ’์„ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.
  3. ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ ์กฐํšŒ / ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ์˜ ํฌํ•จ๋œ ์Œ์› ์กฐํšŒ ๊ฐ๊ฐ์˜ ํ•œ ํŽ˜์ด์ง€๋‹น ๋…ธ์ถœ ๋˜๋Š” ์•„์ดํ…œ ๊ฐœ์ˆ˜

    • ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ ์กฐํšŒ : ํ•œ ํŽ˜์ด์ง€ ๋‹น ์ด 20๊ฐœ
    • ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ์˜ ํฌํ•จ๋œ ์Œ์› ์กฐํšŒ : ํ•œ ํŽ˜์ด์ง€ ๋‹น ์ด 50๊ฐœ
  4. ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ ์กฐํšŒ / ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ์˜ ํฌํ•จ๋œ ์Œ์› ์กฐํšŒ ํŽ˜์ด์ง€ ์ด๋™

    • ํŽ˜์ด์ง€ ๋ฒ„ํŠผ์— ๋ถ™์–ด์žˆ๋Š” ์‹คํ–‰ ํ•จ์ˆ˜ : javascript:pageObj.sendPage(offset)์œผ๋กœ ํŽ˜์ด์ง€ ์ด๋™์„ ํ•œ๋‹ค.

์ผ๋ฐ˜์ ์ธ ํŽ˜์ด์ง€์˜ ๊ฒฝ์šฐ ์ฟผ๋ฆฌ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ page์™€ offset์˜ ์ฟผ๋ฆฌ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์–ด ํŒจํ„ด ์˜ˆ์ธก์ด ์‰ฌ์šฐ๋‚˜, ๋ฉœ๋ก ์˜ ๊ฒฝ์šฐ ํŒจํ„ด์„ ์œ ์ถ” ํ•  ์ˆ˜ ์—†์–ด ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‚ฌ์šฉํ•ด์•ผ ํ•œ๋‹ค.
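
The notes above can be sketched in plain Python, without a browser. This is only a minimal sketch under assumptions: `query_value` and `page_offsets` are hypothetical helper names that are not part of melon.py, and the offsets assume that `pageObj.sendPage` expects the 1-based index of the first item on the target page, which is what the loops in melon.py suggest.

```python
from urllib.parse import urlparse, parse_qs


def query_value(url, name):
    """Return the first value of query parameter `name` in `url`, or None."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None


def page_offsets(total, per_page):
    """1-based offset of the first item on each page (assumed sendPage argument)."""
    # total + 1 keeps the last page when total is 1 or exactly one past a page boundary
    return list(range(1, total + 1, per_page))


member_url = 'https://www.melon.com/mymusic/playlist/mymusicplaylist_list.htm?memberKey=56389814'
print(query_value(member_url, 'memberKey'))
print(page_offsets(35, 20))
```

With 20 playlists per page, 35 playlists need two `sendPage` calls; with 50 tracks per page, a playlist of 51 tracks needs a second call for the last track.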

Program implementation - melon.py


[Main packages]

  • webdriver (Selenium) : browser navigation / script execution / element lookup
  • BeautifulSoup : collecting information from HTML
  • pandas : processing and manipulating the collected data

[Constants]

# Imports and constants
import os, re, time
from datetime import datetime
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
WAIT_TIME = 1  # seconds to wait after each navigation
TODAY_DATE = datetime.today().strftime('%Y%m%d')  # today's date, used in the Excel file name
MUSIC_COLUMNS = ['제목', '아티스트', '앨범']  # Excel header columns (title, artist, album)
PLAYLIST_URL = 'https://www.melon.com/mymusic/playlist/mymusicplaylist_list.htm'  # playlist list URL
MUSIC_URL = 'https://www.melon.com/mymusic/playlist/mymusicplaylistview_inform.htm'  # playlist track URL
EXCEL_PATH = "../excel"  # directory for the Excel files

[Function-based implementation]

  • Main

    • Scrape all of a member's playlists

        # Fetch the tracks of every playlist for a member key
        def member(member_key):
            driver = init()  # set up the web driver

            driver.get(PLAYLIST_URL + '?memberKey=' + member_key)  # open the playlist list page
            playlist_total_count = int(driver.find_element(By.CSS_SELECTOR, '.no').text)  # the number in the total-count element, e.g. the "35" in "총35개" (35 in total); used for paging

            playlist_seqs = get_playlist_seqs(driver, playlist_total_count)  # plylstSeq values of all playlists
            data_frame_list = scrape_music_data(driver, playlist_seqs)  # extract the track information

            write_excel(data_frame_list, 'member_' + member_key + '_' + TODAY_DATE + '.xlsx')  # write the Excel file

            driver.quit()
    • ๋‹จ์ผ ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ ์Šคํฌ๋ž˜ํ•‘

        # ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ ํ‚ค๋กœ ์Œ์•… ๊ฐ€์ ธ์˜ค๊ธฐ
        def playlist(playlist_key):
            driver = init()
      
            playlist_seqs = [playlist_key] # ๋‹จ์ผ ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ 
            data_frame_list = scrape_music_data(driver, playlist_seqs) # ์Œ์› ์ •๋ณด ์ถ”์ถœ
      
            write_excel(data_frame_list, 'playlist_' + playlist_key + '_' + TODAY_DATE + '.xlsx') # ์—‘์…€ ์ž‘์„ฑ
      
            driver.quit() 
  • Web driver setup

      # Set up the Selenium driver
      def init():
          chrome_options = Options()
          chrome_options.add_argument("--headless")  # run Chrome without a window, which speeds things up
          driver = webdriver.Chrome(options=chrome_options)

          return driver
  • ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ ํ‚ค ๊ฐ€์ ธ์˜ค๊ธฐ

      # ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ ํ‚ค ๊ฐ€์ ธ์˜ค๊ธฐ
      def get_playlist_seqs(driver, playlist_total_count):
          playlist_seqs = []
          for offset in range(1, playlist_total_count + 1, 20): # 20์”ฉ ํŽ˜์ด์ง€ ์ˆ˜ ๋งŒํผ ๋ฐ˜๋ณต
              driver.execute_script("javascript:pageObj.sendPage('" + str(offset) + "')") # ํŽ˜์ด์ง€ ์ด๋™ ์Šคํฌ๋ฆฝํŠธ ํ˜ธ์ถœ 
              time.sleep(WAIT_TIME)
    
              playlist_links = driver.find_elements(By.CSS_SELECTOR, 'dt a') # ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ ์ƒ์„ธ ํŽ˜์ด์ง€ ์ด๋™ url ๊ฐ€์ ธ์˜ค๊ธฐ 
              for link in playlist_links:
                  playlist_seq = re.findall(r'\d+', link.get_attribute('href'))[1] # URL์—์„œ plystSeq ๊ฐ’๋งŒ ์ถ”์ถœ 
                  playlist_seqs.append(playlist_seq) # ๋ฐฐ์—ด์— ์ €์žฅ
    
          return playlist_seqs
  • Extracting the track data set

      # Track scraper
      def scrape_music_data(driver, playlist_seqs):
          data_frame_list = []
          for playlist_seq in playlist_seqs:
              driver.get(MUSIC_URL + '?plylstSeq=' + playlist_seq)  # open the track page for this plylstSeq
              time.sleep(WAIT_TIME)

              playlist_title = driver.find_element(By.CSS_SELECTOR, '.more_txt_title').text  # playlist title (used as the Excel sheet name)
              music_total = int(re.search(r'\d+', driver.find_element(By.CSS_SELECTOR, '.title .cnt').text).group())  # the "12" in "수록곡(12)" (tracks (12)); used for paging

              music_data = []

              for offset in range(1, music_total + 1, 50):  # one iteration per page, 50 tracks per page; + 1 keeps the last page
                  driver.execute_script("javascript:pageObj.sendPage('" + str(offset) + "')")  # call the page-navigation script
                  time.sleep(WAIT_TIME)

                  soup = BeautifulSoup(driver.page_source, 'lxml')  # parse the rendered HTML with BeautifulSoup
                  tr_tags = soup.find_all('tr')

                  for tr in tr_tags:
                      td_tags = tr.find_all('td', class_='t_left')
                      if td_tags and td_tags[0].find(class_='fc_gray'):
                          title = td_tags[0].find(class_='fc_gray').text.strip()  # track title
                          artist = td_tags[1].find(id='artistName').text.strip()  # artist name
                          album = td_tags[2].find(class_='fc_mgray').text.strip()  # album name
                          print("Title : ", title, " / ", "Artist : ", artist, " / ", "Album : ", album)  # print the extracted data
                          music_data.append([title, artist, album])

              df = pd.DataFrame(music_data, columns=MUSIC_COLUMNS)  # build a pandas DataFrame
              data_frame_list.append({'sheet': playlist_title, 'data': df})

          return data_frame_list
  • Writing the Excel file

      # Write the Excel file
      def write_excel(data_frame_list, filename):
          if not os.path.exists(EXCEL_PATH):
              os.makedirs(EXCEL_PATH)  # create the excel directory

          with pd.ExcelWriter(os.path.join(EXCEL_PATH, filename)) as writer:  # write into the directory created above
              for data_frame in data_frame_list:
                  sheet_name = data_frame.get('sheet')  # sheet name

                  m = re.compile(r'[\\*?:/\[\]]').search(sheet_name)  # check for characters not allowed in sheet names
                  if m:
                      sheet_name = '알 수 없음'  # fallback name ("unknown")

                  data_frame.get('data').to_excel(excel_writer=writer, sheet_name=sheet_name, index=False)  # write the sheet
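
In write_excel, every playlist title that contains a forbidden character falls back to the same sheet name, so two such playlists would be assigned identical names. One possible alternative, sketched below, replaces the forbidden characters, truncates to Excel's 31-character limit for sheet names, and appends a numeric suffix on collisions. `safe_sheet_name` is a hypothetical helper and not part of melon.py.

```python
import re

INVALID_SHEET_CHARS = re.compile(r'[\\*?:/\[\]]')  # same character class as in write_excel
MAX_SHEET_NAME_LENGTH = 31  # Excel's limit on worksheet name length


def safe_sheet_name(title, used):
    """Return a sheet name derived from `title` that is valid and not yet in `used`."""
    base = INVALID_SHEET_CHARS.sub('_', title).strip()[:MAX_SHEET_NAME_LENGTH] or 'playlist'
    name, n = base, 1
    while name.lower() in used:  # Excel compares sheet names case-insensitively
        n += 1
        suffix = f'_{n}'
        name = base[:MAX_SHEET_NAME_LENGTH - len(suffix)] + suffix
    used.add(name.lower())
    return name


used_names = set()
for title in ['K-POP / 2016', 'K-POP ? 2016', 'Drive']:
    print(safe_sheet_name(title, used_names))
```

The `used` set would be created once per workbook, so that each playlist keeps a recognizable and distinct sheet name.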

[Scraping all of a member's playlists]

  • Script file - member.py
from melon import member

# https://www.melon.com/mymusic/playlist/mymusicplaylist_list.htm?memberKey=56389814
member('56389814')
  • Run command
python ./member.py
  • Resulting Excel file

You can see that one sheet was created per playlist.

[๋‹จ์ผ ํ”Œ๋ ˆ์ด๋ฆฌ์ŠคํŠธ ์Šคํฌ๋ž˜ํ•‘]

  • ์‹คํ–‰ ํŒŒ์ผ - playlist.py
from melon import playlist

# https://www.melon.com/mymusic/playlist/mymusicplaylistview_inform.htm?plylstSeq=503394650
playlist('503394650')
  • ์‹คํ–‰ ์Šคํฌ๋ฆฝํŠธ
python ./playlist.py
  • ๊ฒฐ๊ณผ ์—‘์…€ ํŒŒ์ผ

That is how little code it took to get web scraping working.
I can't help wondering whether it would have been more complicated had I written it in Java...?

Limitations with YouTube


The original requirements include migrating the scraped tracks to YouTube Music, but I am putting that on hold(?) for now, for the reasons below.

  1. Limits of scraping YouTube pages : the HTML lifecycle is unpredictable, and it is hard to search for the exact track.

To find an official track on YouTube, specifying the keyword as ([title] [artist] [album] topic) is supposed to put the official track at the top of the results. → That is not always the case. There are not that many official tracks either...

  2. When implementing search, playlist creation, and adding tracks through the YouTube API, an API limit that allows saving at most 200 tracks per day to a playlist gets in the way.
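
To put that limit in perspective, here is a rough sketch of the arithmetic. It takes the 200-tracks-per-day figure above at face value and uses the 3,000-track total mentioned earlier; `days_needed` and `daily_batches` are hypothetical helpers, and no YouTube API call is made.

```python
import math

DAILY_PLAYLIST_LIMIT = 200  # tracks per day, the figure reported in this post


def days_needed(track_count, per_day=DAILY_PLAYLIST_LIMIT):
    """Number of days needed to add `track_count` tracks at `per_day` tracks per day."""
    return math.ceil(track_count / per_day)


def daily_batches(tracks, per_day=DAILY_PLAYLIST_LIMIT):
    """Split `tracks` into consecutive batches of at most `per_day` items."""
    return [tracks[i:i + per_day] for i in range(0, len(tracks), per_day)]


tracks = [f'track-{i}' for i in range(3000)]  # placeholder data, not real titles
batches = daily_batches(tracks)
print(days_needed(len(tracks)), len(batches), len(batches[-1]))
```

Under these assumptions the migration would have to be spread over about 15 days, one batch per day.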

Because of these limitations, I am working on other ideas...

ํ”„๋กœ์ ํŠธ ๋ฐฐํฌ


ํ•ด๋‹น ํ”„๋กœ์ ํŠธ๋ฅผ ํŒจํ‚ค์ง€ ํ™”ํ•˜์—ฌ Pypi์— ๋ฐฐํฌ ํ•˜๋ ค๊ณ  ํ–ˆ์œผ๋‚˜ ์•„๋ž˜ ์ด๋ฏธ์ง€ ์ฒ˜๋Ÿผ ํ˜„์žฌ๋Š” ๋ถˆ๊ฐ€ํ•˜๋‹ค.
๋‹ค์Œ์— ์•Œ์•„๋ณด๋„๋ก ํ•˜์žโ€ฆ.

Follow-ups


As this was my first Python program, I ran into several difficulties compared with Java, which I use regularly (autocompletion? types? syntax, and so on), but I was surprised by how productive it is.
To get more out of Python in the AI era, I plan to keep studying it steadily.

The project above was written in a function-based style. Python supports object-oriented programming (OOP) too, so I plan to refactor the code above in that style.
And although it is not possible right now, I would also like to try publishing the project.

If I get the chance later 🤣, I should probably take it all the way to migrating the tracks to YouTube, so that the program meets the original project requirements..

The source is available separately on GitHub. Feel free to take it and use it if you need it.. 🙃

https://github.com/discphy/melon-scrap

๋!

๋ฐ˜์‘ํ˜•