
ev01ing/smart-extractor

 
 

html smart extractor

Changelog

2017.11.29

  • Building on the original author's work, added a POST endpoint that accepts the page content directly; see item 2 under API for details.

Introduction

A microservice that extracts the main content from a URL.

Usage

  1. Package

    $ mvn clean package
    
  2. Run

    $ java -jar target/smart-extractor.jar
    $ open http://localhost:8080
    

API

  1. GET Extract http://localhost:8080/extract?url={url}

    $ curl -i -X GET http://localhost:8080/extract\?url\=https://medium.com/@benjaminhardy/8-things-every-person-should-do-before-8-a-m-cc0233e15c8d
    
  2. POST Extract http://localhost:8080/extract (Python example)

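    import requests

    # Fetch the page whose main content should be extracted
    url = "https://www.baidu.com"
    r = requests.get(url)
    # Send the raw page content as the "content" form field
    payload = {"content": r.content}
    r_e = requests.post("http://localhost:8080/extract", data=payload)
    # Print the response body returned by the service
    print(r_e.text)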
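
The GET endpoint takes the target page address as the `url` query parameter, so a target URL that carries its own query string should be percent-encoded before it is appended. Below is a minimal Python sketch of that call, assuming the service is running locally on port 8080 as in the Usage section. The helper names `build_extract_url` and `extract_from_url` are illustrative and not part of this project, and because the response format is not documented here, the body is returned as plain text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: default host and port from the Usage section.
BASE_URL = "http://localhost:8080"


def build_extract_url(page_url, base_url=BASE_URL):
    """Return the GET /extract URL with page_url percent-encoded."""
    return base_url + "/extract?" + urlencode({"url": page_url})


def extract_from_url(page_url, base_url=BASE_URL, timeout=30):
    """Call GET /extract and return the response body as text."""
    with urlopen(build_extract_url(page_url, base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

With the service running, `extract_from_url("https://medium.com/@benjaminhardy/8-things-every-person-should-do-before-8-a-m-cc0233e15c8d")` issues the same request as the curl command in item 1.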

Build Docker Image

    $ mvn clean package
    $ mvn package docker:build
