07/22
2014
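Turning python-readability into a web service
I needed readability, so I first looked at the Java implementations...
I didn't like any of them, so I went looking in other languages..
The Python one gives really good results!!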
https://github.com/buriy/python-readability
I installed python-readability by following the instructions on GitHub.
yum groupinstall "Development tools"
yum install python
yum install gcc
yum install python-devel libxml2-devel libxslt-devel
python setup.py install
pip install readability-lxml
Hmm, I think that is roughly everything I ran;;
Once the installation is done, you can run it like this.
$ python -m readability.readability -u http://millky.com/home/byuri/10001000
Yep, it works well, haha.
To use it from Java, I wrapped it in a small web service.
I based it on https://github.com/cv/html2text-service.
Running it
$ python ~/python-readability-master/dist/web.py
If it fails with an error because flask is missing, install it:
$ pip install flask
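Once the service is up, a caller passes the page address in the `url` query parameter that web.py reads, and that address has to be percent-encoded. Here is a rough sketch of a client. The `http://localhost:5000/` address assumes web.py was started on the same machine with its default port, and `build_request_url` and `fetch_readable` are names made up for this example, not part of python-readability.

```python
try:
    from urllib.parse import urlencode  # Python 3
    from urllib.request import urlopen
except ImportError:
    from urllib import urlencode, urlopen  # Python 2

# Assumed address: web.py falls back to port 5000 when PORT is not set.
SERVICE_BASE = "http://localhost:5000/"


def build_request_url(target_url, base=SERVICE_BASE):
    """Return the service URL with target_url percent-encoded into ?url=."""
    return base + "?" + urlencode({"url": target_url})


def fetch_readable(target_url, base=SERVICE_BASE):
    """Ask the running service for the cleaned-up article body."""
    response = urlopen(build_request_url(target_url, base))
    try:
        return response.read()
    finally:
        response.close()


if __name__ == "__main__":
    # Only prints the request URL; call fetch_readable() once web.py is running.
    print(build_request_url("http://millky.com/home/byuri/10001000"))
```

A Java caller would do the same thing: encode the target address, then issue a plain GET.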
- web.py
from flask import Flask, request, make_response
from readability.readability import Document
import os

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

app = Flask(__name__)
app.config['DEBUG'] = True


def text_response(output):
    # Return the extracted article with a text/plain content type.
    response = make_response(output, 200)
    response.mimetype = 'text/plain'
    return response


@app.route("/")
def get():
    url = request.args.get('url')
    if not url:
        # No URL given: show a form for entering a URL or pasting raw HTML.
        return """
<!doctype html>
<html>
<head>
<title>python-readability web</title>
</head>
<body>
<form action="/" method="get">
<p>URL: <input type="text" name="url" /> <button type="submit">Go</button></p>
</form>
<form action="/" method="post">
<textarea name="html" rows="20" cols="80"></textarea>
<p><button type="submit">Go</button></p>
</form>
</body>
</html>
"""
    else:
        # Download the page and keep only the readable article body.
        req = urlopen(url)
        text = req.read()
        readable_article = Document(text).summary()
        return text_response(readable_article)


@app.route("/", methods=['POST'])
def post():
    # Read the raw HTML submitted through the textarea named "html".
    text = request.form.get('html')
    if not text:
        return make_response('missing "html" form field', 400)
    readable_article = Document(text).summary()
    return text_response(readable_article)


if __name__ == "__main__":
    port = int(os.environ.get("PORT", 5000))
    app.run(host='0.0.0.0', port=port)
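One thing to watch: the GET handler passes the `url` parameter straight to `urlopen`, which can also open `file:` URLs (and, on Python 2, plain local paths). Since the service listens on 0.0.0.0, it may be worth checking the parameter first. Below is a minimal sketch of such a check. `is_fetchable_url` and `ALLOWED_SCHEMES` are hypothetical names for this example, and the rule (absolute http/https only) is my assumption of what the service should accept.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

# Assumption: the service should only ever fetch pages over http or https.
ALLOWED_SCHEMES = ("http", "https")


def is_fetchable_url(url):
    """Return True only for absolute http(s) URLs that include a host."""
    if not url:
        return False
    parsed = urlparse(url)
    return parsed.scheme in ALLOWED_SCHEMES and bool(parsed.netloc)
```

In `get()`, this could run before the download, returning a 400 response when the check fails.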
References
http://lab.arc90.com/2009/03/02/readability/