4.0 #161

danigosa · 2019-06-15T19:57:47Z

This is a grossly, ridiculously deliberate non-backwards compatible fork. Features:

drop Py2.7 support, making code slimmer
refactored tests with pytest and travis-ci
refactored development environment, a simple script to launch
usage of modern http-parser (2.9.2) and it's url parser instead a custom one
added some gunicornish parameters to mitigate (a little) DoS attacks
added Gunicorn worker, to easily take advantage of multi-core and easily integrate to standard deployments (and of course adding all gunicorn perks).
added refactored benchmarks included legacy current 3.0.x branch

The fork was born as an internal project in our team to make bjoern actually usable in professional/commercial environments, we loved the idea of a purely single-threaded wsgi that was lightweight and with tiny codebase, no more no less. One major problem was it was using a version of the parser dating from the past decade whilst current one has added tons of bugfixes and security patches. The other big problem was the lack of automated testing, and a real test suite, lacking continuous integration. Last but not least, code had the burden of Python2 compatibility, which harmed maintainability, more if we think that the end of life for CPython2 is close.

After several improvements I decided to publish changes as a fork, and just for you to know I leave this pull request that is, as I said in the first line, absolutely incompatible with the master branch, maybe some of the changes can be of some use for you like testing, CI or the gworker, maybe it makes sense to iron the master with this version and leave current master as 3.x branch, in any case these are up to the genuine creator of the lib, maybe this fork can be separately published as Bjoern3(for the python3-only thing). I know this can be an overwhelming PR, or not, you can just skip the whole thing out 😄 internally we are gonna use this branch version in production, so some commits can be dropping on in the future, if merged then I can contribute with our findings and improvements.

I can also gladly integrate open PRs to 3.x into this one, like #155 or #99

You can see the branch here: https://github.com/danigosa/bjoern/tree/4.0
You can see Travis CI log here: https://travis-ci.com/danigosa/bjoern

Caveats

The main caveat in comparison with current version is that this version is slightly slower (5-25%). This can mainly be due to:

latest version of the http-parser is roughly slower (as it has a decade of features more)
the url parser from http-parser is slower (it happens)
some minor features are added like the checking limits for body length, number of headers and header field length, that slow the thing down a little

Last thought

The performance of the gunicorn's gworker on a multi-core with --workers as the number of cores is awesome, several times than running a single current version in a multi-core machine. Having a gunicorn worker is a game changer as it is a de facto deployment standard for WSGI.

WIP WIP WIP WIP WIP tests tests not passing on post tests passing POST logs integrated logs integrated RC RC RC refactor tests added Falcon test added tests final changes make PID robust make PID robust release 4.0.1 WIP not passing tests when body params are large enough WIP not passing tests when body params are large enough problems with http-parser content-length and writing big bodies to _BytesIO problems with http-parser content-length and writing big bodies to _BytesIO - fixed problems on_body - added tests a releasing version 4.0.2 WIP FMT benchmarks for python3.6 release release release added test for keep alive behaviour added test for keep alive behaviour added test for all kind of errors added test not callable added test pure requests release 4.0.4 release 4.0.5 WIP WIP WIP WIP WIP WIP wip submodules WIP WIP added gunicorn worker added gunicorn worker added gunicorn worker added gunicorn worker added gunicorn worker added gunicorn worker added gunicorn worker added gunicorn worker added gunicorn worker release 4.0.6 release 4.0.6 release 4.0.6 release 4.0.7 WIP wip wip wip release 4.0.7 wip release 4.0.7 release 4.0.7 release 4.0.8 wip release 4.0.9 release 4.0.9 remove warnings backport jonashaag#160 jonashaag#158 backport bjoern#3.0.1 release 4.0.10 release 4.0.10 add mem inspectors add mem inspectors WIP pypy support New legacy benchmarks New legacy benchmarks removed logs for performance release 4.0.11

improve changelog

improve changelog added travis ci

# Conflicts: # .travis-ci.yml

improve changelog added travis ci releasing 4.0.0rc1 releasing 4.0.0rc1 releasing 4.0.0rc1 releasing 4.0.0rc1

# Conflicts: # README.md

jonashaag · 2019-06-15T22:25:09Z

😳

Will have a closer look soon

merge server run refactor

jonashaag · 2019-06-17T13:34:56Z

Had a short look. Great code. I probably won't integrate all of the changes. I'd like to start by merging some of the changes from the C code. Can you quickly summarise what you've changed there? Obviously, url parsing and dropping Py2. But there's a lot of other changes as well.

danigosa · 2019-06-17T15:25:46Z

Main changes in C code apart from dropping Py2k and the use of http_parser's url parser:

Improve ServerInfo: more C, less Python C-API, pseudo-singleton/shared for all concurrent request handlers
Improve PyTupleArgs: more C, less Python C-API, avoid unnecessary calls to CallMethod... if the parser gives direct C-types
Added body_max_len, max_header_fields, max_header_field_len, checking in on_body, on_header, on_header_field, these are minimal checks to avoid primal DoS (mainly derived from Gunicorn's measures)
Use of http_parser's HTTP_STATUS_##name instead of custom http error codes/messages
Minor checks like HTTP_STATUS_HTTP_VERSION_NOT_SUPPORTED for HTTP2 plausible requests, HTTP_STATUS_LENGTH_REQUIRED for HTTP1.0 without Content-Length, HTTP_STATUS_URI_TOO_LONG for bad URLs (or custom crafted)
A lot of commits are amendments so git history is not very helpful, last rc2 version is from out part final in terms of features expectations

Non-C main changes apart from gworker.BjoernWorker:

Minor improvements in socket like TCP-NODELAY or KEEPALIVE
Pass in fileno:host:port as int:str:int instead of calling getsocketname in Python C-API
Minor logging

General changes:

Heavy refactor of code: C in src, Python in bjoern.__init__.py
Translation of most of the tests to Pytest
TravisCI

Possible improvements:

Add test for unix sockets
Refactor the use (or abuse) of Python C-API:
- ThreadInfo can be free of PyObjects
- A Request free of PyObject's would delay GIL_XX until the start_response phase, concurrent WSGI request handlers won't be GIL'ed
  - Instead of io module, use a simple Linked List that only appends, traverse it when actually writing to a io buffer at last moment (GIL Locked)
  - Instead of PyDict for headers, use a Linked List that only appends if no overriding keys in place, or a tree with tsearch(3) if overriding in place, not sure which structure is necessary, if it's just appending headers before passing them to start_response, at last moment wrapped as PyObject (GIL Locked), a simple linked list like read buffer would work
  - Apart from avoiding GIL'ing too much the code, a more strict C request handling has more sense than using the Python C-API (which changes over time, in specs, implementation and deprecations). Using Python C-API just for passing by objects to the WSGI app as it's mandatory for the target Python app is a long-term stability strategy (C has had 2-3 huge features in the last 2 decades, sticking to C99 is a strong code standing option)

# Conflicts: # src/request.c

# Conflicts: # src/common.h

jonashaag · 2019-07-04T12:23:14Z

@danigosa I'm going to close this PR; you're basically building a new project off of bjoern (which is awesome!)

My suggestion is that you rename your bjoern fork to something other than bjoern.

At your current amount of time you're investing I'm sure your fork will be the much better version of bjoern soon ;) Looking forward to it!

danigosa · 2019-07-05T16:46:23Z

Sounds good! if any feature or addition I did or may do you want feedback on how to integrate it, don't hesitate to ask for it :)

danigosa and others added 22 commits May 21, 2019 06:59

WIP

19fdcb9

release 4.0.0rc1

8b5c1e2

improve changelog

release 4.0.0rc1

83c2262

improve changelog added travis ci

release 4.0.0rc1

2cfef91

improve changelog added travis ci

Merge remote-tracking branch 'origin/4.0' into 4.0

20b3559

# Conflicts: # .travis-ci.yml

Merge remote-tracking branch 'origin/4.0' into 4.0

adca4b8

# Conflicts: # .travis-ci.yml

Merge remote-tracking branch 'origin/4.0' into 4.0

067a46f

Merge remote-tracking branch 'origin/4.0' into 4.0

23669ab

Merge remote-tracking branch 'origin/4.0' into 4.0

b30473a

releasing 4.0.0rc1

74e49bc

releasing 4.0.0rc1

a0148b8

Merge remote-tracking branch 'origin/4.0' into 4.0

ca995dd

releasing 4.0.0rc1

9ed37ef

release 4.0.0rc1

5d8469a

improve changelog added travis ci releasing 4.0.0rc1 releasing 4.0.0rc1 releasing 4.0.0rc1 releasing 4.0.0rc1

Merge remote-tracking branch 'origin/4.0' into 4.0

e2191e6

# Conflicts: # README.md

fix readme

a9f49d4

fix readme

17dc8f0

Merge remote-tracking branch 'origin/4.0' into 4.0

39901f2

# Conflicts: # README.md

fix readme

629c864

fmt

e7fed9b

fmt

21e3247

Dani G added 7 commits June 16, 2019 10:39

- prioritize EPOLL

0b8673c

refactor server info (so it has not to be in every request)

b17d73a

- make it Python4/5 friendly

afe4a4d

- make it Python4/5 friendly

a8bc16f

fix CI

8290163

WIP

2375ba1

fix CI

9d1ccc1

Dani G and others added 10 commits June 16, 2019 13:41

WIP - server run simplyfied

07c71bd

Merge branch 'server_run_refactor' into server_info_refactor

66498d6

fix CI

d8a3d96

merge server run refactor

fix CI

af047e2

fix CI

e3f6815

Python3.5 support

4bfe94e

release 4.0.0rc2

a4b79e6

Update README.md

b6cc7f4

Update README.md

56fac6f

Update README.md

65c940f

Dani G added 12 commits June 18, 2019 13:39

WIP: PARSER->field to C

53144a6

WIP: PARSER->field to C (optimal, C99)

21d05dd

Merge remote-tracking branch 'origin/thread_info' into thread_info

16c5b2d

# Conflicts: # src/request.c

- protect mallocs

a113ec7

WIP

801f2c9

- status and client_addr are C strings

cedfa9a

- status and client_addr are C strings

8764108

WIP

b216c82

rc4

eed2ae5

rc4

39c4bd8

Merge remote-tracking branch 'origin/rc3' into rc4

c7afa18

# Conflicts: # src/common.h

rc5

a32e212

jonashaag closed this Jul 4, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

4.0 #161

4.0 #161

danigosa commented Jun 15, 2019 •

edited

Loading

jonashaag commented Jun 15, 2019

jonashaag commented Jun 17, 2019 •

edited

Loading

danigosa commented Jun 17, 2019 •

edited

Loading

jonashaag commented Jul 4, 2019

danigosa commented Jul 5, 2019

4.0 #161

4.0 #161

Conversation

danigosa commented Jun 15, 2019 • edited Loading

jonashaag commented Jun 15, 2019

jonashaag commented Jun 17, 2019 • edited Loading

danigosa commented Jun 17, 2019 • edited Loading

jonashaag commented Jul 4, 2019

danigosa commented Jul 5, 2019

danigosa commented Jun 15, 2019 •

edited

Loading

jonashaag commented Jun 17, 2019 •

edited

Loading

danigosa commented Jun 17, 2019 •

edited

Loading