[GH-ISSUE #5] Replace gpt-3-encoder with gpt-tokenizer or similar #4

Closed
opened 2026-03-03 13:52:06 +03:00 by kerem · 1 comment

Originally created by @0xdevalias on GitHub (Nov 13, 2023).
Original GitHub issue: https://github.com/jehna/humanify/issues/5

Currently `gpt-3-encoder` is used:

  • https://github.com/latitudegames/GPT-3-Encoder
    • Javascript BPE Encoder Decoder for GPT-2 / GPT-3

But it might make more sense to switch to a library that also supports GPT-4, for example:

  • https://github.com/niieani/gpt-tokenizer
    • JavaScript BPE Tokenizer Encoder Decoder for OpenAI's GPT-2 / GPT-3 / GPT-4. Port of OpenAI's tiktoken with additional features.

    • gpt-tokenizer is a highly optimized Token Byte Pair Encoder/Decoder for all OpenAI's models (including those used by GPT-2, GPT-3, GPT-3.5 and GPT-4). It's written in TypeScript, and is fully compatible with all modern JavaScript environments.

    • As of 2023, it is the most feature-complete, open-source GPT tokenizer on NPM.

    • No global cache (no accidental memory leaks, as with the original GPT-3-Encoder implementation)

    • Historical note: This package started off as a fork of latitudegames/GPT-3-Encoder, but version 2.0 was rewritten from scratch.

  • https://gpt-tokenizer.dev/

Currently `gpt-3-encoder` is referenced in a few places:

  • https://github.com/jehna/humanify/blob/main/package.json#L21
  • https://github.com/jehna/humanify/blob/main/src/openai/split-file.ts#L1

See Also

  • https://github.com/jehna/humanify/issues/4
kerem closed this issue 2026-03-03 13:52:06 +03:00

@jehna commented on GitHub (Aug 12, 2024):

Fixed now at v2 (not counting tokens anymore)
