diff --git a/INSTALL.md b/INSTALL.md deleted file mode 100644 index c2856b2..0000000 --- a/INSTALL.md +++ /dev/null @@ -1,100 +0,0 @@ -# Quick Installation & Getting Started - -## 🚀 Quick Start - -### 1. Install Dependencies -```bash -npm install -``` - -### 2. Get Google Gemini API Key -1. Visit [Google AI Studio](https://aistudio.google.com/app/apikey) -2. Create a new API key -3. Copy the generated key - -### 3. Set Up Your API Key - -**Option A: Environment Variable (Recommended)** -```bash -# Windows PowerShell -$env:GEMINI_API_KEY="your-api-key-here" - -# Windows Command Prompt -set GEMINI_API_KEY=your-api-key-here - -# Linux/Mac -export GEMINI_API_KEY="your-api-key-here" -``` - -**Option B: Use with Command** -```bash -node bin/cli.js translate -i file.md -l Spanish --key your-api-key-here -``` - -### 4. Test with Sample File - -Try translating the included sample file: - -```bash -# Set your API key first, then run: -node bin/cli.js translate -i examples/sample.md -l Spanish - -# Or use npm script: -npm run demo -``` - -## 📚 Available Commands - -```bash -# Get help -node bin/cli.js --help - -# List supported languages -node bin/cli.js languages - -# Show setup guide -node bin/cli.js setup - -# Translate a file -node bin/cli.js translate -i input.md -l TargetLanguage -o output.md -``` - -## 🎯 Examples - -### Basic Translation -```bash -node bin/cli.js translate -i README.md -l French -``` - -### Custom Output File -```bash -node bin/cli.js translate -i docs/guide.md -l German -o docs/guide_de.md -``` - -### Using API Key Argument -```bash -node bin/cli.js translate -i file.md -l Japanese --key AIzaSyC... 
-``` - -## 🛠️ Make it Global (Optional) - -To use `md-translate` from anywhere: - -```bash -npm link -``` - -Then you can use: -```bash -md-translate translate -i file.md -l Spanish -md-translate languages -md-translate setup -``` - -## 🔍 Troubleshooting - -- **API Key Error**: Make sure your Gemini API key is valid and set correctly -- **File Not Found**: Check that your input file path is correct -- **Network Issues**: Ensure you have a stable internet connection - -For more detailed information, see the main [README.md](README.md) file. \ No newline at end of file diff --git a/README.md b/README.md index 8896241..049cf96 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Markdown Translator -A powerful command-line tool that uses Google Gemini AI to translate markdown and MDX files from English to any specified language while preserving formatting and structure. +A powerful command-line tool that uses Claude AI to translate markdown and MDX files from English to any specified language while preserving formatting and structure. ## Usage at StarRocks @@ -20,7 +20,7 @@ This code and most of the README are from the team at [PlayCanvas](https://githu -o, --output Output file path (for single file translation) -d, --output-dir Output directory (for batch translation or single file) - -k, --key Google Gemini API key (or set GEMINI_API_KEY env var) + -k, --key Anthropic API key (or set ANTHROPIC_API_KEY env var) --flat Use flat structure in output directory (default: preserve structure) --suffix Custom suffix for output files (default: language @@ -33,7 +33,7 @@ This code and most of the README are from the team at [PlayCanvas](https://githu The translator now uses the AST pipeline by default. -When `--trace` is enabled, the tool logs one JSON trace record per ID and includes the full `sourceText` and `translatedText` values. The only masking applied is replacing any accidental `GEMINI_API_KEY` occurrences with `***`. 
+When `--trace` is enabled, the tool logs one JSON trace record per ID and includes the full `sourceText` and `translatedText` values. The only masking applied is replacing occurrences of the actual API key value with `***`. ### Interpreting AST parse failures @@ -53,10 +53,10 @@ How to read the outcome: ## Quick Start 1. cd into the root of this repo -2. Get a Gemini API Key -3. Export your Gemini API Key like so: +2. Get an Anthropic API Key +3. Export your Anthropic API Key like so: ```sh - export GEMINI_API_KEY="" + export ANTHROPIC_API_KEY="" ``` 4. Install the prerequisites: ```sh @@ -74,19 +74,19 @@ How to read the outcome: ## Example use on your workstation with the StarRocks repo ```sh - # Export your Gemini API key - export GEMINI_API_KEY="AIxxxxxxxxxxxxxxxx" + # Export your Anthropic API key + export ANTHROPIC_API_KEY="sk-ant-xxxxxxxxxxxxxxxx" - # in the markdown-translator repo directory install the translator globally on your system: + # in the doc-translator repo directory install the translator globally on your system: npm install npm link # now in the starrocks/starrocks repo dir # view the options: - md-translate translate -h + doc-translate translate -h # Example, translate the English architecture doc to Japanese: - md-translate translate -s en -i docs/en/introduction/Architecture.md -l ja -o docs/ja/introduction/Architecture.md + doc-translate translate -s en -i docs/en/introduction/Architecture.md -l ja -o docs/ja/introduction/Architecture.md ``` ## Example use in GitHub PRs @@ -108,14 +108,14 @@ How to read the outcome: - 🏗️ **Structure preservation** - Maintain directory structure or flatten output as needed - 📊 **Progress tracking** - Real-time progress indication with spinners for single files and batches - 🎨 **Beautiful CLI** - Colorful, user-friendly command-line interface -- ⚡ **Fast processing** - Optimized for speed with high-performance Gemini model +- ⚡ **Fast processing** - Optimized for speed with high-performance 
Claude model ## Installation ### Prerequisites - Node.js 16.0.0 or higher -- Google Gemini API key ([Get one here](https://aistudio.google.com/app/apikey)) +- Anthropic API key ([Get one here](https://console.anthropic.com/)) > **Note**: This tool uses ES modules (ESM) and requires Node.js 16+ for full compatibility. @@ -139,9 +139,9 @@ node bin/cli.js ## Setup -### 1. Get Google Gemini API Key +### 1. Get Anthropic API Key -1. Visit [Google AI Studio](https://aistudio.google.com/app/apikey) +1. Visit [Anthropic Console](https://console.anthropic.com/) 2. Create a new API key 3. Copy the generated key @@ -150,13 +150,13 @@ node bin/cli.js **Option A: Environment Variable (Recommended)** ```bash -export GEMINI_API_KEY="your-api-key-here" +export ANTHROPIC_API_KEY="your-api-key-here" ``` **Option B: Command Line Argument** ```bash -md-translate translate -i file.md -l Spanish --key your-api-key-here +doc-translate translate -i file.md -l Spanish --key your-api-key-here ``` ## Usage @@ -165,16 +165,16 @@ md-translate translate -i file.md -l Spanish --key your-api-key-here ```bash # Translate README.md to Spanish -md-translate translate -i README.md -l Spanish +doc-translate translate -i README.md -l Spanish # Translate with custom output file -md-translate translate -i docs/guide.md -l French -o docs/guide_fr.md +doc-translate translate -i docs/guide.md -l French -o docs/guide_fr.md # Translate using API key argument -md-translate translate -i file.md -l German --key your-api-key +doc-translate translate -i file.md -l German --key your-api-key # Translate with AST mode (default) -md-translate translate -i examples/External_table.md -l Japanese +doc-translate translate -i examples/External_table.md -l Japanese ``` ### Batch Processing @@ -183,16 +183,16 @@ The tool supports batch processing of multiple markdown files using glob pattern ```bash # Translate all .md files in current directory -md-translate translate -i "*.md" -l Spanish -d ./spanish/ +doc-translate 
translate -i "*.md" -l Spanish -d ./spanish/ # Translate all markdown files in docs folder and subfolders -md-translate translate -i "docs/**/*.md" -l French -d ./translations/ +doc-translate translate -i "docs/**/*.md" -l French -d ./translations/ # Batch translate with flat structure (no subdirectories) -md-translate translate -i "content/**/*.md" -l German -d ./output/ --flat +doc-translate translate -i "content/**/*.md" -l German -d ./output/ --flat # Batch translate with custom suffix -md-translate translate -i "*.md" -l ja -d ./translated/ --suffix "ja" +doc-translate translate -i "*.md" -l ja -d ./translated/ --suffix "ja" ``` ### Available Commands @@ -200,7 +200,7 @@ md-translate translate -i "*.md" -l ja -d ./translated/ --suffix "ja" #### `translate` - Translate a markdown or MDX file ```bash -md-translate translate [options] +doc-translate translate [options] Options: -i, --input Input file path or glob pattern (required) @@ -208,7 +208,7 @@ Options: -l, --language Target language (required) -o, --output Output file path (for single file translation) -d, --output-dir Output directory (for batch translation or single file) - -k, --key Google Gemini API key (optional) + -k, --key Anthropic API key (optional) --flat Use flat structure in output directory (default: preserve structure) --suffix Custom suffix for output files (default: language name) --log-chunk-metadata Log API metadata for each chunk @@ -218,19 +218,19 @@ Options: #### `languages` - List supported languages ```bash -md-translate languages +doc-translate languages ``` #### `setup` - Show setup guide ```bash -md-translate setup +doc-translate setup ``` #### `--help` - Show help ```bash -md-translate --help +doc-translate --help ``` ## Supported Languages @@ -252,7 +252,7 @@ The tool supports 40+ languages including: #### Example 1: Basic Translation ```bash -md-translate translate -i README.md -l es +doc-translate translate -i README.md -l es ``` **Output**: Creates `README_spanish.md` with 
Spanish translation @@ -260,7 +260,7 @@ md-translate translate -i README.md -l es #### Example 2: Custom Output Path ```bash -md-translate translate -i docs/api.md -l fr -o docs/fr/api.md +doc-translate translate -i docs/api.md -l fr -o docs/fr/api.md ``` **Output**: Creates `docs/fr/api.md` with French translation @@ -268,7 +268,7 @@ md-translate translate -i docs/api.md -l fr -o docs/fr/api.md #### Example 3: Using API Key Argument ```bash -md-translate translate -i guide.md -l German --key AIzaSyC... +doc-translate translate -i guide.md -l German --key sk-ant-... ``` #### Example 4: Large File Translation @@ -276,7 +276,7 @@ md-translate translate -i guide.md -l German --key AIzaSyC... The tool automatically handles large files by splitting them into chunks: ```bash -md-translate translate -i large-document.md -l ja +doc-translate translate -i large-document.md -l ja ``` ### Batch Translation @@ -284,7 +284,7 @@ md-translate translate -i large-document.md -l ja #### Example 5: Translate All Markdown Files ```bash -md-translate translate -i "*.md" -l Spanish -d ./spanish/ +doc-translate translate -i "*.md" -l Spanish -d ./spanish/ ``` **Output**: Translates all `.md` files in current directory to `./spanish/` folder @@ -292,7 +292,7 @@ md-translate translate -i "*.md" -l Spanish -d ./spanish/ #### Example 6: Recursive Translation with Structure Preservation ```bash -md-translate translate -i "docs/**/*.md" -l French -d ./translations/ +doc-translate translate -i "docs/**/*.md" -l French -d ./translations/ ``` **Output**: Translates all markdown files in `docs/` and preserves directory structure in `./translations/` @@ -317,7 +317,7 @@ translations/ #### Example 7: Flat Structure Batch Translation ```bash -md-translate translate -i "content/**/*.md" -l German -d ./output/ --flat +doc-translate translate -i "content/**/*.md" -l German -d ./output/ --flat ``` **Output**: Translates all files but places them in a flat structure (no subdirectories) @@ -342,7 +342,7 @@
output/ #### Example 8: Custom Suffix ```bash -md-translate translate -i "*.md" -l ja -d ./translated/ --suffix "ja" +doc-translate translate -i "*.md" -l ja -d ./translated/ --suffix "ja" ``` **Output**: Uses "ja" instead of "japanese" as the file suffix @@ -377,7 +377,7 @@ The tool provides detailed progress feedback for both single file and batch proc ``` ╔══════════════════════════════════════╗ ║ Markdown Translator ║ -║ Powered by Google Gemini AI ║ +║ Powered by Claude AI ║ ╚══════════════════════════════════════╝ 📋 Translation Details: @@ -400,7 +400,7 @@ The tool provides detailed progress feedback for both single file and batch proc ``` ╔══════════════════════════════════════╗ ║ Markdown Translator ║ -║ Powered by Google Gemini AI ║ +║ Powered by Claude AI ║ ╚══════════════════════════════════════╝ 📋 Batch Translation Details: @@ -434,7 +434,7 @@ The tool provides clear error messages for common issues: ### Project Structure ``` -markdown-translator/ +doc-translator/ ├── bin/ │ └── cli.js # CLI entry point ├── src/ @@ -454,7 +454,7 @@ This project uses **ES modules (ESM)** for modern JavaScript development: ### Key Dependencies -- `@google/generative-ai` - Google Gemini AI SDK +- `@anthropic-ai/sdk` - Anthropic Claude AI SDK - `commander` - Command-line interface framework - `chalk` - Terminal styling - `ora` - Progress spinners @@ -477,8 +477,8 @@ This project is licensed under the MIT License - see the LICENSE file for detail ### API Key Issues - Ensure your API key is valid and active -- Check that you have sufficient quota in your Google Cloud account -- Verify the API key has
access to the Gemini API +- Check that you have sufficient quota in your Anthropic account +- Verify the API key is active in the Anthropic Console ### Large File Processing @@ -504,7 +504,7 @@ This project is licensed under the MIT License - see the LICENSE file for detail If you encounter any issues or have questions: 1. Check the troubleshooting section above -2. Run `md-translate setup` for configuration help +2. Run `doc-translate setup` for configuration help 3. Create an issue on the project repository --- diff --git a/bin/cli.js b/bin/cli.js index 9021c95..fa9e9bd 100755 --- a/bin/cli.js +++ b/bin/cli.js @@ -16,13 +16,13 @@ const program = new Command(); const banner = ` ╔══════════════════════════════════════╗ ║ Markdown Translator ║ -║ Powered by Google Gemini AI ║ +║ Powered by Claude AI ║ ╚══════════════════════════════════════╝ `; program -.name('md-translate') -.description('Translate markdown files using Google Gemini AI') +.name('doc-translate') +.description('Translate markdown files using Claude AI') .version('1.0.0'); program @@ -33,7 +33,7 @@ program .option('-s, --source ', 'Source language (default: English)') .option('-o, --output ', 'Output file path (for single file translation)') .option('-d, --output-dir ', 'Output directory (for batch translation or single file)') -.option('-k, --key ', 'Google Gemini API key (or set GEMINI_API_KEY env var)') +.option('-k, --key ', 'Anthropic API key (or set ANTHROPIC_API_KEY env var)') .option('--flat', 'Use flat structure in output directory (default: preserve structure)') .option('--suffix ', 'Custom suffix for output files (default: language name)') .option('--log-chunk-metadata', 'Log API metadata for each chunk') @@ -43,11 +43,10 @@ program try { // Get API key from options or environment - const apiKey = options.key ||
process.env.GEMINI_API_KEY; + const apiKey = options.key || process.env.ANTHROPIC_API_KEY; if (!apiKey) { - console.error(chalk.red('❌ Error: Google Gemini API key is required.')); - console.log(chalk.yellow('Set GEMINI_API_KEY environment variable or use --key option')); - console.log(chalk.blue('Get your API key from: https://aistudio.google.com/app/apikey')); + console.error(chalk.red('❌ Error: Anthropic API key is required.')); + console.log(chalk.yellow('Set ANTHROPIC_API_KEY environment variable or use --key option')); process.exit(1); } @@ -223,9 +222,8 @@ program } catch (error) { console.error(chalk.red(`\n❌ Error: ${error.message}`)); - if (error.message.includes('API_KEY_INVALID')) { - console.log(chalk.yellow('Please check your Google Gemini API key')); - console.log(chalk.blue('Get your API key from: https://aistudio.google.com/app/apikey')); + if (error.message.includes('API_KEY_INVALID') || error.message.includes('authentication')) { + console.log(chalk.yellow('Please check your Anthropic API key')); } process.exit(1); } @@ -254,30 +252,30 @@ program console.log(row); } - console.log(chalk.yellow('\n💡 Tip: You can also use any other language name that Gemini supports')); + console.log(chalk.yellow('\n💡 Tip: You can also use any other language name that Claude supports')); }); program .command('setup') -.description('Setup guide for Google Gemini API key') +.description('Setup guide for Anthropic API key') .action(() => { console.log(chalk.cyan(banner)); console.log(chalk.blue('🔧 Setup Guide:')); console.log(''); - console.log(chalk.yellow('1. Get your Google Gemini API key:')); - console.log(chalk.gray(' Visit: https://aistudio.google.com/app/apikey')); + console.log(chalk.yellow('1. Get your Anthropic API key:')); + console.log(chalk.gray(' Visit: https://console.anthropic.com/')); console.log(''); console.log(chalk.yellow('2.
 Set your API key (choose one):')); console.log(chalk.gray(' Option A - Environment variable:')); - console.log(chalk.white(' export GEMINI_API_KEY="your-api-key-here"')); + console.log(chalk.white(' export ANTHROPIC_API_KEY="your-api-key-here"')); console.log(''); console.log(chalk.gray(' Option B - Command line argument:')); - console.log(chalk.white(' md-translate translate -i file.md -l Spanish --key your-api-key-here')); + console.log(chalk.white(' doc-translate translate -i file.md -l Spanish --key your-api-key-here')); console.log(''); console.log(chalk.yellow('3. Start translating:')); - console.log(chalk.white(' md-translate translate -i README.md -l Spanish')); + console.log(chalk.white(' doc-translate translate -i README.md -l Spanish')); console.log(''); - console.log(chalk.blue('📚 For more help: md-translate --help')); + console.log(chalk.blue('📚 For more help: doc-translate --help')); }); // Error handling diff --git a/package-lock.json b/package-lock.json index 98c3668..7c111c5 100644 --- a/package-lock.json +++ b/package-lock.json @@ -9,7 +9,7 @@ "version": "1.0.0", "license": "MIT", "dependencies": { - "@google/generative-ai": "^0.24.1", + "@anthropic-ai/sdk": "^0.95.2", "@playcanvas/eslint-config": "2.1.0", "chalk": "^5.4.1", "commander": "^14.0.0", @@ -33,6 +33,36 @@ "node": ">=16.0.0" } }, + "node_modules/@anthropic-ai/sdk": { + "version": "0.95.2", + "resolved": "https://registry.npmjs.org/@anthropic-ai/sdk/-/sdk-0.95.2.tgz", + "integrity": "sha512-Egddwo3sheo1PzUrMkZnH6VkQYwS0h/b/i8vSK8Ta9M45UQipAMeDFH57dYuDAfXMEUUGeKw6CMlremgMZgrSQ==", + "license": "MIT", + "dependencies": { + "json-schema-to-ts": "^3.1.1", + "standardwebhooks": "^1.0.0" + }, + "bin": { + "anthropic-ai-sdk": "bin/cli" + }, + "peerDependencies": { + "zod": "^3.25.0 || ^4.0.0" + }, + "peerDependenciesMeta": { + "zod": { + "optional": true + } + } + }, + "node_modules/@babel/runtime": { + "version": "7.29.2", + "resolved":
"https://registry.npmjs.org/@babel/runtime/-/runtime-7.29.2.tgz", + "integrity": "sha512-JiDShH45zKHWyGe4ZNVRrCjBz8Nh9TMmZG1kh4QTK8hCBTWBi8Da+i7s1fJw7/lYpM4ccepSNfqzZ/QvABBi5g==", + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, "node_modules/@es-joy/jsdoccomment": { "version": "0.50.2", "resolved": "https://registry.npmjs.org/@es-joy/jsdoccomment/-/jsdoccomment-0.50.2.tgz", @@ -180,14 +210,6 @@ "node": "^18.18.0 || ^20.9.0 || >=21.1.0" } }, - "node_modules/@google/generative-ai": { - "version": "0.24.1", - "resolved": "https://registry.npmjs.org/@google/generative-ai/-/generative-ai-0.24.1.tgz", - "integrity": "sha512-MqO+MLfM6kjxcKoy0p1wRzG3b4ZZXtPI+z2IE26UogS2Cm/XHO+7gGRBh6gcJsOiIVoH93UwKvW4HdgiOZCy9Q==", - "engines": { - "node": ">=18.0.0" - } - }, "node_modules/@humanfs/core": { "version": "0.19.1", "resolved": "https://registry.npmjs.org/@humanfs/core/-/core-0.19.1.tgz", @@ -318,6 +340,12 @@ "resolved": "https://registry.npmjs.org/@rtsao/scc/-/scc-1.1.0.tgz", "integrity": "sha512-zt6OdqaDoOnJ1ZYsCYGt9YmWzDXl4vQdKTyJev62gFhRGKdx7mcT54V9KIjg+d2wi9EXsPvAPKe7i7WjfVWB8g==" }, + "node_modules/@stablelib/base64": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/@stablelib/base64/-/base64-1.0.1.tgz", + "integrity": "sha512-1bnPQqSxSuc3Ii6MhBysoWCg58j97aUjuCSZrGSmDxNqtytIi0k8utUenAwTZN4V5mXXYGsVUI9zeBqy+jBOSQ==", + "license": "MIT" + }, "node_modules/@types/debug": { "version": "4.1.12", "resolved": "https://registry.npmjs.org/@types/debug/-/debug-4.1.12.tgz", @@ -1458,6 +1486,12 @@ "resolved": "https://registry.npmjs.org/fast-levenshtein/-/fast-levenshtein-2.0.6.tgz", "integrity": "sha512-DCXu6Ifhqcks7TZKY3Hxp3y6qphY5SJZmrWMDrKcERSOXWQdMhU9Ig/PYrzyw/ul9jOIyh0N4M0tbC5hodg8dw==" }, + "node_modules/fast-sha256": { + "version": "1.3.0", + "resolved": "https://registry.npmjs.org/fast-sha256/-/fast-sha256-1.3.0.tgz", + "integrity": "sha512-n11RGP/lrWEFI/bWdygLxhI+pVeo1ZYIVwvvPkW7azl/rOy+F3HYRZ2K5zeE9mmkhQppyv9sQFx0JM9UabnpPQ==", + "license": 
"Unlicense" + }, "node_modules/fault": { "version": "2.0.1", "resolved": "https://registry.npmjs.org/fault/-/fault-2.0.1.tgz", @@ -2346,6 +2380,19 @@ "resolved": "https://registry.npmjs.org/json-buffer/-/json-buffer-3.0.1.tgz", "integrity": "sha512-4bV5BfR2mqfQTJm+V5tPPdf+ZpuhiIvTuAB5g8kcrXOZpTT/QwwVRWBywX1ozr6lEuPdbHxwaJlm9G6mI2sfSQ==" }, + "node_modules/json-schema-to-ts": { + "version": "3.1.1", + "resolved": "https://registry.npmjs.org/json-schema-to-ts/-/json-schema-to-ts-3.1.1.tgz", + "integrity": "sha512-+DWg8jCJG2TEnpy7kOm/7/AxaYoaRbjVB4LFZLySZlWn8exGs3A4OLJR966cVvU26N7X9TWxl+Jsw7dzAqKT6g==", + "license": "MIT", + "dependencies": { + "@babel/runtime": "^7.18.3", + "ts-algebra": "^2.0.0" + }, + "engines": { + "node": ">=16" + } + }, "node_modules/json-schema-traverse": { "version": "0.4.1", "resolved": "https://registry.npmjs.org/json-schema-traverse/-/json-schema-traverse-0.4.1.tgz", @@ -4016,6 +4063,16 @@ "resolved": "https://registry.npmjs.org/spdx-license-ids/-/spdx-license-ids-3.0.21.tgz", "integrity": "sha512-Bvg/8F5XephndSK3JffaRqdT+gyhfqIPwDHpX80tJrF8QQRYMo8sNMeaZ2Dp5+jhwKnUmIOyFFQfHRkjJm5nXg==" }, + "node_modules/standardwebhooks": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/standardwebhooks/-/standardwebhooks-1.0.0.tgz", + "integrity": "sha512-BbHGOQK9olHPMvQNHWul6MYlrRTAOKn03rOe4A8O3CLWhNf4YHBqq2HJKKC+sfqpxiBY52pNeesD6jIiLDz8jg==", + "license": "MIT", + "dependencies": { + "@stablelib/base64": "^1.0.0", + "fast-sha256": "^1.3.0" + } + }, "node_modules/stdin-discarder": { "version": "0.2.2", "resolved": "https://registry.npmjs.org/stdin-discarder/-/stdin-discarder-0.2.2.tgz", @@ -4245,6 +4302,12 @@ "url": "https://github.com/sponsors/wooorm" } }, + "node_modules/ts-algebra": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/ts-algebra/-/ts-algebra-2.0.0.tgz", + "integrity": "sha512-FPAhNPFMrkwz76P7cdjdmiShwMynZYN6SgOujD1urY4oNm80Ou9oMdmbR45LotcKOXoy7wSmHkRFE6Mxbrhefw==", + "license": "MIT" + }, 
"node_modules/tsconfig-paths": { "version": "3.15.0", "resolved": "https://registry.npmjs.org/tsconfig-paths/-/tsconfig-paths-3.15.0.tgz", diff --git a/package.json b/package.json index 45e3899..e6baca1 100644 --- a/package.json +++ b/package.json @@ -1,11 +1,11 @@ { - "name": "markdown-translator", + "name": "doc-translator", "version": "1.0.0", - "description": "A command line tool to translate markdown and MDX files using Google Gemini AI", + "description": "A command line tool to translate markdown and MDX files using Claude AI", "type": "module", "main": "index.js", "bin": { - "md-translate": "bin/cli.js" + "doc-translate": "bin/cli.js" }, "scripts": { "start": "node bin/cli.js", @@ -19,14 +19,15 @@ "markdown", "docusaurus", "translation", - "gemini", + "claude", + "anthropic", "ai", "cli" ], "author": "Your Name", "license": "MIT", "dependencies": { - "@google/generative-ai": "^0.24.1", + "@anthropic-ai/sdk": "^0.95.2", "@playcanvas/eslint-config": "2.1.0", "chalk": "^5.4.1", "commander": "^14.0.0", diff --git a/src/configs/system_prompt.txt b/src/configs/system_prompt.txt index 9804fe7..d629184 100644 --- a/src/configs/system_prompt.txt +++ b/src/configs/system_prompt.txt @@ -71,9 +71,28 @@ ${never_translate} 3. **PRESERVE MARKDOWN TABLES**: Do not change the structure of Markdown tables. Keep the columns and rows exactly as provided. - Each row must have exactly the same number of `|`-separated cells as the header row. + - Every table row must begin AND end with a `|` character. For example: `| cell1 | cell2 |` — never omit the trailing `|`. - Do NOT insert `|` characters inside cell content — a `|` inside a cell is interpreted as a column separator and will break the table structure. If a cell contains a link followed by more text, keep it all in the same cell: `| text [link](url) more text |`, never `| text |[link](url) more text |`. -4. **PRESERVE DETAILS BLOCKS**: When content appears inside `<details>` blocks, do not add or remove any indentation. Preserve the exact leading whitespace of every line exactly as it appears in the source. For example, if the source has: +4. **PRESERVE ADMONITION BLOCK INDENTATION**: When a `:::note`, `:::tip`, `:::important`, `:::caution`, or `:::warning` block is indented (e.g., because it appears inside a list item), ALL content inside the block — including the closing `:::` — must preserve the same leading indentation as the opening `:::`. For example, if the source has: + ``` + 4. Step four. + + :::note + Some note text. + ::: + ``` + The translation must keep the same indentation for the content and the closing `:::`: + ``` + 4. ステップ4。 + + :::note + 注意事項のテキスト。 + ::: + ``` + Never remove leading spaces from content or the closing `:::` of an indented admonition block. + +5. **PRESERVE DETAILS BLOCKS**: When content appears inside `<details>` blocks, do not add or remove any indentation. Preserve the exact leading whitespace of every line exactly as it appears in the source. For example, if the source has: ```
Title diff --git a/src/translator.js b/src/translator.js index 3a878cf..02c29c8 100644 --- a/src/translator.js +++ b/src/translator.js @@ -1,6 +1,6 @@ import path from 'path'; -import { GoogleGenerativeAI } from '@google/generative-ai'; +import Anthropic from '@anthropic-ai/sdk'; import chalk from 'chalk'; import fs from 'fs-extra'; import { glob } from 'glob'; @@ -8,20 +8,14 @@ import { glob } from 'glob'; class MarkdownTranslator { constructor(apiKey) { if (!apiKey) { - throw new Error('Google Gemini API key is required'); + throw new Error('Anthropic API key is required'); } this.apiKey = apiKey; - this.genAI = new GoogleGenerativeAI(apiKey); + this.client = new Anthropic({ apiKey }); this.neverTranslateTerms = []; - this.modelName = 'gemini-2.5-flash'; - this.model = this.genAI.getGenerativeModel({ - model: this.modelName, - generationConfig: { - temperature: 0 - } - }); + this.modelName = 'claude-sonnet-4-6'; console.log(chalk.gray(`Using model: ${this.modelName} (temperature: 0)`)); @@ -243,14 +237,33 @@ class MarkdownTranslator { } extractChunkMetadata(response) { - const candidate = response?.candidates?.[0]; return { - finishReason: candidate?.finishReason || null, - safetyRatings: candidate?.safetyRatings || null, - promptFeedback: response?.promptFeedback || null, - usageMetadata: response?.usageMetadata || null, - candidates: response?.candidates?.length ?? 
null + finishReason: response?.stop_reason || null, + safetyRatings: null, + promptFeedback: null, + usageMetadata: response?.usage || null, + candidates: null + }; + } + + getResponseText(response) { + return response.content + .filter(block => block.type === 'text') + .map(block => block.text) + .join(''); + } + + async callModel(userPrompt, systemPrompt) { + const params = { + model: this.modelName, + max_tokens: 8096, + temperature: 0, + messages: [{ role: 'user', content: userPrompt }] }; + if (systemPrompt) { + params.system = systemPrompt; + } + return await this.client.messages.create(params); } translateFile() { diff --git a/src/translator_ast_mvp.js b/src/translator_ast_mvp.js index 9dd8dd2..23b0c41 100644 --- a/src/translator_ast_mvp.js +++ b/src/translator_ast_mvp.js @@ -37,12 +37,17 @@ class AstMarkdownTranslator extends MarkdownTranslator { return `__MTX_CODE_${id}__`; } - buildNeverTranslatePlaceholder(id) { - return `__MTX_NEVER_${id}__`; + buildNeverTranslatePlaceholder(term) { + let hash = 0; + for (let i = 0; i < term.length; i++) { + hash = ((hash << 5) - hash) + term.charCodeAt(i); + hash |= 0; + } + return `__MTX_NEVER_${Math.abs(hash).toString(16).padStart(8, '0')}__`; } isFullyProtected(text) { - return typeof text === 'string' && /^(\s*__MTX_NEVER_\d+__\s*)+$/.test(text); + return typeof text === 'string' && /^(\s*__MTX_NEVER_[0-9a-f]+(?:_\d+)?__\s*)+$/.test(text); } isSkippableTextParent(parentType) { @@ -64,6 +69,8 @@ class AstMarkdownTranslator extends MarkdownTranslator { let output = text; const sortedTerms = [...this.neverTranslateTerms].sort((a, b) => b.length - a.length); + const termToPlaceholder = new Map(replacements.map(r => [r.value, r.placeholder])); + const placeholderToTerm = new Map(replacements.map(r => [r.placeholder, r.value])); for (const term of sortedTerms) { if (!term) { @@ -73,16 +80,61 @@ class AstMarkdownTranslator extends MarkdownTranslator { const escapedTerm = this.escapeForRegex(term); const pattern = new
RegExp(escapedTerm, 'g'); - output = output.replace(pattern, () => { - const placeholder = this.buildNeverTranslatePlaceholder(replacements.length + 1); + // Resolve placeholder, detecting hash collisions + let placeholder = termToPlaceholder.get(term); + if (!placeholder) { + placeholder = this.buildNeverTranslatePlaceholder(term); + if (placeholderToTerm.has(placeholder) && placeholderToTerm.get(placeholder) !== term) { + let counter = 2; + const base = placeholder.slice(0, -2); + while (placeholderToTerm.has(`${base}_${counter}__`)) { + counter++; + } + placeholder = `${base}_${counter}__`; + } + } + + // Only register if term actually appears in this text + const replaced = output.replace(pattern, placeholder); + if (replaced === output) { + continue; + } + + if (!termToPlaceholder.has(term)) { + termToPlaceholder.set(term, placeholder); + placeholderToTerm.set(placeholder, term); replacements.push({ placeholder, value: term }); - return placeholder; - }); + } + + output = replaced; } return output; } + validateNeverTranslatePlaceholders(translatedEntries, replacements) { + const validPlaceholders = new Set(replacements.map(r => r.placeholder)); + const pattern = /__MTX_NEVER_\w+__/g; + const warnings = []; + + for (const entry of translatedEntries) { + if (typeof entry.text !== 'string') { + continue; + } + const matches = entry.text.match(pattern); + if (!matches) { + continue; + } + for (const match of matches) { + if (!validPlaceholders.has(match)) { + warnings.push(`entry id ${entry.id}: corrupted placeholder ${match}`); + } + } + } + + return warnings; + } + protectNeverTranslateEntries(entries) { if (!Array.isArray(entries) || entries.length === 0) { return { entries, replacements: [] }; @@ -440,7 +492,7 @@ class AstMarkdownTranslator extends MarkdownTranslator { const systemPrompt = this.renderSystemPrompt(sourceLanguage, targetLanguage); const payload = JSON.stringify(items); - const taskPrompt = + const userPrompt = `Translate each item's text from 
${sourceLanguage} to ${targetLanguage}.\n\n` + 'Response format requirements:\n' + '1) Return ONLY a JSON array.\n' + @@ -449,19 +501,18 @@ class AstMarkdownTranslator extends MarkdownTranslator { '4) Do not add or remove items.\n' + '5) Do not include explanations or markdown code fences.\n' + '6) Tokens matching __MTX_CODE___ are protected placeholders for inline code. Keep them exactly unchanged. Do not translate, split, remove, or rename them.\n' + - '7) Tokens matching __MTX_NEVER___ are protected placeholders for never-translate terms. Keep them exactly unchanged. Do not translate, split, remove, or rename them.\n\n' + + '7) Tokens matching __MTX_NEVER___ (where is an 8-character hexadecimal string like __MTX_NEVER_3fa8c201__) are protected placeholders for never-translate terms. Copy each token character-for-character into your output. Do not alter, simplify, renumber, or replace the hex hash with any other value.\n\n' + `Input JSON:\n${payload}`; - if (systemPrompt) { - return `${systemPrompt}\n\n### TASK ###\n${taskPrompt}`; - } - - return taskPrompt; + return { system: systemPrompt || null, user: userPrompt }; } createAstTranslationRepairPrompt(items, targetLanguage, sourceLanguage, parseErrorMessage) { - const basePrompt = this.createAstTranslationPrompt(items, targetLanguage, sourceLanguage); - return `${basePrompt}\n\nYour previous response could not be parsed as JSON (${parseErrorMessage}). Return STRICT valid JSON only.`; + const { system, user } = this.createAstTranslationPrompt(items, targetLanguage, sourceLanguage); + return { + system, + user: `${user}\n\nYour previous response could not be parsed as JSON (${parseErrorMessage}). 
Return STRICT valid JSON only.` + }; } parseJsonArrayFromModelText(text) { @@ -539,32 +590,32 @@ class AstMarkdownTranslator extends MarkdownTranslator { } async requestParsedAstItems(items, targetLanguage, sourceLanguage) { - const prompt = this.createAstTranslationPrompt(items, targetLanguage, sourceLanguage); - const result = await this.model.generateContent(prompt); - const response = await result.response; + const { system, user } = this.createAstTranslationPrompt(items, targetLanguage, sourceLanguage); + const response = await this.callModel(user, system); const metadata = this.extractChunkMetadata(response); + const text = this.getResponseText(response); try { return { - translatedItems: this.parseJsonArrayFromModelText(response.text()), + translatedItems: this.parseJsonArrayFromModelText(text), metadata, repairMetadata: null, parseWarnings: [] }; } catch (initialParseError) { - const repairPrompt = this.createAstTranslationRepairPrompt( + const { system: repairSystem, user: repairUser } = this.createAstTranslationRepairPrompt( items, targetLanguage, sourceLanguage, initialParseError.message ); - const repairResult = await this.model.generateContent(repairPrompt); - const repairResponse = await repairResult.response; + const repairResponse = await this.callModel(repairUser, repairSystem); const repairMetadata = this.extractChunkMetadata(repairResponse); + const repairText = this.getResponseText(repairResponse); try { return { - translatedItems: this.parseJsonArrayFromModelText(repairResponse.text()), + translatedItems: this.parseJsonArrayFromModelText(repairText), metadata, repairMetadata, parseWarnings: [`initial parse failed: ${initialParseError.message}`] @@ -764,7 +815,7 @@ class AstMarkdownTranslator extends MarkdownTranslator { console.warn(chalk.yellow( `[table] Extra column at line ${i + 1} of ${outputPath} ` + `(expected ${expectedCols}, got ${cells.length}). 
` + - `Review manually: node ~/GitHub/markdown-translator/bin/cli.js translate ` + + `Review manually: node ~/GitHub/doc-translator/bin/cli.js translate ` + `-s en -i ${inputPath} -l ja -o ${outputPath}` )); warned = true; @@ -774,6 +825,64 @@ class AstMarkdownTranslator extends MarkdownTranslator { return warned; } + fixAdmonitionIndentation(content) { + const lines = content.split('\n'); + const result = []; + const admonitionStack = []; + let inCodeBlock = false; + let codeFenceChar = null; + let codeFenceLen = 0; + + for (const line of lines) { + const fenceMatch = line.match(/^\s*([`~]{3,})/); + if (fenceMatch) { + const fenceChar = fenceMatch[1][0]; + const fenceLen = fenceMatch[1].length; + if (!inCodeBlock) { + inCodeBlock = true; + codeFenceChar = fenceChar; + codeFenceLen = fenceLen; + } else if (fenceChar === codeFenceChar && fenceLen >= codeFenceLen) { + inCodeBlock = false; + codeFenceChar = null; + codeFenceLen = 0; + } + result.push(line); + continue; + } + + if (inCodeBlock) { + result.push(line); + continue; + } + + const openMatch = line.match(/^(\s*):::\w/); + const closeMatch = !openMatch && line.match(/^(\s*):::[ \t]*$/); + + if (openMatch) { + admonitionStack.push(openMatch[1].length); + result.push(line); + } else if (closeMatch && admonitionStack.length > 0) { + const expectedIndent = admonitionStack[admonitionStack.length - 1]; + const actualIndent = closeMatch[1].length; + result.push(actualIndent < expectedIndent + ? `${' '.repeat(expectedIndent)}:::` + : line); + admonitionStack.pop(); + } else if (admonitionStack.length > 0 && line.trim() !== '') { + const expectedIndent = admonitionStack[admonitionStack.length - 1]; + const actualIndent = line.length - line.trimStart().length; + result.push(actualIndent < expectedIndent + ? 
`${' '.repeat(expectedIndent)}${line.trimStart()}` + : line); + } else { + result.push(line); + } + } + + return result.join('\n'); + } + restoreInlineCodePlaceholders(content, inlineCodePlaceholders) { let output = content; @@ -990,6 +1099,62 @@ class AstMarkdownTranslator extends MarkdownTranslator { `fallback_chunks=${fallbackChunkCount} fallback_items=${fallbackItemCount}`) ); + const neverTranslateWarnings = this.validateNeverTranslatePlaceholders( + translatedEntries, + neverTranslateReplacements + ); + + if (neverTranslateWarnings.length > 0) { + for (const warning of neverTranslateWarnings) { + console.warn(chalk.yellow(`[never-translate] ${warning}`)); + } + + const corruptedIds = new Set( + neverTranslateWarnings.map(w => parseInt(w.match(/entry id (\d+)/)?.[1], 10)).filter(Boolean) + ); + const corruptedEntries = translatedEntries.filter(e => corruptedIds.has(e.id)); + + if (corruptedEntries.length > 0) { + console.log(chalk.yellow(`[never-translate] retrying ${corruptedEntries.length} entr${corruptedEntries.length === 1 ? 'y' : 'ies'} with corrupted placeholders`)); + const validPlaceholders = neverTranslateReplacements.map(r => r.placeholder).join(', '); + const { system } = this.createAstTranslationPrompt([], targetLanguage, sourceLanguage); + const retryUserPrompt = + `Translate each item's text from ${sourceLanguage} to ${targetLanguage}.\n\n` + + 'CRITICAL: The following tokens are protected placeholders that must be copied exactly as-is into your output:\n' + + `${validPlaceholders}\n\n` + + 'Do not alter, simplify, renumber, or substitute these tokens in any way.\n\n' + + 'Response format: Return ONLY a JSON array. Keep each id exactly as-is. Do not add or remove items. 
Do not include explanations or markdown code fences.\n\n' + + `Input JSON:\n${JSON.stringify(corruptedEntries)}`; + + const retryResponse = await this.callModel(retryUserPrompt, system); + const retryText = this.getResponseText(retryResponse); + + try { + const retryTranslatedItems = this.parseJsonArrayFromModelText(retryText); + const retryMerged = this.mergeAstTranslationItems(corruptedEntries, retryTranslatedItems); + const retryWarnings = this.validateNeverTranslatePlaceholders(retryMerged.merged, neverTranslateReplacements); + + if (retryWarnings.length < neverTranslateWarnings.length) { + const retryById = new Map(retryMerged.merged.map(e => [e.id, e.text])); + for (let i = 0; i < translatedEntries.length; i++) { + if (retryById.has(translatedEntries[i].id)) { + translatedEntries[i] = { ...translatedEntries[i], text: retryById.get(translatedEntries[i].id) }; + } + } + const resolved = neverTranslateWarnings.length - retryWarnings.length; + console.log(chalk.green(`[never-translate] retry resolved ${resolved}/${neverTranslateWarnings.length} corrupted placeholder${resolved === 1 ? 
'' : 's'}`)); + for (const w of retryWarnings) { + console.warn(chalk.yellow(`[never-translate] still corrupted after retry: ${w}`)); + } + } else { + console.warn(chalk.yellow('[never-translate] retry did not improve placeholder fidelity, keeping original')); + } + } catch { + console.warn(chalk.yellow('[never-translate] retry response could not be parsed, keeping original')); + } + } + } + const restoredNeverTranslateEntries = this.restoreNeverTranslateEntries( translatedEntries, neverTranslateReplacements @@ -997,6 +1162,7 @@ class AstMarkdownTranslator extends MarkdownTranslator { let translatedContent = this.restoreTranslatedContent(skeleton, restoredNeverTranslateEntries); translatedContent = this.restoreInlineCodePlaceholders(translatedContent, inlineCodePlaceholders); + translatedContent = this.fixAdmonitionIndentation(translatedContent); if (this.isEnglishTarget(targetLanguage)) { return this.normalizeEnglishInlineCodeSpacing(translatedContent); }
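The never-translate change above stops numbering placeholders sequentially and derives them from the term instead. When two different terms produce the same token, it appends `_2`, `_3`, and so on before the trailing `__`. Afterwards it flags any `__MTX_NEVER_..__` token in the model output that was never issued. The standalone sketch below follows that flow so it can be run without the translator class. `fnv1a32Hex`, `buildPlaceholder`, `protectTerms`, and `findCorruptedPlaceholders` are hypothetical names. 32-bit FNV-1a is only an assumed stand-in for whatever hash `buildNeverTranslatePlaceholder` actually computes; the diff shows only that the result is 8 hex characters.

```javascript
// Standalone sketch of hash-based never-translate placeholders.
// Assumption: 32-bit FNV-1a stands in for the real hash, which the diff does not show.
function fnv1a32Hex(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i); // hashes UTF-16 code units, not UTF-8 bytes
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned 32-bit
  }
  return hash.toString(16).padStart(8, '0');
}

// Hypothetical helper: wrap the 8-hex-char hash in the placeholder token shape.
function buildPlaceholder(term) {
  return `__MTX_NEVER_${fnv1a32Hex(term)}__`;
}

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace each term with its placeholder. If two different terms map to the
// same token, append _2, _3, ... before the trailing "__" so every placeholder
// maps back to exactly one term. Terms absent from the text are not registered.
function protectTerms(text, terms, makePlaceholder = buildPlaceholder) {
  const termToPlaceholder = new Map();
  const placeholderToTerm = new Map();
  const replacements = [];
  let output = text;

  for (const term of terms) {
    const pattern = new RegExp(escapeRegExp(term), 'g');
    let placeholder = termToPlaceholder.get(term);
    if (!placeholder) {
      placeholder = makePlaceholder(term);
      if (placeholderToTerm.has(placeholder) && placeholderToTerm.get(placeholder) !== term) {
        const base = placeholder.slice(0, -2); // drop the trailing "__"
        let counter = 2;
        while (placeholderToTerm.has(`${base}_${counter}__`)) counter++;
        placeholder = `${base}_${counter}__`;
      }
    }

    // A replacer function keeps "$" sequences from being interpreted.
    const replaced = output.replace(pattern, () => placeholder);
    if (replaced === output) continue; // term not present in this text

    if (!termToPlaceholder.has(term)) {
      termToPlaceholder.set(term, placeholder);
      placeholderToTerm.set(placeholder, term);
      replacements.push({ placeholder, value: term });
    }
    output = replaced;
  }
  return { output, replacements };
}

// Return every placeholder-shaped token in the translation that was never issued.
function findCorruptedPlaceholders(translated, replacements) {
  const valid = new Set(replacements.map(r => r.placeholder));
  const found = translated.match(/__MTX_NEVER_\w+__/g) || [];
  return found.filter(p => !valid.has(p));
}
```

Deriving the token from the term, rather than from a running counter, means a model that renumbers or "simplifies" a token produces something outside the issued set, and the validator can report it.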
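`fixAdmonitionIndentation` handles admonitions whose body or closing `:::` comes back less indented than the opening `:::type` line. It raises those lines to the opener's indentation and leaves anything inside fenced code untouched. The condensed restatement below keeps that behavior so it can be tried in isolation. The test input, a note nested under a numbered list item, is an assumed case and not one taken from the project.

```javascript
// Condensed restatement of fixAdmonitionIndentation from the diff above.
function fixAdmonitionIndentation(content) {
  const out = [];
  const stack = []; // indentation width of each open ":::type" line
  let fence = null; // { ch, len } while inside a fenced code block

  for (const line of content.split('\n')) {
    const f = line.match(/^\s*([`~]{3,})/);
    if (f) {
      const ch = f[1][0];
      const len = f[1].length;
      if (!fence) {
        fence = { ch, len }; // opening fence
      } else if (ch === fence.ch && len >= fence.len) {
        fence = null; // matching closing fence
      }
      out.push(line); // fence lines are never re-indented
      continue;
    }
    if (fence) {
      out.push(line); // leave code content untouched
      continue;
    }

    const open = line.match(/^(\s*):::\w/);
    const close = !open && line.match(/^(\s*):::[ \t]*$/);
    const expected = stack.length > 0 ? stack[stack.length - 1] : 0;
    const actual = line.length - line.trimStart().length;

    if (open) {
      stack.push(open[1].length);
      out.push(line);
    } else if (close && stack.length > 0) {
      out.push(actual < expected ? `${' '.repeat(expected)}:::` : line);
      stack.pop();
    } else if (stack.length > 0 && line.trim() !== '' && actual < expected) {
      out.push(' '.repeat(expected) + line.trimStart()); // raise body to the opener's indent
    } else {
      out.push(line);
    }
  }
  return out.join('\n');
}
```

For example, `'   :::note\nTranslated note text\n:::'` becomes `'   :::note\n   Translated note text\n   :::'`. As in the diff, fence lines themselves pass through unchanged. An under-indented code block inside an admonition is therefore left as it is, and a `:::` inside that code block does not close the admonition.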
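The retry path works in three steps. It parses the entry ids back out of the warning strings, re-translates only those entries, and keeps the result only when it yields fewer corrupted placeholders. A minimal sketch of that gate follows, assuming entries shaped `{ id: number, text: string }`. `idsFromWarnings`, `applyRetryIfBetter`, and the `retranslate` and `validate` callbacks are hypothetical; the callbacks stand in for the model call and `validateNeverTranslatePlaceholders`. The sketch filters parsed ids with `Number.isInteger` rather than `Boolean`, because `filter(Boolean)` would also discard an id of `0` if entry ids start at zero.

```javascript
// Sketch of the retry-if-better gate for corrupted never-translate placeholders.
// Assumption: warnings look like "entry id <n>: corrupted placeholder <token>".
function idsFromWarnings(warnings) {
  const ids = warnings
    .map(w => parseInt(w.match(/entry id (\d+)/)?.[1], 10))
    .filter(Number.isInteger); // keeps id 0, drops NaN from non-matching warnings
  return new Set(ids);
}

// retranslate(subset) -> entries; validate(entries) -> warning strings (both hypothetical).
function applyRetryIfBetter(entries, warnings, retranslate, validate) {
  const ids = idsFromWarnings(warnings);
  const subset = entries.filter(e => ids.has(e.id));
  if (subset.length === 0) return { entries, improved: false };

  const retried = retranslate(subset);
  const retryWarnings = validate(retried);
  if (retryWarnings.length >= warnings.length) {
    return { entries, improved: false }; // no gain: keep the original translation
  }

  // Splice retried texts back in by id, leaving all other entries untouched.
  const byId = new Map(retried.map(e => [e.id, e.text]));
  const merged = entries.map(e => (byId.has(e.id) ? { ...e, text: byId.get(e.id) } : e));
  return { entries: merged, improved: true };
}
```

The diff makes the same choice. It compares warning counts instead of requiring zero warnings, so it accepts a partial improvement and logs whatever is still corrupted.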