
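If you enjoy learning languages (or teaching them), you have probably come across parallel reading as a learning method. It immerses you in context, builds vocabulary, and makes learning enjoyable. In my view, reading an original text side by side with its Russian translation is worthwhile once you have the basics of grammar and phonetics in place; textbooks and teachers are still needed for those. When it comes to reading, though, you want to pick something to your own taste, or something you already know and love. That is often impossible, because nobody has published a parallel edition of that particular book. And if you are learning not English but, say, Japanese or Hungarian, finding interesting material with a parallel translation is harder still.
Today we take a decisive step toward fixing this situation.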
The input is two texts (the book in Russian and the book in English):
TO KILL A MOCKINGBIRD
by Harper Lee

DEDICATION
for Mr. Lee and Alice
in consideration of Love & Affection

Lawyers, I suppose, were children once.
Charles Lamb

PART ONE
1

When he was nearly thirteen, my brother Jem got his arm badly broken at the elbow. When it healed, and Jem’s fears of never being able to play football were assuaged, he was seldom self-conscious about his injury. His left arm was somewhat shorter than his right; when he stood or walked, the back of his hand was at right angles to his body, his thumb parallel to his thigh. He couldn’t have cared less, so long as he could pass and punt.
[The same passage in Russian]
The alignment is done with lingtrain-aligner, a Python library. It relies on multilingual sentence-embedding models, the smaller of which covers 50+ languages.
Russian text is split into sentences with razdel.
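The markup uses the following marks:

| Mark | Meaning |
| --- | --- |
| %%%%%title. | Book title |
| %%%%%author. | Author |
| %%%%%h1. %%%%%h2. %%%%%h3. %%%%%h4. %%%%%h5. | Headings, levels 1 to 5 |
| %%%%%divider. | Divider |
| %%%%%. | End of paragraph |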
: [.,:,!?] , .
- Headings have five levels (H1 is the largest, H5 the smallest).
After markup, the beginning of each text looks like this:
TO KILL A MOCKINGBIRD%%%%%title.
by Harper Lee%%%%%author.
%%%%%divider.
PART ONE%%%%%h1.
1%%%%%h2.
When he was nearly thirteen, my brother Jem got his arm badly broken at the elbow. When it healed, and Jem’s fears of never being able to play football were assuaged, he was seldom self-conscious about his injury. His left arm was somewhat shorter than his right; when he stood or walked, the back of his hand was at right angles to his body, his thumb parallel to his thigh. He couldn’t have cared less, so long as he could pass and punt.
...
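To make the mark syntax concrete, here is a minimal sketch that separates a mark from the text of a line. It is not part of lingtrain-aligner: the function name `parse_marked_line` and the regular expression are illustrative assumptions based only on the marks listed in the table.

```python
import re

# A mark looks like "%%%%%name." at the end of a line.
# The bare paragraph mark "%%%%%." has an empty name.
MARK_RE = re.compile(r"%%%%%(\w*)\.\s*$")

def parse_marked_line(line):
    """Split a line into (text, mark); mark is None when the line has no mark."""
    match = MARK_RE.search(line)
    if match is None:
        return line.strip(), None
    return line[:match.start()].strip(), match.group(1)

sample = [
    "TO KILL A MOCKINGBIRD%%%%%title.",
    "by Harper Lee%%%%%author.",
    "%%%%%divider.",
    "PART ONE%%%%%h1.",
    "1%%%%%h2.",
    "When he was nearly thirteen, my brother Jem got his arm badly broken at the elbow.",
]

parsed = [parse_marked_line(line) for line in sample]
for text, mark in parsed:
    print(f"{mark or 'text':8} | {text}")
```

A line without a mark comes back with `None`, so ordinary sentences and structural elements are easy to tell apart.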
%%%%%author.
%%%%%title.
%%%%%divider.
%%%%%h1.
1%%%%%h2.
...

Structural headings ("Part One", chapter numbers, and so on) are marked with h1, h2.
Colab
All of the steps below can be run in a Colab notebook. The end result is an html file.
Install the library:
pip install lingtrain-aligner
Import the modules:
from lingtrain_aligner import preprocessor, splitter, aligner, resolver, reader, vis_helper
Read the two input texts:
text1_input = "harper_lee_ru.txt"
text2_input = "harper_lee_en.txt"

with open(text1_input, "r", encoding="utf8") as input1:
    text1 = input1.readlines()

with open(text2_input, "r", encoding="utf8") as input2:
    text2 = input2.readlines()
The alignment is stored in an SQLite database. Set the path to it and the lang_from and lang_to language codes, then choose a model:
db_path = "db/book.db"
lang_from = "ru"
lang_to = "en"

models = ["sentence_transformer_multilingual", "sentence_transformer_multilingual_labse"]
model_name = models[0]
The supported language codes can be listed with:
splitter.get_supported_languages()
If your language is not in the list, use the code xx. The sentence_transformer_multilingual model supports 50+ languages, and sentence_transformer_multilingual_labse supports 100+.
Mark the paragraph boundaries:
text1_prepared = preprocessor.mark_paragraphs(text1)
text2_prepared = preprocessor.mark_paragraphs(text2)
Split the texts into sentences and fill the database:
splitted_from = splitter.split_by_sentences_wrapper(text1_prepared, lang_from, leave_marks=True)
splitted_to = splitter.split_by_sentences_wrapper(text2_prepared, lang_to, leave_marks=True)
aligner.fill_db(db_path, splitted_from, splitted_to)
Now align the texts. The work is done in batches, controlled by the batch_size and window parameters.
batch_ids = [0, 1, 2, 3]

aligner.align_db(db_path,
                 model_name,
                 batch_size=100,
                 window=30,
                 batch_ids=batch_ids,
                 save_pic=False,
                 embed_batch_size=50,
                 normalize_embeddings=True,
                 show_progress_bar=True)
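One way to picture batch_size and window is as index ranges. The sketch below is an assumption about what the two parameters mean, not the library's code: each batch takes batch_size lines of the first text and compares them with the same range of the second text widened by window lines on each side. The function `batch_ranges` and the text lengths are hypothetical.

```python
def batch_ranges(n_from, n_to, batch_size, window):
    """Return (from_range, to_range) index pairs, one per batch.

    Assumption: the target range is the source range widened by `window`
    lines on both sides and clipped to the length of the target text.
    """
    ranges = []
    for start in range(0, n_from, batch_size):
        end = min(start + batch_size, n_from)
        to_start = max(start - window, 0)
        to_end = min(end + window, n_to)
        ranges.append(((start, end), (to_start, to_end)))
    return ranges

# Hypothetical text lengths: 400 lines on one side, 420 on the other.
ranges = batch_ranges(n_from=400, n_to=420, batch_size=100, window=30)
for (f0, f1), (t0, t1) in ranges:
    print(f"from [{f0}, {f1}) is compared with to [{t0}, {t1})")
```

Under this reading, a larger window tolerates texts that drift further apart in line count, at the cost of more comparisons per batch.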
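Done! The alignment can now be visualized with vis_helper. The four batches of 100 give 400 aligned lines, so batch_size=400 draws them all in a single picture; a smaller value, such as batch_size=50, splits the plot into several pictures.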
vis_helper.visualize_alignment_by_db(db_path,
                                     output_path="alignment_vis.png",
                                     lang_name_from=lang_from,
                                     lang_name_to=lang_to,
                                     batch_size=400,
                                     size=(800, 800),
                                     plt_show=True)

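Ideally the alignment forms one unbroken chain. A place where the chain breaks, for example between lines 10,11,12 and 15,16,17, is a conflict. Conflicts are handled by the resolver module.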
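The idea of a break in the chain can be sketched in a few lines. This is an illustration only, not how resolver finds conflicts internally: the function `find_breaks` and the target line ids in the example are made up.

```python
def find_breaks(pairs):
    """Return the places where an alignment chain stops being continuous.

    `pairs` is a list of (from_id, to_id) tuples sorted by from_id. A break is
    reported as (last pair before the gap, first pair after the gap).
    """
    breaks = []
    for prev, curr in zip(pairs, pairs[1:]):
        step_from = curr[0] - prev[0]
        step_to = curr[1] - prev[1]
        if step_from != 1 or step_to != 1:
            breaks.append((prev, curr))
    return breaks

# Lines 10, 11, 12 and 15, 16, 17 are aligned; 13 and 14 are not.
chain = [(10, 10), (11, 11), (12, 12), (15, 14), (16, 15), (17, 16)]
breaks = find_breaks(chain)
print(breaks)
```

Everything between the two ends of a reported break (here lines 13 and 14 of the first text against line 13 of the second) is what has to be resolved.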
Collect the conflicts:
conflicts_to_solve, rest = resolver.get_all_conflicts(db_path, min_chain_length=2, max_conflicts_len=6)
conflicts to solve: 46
total conflicts: 47
conflicts_to_solve holds the conflicts that fit the given parameters, and rest holds the remaining ones.
Their statistics by size:
resolver.get_statistics(conflicts_to_solve)
resolver.get_statistics(rest)
('2:3', 11)
('3:2', 10)
('3:3', 8)
('2:1', 5)
('4:3', 3)
('3:5', 2)
('6:4', 2)
('5:4', 1)
('5:3', 1)
('2:4', 1)
('5:6', 1)
('4:5', 1)
('8:7', 1)
As you can see, most of the conflicts are of size 2:3 or 3:2.
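Statistics of this shape are easy to reproduce by hand. The sketch below counts conflicts by size with collections.Counter; the dictionary layout of a conflict and the line ids are hypothetical and are not the structure returned by get_all_conflicts.

```python
from collections import Counter

def conflict_shape(conflict):
    """Describe a conflict as 'lines_from:lines_to'."""
    return f"{len(conflict['from'])}:{len(conflict['to'])}"

# Hypothetical conflicts: each one lists the line ids involved on both sides.
conflicts = [
    {"from": [124, 125, 126], "to": [122, 123]},
    {"from": [200, 201], "to": [198, 199, 200]},
    {"from": [310, 311, 312], "to": [305, 306]},
]

# most_common() sorts the shapes from the most frequent to the least frequent.
stats = Counter(conflict_shape(c) for c in conflicts).most_common()
print(stats)
```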
Look at one of the conflicts:
resolver.show_conflict(db_path, conflicts_to_solve[10])
124
125
126
122 The Radley Place jutted into a sharp curve beyond our house.
123 Walking south, one faced its porch; the sidewalk turned and ran beside the lot.
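Here lines 125 and 126 belong together, so the right pairing is [124]-[122] and [125,126]-[123]. How can this be decided automatically? The possible variants are: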
- [124,125]-[122] // [126]-[123]
- [124]-[122] // [125,126]-[123]
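The two variants above are exactly the ways to merge neighbouring lines on the longer side so that it matches the shorter side one to one. The sketch below only enumerates them; the function names are illustrative. Since resolve_all_conflicts takes a model name, the variants are presumably scored with sentence embeddings, but that scoring is not shown here.

```python
from itertools import combinations

def contiguous_splits(items, k):
    """All ways to cut `items` into k non-empty contiguous groups."""
    splits = []
    for cuts in combinations(range(1, len(items)), k - 1):
        bounds = (0, *cuts, len(items))
        splits.append([items[a:b] for a, b in zip(bounds, bounds[1:])])
    return splits

def merge_variants(lines_from, lines_to):
    """Variants that merge lines on the longer side to match the shorter side."""
    if len(lines_from) >= len(lines_to):
        return [list(zip(groups, [[t] for t in lines_to]))
                for groups in contiguous_splits(lines_from, len(lines_to))]
    return [list(zip([[f] for f in lines_from], groups))
            for groups in contiguous_splits(lines_to, len(lines_from))]

variants = merge_variants([124, 125, 126], [122, 123])
for variant in variants:
    print(" // ".join(f"{f}-{t}" for f, t in variant))
```

The number of variants grows quickly with the size of the conflict, which is one reason to cap the conflict length.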
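Conflicts are selected with two parameters: a minimum chain length of 2 (min_chain_length) and a maximum conflict length of 6 (max_conflicts_len).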
Resolve the conflicts in several passes, raising the limits on each pass:
steps = 3
batch_id = -1  # -1: process all batches

for i in range(steps):
    # Each pass raises both limits: min_chain_length 2, 3, 4 and max_conflicts_len 6, 12, 18.
    conflicts, rest = resolver.get_all_conflicts(db_path, min_chain_length=2+i, max_conflicts_len=6*(i+1), batch_id=batch_id)
    resolver.resolve_all_conflicts(db_path, conflicts, model_name, show_logs=False)
    vis_helper.visualize_alignment_by_db(db_path, output_path="img_test1.png", batch_size=400, size=(800,800), plt_show=True)
    if len(rest) == 0:
        break
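For reference, the limits used on each pass of the loop follow directly from the expressions 2+i and 6*(i+1). The short sketch below just evaluates them; the name `schedule` is illustrative.

```python
steps = 3

# (min_chain_length, max_conflicts_len) for each pass of the loop.
schedule = [(2 + i, 6 * (i + 1)) for i in range(steps)]

for step, (min_chain_length, max_conflicts_len) in enumerate(schedule, start=1):
    print(f"pass {step}: min_chain_length={min_chain_length}, "
          f"max_conflicts_len={max_conflicts_len}")
```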
The result is stored in book.db.
If conflicts remain at the very beginning or end of the text, they can be fixed separately:
resolver.fix_start(db_path, model_name, max_conflicts_len=20)
resolver.fix_end(db_path, model_name, max_conflicts_len=20)
The book itself is built with the reader module.
from lingtrain_aligner import reader
First, fetch the aligned paragraphs:
paragraphs_from, paragraphs_to, meta = reader.get_paragraphs(db_path, direction="from")
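The direction parameter, one of ["from", "to"], selects which text's paragraph breaks are used, since the paragraph structure may differ between the two texts.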
Then build the book with create_book():
reader.create_book(paragraphs_from, paragraphs_to, meta, output_path="lingtrain.html")
The book is an ordinary html file that can, for example, be converted to pdf.
The look of the book can be changed with the template parameter.
reader.create_book(paragraphs_from, paragraphs_to, meta, output_path="lingtrain.html", template="pastel_fill")

reader.create_book(paragraphs_from, paragraphs_to, meta, output_path="lingtrain.html", template="pastel_start")

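With template="custom" you can pass your own styles in the styles parameter. Each style is a set of CSS properties written as a JSON string.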
For example:
my_style = [
    '{}',
    '{"background": "#fafad2"}',
]

reader.create_book(paragraphs_from, paragraphs_to, meta, output_path="lingtrain.html", template="custom", styles=my_style)

Any CSS that applies to a span can be used:
my_style = [
    '{"background": "linear-gradient(90deg, #FDEB71 0px, #fff 150px)", "border-radius": "15px"}',
    '{"background": "linear-gradient(90deg, #ABDCFF 0px, #fff 150px)", "border-radius": "15px"}',
    '{"background": "linear-gradient(90deg, #FEB692 0px, #fff 150px)", "border-radius": "15px"}',
    '{"background": "linear-gradient(90deg, #CE9FFC 0px, #fff 150px)", "border-radius": "15px"}',
    '{"background": "linear-gradient(90deg, #81FBB8 0px, #fff 150px)", "border-radius": "15px"}'
]

reader.create_book(paragraphs_from, paragraphs_to, meta, output_path="lingtrain.html", template="custom", styles=my_style)
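To show how a list of JSON style strings could end up on the page, here is a small sketch. It assumes the styles are applied to sentences in a cycle, which is a guess based on the two-element example and not a documented guarantee. The functions `style_attr` and `render_sentences` are illustrative and are not part of the reader module.

```python
import html
import json

def style_attr(style_json):
    """Turn a JSON object of CSS properties into an inline style string."""
    props = json.loads(style_json)
    return "; ".join(f"{name}: {value}" for name, value in props.items())

def render_sentences(sentences, styles):
    """Wrap each sentence in a span, cycling through `styles` (assumed behaviour)."""
    spans = []
    for i, sentence in enumerate(sentences):
        css = style_attr(styles[i % len(styles)])
        attr = f' style="{html.escape(css)}"' if css else ""
        spans.append(f"<span{attr}>{html.escape(sentence)}</span>")
    return spans

my_style = ['{}', '{"background": "#fafad2"}']
spans = render_sentences(["First.", "Second.", "Third."], my_style)
print("\n".join(spans))
```

With the two-element list, every second sentence gets the background colour and the others stay plain.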

[2] Google Colab.
[3] Sentence Transformers.
[4] Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation.