Friday, December 31, 2021

Easy books to learn Russian - 1: Children’s Literature

Introduction

I am a Russian language learner. This year, which ends today, I started reading Russian literature. Russian literature is famous and fascinating, but much of it is far too difficult for beginners.

Searching for easy literature, I found that many sites list the classics: Chekhov, Tolstoy, Pushkin. Even though these authors did write some easier works, or works for children, the language is old, it contains many words that are no longer used, and it is often still hard...

What I liked was this advice here on Quora: "However, for a simple, yet good language I would advice you to read children literature of Soviet era (don't worry there is no brainwashing propaganda to turn you into Commie)."

As a sort of poll, I asked many Russian colleagues, teachers and acquaintances for their favourite children’s books. Surprisingly, they often named translations. For example: Pinocchio, The Three Musketeers, Tarzan. Nice, probably, but not my target.

My Russian colleagues also advised me to read “Dunno on the Moon”. I tried, around two years ago, but back then it was still too difficult. But this year I read it! And I’ve read more books, a bit more than ten. This blog post contains my top list.

Soviet Children’s Literature

So I started reading Soviet Children's literature and I liked it! Some books are a bit harder than others, and sometimes I needed translations, but in the end I managed to read and understand these books and to enjoy them! Below is the report.

Apart from Soviet writers, I also read some translations which are well known in Russia. For example: Karlsson is originally Swedish, but there are two popular Soviet animated films. Most Russians have watched them and know the story.

This link gives some of the Soviet books I read, and some others. I only encountered it recently; it was not my guideline.

Readability Index

I was looking for a tool to measure the level of a book, and found some measures such as the Automated Readability Index. It’s a cool index which can be applied to any text - without even knowing in which language it is written.
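For reference, the Automated Readability Index needs only three counts: characters, words and sentences. The sketch below is a minimal illustration of the published formula. The function name is my own choice, and how the counts are obtained (for example counting letters rather than bytes in UTF-8 Cyrillic text) is an assumption left to the caller.

```cpp
// Automated Readability Index (ARI):
//   4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
// "characters" means letters and digits, not bytes. For UTF-8 Russian text
// the caller has to count code points (assumption: the counts are prepared elsewhere).
constexpr double automated_readability_index(double characters,
                                             double words,
                                             double sentences)
{
  return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43;
}
```

A text with 100 characters, 20 words and 2 sentences scores about 7.12. Only word length and sentence length go into the score, which is why it works without knowing the language.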

This Readability Index (and its variants) gave surprising results, such as the claim that “Dunno on the Moon” is harder to read than Dostoevsky's Crime and Punishment! Therefore (and because I’m a programmer) I made a variant which does not have that problem. I’ll describe it later in another blog post. I called it "Text Readability Index".

Ranked list of recommended books

Based on these figures, and on my own experience, I present here a list of recommendations, in order:

Title | Author | Page count | % of top 2500 words | Text Readability Index | Automated Readability Index
--- | --- | --- | --- | --- | ---
Uncle Fedya, His Dog, and His Cat | Eduard Uspensky | 65 | 74 % | 2.70 | 3.27
Vitya Maleev at School and at Home | Nikolay Nosov | 160 | 78 % | 6.46 | 5.28
The Adventures of Dunno and his Friends | Nikolay Nosov | 132 | 70 % | 5.61 | 7.12
The Drummer's Fate | Arkady Gaidar | 108 | 73 % | 6.28 | 5.72
Charlie and the Chocolate Factory | Roald Dahl | 87 | 69 % | 6.66 | 7.60
Karlsson on the Roof | Astrid Lindgren | 89 | 72 % | 6.59 | 6.62
Dunno in Sun City | Nikolay Nosov | 255 | 71 % | 9.01 | 9.13
Chuk and Gek | Arkady Gaidar | 31 | 73 % | 8.17 | 6.45
Harry Potter and the Philosopher's Stone | J. K. Rowling | 332 | 70 % | 9.11 | 8.30
Dunno on the Moon | Nikolay Nosov | 447 | 69 % | 10.28 | 10.08

This list is ordered by the ratings and by my own feeling. I will not describe all the books in detail: links are provided above, and there are many reviews of these books available.

What I can tell is that I've become a fan of Nosov: all four of his books are great! Dunno on the Moon especially is an epic work, and it was never translated into Dutch. What a miss... Dunno on the Moon was considerably harder to read than Dunno in Sun City, which is also a great book. Self-driving cars, robot vacuum cleaners: it describes today's world, but it was written in 1958.

The book by Uspensky is also a must-read. It is well known in Russia: one of my colleagues could still recite some quotes by heart (mainly from the cat Matroskin).

Gaidar is sometimes a bit harder to read, but I loved the poetic story of Chuk and Gek (there is a useful English translation too, which can be downloaded from archive.org). The Drummer's Fate was easier to read, cool story! There is a Soviet film based on this book, with the same title.

Of the translated works, Karlsson is also cool. It is a trilogy and I read the other two books too, but I think the first one is by far the best. As said above, there is a famous Soviet animated film. A must-see.

Charlie and the Chocolate Factory is a book I had never read before and I wanted to read it, so why not in Russian... And it was worth it!

And (of course) there is Harry Potter. I actually started with that one at the beginning of the year, not really knowing Nosov yet, and it was still quite hard. But now, at the end of the year, I can read it more easily. Nice proof of progress!

Wednesday, November 10, 2021

C++20 concepts and traits

Intro

I like writing libraries. In 2007 I refactored the heart of our library using types into a library using templates. This looked a bit like:

struct point { double x, y; };

and then some functions using these types (there was also a polygon, etc).

When I was satisfied with the results I submitted this as a preview to Boost. The general reaction was: interesting, but it is not concept based, and therefore not generic enough.

So I had to make the types concept based, but I didn't know what that meant. And I found out that concepts were not yet supported by the language, so how could I use them... That all became clear, with the help of reviewers and joiners. The library was refactored again into a concept-based library, it was accepted, and it is now known as Boost.Geometry.

Now (14 years later) the C++ language has support for concepts.

This blog post focuses on the approach Boost.Geometry uses. It uses traits to adapt normal structs or classes to concepts. Functions in the library use those concepts.

The example in this blog

We start again with a point, and a function calculating the distance:

#include <cmath> // for std::sqrt

struct point { double x, y; };
template <typename Point>
double distance_using_point(const Point& p1, const Point& p2)
{
  auto sqr = [](const double v) -> double { return v * v; };
  return std::sqrt(sqr(p1.x - p2.x) + sqr(p1.y - p2.y));
}


This function is a template, but it is not generic: it only works for point types that have .x and .y member variables. But suppose x is not a member variable but a member function .x(). Or suppose you are using a std::array with two elements, [0] and [1], denoting x and y.

Traits as free functions

We will use traits for that. There are more ways to define and use traits. A first, concise way (not recommended! see below for a better version) is to just define free templated functions:
 
namespace point_traits
{
  template <typename P> double get_x(const P& p);
  template <typename P> double get_y(const P& p);
}

and these functions can then be specialized for the type:
 
namespace point_traits
{
  template<> inline double get_x<point>(const point& p) { return p.x; }
  template<> inline double get_y<point>(const point& p) { return p.y; }
}

And this works! Suppose you are to write a distance function, using these traits, it could look like this:

template <typename P1, typename P2>
double
calculate_distance(const P1& p1, const P2& p2)
{
  auto sqr = [](const double x) -> double { return x * x; };
  return std::sqrt(sqr(point_traits::get_x<P1>(p1) - point_traits::get_x<P2>(p2))
                 + sqr(point_traits::get_y<P1>(p1) - point_traits::get_y<P2>(p2)));
}

If you use a std::array, then you can specialize it too:

namespace point_traits
{
  template<> inline double get_x<std::array<double, 2>>(const std::array<double, 2>& p) { return p[0]; }
  template<> inline double get_y<std::array<double, 2>>(const std::array<double, 2>& p) { return p[1]; }
}
 

And you can calculate the distance of two different types, for example:

  point mp{1, 1};
  std::array<double, 2> sa{2, 2};
  std::cout << calculate_distance(mp, sa) << std::endl;


It writes: 1.41421 as expected.

 

 

Evaluation of traits-as-free-functions 


These traits already give the example a concept-based look. And it works, but there are some problems:

  • function templates cannot be partially specialized, which can be inconvenient
  • suppose we want floats too, and long doubles, and we want to have that coordinate type defined in a meta-function. Where to define it?
  • suppose we forget a traits adaptation: no compiler error appears, because the compiler sees the declaration of the generic function template and does not complain. Instead we get a linker error.
  • it is (therefore) not suitable for C++20 concepts

The linker error, in case the specialization for point is forgotten or not right, will look like:

undefined reference to `double point_traits::get_x<point>(point const&) 
undefined reference to `double point_traits::get_y<point>(point const&) 
 
The linker just can't find these definitions. Compilation was fine. You can live with this, but there is a better alternative.

Concepts 

Now we will try to make a C++20 concept using these traits:

template <typename P> 
concept IsPoint = requires (P p) 
{ 
  point_traits::get_x<P>(p);
  point_traits::get_y<P>(p);
};

template <typename P1, typename P2>
requires IsPoint<P1> && IsPoint<P2>
double calculate_distance(const P1& p1, const P2& p2)
{
  // the same implementation
}


So we added:
  1. a new concept IsPoint. That concept tries to use the functions in namespace point_traits. If they are not there, the type P does not fulfill the concept.
  2. a requires clause decorating the distance function, stating that both point types should fulfill the IsPoint concept.

How cool is that!

However, even using this concept, there is the same linker error message. There is no neat compiler message about concepts, which we wished and maybe expected...

 

Traits as a struct 

To fix this, the traits can be modeled as structures. So we adapt the code and introduce a structure. Here it replaces the namespace, but that is only for the sake of the sample: such traits structs are often placed in their own namespace. And the structure is short!

template <typename P> struct point_traits {};

So it is an empty structure. The magic happens in the specializations:

template<> struct point_traits<point> 
{
  static double get_x(const point& p) { return p.x; }
  static double get_y(const point& p) { return p.y; }
};

template<> struct point_traits<std::array<double, 2>> 
{
  static double get_x(const std::array<double, 2>& p) { return p[0]; }
  static double get_y(const std::array<double, 2>& p) { return p[1]; }
};

It looks nearly the same, just that now the struct is templatized with point or the std::array, instead of the free functions. And the free functions are now static member functions.

The concept is also similar, it's just the <P> part moving a bit left to the structure:

template <typename P> 
concept IsPoint = requires (P p) 
{ 
  point_traits<P>::get_x(p); 
  point_traits<P>::get_y(p); 
};

And so does the final generic concept-based distance function:

template <typename P1, typename P2>
requires IsPoint<P1> && IsPoint<P2>
double calculate_distance(const P1& p1, const P2& p2)
{
  auto sqr = [](const double x) -> double { return x * x; };
  return std::sqrt(sqr(point_traits<P1>::get_x(p1) - point_traits<P2>::get_x(p2)) 
                 + sqr(point_traits<P1>::get_y(p1) - point_traits<P2>::get_y(p2)));
}

And if we now forget a specialization? We get a neat error message. In clang (13) it reads:

prog.cc:62:16: error: no matching function for call to 'calculate_distance'
  std::cout << calculate_distance(mp, sa) << std::endl;
               ^~~~~~~~~~~~~~~~~~
prog.cc:45:8: note: candidate template ignored: constraints not satisfied [with P1 = point, P2 = std::array<double, 2>]
double calculate_distance(const P1& p1, const P2& p2)
       ^
prog.cc:43:10: note: because 'point' does not satisfy 'IsPoint'
requires IsPoint<P1> && IsPoint<P2>
         ^
prog.cc:35:20: note: because 'point_traits<P>::get_x(p)' would be invalid: no member named 'get_x' in 'point_traits<point>'
  point_traits<P>::get_x(p); 
                   ^
1 error generated. 
 
You can use C++20 and concepts, for example, in Wandbox.

Samples from libraries

Boost.Geometry uses this approach, also generalized for coordinate type and dimension. Here is the adaptation of the get function for std::array (in namespace boost::geometry::traits). It's partially specialized.
 
template <typename CoordinateType, std::size_t DimensionCount, std::size_t Dimension>
struct access<std::array<CoordinateType, DimensionCount>, Dimension>
{
    static inline CoordinateType get(std::array<CoordinateType, DimensionCount> const& a)
    {
        return a[Dimension];
    }
};

MapBox uses a similar approach, here is the adaptation for our point (in namespace mapbox::util):

template <> struct nth<0, point> 
{
  inline static auto get(const point &t) { return t.x; }
};
template <> struct nth<1, point> 
{
  inline static auto get(const point &t) { return t.y; }
};

Summary

This post briefly shows how to define traits, preferably as structs, how to define a concept based on them, and how to use that concept in a free function calculating the distance.

Wednesday, May 5, 2021

C++20 Concepts


This week (2021) I'm at the C++Now conference. The conference is normally in Aspen, Colorado (I was there more than 10 years ago), but this year it's online, for obvious reasons.

Yesterday there were two excellent talks by Jeff Garland about concepts in C++20: how they work, what you can do with them, how you can use them and how you can write them.

Some compilers (notably clang) don't yet support concepts, but you can already play with toy projects online using https://godbolt.org, where your code is compiled and run on the fly.

So here is my toy project, which I wrote during the talks (there were two consecutive talks):

 

#include <array>
#include <type_traits>
#include <iostream>

template <typename C> concept ValidCoordinateType = std::is_arithmetic_v<C>;
template <int D> concept ValidDimension = D >= 2 and D <= 3;
template <int D> concept HasZ = D >= 3;

template <typename C, int D>
requires(ValidDimension<D> and ValidCoordinateType<C>)
struct mypoint
{
  mypoint() = default;
 
  mypoint(C x, C y) 
    : coors{x, y} {}
 
  // Only available for 3D
  mypoint(C x, C y, C z) requires HasZ<D>
    : coors{x, y, z} {}
 
  auto x() const { return coors[0]; }
  auto y() const { return coors[1]; }

  // Avoids compilation for Dim < 3
  auto z() const requires HasZ<D> { return coors[2]; }
 
private :
   std::array<C, D> coors;
};

struct mytype {};

int main()
{
  mypoint<double, 2> two(1, 2);
  mypoint<float, 3> three(3, 4, 5);
  std::cout << "Hi " << two.x() << " " << two.y() << " " << three.z() << "\n";

  // These declarations will not compile
  //mypoint<mytype, 2> p1;
  //mypoint<double, 4> p2;
 
  // This line will not compile
  //std::cout << two.z(); // Fails because .z() is not available

  return 0;
}

So this code is using C++20 concepts. You can see it clearly at and around the keywords concept and requires. It is really cool, especially compared with what we had to do with C++03 (Boost.Geometry was written in C++03 and only a few releases ago went to C++14) to achieve this.

Some things could earlier be done with static_assert too (such as checking the number of dimensions), but now the compiler neatly warns that that is better written using a concept.

Other things used to need SFINAE (such as enabling / disabling the functions for the Z coordinate).
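For comparison, here is a sketch of how the same restrictions might be written without concepts, using static_assert and std::enable_if. This is my own illustration, not code from Boost.Geometry. The names sfinae_point and has_z are made up, and the detection helper uses std::void_t, so it assumes C++17.

```cpp
#include <array>
#include <type_traits>
#include <utility>

template <typename C, int D>
struct sfinae_point
{
  // Before concepts: class-level checks with static_assert.
  static_assert(std::is_arithmetic<C>::value, "coordinate type must be arithmetic");
  static_assert(D >= 2 && D <= 3, "dimension must be 2 or 3");

  constexpr sfinae_point() : coors{} {}

  constexpr C x() const { return coors[0]; }
  constexpr C y() const { return coors[1]; }

  // The extra parameter Dim turns z() into a template, so that enable_if
  // can remove it from overload resolution when Dim < 3.
  template <int Dim = D, typename std::enable_if<(Dim >= 3), int>::type = 0>
  constexpr C z() const { return coors[2]; }

private:
  std::array<C, D> coors;
};

// Detection helper: true if P has a callable z().
template <typename P, typename = void>
struct has_z : std::false_type {};

template <typename P>
struct has_z<P, std::void_t<decltype(std::declval<const P&>().z())>> : std::true_type {};

static_assert(has_z<sfinae_point<float, 3>>::value, "3D point has z()");
static_assert(!has_z<sfinae_point<double, 2>>::value, "2D point has no z()");
```

The dummy template parameter and the enable_if line are what a single requires HasZ<D> replaces in the C++20 version.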

If the last declarations in main are uncommented, you get a neat compiler error report (for instance for p2):

<source>: In function 'int main()':
<source>:40:20: error: template constraint failure for 'template<class C, int D> requires (ValidDimension<D>) && (ValidCoordinateType<C>) struct mypoint'
40 | mypoint<double, 4> p2; // Fails because 4 coordinates are not allowed by the concept
| ^
<source>:40:20: note: constraints not satisfied
<source>:6:26: required for the satisfaction of 'ValidDimension<D>' [with D = 4]
<source>:6:56: note: the expression 'D <= 3 [with D = 4]' evaluated to 'false'
6 | template <int D> concept ValidDimension = D >= 2 and D <= 3;
| ~~^~~~

Wow, it's only a few lines! And so clear! In the past we got hundreds of hard to interpret error messages about templates...

And if you uncomment the line streaming z for a 2d coordinate, you get:

<source>: In function 'int main()':
<source>:43:22: error: no matching function for call to 'mypoint<double, 2>::z()'
43 | std::cout << two.z();
| ^
<source>:23:8: note: candidate: 'auto mypoint<C, D>::z() const requires HasZ<D> [with C = double; int D = 2]'
23 | auto z() const requires HasZ<D>
| ^
<source>:23:8: note: constraints not satisfied
<source>: In instantiation of 'auto mypoint<C, D>::z() const requires HasZ<D> [with C = double; int D = 2]':
<source>:43:22: required from here
<source>:7:26: required for the satisfaction of 'HasZ<D>' [with D = 2]
<source>:7:35: note: the expression 'D >= 3 [with D = 2]' evaluated to 'false'
7 | template <int D> concept HasZ = D >= 3;
| ~~^~~~

Thursday, December 31, 2020

Playing one sentence from an audio file

 

I’m using Lazarus to develop some tools for myself for language immersion. What I want is to hear the same sentence again and again, until I can fully understand it, I know what it means, I can recognize the individual words (and in which form they are), and I can reproduce it. That takes (depending on the language, the sentence, its length, and level) a lot of repetitions. I usually prefer seven.

DuoLingo is a great app, but you have to press that sound button again and again. Besides that, the voices are artificial. In Rosetta Stone, too, you have to press the sound button over and over. Apart from that, Rosetta Stone has the advantage that the voices are human, and that you don’t see the translation (that is usually an advantage, but can in some cases be a drawback too).

Anyway, I’ve a program which plays sentences again and again. I feed the program with sentences from books. I split the book text into sentences (with a home made tool) and I add translations.

Yes, translations: sometimes I need them to understand what the reader is saying, and most (literary) books use a vocabulary that exceeds my current level. And the books I use are usually older, because I get the recordings from sites like archive.org or LibriVox, and these are books in the public domain.

I’m using deepl.com for translations. Their translations are good enough to understand what’s going on. There can be errors, but I can correct them. DeepL translations are, in my experience, better than translations from Yandex or Google.

So my program reads aloud sentences again and again, but how? Using bass.dll! That’s a cool library which is free (for personal use) and works on Windows and on Linux (and more), and works with Lazarus (and more). Its documentation is good but lacks some examples. For instance, I want to repeat that sentence over and over...

Playing a sentence (as a part of a large mp3) is possible with bass.dll and Lazarus. You can play from a certain position, add a timer, and stop the sound on the timer event. But timers are in general cumbersome and imprecise. It works most of the time but... if your computer is busy with some other task, you sometimes hear the start of the next sentence. And even if that’s just the start (say a “ST”), it’s annoying. On Linux, if I run it on my Windows computer in a virtual box, it’s even worse: most of the time it exceeds the scheduled time. So a timer can’t be used for this purpose.

Bass.dll offers another way to play a part of an mp3 and that is a sync (BASS_ChannelSetSync). That works but... if I also get sound levels in between (and I want that too), it often crashes. It apparently can’t do that during the loop back and we can’t exactly predict when that happens. I tried to fix that but I couldn’t, so I gave up. Besides this, the code was more complex because of the callbacks involved.

But there are more possibilities. I found two ways which both work. The first way is to write a WAV file, either in memory or on disk, and play that WAV. There is a source code example converting to WAV (to disk), so that was easy to adapt, and in memory it works the same way. The second way is to use a bass “sample” and play that back. There were no complete examples of that, but by combining some other examples I got it working. This is the method I finally selected, because the code is simple, precise (no timer involved) and reliable (no crashes observed).
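For the first way, the only thing to add in front of the decoded audio data is a WAV header. The sketch below shows the standard 44-byte header for uncompressed PCM. It is an illustration in C++, not the code I use, and it assumes the decoded data is 16-bit PCM (to the best of my knowledge that is what a BASS decoding channel delivers unless float or 8-bit output is requested, but treat that as an assumption). All function names are made up for this sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using wav_header = std::array<std::uint8_t, 44>;

// Writes a 4-character chunk tag such as "RIFF" at the given offset.
constexpr void put_tag(wav_header& h, std::size_t at, const char (&tag)[5])
{
  for (std::size_t i = 0; i < 4; ++i) { h[at + i] = static_cast<std::uint8_t>(tag[i]); }
}

// Writes a 16-bit value in little-endian byte order.
constexpr void put_u16(wav_header& h, std::size_t at, std::uint32_t v)
{
  h[at] = static_cast<std::uint8_t>(v & 0xFF);
  h[at + 1] = static_cast<std::uint8_t>((v >> 8) & 0xFF);
}

// Writes a 32-bit value in little-endian byte order.
constexpr void put_u32(wav_header& h, std::size_t at, std::uint32_t v)
{
  put_u16(h, at, v & 0xFFFF);
  put_u16(h, at + 2, v >> 16);
}

// Builds the canonical RIFF/WAVE header for dataBytes bytes of PCM audio.
constexpr wav_header make_wav_header(std::uint32_t dataBytes, std::uint32_t sampleRate,
                                     std::uint16_t channels, std::uint16_t bitsPerSample)
{
  const std::uint32_t blockAlign = channels * (bitsPerSample / 8u); // bytes per sample frame
  const std::uint32_t byteRate = sampleRate * blockAlign;           // bytes per second

  wav_header h{};
  put_tag(h, 0, "RIFF");
  put_u32(h, 4, 36u + dataBytes);  // size of everything after this field
  put_tag(h, 8, "WAVE");
  put_tag(h, 12, "fmt ");
  put_u32(h, 16, 16u);             // size of the fmt chunk
  put_u16(h, 20, 1u);              // audio format 1 = PCM
  put_u16(h, 22, channels);
  put_u32(h, 24, sampleRate);
  put_u32(h, 28, byteRate);
  put_u16(h, 32, blockAlign);
  put_u16(h, 34, bitsPerSample);
  put_tag(h, 36, "data");
  put_u32(h, 40, dataBytes);
  return h;
}
```

Writing this header followed by the decoded bytes, to a file or to a memory buffer, gives a playable WAV of just that sentence.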

A minimal sample doing this is presented below, and I hope it is useful for anyone who wants to do something similar. The code is Lazarus (Free Pascal) code, but it should be straightforward to port to another language.

procedure PlayPart;
const
  bufferSize = 10000;
var
  inputChannel: HSTREAM;
  info: BASS_CHANNELINFO;
  outputChannel: HSTREAM;
  sample: HSAMPLE;

  buffer: array [1..bufferSize] of byte;
  b1, b2: QWORD;
  bytesRead: DWORD;
  memo: TMemoryStream;

begin
  // You should open the channel in decode mode
  inputChannel := BASS_StreamCreateFile(false, PChar('c:\data\books\Gaidar\DrummersFate\audio\01.mp3'), 0, 0, BASS_STREAM_DECODE);
  if inputChannel = 0 then exit; // 0 means the file could not be opened

  BASS_ChannelGetInfo(inputChannel, info);

  // Specify here the begin and end times (in seconds)
  b1 := BASS_ChannelSeconds2Bytes(inputChannel, 223.0);
  b2 := BASS_ChannelSeconds2Bytes(inputChannel, 225.33);

  BASS_ChannelSetPosition(inputChannel, b1, BASS_POS_BYTE);

  memo := TMemoryStream.Create;
  try
    // Decode the selected part, chunk by chunk, into memory (min is in the Math unit)
    while (BASS_ChannelIsActive(inputChannel) = BASS_ACTIVE_PLAYING) and (b1 < b2) do
    begin
      bytesRead := BASS_ChannelGetData(inputChannel, @buffer, min(bufferSize, b2 - b1));
      if bytesRead = DWORD(-1) then break; // BASS_ChannelGetData returns -1 on error
      memo.Write(buffer, bytesRead);
      inc(b1, bytesRead);
    end;
    BASS_StreamFree(inputChannel);

    // Create a sample, with the input sound specifications, and max one simultaneous playback.
    // Specify a flag to automatically repeat it.
    // If that is not necessary, a flag of 0 is fine for this example.
    sample := BASS_SampleCreate(memo.size, info.freq, info.chans, 1, BASS_SAMPLE_LOOP);

    // Fill it with the read data
    BASS_SampleSetData(sample, memo.memory);
  finally
    memo.free;
  end;

  // Create a channel (this is required) and play it.
  outputChannel := BASS_SampleGetChannel(sample, true);
  BASS_ChannelPlay(outputChannel, true);
end;

This is the whole code (except for initialization of the bass library itself, but that should always be done, once, and is straightforward).

It will play this sentence:



As far as I know, it's not possible to save this part as an mp3 (as published here). But you can save it as a WAV file. There are numerous tools to extract parts from a large mp3 and save them. What I needed was to play a part, with software. And for that I use bass and Lazarus.

Wednesday, December 31, 2014

Vectorization with square point buffers


This year (2014) the buffer algorithm of Boost.Geometry was finally released in Boost 1.56.

By buffer we mean a zone around a geometry. This is the GIS term, also used by OGC. In other environments it is often known as inflate / deflate, or shrink and expand, or Minkowski sum.

In GIS the buffer (with a positive distance) creates a larger version of e.g. a polygon. If a negative distance is specified, a smaller version is created.

As the documentation shows, various strategies can be used to:
  • control the buffer-distance (either symmetric or asymmetric)
  • control the joins (rounded or miter)
  • control the ends for linestrings (rounded or flat)
  • control the sides (straight)
  • control the zones around points (or degenerated linestrings) (circular or square)
Strategies are provided, but they can also be provided by the library user, e.g. to create some fancy effects.

This short blog post gives an example of the strategies for zones around points, using the square version. This strategy has the nice property that it can be used to convert grids or images to vectors (polygons).

strategy::buffer::point_square


See also the Wiki page. Pixels are converted to polygons. Suppose we add all centers of all filled pixels (e.g. the black pixels) into a multi point. We then buffer that multi-point with a distance of half the pixel size. And we use the strategy to create square buffers around each point. Then all adjacent gridcells will be connected. A polygon (or a multi-polygon) will be created, which is exactly what we wanted...

Suppose we have the following image (from this source) of a black cat.


Cat (pixels)

We create a multi-point from it; this was done manually:

MULTIPOINT(1 0,2 0,3 0,4 0,5 0,6 0,7 0,
0 1,1 1,2 1,3 1,4 1,5 1,7 1,8 1,
0 2,1 2,2 2,3 2,4 2,5 2,8 2,
0 3,1 3,2 3,3 3,4 3,5 3,8 3,
0 4,1 4,2 4,3 4,4 4,5 4,8 4,
0 5,1 5,2 5,3 5,4 5,5 5,8 5,
1 6,2 6,3 6,4 6,7 6,8 6,
1 7,2 7,3 7,4 7,7 7,
1 8,2 8,3 8,4 8,7 8,
2 9,3 9,6 9,7 9,
1 10,2 10,3 10,4 10,
1 11,2 11,3 11,4 11,
1 12,2 12,3 12,4 12,
1 13,4 13)
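For larger images, typing the multi-point by hand is not practical. The sketch below shows one way it could be generated from rows of characters. It is an illustration only: the function name, the '#' marker for filled pixels and the choice that the first row has y = 0 are all assumptions of this sketch (flip the rows if the image should not come out mirrored).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Converts rows of characters into a WKT MULTIPOINT, one point per filled pixel.
// Assumptions: 'filled' marks a filled pixel, the first row has y = 0,
// and the pixel centers are the integer coordinates (column, row).
// Note: an image without filled pixels yields "MULTIPOINT()", which is not valid WKT.
inline std::string pixels_to_multipoint(const std::vector<std::string>& rows,
                                        char filled = '#')
{
  std::string wkt = "MULTIPOINT(";
  bool first = true;
  for (std::size_t y = 0; y < rows.size(); ++y)
  {
    for (std::size_t x = 0; x < rows[y].size(); ++x)
    {
      if (rows[y][x] != filled) { continue; }
      if (!first) { wkt += ","; }
      wkt += std::to_string(x) + " " + std::to_string(y);
      first = false;
    }
  }
  wkt += ")";
  return wkt;
}
```

The resulting string can be passed to boost::geometry::read_wkt in the same way as a hand-written one.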

If we use the next code fragment:

    typedef boost::geometry::model::d2::point_xy<double> point;
    typedef boost::geometry::model::polygon<point> polygon;
    boost::geometry::strategy::buffer::distance_symmetric<double> distance_strategy(0.5);
    boost::geometry::strategy::buffer::join_round join_strategy;
    boost::geometry::strategy::buffer::end_round end_strategy;
    boost::geometry::strategy::buffer::point_square point_strategy;
    boost::geometry::strategy::buffer::side_straight side_strategy;
    boost::geometry::model::multi_polygon<polygon> result;
    boost::geometry::model::multi_point<point> mp;
    boost::geometry::read_wkt(cat, mp); // the cat std::string is somewhere defined
    boost::geometry::buffer(mp, result,
                distance_strategy, side_strategy,
                join_strategy, end_strategy, point_strategy);


The resulting polygon will look like this:

Cat (polygon)