1 - Introduction to formal language

The theory of formal languages {}{}was born within the domain of linguistics as a means of understanding the syntactic regularities of natural languages. In computer science, formal languages {}{}are used inter alia as a basis for defining programming languages.  Language theory is both a mathematical, computer and linguistic branch, a formal language is a set of symbol strings with a set of rules that are specific to it.  The alphabet of a formal language is the set of symbols or letters ... from which the chains of language can be formed; Often it is necessary to be finite. The strings formed from this alphabet are called words and the words that belong to a particular formal language are sometimes called well-formed words or well-formed formulas. A formal language is often defined by means of a formal grammar such as a regular grammar or a grammar out of context, also called its formation rule.  The domain of formal language theory mainly studies the purely syntactic aspects of these languages, that is, their internal structures.

2 - Alphabets, Words and Languages

begin{example} $Sigma ={0,1,2,...}=% %TCIMACRO{U{2115} }% %BeginExpansion mathbb{N} %EndExpansion $  is an alphabet , $484,275,107,...$  are words end{example}  begin{notation} The set of words on $Sigma $  will be noted  $Sigma ^{ast }$ end{notation}  begin{notation} The word that does not contain any letter is called, the empty word denoted  $wedge $ or $varepsilon .$ end{notation}  begin{remark} We are interested only in the form of a word which may be arbitrary and we omit the meaning of the word example: avhbrtys, jorbqhegd, ... are words. end{remark}
subsection{Formal Language}  begin{definition} A formal language is set of words  ( ie a subset of  $Sigma ^{ast }$ $)$ end{definition}  begin{example} ${$ $CRMEF,Oujda,$ $Hay,$ $AlMassira$ $}$ is language. end{example}  begin{example} ${Lambda ,a,aa,aaa,...}={a^{n}/nin  %TCIMACRO{U{2115} }% %BeginExpansion mathbb{N} %EndExpansion }$ is infinite language generated by a finite alphabet $Sigma ={a}.$ end{example}

3 - Operations on words and languages

section{Operations on words and languages}  subsection{Words concatenation}  begin{definition} The concatenation of two words  $x$  and  $y$  is the word noed by  $x.y $  or simply by $xy$ defined by : newline $x.y=xy$  ie  if  $x=x_{1}x_{2}...x_{r}$  and  $y=y_{1}y_{2}...y_{s}$  then  $x.y=x=x_{1}x_{2}...x_{r}$ $y_{1}y_{2}...y_{s}$ end{definition}  begin{example} If $ x=langu$  and  $y=age$  then $x.y=language$ end{example}
subsection{Concatenation of  langauges}  begin{definition} Let E and F be two languages of lthe alphabet $Sigma .$ The concatenation  of  $E$  and  $F$  is the language defined by : newline $E.F={x.y/(x,y)in Etimes F}$ end{definition}  begin{example} $E={u,v}$ ,  $F={uv,vu}$  $E.F={uuv,uvu,vuv,vvu}$ end{example}  $E.F$  Will be noted later by  $EF$  begin{definition} A language is said to be closed for the law "."  If it is stable for concatenation ( i.e $forall x,yin E$  we have: $x.yin E$ end{definition}  begin{example} ${x^{n}/nin  %TCIMACRO{U{2115} }% %BeginExpansion mathbb{N} %EndExpansion }$  is closed language $  $but the language ${x^{n}y^{m}/(n,m)in  %TCIMACRO{U{2115} }% %BeginExpansion mathbb{N} %EndExpansion times  %TCIMACRO{U{2115} }% %BeginExpansion mathbb{N} %EndExpansion $ $}$ is not closed. end{example}

begin{definition} Let $E$  be a language, the smallest closed language for the concatenation containing $E$ and ${wedge }$ Is called the closure of $E$  and is noted by  $E^{ast }.$ $E^{ast }$ is also called Kleene closure or Kleene star. end{definition}  begin{remark} $E^{ast }=underset{kgeq 0}{cup E^{k}}$  ( where  $E^{k}$ is the concatenation $E.E...E$ ) end{remark}  begin{example} Let  $E$  be a language and $ x$  a word of $E$  then we have:newline $1)$ $ {x}^{ast }={x^{n}/nin  %TCIMACRO{U{2115} }% %BeginExpansion mathbb{N} %EndExpansion }$newline $2)$  ${xx}^{ast }={x^{2n}/nin  %TCIMACRO{U{2115} }% %BeginExpansion mathbb{N} %EndExpansion }$newline $3)$  ${xx,xxx}^{ast }={x^{n}/ngeq 2}cup {Lambda }$newline $4)$  ${0,1}^{ast }$  Is the set of all natural integers containing only the numbers $0$  or  $1.$newline $5)$  $varnothing ^{ast }={Lambda }$ end{example}

4 - Combinatorics on words

section{Combinatorics of words}  subsection{Length of words, prefixes and suffixes}  begin{definition} The length of a word $x$ denoted $|x|$ , is the number of symbols that compose it. Example if $x=arbre$ alors $ |x|=5$ end{definition}  begin{notation} Let $x$ be a word of $Sigma ^{ast }$ and $sigma $ be a symbol of $Sigma , $ the number of occurrences of $sigma $ in $x$ is denoted  by $ |x|_{sigma }.$ end{notation}  begin{example} $|arbre|_{r}=2,$  $|arbre|_{a}=1$ , $|abbcb|_{b}=3$ end{example}  begin{definition} Let  $x=x_{1}x_{2}...x_{r}$ be a word of $Sigma ^{ast }$ $ $the words $% :wedge ,$ $x_{1},$ $x_{1}x_{2},...,x_{1}x_{2}...x_{r}$ [ respectively $% wedge ,$ $x_{r},$ $x_{r-1}x_{r},$ $x_{r-2}x_{r-1}x_{r}$ ] Are called the prefixes of $x$ [ respectively the suffixes of $x]$ end{definition}
subsection{Parikh function's}  Let $Sigma $ a finite and ordered alphabet, And can therefore be considered as a n-tuple $Sigma =(sigma _{1},sigma _{2},...,sigma _{n})$ the map $% psi :sum^{ast }rightarrow  %TCIMACRO{U{2115} }% %BeginExpansion mathbb{N} %EndExpansion ^{n}$  ,$xlongmapsto (|x|_{sigma _{1}},|x|_{sigma _{2}},...,|x|_{sigma _{n}})$ is called textbf{Parikh function's}  begin{remark} For $n>1$ the parikh function's is not injective. end{remark}  begin{notation} for any word $x=x_{1}x_{2}...x_{r}$  we put  $% widetilde{x}=x_{r}x_{r-1}...x_{1}$ , and for any language $E$  we put : $% widetilde{E}={widetilde{x}/xin E}$ , $widetilde{x}$ is called the mirror image of $x$, and  $widetilde{E}$ the mirror image of  $E.$ end{notation}  begin{definition} We call palindrome, every word $x$, verifying : $widetilde{x}=x$. end{definition}  begin{example} laval,  radar  are palindromes. end{example}

5 - Regular Language

section{Regular Language }  Let L be a language over alphabet $Sigma $ We say that L is regular or rational if L can be obtained recursively by:  1) from the only basic languages:newline $-$ language $emptyset $newline $-$ language ${wedge }$  ( Language formed only by the textbf{empty} symbol$)$newline $-$ Language of the form ${a}$ where $ain Sigma $newline $2)$By using only the following operations :newline $-$ Unions operationsnewline $-$ Concatenationnewline $-$ Kleene's star  begin{example} The language ${x^{n}/nin  %TCIMACRO{U{2115} }% %BeginExpansion mathbb{N} %EndExpansion }$  regular end{example}  begin{example} The language $L$ of the words of even length is rational since $L=(Sigma ^{2})^{ast }$ end{example}  begin{remark} Every finite language is regular end{remark}  begin{proposition} If $L$ is a regular language then $L^{ast }$ is also. end{proposition}  begin{proposition} If $L_{1}$ $et L_{2}$ are regular then  $L_{1}cdot L_{2}$ , $L_{1}cup L_{2}$  and $L_{1}cap L_{2}$  are regular. end{proposition}

6 - Residual Language

section{Residual Language}  begin{definition} Let $L$ be a language over alphabet $Sigma $ and $u$ a word of  $Sigma ^{ast }$, we get new language by putting $u^{-1}L=L/u={vin Sigma ^{ast }/uvin L}.$ $L/u$ is called the residual language of the language $L$ with respect to word $u.$ end{definition}  begin{example} $L={uuv,uw,vv,uu}$ alors $L/u={uv,w,u}$ end{example}  begin{theorem} (Myhill Nerode theorem's) A language is recognizable if and only if it has only a finite number of residuals. end{theorem}

7 - Topology on formal language

section{Topology on formal language}  subsection{Metric over language}  For any language $L$ on an alphabet $Sigma $ we define a distance $% d:Ltimes Llongrightarrow  %TCIMACRO{U{211d} }% %BeginExpansion mathbb{R} %EndExpansion _{+}$  $(u,v)longmapsto 2^{|uwedge v|}$ where  $uwedge v$ is the longest common prefix, moreover this distance is ultrametric $d(u,w)leq max (d(u,v),d(v,w))$  subsection{Topology inductive limit on Kleene closure $X^{ast }$}  We define a topology on  $X^{(k)}=X.X...X$ (The language on the alphabet $X$ formed of the words of length $k$) by conventioning that a part $O$ of $% X^{(k)}$ is open, if it is, any union  of the sets of the form : $% O_{1}.O_{2}...O_{k}$ ( Concatenation of $O_{i}$ ) where $O_{1},...,O_{k}$ are opens subsets of  $X.$ Then we equip the closing Kleene$X^{ast }$ of the inductive limit topology associated at the family of canonical injections $i_{k}:X^{(k)}hookrightarrow X^{ast }$  begin{remark} $X$  can be considered as a topological subspace of $X^{(k)}$ and consequently a topological subspace of $X^{ast } $by means of continuous injections $i:Xhookrightarrow X^{(k)}$ $ xlongmapsto x.varpi ...varpi $  where $varpi $ is the empty word. end{remark}  begin{remark} Every language $L$ on the alphabet $X$ is considered as a topological subspace of $X^{ast }$ for the induced topology. end{remark}

Younes Derfoufi

Leave a Reply