Kulinganisha na kulinganua, ama, kuenenza pt. 2

 

Linga - compare, match, harmonize: ~ nguo - try on clothes, be measured for clothes. (verb)

Lingana - be equal, be similar, match. (verb)

Linganifu - matching; corresponding; regular; symmetrical. (adjective/adverb)

Linganisha - compare, equate, correlate; harmonize; musical key. (verb)

Linganua - differentiate, make a contrast; distinguish. (verb, reversive form)

Etymology

-lingana (infinitive kulingana)

From Proto-Bantu *-dɪ̀ngana.

Entry 995 at Bantu Lexical Reconstructions 3. https://www.africamuseum.be/

Kulinganisha na kulinganua, as well as kuenenza, mean to compare and contrast.

 

Can algorithms handle compare & contrast?

Understanding context, and measuring the source and magnitude of words, are important in compare & contrast analysis when judging impact. This requires common sense and heuristics. Word sense means the particular meaning in which a word is used. For example, a dictionary may list over 50 different senses of the word "play", each with a different meaning depending on the context of the word's usage in a sentence. These meanings are built up over a long period of time as conventions and standards, never arbitrarily by the sudden whims of individuals or groups.
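
As a concrete illustration of word sense, the sketch below lists the senses of "play" recorded in the WordNet lexical database, queried through the NLTK library. This assumes NLTK and its WordNet corpus are installed; the sense counts come from WordNet, not from any particular print dictionary.

```python
# Minimal sketch, assuming NLTK is installed and the WordNet corpus has been
# downloaded (import nltk; nltk.download('wordnet')).
from nltk.corpus import wordnet

senses = wordnet.synsets("play")
print(f"WordNet records {len(senses)} senses of 'play'")

# Show each sense with its part of speech and a short definition (gloss).
for s in senses:
    print(f"{s.name():>12}  ({s.pos()})  {s.definition()}")
```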

Free speech is a cherished value in many human societies. However, with advanced methods of information integration, this value has come under assault from agents with various motives, who use such integrations to victimize individuals they dislike. The most prevalent form of this assault today is the social media ban. This may seem benign at the moment, but it has the potential to morph into other forms of autocracy that could bar people from accessing loans on frivolous non-financial grounds, or even bar them from housing, travel documents, government services, health services, water, electricity and so forth. Some of these are fundamental human rights that should never be denied under any circumstances.

The European Union has approved a regulation intended to give citizens a "right to explanation" in relation to algorithmic decision-making. The European Union General Data Protection Regulation (adopted in 2016, taking effect in 2018) provides a legally disputed form of such a right, stated in Recital 71: "[the data subject should have] the right ... to obtain an explanation of the decision reached".

However, the extent to which the regulations themselves provide a "right to explanation" is heavily debated. There are two main strands of criticism. There are significant legal issues with the right as found in Article 22 — as recitals are not binding, and the right to an explanation is not mentioned in the binding articles of the text, having been removed during the legislative process. In addition, there are significant restrictions on the types of automated decisions that are covered — which must be both "solely" based on automated processing and have legal or similarly significant effects — which significantly limits the range of automated systems and decisions to which the right would apply. In particular, the right is unlikely to apply in many of the cases of algorithmic controversy that have been picked up in the media.

In the United States, a similar requirement applies to the "credit scores" used in loan decisions, overseen by the Consumer Financial Protection Bureau, which was formed after the 2007-08 financial crash. Under the Equal Credit Opportunity Act and its implementing Regulation B (Code of Federal Regulations, Title 12, Chapter X, Part 1002, §1002.9), creditors are required to notify applicants who are denied credit of the specific reasons for the denial.

Creditors comply with this regulation by providing a list of reasons (generally at most four, per the official interpretation of the regulation), each consisting of a numeric reason code (as an identifier) and an associated explanation identifying the main factors that affected the credit score. An example might be:

“32: Balances on bankcard or revolving accounts too high compared to credit limits.”

Number 32 is the numeric reason code; other reasons carry different numeric codes.
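
To make the mechanics concrete, here is a minimal, purely illustrative sketch of how an adverse-action notice could assemble such reasons. The code table, the helper function and the factor impacts are hypothetical; they are not taken from any actual scoring model or from the FICO reason-code list referenced below.

```python
# Hypothetical sketch: map the factors that lowered a score to reason codes.
# The codes, descriptions, and factor impacts below are invented for illustration.
REASON_CODES = {
    32: "Balances on bankcard or revolving accounts too high compared to credit limits.",
    18: "Number of accounts with delinquency.",
    10: "Proportion of balances to credit limits on revolving accounts is too high.",
    5:  "Too many accounts with balances.",
}

def adverse_action_reasons(factor_impacts, max_reasons=4):
    """Return up to `max_reasons` (code, explanation) pairs, worst factors first.

    `factor_impacts` maps a reason code to how many points it cost the applicant.
    """
    ranked = sorted(factor_impacts.items(), key=lambda kv: kv[1], reverse=True)
    return [(code, REASON_CODES[code]) for code, _ in ranked[:max_reasons]]

# Example: an applicant whose score was reduced mostly by high revolving balances.
impacts = {32: 45, 18: 20, 10: 15, 5: 5}
for code, text in adverse_action_reasons(impacts):
    print(f"{code}: {text}")
```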

Word embedding for differentiation

In natural language processing (NLP) for machine learning, a word embedding is a representation of a word. Word and phrase embeddings are used to boost performance in NLP tasks such as syntax analysis and sentiment analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that words closer together in the vector space are expected to be similar in meaning. The "vector space" is simply the common numerical space in which all the word vectors live; where a word lands in that space is determined by the contexts in which it is used, so comparing two word vectors amounts to comparing the contexts in which the two words typically appear. Put simply, "a word is characterized by the company it keeps". To generate these vectors, a number of largely unsupervised techniques have been proposed, including training neural networks and constructing a word co-occurrence matrix followed by dimensionality reduction, as well as probabilistic models and explicit representations in terms of the contexts in which words appear; approaches that draw on curated knowledge or explicit human judgements require direct human input.
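
As a rough sketch of the co-occurrence-matrix approach mentioned above, each word can be represented by its row of co-occurrence counts, with cosine similarity between rows serving as a crude measure of similarity in meaning. The tiny corpus, window size and vocabulary below are invented for illustration.

```python
import numpy as np

# Toy corpus and vocabulary, invented purely for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Build a symmetric co-occurrence matrix with a +/-2 word window.
window = 2
cooc = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                cooc[index[w], index[sent[j]]] += 1

def cosine(u, v):
    """Cosine similarity between two co-occurrence rows (word 'embeddings')."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Words used in similar contexts ("cat" and "dog") score higher
# than words used in unrelated contexts ("cat" and "rug").
print(cosine(cooc[index["cat"]], cooc[index["dog"]]))
print(cosine(cooc[index["cat"]], cooc[index["rug"]]))
```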

Currently existing word embedding techniques do not benefit from the rich semantic information present in structured or semi-structured text. Instead, they are trained over a large corpus, such as a Wikipedia "dump", social media posts, or a collection of news articles, where any structure is ignored.

Moreover, the dimensionality of word sense is very high even in a very small community of people. This makes algorithmic calculation inadequate for such measurements, since "concentration of measure" techniques meant to collapse dimensionality can hardly keep up with cultural dynamism, social relations, slang, and even gaffes like spoonerisms, malapropisms and catachresis. Contemporary examples are United States president Joe Biden and former world heavyweight champion Mike Tyson, with their catalogues of irregular verbal gaffes. It becomes a bit tricky to pin these down as either an innocent error, a comic act, or intentional misinformation designed to evade responsibility.

Embedding technique for images

A widely used algorithmic technique for compare & contrast analysis of images is termed 'triplet loss'. Triplet loss is a loss function for machine learning algorithms in which a reference image (called the anchor) is compared to a matching image (called the positive) and a non-matching image (called the negative). Typically, the distance from the anchor to the positive is minimized, while the distance from the anchor to the negative is maximized. By enforcing this ordering of distances, triplet loss models learn embeddings in which pairs of samples with the same label lie closer to each other than pairs with different labels. In face recognition, triplet loss is used to learn good embeddings (or "encodings") of faces. In the embedding space, faces of the same person should lie close together and form well-separated clusters.
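
The loss itself is commonly written as L(A, P, N) = max(d(A, P) - d(A, N) + margin, 0). The sketch below computes it with NumPy for a single triplet of embedding vectors; the vectors and the margin value are made up for illustration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for one triplet of embedding vectors.

    Pushes the anchor-positive distance to be at least `margin` smaller
    than the anchor-negative distance; otherwise the loss is positive.
    """
    d_pos = np.linalg.norm(anchor - positive)   # distance to the matching image
    d_neg = np.linalg.norm(anchor - negative)   # distance to the non-matching image
    return max(d_pos - d_neg + margin, 0.0)

# Made-up embeddings: the positive sits near the anchor, the negative far away,
# so this triplet is "easy" and contributes zero loss.
anchor   = np.array([0.1, 0.9, 0.3])
positive = np.array([0.2, 0.8, 0.3])
negative = np.array([0.9, 0.1, 0.7])
print(triplet_loss(anchor, positive, negative))
```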

For example, a satirist may want to algorithmically edit himself into an image or video of a World Economic Forum meeting in Davos. The algorithm should be able to cluster the different characters that the satirist wants to use in his skit so that the video flows coherently. The algorithm would cluster the facial recognition images in this manner:

The arrangement of colour shades (embeddings) illustrates which images belong to the same class. The anchor and positive have a similar sequence of shades, while the negative has a different sequence. This automatically differentiates the negative image and clusters it into a different group, regardless of the differing dimensions of the anchor and positive images. However, the greater the distance between the positive and the anchor image, the more complex the facial recognition becomes. For example, if the positive image illustrated above were a frowning Obama bending his neck, the algorithm would need more training with closer matches in the same class in order to cluster properly and produce more accurate future outputs.

In triplet mining, the triplet combinations that are likely to be encountered can be categorized into three types (see the sketch after this list):

1.     easy triplets: triplets which have a loss of 0, because the negative is very far from the anchor compared to the positive.

2.     hard triplets: triplets where the negative is closer to the anchor than the positive.

3.     semi-hard triplets: triplets where the negative is not closer to the anchor than the positive, but which still have positive loss.
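
A minimal sketch of this categorization, assuming Euclidean distance between embeddings and a hypothetical margin of 0.2, might look like this:

```python
import numpy as np

def categorize_triplet(anchor, positive, negative, margin=0.2):
    """Label a triplet as 'easy', 'hard', or 'semi-hard' (margin is illustrative)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    if d_neg > d_pos + margin:
        return "easy"        # loss is 0: negative is far beyond the margin
    if d_neg < d_pos:
        return "hard"        # negative is closer to the anchor than the positive
    return "semi-hard"       # negative farther than the positive, but within the margin

# Made-up embeddings for illustration.
anchor   = np.array([0.0, 0.0])
positive = np.array([0.3, 0.0])
print(categorize_triplet(anchor, positive, np.array([1.0, 0.0])))  # easy
print(categorize_triplet(anchor, positive, np.array([0.1, 0.0])))  # hard
print(categorize_triplet(anchor, positive, np.array([0.4, 0.0])))  # semi-hard
```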

 

References and further reading:

Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg; Dean, Jeffrey (2013). "Distributed Representations of Words and Phrases and their Compositionality". arXiv:1310.4546 [cs.CL].

arXiv:1702.06891 [cs.CL].

https://www.creditscoring.com/creditscore/fico/factors/reason-codes.html

Edwards, Lilian; Veale, Michael (2017). "Slave to the algorithm? Why a "right to an explanation" is probably not the remedy you are looking for". Duke Law and Technology Review.

Jurafsky, Daniel; Martin, James H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, N.J.: Prentice Hall.

Moindrot, Olivier (2018). Triplet Loss and Online Triplet Mining in TensorFlow. Retrieved 22 January 2024 from https://omoindrot.github.io/triplet-loss

Socher, Richard; et al. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP.

TUKI (2001), Kamusi Ya Kiswahili-Kiingereza; Swahili-English Dictionary. Published by Taasisi ya Uchunguzi wa Kiswahili (TUKI), Chuo Kikuu cha Dar es Salaam, Tanzania.

