Rocksolid Light


Subject                                       Author
* PageRank for Large Language Models          johnny babylon
`- Re: PageRank for Large Language Models     johnny babylon

PageRank for Large Language Models

<ff8e2c94-fd3c-45f0-b8d9-e7c8b4fad590n@googlegroups.com>


https://novabbs.com/interests/article-flat.php?id=503&group=soc.culture.puerto-rico#503

Newsgroups: soc.culture.puerto-rico
X-Received: by 2002:a05:620a:2057:b0:77d:84f0:9cb1 with SMTP id d23-20020a05620a205700b0077d84f09cb1mr823294qka.4.1701566149532;
Sat, 02 Dec 2023 17:15:49 -0800 (PST)
X-Received: by 2002:a05:6808:221b:b0:3b8:b10a:d67f with SMTP id
bd27-20020a056808221b00b003b8b10ad67fmr1019484oib.9.1701566148862; Sat, 02
Dec 2023 17:15:48 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: soc.culture.puerto-rico
Date: Sat, 2 Dec 2023 17:15:48 -0800 (PST)
Injection-Info: google-groups.googlegroups.com; posting-host=2601:58b:8402:eb0:d655:e6f2:b83e:9954;
posting-account=fSOIfwoAAADqqAwX8P2B1C393tRiXZ6q
NNTP-Posting-Host: 2601:58b:8402:eb0:d655:e6f2:b83e:9954
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ff8e2c94-fd3c-45f0-b8d9-e7c8b4fad590n@googlegroups.com>
Subject: PageRank for Large Language Models
From: johnnyba...@gmail.com (johnny babylon)
Injection-Date: Sun, 03 Dec 2023 01:15:49 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: johnny babylon - Sun, 3 Dec 2023 01:15 UTC

I’m developing a methodology that resembles PageRank for LLMs: a cross-sectional rating system for an LLM’s corpus of knowledge that uses metrics like word count, traffic, and document frequency to rank LLM semantic blocks by social popularity. I have evidence that my cyberstalking victim Irma Becerra met with Larry Page and Sergey Brin before either my expert locator was published or the billion-dollar Google patent was filed in September 2001, and my intellectual property was compromised when they opined that I was not capable of competing as a Puerto Rican minority in the private market with my invention.

https://patents.google.com/patent/US7516123B2/en

https://www.searchenginejournal.com/pagerank-replaced/316933/

https://www.semrush.com/blog/pagerank/

https://python.langchain.com/docs/get_started/introduction

https://www.turing.com/kb/a-comprehensive-guide-to-named-entity-recognition

https://cdn.aaai.org/FLAIRS/2001/FLAIRS01-054.pdf
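A minimal sketch of the cross-sectional rating idea described above — combining word count, a traffic proxy, and document frequency into one composite score per semantic block. Every name here (popularity_score, the block schema, the weights) is a hypothetical illustration for discussion, not an established method:

```python
import math

def popularity_score(block, corpus_doc_freq, total_docs,
                     w_words=1.0, w_traffic=1.0, w_df=1.0):
    """Combine simple metrics into one rating for a text block.

    block: dict with 'text' and 'traffic' (e.g. view count) keys -- hypothetical schema.
    corpus_doc_freq: number of corpus documents containing the block's key term.
    """
    word_count = len(block["text"].split())
    # Log-scale word count and traffic so large values do not dominate.
    traffic = math.log1p(block["traffic"])
    # Document frequency normalized by corpus size.
    doc_freq = corpus_doc_freq / total_docs
    return (w_words * math.log1p(word_count)
            + w_traffic * traffic
            + w_df * doc_freq)

blocks = [
    {"text": "PageRank ranks nodes by link structure.", "traffic": 1200},
    {"text": "Short note.", "traffic": 40},
]
scores = [popularity_score(b, corpus_doc_freq=3, total_docs=10) for b in blocks]
ranked = sorted(zip(scores, range(len(blocks))), reverse=True)
print(ranked[0][1])  # index of the highest-rated block
```

The weights would need tuning against whatever "social popularity" ground truth is available; the log scaling is just one common way to keep traffic counts from swamping the other terms.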

Re: PageRank for Large Language Models

<05250ce7-8bd2-4055-8c64-33d1ffc9b816n@googlegroups.com>


https://novabbs.com/interests/article-flat.php?id=504&group=soc.culture.puerto-rico#504

Newsgroups: soc.culture.puerto-rico
X-Received: by 2002:ac8:5291:0:b0:423:f4e6:3dc6 with SMTP id s17-20020ac85291000000b00423f4e63dc6mr331752qtn.7.1701695087953;
Mon, 04 Dec 2023 05:04:47 -0800 (PST)
X-Received: by 2002:a05:6871:5806:b0:1fa:fc2d:161c with SMTP id
oj6-20020a056871580600b001fafc2d161cmr2640883oac.10.1701695086130; Mon, 04
Dec 2023 05:04:46 -0800 (PST)
Path: i2pn2.org!i2pn.org!news.nntp4.net!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: soc.culture.puerto-rico
Date: Mon, 4 Dec 2023 05:04:45 -0800 (PST)
In-Reply-To: <ff8e2c94-fd3c-45f0-b8d9-e7c8b4fad590n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2601:58b:8402:eb0:d655:e6f2:b83e:9954;
posting-account=fSOIfwoAAADqqAwX8P2B1C393tRiXZ6q
NNTP-Posting-Host: 2601:58b:8402:eb0:d655:e6f2:b83e:9954
References: <ff8e2c94-fd3c-45f0-b8d9-e7c8b4fad590n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <05250ce7-8bd2-4055-8c64-33d1ffc9b816n@googlegroups.com>
Subject: Re: PageRank for Large Language Models
From: johnnyba...@gmail.com (johnny babylon)
Injection-Date: Mon, 04 Dec 2023 13:04:47 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 7424
 by: johnny babylon - Mon, 4 Dec 2023 13:04 UTC

On Saturday, December 2, 2023 at 8:15:50 PM UTC-5, johnny babylon wrote:
> I’m developing a methodology that resembles PageRank for LLMs: a cross-sectional rating system for an LLM’s corpus of knowledge that uses metrics like word count, traffic, and document frequency to rank LLM semantic blocks by social popularity. I have evidence that my cyberstalking victim Irma Becerra met with Larry Page and Sergey Brin before either my expert locator was published or the billion-dollar Google patent was filed in September 2001, and my intellectual property was compromised when they opined that I was not capable of competing as a Puerto Rican minority in the private market with my invention.
>
> https://patents.google.com/patent/US7516123B2/en
>
> https://www.searchenginejournal.com/pagerank-replaced/316933/
>
> https://www.semrush.com/blog/pagerank/
>
> https://python.langchain.com/docs/get_started/introduction
>
> https://www.turing.com/kb/a-comprehensive-guide-to-named-entity-recognition
>
> https://cdn.aaai.org/FLAIRS/2001/FLAIRS01-054.pdf

According to ChatGPT:

PageRank is an algorithm that measures the importance of webpages in a network based on the structure of links between them. It was originally designed for web search engines but can be adapted for other applications. If you want to apply PageRank to a large language model or a corpus of text, you'll need to modify the concept to fit your specific use case.
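For reference, the per-node update used throughout the steps below is the standard PageRank formulation: with damping factor d and N nodes,

PR(i) = (1 - d)/N + d * sum over all j linking to i of PR(j) / L(j)

where L(j) is the number of outbound links from node j (or, on a weighted graph, j's total outbound weight).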

Here are some general steps to adapt PageRank for a large language model:

1. **Define Nodes and Edges:**
- Nodes could represent individual documents, paragraphs, or even sentences depending on your granularity.
- Edges could represent connections between these nodes, such as semantic similarity, co-occurrence of words, or any other relationship that makes sense in your context.

2. **Build the Graph:**
- Create a graph structure where nodes are connected by edges based on the relationships you defined.
- For a language model, you might represent relationships between sentences based on semantic similarity, topic modeling, or any other relevant linguistic feature.

3. **Initialize PageRank Values:**
- Assign an initial PageRank value to each node. You can start with equal values for all nodes or use some other heuristic based on the characteristics of your data.

4. **Iterative PageRank Algorithm:**
- Implement the iterative PageRank algorithm to update the PageRank values for each node in the graph. The basic idea is to distribute the PageRank of each node to its neighbors and repeat the process until convergence.
- The formula for updating PageRank values is a weighted sum of the PageRank values of neighboring nodes.

5. **Damping Factor:**
- Consider using a damping factor, typically set to 0.85, to model the probability that a user will continue navigating through the graph. This helps to avoid issues with disconnected nodes.

6. **Convergence Criterion:**
- Define a convergence criterion to stop the iteration when the PageRank values stabilize.

7. **Apply to Large Language Model:**
- Depending on the scale of your language model, you may need to parallelize or distribute the computation to handle the large amount of data efficiently.

8. **Interpret Results:**
- Analyze the final PageRank values to identify the most important nodes in your graph. These could represent the most significant documents, paragraphs, or sentences in your language model based on the defined relationships.

Keep in mind that PageRank might not be the only algorithm you need, and its adaptation depends on the specific characteristics and goals of your language model. Experiment with different graph structures and relationship definitions to see what works best for your use case.

Example Code:

import numpy as np

def cosine_similarity(vector1, vector2):
    dot_product = np.dot(vector1, vector2)
    norm1 = np.linalg.norm(vector1)
    norm2 = np.linalg.norm(vector2)
    similarity = dot_product / (norm1 * norm2)
    return similarity

def pagerank(graph, damping_factor=0.85, max_iterations=100, convergence_threshold=1e-6):
    num_nodes = len(graph)
    initial_pagerank = np.ones(num_nodes) / num_nodes
    pagerank_values = initial_pagerank.copy()

    for iteration in range(max_iterations):
        prev_pagerank_values = pagerank_values.copy()

        for i in range(num_nodes):
            incoming_pagerank = 0
            for j in range(num_nodes):
                if graph[j, i] > 0:  # Check if there is a link from j to i
                    # Distribute j's rank to i in proportion to the edge weight
                    incoming_pagerank += prev_pagerank_values[j] * graph[j, i] / np.sum(graph[j, :])

            pagerank_values[i] = (1 - damping_factor) / num_nodes + damping_factor * incoming_pagerank

        # Check for convergence
        if np.linalg.norm(pagerank_values - prev_pagerank_values) < convergence_threshold:
            break

    return pagerank_values

# Example usage
sentences = [
    "Natural language processing is a subfield of artificial intelligence.",
    "PageRank is an algorithm used by Google Search.",
    "Cosine similarity is a metric used to measure how similar two documents are.",
    "Graph theory is the study of graphs and their properties.",
]

# Create a similarity matrix based on cosine similarity
num_sentences = len(sentences)
similarity_matrix = np.zeros((num_sentences, num_sentences))

for i in range(num_sentences):
    for j in range(num_sentences):
        if i != j:
            vector_i = np.random.rand(300)  # Random vector for demonstration
            vector_j = np.random.rand(300)
            similarity_matrix[i, j] = cosine_similarity(vector_i, vector_j)

# Apply PageRank
pagerank_scores = pagerank(similarity_matrix)

# Print the results
for i, sentence in enumerate(sentences):
    print(f"Sentence: {sentence}, PageRank Score: {pagerank_scores[i]}")
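As a sanity check, a hand-rolled loop like the one above can be compared against a library implementation. networkx ships one (this assumes networkx is installed; the small adjacency matrix here is just a stand-in for a real similarity matrix):

```python
import numpy as np
import networkx as nx

# Small weighted adjacency matrix: entry [i, j] is the weight of edge i -> j.
graph = np.array([
    [0.0, 0.9, 0.2],
    [0.9, 0.0, 0.4],
    [0.2, 0.4, 0.0],
])

# networkx reads the matrix entries as edge weights on a directed graph.
G = nx.from_numpy_array(graph, create_using=nx.DiGraph)
scores = nx.pagerank(G, alpha=0.85)  # alpha is the damping factor

# Nodes 0 and 1 reinforce each other through a heavy edge,
# so both should outrank the weakly connected node 2.
print(scores)
```

Agreement (up to the convergence tolerance) between the two implementations on the same matrix is a quick way to catch bugs in the inner update loop.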
