Michael Peter Christen
c99a665593
adding a 3-pixel font generator made some time ago..
10 years ago
Michael Peter Christen
c7576d6028
added a full solr export to the IndexControlURLs_p.html servlet. The
export function is now also the default export option. The export file
format for a full solr export is very similar to a solr search result
xml; only the <lst name="responseHeader"> tag is missing.
The exported xml has a special line termination feature: each document
is written on a single line without any CR inside it, so every document
is contained completely in one line. While this is hard for humans to
read, it is very useful for Linux line-processing tools like grep: with
grep it is easy to select single documents that match a given pattern.
Such dumps shall be importable with the DATA/SURROGATE/in import
function, but that import is not yet adapted to the new file format.
10 years ago
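The one-document-per-line layout described in c7576d6028 can be used from a script as well as from grep. The following is a minimal Python sketch, assuming every exported document sits on its own line and that such a line contains a `<doc>` element. The function name `select_docs` and the sample lines are hypothetical and not part of YaCy.

```python
import re
from typing import Iterable, Iterator


def select_docs(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield every document line that matches the given regular expression."""
    regex = re.compile(pattern)
    for line in lines:
        # Assumption: a document line contains a <doc> element, while the
        # surrounding header and footer lines of the dump do not.
        if "<doc>" in line and regex.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Illustrative sample only; a real dump would be read from a file.
    sample = [
        "<response>\n",
        "<result>\n",
        '<doc><str name="sku">http://example.org/a</str></doc>\n',
        '<doc><str name="sku">http://example.net/b</str></doc>\n',
        "</result>\n",
        "</response>\n",
    ]
    for doc in select_docs(sample, r"example\.org"):
        print(doc)
```

Because each match is a whole document, the selected lines can be written to a new file and still form complete records.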
Michael Peter Christen
47682bf467
fix for unresolved pattern
10 years ago
Michael Peter Christen
197f7449e5
All entities of crawl profiles are now editable in the crawl profile
editor.
10 years ago
reger
1d8e1e4bac
- Image search expand box, adjust javascript hs padtominsize parameter, to make sure expand box doesn't shrink on small images
- ensure ImageResult.imagetext has a value for the link text (use filename if no alt text given)
10 years ago
reger
8b35656007
remove hard throw exception in makeResultEntry
remove not used "share." peername.yacy url rewrite
10 years ago
reger
af57fbefad
use available mime (instead null) on imageresult from metadatanode
10 years ago
reger
dd7782bac0
revert deletion of BinSearch
(accident)
10 years ago
reger
000dde9511
Eliminate duplication of values for search ResultEntry
by instantiation from URIMetadataNode, by eliminating differentiation of ResultEntry/URIMetadataNode.
- moved remaining ResultEntry functionality to URIMetadataNode
- for 1:1 functionality added a function makeResultEntry()
- removed ResultEntry
- refactored related code
Main difference is after makeResultEntry the text_t content is removed and alternative title/url strings for display are calculated.
Main difference left is, that
10 years ago
reger
29c4aa3991
fix compiler notification of missing serialID
from last commit
10 years ago
reger
3d53da8236
refactor ResultEntry to be based on MetadataNode/SolrDocument
to share/reuse common access routines
10 years ago
reger
d882991bc5
Implement sharing of ioDispatcher for term & citation index
as proposed in ioDispatcher description
10 years ago
reger
17e820cfd7
use doctype() in ViewFile to choose display routines
in preference of getfileExtension()
10 years ago
reger
370ba9da71
On imageSearch prefer mime to sort out non-image documents
Generalize the hack to prevent urls with just an img extension being returned
improving http://mantis.tokeek.de/view.php?id=528
10 years ago
reger
cd31633369
improve MultiprotocolURL.getFileExtension()
prevent string OOB while querypart contains a dot (return just "")
see log snippet in http://mantis.tokeek.de/view.php?id=533
10 years ago
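The problem addressed in cd31633369 (a dot inside the query part confusing the extension lookup) can be illustrated with a small Python sketch. It is not the YaCy method: the function name `get_file_extension` is hypothetical, and the approach taken here, looking only at the path and ignoring the query entirely, is an assumption about one robust way to do it, whereas the commit states that the real method returns just "" in that situation.

```python
from urllib.parse import urlsplit


def get_file_extension(url: str) -> str:
    """Return the lower-cased extension of the last path segment, or ""."""
    # urlsplit separates path, query and fragment, so a dot that appears
    # only in the query part can never be mistaken for an extension.
    path = urlsplit(url).path
    name = path.rsplit("/", 1)[-1]
    dot = name.rfind(".")
    # No dot, or a trailing dot: there is no extension to return.
    if dot < 0 or dot == len(name) - 1:
        return ""
    return name[dot + 1:].lower()


if __name__ == "__main__":
    for u in (
        "http://example.org/a/b.HTML?x=1.2",
        "http://example.org/a/b?file=c.png",
        "http://example.org",
    ):
        print(repr(u), "->", repr(get_file_extension(u)))
```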
reger
c60ccdfbcf
Increase IODispatcher dumpQueue size to 2 to reduce risk of concurrent emergency dump,
skip concurrent emergency merge
dealing with/see http://mantis.tokeek.de/view.php?id=566
10 years ago
reger
8a9622c31c
fix string OoB on getImagelinks with long alttext
in description calculation
10 years ago
reger
aa83931765
Convert content charset for display via CacheResource_p
Cached resource charset encoding might not fit to internal handling (using utf-8),
convert resource to utf-8
see http://mantis.tokeek.de/view.php?id=576
10 years ago
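The conversion described in aa83931765 amounts to decoding the cached bytes with their declared charset and re-encoding them as utf-8. A minimal Python sketch under that assumption follows; the function name `to_utf8` and the fallback behavior for unknown or wrong charsets are hypothetical choices, not taken from the commit.

```python
from typing import Optional


def to_utf8(raw: bytes, charset: Optional[str]) -> bytes:
    """Re-encode cached content to utf-8, given its declared charset."""
    try:
        # Assumption: content without a declared charset is treated as utf-8.
        text = raw.decode(charset or "utf-8")
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or bytes that do not fit it: fall back to
        # utf-8 and replace undecodable bytes instead of failing.
        text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    latin1_bytes = "Grüße".encode("iso-8859-1")
    print(to_utf8(latin1_bytes, "iso-8859-1").decode("utf-8"))
```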
reger
3e742d1e34
Init remote crawler on demand
If remote crawl option is not activated, skip init of remoteCrawlJob to save the resources of the queue and idling thread.
Deployment of the remoteCrawlJob is deferred until the option is activated.
10 years ago
Michael Peter Christen
dbf9e3503d
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen
8b1a30be50
removed a -UNRESOLVED_PATTERN-
10 years ago
Michael Peter Christen
9938c81378
fix for division by zero
10 years ago
reger
13f013f64a
Limit extra sleep of BusyThread on LowMemCycle
10 years ago
reger
cd7c0e0aae
detail optimization of RecrawlThread
10 years ago
reger
ace71a8877
Initial (experimental) implementation of index update/re-crawl job
added to IndexReIndexMonitor_p.html
Selects existing documents from the index and feeds them to the crawler.
Currently only the field fresh_date_dt is used to determine documents for recrawl (fresh_date_dt:[* TO NOW-1DAY]).
Documents are added in small chunks (200) to the crawler, only if no other crawl is running.
10 years ago
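The feeding rule from ace71a8877 (chunks of 200, only while no other crawl is running) can be sketched in a few lines of Python. Only the query string and the chunk size come from the commit message; the function `feed_next_chunk` and its `crawler_busy` and `submit` callbacks are hypothetical stand-ins for the index and crawler, not YaCy APIs.

```python
from typing import Callable, List

# Query quoted in the commit message: documents whose fresh_date_dt is
# older than one day.
RECRAWL_QUERY = "fresh_date_dt:[* TO NOW-1DAY]"
CHUNK_SIZE = 200  # chunk size named in the commit message


def feed_next_chunk(
    pending: List[str],
    crawler_busy: Callable[[], bool],
    submit: Callable[[List[str]], None],
    size: int = CHUNK_SIZE,
) -> int:
    """Hand at most `size` urls to the crawler; return how many were sent."""
    # Do nothing while another crawl is running or nothing is left to do.
    if crawler_busy() or not pending:
        return 0
    chunk = pending[:size]
    del pending[:size]  # remove the submitted urls from the backlog
    submit(chunk)
    return len(chunk)


if __name__ == "__main__":
    # Stand-in for the urls a search with RECRAWL_QUERY might return.
    backlog = [f"http://example.org/{i}" for i in range(450)]
    batches: List[List[str]] = []
    while feed_next_chunk(backlog, lambda: False, batches.append):
        pass
    print([len(b) for b in batches])
```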
reger
141cd80456
correct log msg text
10 years ago
reger
f3ce99bfb8
fix extract of inboundlinks_protocol_sxt
url counter may be > 999
10 years ago
reger
2bc9cb5828
fix early return in addToCrawler
check / handle all supplied urls after error url
10 years ago
Michael Peter Christen
f5f88272e4
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen
5c67c4d460
fix for latest commit, see
f810915717 (commitcomment-11145880)
10 years ago
reger
c37dda8849
fix NPE on MultiProtocolURL on url with parameter value and '='
in getAttribute
- added test case for it
10 years ago
Michael Peter Christen
f810915717
added crawl start from a clone with very, very large urls: they are now
encoded as post submit form inside a javascript creation function.
10 years ago
Michael Peter Christen
51de86c992
disabled debug thread dumps
10 years ago
Michael Peter Christen
d524a9d77c
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen
0710648c31
enable api calls with very long urls
10 years ago
reger
31346e873b
upd library reference of missing jsch-0.1.21 in seeduploadscp.xml
upd to jsch-0.1.52.jar
10 years ago
reger
609c52e987
refactor getBookmark
to consistently check existence by != null (w/o throwing exception on not found)
10 years ago
reger
1481a8ab56
add opensearch rss results to dht collection (due to text = snippet)
which is used to differentiate meta from full data
- make sure check for dht is not dependent on number of collection entries
10 years ago
reger
5f4d35437e
add bookmark.query to edit form
10 years ago
reger
f134aa7f7f
persist bookmark timestamp
on setTimeStamp()
10 years ago
reger
752eec6697
fix NPE in addToIndex when used outside searchEvent
10 years ago
reger
a6daddbeaa
upd to commons-io-2.4.jar
10 years ago
reger
89124335c4
update bookmark autosearch description
- add german translation
10 years ago
Michael Peter Christen
fbf85a1561
added temporary debug output in http client
10 years ago
Michael Peter Christen
ff29b0e503
added option to re-index exported xml snapshot dumps to
HTCACHE/snapshots by just placing them in the SURROGATES/in path
10 years ago
Michael Peter Christen
6f4fe4b175
revert of 8a7c68e4c7
keeping surrogates after processing is essential for some users. If the
space they are taking is too high, please set up an automatic deletion
process (like a cronjob).
10 years ago
Michael Peter Christen
213401a446
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
10 years ago
Michael Peter Christen
97930a6aad
added must-not-match filter to snapshot generation.
also: fixed some bugs
10 years ago
Michael Peter Christen
9d8f426890
adding a try-catch to link graph processing to prevent that a single
malformed url interrupts the storage process
10 years ago
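The idea behind 9d8f426890, catching the error per url so that one malformed entry does not abort the whole run, can be sketched as follows. This is a Python illustration of the pattern only; the function `store_links`, its `store` callback and the validity check are hypothetical and do not reflect the actual link graph code.

```python
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urlsplit


def store_links(
    urls: Iterable[str], store: Callable[[str], None]
) -> Tuple[int, List[str]]:
    """Store every well-formed url; collect malformed ones instead of failing."""
    stored = 0
    skipped: List[str] = []
    for url in urls:
        try:
            # urlsplit raises ValueError for some malformed urls, e.g. an
            # unterminated IPv6 host such as "http://[broken".
            parts = urlsplit(url)
            if not parts.scheme or not parts.netloc:
                raise ValueError("missing scheme or host: " + url)
            store(url)
            stored += 1
        except ValueError:
            # Only this url is skipped; the loop continues with the next one.
            skipped.append(url)
    return stored, skipped


if __name__ == "__main__":
    kept: List[str] = []
    result = store_links(
        ["http://example.org/a", "http://[broken", "http://example.org/b"],
        kept.append,
    )
    print(result, kept)
```

Placing the try block inside the loop, rather than around it, is what keeps the remaining urls from being lost.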
reger
b47267b79c
precaution against NPE on createorgetBookmark on search result
10 years ago